Monday, August 17, 2020

Accuracy vs. Explainability Tradeoff

This article attempting to explain machine learning for statisticians is fascinating. I don't know if I even properly understand it, and the tone is a bit negative. Here's the simplest way I can think of it.

If I have:
    (b1 * x1) + (b2 * x2) + (b3 * x3) + a = y
then the statistician is trying to minimize error in the b values, and the machine learning person is trying to minimize error in the y value?



If you get the stats "too correct", the ML guy will know you are overfitting, and the model will do worse on new data that was not in the sample/training set.

Is that it?
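
One rough way to poke at this intuition is a small simulation. The sketch below is only an illustration under made-up assumptions: the "true" b and a values, the noise level, the sample sizes, and the 25 junk features are all invented here, not taken from the article. It fits the three-variable equation above with ordinary least squares, looks at the error in the b values (the statistician's view), and then compares error in y on held-out data (the ML view) against a deliberately overfit model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" model for the simulation: y = b1*x1 + b2*x2 + b3*x3 + a + noise
true_b = np.array([2.0, -1.0, 0.5])
true_a = 3.0

def make_data(n):
    """Draw n samples from the assumed model with unit-variance noise."""
    X = rng.normal(size=(n, 3))
    y = X @ true_b + true_a + rng.normal(scale=1.0, size=n)
    return X, y

def fit(X, y):
    """Ordinary least squares; the last coefficient is the intercept a."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared error in y for a fitted coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))

X_train, y_train = make_data(30)    # small sample / training set
X_test, y_test = make_data(1000)    # new data not seen during fitting

# Statistician's view: how far are the estimated b values from the true ones?
coef_small = fit(X_train, y_train)
b_error = np.abs(coef_small[:3] - true_b)

# Overfit model: same data plus 25 junk features unrelated to y.
junk_train = rng.normal(size=(30, 25))
junk_test = rng.normal(size=(1000, 25))
Xb_train = np.hstack([X_train, junk_train])
Xb_test = np.hstack([X_test, junk_test])
coef_big = fit(Xb_train, y_train)

# ML view: error in y, on the training set and on new data.
train_small, test_small = mse(coef_small, X_train, y_train), mse(coef_small, X_test, y_test)
train_big, test_big = mse(coef_big, Xb_train, y_train), mse(coef_big, Xb_test, y_test)

print("abs error in b1, b2, b3:", b_error)
print("3-feature model  train/test MSE:", train_small, test_small)
print("28-feature model train/test MSE:", train_big, test_big)
```

The larger model can never fit the training sample worse, since it contains the smaller model as a special case. The held-out error is where the cost of chasing the sample tends to show up.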



Saturday, August 15, 2020

The Twitter for blogs

Thinking about whether there is a way to beat Twitter with open standards. The day Twitter shut off RSS (so that people couldn't build apps that avoided their ads), we lost much of its usefulness as part of an aggregation system. However, tools like NewsBlur show it can still be scraped.

Spotify is trying to create a walled garden around podcasts. Again, shutting off the aggregation function.

Google Reader was very popular. With some additional tools to create self-hosted content, likes, and some additional discovery tools, it could have been a distributed Twitter.

We may be able to do this.