Fitting Data

A little bit of a paradox: if we look at data points (x,y) with x and y chosen independently and uniformly in [0,1], we expect the best linear fit y = m x to be y = x due to symmetry. This is wrong. The best linear fit tends to y = (3/4) x: the least-squares slope is m = sum(x y)/sum(x^2), and the expectations E[x y] = 1/4 and E[x^2] = 1/3 give the ratio 3/4. One can already see this with the two points (2,1) and (1,2). The best fit minimizes f(m) = (2m-1)^2 + (m-2)^2, which happens at m = 4/5, not at m = 1 as one might think. Here is an animation where we take 1000 data points and add them one by one, recomputing the best linear fit at every step. By the way, in Mathematica you can get the best linear fit y = m x with the command Fit[{{1,2},{2,1}},{x},x], which gives 0.8 x. Usually in statistics, when doing regression, one looks for the best affine fit y = m x + b. This is, however, a minimization problem in two variables m and b, which appears in multi-variable calculus. Linear algebra is the best way to get it. With Mathematica, we get it with Fit[{{1,2},{2,1}},{1,x},x], which gives f(x)=3-x as the best fit.
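The computations above can be sketched outside Mathematica as well; here is a small Python illustration (the function names, the random seed, and the sample count are my own choices, not from the page). It reproduces the two-point example and checks the 3/4 limit by simulation:

```python
import random

def best_slope(points):
    # Least-squares fit y = m x: minimizing sum((m*x - y)^2)
    # gives m = sum(x*y) / sum(x^2).
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def best_affine(points):
    # Least-squares fit y = m x + b via the normal equations.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

pts = [(2, 1), (1, 2)]
print(best_slope(pts))    # 0.8, matching Fit[{{1,2},{2,1}},{x},x]
print(best_affine(pts))   # (-1.0, 3.0), i.e. the line y = 3 - x

# Uniform random points in [0,1]^2: the slope approaches
# E[x*y] / E[x^2] = (1/4) / (1/3) = 3/4.
random.seed(0)
cloud = [(random.random(), random.random()) for _ in range(100000)]
print(best_slope(cloud))  # close to 0.75
```

The two-point results agree with the Mathematica output quoted above, and the simulation makes the symmetry "paradox" visible: the fitted slope settles near 3/4 rather than 1.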

Direct Media Links: Webm, Ogg, Ipod