[time-nuts] Discarding outliers in two dimensions

Wed Dec 9 10:53:06 UTC 2009

Suppose I want to average a bunch of samples.  Sometimes it helps to discard
the outliers.  I think that helps when there are two noise mechanisms, say
the typical Gaussian noise plus some other noise that is occasionally added
on.  If the other noise is rare but large, those occasional samples can have
a big influence on the average.  So discarding those outliers gives better
results, for some value of "better".

I know how to do it in one dimension.  How do I do it in two dimensions?

Say I have a lot of samples from a GPS system and I want to compute the best 
position to use when shifting into timing mode.

For one dimension, you sort, compute the average, then compare the distances
of the first and last samples from the average.  Discard whichever one is
farther from the average (and repeat to drop more).
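
A minimal sketch of that one-dimensional procedure in Python, for reference.
The function name trim_outliers_1d and the n_discard parameter are made up
for illustration; the description above doesn't say how many samples to
drop, so the caller has to pick that number.

```python
def trim_outliers_1d(samples, n_discard):
    """Drop up to n_discard samples, one at a time.

    Each pass recomputes the average of what is left and discards
    whichever end of the sorted list is farther from that average.
    n_discard is an assumed stopping rule, not part of the original idea.
    """
    data = sorted(samples)
    total = sum(data)
    lo, hi = 0, len(data)          # surviving samples are data[lo:hi]
    for _ in range(n_discard):
        if hi - lo <= 1:           # never discard the last sample
            break
        mean = total / (hi - lo)
        if mean - data[lo] > data[hi - 1] - mean:
            total -= data[lo]      # low end is farther: drop it
            lo += 1
        else:
            total -= data[hi - 1]  # high end is farther (or tied): drop it
            hi -= 1
    return data[lo:hi]
```

Because the list is sorted, the farthest sample is always at one end, so
each discard is a constant-time step after the initial sort.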

The problem with two dimensions is I don't know how to sort.

Let's ignore efficiency.  I can compute the average without sorting.  I can 
scan the whole list looking for the one that is farthest (radial distance) 
from the average.  Does that work (and do what I want)?  (I think so, but I'm 
not sure.)
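
Here is a rough sketch of that brute-force scan in Python, ignoring
efficiency as stated.  It assumes the positions have already been converted
to flat (x, y) pairs in the same units, say meters east and north of some
reference point; that conversion, the function name trim_outliers_2d, and
the n_discard count are all assumptions of mine, not anything the GPS
receiver provides.

```python
import math


def trim_outliers_2d(points, n_discard):
    """Drop up to n_discard points, one at a time.

    Each pass recomputes the average position of what is left, scans
    the whole list for the point with the largest radial distance from
    that average, and removes it.  No sorting is needed.
    """
    pts = list(points)
    for _ in range(n_discard):
        if len(pts) <= 1:          # never discard the last point
            break
        mean_x = sum(p[0] for p in pts) / len(pts)
        mean_y = sum(p[1] for p in pts) / len(pts)
        # Index of the point farthest (radially) from the current average.
        worst = max(
            range(len(pts)),
            key=lambda i: math.hypot(pts[i][0] - mean_x, pts[i][1] - mean_y),
        )
        pts.pop(worst)
    return pts
```

Each discard costs a full pass over the list, so dropping k points out of n
takes on the order of k*n steps.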

Is there a way to do that efficiently?

-- 
These are my opinions, not necessarily my employer's.  I hate spam.