Tuesday, March 15, 2011

Higher dimension!

I was trying to extend my outlier detection technique to higher dimensions. I kept the basics the same and tried to distinguish outliers in higher-dimensional data. My original technique was based on nearest neighbors, and I found that my approach does not work very well in higher dimensions :( I went deep into the program, trying to figure out why. Eventually I realized that the average density within a certain radius is not very different from the density around an outlier! In brief, distance is not a discriminative feature of an outlier!

I looked into the dataset and it looked okay, then I looked into the literature and found something equally interesting and horrifying :( Nearest neighbor is not very meaningful in higher dimensions: for high-dimensional data, the ratio of the distance to the nearest neighbor to the distance to the farthest neighbor approaches 1, so nearest-neighbor-based discrimination is not fruitful at all! The situation may improve if we change the distance metric from Euclidean to Manhattan, but that is not foolproof! The good thing about this result is that all distance-based outlier detection is useless in higher dimensions, but the bad thing is that mine is useless too! I have to look for something very different from nearest neighbor.
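To make the nearest-to-farthest ratio concrete, here is a minimal sketch that is not my actual outlier detection code. It assumes points drawn uniformly at random from the unit hypercube, and the function name `nearest_to_farthest_ratio` and its parameters are made up for illustration. It measures the ratio from one random query point under both Euclidean and Manhattan distance.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, metric="euclidean", seed=0):
    """Return (nearest distance) / (farthest distance) from one random query.

    Assumption: the query and all points are sampled uniformly from the
    unit hypercube [0, 1]^dim. Real datasets may behave differently.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        if metric == "manhattan":
            d = sum(abs(a - b) for a, b in zip(query, point))
        else:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point)))
        dists.append(d)
    return min(dists) / max(dists)


# Print the ratio for a few dimensions under both metrics.
for dim in (2, 10, 100, 1000):
    euclid = nearest_to_farthest_ratio(dim, metric="euclidean")
    manhattan = nearest_to_farthest_ratio(dim, metric="manhattan")
    print(dim, round(euclid, 3), round(manhattan, 3))
```

On this kind of synthetic data the ratio should climb toward 1 as the dimension grows, which is exactly why "near" and "far" stop being useful labels. How much Manhattan distance helps will depend on the data, so the second column is something to inspect, not a guarantee.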

It is not very intuitive to visualize higher-dimensional space, and therefore it is not easy to imagine the notion of an outlier in higher dimensions! I guess I need to figure out what works in higher dimensions. That will not be an easy journey at all.
