Information loss of the Mahalanobis distance in high dimensions: Matlab implementation

The information loss is estimated and exploited to set a lower limit on the correct classification rate (CCR) achieved by the Bayes classifier used in subset feature selection. Details of the method can be found in journal paper [6]. The functions for estimating the lower limit of the CCR can be downloaded from the Matlab File Exchange.

Explanation: For 1000 samples per class (N_Dc) and just 50 features (D, the dimensionality), 12.5% of the information of the Gaussian PDF that models the class distribution is lost. In other words, there is a ±12.5% ambiguity about whether the classification result is accurate.
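The numbers above come from the method of [6]. As a minimal, hedged illustration of the same effect (not the estimator of [6] and not the File Exchange functions), the Matlab sketch below draws N_Dc = 1000 training samples of dimensionality D = 50 from a known Gaussian, fits a sample mean and covariance, and approximates by Monte Carlo how much probability mass of the true PDF the fitted PDF misplaces. All variable names and the Ntest setting are illustrative choices made here, not part of the published method.

% Minimal Monte Carlo sketch (illustrative only, not the estimator of [6]):
% how much of a true D-dimensional Gaussian PDF is misrepresented when its
% mean and covariance are estimated from N_Dc samples.
% Requires the Statistics and Machine Learning Toolbox (mvnrnd, mvnpdf).
D     = 50;       % dimensionality
N_Dc  = 1000;     % training samples per class
Ntest = 20000;    % Monte Carlo test points (illustrative choice)

mu    = zeros(1, D);   % true mean (standard Gaussian assumed for the sketch)
Sigma = eye(D);        % true covariance

Xtrain    = mvnrnd(mu, Sigma, N_Dc);   % training set
mu_hat    = mean(Xtrain, 1);           % sample mean
Sigma_hat = cov(Xtrain);               % sample covariance

Xtest  = mvnrnd(mu, Sigma, Ntest);            % fresh points from the true PDF
p_true = mvnpdf(Xtest, mu, Sigma);            % true density values
p_hat  = mvnpdf(Xtest, mu_hat, Sigma_hat);    % density under the fitted model

% Total-variation-style estimate of the misplaced probability mass:
% 0.5 * E_true[ |1 - p_hat/p_true| ]  approximates  0.5 * integral |p_true - p_hat| dx.
loss_hat = 0.5 * mean(abs(1 - p_hat ./ p_true));
fprintf('Monte Carlo estimate of the misplaced PDF mass: %.1f %%\n', 100 * loss_hat);

Note that [6] defines the information loss analytically from the finite-sample behaviour of the Mahalanobis distance, so the Monte Carlo figure printed by this sketch need not coincide with the 12.5% quoted above; it only conveys the qualitative point that fewer samples per dimension yield a less faithful Gaussian model.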


