|  | Chinese Journal of Computers Full Text |
| Title | A Margin Based Feature Extraction Algorithm for the Small Sample Size Problem |
| Authors | HUANG Rui HE Ming-Yi YANG Shao-Jun |
| Address | (School of Electrical and Information, Northwestern Polytechnical University, Xi'an 710072) |
| Year | 2007 |
| Issue | No. 7 (pp. 1173–1178) |
| Abstract & Background | Abstract: Feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information for classification and recognition tasks. Linear Discriminant Analysis (LDA) is the most popular supervised method for feature extraction, but it often suffers from the small sample size problem: the within-class scatter matrix becomes singular when the number of samples is smaller than the dimensionality of the samples. A margin-based feature extraction algorithm is proposed to address this problem. Because the probability of linear separability may increase for high-dimensional data when samples are few, and because low-dimensional projections are approximately normal, the proposed algorithm introduces a new definition of the margin. This margin involves not only the between-class and within-class scatter used by the LDA criterion but also the differences between class variances. Maximizing the margin yields the optimal projection vector and avoids the small sample size problem. Based on theoretical analysis, the algorithm is further extended to the multi-class case. Experimental results show that the algorithm outperforms several improved versions of LDA when samples are scarce, and that it also achieves satisfactory performance with larger samples. Keywords: feature extraction; linear discriminant analysis; small sample size problem; pattern classification; maximum margin. Background: Advances in hyperspectral remote sensing have provided an important means of monitoring the world. The resulting high-dimensional data, collected at hundreds of adjacent, narrow wavelength bands, allow better discrimination among similar spectral signatures or fingerprints than traditional multispectral data with low spectral resolution, and have been widely used in aerospace, Earth observation, lunar and Mars exploration, biomedical engineering, etc.
However, the sheer volume of data poses challenging problems for subsequent information processing. Task-oriented feature extraction has become one of the most important research topics and has attracted increasing attention. Feature extraction, which transforms the original data from a high dimension into a lower one while preserving most of the desired information, has been widely used for dimensionality reduction and discriminatory information enhancement. How to extract useful features, however, remains an open issue. Linear Discriminant Analysis (LDA) is one of the most popular supervised techniques for feature extraction. It finds the set of projection vectors that maximize the ratio of between-class scatter to within-class scatter (Fisher's criterion). Nevertheless, LDA may encounter the so-called small sample size (SSS) problem, which arises whenever the number of samples is smaller than the dimensionality of the samples. In that case the within-class scatter matrix, whose rank is at most N - c for N samples drawn from c classes, becomes singular, and LDA fails. In recent years, researchers have proposed various schemes to solve this problem, but most of them, like the LDA criterion itself, ignore the differences between class variances. In view of this, the method proposed in this paper for the SSS problem defines a new margin that involves not only the between-class and within-class scatter but also the differences between class variances. The method is further extended to the multi-class case through theoretical analysis. This work is supported by a grant from the National Natural Science Foundation of China (No. 60572097). |
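The following is a minimal NumPy sketch that makes two claims from the abstract concrete. It is classical LDA under Fisher's criterion, not the paper's margin-based algorithm, because the new margin is not defined in this section. First, when there are fewer samples than dimensions, the within-class scatter S_w is rank-deficient and classical LDA cannot invert it. Second, for two equal-sized classes, the Fisher ratio of a 1-D projection depends only on the *sum* of the class variances, so it cannot distinguish between very different variance splits. The function names (`scatter_matrices`, `lda_projection`, `fisher_ratio_1d`) and the synthetic Gaussian data are hypothetical and chosen for illustration only.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class (S_b) and within-class (S_w) scatter of X (n x d) with labels y."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)   # class-size-weighted spread of the means
        centered = Xc - mean_c
        S_w += centered.T @ centered       # sum of per-class scatter matrices
    return S_b, S_w


def lda_projection(X, y, k):
    """Classical LDA: top-k eigenvectors of S_w^{-1} S_b (Fisher's criterion).

    Raises LinAlgError when S_w is singular, i.e. in the small sample size case.
    """
    S_b, S_w = scatter_matrices(X, y)
    if np.linalg.matrix_rank(S_w) < S_w.shape[0]:
        raise np.linalg.LinAlgError("S_w is singular: small sample size problem")
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]  # largest Fisher ratios first
    return eigvecs[:, order[:k]].real


def fisher_ratio_1d(m1, m2, var1, var2):
    """Two-class Fisher ratio of a projected feature; only var1 + var2 matters."""
    return (m1 - m2) ** 2 / (var1 + var2)


rng = np.random.default_rng(0)
d = 50  # dimensionality of each sample (hypothetical)

# Small sample size: 2 classes x 10 samples = 20 samples < 50 dimensions.
y_small = np.repeat([0, 1], 10)
X_small = rng.normal(size=(20, d)) + y_small[:, None]
_, S_w_small = scatter_matrices(X_small, y_small)
print("rank of S_w:", np.linalg.matrix_rank(S_w_small), "of", d)  # at most 20 - 2

sss_detected = False
try:
    lda_projection(X_small, y_small, k=1)
except np.linalg.LinAlgError as err:
    sss_detected = True
    print("classical LDA fails:", err)

# With enough samples (200 > 50), S_w is invertible and LDA works.
y_big = np.repeat([0, 1], 100)
X_big = rng.normal(size=(200, d)) + y_big[:, None]
W = lda_projection(X_big, y_big, k=1)
print("projection matrix shape:", W.shape)

# Same mean gap and same total variance, very different variance split:
print(fisher_ratio_1d(0.0, 2.0, 1.0, 1.0), fisher_ratio_1d(0.0, 2.0, 1.5, 0.5))
```

The last line prints the same ratio twice. This is the sense in which the Fisher criterion "ignores the differences between class variances" that the proposed margin is said to take into account.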