计算机学报

	Chinese Journal of Computers Full Text
Title	A Comparative Study on Time Series Classification
Authors	YANG Yi-Ming1) PAN Rong2) PAN Jia-Lin2) YANG Qiang1),2) LI Lei1)
Address	1)(Software Institute, Sun Yat-Sen University, Guangzhou 510275) 2)(Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong)
Year	2007
Issue	No.8(1259—1266)
Abstract & Background	Abstract Time series classification, or categorization, is an important task in time-series analysis. Unlike traditional methods and problem formulations in time-series analysis, time series classification takes whole time sequences as input and produces a discrete label for each sequence. Compared with traditional classification problems, time series classification poses additional difficulties. A major difficulty is that time sequences vary in length, so many traditional classification methods cannot be applied directly. Even for sequences of uniform length, many methods still cannot be applied directly, because the data located at different parts of the sequences are often incomparable. Two kinds of methods have been tried separately in the past: distance-based methods such as dynamic time warping (DTW), and model-based methods such as Markov models. Using either kind as a preprocessing step, a uniform-length vector space can be built so that standard classification methods can be applied. Until now, there has been a lack of comparison between these two kinds of methods. This paper compares distance-based and model-based methods on several data sets, both synthetic and real, to explicate their relative advantages and disadvantages. It presents several key observations on the relative merits of the two kinds of methods, and paves the way for further research on new methods for time series classification.
Keywords classification; time series; model based clustering; Markov model; statistical learning
Background Time-series learning has long been an important topic in machine learning and data mining research because of its wide-ranging impact in applications such as stock-market analysis, speech recognition, handwritten character and word recognition, and sensor-network-based activity recognition, to name a few.
The time-series learning problem has several different aspects. Among them, whole-time-sequence classification, which asks how to classify an entire sequence into one of several discrete labels, is an important sub-problem with many industrial applications. Unlike traditional methods and problem formulations in time-series analysis, time series classification takes whole time sequences as input and produces a discrete label for each sequence. Compared with traditional classification problems, time series classification poses additional difficulties. A major difficulty is that time sequences vary in length, so many traditional classification methods cannot be applied directly. Even for sequences of uniform length, many methods still cannot be applied directly, because the data located at different parts of the sequences are often incomparable. This paper focuses on the supervised whole-time-sequence classification problem and offers a novel solution to it.
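The distance-based preprocessing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes one-dimensional sequences, an absolute-difference local cost, and a hypothetical set of reference sequences. Each variable-length sequence is mapped to the vector of its DTW distances to the references, so every sequence ends up with the same number of features.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW between two 1-D sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def to_feature_vector(seq, references):
    """Map one variable-length sequence to a fixed-length vector of
    DTW distances to the (hypothetical) reference sequences."""
    return [dtw_distance(seq, ref) for ref in references]


# Illustrative data only: two references, three sequences of different lengths.
references = [[0, 1, 2, 3], [3, 2, 1, 0]]
sequences = [[0, 1, 1, 2, 3], [3, 3, 2, 1, 0, 0], [0, 2, 3]]
features = [to_feature_vector(s, references) for s in sequences]
```

Every row of `features` has length `len(references)` regardless of the input length, so an ordinary vector-space classifier could in principle be trained on it. How the references are chosen is left open here.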