| | Chinese Journal of Computers Full Text |
| Title | New Event Detection Based on Division Comparison of Subtopic |
| Authors | HONG Yu ZHANG Yu FAN Ji-Li LIU Ting LI Sheng |
| Address | (Information Retrieval Laboratory, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001) |
| Year | 2008 |
| Issue | No. 4 (687-695) |
| Abstract & Background | Abstract: New event detection is an important research task in the field of topic detection and tracking; it monitors a stream of news stories in real time and identifies the new topics in it. Current methods match topics and stories as if they were single-structured term vectors, which makes the subtopics act as noise for one another; this noise often conveys the wrong semantics and misleads the identification of new topics. To address this defect, this paper proposes a new event detection method based on division comparison of subtopics, which divides each topic and story into subtopics and identifies new topics based on the proportion and distribution relations of the relevant subtopics. The method achieves substantial improvement on TDT4 and TDT5, with a minimum detection error cost of 0.4061 and a miss probability of 0.1859. Keywords: new event detection; topic detection and tracking; subtopic. Background: Topic Detection and Tracking (TDT) is a challenging research direction in natural language processing that aims to develop technologies for event-based information organization, such as detecting stories that report on new topics and tracking stories on known topics. The Linguistic Data Consortium (LDC) provided multiple sources of information for training and testing TDT systems. These sources, comprising both text and speech, are newswires, radio and television news broadcasts, and WWW sources. The source languages are English, Mandarin and Arabic. The information streams are modeled as a sequence of stories, which provide information on many topics. The National Institute of Standards and Technology (NIST) provided guidelines, tools and dry runs for the evaluation of TDT. In the initial TDT research, conducted during 1996 and 1997, the notion of a topic was limited to an "event", meaning something that happens at a specific time and place.
In the second TDT project, TDT2, a topic was defined as a seminal event or activity, along with all directly related events and activities. This definition was retained for TDT3, TDT4 and TDT5. A story is considered "on topic" whenever it discusses events and activities that are directly connected to that topic's seminal event. TDT includes many tasks, such as story segmentation, link detection, topic detection, new event detection and topic tracking. New event detection (NED) is one of the most important tasks in TDT; it monitors a chronologically ordered stream of stories in real time and identifies the first story that discusses each event. |
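
The NED task described above, together with the single-vector matching that the abstract attributes to current methods, can be illustrated with a minimal sketch. This is not the paper's subtopic-based method; it is only the kind of baseline the abstract criticizes, written under stated assumptions: stories are plain strings, terms are lowercased whitespace tokens weighted by raw frequency, and the function names and the `threshold` value of 0.3 are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_new_events(stories, threshold=0.3):
    """Scan stories in chronological order; flag a story as a new event
    when its best similarity to every earlier story is below `threshold`.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    seen = []   # term vectors of all earlier stories
    flags = []  # one boolean per story: True means "new event"
    for text in stories:
        vec = term_vector(text)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


stories = [
    "earthquake hits coastal city",
    "earthquake hits coastal city again",
    "election results announced today",
]
print(detect_new_events(stories))
```

Because each story is reduced to one vector, terms from unrelated subtopics of a long story all contribute to the same similarity score, which is the source of the noise the abstract describes.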