| | Chinese Journal of Computers Full Text |
| Title | A Survey of Deep Web Data Integration |
| Authors | LIU Wei 1), MENG Xiao-Feng 1), MENG Wei-Yi 2) |
| Address | 1)(School of Information, Renmin University of China, Beijing 100872) 2)(Department of Computer Science, State University of New York, Binghamton 13902) |
| Year | 2007 |
| Issue | No.9 (1475-1489) |
| Abstract & Background | Abstract: With the rapid development of the World Wide Web, a tremendous amount of information is "hidden" in the Deep Web, and its volume is increasing rapidly. This information can be accessed only through the query interfaces provided by Web databases. Data in the Deep Web are returned in the form of dynamic Web pages when users submit a query. Because of the poor structure of Web pages and the instability and large scale of the Deep Web, integrating this abundant information automatically and using it effectively is a very challenging task. Deep Web data integration is still an emerging research field with a number of challenging open issues. A great deal of research has been carried out in this field, but the attention paid to its various issues is unbalanced. This paper proposes a framework for Deep Web data integration, and classifies and surveys the key research work in Deep Web data integration according to this framework. Finally, the deficiencies in this field are analyzed and suggestions for future research are put forward. Keywords: World Wide Web; Deep Web; Web database; query interface; Deep Web data integration. Background: This work is supported by the National Natural Science Foundation of China under grant No.60273018, the National High Technology Research and Development Program (863 Program) of China under grant No.2002AA11304, and the National Basic Research Program (973 Program) of China under grant No.2003CB317000. With the rapid development of the Web, a large number of Web data sources are emerging, so it is increasingly difficult for users to find the information they want among these sources manually. The purpose of these projects is to provide users with an automatic approach to obtaining and integrating information on the Web. In recent years, this area has received more and more attention, and a great number of researchers have focused on some of its issues.
In past years, the authors have researched and developed a number of techniques in the area of Deep Web integration; this work mainly focuses on Web database clustering, Web query interface integration, and Web data extraction. Many issues in this area have still not been addressed well, or have not been touched at all. This paper therefore mainly summarizes previous work and helps researchers pay attention to the interesting issues that remain to be addressed. |