Chinese Journal of Computers
  Title: Discovering Signature of Potential Web Communities from Clusters of MCL
  Authors: YANG Nan, LIN Song-Xiang, GAO Qiang, MENG Xiao-Feng
  Address: School of Information, Renmin University of China, Beijing 100872
  Year: 2007
  Issue: No. 7 (pp. 1086-1093)
  Abstract & Background
Abstract: Web communities are an important form of social activity in the evolution of the Web. This paper analyzes typical existing algorithms for Web community discovery. For the setting in which no topic is predefined and the communities are implicit, a new method is proposed that combines the characteristic structure of a community with the clusters produced by Markov Graph Clustering (MCL) to find implicit communities. Mirror and near-mirror pages are deleted after graph clustering rather than before it, which considerably reduces the comparison cost. A community member selection algorithm is then used to produce the set of community candidates. The experimental results show that the new method works properly and that many Web communities can be inferred.

Keywords: Web community; link analysis; MCL graph clustering; flow simulation; random walk

Background: Web communities are an important form of social activity in the evolution of the Web. Many communities are already known to people, but many others remain implicit, and mining them is difficult. Although research on communities has made great progress, many problems remain unsolved.
This work is part of the 211 Project of the Ministry of Education, entitled "Research on Discovery Technology of Web Resources". In this paper, the authors analyze typical existing algorithms for Web community discovery. For the setting in which no topic is predefined and the communities are implicit, they propose a new method that combines the characteristic structure of a community with the clusters of Markov Graph Clustering (MCL) to find implicit communities. This work begins to apply graph clustering technology to very large-scale graphs and has made progress. Future work will focus on mining the hierarchical structure of the Web and its communities, and on automatic topic extraction from clusters of the Web.
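
The MCL step referred to above can be illustrated with a minimal sketch. It is not the authors' implementation, which this page does not give. It only shows the generic expansion/inflation loop of MCL on a tiny undirected graph, in plain Python. The function names (normalize_columns, matmul, mcl), the parameter defaults (inflation=2.0, iterations=50) and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out


def matmul(a, b):
    """Dense matrix product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Toy MCL: alternate expansion (random-walk step) and inflation.

    adjacency is a symmetric 0/1 matrix; the return value is a sorted
    list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Self-loops are added, as is customary for MCL (assumed here).
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                      # expansion
        inflated = normalize_columns(                # inflation
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow names one cluster: the columns
    # (nodes) whose flow ends up in that row.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical link graph: two triangles of pages with no links between them.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))
```

A dense list-of-lists matrix is used only to keep the sketch self-contained. A graph of the scale mentioned above would need a sparse representation and pruning of small entries.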