Chinese Journal of Computers
  Title: A Method of Detecting Phishing Web Pages Based on Hungarian Matching Algorithm
  Authors: ZHANG Wei-Feng 1), ZHOU Yu-Ming 2), XU Lei 2), XU Bao-Wen 2)
  Address: 1) (School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003); 2) (Department of Computer Science and Engineering, Nanjing University, Nanjing 210093)
  Year: 2010
  Issue: No. 10 (1963-1975)
  Abstract & Background
Abstract Quickly and efficiently calculating the similarity between web pages is the key problem in detecting phishing pages, and current anti-phishing methods still leave considerable room for improving detection efficiency. This paper proposes a method of detecting phishing web pages based on bipartite graph matching. In this model, the text signature, the image signature, and the overall signature of a web page are extracted. The Hungarian algorithm is then used to find the best match in the bipartite graph formed by the signatures of two pages. The matched feature pairs are used to measure the similarity between the pages more objectively, which improves the effectiveness of phishing page detection. A series of simulation experiments shows that the method is feasible and achieves high precision and recall.

Keywords anti-phishing; web metric; bipartite graph matching; similarity; web page signature

Background With the spread of Internet applications, more and more people have become accustomed to online services such as online banking and online shopping. Meanwhile, the number of phishing web sites that aim to steal victims' sensitive information has increased rapidly. Phishing web pages are fake pages created deliberately by criminals who copy pages from real web sites, so most phishing pages are visually very similar to their real counterparts; it is by mimicking the interface of the real site that they deceive Internet users. When a user visits a phishing site and enters sensitive information, such as a user name, password, bank account, credit card number, or other important personal information, that information is stolen and may be used illegally by the owners of the phishing page, which can cause the user serious losses. Near-duplicate detection is an effective scheme for phishing detection.
Previous methods each apply a simple similarity scheme to only one kind of feature: text features, image features, or overall features. In contrast, this paper improves phishing detection with a novel approach that exploits all three feature types jointly. The approach uses the Hungarian matching algorithm to calculate the similarity between web pages and applies regression methods to compute the optimal weights of the three feature types. Specifically, the approach first collects the signatures of web pages, i.e. text features, image features, and overall features, from the pages as rendered in the browser, so that the features characterize each page from the user's viewpoint. Second, the Hungarian algorithm is used to compare the web page signatures, which makes the similarity comparison more effective. Last, regression models are built to find the optimal weights for the three feature types and to select an appropriate model for detecting phishing. Experiments show that it is beneficial to exploit all three feature types jointly and to compute similarity with the Hungarian algorithm: the proposed method achieves better performance than some existing methods.
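The signature-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of a signature as a list of text blocks, the token-set Jaccard measure `jaccard`, the normalization by the longer signature, and the names `hungarian` and `page_similarity` are all assumptions made for illustration. Only the use of the Hungarian algorithm to find the best bipartite match between two signatures comes from the text.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list `match` where match[i] is the column assigned to row i.
    Standard O(n^2 * m) Hungarian algorithm using row/column potentials.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j, 0 = none
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def jaccard(a, b):
    """Hypothetical pairwise measure: Jaccard similarity of the token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def page_similarity(sig_a, sig_b, pair_sim=jaccard):
    """Similarity in [0, 1] between two signatures (lists of features).

    Assumption: the score is the total similarity of the best one-to-one
    match, divided by the size of the longer signature so that unmatched
    features lower the score.
    """
    if not sig_a or not sig_b:
        return 0.0
    if len(sig_a) > len(sig_b):          # hungarian() needs rows <= columns
        sig_a, sig_b = sig_b, sig_a
    sim = [[pair_sim(x, y) for y in sig_b] for x in sig_a]
    cost = [[1.0 - s for s in row] for row in sim]   # maximize sim = minimize cost
    match = hungarian(cost)
    return sum(sim[i][j] for i, j in enumerate(match)) / len(sig_b)


# Made-up text signatures, for illustration only.
real = ["Sign in to your account", "Enter password", "Forgot your password"]
fake = ["Enter password", "Sign in to your account"]
print(round(page_similarity(real, fake), 3))  # 0.667
```

Under these assumptions the same routine could be run separately on the text, image, and overall signatures, giving three similarity scores whose weights would then be fitted by regression as the paper describes.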