Chinese Journal of Computers
Title: Reliable Resource Provision Policy for Cloud Computing
Authors: TIAN Guan-Hua1),2), MENG Dan1), ZHAN Jian-Feng1)
Address: 1) Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2) Graduate University of Chinese Academy of Sciences, Beijing 100049
Year: 2010
Issue: No. 10 (pp. 1859-1872)
Abstract & Background
Abstract
Cloud computing has become a hot topic, and researchers have proposed various resource sharing and resource provisioning techniques. However, very little literature pays attention to the reliability of dynamically provisioned resources. This paper proposes failure-rules-aware node resource provisioning policies for heterogeneous services consolidated in a cloud computing infrastructure, and evaluates the proposed policy through simulation: it implements a simulator of a heterogeneous service consolidation platform that takes the characteristics of heterogeneous services (both their resource usage and their failure rules) into consideration, and uses two production traces to synthesize its inputs. To evaluate a wide range of failure rules, this paper also proposes a multi-dimensional failure modeling framework, which varies factors of the failure distribution, including temporal and spatial factors, to study the capability of the proposed policy. The evaluation results indicate that the proposed resource provisioning policy is effective at providing robust nodes for heterogeneous services: compared with the baseline fault re-provisioning policy, it masks more potential node reboot failures from services and leaves fewer chances for unplanned failures, e.g., service failures or node reboots. In addition, the policy is able to mask the non-uniform distribution of resource reliability across the system. Meanwhile, compared with the baseline policy, the policy has no negative impact on service performance or on node resource utilization. Evaluation with failure rules involving temporal and spatial factors indicates that the policy is useful for cloud computing environments.

Keywords: resource provisioning; reliability; failure rules; cloud computing; heterogeneous workload

Background
Cloud computing, as a novel computing paradigm, has become popular. Almost all leading IT enterprises have proposed their own cloud architectures and infrastructures to facilitate managing and sharing massive-scale computing resources. Researchers have proposed various resource sharing and resource provisioning techniques; however, very little literature pays attention to the reliability of dynamically provisioned resources. This paper proposes a failure-rules-aware node resource provisioning policy for heterogeneous services consolidated in a cloud computing infrastructure.

This work is supported by the National Science Foundation of China under grant Nos. 60703020 and 6093300, and by the National High Technology Research and Development Program (863 Program) under grant Nos. 2006AA01A102, 2009AA01A129, and 2009AA01Z139. The project focuses on improving the availability of large-scale systems, and this work improves the reliability of dynamically provisioned resources in cloud computing.

The group has long conducted research on the dependability and availability of large-scale systems, and has developed the Fire Phoenix system, a scalable and fault-tolerant cluster management system for the Dawning4000 and Dawning5000 series; this work was published in the Proceedings of the 7th IEEE Cluster Computing conference (2005). The group has also achieved interesting results in related fields, e.g., QoS guarantees, fault diagnosis, and performance debugging. Jiang Y et al. introduced a promising admission control policy to guarantee the QoS of cluster systems, published in the Proceedings of the 8th IEEE Cluster Computing conference (2006). Wu L et al. proposed a fault diagnosis algorithm for cluster systems, published in the Proceedings of the 20th IEEE IPDPS (2006). Zhang Z H et al. proposed a precise request causal path reconstruction algorithm, published in the Proceedings of the 39th IEEE DSN (2009).
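The abstract contrasts failure-rules-aware node provisioning with a baseline that does not account for node reliability. Purely as an illustration, the minimal Python sketch below assumes that a node's number of failures within a recent time window (a hypothetical 24-hour default) is a usable proxy for its near-term failure risk, reflecting the idea that failures cluster in time and on particular nodes. It is not the paper's policy: the names `Node`, `recent_failures`, `pick_first_fit`, and `pick_failure_aware`, the CPU-only capacity model, and the tie-breaking rule are all hypothetical, and the first-fit function is a simple failure-oblivious stand-in rather than the paper's fault re-provisioning baseline.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node record: identifier, free CPU cores,
    # and timestamps (in hours) of the node's past failures.
    node_id: str
    free_cpus: int
    failure_times: List[float] = field(default_factory=list)


def recent_failures(node: Node, now: float, window: float) -> int:
    """Count the node's failures that occurred in [now - window, now]."""
    return sum(1 for t in node.failure_times if now - window <= t <= now)


def pick_first_fit(nodes: List[Node], cpus: int) -> Optional[Node]:
    """Failure-oblivious stand-in baseline: first node with enough free CPUs."""
    for node in nodes:
        if node.free_cpus >= cpus:
            return node
    return None


def pick_failure_aware(
    nodes: List[Node], cpus: int, now: float, window: float = 24.0
) -> Optional[Node]:
    """Among nodes with enough free CPUs, prefer the fewest recent failures.

    Ties are broken in favor of the node whose last failure is oldest;
    a node that has never failed is treated as having failed at -infinity.
    """
    candidates = [n for n in nodes if n.free_cpus >= cpus]
    if not candidates:
        return None

    def score(n: Node) -> Tuple[int, float]:
        last_failure = max(n.failure_times, default=float("-inf"))
        # Lower tuple is better: fewer recent failures, then older last failure.
        return (recent_failures(n, now, window), last_failure)

    return min(candidates, key=score)


if __name__ == "__main__":
    nodes = [
        Node("n1", free_cpus=8, failure_times=[90.0, 95.0]),
        Node("n2", free_cpus=4, failure_times=[10.0]),
        Node("n3", free_cpus=2),
    ]
    print(pick_first_fit(nodes, 4).node_id)                 # n1
    print(pick_failure_aware(nodes, 4, now=100.0).node_id)  # n2
```

In this toy run the baseline places a 4-CPU service on `n1`, which failed twice in the last 24 hours, while the failure-aware variant skips it in favor of `n2`, whose only failure is long past. A real policy would also need to weigh spatial correlation between nodes and the service's own failure sensitivity, which this sketch does not model.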