Project Overview

Context

With Cloud Computing, software is migrating from the desktop into the "clouds" of the Internet, promising users anytime, anywhere access to their programs and data. Present Cloud Computing platforms do not support redundant, self-recovering programming models for recovering from the many inevitable hardware and software failures.

The PANACEA project discusses the challenges that the cloud computing paradigm poses to application users, application developers and telecommunication providers, and presents solutions to them.

 

Objective

The main objective of the PANACEA project is to provide Proactive Autonomic Management of Cloud Resources, based on Machine Learning (ML), as a remedy to the exponentially growing complexity of the Cloud. Building on its ML framework and on autonomic principles, PANACEA will offer users the following advanced capabilities:

  • Proactive autonomic management of cloud resources.
  • Proactive software migration within the cloud(s).
  • Creating mission-oriented distributed clouds with autonomic self* properties.
  • Efficient use of cloud resources.
  • Monitoring, controlling and proactively managing application executions (VM migrations, proactive rejuvenation, predicting response-time threshold violations and the time to crash).
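
To illustrate what predicting a response-time threshold violation can mean in the simplest case, the sketch below fits a least-squares line to recent response-time samples and extrapolates when that trend will cross a service-level threshold. This is an assumed, deliberately simple stand-in, not PANACEA's ML framework; the function name predict_violation_time and the choice of seconds and milliseconds as units are hypothetical.

```python
def predict_violation_time(times_s, response_ms, threshold_ms):
    """Hypothetical helper: fit a least-squares line to (time, response time)
    samples and return the time (same units as times_s) at which the trend
    reaches threshold_ms. Returns None if response time is not increasing.
    The result may lie in the past if the threshold is already exceeded."""
    n = len(times_s)
    if n < 2 or n != len(response_ms):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_s) / n
    mean_r = sum(response_ms) / n
    # Ordinary least-squares slope = cov(t, r) / var(t)
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times_s, response_ms))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("sample times must not all be equal")
    slope = cov / var
    if slope <= 0:
        return None  # flat or improving trend: no predicted violation
    intercept = mean_r - slope * mean_t
    return (threshold_ms - intercept) / slope


# Response time grows by 1 ms per second, so a 200 ms threshold is reached at t = 100 s.
print(predict_violation_time([0, 10, 20, 30, 40], [100, 110, 120, 130, 140], 200))
```

A linear trend only captures steady degradation; sudden failures or nonlinear load effects would need richer models, which is the role the project assigns to its ML framework.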

 

Methodology

The PANACEA distributed architecture has two levels. The first level is composed of private research clouds controlled by Intra-Autonomic Cloud Managers (Intra-ACMs), which monitor, control and proactively manage application executions inside each cloud. The second level is a federation of these private research clouds, controlled by the Inter-Autonomic Cloud Manager (Inter-ACM), which monitors the quality of the paths in the overlay network and proactively reconfigures it.

Figure: PANACEA Distributed Architecture

The ML framework will make it possible to predict the failure time of software or user applications running on Virtual Machines (VMs), as well as violations of the expected response time of cloud services. The complexity of the prediction models will be reduced by removing irrelevant parameters from the training data set while preserving prediction accuracy.
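
The idea of shrinking a prediction model by dropping irrelevant parameters while preserving accuracy can be sketched as greedy backward feature elimination. The example below is a minimal sketch under stated assumptions: the data are synthetic, the feature names are hypothetical, and ordinary least squares stands in for whatever models PANACEA actually uses. A feature is dropped only if the validation error stays within a relative tolerance of the full model's error.

```python
import numpy as np

# Hypothetical per-VM monitoring features (illustrative names, not from PANACEA).
FEATURES = ["mem_used_mb", "swap_used_mb", "num_threads", "disk_reads", "net_packets"]


def make_synthetic_data(n=400, seed=0):
    """Synthetic stand-in for monitoring data: time to crash (minutes) falls
    linearly with memory, swap and thread count; the last two features are noise."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(100, 2000, n),  # mem_used_mb
        rng.uniform(0, 500, n),     # swap_used_mb
        rng.uniform(10, 300, n),    # num_threads
        rng.uniform(0, 1000, n),    # disk_reads (irrelevant)
        rng.uniform(0, 1000, n),    # net_packets (irrelevant)
    ])
    y = 20000 - 3 * X[:, 0] - 8 * X[:, 1] - 5 * X[:, 2] + rng.normal(0, 50, n)
    return X, y


def validation_mae(X_tr, y_tr, X_va, y_va, cols):
    """Fit ordinary least squares (with intercept) on the given columns and
    return the mean absolute error on the validation split."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    B = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    return float(np.mean(np.abs(B @ coef - y_va)))


def backward_elimination(X, y, names, tolerance=0.05, n_train=300):
    """Greedily drop the feature whose removal hurts validation MAE least, as
    long as MAE stays within `tolerance` (relative) of the full model's MAE."""
    X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    kept = list(range(X.shape[1]))
    baseline = validation_mae(X_tr, y_tr, X_va, y_va, kept)
    while len(kept) > 1:
        # Validation error of each model that omits exactly one remaining feature.
        trials = [
            (validation_mae(X_tr, y_tr, X_va, y_va, [c for c in kept if c != f]), f)
            for f in kept
        ]
        best_err, best_f = min(trials)
        if best_err > baseline * (1 + tolerance):
            break  # every remaining feature is needed to preserve accuracy
        kept.remove(best_f)
    return [names[c] for c in kept], baseline


X, y = make_synthetic_data()
selected, mae = backward_elimination(X, y, FEATURES)
print(selected, round(mae, 1))
```

Measuring tolerance against the full model's error, rather than the previous step's, keeps small losses from compounding as features are removed one by one.
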
To deal with the vast number of resources that could be monitored, the main approach under consideration is the use of mobile agents that move across the cloud, interact with other agents, read computing and network sensors, and make autonomous decisions about what to measure, when to report, and to whom.
Distributed Machine Learning will be used to establish "self-organizing paths" on an overlay network, which will maintain the quality of service (QoS) of end-to-end flows in the presence of traffic congestion.
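
As a rough intuition for a path that reorganizes itself under congestion, the sketch below shows a single flow learning which overlay path to use from its own delay measurements. It is a simplified, single-node stand-in and not PANACEA's distributed ML: each path keeps an exponentially weighted moving average (EWMA) of measured delay, traffic goes to the lowest estimate, and occasional random probes (epsilon-greedy exploration) keep the estimates current. The class PathSelector, the function simulate, the path names and all delay values are hypothetical.

```python
import random


class PathSelector:
    """Hypothetical learning path selector for one overlay flow.

    Keeps an EWMA of the delay measured on each candidate path, sends traffic
    on the path with the lowest estimate, and with probability `epsilon`
    probes a random path so estimates keep tracking congestion as it moves."""

    def __init__(self, paths, alpha=0.2, epsilon=0.1, seed=0):
        self.estimates = {p: None for p in paths}  # None = not measured yet
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def best(self):
        measured = {p: d for p, d in self.estimates.items() if d is not None}
        return min(measured, key=measured.get)

    def choose(self):
        unmeasured = [p for p, d in self.estimates.items() if d is None]
        if unmeasured:
            return unmeasured[0]  # measure every path once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.estimates))  # exploration probe
        return self.best()

    def update(self, path, delay_ms):
        old = self.estimates[path]
        if old is None:
            self.estimates[path] = delay_ms
        else:
            self.estimates[path] = (1 - self.alpha) * old + self.alpha * delay_ms


def simulate(rounds, seed=1):
    """Toy network: path B is fastest until congestion hits it at round 100."""
    rng = random.Random(seed)
    mean_delay = {"A": 80.0, "B": 20.0, "C": 50.0}  # hypothetical delays (ms)
    selector = PathSelector(list(mean_delay))
    for t in range(rounds):
        if t == 100:
            mean_delay["B"] = 120.0  # congestion appears on path B
        path = selector.choose()
        selector.update(path, rng.gauss(mean_delay[path], 2.0))
    return selector


print(simulate(100).best(), simulate(200).best())
```

With these settings the flow settles on B before the congestion and moves to C once B's rising delay estimate overtakes it. A larger alpha reacts faster to congestion but follows measurement noise more closely.
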
To ensure reliable operation of PANACEA, replication services, leader election and distributed locking mechanisms will be used wherever critical information is maintained.
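
The text does not say which leader-election algorithm PANACEA relies on. One classic option is the bully algorithm, in which the live node with the highest ID always wins. The sketch below simulates it in a single process to show the decision logic; the Node class, the bully_election function and the five-node cluster are hypothetical, and real messaging, timeouts and failure detection are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    alive: bool = True
    leader: Optional[int] = None  # ID of the leader this node has accepted


def bully_election(nodes: List[Node], initiator_id: int) -> int:
    """Single-process simulation of the bully election algorithm.

    The node running the election sends ELECTION to every higher-ID node; any
    live higher node answers and takes over the election. When no higher node
    answers, the current node has won and sends COORDINATOR to all live nodes."""
    by_id = {n.node_id: n for n in nodes}
    current = by_id[initiator_id]
    if not current.alive:
        raise ValueError("a crashed node cannot start an election")
    while True:
        answering = [n for n in nodes if n.node_id > current.node_id and n.alive]
        if not answering:
            break  # no live node with a higher ID: current node wins
        # Every answering node would start its own election; following the
        # lowest one reaches the same winner (the highest live ID).
        current = min(answering, key=lambda n: n.node_id)
    for n in nodes:
        if n.alive:
            n.leader = current.node_id  # COORDINATOR announcement
    return current.node_id


cluster = [Node(i) for i in range(1, 6)]
cluster[4].alive = False  # node 5, the previous leader, has crashed
print(bully_election(cluster, initiator_id=2))
```

Here node 2 detects the failure of node 5 and starts an election, which node 4 wins as the highest surviving ID.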

The project is organized into six work packages. It will also identify use cases that will help validate its scientific outputs.

 

Expected impacts

PANACEA will provide the infrastructure foundation for cloud-scale autonomic management. The expected impacts are as follows:

  • for the cloud stack: software-enhanced capabilities and a more competitive stack for adoption;
  • for the cloud provider: increased resilience and reliability for the client, and lower operating costs for the provider.