ACE: Classification for Information Lifecycle Management

Gauri Shah, Kaladhar Voruganti, Piyush Shivam, Maria del Mar Alvarez Rohena

Fourteenth NASA Goddard, Twenty-third IEEE Conference on Mass Storage Systems and Technologies, May 2006. To appear.

Abstract

One of the principal problems in Information Lifecycle Management (ILM) is aligning the business value of data with the most cost-effective and appropriate storage infrastructure. In this paper, we present ACE, a framework of tools for ILM that classifies data and storage resources and generates a data placement plan for informed utilization of the available storage resources in the system. The goal of ACE is to design a data placement plan that provides cost benefits to an organization while allowing efficient access to all important data. To achieve this goal, ACE uses a policy-based approach to classify data and storage based on metadata attributes and hardware capabilities, respectively. The main advantage of using ACE is that it enables appropriate usage of under-utilized storage systems without extensive human intervention. Another key characteristic of ACE is that its policy-based architecture automates the process of data valuation and storage classification.

We implement the ACE framework and evaluate its benefits for three real data sets. One data set consists of 1.28 million anonymous medical industry record files with a total size of 1461 GB, and we show that ACE provides a cost benefit of greater than 70% over the lifetime of the data. In addition to the novel valuation algorithms and overall architecture, we also demonstrate performance optimizations that reduce the data classification time to 85% of the time taken without these optimizations, while maintaining classification accuracy of over 85%.




Bibtex:
@inproceedings{ShahVSZ2006,
title="ACE: Classification for Information Lifecycle Management",
author="Gauri Shah and Kaladhar Voruganti and Piyush Shivam and Maria del Mar Alvarez Rohena",
booktitle="Fourteenth NASA Goddard, Twenty-third IEEE Conference on Mass Storage Systems and Technologies",
month={14--16~} #may,
year="2006",
address={College Park, Maryland, USA},
note={To Appear}
}