Saturday, February 04, 2006

Information Life Cycle Management

Not so long ago, hard drive storage was too expensive to store much data. So the computer scientists of those days (the 1990s) had to design IT environments that co-existed with paper-based processes. During the dot-com era and the early 2000s, hard drive prices came down dramatically, not to mention the arrival of new hard drive technologies (SAS, SATA - see the previous post).

With hard drive costs much lower than before, many organizations tried to go paperless during the late 90s. Not realizing that many documents cannot be digitized easily, these organizations had to convert their paper documents by scanning analog (paper) data into digital (picture) data. Most then had to buy an OCR (Optical Character Recognition) solution to convert certain text in the digital pictures into digital data. Moreover, these organizations have faced new challenges since 9/11. Most documents are now required to be kept for up to 10 years, and that includes these large scanned picture documents.

So this creates a whole new set of problems:
1. The ability to store this "important" data for a long period of time (10 years).
2. The ability to ensure that you can retrieve this "important" data at any time, from either online storage or archive storage.
3. If #1 and #2 are not met, the organization is going to be penalized heavily.

To imagine the amount of data involved, let's picture a bank. For a normal home mortgage loan application, the bank needs pictures of the property, the map to the property, the applicant's financial statement, the applicant's credit history, the guarantor's credit history (if needed), and so on. Imagine that the bank now has to convert everything in a loan application to digital form and keep it for 10 years. In the early 90s, the bank would just file these papers, box them, and ship them off to Timbuktu for safe-keeping. It only needed to know what amount the applicant owed on what day each month once the application was approved. So much less data was entered in the early 90s. The bank might only enter a reason (240 characters) why an applicant was rejected, for future reference.

Since most organizations are in a frenzy to digitize everything, the data grows exponentially. Look at the bank loan example: in the early 90s, an application could have meant about 100 KB of data per applicant. Now, that same application will easily take tens of megabytes because of digital pictures. I won't dispute that having everything online is much faster than going through mountains of loan applications to find the map to the property. But it's an example of why we need "Information Life Cycle Management".
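To put rough numbers on the bank example, here is a small back-of-the-envelope sketch. The 100 KB figure and the 10-year retention come from the text above; the 20 MB figure is just one assumed value for "tens of megabytes", and the 50,000 applications per year is a made-up volume for illustration only.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def retained_bytes(apps_per_year: int, bytes_per_app: int, retention_years: int) -> int:
    """Total bytes held once the retention window is full (steady state)."""
    return apps_per_year * bytes_per_app * retention_years


# Hypothetical volume, not a figure from any real bank.
APPS_PER_YEAR = 50_000
RETENTION_YEARS = 10  # retention period mentioned in the post

early_90s = retained_bytes(APPS_PER_YEAR, 100 * KB, RETENTION_YEARS)
digitized = retained_bytes(APPS_PER_YEAR, 20 * MB, RETENTION_YEARS)  # assumed 20 MB per application

print(f"early 90s: {early_90s / GB:.1f} GB retained")
print(f"digitized: {digitized / GB:.1f} GB retained")
print(f"growth factor: {digitized / early_90s:.1f}x")
```

Whatever volume you plug in, the growth factor depends only on the per-application size, which is why scanned pictures dominate the storage bill.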

ILM (the short name) is part of the corporate information strategy. It outlines the whole value chain of data within an organization, from data creation through data distribution, data modification and maintenance, up to data disposition (deletion). An organization needs to commit substantial resources to set the policies, processes, and procedures around data.

Let's look back at the home mortgage loan application. ILM will specify how the data is created and what data is needed for a loan application: pictures (house, map to property), documents (digital - credit history, financial statement, guarantor's financial statement), and more. Once the bank receives the data (the loan application), ILM will specify who the data goes to. In this example, the application will be forwarded to the loan processing department and the appraisal department. The application then runs through the rest of the process. When the loan is approved, all documents are forwarded to the archive, where they wait for the next retrieval. The archive could be cheap storage, tape media, or sometimes DVD. There are a lot of vendors selling specialized ILM solutions based on industry (financial, automotive, and more).
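The stages described above can be sketched as a tiny state machine. This is only an illustration of the idea, not how any vendor's ILM product works: the stage names follow the post, while the `ALLOWED` transition table and the `advance` function are hypothetical, including the assumption that a retrieved archive document goes back into maintenance.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "created"          # loan application received
    DISTRIBUTED = "distributed"  # sent to loan processing and appraisal
    MAINTAINED = "maintained"    # modified and maintained while in process
    ARCHIVED = "archived"        # moved to cheap storage, tape, or DVD
    DISPOSED = "disposed"        # deleted at the end of retention


# Hypothetical policy: which stage may follow which.
ALLOWED = {
    Stage.CREATED: {Stage.DISTRIBUTED},
    Stage.DISTRIBUTED: {Stage.MAINTAINED},
    Stage.MAINTAINED: {Stage.ARCHIVED},
    Stage.ARCHIVED: {Stage.MAINTAINED, Stage.DISPOSED},  # retrieval or deletion
    Stage.DISPOSED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the policy forbids the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Walk one loan application through its whole life.
stage = Stage.CREATED
for nxt in (Stage.DISTRIBUTED, Stage.MAINTAINED, Stage.ARCHIVED, Stage.DISPOSED):
    stage = advance(stage, nxt)
    print(stage.value)
```

Writing the policy down as a table like this is the point of ILM: it makes skipping a step, such as deleting a document that was never archived, an explicit error.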

To simplify your thinking about ILM, ask yourself where you would put the data you rarely use. Cheap storage or expensive storage? If it's rarely used, you should put that data in the cheapest possible storage that can still serve your access policy (how fast you need to retrieve that data). For example, if you must retrieve that information within minutes, the media you move the data to must be online (disk, tape media, or a DVD jukebox system). For tape, online means that the tape must be inside the tape library to be able to meet the requirement.
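That rule of thumb, "the cheapest storage that still meets the access policy", is easy to sketch in code. The tier names, relative costs, and retrieval times below are invented for illustration; real numbers depend entirely on your hardware and vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb: float        # hypothetical relative cost
    retrieval_seconds: float  # hypothetical worst-case retrieval time


# Made-up tiers, ordered from most to least expensive.
TIERS = [
    Tier("primary disk", 10.0, 1),
    Tier("cheap disk", 3.0, 5),
    Tier("tape library (online)", 1.0, 300),
    Tier("offsite tape (offline)", 0.5, 86_400),
]


def cheapest_tier(max_retrieval_seconds: float, tiers=TIERS) -> Tier:
    """Pick the lowest-cost tier whose retrieval time fits the access policy."""
    candidates = [t for t in tiers if t.retrieval_seconds <= max_retrieval_seconds]
    if not candidates:
        raise ValueError("no tier is fast enough for this access policy")
    return min(candidates, key=lambda t: t.cost_per_gb)


# "Retrieve within minutes" rules out offsite tape but allows the tape library.
print(cheapest_tier(10 * 60).name)
```

Note how the offline tape is the cheapest option overall but is never chosen when the policy says minutes, which matches the point that the tape has to sit inside the library to count as online.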

So ILM tends to save money for corporations. However, since it's a new technology, it rarely delivers a return on investment in less than 24 months.

So ask yourself today: do you want to keep all available data online? How fast do you want to restore your data?
