Wednesday, February 22, 2006

Storage Service Provider - Consumer Storage Service Provider

I mentioned a few things about the network (network bandwidth) for SSPs in the earlier post, but I could only cover the first part (Consumer Storage Service Provider). I think we need to expand on the Consumer Storage Service Provider a little more. The Enterprise Storage Service Provider post will be longer than this one (it may run well into the weekend).

Consumer clients won't pay for dedicated network bandwidth (leased line, T1, T3, or MPLS service). They will access the service over the Internet, whether domestic or international. For this reason, a Consumer SSP has to carefully calculate the required network bandwidth, as I did earlier.

In addition, a Consumer Storage Service Provider (CSSP) wouldn't provide direct connectivity from clients' servers to IP SAN storage. A CSSP needs a PC or a server to do file sharing, web-based file upload/download, or even file synchronization. My company, Northstar Infosys, is developing backup software based on these functionalities, called Northstar Backup.

So CSSP will need the following:
1. IP SAN storage (iSCSI - namely Intransa, Equallogic, Left Hand Networks) - I have relationships with two of the three. Their products are equally good; pick one.
2. Servers - 1U servers with Gigabit Ethernet ports, enough RAM, and one CPU (any current server CPU will suffice), since the CSSP application will be network bound.
3. Develop your own backup software (as I did) or license it from vendors (like me, Veritas, and the like).
4. Buy co-location service with large Data centers
5. Start the CSSP service business. If you are not familiar with the ISP business, you'll need to study the market, write a business plan, and remember to think about your marketing plan. I've found a number of books on this that are pretty well written; I'll write about them in later posts.

Remember, invest in a CSSP at your own risk. My experience is that people will use your CSSP in ways you never expected, so jump into this with both eyes open. If you need any clarification, please feel free to drop me a comment. I do consulting work with large telcos in the Asia Pacific region, so please be patient if I don't get back to you within a day.

Good luck.

Monday, February 20, 2006

Storage Service Provider - The Network is the Computer

Since the last post, I've spent a lot of time thinking about what I am going to do after this CNY. First, NorthPole as a company is starting to take shape, and I'm working as the chief architect of the whole thing. Fortunately, I had some time off at Koh Samed (not Koh Samui). Koh Samui has its place on the world map; Koh Samed, however, is known only to the few who have spent time around Thailand. You can view pictures from Koh Samed here, here, here, and here. (Pardon the Thai language in the posts, but you'll get the picture.)

The network is a technical aspect of your service provider business and a core competency for every SSP, since having storage alone does not deliver the service. You need a really large network in order to provide a sustainable storage service.

Let's start from the client and work towards the SSP. We can easily divide clients into two groups: consumer and corporate.
1. Consumer clients (or SMBs) are typical clients that will only use the storage at the SSP via the Internet.
2. Corporate (enterprise) clients are companies that would require managed storage service through dedicated network links.

The divider between these two groups is the monthly recurring charge (MRC). Consumer clients usually subscribe to less than 10GB and pay no more than 100 USD per month. (Xdrive.com offers storage rental for 10 USD per 1GB per month; webhard is another notable company in this category.) Corporate clients, on the other hand, currently subscribe to at least 50GB and are willing to pay well over 1000 USD per month. If you prepare to serve either of these two groups, you need to design your network infrastructure accordingly.

In case 1 (consumer), you don't need to provide network access (usually broadband Internet) to these clients. You only subscribe to enough bandwidth to serve your clients' access pattern. In telco terms, you typically need to think about BHCA (busy hour call attempts) or concurrent sessions. A consumer client with a 1GB subscription should update roughly 1% of that data daily, or 10MB. However, there is a connection burst from these clients when they first join. So preparing enough bandwidth for concurrent sessions from existing clients, plus burstable capacity for new clients, is the key.

For example, to provide consumer storage service to 10,000 people, you need to prepare 10TB of usable space (a 1GB subscription for each of 10,000 people). The required network transfer is approximately 10,000 x 10MB = 100GB over 12 hours (daytime or working hours). Assuming a uniform distribution, the average transfer is 8.33 GB/hour, or roughly 2.37 MB/second, or 18.96 Megabits/second.

So you'll need at least an 18.96 Mbps network link. However, your consumer clients' access pattern may not be uniform over the 12-hour period. If every client logs on between 6PM and 12AM, you'll need 18.96 x 2 = 37.92 Mbps. You should subscribe to network bandwidth somewhere between these two numbers.
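To make it easy to play with these assumptions, here is a minimal sizing sketch in Python. The inputs (10,000 clients, 10 MB of daily change, a 12-hour window, a 2x evening peak) are the figures from the paragraphs above; the small difference from the 18.96 Mbps figure comes from rounding between decimal and binary units.

```python
# Consumer SSP bandwidth sizing sketch. All inputs are the post's assumptions.

clients = 10_000
daily_change_mb = 10      # ~1% of a 1 GB subscription changes per day
window_hours = 12         # daytime / working hours
peak_factor = 2           # everyone logs on between 6PM and 12AM

total_mb = clients * daily_change_mb                 # 100,000 MB per day
avg_mbps = total_mb * 8 / (window_hours * 3600)      # megabits per second
peak_mbps = avg_mbps * peak_factor

print(f"average: {avg_mbps:.1f} Mbps, peak: {peak_mbps:.1f} Mbps")
# average: ~18.5 Mbps, peak: ~37 Mbps, in line with the ~19/38 Mbps above
```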

Let's see how much money you are going to make per month. 10,000 paying clients averaging 1GB each equals 10,000 x 10 USD/month = 100,000 USD/month in revenue. Buying Wintel servers (with or without clustering software) plus 10TB of iSCSI storage should be well within 200,000 USD. Your hosting charge could be well within 5,000 USD/month (inclusive of a 45Mbps link). Your payback period is going to be less than 9 months, even if you cannot get all 10,000 clients to sign up within the first month.
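Here is the same back-of-the-envelope math as a short Python sketch. All figures are the assumptions quoted above, not quotes from any vendor.

```python
# Consumer SSP payback sketch, using the post's assumed figures.

clients = 10_000
price_per_client = 10            # USD per client per month
capex = 200_000                  # servers + 10 TB iSCSI storage, USD
hosting_per_month = 5_000        # co-location incl. 45 Mbps link, USD

revenue = clients * price_per_client              # 100,000 USD/month
margin = revenue - hosting_per_month              # 95,000 USD/month
payback_months = capex / margin

print(f"monthly margin: {margin:,} USD, payback: {payback_months:.1f} months")
# With a full client base from day one the payback is ~2 months; ramping up
# more slowly over several months still keeps it comfortably under 9 months.
```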

We'll talk about Case 2 (enterprise clients) when I'm back.

Thursday, February 09, 2006

Storage Service Provider - The How-To

How do you start your own Storage Service Provider? A flashback from my ISP days.

I had the opportunity to set up an ISP in the Bay Area during the dot-com days, from 1997 to 2000. I also had experience with wireless gear from Breezecom during 1999-2000, building a WISP (Wireless ISP). To build an ISP, you normally need the following:
1. Router - for Internet gateway
2. Leased line - connected to your Internet gateway
3. POP (Point of Presence) - your dial-up modems for users to connect
4. RADIUS (authentication server) - for POP to verify users
5. DNS (Domain Name Server)
6. E-mail system (e-mail relay, e-mail gateway, and e-mail server)
7. Billing system (this could come with RADIUS) to charge customers based on their usage, or you can choose to charge a flat rate
8. Web server - for the ISP's own site

The cheapest way to build an ISP is:
1. Router (a low-end router that can serve a T1 line)
2. POP (racks of modems)
3. Authentication/Billing PC (Radius+Billing)
4. E-mail/DNS/Web server PC

All of the above can be done for 10,000 US dollars. What about a WISP? What equipment is required? A Wireless ISP provides Internet access to users over wireless links (IEEE 802.11 - WiFi, in the free 2.4GHz spectrum). Instead of buying a POP, a WISP buys a wireless antenna (directional or otherwise) and outdoor wireless gear, and puts these on a tall water tank or up the hill from a village. This can serve a number of clients. WISP customers then have to buy an antenna and a WiFi card to connect. So the cheapest way to build a WISP is:
1. Router (same as above)
2. Wireless POP (antenna, wireless radio - access point, wiring from WISP office to WiPOP location)
3. Authentication/Billing PC
4. E-mail, DNS, Web server PC

As the ISP market went through consolidation (the wrath of AOL), a lot of small ISPs and WISPs went under. It's too bad, since the return on investment for an ISP or WISP is normally within 6-9 months in a rural area without the big ISPs' presence.

How to Build Your Own Storage Service Provider

As mentioned yesterday, you need at least network and storage to start an SSP service.

1. The right data center

Since storage systems are very sensitive to heat, dust, and humidity, finding the right home (data center) is the first issue to solve. Current storage systems do not fare well against vibration (performance drops significantly) or heat (a typical storage system creates more heat than a blade server of the same size). There are many data centers around the world. They are facility providers (space, power, cooling, and some offer monitoring service) and charge based on space requirements and additional services.

You'll need a data center with 24x7 staff, a monitoring service, and a good SLA for responding to your requests, such as swapping in spare hard drives.

2. Large Network Connectivity to the Data Center

Storage traffic is much more bursty than typical Internet traffic. You need to find a co-location provider that's connected to the largest Internet exchange possible.

3. Large Storage Pool

An SSP needs to spend most of its investment effort on researching the best partner to deliver this service. There are certain companies with good technology whose management does not have the right mindset to support an SSP service. So go with the one with an aggressive attitude and good technology. Price is not the main issue here; you don't want to work with a vendor that will go away in 6-12 months.

You will need to reserve enough space in the data center to host at least 20TB of storage. Based on my calculation, 20TB is the right balance between investment (storage price) and cost of maintenance (administrator salary). If you buy a small storage system to start with, most of your cost is going to be administrator cost. So buying the right amount of storage is the key.
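To illustrate why a too-small pool is dominated by people cost, here is a hedged cost-per-TB sketch. The storage price, amortization period, and administrator salaries are illustrative assumptions, not quotes.

```python
# Cost-per-TB sketch: fixed administrator cost vs. capacity. All prices assumed.

def monthly_cost_per_tb(capacity_tb,
                        storage_price_per_tb=3_000,   # USD, amortized over 36 months
                        admins=2,
                        admin_salary=4_000):          # USD per admin per month
    storage_monthly = capacity_tb * storage_price_per_tb / 36
    people_monthly = admins * admin_salary
    return (storage_monthly + people_monthly) / capacity_tb

for tb in (2, 5, 10, 20, 40):
    print(f"{tb:>3} TB -> {monthly_cost_per_tb(tb):7.0f} USD per TB per month")
# With a small pool the fixed administrator cost dominates; past roughly
# 20 TB the per-TB cost flattens out, which is the point made above.
```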

4. Storage Feature

The storage system's feature set is another critical issue. Buying a new brand with no distinct features is a bad idea. This is directly related to #3, but it deserves its own place. Typical features to look for are: snapshots, replication, RAID (0, 1, 5, 10, 50), BCV (an EMC term), scalability, reliability, and availability (whether everything is redundant).

Since the storage system will host customers' data, reliability and availability are very important. You cannot afford to stop the storage service once it's started; it must have at least 99.99% uptime.

5. People/process/procedure

You need to hire the best you can afford. This is the make-or-break issue. I have had the chance to train my team from zero, so it's not very hard, but you need to set up courses for new hires. It's better to put them online so new hires can read up before asking the seniors. I recommend looking at ITIL. My company has done ITIL training for years; it's already proven and is on its way to becoming ISO 20000.

Once you have all of the above, you can start your SSP service. The minimum requirements for an SSP are:
1. Storage system (20TB)
2. Management console PC
3. Data Center connectivity (committed 1/10/100 Mbps - depending on customer volume)
4. Storage system administrators (24x7)
5. Network equipment and Fiber Channel equipment (depending on the storage technology your SSP is going to offer). Currently the most popular storage technology is Fiber Channel, but this depends on the end users' requirements.

If you need clarification on any of these points, please feel free to write a comment or leave me your e-mail address.

Wednesday, February 08, 2006

Storage Service Provider - The Beginning

In The End of Yesterday's Post

Yesterday I cut the disk-to-disk backup post short because I needed to meet a friend. This friend owns part of a company that could change the telecommunications world with a technology called JAIN SLEE. I have been a believer in this technology since the day I first heard of it three years ago; it will take another 3-5 years for it to mature. I've known him for almost 4 years, and we could have had success selling to DTAC had I not been driven away from Sun MicroSystems. We are planning our collaboration for this year's activities. I am now a free agent. In addition, since I've been away from Sun for more than a year, I can legally go back to Sun's customers and Sun's employees (yes, if I want to hire them, I can now!).

So let's look at the technology from another perspective (a newbie's perspective).

Disk-to-disk Backup

Backup is the most fundamental thing in the IT business. You need backup because everything will fail (your computer, your hard drive, any computer component). So the only way to cope with failed equipment is to make sure that you have a "backup". Most people think backup is just the data you've created. In fact, backup can cover everything; it depends on how important it is to you.

If you are a multi-billion-dollar business, you cannot afford to lose computing service at any time. For example, one of my previous engagements at a large petrochemical company made me realize that losing their computing service for an hour cost them 5 million US dollars. I was there in 2001, right after a 4-hour outage. None of their tankers could leave the port, causing a 4-hour delay to production, so they lost 20 million USD. This corporation's revenue is approximately 100 billion US dollars a year.

A customer like that has to back up everything - data centers, computing resources (servers, storage, network, electricity, cooling), including the people who operate their computing services. Since they didn't have a functional secondary data center, they lost 20 million dollars on one not-so-fine day. Had they known the consequences, their management would definitely have agreed to build backups.

When you put the words together, disk-to-disk backup means making sure that you copy your data from one disk to another disk. D2D focuses on data redundancy alone, so you will need redundant storage systems to do D2D.

D2D Requirement

1. Redundant storage system (to copy the production data to the secondary (redundant) storage system)
2. Network (data path from where the production data is to the secondary storage system)
3. Backup software (in most cases, backup software is required for ease of setup). If the production system is not required to run 24x7, the customer may not even need backup software. I've seen a customer with scripts that shut down the database, copy the data from production to secondary, then restart the database (a minimal sketch of this approach follows this list).
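Below is a minimal Python sketch of the stop-copy-restart approach mentioned in item 3. The database name, init script, and paths are placeholders; treat it as an outline rather than a production script.

```python
# Cold-backup sketch: stop the database, copy its data directory to the
# secondary storage, then restart. Paths and service names are placeholders.

import datetime
import shutil
import subprocess
import sys

DATA_DIR = "/var/lib/mydb"               # production data (placeholder path)
BACKUP_ROOT = "/mnt/secondary_storage"   # secondary (redundant) storage mount

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def cold_backup():
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = f"{BACKUP_ROOT}/mydb-{stamp}"
    run(["/etc/init.d/mydb", "stop"])      # placeholder init script
    try:
        shutil.copytree(DATA_DIR, target)  # copy while the database is quiet
    finally:
        run(["/etc/init.d/mydb", "start"]) # always restart, even on failure
    print("backup written to", target)

if __name__ == "__main__":
    try:
        cold_backup()
    except Exception as exc:
        sys.exit(f"backup failed: {exc}")
```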

What Is The Difference Between D2D And File Copy?

Most D2D backups use the GNU tar format: one large file that contains all of the backed-up files. For example, if you have 1000 documents in your "My Documents" folder, when you do a D2D backup you'll get one large file called my-document.tar that contains all 1000 of your documents. The GNU tar format is comparable to the Zip file format. GNU tar supports compression as standard, and encryption can be layered on top.

So when you do D2D for all your important files, you will get one large file. When you want to restore, you simply extract the files you need from that large file.
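For readers who want to see the "one large file" idea in practice, here is a small sketch using Python's standard tarfile module. The folder and file names are hypothetical; gzip compression is shown, and encryption would be layered on top with a separate tool.

```python
# Create one large archive from a folder, then restore a single file from it.

import tarfile

# Back up everything under "My Documents" into a single compressed archive
with tarfile.open("my-documents.tar.gz", "w:gz") as archive:
    archive.add("My Documents", arcname="My Documents")

# Restore: extract only the one file you need from the large archive
# ("budget-2006.xls" is a hypothetical file name)
with tarfile.open("my-documents.tar.gz", "r:gz") as archive:
    archive.extract("My Documents/budget-2006.xls", path="restored")
```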

D2D, in principle, is just having a backup file (a GNU tar file) on another disk. So copying all of your files to another location is not really D2D backup. Copying all files to another directory gives you the ability to access your backed-up files directly; in some sense, the copy-all-the-files method is comparable to flash backup (a Veritas term).

Flash backup is used when there are too many files to be backed up. For example, a customer of mine has 2 TB (2 terabytes) of data, but they were having problems with their backup: they couldn't finish it within 3 days. The reason? They have too many files (20 million of them). For each file, the system has to open it, write the data to the backup, then close it, and a lot of time is spent just opening and closing files. With 20 million files, they needed to opt for flash backup (flash copy), because flash backup copies the whole volume to another space block by block, not file by file. Their backup window was reduced from 3 days to 8 hours.
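The arithmetic behind that customer's problem looks roughly like this. The 10 ms per-file overhead and 70 MB/s block-copy rate are assumptions chosen to show the scale, not measurements from the actual environment.

```python
# Why 20 million small files blow up the backup window, in rough numbers.

files = 20_000_000
per_file_overhead_s = 0.010          # assumed open + close + metadata per file
data_tb = 2
block_copy_mb_per_s = 70             # assumed sequential block-copy rate

file_overhead_hours = files * per_file_overhead_s / 3600
block_copy_hours = data_tb * 1024 * 1024 / block_copy_mb_per_s / 3600

print(f"per-file overhead alone: {file_overhead_hours:.0f} hours")   # ~56 hours
print(f"block-by-block copy:     {block_copy_hours:.1f} hours")      # ~8 hours
```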

The advantage of flash backup is that the customer can still browse their backed-up files, since flash backup also copies the file allocation table (FAT), which allows users to browse through the 'flashed' backup.

And What About What I Mentioned Yesterday About Storage Service Provider?

A Storage Service Provider is a company that provides storage space and related management services. SSPs also offer periodic backup, archiving, and the ability to consolidate data from multiple company locations so that data can be shared effectively.

Storage Service Providers are not new. There were a lot of them during the dot-com era. Back then there were Network Service Providers, Storage Service Providers, Internet Service Providers, and more kinds of providers than I can remember.

During the dot-com days, a number of SSPs tried to convince customers to outsource their storage systems to them. They failed because of the price (FC-AL was the only SAN technology then), the immaturity of the market (no one would outsource their data to a new service provider), and the dot-com downturn (many IT companies were wiped out during those few years). The SSPs that survived the dot-com crash became storage software companies, such as Creekpath and Storability.

The next question is: why now? Why is the SSP market going to grow in the next couple of years?

1. The downturn of the US economy (and some Western economies) is almost over
2. Storage demand still grows at the same rate or faster than ever before, but IT budgets grow only slightly (IDC report, August 2005), so storage must be managed more efficiently
3. Business continuity requirements - signing up for a replication service is a good way to satisfy this requirement; from a storage standpoint, data must be replicated so corporate data will be available at the time of a disaster
4. Regulatory requirements - SOX, HIPAA, ISO 17799, and more
5. New storage technologies that lower costs - virtualization, iSCSI (IP SAN), 10 Gigabit Ethernet networks, MPLS technology

Why would a company outsource their storage/data to an SSP? There are many companies without enough budget to build a fault-tolerant, highly available network, or even storage. These companies may want to sign up for a service from an SSP to manage these systems (network, storage) at their corporate office or at the SSP's data center (point of presence). Since we've mentioned the network, having enough bandwidth between sites is critical (MPLS is a good candidate technology).

Currently, the SSP market is growing rapidly. All the big players are already in, namely AT&T, MCI, Verizon, British Telecom, and even AOL (with its xdrive.com purchase). In Thailand, we currently enjoy the market alone, since our strategy is to create our own small monopoly. (Read Monopoly Rules.)

SSP requires the following:
1. Data Center - 24x7 world-class data center is a must
2. Network connectivity - large network pipes are mandatory
3. Large storage pool - best if storage is cost effective
4. Storage feature - snapshot and replication capability is a plus
5. People/process/procedure - people who understand the technology, plus the processes and procedures to manage and deliver the SSP service

There are multiple service offerings: D2D backup, replication (disaster recovery), recovery (in case a customer has a disaster), restore (in case of human error), archiving (for regulatory compliance), data migration (when a customer adds a new system), outsourced data mining, or even business continuity testing.

I have done all of this: service descriptions, service details, service delivery, processes, procedures, customer SLAs, all the way to marketing collateral. It takes hundreds of hours of sitting down with the SSP to work out these detailed documents. That is a simple list of the documents an SSP will need in order to start offering SSP services.

In the next post, we'll talk about where and how to start your own Storage Service Provider.

Tuesday, February 07, 2006

Disk-to-disk Backup to Storage Service Provider

Earlier Days

Since the earliest days of computer technology, our forefathers (in Internet time) had to cope with computer breakdowns (loss of data) differently than we do today.

Believe it or not, a study conducted by the University of California at Berkeley stated that we (humans) will create more data in the next three years than we've created in the last 40,000 years.

I just heard this today and realized that it might not be entirely false. Here is why:
1. Regulatory compliance requires companies to store much more data - HIPAA, SOX, ISO 17799, and more.
2. Explosive data growth in consumer realm - digital cameras, digital video cameras, and more.
3. Increasing value of data especially ones created in the consumer realm - people are attached to what they created - digital pictures, digital arts...

Punch Card, Magnetic Tape, And Disk as Backup Media

Today we'll talk about backup technology. Since the early days of the computer, we have kept backup media in a vault somewhere in case we need to retrieve them. In the 1950s, we kept punch cards (kids these days may never have heard of them) in a vault. But these cards were made of paper, so they could be eaten by insects, destroyed by fire, or damaged by humidity and time. Shortly after the punch card, magnetic tape came into the picture in 1951, used for backup starting with the Univac we all know. Since then, our world of backup has remained the same, relying on magnetic tape (tape cartridges, tape media).

When I say it has remained the same, I mean that every large corporation relies on tape as its single backup medium/strategy. What most people don't know is that the reliability of tape backup systems is not much higher than when it started.

A tape backup system's SLA (service level agreement) remains at 77%; in other words, you cannot restore approximately 23% of the time. This fact is unknown to most of the IT world. The reason it fails 23% of the time is largely that tape media can be moved around. That 23% comes from human errors (losing tapes, loading the wrong tape to restore, etc.), media errors (tape media deteriorates constantly because of heat and humidity), and other configuration errors.

The hard drive is still the only high-SLA (99.99%) storage medium in the world, and hard drive prices are currently declining. Many backup software vendors are now starting to look at disk storage as an alternative. This is the beginning of disk-to-disk backup.

Disk-to-disk Backup Format

Disk-to-disk backup means that you back up your data from one disk (primary storage) to another disk (secondary storage). Primary storage (FC-AL) prices are normally much higher than secondary storage (iSCSI).

The question is: when you do disk-to-disk backup, does that mean you have two drives with the same data? The answer is no. Backup software vendors support different formats. For example, Veritas NetBackup (now owned by Symantec) supports the GNU tar format. Since the GNU tar format supports compression as standard (with encryption layered on top), Veritas could extend these features (compress, encrypt) to its products. Veritas BackupExec not only supports the tar format, it can also do a straight volume copy (flash backup) such that the end user can grab the targeted file from the backup disk pool.

Suitable HDD Technology with Disk-to-Disk Backup

Since disk-to-disk backup uses a large pool of storage, expensive hard drives (Fiber Channel, SCSI) are not suitable; ATA (P-ATA), SATA, or even SAS drives are better matches. As I've discussed in the seek time and transfer time comparison between FC-AL and SATA HDDs, SATA HDDs (5-7ms seek time) and FC-AL HDDs (1-3ms seek time) have similar transfer time characteristics. D2D is normally a sequential read or write (read on restore, write on backup), so using SATA HDDs in a D2D application will not have much of a performance impact compared with FC-AL HDDs.

What Else Is In Store For Disk-to-Disk Backup?

Since SATA is the suitable hard drive technology, most SATA drives are offered in iSCSI systems. SATA storage plus iSCSI technology enables D2D backup over an IP network. So what's next in store for D2D is the ability to send backup traffic off-site over IP to another IP storage system in another data center across the country.

So SMBs that do not want to buy a tape backup system and maintain a backup server have the choice of calling the storage service provider (backup/recovery service provider) near them. This provider could be all the way across the country (east coast to west coast in the U.S.). However, multi-country backup/recovery service providers do not exist today because international bandwidth costs much more than domestic bandwidth. Storage service providers normally serve customers within one country (Verizon, AT&T, MCI, BT).

The storage service provider that I have the privilege of working closely with is True Corporation. We went through a good part of 2005 together, from inception to the storage service market, and now we're signing up customers by the busload. True owns the majority of fixed lines in Bangkok, the third-largest mobile operator, the largest pay-TV operator, the largest broadband provider, and the largest IDC in Thailand.

Monday, February 06, 2006

Storage Software development is for everyone (really?)

Today I had a whole new experience yet again regarding software development. I've asked some of my guys to look into storage application development for some time now, and it appears that we might be able to put it to good use.

The easiest way to get into the storage business (as a storage brand, a storage vendor) is... can you guess? Yes, the easiest way is to find an outsourcing partner who will manufacture your system for you. This takes a lot of pressure off your R&D team: they only worry about the quality of the product, not the manufacturing, the supply chain, or the logistics. You just focus on making sure that your product's features are good enough to compete with the big players. None of the small storage companies can afford a multi-million-dollar manufacturing facility; they all rely on one storage OEM or another. The common names among storage OEMs are Xyratex and Dot Hill. There are also many OEM vendors in Taiwan that I won't name. Xyratex is known to manufacture for a number of companies, ranging from traditional FC storage vendors like NetApp to new iSCSI vendors like Intransa. Dot Hill is known to produce a number of FC storage devices for Sun MicroSystems.

So it's my job this year to look for an OEM vendor who is willing to open up their architecture so that we at NorthPole can come up with NorthPole storage offerings. We plan to put out a range of products, from easy-to-use one-click storage for the small-office/home-office (SOHO) market up to mid-range best-of-breed backup software/hardware for small and medium businesses (SMB).

We'll see whether the chief architect, who's going to be in London for 5 months, can pull this off while carrying a much heavier workload. Otherwise, I might have to make a deal with his son (Donut) to pow-wow with him and get this off the ground. Hopefully Donut has enough special mind tricks to send to his slave. ;p Omm.. Omm.. Omm..

Tomorrow we'll go back to Storage technologies again. I have to contact donut's dad now. Omm.. Bow-wow-wow-wow.

Saturday, February 04, 2006

Information Life Cycle Management

Not so long ago, hard drive storage was too expensive to store much data. So the computer scientists of those days (the 1990s) had to build IT environments that co-existed with paper-based processes. During the dot-com years and the early 2000s, hard drive pricing came down dramatically, not to mention the new hard drive technologies (SAS and SATA, covered in a previous post).

Since hard drive costs are much lower than before, many organizations tried to become paperless during the late 90s. Not realizing that many documents cannot be digitized easily, these organizations had to convert their paper documents by scanning the analog (paper) originals into digital (picture) data. Then most had to buy an OCR (Optical Character Recognition) solution to convert the text in those digital pictures into digital data. Moreover, these organizations have faced new challenges since 9/11: most documents must now be kept for up to 10 years, so these scanned (large picture) documents must now be kept for up to 10 years as well.

So this creates a whole new set of problems:
1. The ability to store this "important" data for a long period of time (10 years).
2. The ability to ensure that you can retrieve this "important" data at any time, whether it sits in online storage or archive storage.
3. If #1 and #2 are not met, they will be penalized heavily.

To imagine the amount of data involved, let's picture a bank. For a normal home mortgage loan application, the bank needs pictures of the property, a map to the property, the applicant's financial statement, the applicant's credit history, the guarantor's credit history (if needed), and so on. Imagine that the bank now has to convert everything within a loan application to digital form and keep it for 10 years. In the early 90s, the bank would just file these papers, box them up, and ship them off to Timbuktu for safekeeping. Once the application was approved, they only needed to know what amount the applicant owed on what day every month. So the data entered in the early 90s was much smaller; they might only put a reason (240 characters) why an applicant was rejected into the computer for future reference.

Since most organizations are in a frenzy to digitize everything, the data grows exponentially. Look at the bank loan example: in the early 90s, that could have meant about 100KB of data per applicant. Now, that same application will easily take tens of megabytes because of digital pictures. I won't deny that having everything online is much faster than going through mountains of loan applications to find the map to a property. But it's an example of why we need "Information Life Cycle Management".
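A rough growth estimate for this example, in Python. The application volume is a made-up assumption; the per-application sizes come from the figures above.

```python
# Archive-size estimate for the loan-application example. Volume is assumed.

applications_per_year = 50_000           # assumed volume for one bank
retention_years = 10

old_kb_per_app = 100                     # early-90s record, mostly text
new_mb_per_app = 20                      # "tens of megabytes" with scans

old_total_gb = applications_per_year * retention_years * old_kb_per_app / 1024 / 1024
new_total_tb = applications_per_year * retention_years * new_mb_per_app / 1024 / 1024

print(f"paper-era archive: ~{old_total_gb:.0f} GB")    # ~48 GB
print(f"digitized archive: ~{new_total_tb:.1f} TB")    # ~9.5 TB
```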

ILM (the short name) is part of the corporate information strategy. It outlines the whole value chain of data within an organization, beginning with data creation, then data distribution, data modification and maintenance, all the way to data disposition (deletion). It requires substantial resources from an organization to set the policies, processes, and procedures around data.

Let's look back at the home mortgage loan application. ILM specifies how the data is created - what data is needed for a loan application: pictures (house, map to the property), documents (digital credit history, financial statement, guarantor's financial statement), and more. Once the bank receives the data (the loan application), ILM specifies where this data goes next. In this example, the application is forwarded to the loan processing department and the appraisal department, and the whole application runs through to the end of the process. When the loan is approved, all documents are forwarded to the archive to await the next retrieval. The archive could be cheap storage, tape media, or sometimes DVD. There are a lot of vendors selling specialized ILM solutions by industry (financial, automotive, and more).

To simplify your thinking on ILM, ask yourself: where would you put the data you rarely use? Cheap storage or expensive storage? If it's rarely used, you should put that data on the cheapest possible storage that can still serve your access policy (how fast you need to retrieve the data). For example, if you must retrieve the information within minutes, the media you move the data to must be online (disk, tape media, or a DVD jukebox system). "Online" here means the tape media must already be inside the tape library to meet the requirement.
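The "where do I put rarely used data?" question can be reduced to a toy policy function like the one below. The tier names, prices, and retrieval times are illustrative assumptions.

```python
# Toy ILM placement policy: pick the cheapest tier that meets the access policy.

TIERS = [
    # (name,             USD per GB per month, retrieval time in minutes)
    ("FC-AL array",       1.50,   0),
    ("SATA/iSCSI array",  0.50,   0),
    ("tape library",      0.10,   5),
    ("offsite vault",     0.02,   24 * 60),
]

def cheapest_tier(max_retrieval_minutes):
    """Return the cheapest tier that still satisfies the access policy."""
    candidates = [t for t in TIERS if t[2] <= max_retrieval_minutes]
    return min(candidates, key=lambda t: t[1])

print(cheapest_tier(0)[0])          # needed instantly   -> SATA/iSCSI array
print(cheapest_tier(10)[0])         # minutes acceptable -> tape library
print(cheapest_tier(48 * 60)[0])    # archive, days fine -> offsite vault
```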

So ILM tends to save corporations money; however, since it's a new technology, it rarely delivers a return on investment in less than 24 months.

So ask yourself today: do you want to keep all available data online? How fast do you want to restore your data?

Thursday, February 02, 2006

iSCSI continues - The differences between iSCSI (IP SAN) and NAS (File Server)

iSCSI vendors provide better price/performance and good security (CHAP authentication, IP SEC encryption), and they are closing the performance gap with the big (Fiber Channel) boys (EMC, HP, HDS, IBM, and more).

iSCSI is the technology to look at. It gives you the ability to do SAN (Storage Area Network) cheaply. Actually, today I'd like to backtrack a little: I jumped ahead to write about iSCSI without saying much about what else is out there in the world of storage.

There are three storage topologies in the world:
1. Direct Attached Storage Device (DASD)
2. Network Attached Storage (NAS)
3. Storage Area Network (SAN)

The DASD topology is not limited to internal hard disk drives; it also includes SCSI storage - external storage that connects to only one server via a SCSI cable. A SCSI cable is a big, fat cable, about the same diameter as a parallel cable (the old printer cable). Since DAS (DASD) is attached to one server only, it serves only that one server. If you have a small number of servers, you will be fine with this topology.

Another example: ABC Corporation has 100+ servers, and they have to monitor the free space on every one of them. If a server runs out of space, it will crash, and an ABC system administrator will spend hours getting it back. Utilization is the key issue with DAS. Imagine this: one server is using 95% of its disk space (95% utilization) while another is using only 35%. They cannot just swap hard drives between the two servers. They'll need to go to ABC finance to get budget approved, then buy more storage for the server at 95% utilization. And they cannot do anything with the 35%-utilization server either.

In a DAS environment, you'll find an average of 30-40% storage (hard drive space) utilization. This means you can save big if you pool all your storage together and share the space.
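Here is the consolidation argument in numbers. The server count, disk sizes, and target pool utilization are illustrative assumptions in the spirit of the ABC Corporation example.

```python
# Siloed DAS vs. a shared pool: how much capacity the same data really needs.

servers = 100
disk_per_server_gb = 100
avg_utilization = 0.35            # typical 30-40% in a DAS environment

purchased_gb = servers * disk_per_server_gb
used_gb = purchased_gb * avg_utilization

# If the same data sat in one shared pool kept at, say, 70% utilization:
pooled_gb_needed = used_gb / 0.70

print(f"purchased today: {purchased_gb:,.0f} GB, actually used: {used_gb:,.0f} GB")
print(f"shared pool needed: {pooled_gb_needed:,.0f} GB "
      f"({purchased_gb - pooled_gb_needed:,.0f} GB saved)")
```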

NAS, or the file server, is next. Any PC (Windows, Linux, etc.) or server today can act as a file server. The ability to share files across PCs/servers - file sharing - is the trademark of a file server, and it lets machines share some space among themselves. So ABC Corporation can now share some space from the 35%-utilization server with others. But a file server only provides file-based service: e-mail applications, database applications, and other block-based applications cannot run on space provided by a file server.

In addition, managing 100 servers that all provide file-server service is a nightmare - one big plate of spaghetti. That's why we have to move to the Storage Area Network (SAN).

SAN is an old idea: move all the storage together, then share it among the servers. The 95%-utilization server can request more space online (in some cases without stopping the application). The 35%-utilization server can be taken offline while the ABC system administrator resizes it - shrinking it from 35% utilization (35GB used out of 100GB) to 70% utilization (35GB out of 50GB), and giving the freed 50GB to the 95%-utilization server.

iSCSI and Fiber Channel are both SAN technologies. However, since most people know the file server (or NAS), which also works over IP, some might confuse the relationship between iSCSI and NAS.

NAS provides file-level service (copy, remove, delete, edit), so a user can view, edit, or delete a file on another machine. iSCSI, on the other hand, provides block-level service: the server sees the iSCSI service as a SCSI drive. If you open File Manager on the server, the iSCSI volume appears as a drive. A user can put any files onto that drive, but a user on another machine will not be able to see it. Confusing?

The server treats the iSCSI service (space/volume) as a drive. A SAN provides high-speed access to its data and normally provides data reliability through RAID.
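A small sketch of the distinction on a Linux host: a NAS share is reached through named files, while an iSCSI LUN shows up as a raw block device that only the attached server formats and reads. The paths and device node are placeholders.

```python
# File-level (NAS) vs. block-level (iSCSI) access, illustrated with placeholders.

import os

# NAS / file server: the share is mounted and you work with named files.
nas_path = "/mnt/nas_share/report.doc"           # placeholder path
if os.path.exists(nas_path):
    with open(nas_path, "rb") as f:
        data = f.read()

# iSCSI: the LUN shows up as a local block device; the server puts its own
# filesystem on it, and other machines do not see the files inside.
iscsi_device = "/dev/sdb"                        # placeholder device node
if os.path.exists(iscsi_device):
    with open(iscsi_device, "rb") as dev:
        first_block = dev.read(512)              # raw 512-byte block
```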

Tomorrow, we'll talk about the buzzwords: ILM, etc.

Wednesday, February 01, 2006

iSCSI and its world domination strategy (2)

Since yesterday, I've met with a potential client from Japan who's going to implement iSCSI. This customer is a perfect example of the people who are currently evaluating iSCSI as a way to make the most of their resources (a.k.a. budget). Having installed a Hitachi Data Systems Fiber Channel system, this customer is convinced that iSCSI is a nice alternative. So he tried to use FalconStor as the media server between the rest of the servers and the HDS FC-AL system. The FalconStor software is useful when you have an outdated FC-AL system: you can turn this out-of-shape system into a new iSCSI-based system.

So, people of Fiber Channel technology, bring me your tired, your old FC-AL systems. We can turn them into new (entry-level) iSCSI systems. This works pretty much because such an FC-AL system isn't fast to begin with; even with the overhead from FalconStor, this old, tired FC-AL system can still be used as an entry-level iSCSI system serving Windows clients.

Let's continue our train of thought from yesterday. Performance and security are two of the three things people ask about any IT system (price, performance, security), and you can usually only pick two out of three. If an iSCSI system is inexpensive, does that mean it's going to be either low-performing or lacking in security?

The answers are no and no. Let's talk about the performance perspective for a little while. iSCSI systems are built on newer hard drive technology; namely, SATA and SAS drives are in the driving seat of iSCSI technology. SATA stands for Serial ATA and SAS stands for Serial Attached SCSI. Does this mean that the newer the hard drive technology, the better it is? Yes and no. Yes, manufacturing capability is increasing and mean time between failures (MTBF) is increasing among cheaper drives. However, the gap between the low-to-mid market (SATA, SAS) and the enterprise market (FC-AL drives) is still wide.

Most SATA drives are manufactured at 7,200 RPM, while FC-AL drives come at 15,000 RPM, which translates to better seek times. But most SATA drives hold 2x or 3x as much data, so the transfer times are almost the same (7,200 RPM x 2x data <-> 15,000 RPM x 1x data). This leaves seek time as the remaining gap.
Note: SATA drives come with 5-10 ms seek time while FC-AL drives come with 1-3 ms seek time.
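The rough numbers behind that trade-off, as a Python sketch. The density ratio and the megabytes read per revolution are illustrative assumptions; only the RPM figures come from the post.

```python
# Why "7200 RPM * 2x data" roughly matches "15000 RPM * 1x data" sequentially.

sata_rpm, fc_rpm = 7_200, 15_000
density_ratio = 2.0                      # assume SATA platters hold ~2x per track

fc_mb_per_rev = 0.5                      # assumed MB read per revolution (FC-AL)
sata_mb_per_rev = fc_mb_per_rev * density_ratio

fc_transfer = fc_rpm / 60 * fc_mb_per_rev        # MB/s sustained
sata_transfer = sata_rpm / 60 * sata_mb_per_rev  # MB/s sustained

fc_latency_ms = 60_000 / fc_rpm / 2              # average rotational latency
sata_latency_ms = 60_000 / sata_rpm / 2

print(f"FC-AL: {fc_transfer:.0f} MB/s, rotational latency {fc_latency_ms:.1f} ms")
print(f"SATA : {sata_transfer:.0f} MB/s, rotational latency {sata_latency_ms:.1f} ms")
# Sustained transfer comes out similar; what remains is the seek/latency gap.
```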

As a current development, there is a vendor producing 10,000 RPM SATA drives. This makes the gap between SATA and FC-AL drives even smaller!

SAS drives are supposed to be the alternative to SCSI drives for most internal hard drives. However, iSCSI vendors are trying to bring these drives in because of their RPM (10,000 now, with 15,000 RPM coming). Once there is a 15,000 RPM SAS drive, iSCSI will be at full throttle.

After talking about the underlying layer (the hard drive and its technology): a storage system, or storage array, is just hard drives plus cache. Most storage systems do not differ much if they use a similar design and similar standards. Hopefully we'll get to see the fastest and cheapest storage system in the same bundle from iSCSI.

Security is another issue that most of you will have to be concerned about. Since its inception, iSCSI has been endowed with the CHAP authentication mechanism and IP SEC encryption. These two industry standards help iSCSI expand into areas where security is needed.

Unlike iSCSI, FC-AL does not come with a standard authentication mechanism or encryption; FC-AL security is based on obscurity.

In addition, iSCSI gets further strong support from IP technology. If you are the paranoid type, you can choose any IP security technology you like: for example, create a VPN tunnel between your server and the iSCSI storage, while also implementing CHAP to authenticate that the right server with the right key is gaining access to the right data. You can also implement IP SEC inside the VPN (in case your VPN breaks loose, you're still protected by IP SEC). I call this double-bagging.

Since the iSCSI adoption rate is climbing exponentially, it should be part of your corporate planning strategy this year.