The HSM Facility

Introduction

Storing large amounts of data in a home directory may cause a filesystem full condition, or the quota for the home directory to be exceeded. The traditional solutions are to buy more disk space, or to store the data on magnetic tapes and retrieve it as needed with manual operator assistance. The former is expensive; the latter requires careful attention to tape identification (label checking), keeping track of locations of files on tape, and requesting tape mounts by an operator. UVa provides an HSM as an alternative to these.

UVa's Hierarchical Storage Management (HSM) system provides access to permanent storage for large amounts of data. The HSM is available to UVa faculty and staff. The HSM consists of an IBM 3494 tape robot controlled by an RS6000 model F50 running Tivoli Storage Manager software. The tape robot contains 204 data tapes, each with a 40 GB (gigabyte) native capacity, for a total capacity of somewhere between 8 terabytes (TB) and 24 TB, depending on data compression. The system uses robotics for automatic retrieval and storage of tapes.
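The capacity range quoted above follows from simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and a 3:1 compression ratio for the upper bound. The ratio is an assumption inferred from the 24 TB figure, not a published specification.

```shell
# Figures from the paragraph above.
tapes=204
gb_per_tape=40

# Native (uncompressed) capacity of the robot.
native_gb=$((tapes * gb_per_tape))

# Upper bound, ASSUMING roughly 3:1 compression (inferred, not documented).
compressed_gb=$((native_gb * 3))

echo "native: ${native_gb} GB, at 3:1 compression: ${compressed_gb} GB"
```

This gives 8160 GB (about 8 TB) native and 24480 GB (about 24 TB) at the assumed ratio.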

Files are copied to tape in a migration process which is invisible to the user. When an attempt is made to access a migrated file, it is automatically retrieved (recalled). The recall may take several minutes while a tape is mounted by the robot, the tape is positioned, and the file is copied to disk.

The HSM should not be used to store large numbers of files with the expectation that all or most of them can be retrieved in a reasonable amount of time. It can take a significant amount of time to recall a file and a distressingly long amount of time to recall many files; for example, recalling 1000 small files would take over a day. Rather than storing a large number of files, the files should be combined into a few archive files using a utility such as tar or cpio. The archive files should then be stored. When it is time to recall a collection of files, the archive files can be recalled relatively quickly, and the desired files extracted. See the section Using the HSM for examples.

The HSM should not be used to store files that are used regularly. Otherwise, a recall operation may often be necessary.

The maximum size of any one file is 2 GB. The HSM filesystem CANNOT accommodate larger files. If you wish to store files that are larger than 2 GB, you must first split them up. For example, a 5 GB file named huge.tar could be broken into smaller pieces using:

split -b 2000m huge.tar CHUNK
This command splits huge.tar into two 2-GB files named CHUNKaa and CHUNKab, and one 1-GB file named CHUNKac. If huge.tar consisted of many subfiles, the split command would most likely break apart one of those subfiles, so one should regenerate the original file (on a filesystem that accepts files larger than 2 GB) before extracting any of the subfiles:
cat CHUNK* > huge.tar

and in the case of a tar file, one could extract the subfiles in the same step:
cat CHUNK* | tar xf - [ subfile1 subfile2 ... ]
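Before deleting the original of a split file, it may be worth confirming that the pieces really do reassemble into an identical copy. The following is a minimal sketch of that check on a small stand-in file. The temporary directory, the 4000-byte chunk size, and the name rejoined.tar are illustrative only; a real run would use -b 2000m as above.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in data (8893 bytes of numbers), not a real tar archive.
seq 1 2000 > huge.tar

# Split into 4000-byte pieces: CHUNKaa, CHUNKab, CHUNKac.
split -b 4000 huge.tar CHUNK

# Reassemble; the shell expands CHUNK* in alphabetical order.
cat CHUNK* > rejoined.tar

# cmp -s is silent and succeeds only if the files are byte-identical.
if cmp -s huge.tar rejoined.tar; then
    rejoin_status="identical"
else
    rejoin_status="different"
fi
echo "$rejoin_status"
```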

Do not expect to be able to retrieve arbitrarily large amounts of data at one time. The size of each user-accessible space-managed filesystem is 20 GB. If 30 GB of data are stored in 15 files of 2 GB each, the retrieval of up to nine files would not be a problem. However, if an attempt is made to retrieve all the files, nine, say, would be retrieved successfully, but as the filesystem fills up, the retrieval of the tenth would cause one of the first nine to be migrated again to the disk cache (or even to tape), the retrieval of the eleventh would cause one of the first ten to be migrated, etc. The end result would be that only the last eight or nine files would be resident in the filesystem. In addition, the whole process would be slowed by the wasted retrieval and re-archival of files.

The proper way to deal with the recall of the 15 2-GB files is to access only about five (at most nine) of the files at a time. If it really is necessary to have all 30 GB in the filesystem for simultaneous use, then the recalled files must be copied to another computer, again a few at a time, since the filesystems on the HSM are not large enough to accommodate the application.
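One way to copy a large collection off the HSM without flooding the space-managed filesystem is to copy the files strictly one after another, so that each file is recalled only when its turn comes. The function below is a sketch of that idea. The name copy_from_hsm and the paths in the usage comment are hypothetical.

```shell
# copy_from_hsm SRC_DIR DEST_DIR
# Copies the regular files in SRC_DIR to DEST_DIR one at a time and
# prints the number of files copied.
copy_from_hsm() {
    src_dir=$1
    dest_dir=$2
    copied=0
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and an unmatched glob
        cp "$f" "$dest_dir"/ || return 1   # reading the file is what triggers a recall
        copied=$((copied + 1))
    done
    echo "$copied"
}

# Hypothetical usage, with /scratch/mst3k on local disk of another computer:
#   copy_from_hsm /net/hsm/mst3k/bigrun /scratch/mst3k/bigrun
```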

Backup of HSM Data

In order to preserve user data in case of hardware failure or accidental removal, files written into an HSM-managed filesystem are backed up nightly to an offsite location. The HSM has been configured to require the existence of a backup of a file before the file can be migrated from the filesystem. A downside to this requirement is that it limits the rate at which data can be written into the filesystem: since an HSM filesystem can contain only 20 GB at a time, no more than 20 GB can be written into it before a backup must be done. Backups are done once per day, so a maximum of 20 GB of new or modified data can be put into the filesystem in a day. Exceeding this rate causes all space in the filesystem to be exhausted until the next backup is done.
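The daily limit makes it easy to estimate how long a large upload will take. The sketch below assumes the filesystem starts out empty, each nightly backup completes, and the backed-up files are then migrated to free space. The 75 GB total is an arbitrary illustration.

```shell
total_gb=75          # hypothetical amount of data to store
daily_limit_gb=20    # at most this much new data per day

# Integer ceiling division: round up to whole days.
days=$(( (total_gb + daily_limit_gb - 1) / daily_limit_gb ))

echo "$days"
```

For 75 GB this gives 4 days: three days at the full 20 GB and a fourth for the remaining 15 GB.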

The following is a typical scenario: From a remote computer, file F is created or modified within an HSM-managed filesystem. That night, F is backed up offsite. Subsequently, if filesystem space is needed, F can be migrated to the disk cache or to tape. Once the backup is performed, two copies of F exist: one on disk, the other offsite. In addition, duplicate copies of migrated files are maintained in the HSM, so soon after F is migrated three copies of F exist. If a file is accidentally removed, it can be restored from the backups. Send e-mail to systems@Virginia.EDU if a file restore is needed.

Many Unix filesystems on computers administered by ITC are backed up with the ITC Unix backup software. The HSM does not use this software; commercial Tivoli backup software is used instead, which provides the following:

For details on point-in-time restores, see Point-in-Time Restores

Requesting Space on the HSM

To get space on the HSM, e-mail Res-Consult@Virginia.EDU with the following information:

It is necessary to have a blue.unix account before you can get an HSM directory. To get a blue.unix account, go to the ITCWeb User Accounts creation webpage.

Faculty and staff are granted space on the HSM, up to a maximum of 20 GB, upon request to Res-Consult@Virginia.EDU.

Graduate Students and Groups

If you are a graduate student or part of a research group, it may be best to have one HSM directory, the project leader's or supervisor's, with each person belonging to a Unix group that has read and write permissions on that directory. The supervisor can create a sub-directory for each group member and use the Unix chmod command to give that member read and write permissions on it. Each group member can also use chmod to change the permissions on their own files. For example, the following takes away read privileges from the group and from others:
chmod go-r filename
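The arrangement described above can be sketched as follows. A temporary directory stands in for the project leader's HSM directory, and the names member1, notes.txt, and the group name in the comment are hypothetical.

```shell
# Stand-in for the project leader's HSM directory.
project_dir=$(mktemp -d)

# The supervisor creates a sub-directory for a group member.
mkdir "$project_dir/member1"
# On the real system the directory would also be assigned to the project's
# Unix group, e.g.:  chgrp projgrp "$project_dir/member1"   (hypothetical group)
chmod 770 "$project_dir/member1"    # owner and group: full access; others: none

# A member creates a file and then withdraws read access from group and others.
touch "$project_dir/member1/notes.txt"
chmod 644 "$project_dir/member1/notes.txt"     # start from rw-r--r--
chmod go-r "$project_dir/member1/notes.txt"    # now rw-------

# Show the resulting permission strings (first ten characters of ls -l).
dir_perms=$(ls -ld "$project_dir/member1" | cut -c1-10)
file_perms=$(ls -l "$project_dir/member1/notes.txt" | cut -c1-10)
echo "$dir_perms $file_perms"
```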

If someone leaves the group, the project leader can still access their data, while the person leaving can request a copy on CD or magnetic tape.

To have an HSM directory configured this way, e-mail Res-Consult@Virginia.EDU with the following details:

People can be added to and removed from the Unix group on request.

Using the HSM

If you have been granted space on the HSM, you should have a directory named /net/hsm/user_id on blue.unix or on whichever computer you asked to have the HSM directory connected to. To change to this directory, use cd /net/hsm/user_id. For example, if your user-id is mst3k, use the command:
cd /net/hsm/mst3k

In order to use the HSM from a non-ITC Unix computer, you or your system administrator must first make a local directory and mount the HSM filesystem to that directory. After receiving confirmation that your computer has been added, become root, create the directory /net/hsm/mst3k, then mount the HSM to this directory with the command

mount -t nfs hsm.itc.virginia.edu:/net/hsm/mst3k /net/hsm/mst3k
If you wish to do this every time the computer is booted, add the appropriate line to your /etc/fstab or equivalent file.
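The "appropriate line" depends on the operating system. As a sketch only, assuming a Linux-style /etc/fstab with six whitespace-separated fields and the generic option string defaults (both are assumptions; other Unix systems use different files and formats, and your site may require specific NFS options), the line for the example above could be built like this:

```shell
hsm_export="hsm.itc.virginia.edu:/net/hsm/mst3k"   # server and exported path, from the mount command above
mount_point="/net/hsm/mst3k"                       # local directory created earlier

# Fields: device, mount point, filesystem type, options, dump flag, fsck order.
# "defaults 0 0" is an assumed, generic choice.
fstab_line="$hsm_export $mount_point nfs defaults 0 0"

# Print the line for review; as root, append it to /etc/fstab by hand.
printf '%s\n' "$fstab_line"
```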

Getting and Using Files

To place a file into the HSM, simply move or copy it into this directory. For example, to copy the file foo from /home/mst3k to the HSM directory:
cp /home/mst3k/foo /net/hsm/mst3k/foo

Suppose the directory /home/mst3k/smallfiles contains various small files. The commands

cd /home/mst3k
tar cf /net/hsm/mst3k/tar.smallfiles smallfiles

combine the contents of the directory smallfiles, including any subdirectories, into the single file tar.smallfiles, which is stored in the HSM. The entire directory contents may later be retrieved with a single tape operation by accessing tar.smallfiles using commands such as

cd /home/mst3k/tmp
tar xvpf /net/hsm/mst3k/tar.smallfiles
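When only a few files are needed, the archive can be listed first and individual members extracted by name. The sketch below demonstrates this in a scratch directory rather than on the HSM. The file names a.txt and b.txt and the restore directory are made up for the demonstration.

```shell
# Build a small archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir smallfiles restore
echo "alpha" > smallfiles/a.txt
echo "beta"  > smallfiles/b.txt
tar cf tar.smallfiles smallfiles

# List the archive members without extracting anything.
tar tf tar.smallfiles

# Extract a single member, by the name shown in the listing, into restore/.
(cd restore && tar xf ../tar.smallfiles smallfiles/b.txt)

restored=$(cat restore/smallfiles/b.txt)
echo "$restored"
```

On the HSM the whole archive is still recalled from tape, but that is a single recall rather than one per small file.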

When accessing a file that has been placed into the HSM, expect an initial delay of as much as a few minutes if the file has been migrated to tape. Often, however, the delay is only a few seconds. For example, it is possible to move an old mail folder to your HSM directory and later examine the folder using pine. If the mail folder my_old_mail is stored in the HSM directory /net/hsm/mst3k, then typing the command

pine -f /net/hsm/mst3k/my_old_mail
will open the folder. Again, expect a brief delay.

To use a file in the HSM as an input file for a program, first run the command /common/uva/bin/hsmread on the file before starting the program; this guarantees that the file will be on disk and available without delay. So far we have found that the delays, when they occur, are less than five minutes. To be sure that the file foo is on disk and not on tape, use the command (you must type the full path of the command, as shown below):

/common/uva/bin/hsmread foo
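When a program needs several input files, the same command can be run on each of them in turn before the program starts. The wrapper below is a sketch. The function name stage_files and the file names in the usage comment are hypothetical, and it assumes that hsmread exits with a non-zero status on failure, which this document does not confirm.

```shell
# Full path to the staging command, as given above; may be overridden for testing.
HSMREAD=${HSMREAD:-/common/uva/bin/hsmread}

# stage_files FILE...
# Runs hsmread on each file in turn and stops at the first failure
# (ASSUMES hsmread reports failure through its exit status).
stage_files() {
    for f in "$@"; do
        "$HSMREAD" "$f" || { echo "could not stage $f" >&2; return 1; }
    done
}

# Hypothetical usage:
#   stage_files /net/hsm/mst3k/input1.dat /net/hsm/mst3k/input2.dat && ./my_program
```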

The HSM should not be used to store large temporary files, e.g. input files for programs. That's what /tmp, /bigtmp, and /longtmp are to be used for. Use of the HSM for temporary files can lead to substantial performance problems.

The Public Directory

The HSM has a public directory for data that anyone may read; all users have "read only" access to this area. This is useful for users who wish to make their data publicly available. The public directory is at:

/net/hsm/public

Data in this directory is placed there by ITC. Users who wish to have their data in the public directory should e-mail Res-Consult@Virginia.EDU. The data in this area is from ICPSR, CRSP, and Compustat.

Requesting a Tape Copy

Users can request that a tape be made of their data in the HSM (and data from their home directory, too).

The tape will be written in "tar" format. Tape copies should be made only onto Exabyte 5 GB or 20 GB, 8mm-tape cartridges. It is your responsibility to supply these cartridges if you request a tape copy. Exabyte 8mm tapes can be purchased at Cavalier Computers.

ITC will notify you by electronic mail when the backup is complete so you can pick up the tape. ITC will not be responsible for tapes that have not been picked up within a week without specific prior arrangement.

To copy data from a tape to the HSM, please see ITC's Tape Reading Procedures.

Mapping the HSM Drive in Windows

Check the Configuration

In order to map the HSM directory as a Network Drive, your computer must be configured correctly for the network. Review the configuration for Windows 98/ME or Windows 2000/XP as appropriate.

Modify DNS Search Suffixes

Once you are sure that your network is configured properly, one modification must be made to the setup.

For Windows 98/ME:

  1. If necessary, open the Network window by clicking the Windows Start button, clicking Settings, clicking Control Panel, and double-clicking the Network icon.
  2. Click the TCP/IP item (e.g., "TCP/IP->PCI Fast Ethernet Adapter" or something similar that represents your Ethernet adapter card) to highlight it, and click the Properties button (see figure below). A TCP/IP Properties window appears. Click the DNS Configuration tab near the top of the window.
  3. Click in the Domain Suffix Search Order field, and type itc.virginia.edu in lower case letters and click Add. Do NOT remove virginia.edu from the Search Order field list. Both virginia.edu and itc.virginia.edu should be visible in the text box.
  4. Click OK. (If you cannot see an OK button, do not click the button! All changes would be lost. Instead, drag the TCP/IP Properties window by its blue title bar until you can see the OK button, then click it.)

For Windows 2000/XP:

  1. Right click on My Network Places and select Properties.
  2. Choose the appropriate connection, usually Local Area Connection.
  3. Select Properties. Highlight Internet Protocol (TCP/IP), and select Properties again.
  4. A Properties window will be displayed. Now select Advanced. Another dialog window will appear.
  5. Select the DNS tab. Under Append these DNS suffixes (in order), you should see virginia.edu in the text box. If not, review the steps for network configuration.
  6. Under Append these DNS suffixes, choose Add. For Domain suffix enter "itc.virginia.edu" and select Add.
  7. Click OK on the sequence of dialog boxes.

Map the HSM Directory

Mapping the HSM directory as a Network Drive requires the enabling of clear text passwords.
To map the HSM directory as a Network Drive on Windows 2000, XP, or Windows 98:
  1. Right click on "My Network Places" (Network Neighborhood) or the "My Computer" icon and choose Map Network Drive from the menu.
  2. In the Drive window that appears, be sure to select a drive letter that is not in use.
  3. In the folder window enter \\HSM\mst3k (instead of "mst3k" use your U.Va. computing id). If you are a member of a group and not the owner of the top HSM folder, then you will enter the additional path elements that you must use ordinarily in order to arrive at your folder. E.g., you would use the name \\HSM\mst3k\gno\XXX9X if the owner is "mst3k", the group name is gno, and your folder name is XXX9X.
  4. If you're using Windows 2000 or XP and you are not logged into your computer with your U.Va. computing id, click on the "Connect using a different user name" link below the folder box. If you're using Windows 98, you must be logged into your Windows 98 desktop using your U.Va. computing id in order to connect to the HSM using this method.
  5. Click on the Finish button. You will be prompted to enter your U.Va. computing id and password. Enter the U.Va. computing id and password you use for your blue.unix account. An Explorer window of your HSM directory should appear.
To disconnect the HSM directory, right click on the icon for the drive and select Disconnect.

Getting Help

Additional help with the HSM can be obtained from the Research Computing Support Center: e-mail Res-Consult@Virginia.EDU, telephone 243-8800, or come to the RCSC in Room 244 Wilson Hall between 9:00 AM and 5:00 PM, Monday through Friday.

© 2008 by the Rector and Visitors of the University of Virginia.

The information contained on the University of Virginia’s Department of Information Technology and Communication (ITC) website is provided as a public service with the understanding that ITC makes no representations or warranties, either expressed or implied, concerning the accuracy, completeness, reliability or suitability of the information, including warranties of title, non-infringement of copyright or patent rights of others. These pages are expected to represent the University of Virginia community and the State of Virginia in a professional manner in accordance with the University of Virginia’s Computing Policies.