Storing large amounts of data in a home directory may cause a filesystem full condition, or the quota for the home directory to be exceeded. The traditional solutions are to buy more disk space, or to store the data on magnetic tapes and retrieve it as needed with manual operator assistance. The former is expensive; the latter requires careful attention to tape identification (label checking), keeping track of locations of files on tape, and requesting tape mounts by an operator. UVa provides an HSM as an alternative to these.
UVa's Hierarchical Storage Management (HSM) system provides access to permanent storage for large amounts of data. The HSM is available to UVa faculty and staff. The HSM consists of an IBM 3494 tape robot controlled by an RS6000 model F50 running Tivoli Storage Manager software. The tape robot contains 204 data tapes, each with a 40 GB (gigabyte) native capacity, for a total capacity of somewhere between 8 terabytes (TB) and 24 TB, depending on data compression. The system uses robotics for automatic retrieval and storage of tapes.
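The capacity range quoted above can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and a 3:1 compression ratio for the upper bound; that ratio is inferred from the 8 to 24 TB range and is not a documented figure.

```shell
#!/bin/sh
# Rough capacity estimate for the tape robot (decimal units assumed).
tapes=204
native_gb=40                                     # native capacity per tape, from the text

native_total_gb=$((tapes * native_gb))           # no compression
compressed_total_gb=$((native_total_gb * 3))     # assumed 3:1 compression

echo "native total:     ${native_total_gb} GB"
echo "compressed (3:1): ${compressed_total_gb} GB"
```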
Files are copied to tape in a migration process which is invisible to the user. When an attempt is made to access a migrated file, it is automatically retrieved (recalled). The recall may take several minutes while a tape is mounted by the robot, the tape is positioned, and the file is copied to disk.
The HSM should not be used to store large numbers of files with the expectation that all or most of them can be retrieved in a reasonable amount of time. It can take a significant amount of time to recall a file and a distressingly long amount of time to recall many files; for example, recalling 1000 small files would take over a day. Rather than storing a large number of files, the files should be combined into a few archive files using a utility such as tar or cpio. The archive files should then be stored. When it is time to recall a collection of files, the archive files can be recalled relatively quickly, and the desired files extracted. See the section Using the HSM for examples.
The HSM should not be used to store files that are used regularly, since each access to a migrated file may require a recall operation.
The maximum size of any one file is 2 GB. The HSM filesystem CANNOT accommodate larger files. If you wish to store files that are larger than 2 GB, you must first split them up. For example, if one had a 5 GB file named huge.tar, one would break it up into smaller pieces using:
split -b 2000m huge.tar CHUNK
This command splits huge.tar into two 2-GB files named CHUNKaa and CHUNKab, and one 1-GB file named CHUNKac. If huge.tar consisted of many subfiles, the split command would most likely break apart one of those subfiles, so one should regenerate the original file before extracting any of the subfiles:
cat CHUNK* > huge.tar
In the case of a tar file, one could instead extract the subfiles in the same step:
cat CHUNK* | tar xf - [ subfile1 subfile2 ... ]
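Before deleting the original of a split file, it may be worth confirming that the pieces reassemble correctly. The following is a minimal sketch that runs the same split and cat round trip on a small stand-in file in a temporary directory; the 5000-byte file and the 2000-byte piece size are illustrative substitutes for the 5 GB file and the 2000m piece size used above.

```shell
#!/bin/sh
# Demonstrate the split / cat round trip on a small stand-in file.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create a 5000-byte stand-in for huge.tar (the real file would be about 5 GB).
dd if=/dev/zero of=huge.tar bs=1000 count=5 2>/dev/null

# Split into 2000-byte pieces; for a real file one would use -b 2000m.
split -b 2000 huge.tar CHUNK

# Reassemble; the shell expands CHUNK* in sorted order (aa, ab, ac).
cat CHUNK* > rebuilt.tar

# Compare checksums of the original and the reassembled copy.
orig_sum=$(cksum < huge.tar)
new_sum=$(cksum < rebuilt.tar)
pieces=$(ls CHUNK* | wc -l | tr -d ' ')

echo "pieces: $pieces"
if [ "$orig_sum" = "$new_sum" ]; then
    echo "round trip OK"
else
    echo "checksum mismatch" >&2
    exit 1
fi
```

Recording the checksum of the original alongside the pieces makes it possible to repeat the comparison after a later recall.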
Do not expect to be able to retrieve arbitrarily large amounts of data at one time. The size of each user-accessible space-managed filesystem is 20 GB. If 30 GB of data are stored in 15 files of 2 GB each, the retrieval of up to nine files would not be a problem. However, if an attempt is made to retrieve all the files, nine, say, would be retrieved successfully, but as the filesystem fills up, the retrieval of the tenth would cause one of the first nine to be migrated again to the disk cache (or even to tape), the retrieval of the eleventh would cause one of the first ten to be migrated, etc. The end result would be that only the last eight or nine files would be resident in the filesystem. In addition, the whole process would be slowed by the wasted retrieval and re-archival of files.
The proper way to deal with the recall of the 15 2-GB files is to access only about five (at most nine) of the files at a time. If it really is necessary to have all 30 GB in the filesystem for simultaneous use, then the recalled files must be copied to another computer, again a few at a time, since the filesystems on the HSM are not large enough to accommodate the application.
In order to preserve user data in case of hardware failure or accidental removal, files written into an HSM-managed filesystem are backed up nightly to an offsite location. The HSM has been configured to require the existence of a backup of a file before the file can be migrated from the filesystem. A downside to this requirement is that it limits the rate at which data can be written into the filesystem: since an HSM filesystem can contain only 20 GB at a time, no more than 20 GB can be written into it before a backup must be done. Backups are done once per day, so a maximum of 20 GB of new or modified data can be put into the filesystem in a day. Exceeding this rate causes all space in the filesystem to be exhausted until the next backup is done.
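The daily write limit translates into a minimum number of days for loading a large dataset. The sketch below is only a planning aid: the 30 GB dataset size is a hypothetical value, and the calculation assumes the filesystem starts empty and that each nightly backup completes before more data is written.

```shell
#!/bin/sh
# Estimate the minimum number of days needed to write a dataset into an
# HSM filesystem, given the 20 GB-per-day ceiling described above.
daily_limit_gb=20
dataset_gb=30      # hypothetical dataset size

# Ceiling division: round up to whole days.
days=$(( (dataset_gb + daily_limit_gb - 1) / daily_limit_gb ))

echo "A ${dataset_gb} GB dataset needs at least ${days} day(s) to load"
```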
The following is a typical scenario: From a remote computer, file F is created or modified within an HSM-managed filesystem. That night, F is backed up offsite. Subsequently, if filesystem space is needed, F can be migrated to the disk cache or to tape. Once the backup is performed, two copies of F exist: one on disk, the other offsite. In addition, duplicate copies of migrated files are maintained in the HSM, so soon after F is migrated three copies of F exist. If a file is accidentally removed, it can be restored from the backups. Send e-mail to systems@Virginia.EDU if a file restore is needed.
Many Unix filesystems on computers administered by ITC are backed up with the ITC Unix backup software. The HSM does not use this software; commercial Tivoli backup software is used instead, which provides the following:
To get space on the HSM, e-mail Res-Consult@Virginia.EDU with the following information:
It is necessary to have a blue.unix account before you can get an HSM directory. To get a blue.unix account, go to the ITCWeb User Accounts creation webpage.
Faculty and staff are granted space on the HSM, with a maximum of 20 GB, upon request to Res-Consult@Virginia.EDU.
chmod go-r filename
If someone leaves the group, the project leader can still have access to their data, while the person leaving can request a copy on CD or magnetic tape.
To have an HSM directory configured this way, e-mail Res-Consult@Virginia.EDU with the following details:
People can be added to and removed from the Unix group on request.
cd /net/hsm/mst3k
In order to use the HSM from a non-ITC Unix computer, you or your system administrator must first make a local directory and mount the HSM filesystem to that directory. After receiving confirmation that your computer has been added, become root, create the directory /net/hsm/mst3k, then mount the HSM to this directory with the command
mount -t nfs hsm.itc.virginia.edu:/net/hsm/mst3k /net/hsm/mst3k
If you wish to do this every time the computer is booted, add the appropriate line to your /etc/fstab or equivalent file.
cp /home/mst3k/foo /net/hsm/mst3k/foo
Suppose the directory /home/mst3k/smallfiles contains various small files. The commands
cd /home/mst3k
tar cf /net/hsm/mst3k/tar.smallfiles smallfiles
combine the contents of the directory smallfiles, including any subdirectories, into the single file tar.smallfiles, which is stored in the HSM. The entire directory contents may later be retrieved with a single tape operation by accessing tar.smallfiles using commands such as
cd /home/mst3k/tmp
tar xvpf /net/hsm/mst3k/tar.smallfiles
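An archive can also be listed, or only part of it extracted, once it has been recalled. The following sketch works in a temporary directory with made-up file names, so the local tar.smallfiles here is a stand-in for /net/hsm/mst3k/tar.smallfiles. Note that on the HSM even a listing reads the archive and would therefore trigger a recall.

```shell
#!/bin/sh
# Create, list, and partially extract a tar archive in a scratch directory.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a small directory tree (hypothetical file names).
mkdir -p smallfiles/sub
echo "alpha" > smallfiles/a.txt
echo "beta"  > smallfiles/sub/b.txt

# Combine the whole directory into one archive file.
tar cf tar.smallfiles smallfiles

# List the archive members without extracting them.
tar tf tar.smallfiles

# Extract a single member into a separate restore directory.
mkdir restore
(cd restore && tar xpf ../tar.smallfiles smallfiles/sub/b.txt)
restored=$(cat restore/smallfiles/sub/b.txt)
echo "restored content: $restored"
```

Keeping a listing of each archive (the output of tar tf) in the home directory is one way to find out which archive holds a file without recalling anything.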
When accessing a file that has been placed into the HSM, expect an initial delay of as much as a few minutes if the file has been migrated to tape. Often, however, the delay is only a few seconds long. For example, it is possible to move an old mail folder to your HSM directory and later to examine the folder using pine. If the mail folder my_old_mail is stored in the HSM directory /net/hsm/mst3k, then typing the command:
pine -f /net/hsm/mst3k/my_old_mail
will open the folder. Again, expect a brief delay.
To use a file in the HSM as an input file for a program, first run the command /common/uva/bin/hsmread on the file before starting the program. This guarantees that the file will be on disk and available without delay. So far we have found that the delays, when they occur, are less than five minutes. To be sure that the file foo is on disk and not on tape, use the command (you must type the full path of the command as shown below):
/common/uva/bin/hsmread foo
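When a program needs several input files, the recall step can be wrapped in a small helper. This is a sketch under assumptions, not a documented interface: the function name stage is made up, it assumes hsmread accepts one file name and returns a nonzero status on failure, and it falls back to simply reading the file (which, as described earlier, triggers an automatic recall) when hsmread is not installed.

```shell
#!/bin/sh
# Stage HSM files to disk before running a program that reads them.
HSMREAD=/common/uva/bin/hsmread   # full path, as given in the text

# stage FILE...  (hypothetical helper)
# Recalls each named file; stops at the first failure.
stage() {
    for f in "$@"; do
        if [ -x "$HSMREAD" ]; then
            # Assumption: hsmread takes one file name per invocation.
            "$HSMREAD" "$f" || return 1
        else
            # Fallback: reading the file forces it onto disk.
            cat "$f" > /dev/null || return 1
        fi
    done
}

# Usage (hypothetical program name):
#   stage /net/hsm/mst3k/foo && ./my_program /net/hsm/mst3k/foo
```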
The HSM should not be used to store large temporary files, e.g. input files for programs. That's what /tmp, /bigtmp, and /longtmp are to be used for. Use of the HSM for temporary files can lead to substantial performance problems.
The HSM has a public directory where data can be stored that can be read by anyone; all users have "read only" access to this area. This is useful for users who wish to make their data publicly available. The public directory is at:
/net/hsm/public
Data in this directory is placed there by ITC. Users who wish to have their data in the public directory should e-mail Res-Consult@Virginia.EDU. The data currently in this area is from ICPSR, CRSP, and Compustat.
Users can request that a tape be made of their data in the HSM (and data from their home directory, too).
The tape will be written in "tar" format. Tape copies should be made only onto Exabyte 5 GB or 20 GB, 8mm-tape cartridges. It is your responsibility to supply these cartridges if you request a tape copy. Exabyte 8mm tapes can be purchased at Cavalier Computers.
ITC will notify you by electronic mail when the backup is complete so you can pick up the tape. ITC will not be responsible for tapes that have not been picked up within a week without specific prior arrangement.
To copy data from a tape to the HSM, see ITC's Tape Reading Procedures.
In order to map the HSM directory as a Network Drive, your computer must be configured correctly for the network. Review the configuration for Windows 98/ME or Windows 2000/XP as appropriate.
Once you are sure that your network is configured properly, one modification must be made to the setup.
For Windows 98/ME: