ITC supports a Hierarchical Storage Management (HSM) service to provide large amounts of permanent storage space in a cost-effective manner. The HSM provides network access, using NFS or CIFS (SMB), to archival storage that is a combination of on-line disk storage and tape storage. Directories are established as needed by faculty or graduate students for archival storage.
The HSM uses Quantum (formerly ADIC) StorNext filesystem software, with a large disk filesystem allocated on ITC's storage network, and Quantum (ADIC) i2000 automated tape libraries, containing LTO tape drives. The HSM is administered by the ITC Enterprise Systems group.
The HSM should be used to store archives of infrequently accessed datasets, typically large amounts of data. The HSM should not be used to store large numbers of small files with the expectation that all or most of them can be retrieved in a reasonable amount of time. Because the bulk of the HSM storage is tape resident, it can take a significant amount of time to retrieve a file that is not in the online disk storage filesystem. With a large number of small files, the time to retrieve the individual data files can be extremely long.
Rather than storing a large number of small files, such files should be combined into one or more archive files, using a utility such as tar or cpio. The archive files should then be stored. When it is time to recall a collection of files, the archive files can be recalled relatively quickly, and the desired files extracted.
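As a minimal sketch of this bundling step, the following Python code uses the standard tarfile module to pack a directory of small files into one archive file and, later, to extract a single member from it. The function names and any paths passed to them are illustrative assumptions and are not part of the HSM service.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack src_dir and everything under it into one uncompressed tar archive."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores member paths relative to the directory's own name
        tar.add(src, arcname=src.name)
    return archive_path


def extract_member(archive_path, member_name, dest_dir):
    """Extract one named member of the archive into dest_dir."""
    with tarfile.open(archive_path, "r") as tar:
        tar.extract(member_name, path=dest_dir)
    return Path(dest_dir) / member_name
```

The resulting archive file, rather than the individual small files, would then be copied into the HSM directory, so that a later recall involves one large file instead of many small ones.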
The HSM should not be used to store files that are used regularly, because each access to a file that has been migrated to tape requires a recall operation. Unproductive "wars" between applications can also occur when each application attempts to use more online disk space than is available in the filesystem.
Do not expect to be able to retrieve arbitrarily large amounts of data and have all the data accessible at the same time. The size of the online disk filesystem is currently 500 GB. If users are actively trying to retrieve more than this capacity, then as data are retrieved from tape, other data just retrieved may have to be released from disk to make room, resulting in repeated attempts to bring all the data back to disk.
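One way to stay within the online disk capacity is to recall and process data in batches. The sketch below is illustrative only: the function name, the example file names, and the 400 GB budget are assumptions, the budget being chosen to leave headroom below the 500 GB filesystem, which is shared with other users.

```python
def plan_recall_batches(file_sizes, capacity_bytes):
    """Group (name, size) pairs into batches that each fit in capacity_bytes.

    Files are taken in the given order. A file larger than the capacity
    ends up in a batch of its own, since it cannot share space with others.
    """
    batches, current, used = [], [], 0
    for name, size in file_sizes:
        # Start a new batch when adding this file would exceed the budget
        if current and used + size > capacity_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


GB = 10**9
# Hypothetical archive files and sizes, for illustration only
example = [("a.tar", 200 * GB), ("b.tar", 150 * GB), ("c.tar", 300 * GB)]
batches = plan_recall_batches(example, 400 * GB)
# batches == [["a.tar", "b.tar"], ["c.tar"]]
```

Each batch would be recalled and processed completely before the next batch is accessed.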
On Unix or Linux systems, a user accesses the HSM filesystem through NFS. On ITC servers, this access uses the automounter, and the user directory is mounted when /net/hsm/<user-id> is referenced. On Windows or Mac OS X systems, the user directory may be mapped as network attached storage. On Windows, use the share name
\\hsm.itc.virginia.edu\<user-id>
On Mac OS X, use
smb://hsm.itc.virginia.edu/<user-id>
Data written into the filesystem are first written to disk
storage. Later, the HSM begins copying the new data from the
online disk storage to tape. Once two tape copies of the data
have been completed, and the demand for space in the filesystem
reaches a threshold where space needs to be released, the HSM
begins migrating the dataset from online storage to tape storage.
What is left on disk are pointers to the tape storage. The result
still looks like a regular file, and when the file is next
opened, the HSM begins migrating the data back into the online
disk storage.
Data retrieval (recall) is automatically initiated as soon as
a migrated file is accessed. The time to recall depends on
availability of tape drives, location on the tape, and the
amount of data that must be migrated.
On some machines, it is possible for access to a migrated file to fail. For example, on Solaris you may see the message
file temporarily unavailable on the server, retrying...
In this case a recall operation is initiated,
but the accessing program does not wait for the operation
to complete. Such failures can usually be avoided by
forcing a recall before the program is run. The command
/common/uva/bin/hsmread /path/to/file1 /path/to/file2 ...
may be used to repeatedly attempt to read the last byte of each file
until the read is successful, at which time
the entire file is disk resident. While this should eliminate
most failures, it is possible that after a file is recalled, it
is again migrated because other large files have subsequently
been recalled.
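The idea of reading the last byte until the read succeeds can be sketched in Python as follows. This is not the implementation of hsmread, which is not shown here; the function names, the retry interval, and the attempt limit are all assumptions made for illustration.

```python
import os
import time


def read_last_byte(path):
    """Read the final byte of path (per the text above, this succeeds once the file is recalled)."""
    size = os.path.getsize(path)
    if size == 0:
        return b""
    with open(path, "rb") as f:
        f.seek(size - 1)
        return f.read(1)


def wait_for_recall(paths, retry_seconds=30, max_attempts=20):
    """Retry the last-byte read on each file; return the paths that never became readable."""
    failed = []
    for path in paths:
        for _ in range(max_attempts):
            try:
                read_last_byte(path)
                break  # read succeeded, move on to the next file
            except OSError:
                # Read failed (for example, recall still in progress); wait and retry
                time.sleep(retry_seconds)
        else:
            failed.append(path)
    return failed
```

A job script could call wait_for_recall on its input files and start the real program only if the returned list is empty.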
When disk space becomes low and a file is migrated to the disk cache or to tape, the file is replaced with a "stub file". A stub file appears to Unix commands such as ls to be the original file; it is actually a small file containing the first few blocks of the original file. If an attempt is made to access more than the amount of data in the stub, a recall is performed, and all of the original file contents are restored.
In an HSM-managed filesystem, modifications of files can be slow and awkward because they may require retrievals from tape. Furthermore, storage of many small files is not advised because a separate tape access may be needed for the retrieval of each file. An HSM filesystem is therefore best suited to storing static large files. In reality, occasional modifications or removals may be expected. However, the frequency of such changes and the percentage of total HSM data affected are expected to be low.