Once you have access to SCARF, it is worth taking a moment to familiarise yourself with the general filesystem layout and the procedures used on the cluster. These can directly affect your work, including how fast your jobs run, how much storage is available to you, and whether your data can be recovered after a hardware failure.
Your /home directory
As in any other UNIX-based operating system, your login session starts in your home directory under /home. This directory is exported across the whole cluster and is visible from every node.
Each user has a storage quota of 80 GB in their home directory, enforced by the operating system. Exceeding the limit slightly for a short time causes no problems, provided you then delete enough data to bring your usage back under quota. If you stay over the limit, however, you will be unable to create any new files or run any jobs, and the only remedy is to reduce the amount of space you are using.
You can check your current usage and quota using e.g. the command
The home directories are backed up on a nightly basis.
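As a rough cross-check against the 80 GB home-directory quota, the sketch below sums the apparent sizes of the files under a directory. It is not the cluster's quota tool: it ignores block allocation and anything else the quota system may count, and reading "80 GB" as 80 × 10^9 bytes is an assumption. The function names are hypothetical.

```python
import os

# Assumption: "80GB" is read as 80 * 10**9 bytes; the real quota system may
# count in GiB (2**30 bytes) or in allocated blocks rather than file sizes.
HOME_QUOTA_BYTES = 80 * 10**9


def directory_usage_bytes(root: str) -> int:
    """Sum the apparent sizes of files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                continue
    return total


def quota_report(root: str, quota_bytes: int = HOME_QUOTA_BYTES) -> str:
    """Return a one-line summary comparing usage under root with quota_bytes."""
    used = directory_usage_bytes(root)
    percent = 100.0 * used / quota_bytes
    status = "OVER QUOTA" if used > quota_bytes else "ok"
    return (f"{used / 10**9:.2f} GB of {quota_bytes / 10**9:.0f} GB used "
            f"({percent:.1f}%): {status}")
```

For example, `print(quota_report(os.path.expanduser("~")))` summarises your own home directory. Walking a large tree this way can be slow, so treat it as an occasional check rather than something to run inside every job.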
The /work directories
The /work directories give users access to very large amounts of storage, for example for large datasets or temporary files that would not fit in their home directories. You are not given a /work directory automatically when you register; however, you are free to create one under the directory corresponding to the group you have been placed in. To find your group, use the command
and your group will be the first entry listed after the colon.
The /work directory is exported over the whole cluster, and is visible from any of the nodes.
It is critically important to note that data in the /work directories is NEVER backed up, so vital results or data should not be stored there. The data is, however, held on our Panasas file system, which can reconstruct files if one of the underlying storage units fails, so there is some measure of resilience.
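Creating your own /work directory might look like the sketch below. It assumes the group meant above is your primary Unix group and that the layout is /work/<group>/<username>; both the layout and the function name are hypothetical, so check the actual group directory name before relying on it.

```python
import grp
import os
import pwd
from typing import Optional


def create_work_dir(work_root: str = "/work",
                    group: Optional[str] = None,
                    user: Optional[str] = None) -> str:
    """Create <work_root>/<group>/<user> if it does not exist and return its path.

    Hypothetical layout: the group defaults to the caller's primary Unix group
    and the user to the caller's login name.
    """
    if group is None:
        group = grp.getgrgid(os.getgid()).gr_name
    if user is None:
        user = pwd.getpwuid(os.getuid()).pw_name
    path = os.path.join(work_root, group, user)
    # mode applies to the new leaf directory and is further masked by the
    # process umask; 0o750 keeps it writable only by you, readable by your group.
    os.makedirs(path, mode=0o750, exist_ok=True)
    return path
```

Because `exist_ok=True` is set, calling it again simply returns the existing path. Remember that whatever you put here is not backed up.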
The /work/scratch and /tmp directories
The /work/scratch directory is temporary file space shared across the whole cluster, so that parallel jobs can access the same files while they run. It uses the Panasas high-speed parallel file system. Please create a subdirectory named after your username, e.g. /work/scratch/scarf011, and run your jobs there.
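One way to organise that subdirectory is a further level per job, with each parallel process writing its own file so that processes on different nodes do not collide. The sketch below assumes a /work/scratch/<username>/<job_id> layout; the function names and file naming scheme are hypothetical, and the job ID is whatever identifier your batch system gives the job.

```python
import getpass
import os
from typing import Optional


def scratch_job_dir(job_id: str,
                    scratch_root: str = "/work/scratch",
                    user: Optional[str] = None) -> str:
    """Create <scratch_root>/<user>/<job_id> if needed and return its path."""
    if user is None:
        user = getpass.getuser()
    path = os.path.join(scratch_root, user, job_id)
    os.makedirs(path, exist_ok=True)
    return path


def rank_output_path(job_dir: str, rank: int) -> str:
    """Per-process output file, so parallel ranks on different nodes
    write to separate files in the shared directory."""
    return os.path.join(job_dir, f"rank{rank:04d}.out")
```

A job would typically call `scratch_job_dir` once, change into the returned directory, and copy anything worth keeping to /home before finishing.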
In contrast, the /tmp directories are local, one per node. Use them for temporary job data that only needs to be read by processes on the same node. Although /tmp is not shared across the cluster, it offers somewhat better performance than /work/scratch. Please make sure that your jobs delete their files in /tmp when they complete.
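A simple way to guarantee that node-local files are removed is to work inside a temporary directory that is deleted automatically. The sketch below uses Python's `tempfile.TemporaryDirectory`; the file names and the idea of copying a single result file out are illustrative assumptions, and `results_dir` would normally point somewhere persistent such as your home directory.

```python
import os
import shutil
import tempfile
from typing import Optional


def run_with_local_tmp(results_dir: str, tmp_root: Optional[str] = None) -> str:
    """Work in a private directory under tmp_root (default: the system temp dir,
    usually /tmp), copy the final result to results_dir, then clean up."""
    os.makedirs(results_dir, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="job-", dir=tmp_root) as workdir:
        intermediate = os.path.join(workdir, "intermediate.dat")
        with open(intermediate, "w") as f:
            f.write("intermediate data\n")  # stand-in for the real computation
        final = os.path.join(results_dir, "result.dat")
        shutil.copy(intermediate, final)
    # Leaving the with-block deletes workdir and everything in it, including
    # when an exception is raised; it cannot help if the process is killed.
    return final
```

If a job is killed outright (for example on reaching its time limit), the clean-up does not run, so it is still worth checking /tmp on the nodes you used.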
Data in these directories is temporary and may be arbitrarily removed at any point once your job has finished running. Do not use them to store important output.
Full backups of the SCARF filestore are taken weekly, with daily incremental dumps, on a two-weekly cycle. Their primary purpose is to allow the filesystem to be reconstructed if the disks fail completely, but requests to recover user files deleted in error will be handled as soon as possible.
Please note that the only user data that is backed up is the contents of your /home directory.
|File system|Availability|Backups|Available size|Cleanup policy|
|---|---|---|---|---|
|/home|Cluster wide|Weekly backup to tape; daily Panasas snapshot, kept for 7 days|80 GB quota per user|No data will be deleted without consultation|
|/work|Cluster wide|No backups|25 TB, though some groups have their own dedicated areas|No data will be deleted without consultation|
|/work/scratch|Cluster wide|No backups|30 TB shared across all users|Data may be deleted without consultation if no jobs for the owner are currently running|
|/tmp|Local to host|No backups|At least 100 GB (dependent on hardware generation)|Data may be deleted without consultation if no jobs for the owner are currently running on the affected host|