Data and storage on Balena

Overview:

  • Teaching: 10 min
  • Exercises: 5 min

Questions

  • Where should I keep my files?
  • What are $HOME and $SCRATCH and how should I use them?
  • Where should I run my jobs?

Objectives

  • Know the different characteristics and use of the different filesystems
  • Understand that $HOME should be used for important data/code/scripts that you want backed up
  • Understand that $SCRATCH should be used for:
    • temporary storage
    • storing large datasets for calculations
    • running calculations
  • Know that you can access your 'H-drive' using $BUCSHOME

Where should I keep my data?

Balena has two file systems: /home and /beegfs/scratch with very different characteristics and uses:

Filesystem                    /home               /beegfs/scratch
User directories              /home/X/username    /beegfs/scratch/user/X/username
Environment variable          $HOME               $SCRATCH
Total capacity                ~50TB               ~660TB
User quota                    5 GB                Unlimited *
Performance (login node)      <500MB/sec          1.2GB/sec (large files)
Performance (compute node)    <100MB/sec          1.2GB/sec
                                                  (aggregate bandwidth for all users in excess of 10GB/sec)
Data policy                   Backed up           Not backed up

* Although the user quota on the scratch storage is unlimited, you should:

  1. ensure that you have the capacity elsewhere to back up any data you rely on for your research
  2. back up essential data at the earliest opportunity
  3. remove data as soon as possible once the jobs using it have completed
  4. remember that the scratch storage is not backed up
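
As an illustration of points 2 and 3, the sketch below packs a finished job directory into a compressed archive in a backed-up location and only removes the scratch copy once the archive can be read back. The function name archive_job and its two arguments are hypothetical, not part of Balena; the destination must have enough free space (remember the 5 GB quota on $HOME).

```shell
# archive_job is a hypothetical helper, not a Balena command.
# Usage: archive_job JOB_DIR BACKUP_DIR
archive_job() {
    job_dir="$1"                       # finished job directory on scratch
    backup_dir="$2"                    # backed-up destination with enough space
    name="$(basename "$job_dir")"
    archive="$backup_dir/$name.tar.gz"

    # Pack the job directory into a compressed archive at the destination.
    tar czf "$archive" -C "$(dirname "$job_dir")" "$name" || return 1

    # Only delete the scratch copy once the archive can be listed successfully.
    tar tzf "$archive" > /dev/null && rm -r "$job_dir"
}
```

It could be called as archive_job "$SCRATCH/my_job" "$BUCSHOME/archives", where both paths are assumptions about your own layout.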

Scratch storage

  1. The scratch storage is not backed up
  2. The scratch storage is not backed up
  3. The scratch storage is not backed up

In the following cells where we are demonstrating bash commands you may see a first line:

%%bash2

Don't forget that this line should be ignored. Its purpose is to help us generate the tutorial material; you do not need to type or execute it, and if you do you will get a "command not found" error or something to that effect.

Let's explore this now. Once you have successfully logged on try the following commands:

Command:

pwd

Output:

/home/q/rjg20

Exercise: Try these commands

echo $HOME
echo $SCRATCH
cd $SCRATCH
pwd

Go home

How can we use the environment variables we've used above to return to our home directory?

Solution
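
One possible answer, assuming $HOME is set by your login shell as in the exercise above, is to pass the variable to cd and confirm the result with pwd:

```shell
# Change to the directory stored in $HOME, then print where we are.
cd "$HOME"
pwd
```

Running cd with no arguments, or cd ~, should have the same effect in bash.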

Where should I keep my files and data?

The $HOME filesystem could be used for, e.g.:

  1. Template jobscripts
  2. Code/scripts for your calculations

The $SCRATCH filesystem should be used for, e.g.:

  1. Large datasets for calculations
  2. Working directory for your jobs

Your $HOME directory also contains a link named scratch that leads to your scratch area. We will explore this further when we start running jobs in the following sections.

Moving data to and from Balena

If you use the command line on Linux you can use scp to move files between your local machine and Balena. The command is used in a similar way to cp, for instance to copy a file from your current directory to your scratch area:

scp local_file balena.bath.ac.uk:~/scratch

Note that this assumes that your username on your local machine is the same as your University username. If it is not, you need to include your University username in the command. For instance, to copy a file from Balena to the current directory on your local machine:

scp username@balena.bath.ac.uk:~/my_data .

Transferring data

NB. Due to the recent security incident you will not currently be able to copy data from Balena to another machine off campus, as in the second example.

You could also consider using tools such as rsync for synchronising data back to your local machine. Used properly, this can greatly help with managing your data and avoid creating multiple backups of your work.

If you prefer to use graphical interfaces, tools such as WinSCP or FileZilla offer similar functionality.

Quota and disk usage

We saw earlier that you have a 5 GB quota for your home area. Check your home area quota using the following commands:

$ quota
$ quota -s

You can also use the df command we saw earlier to check partition size and usage information:

$ df -h
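
quota and df report totals, but not which directories are using the space. A minimal sketch using du is given below; usage_report is a hypothetical helper name, and the --max-depth and sort -h options assume the GNU versions of du and sort.

```shell
# usage_report is a hypothetical helper, not a Balena command.
# Usage: usage_report [DIRECTORY]   (defaults to $HOME)
usage_report() {
    target="${1:-$HOME}"

    # Total size of the directory in human-readable units.
    du -sh "$target"

    # Size of each immediate subdirectory, sorted so the largest comes last.
    du -h --max-depth=1 "$target" | sort -h
}
```

Calling usage_report "$SCRATCH" would give the same breakdown for your scratch area.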

$BUCSHOME

Besides $HOME and $SCRATCH another helper environment variable is $BUCSHOME. Run the following command and inspect the output to see if you recognise where $BUCSHOME points:

$ ls $BUCSHOME

Hopefully you can identify this as your H-drive or, if you use it, your home space on linux.bath.ac.uk. This can be useful for moving data between your desktop/laptop and Balena. Depending on how it was set up, we may also be able to provide access to your X-drive, but you will need to contact hpc-support@bath.ac.uk to see if this is possible.

Note, however, that $BUCSHOME is not available to jobs running on Balena, so you need to move any data before the jobs start or after they have finished.
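
One way to handle this is to stage input data onto scratch from a login node before submitting a job. The sketch below assumes a hypothetical helper called stage_inputs; the directory names in the example call are also assumptions.

```shell
# stage_inputs is a hypothetical helper, not a Balena command.
# Usage: stage_inputs SOURCE_DIR DEST_DIR
stage_inputs() {
    src="$1"     # e.g. "$BUCSHOME/project_inputs" (hypothetical path)
    dest="$2"    # e.g. "$SCRATCH/my_project" (hypothetical path)

    # Create the destination if needed, then copy the source directory into it.
    mkdir -p "$dest" && cp -r "$src" "$dest/"
}

# Example call, run on a login node before submitting the job:
#   stage_inputs "$BUCSHOME/project_inputs" "$SCRATCH/my_project"
```

Results can be copied back in the same way once the job has finished, with the source and destination swapped.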

Copying the course material

Copy across the files required for the exercises to your BeeGFS scratch filesystem:

cd $SCRATCH
cp /beegfs/scratch/group/training/balena-intro.tar.gz ./

Note: The “ ./ ” or “dot-slash” at the end of the line indicates the current directory.

Now unzip the files ready for the lesson:

tar xvfz balena-intro.tar.gz
cd balena-intro
ls -l

Make sure that this exercise is completed successfully before continuing as we will need the material later in the lesson.

Key Points:

  • $HOME should be used for important data/code/scripts that you want backed up
  • $SCRATCH should be used for running calculations
  • A range of command line and graphical tools are available to help you move data to and from Balena