Balena Architecture

Overview:

  • Teaching: 10 min
  • Exercises: 5 min

Questions

  • Is there more to Balena than just a collection of PCs?
  • What are all those cables for?
  • Why are there two filesystems and why do they have different uses?

Objectives

  • Know that HPC systems consist of a collection of nodes, each similar to a normal desktop or laptop
  • Know that a high speed network allows these to work on the same problem and access fast storage
  • Understand that different nodes have different purposes to help enable your research
  • Know that jobs are managed by a scheduler; they do not run immediately

The following architecture diagram illustrates the principal components of Balena. Most HPC services have a broadly similar structure:

balena-arch.png

The key thing to understand is that an HPC system is a collection of resources similar to those in your desktop or laptop machine, typically called nodes. There are different sets of nodes, each serving a different purpose. When you access Balena you don't go straight to the compute nodes; you land on a login node. From there you submit jobs to a scheduler, which assigns them to the compute nodes. The scheduler runs on master nodes, which are also used to manage the system, perform maintenance and fix any issues.
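To make the submit-then-run workflow concrete, here is a minimal sketch of a batch job script, assuming a SLURM scheduler (the job name and resource values are illustrative, not Balena-specific). The `#SBATCH` lines are directives read by the scheduler; the commands below them run later, on whichever compute node the scheduler assigns.

```shell
#!/bin/bash
# Minimal sketch of a SLURM batch script (values are illustrative).
#SBATCH --job-name=demo      # name shown in the queue
#SBATCH --nodes=1            # number of compute nodes to request
#SBATCH --time=00:05:00      # wall-clock limit (HH:MM:SS)

# These commands run on the assigned compute node, not the login node.
echo "Running on $(hostname)"
```

You would submit this from the login node with `sbatch demo.sh` and check its place in the queue with `squeue -u $USER`; the job starts only when the scheduler finds free resources.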

The compute nodes are connected by a high-speed network, which allows several or many nodes to work together on the same problem. Most of the nodes are similar, but to target specific workloads there are also some with extra memory and some with GPU accelerators. Because there can often be a queue of user jobs waiting to run, some nodes are reserved for quick turnaround and interactive testing, allowing researchers who are developing software to work more efficiently. Because all the nodes need to access the same storage, this is kept separate and, as we have already seen, on Balena it has two components, $HOME and $STORAGE.
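The different node types are typically exposed as separate partitions (queues) that you name when submitting a job. As a sketch, assuming a SLURM scheduler, the directives below show how a job might target a GPU node or an interactive development node; all the partition names here are hypothetical, not Balena's actual ones.

```shell
# Sketch: targeting specific node types (partition names are hypothetical).
#SBATCH --partition=gpu     # hypothetical partition of GPU-equipped nodes
#SBATCH --gres=gpu:1        # request one GPU accelerator on that node
#
# For a quick interactive test on a hypothetical 'devel' partition,
# you might instead request a shell directly on a compute node:
#   srun --partition=devel --pty bash
```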

Some HPC problems require access to large datasets, and reading and writing that data can be a significant bottleneck. For this reason $STORAGE is also connected to the high-speed network. Backing up this storage would be prohibitively expensive and would reduce its capacity, so data there is considered at risk. This is why we also have $HOME, where users can keep the programs and scripts used to set up and run calculations; it has a smaller capacity but is backed up. $HOME is connected to a standard network, which also allows the system to be managed without interfering with jobs that are running.
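A common pattern that follows from this split is: keep small, important files (scripts, source code) in the backed-up $HOME, and write large working data to the fast but at-risk $STORAGE. The sketch below assumes $STORAGE is set by the system, with a temporary fallback so the demo runs anywhere; the directory layout is illustrative.

```shell
#!/bin/bash
# Sketch: scripts live in backed-up $HOME, large output goes to $STORAGE.
# $STORAGE is assumed to be set on the system; fall back to /tmp for demo.
STORAGE="${STORAGE:-/tmp/storage-demo}"

workdir="$STORAGE/results/myjob"   # hypothetical per-job work directory
mkdir -p "$workdir"

# A real job would write its large output files here, on fast storage...
echo "large simulation output" > "$workdir/output.dat"

# ...and copy only small, important summaries back to backed-up $HOME.
echo "Results in $workdir"
```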

Key Points:

  • HPC consists of a collection of resources similar to your desktop or laptop
  • These are connected to each other and fast storage by a high speed network
  • Different nodes serve different functions to help manage the system and allow researchers with different workloads and needs to work efficiently.