How Flux Works

February 23, 2016

Flux Components

Cores

A core, the allocatable unit in Flux, is a single processing element. Multiple cores make up a central processing unit (CPU). In Flux, most CPUs have twelve cores; some have eight or sixteen.

A node (or computer) comprises:

  • a number of CPUs,
  • memory (RAM),
  • a hard drive,
  • power supply, and
  • network connections.

[Figure: cores, CPUs, and nodes in Flux]

In Flux all of the nodes are nearly identical, so no distinction is made between nodes when scheduling jobs, making allocations, setting rates, or billing.
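
As a rough illustration of how these pieces fit together, the core/CPU/node relationship can be sketched as a simple Python data model. The counts below (two CPUs per node, 96GB of RAM) are placeholders for illustration, not a specification of Flux hardware.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CPU:
        cores: int = 12          # most Flux CPUs have twelve cores; some have eight or sixteen

    @dataclass
    class Node:
        # Illustrative composition only; real node configurations vary.
        cpus: List[CPU] = field(default_factory=lambda: [CPU(), CPU()])
        ram_gb: int = 96         # placeholder RAM size
        # a node also has a local hard drive, a power supply, and network connections

        @property
        def total_cores(self) -> int:
            return sum(cpu.cores for cpu in self.cpus)

    node = Node()
    print(node.total_cores)      # 24 allocatable cores on this illustrative node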

Flux User Account

A Flux user account consists of a Linux login ID and password (the same as your U-M uniqname and umich.edu Kerberos password, respectively), and a home directory for your files. A Flux user account allows you to:

  • log in,
  • transfer files,
  • compile software, and
  • create and submit job submission scripts.

A user account alone cannot run any jobs.

Flux Project

A Flux project is a collection of Flux user accounts that are associated with one or more Flux allocations. A PI can have more than one Flux project to correspond with different research projects or funding sources, and a Flux user account can belong to more than one Flux project. The project owner (typically the PI or his/her designee) controls the list of users in a project and can change the users.
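
The relationships described above are many-to-many: a PI can own several projects, a user can belong to several projects, and a project can hold one or more allocations. A minimal sketch of that structure (all project and user names here are hypothetical):

    # Hypothetical example of the user/project/allocation relationships; names are made up.
    flux_projects = {
        "example_project1": {
            "owner": "pi_uniqname",                    # the PI or a designee controls the user list
            "users": {"pi_uniqname", "student1", "student2"},
            "allocations": ["example_project1_flux"],
        },
        "example_project2": {
            "owner": "pi_uniqname",                    # the same PI can own a second project
            "users": {"pi_uniqname", "student2"},      # student2 belongs to both projects
            "allocations": ["example_project2_flux", "example_project2_flux2"],
        },
    }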

Flux Allocation

A Flux allocation:

  • describes the limits of how much of the Flux system members of a Flux project can use.
  • sets the maximum number of cores and amount of RAM a project can use at any time, and the number of months over which the project can access those cores and RAM.
  • determines your costs.

Your monthly bill is the number of cores in an allocation multiplied by the current rate. An active project requires at least one allocation, but can have as many as makes sense.
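
As a sketch of that arithmetic, with a placeholder rate (the actual per-core rate is set by ITS and changes over time):

    # Monthly bill = cores in the allocation x current per-core rate.
    def monthly_bill(cores, rate_per_core_per_month):
        return cores * rate_per_core_per_month

    EXAMPLE_RATE = 10.00                     # placeholder dollars per core per month, not the real Flux rate
    print(monthly_bill(50, EXAMPLE_RATE))    # 500.0 for a 50-core allocation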

A common configuration is for a Flux project to have a “base” allocation of, for example, 50 cores that lasts for 48 months, and, from time to time, supplementary allocations that add to the base as the project grows.

The total number of cores the members of this example project can use collectively at any one time is 50 and, assuming 4GB of RAM per core, the maximum amount of RAM the members of the project can use is 200GB. One scenario, shown in the first table below, is a set of jobs that fits within the memory limits of the allocation but whose last job would exceed the number of available cores, and thus would wait.

An allocation is sized by multiplying the number of cores by the duration (in seconds) of the allocation, and is expressed in core*seconds. However, the number of cores that can be used at once is a hard limit, so it is very rare to run out of core*seconds before the allocation's end date.
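
For example, the core*seconds in a hypothetical 50-core allocation that runs for 30 days works out to:

    cores = 50
    seconds = 30 * 24 * 60 * 60     # 30 days expressed in seconds
    print(cores * seconds)          # 129,600,000 core*seconds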

Flux Billing

The Flux allocations are billed to a U-M shortcode and a brief summary appears on your monthly Statement of Activity. ITS processes the allocations and generates monthly bills as long as you have an active Flux allocation. You can start an allocation at any time during the month and, because bills are generated monthly, you can end an allocation within 30 days of the last bill. Because Flux allocations are billed monthly, it is not possible to pre-pay for Flux allocations; you must have funding available during each month to pay for your Flux allocation.

Flux Job

A Flux job is a compute job that can use any portion of the allocations your project has available. The job is described by a short text file (the PBS file, or batch submission script) that is submitted to the job scheduler. The job scheduler takes into account the job’s requirements, the number of cores available in the Flux project’s allocations, and any other requirements.

Total resources | Resources requested by job | Job state in 50-core, 200GB allocation
9 cores; < 36GB | 9 cores; < 4GB per core | job starts
18 cores; < 72GB | 9 cores; < 4GB per core | job starts
27 cores; < 108GB | 9 cores; < 4GB per core | job starts
36 cores; < 144GB | 9 cores; < 4GB per core | job starts
45 cores; < 180GB | 9 cores; < 4GB per core | job starts
46 cores; < 184GB | 1 core; < 4GB per core | job starts
47 cores; < 188GB | 1 core; < 4GB per core | job starts
48 cores; < 192GB | 1 core; < 4GB per core | job starts
52 cores; < 208GB | 4 cores; < 4GB per core | job waits
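
The bookkeeping behind the table above can be sketched as a simple check of the running totals against the allocation's limits; this is an illustration only, not the actual scheduler logic.

    def job_can_start(cores_in_use, gb_in_use, req_cores, req_gb,
                      alloc_cores=50, alloc_gb=200):
        """True if the job fits within both the core and RAM limits of the allocation."""
        return (cores_in_use + req_cores <= alloc_cores
                and gb_in_use + req_gb <= alloc_gb)

    # Second-to-last row: 47 cores in use, a 1-core job brings the total to 48 and starts.
    print(job_can_start(47, 188, 1, 4))    # True  -> 48 cores, 192GB
    # Last row: 48 cores in use, a 4-core job would bring the total to 52 and waits.
    print(job_can_start(48, 192, 4, 16))   # False -> 52 cores exceeds the 50-core limit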

Another scenario, shown in the next table, is a set of jobs with larger memory requirements: even though cores are still available, the memory associated with the allocation is exhausted and jobs are queued.

Total resources | Resources requested by job | Job state in 50-core, 200GB allocation
10 cores; 80GB | 10 cores; 8GB RAM per core | job starts
20 cores; 160GB | 10 cores; 8GB RAM per core | job starts
25 cores; 200GB | 5 cores; 8GB RAM per core | job starts
26 cores; 201GB | 1 core; 1GB RAM per core | job waits
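
Here RAM, not cores, is the limiting resource; the last row of the table comes down to the same kind of check:

    # Last row of the table: 25 cores and 200GB are already in use.
    cores_in_use, gb_in_use = 25, 200
    req_cores, req_gb = 1, 1
    fits_cores = cores_in_use + req_cores <= 50     # 26 <= 50  -> True
    fits_ram = gb_in_use + req_gb <= 200            # 201 <= 200 -> False
    print(fits_cores and fits_ram)                  # False: the job waits on RAM, not on cores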

Depending on the requirements of the job and the state of the system, jobs may wait in a queue while other jobs complete and resources become available. The job is started once all the requirements are met. When the job ends, the Flux allocation(s) are debited the core*seconds actually used by the job.
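
As a sketch of that debit, a finished job's charge against the allocation is its core count multiplied by how long it actually ran, in seconds:

    def core_seconds_used(cores, walltime_seconds):
        return cores * walltime_seconds

    # Illustrative job: 9 cores running for 6 hours of wall-clock time.
    print(core_seconds_used(9, 6 * 60 * 60))    # 194,400 core*seconds debited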

Integrating the Flux Components

These components are used together to support computational research within the U-M business and technology environment. A Flux account is used to submit a Flux job to the cluster, where it is authorized by its associated Flux project to debit a Flux allocation and execute on Flux cores. The existing Flux allocations are aggregated and a Flux bill is applied each month to the university account (chartfield combination) specified at the creation of the allocation.

If you have questions, please send email to hpc-support@umich.edu.