Flux FAQs


If these Frequently Asked Questions don’t address your issue, please email hpc-support@umich.edu. Please include your job ID number for specific job troubleshooting.

How can I start using Flux?

In order to use Flux, you will first need a User Account. You can apply for a User Account by filling out this form.

A Flux User Account is different from a Flux Account. A Flux User Account is used by a single user to log onto the Flux nodes, whereas a Flux Account is a collection of Flux User Accounts that are associated with one or more Flux allocations.
Flux uses two-factor authentication for security purposes, so you will also need to get an MToken to be able to log in to one of our login nodes. Once you have a User Account and an MToken, you will be able to copy data to your Flux home directory and run small programs on the Flux login nodes.
You will need access to a Flux Account with an allocation in order to run jobs. A Flux Account must be paid for, so this is typically provided by a faculty member or your school or college.

How do I log on to Flux?

To connect to Flux you should use secure shell (ssh) from a terminal on your computer to one of the Flux login nodes. If you are using a Mac- or Linux-based operating system, you can use the default terminal for this. If you are using Windows, you will need to download a program. One popular program is PuTTY.

If you are not logged on to a computer using your uniqname, you should specify your username when connecting.
For example:

$ ssh uniqname@flux-login.engin.umich.edu

You will be prompted to input your MToken passcode. (Visit the MToken site for more information on obtaining an MToken.) After entering your MToken passcode successfully, you will be prompted to input your password. This is the Kerberos password that you use to log in to University services.

[Screenshot: connecting to a login node from a Mac.]

[Screenshot: connecting to a login node with PuTTY on Windows.]

What is the difference between a compute node and a login node?

A login node is a computer that you can connect to directly through ssh. The login nodes can be used to copy files to your home directory and to queue jobs to run on the compute nodes. The compute nodes are where the actual jobs are run. Compute nodes are automatically assigned to a job when a PBS script is submitted to the job scheduler.
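For example, once your job script is ready, you submit it from a login node (the file name here is hypothetical) and the scheduler assigns compute nodes for you:

$ qsub myjob.pbs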

Which Flux Accounts can I use?

If you are a member of the College of Engineering or the College of Literature, Science, and the Arts, you have access to a Flux Account with a college-funded allocation.
For Engineering, the Flux Account name is engin_flux. For LSA, it is lsa_flux.
To view which Flux Accounts you have access to, use the following command on one of the flux-login nodes:

$ mdiag -u <username>

For example:

 

[Screenshot: output of mdiag -u for an example user.]

The ALIST field lists all of the Flux Accounts you are authorized to use. Note that some of these accounts might not have an active allocation. Our example user has access to run jobs on engin_flux and FluxTraining_flux. Note that default_flux cannot be used to run jobs; it is simply a placeholder.
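The relevant line of the output looks something like this (illustrative only; the exact format depends on the Moab version in use):

ALIST=engin_flux,FluxTraining_flux,default_flux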

How do I access an existing Flux Account?

To be given access to a Flux Account that your group is already using, please have an administrator of that Flux Account send an email to flux-support@umich.edu requesting that you be added. The administrator will usually be the person who pays for the account, or a delegated manager.

How many jobs can I have queued or running at once?

There can be up to 5,000 jobs per user in each queue (like flux or fluxm). There is no built-in limit to the number of jobs a user can have running at one time, but specific Flux Accounts can have per-user job limits. For example, engin_flux allows a maximum of 10 jobs per user at one time.

What is PBS?

Portable Batch System (PBS) is the system that the cluster uses to schedule jobs. Users write PBS scripts to tell the scheduler important information, such as the number of processors and the amount of memory a job requires. Visit this page for more information about PBS.

What do I change in my PBS script when running on a different Flux Account?

When using standard Flux Accounts, those that end in _flux, you must set the qos parameter and queue parameter to flux in your PBS script.
For example:

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

When using Large Memory Flux Accounts, those that end in _fluxm, you must set the qos to flux and queue to fluxm in your PBS script.

#PBS -A example_fluxm
#PBS -l qos=flux
#PBS -q fluxm

When using GPU Flux Accounts, those that end in _fluxg, you must set the qos to flux and queue to fluxg in your PBS script.
For example:

#PBS -A example_fluxg
#PBS -l qos=flux
#PBS -q fluxg

When using Flux on Demand (FoD) Accounts, those that end in _fluxod, you must set the qos to flux and queue to fluxod in your PBS script.

#PBS -A example_fluxod
#PBS -l qos=flux
#PBS -q fluxod

When using Flux Operating Environment (FoE) Accounts, those that end in _fluxoe, you must set the qos to flux and queue to fluxoe in your PBS script.

#PBS -A example_fluxoe
#PBS -l qos=flux
#PBS -q fluxoe

What security measures are in place for Flux?

The Flux high-performance computing system at the University of Michigan has been built to provide a flexible and secure HPC environment. Flux is a scalable and reliable platform that enables researchers to match their computing capability and costs to their needs while maintaining the security of their research.

Built-in Security Features

Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:

  • Secure access – All access to Flux is via SSH or Globus. SSH has a long history of strong security. Globus provides basic security and supports additional security if you need it.
  • Built-in firewalls – All of the Flux computers have firewalls that restrict access to only what is needed.
  • Unique users – Flux adheres to the University guideline of one person per login ID and one login ID per person.
  • Multi-factor authentication (MFA) – For all interactive sessions, Flux requires both a UM Kerberos password and an MToken. File transfer sessions require a Kerberos password.
  • Private Subnets – Other than the login and file transfer computers that are part of Flux, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
  • Flexible data storage – Researchers can control the security of their own data storage by securing their storage as they require and having it mounted via NFSv3 or NFSv4 on Flux. Another option is to make use of Flux’s local scratch storage, which is considered secure for many types of data. Note: Flux is not considered secure for data covered by HIPAA.

Flux/Globus & Sensitive Data

To find out what types of data may be processed in Flux or Globus, visit the U-M Sensitive Data Guide to IT Resources.

Additional Security Information

If you require more detailed information on Flux’s security or architecture to support your data management plan or technology control plan, please contact the Flux team at hpc-support@umich.edu.

We know that it’s important for you to understand the protection measures that are used to guard the Flux infrastructure. But since you can’t physically touch the servers or walk through the data centers, how can you be sure that the right security controls are in place?

The answer lies in the third-party certifications and evaluations that Flux has undergone. IIA has evaluated the system, network, and storage practices of Flux and Globus. The evaluation for Flux is published at http://safecomputing.umich.edu/dataguide/?q=node/151 and the evaluation for Globus is published at http://safecomputing.umich.edu/dataguide/?q=node/155.

Shared Security and Compliance Responsibility

Because you’re managing your data in the Flux high-performance computing environment, the security responsibilities will be shared.

Flux operators have secured the underlying infrastructure, and you are obligated to secure anything you put on the infrastructure yourself, as well as to meet any other compliance requirements. These requirements may be derived from your grant or funding agency, from data owners or stewards other than yourself, or from state or federal laws and regulations.

The Flux support staff is available to help manage user lists for data access. Information on how to manage file system permissions is publicly available; please see: http://en.wikipedia.org/wiki/File_system_permissions.

Contacting Flux Support

The Flux Support Team encourages communication, including about security-related questions. Please email us at hpc-support@umich.edu.

We have created a PGP key for especially sensitive communications you may need to send.

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1

mQENBFNEDlUBCACvXwy9tYzuD3BqSXrxcAEcIsmmH52066R//RMaoUbS7AcoaF12
k+Quy/V0mEQGv5C4w2IC8Ls2G0RHMJ2PYjndlEOVVQ/lA8HpaGhrSxhY1bZzmbkr
g0vGzOPN87dJPjgipSCcyupKG6Jnnm4u0woAXufBwjN2wAP2E7sqSZ2vCRyMs4vT
TGiw3Ryr2SFF98IJCzFCQAwEwSXZ2ESe9fH5+WUxJ6OM5rFk7JBkH0zSV/RE4RLW
o2E54gkF6gn+QnLOfp2Y2W0CmhagDWYqf5XHAr0SZlksgDoC14AN6rq/oop1M+/T
C/fgpAKXk1V/p1SlX7xL230re8/zzukA5ETzABEBAAG0UEhQQyBTdXBwb3J0IChV
bml2ZXJzaXR5IG9mIE1pY2hpZ2FuIEhQQyBTdXBwb3J0IEdQRyBrZXkpIDxocGMt
c3VwcG9ydEB1bWljaC5lZHU+iQE+BBMBAgAoBQJTRA5VAhsDBQkJZgGABgsJCAcD
AgYVCAIJCgsEFgIDAQIeAQIXgAAKCRDHwuoUZnHdimrSB/4m6P7aQGnsbYVFspJ8
zquGRZd3fDU/IaCvLyjsUN4Qw1KFUmqQjvvfTxix7KjlNMcGy1boUCWKNNk1sFtb
E9Jr2p6Z/M7pm4XWhZIs1UIfHr3XgLdfbeYgXpt4Md2G6ttaXv44D10xL2LYCHE8
DnSVv+2SIG9PhaV+h+aBUo4yKwTwVBZsguU1Z1fsbiu6z6iDrzU2dlQp0NLmw73G
v5HUdYdu/YJdh5frp/2XorLXynrEyCk1SxViXrHY6dc9Y3bUjwl0MOJypLuRhQmj
kVwHIsNsRg1YJ6iyJzom33C7YdRktBiPpstkYDHJf/PVRAw1G4dkyjfUfG2pIoQd
WjOxuQENBFNEDlUBCADNwZ5edW/e08zYFWSGVsdpY4HM2CdsVqkuQru2puHhJqg4
eWS9RAdJ6fWp3HJCDsDkuQr19B3G5gEWyWOMgPJ9yW2tFVCrVsb9UekXAWh6C6hL
Tj+pgVVpNDTYrErYa2nlll0oSyplluVBRlzDfuf4YkHDy2TFd7Kam2C2NuQzLQX3
THhHkgMV+4SQZ+HrHRSoYPAcPb4+83dyQUo9lEMGcRA2WqappKImGhpccQ6x3Adj
/HFaDrFT7itEtC8/fx4UyaIeMszNDjD1WIGBJocOdO7ClIEGyCshwKn5z1cCUt72
XDjun0f1Czl6FOzkG+CHg5mf1cwgNUNx7TlVBFdTABEBAAGJASUEGAECAA8FAlNE
DlUCGwwFCQlmAYAACgkQx8LqFGZx3YrcqggAlKZhtrMDTHNki1ZTF7c7RLjfN17H
Fb342sED1Y3y3Dm0RVSQ2SuUWbezuDwov6CllgQR8SjBZ+D9G6Bt05WZgaILD7H0
LR9+KtBNYjxoVIdNHcGBf4JSL19nAI4AMWcOOjfasGrn9C60SwiiZYzBtwZa9VCi
+OhZRbmcBejBfIAWC9dGtIcPHBVcObT1WVqAWKlBOGmEsj/fcpHKkDpbdS7ksLip
YLoce2rmyjXhFH4GXZ86cQD1nvOoPmzocIOK5wpIm6YxXtYLP07T30022fOV7YxT
mbiKKL2LmxN1Nb/+mf+wIZ5w2ZdDln1bbdIKRHoyS2HyhYuLd1t/vAOFwg==
=yAEg
-----END PGP PUBLIC KEY BLOCK-----

Why can't I connect to Flux from off campus?

Flux does not allow network connections from non-UM networks. To connect to Flux from off-campus, you can use the UM VPN client. Information about the UM VPN client, including terms of use, can be found at http://www.itcs.umich.edu/vpn/.
You can also use an intermediary machine, for example, login.itd.umich.edu. You first connect to it, then from it you connect to Flux. For example, you might use PuTTY to connect to login.itd.umich.edu, then from that login machine, use

% ssh flux-login.engin.umich.edu

where the % is the prompt on the login.itd machine.

How can I find out my job's status?

To determine your job’s status use either of the following commands on a login node:

$ qstat -u uniqname
$ qstat <jobid>

The status of the job is found in the column labeled with the letter “S”:

  • R – the job is currently running
  • Q – the job is waiting in the queue
  • C – the job has already completed
  • E – the job is in the process of exiting
  • H – the job is on hold (generally set by the user or by an unfulfilled job dependency)

Why isn’t my job running? I queued my job, but it isn’t running and has a status of queued.

Jobs may sit in queued status for a variety of reasons. The scheduler makes batch passes over queued jobs, determines whether there are sufficient free resources for each, and then runs those that it can. Even if there are sufficient resources at the time a job is submitted, it may still sit in the queue for up to 15 minutes before the scheduler makes a batch pass and starts the job.
If there are not sufficient resources, the job will sit in the queue until resources open up. The most common limiting resources are processors and memory. For example, if you try to run a two-processor job on a Flux Account whose allocation has a total of 10 processors, nine of which are in use, the job will have to wait in the queue until another processor becomes available. Once a processor is freed, the scheduler will assign the two free processors to the job and the job will run.

Why does my job have a status of Batchhold? I queued it but it isn't running.

Jobs are assigned a Batchhold by the scheduler when they have bad PBS credentials and will not run. Jobs are often given this status when the Flux Account name, qos, or queue are misspelled. Jobs can also be given this status if you try to run on a Flux Account to which you do not have access. If you cannot determine why your job is on Batchhold, please contact us at flux-support@umich.edu with your job number.

How many processors or how much memory does my Flux Account have?

You can check the resources available to a Flux Account with the command:

$ mdiag -a  <accountname_flux>

For example:

[Screenshot: output of mdiag -a for an example Flux Account.]

MAXPROC indicates the total number of processors available to the Flux Account.

MAXMEM indicates the total amount of memory available to the Flux Account, in megabytes.

If a Flux Account has a limit on the maximum number of processors a single user can use at once, it will be indicated with MAXPROC[USER].

For example:

[Screenshot: output of mdiag -a showing a per-user processor limit.]
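Illustratively (the exact layout depends on the Moab version), these limits appear as attribute=value pairs in the mdiag -a output, e.g.:

MAXPROC=50  MAXMEM=204800M  MAXPROC[USER]=10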

What jobs are currently running on the Flux Account I use?

You can check the jobs that are queued, blocked and running on a Flux Account with the command:

$ showq -w acct=accountname_flux

What does this e-mail mean: moab job resource violation: ``job ####### exceeded MEM usage soft limit``?

This message is sent when you use more memory than you asked for (default is 768 MB per core).

You can request additional memory by adding “#PBS -l pmem=###MB” to your PBS file, which requests ###MB of memory for each process you asked for (e.g., if you asked for 2 nodes with ppn=2 and pmem=3000MB, you will have asked for 12000MB of memory in total). This replaces the default rather than adding to it.
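As a concrete sketch of the example above, the request could look like this in your PBS preamble:

#PBS -l nodes=2:ppn=2,pmem=3000mb

This asks for 2 nodes with 2 processes each at 3000MB per process, for 12000MB in total.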

How can I specify processor and computer layout with PBS?

Sometimes a program will want to have the processors it uses arranged across the computers in a particular way. There are several ways to tell PBS how many processors there are and on what machines they can go. We’ll look at three cases here, starting with the least detailed and proceeding to the most detailed:

  1. The processors can be anywhere;
  2. There must be a minimum of N processors on each computer;
  3. There must be exactly N processors on each computer.

1. If it does not matter how the processors are divided among the computers, you should use the procs property. Using the procs property usually allows an eligible job to run with the shortest time queued. Flux is a heterogeneous cluster whose computers have varying numbers of processors; because of this, asking for a specific computer/processor combination may delay the start of your jobs.

If you want N processors spread across the first available processors, regardless of which physical computer they are on, you should use the command:

#PBS -l procs=N

2. If you would like a minimum of M processors on each computer, you should use the nodes property in conjunction with the ppn property.

For example:

#PBS -l nodes=N:ppn=M

Here, ppn assigns a group of M processors to the same physical computer, and nodes sets the number of such groups to N. Using nodes in conjunction with ppn does not guarantee that the N groups of processors will all be placed on separate physical computers. Because of this, you will get computers with at least M processors each, but you may end up with some multiple of M on a single computer.

For example:

#PBS -l nodes=3:ppn=4

Here, the three groups of four processors could all end up on one computer; they could end up on three separate computers; or eight processors could end up on one computer while four end up on another.

3. If you want exactly N processors on each computer, for M processors in total, you should use the tpn (tasks per node) property in conjunction with procs.

For example:

#PBS -l procs=M,tpn=N

When used together with tpn, the procs property specifies the total number of processors to be used across all computers.

Assigning “procs=M” says that you want M processors in total, and “tpn=N” says that you want exactly N of them on each physical computer. This would give you N processors running on each of M/N separate physical computers.
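For instance, this hypothetical request asks for 12 processors total with exactly 4 on each computer, giving 4 processors on each of 3 separate computers:

#PBS -l procs=12,tpn=4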

How do I get my Network File System (NFS) shares, including Value Storage, mounted on Flux?

To get an NFS mounted on Flux you will need to contact the administrator of the NFS share and ask them to export it to the following IP ranges:

  • 141.212.30.0/23
  • 10.164.0.0/21
  • 10.224.0.0/21

For Value Storage shares purchased through ITS, the email address is: vstore-admins@umich.edu

Once the NFS share has been exported, please contact us at flux-support@umich.edu requesting that we mount the NFS share. Please be sure to include the name of the NFS share both in your email to the NFS share administrators and in your email to flux-support@umich.edu.
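For reference, a hypothetical /etc/exports entry on a Linux NFS server covering these ranges might look like the following (the path and mount options are purely illustrative; the share’s administrator will choose appropriate settings):

/export/labshare  141.212.30.0/23(rw)  10.164.0.0/21(rw)  10.224.0.0/21(rw)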

May I process sensitive data using Flux?

Yes, but only if you use a secure storage solution like Mainstream Storage and Flux’s scratch storage. Flux’s home directories are provided by Value Storage, which is not an appropriate location to store sensitive institutional data.

One possible workflow is to use sftp or Globus to move data between a secure solution and Flux’s scratch storage, which is secure, bypassing your home directory or any of your own Value Storage directories.
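For example, using scp to copy a file from a secure system directly into scratch (the scratch path shown is hypothetical; the actual path depends on your allocation):

$ scp data.csv uniqname@flux-xfer.arc-ts.umich.edu:/scratch/example_flux/uniqname/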

Keep in mind that compliance is a shared responsibility. You must also take any steps required by your role or unit to comply with relevant regulatory requirements.

For more information on specific types of data that can be stored and analyzed on Flux, Value Storage, and other U-M services, please see the “Sensitive Data Guide to IT Services” web page on the Safe Computing website: http://safecomputing.umich.edu/dataguide/

Flux Tips (Dos and Don’ts)


Do:

  • use the U-M VPN in order to log in to Flux from off campus. WHY: Flux login is restricted to campus IP addresses.
  • have your jobs read their input and write their output to the /scratch filesystem. WHY: /scratch is much faster and more reliable than your Flux home directory.
  • remember to load any modules your job needs each time after logging in to Flux but before submitting any jobs. Modules that you will always be using can be loaded automatically when you log in by putting the “module load” commands in your ~/privatemodules/default file. WHY: Software will not be available to your job if it is not loaded prior to submitting the job.
  • request 20% more than the maximum memory and maximum walltime you think your jobs might need.  WHY: if a job exceeds the requested memory or walltime, it will be terminated before it can finish.
  • use “#PBS -j oe” in your PBS scripts to combine the PBS output and error messages into a single file.  WHY: It is much easier to figure out what your job did (you won’t have to match up lines between the two files).
  • submit lots of jobs at once rather than submitting one job and waiting for it to complete before submitting another.  WHY: “keeping the queue full” will give you the overall best throughput and utilization for your Flux allocation.
  • perform regular backups of all of your data on Flux yourself, including data in your home directory and in /scratch. WHY: If you lose a file, the Flux staff can’t get it back for you.
  • submit interactive jobs using “qsub -I”. WHY: Interactive jobs run on the login nodes will be terminated after 15 minutes, or sooner if they are disrupting normal service.
  • send any requests for help as a new email (not a reply to a previous email) to hpc-support@umich.edu.  WHY: You’ll get quicker help if you don’t send email to individuals directly and don’t reply to old (unrelated) support tickets that may be closed already.
  • run “qdel $(qselect -u $USER)” to delete all your jobs if you need to terminate all your jobs.

Don’t:

  • run interactive jobs or do significant computation on the Flux login nodes. WHY: Processes on the Flux login nodes will be automatically terminated once they use 15 minutes of CPU time, or sooner if they are disrupting normal service.
  • use /scratch space for long-term storage; files that you’re not using for two weeks or longer should be moved to your home directory or another system (see the example after this list). WHY: /scratch is a limited, shared resource; also, no files anywhere on Flux are backed up, and they cannot be recovered if lost.
  • run “qdel all”. WHY: It will lock up the cluster scheduler for a long time as it tries to delete jobs that do not belong to you and that you do not have permission to delete.
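As a sketch of how to spot stale scratch files, the following lists your files under /scratch that have not been modified in the last two weeks (adjust the path to your allocation’s scratch directory):

$ find /scratch -user $USER -mtime +14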

Flux in 10 Easy Steps


1. Get Duo

You must use Duo authentication to log on to Flux. For more details see http://its.umich.edu/two-factor-authentication.

2. Get a Flux user account

You need to have a user account to use Flux. Apply for one using this form.

3. Get access to an allocation

An allocation must be purchased before you can use Flux. If you don’t have access to one, request a Flux allocation by emailing hpc-support@umich.edu, and one of our support staff will help you get set up. For more, see our Allocation pages. (LSA users have access to a public allocation.)

4. Get an SSH client

You need a terminal emulator to log into Flux. This video will help teach you some basic Linux navigation commands if needed.

If you are trying to log in from off campus, or using an unauthenticated wireless network such as MGuest, you have a couple of options:

    • Install VPN software on your computer
    • First ssh to login.itd.umich.edu, then ssh to flux-login.arc-ts.umich.edu from there.

Here’s what a login looks like using a terminal emulator:

Mac using Terminal: Open Terminal.

Type: ssh -l uniqname flux-login.arc-ts.umich.edu [replacing uniqname with your own uniqname]

Windows using PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)

Launch PuTTY, enter flux-login.arc-ts.umich.edu as the host name, and then click Open.

[Screenshot: PuTTY configuration window.]

For both Mac and Windows:

At the “Enter a passcode or select one of the following options:” prompt, type the number of your preferred choice for Duo authentication.

[Screenshot: logging in from a Mac with the Duo prompt.]

5. Get files

The server you can use to get files onto your home directory on Flux is flux-xfer.arc-ts.umich.edu. Here’s what it looks like in Cyberduck on a Mac (Video: http://tinyurl.com/flux-cyberduck):

  1. Open Cyberduck and click the Open Connection button.

  2. Set the “Server:” to flux-xfer.arc-ts.umich.edu.

  3. Set your Username: to be your uniqname.

  4. Enter your Kerberos password.

  5. Click Connect.

    [Screenshot: Cyberduck connection window.]

  6. Drag and drop files between the two systems. Click the Disconnect button when completed.

An alternative for Windows is WinSCP, which you can get from the U-M Blue Disc site. Here’s what transferring files or directories looks like from the command line (Linux or Mac):

% scp localfile uniqname@flux-xfer.arc-ts.umich.edu:remotefile (copy a file)

or

% scp -r localdir  uniqname@flux-xfer.arc-ts.umich.edu:remotedir    (copy an entire directory)

Globus is a great alternative especially for large data transfers. Learn more at the Globus web site or this how-to video.

6. Get an editor

nano is an easy-to-use editor available for editing your files on Flux.

7. Get a PBS batch script

The cluster requires some information to wrap your code so that it can submit your job for execution. Wrapping jobs this way allows the Flux scheduler to fit jobs to the available processing on the system. This means you must create a batch file to submit your code.

Video: PBS Basics.

Sample batch file, named sample.pbs:

#!/bin/sh
####  PBS preamble

#PBS -N sample_job

# Change "bjensen" to your uniqname:
#PBS -M bjensen@umich.edu
#PBS -m abe

# Change the number of cores (ppn=1), amount of memory, and walltime:
#PBS -l nodes=1:ppn=1,mem=2000mb,walltime=01:00:00
#PBS -j oe
#PBS -V

# Change "example_flux" to the name of your Flux allocation:
#PBS -A example_flux
#PBS -q flux
#PBS -l qos=flux

####  End PBS preamble

#  Show list of CPUs you ran on, if you're running under PBS
if [ -n "$PBS_NODEFILE" ]; then cat $PBS_NODEFILE; fi

#  Change to the directory you submitted from
if [ -n "$PBS_O_WORKDIR" ]; then cd $PBS_O_WORKDIR; fi

#  Put your job commands here:
echo "Hello, world"

8. Get modules

Flux makes software available by packaging it in modules. You must load the modules that you need for your job before you submit it. The commands are:

module avail
module list
module load modulename      (e.g., module load R)
module unload modulename    (e.g., module unload R)

9. Get a job

You can submit your job and check its status in the queueing system by using the commands below.

Submitting your Job:

qsub filename.pbs    (e.g., qsub sample.pbs … outputs the job ID on successful submission)

Checking the status of your Job:

For a single job: qstat jobid  OR  checkjob jobid  (just the numeric portion of the job ID)

To see all of your jobs: qstat -u uniqname  OR  showq -w user=uniqname

Deleting your job:

qdel  jobid

If a job doesn’t start within 30 minutes, send an email with a copy of your PBS script and the job number to flux-support@umich.edu. Do not delete the job.

10. Get output

When your job is completed, you will receive an email similar to this one:

PBS Job Id: 11800755.flux.arc-ts.umich.edu
Job Name:   sample_job
Exec host:  flux5301/0
Exit_status=0
resources_used.cput=00:00:03
resources_used.mem=768124kb
resources_used.vmem=770420kb
resources_used.walltime=00:00:13

Exit_status=0 means “OK.” Anything else indicates an error occurred when running the PBS script — check the PBS output and error files to find out what the problem is.

resources_used.vmem=770420kb means your job used a total of 770 MB of memory. This may be used to revise your memory estimate for future submissions.

resources_used.walltime=00:00:13 means your job took 13 seconds to execute. This may be used to revise your walltime estimate for future submissions.

If you have specified the -j oe option in your PBS script, all script output and error message output will appear in a file named jobname.ojobid in your submission directory after the job completes. In the above sample, the output generated by your batch script (here “Hello, world”) will be placed into the file sample_job.o11800755.
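For instance, with the sample script from step 7, viewing the combined output file might look like this (the node name is illustrative; the preamble prints the node list before your commands run):

$ cat sample_job.o11800755
flux5301
Hello, world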

How Flux Works


Flux Components

Cores

A core — the allocatable unit in Flux — is one processing element. Cores compose a central processing unit (CPU). In Flux, most of the CPUs have twelve cores; some have sixteen or eight.

A node (or computer) comprises:

  • a number of CPUs,
  • memory (RAM),
  • a hard drive,
  • power supply, and
  • network connections.

[Diagram: cores, CPUs, and the components of a node.]

In Flux all of the nodes are nearly identical, so there is no distinction made between nodes when scheduling jobs, making allocations, setting rates, or for billing.

Flux User Account

A Flux user account consists of a Linux login ID and password (the same as your U-M uniqname and umich.edu Kerberos password, respectively), and a home directory for your files. A Flux user account allows you to:

  • log in,
  • transfer files,
  • compile software, and
  • create and submit job submission scripts.

A user account alone cannot run any jobs.

Flux Project

A Flux project is a collection of Flux user accounts that are associated with one or more Flux allocations. A PI can have more than one Flux project to correspond with different research projects or funding sources, and a Flux user account can belong to more than one Flux project. The project owner (typically the PI or his/her designee) controls the list of users in a project and can change the users.

Flux Allocation

A Flux allocation:

  • describes the limits of how much of the Flux system members of a Flux project can use.
  • is the maximum number of cores and RAM a project can use at any time, and the number of months over which the project can access those cores and RAM.
  • determines your costs.

Your monthly bill is the number of cores in an allocation multiplied by the current rate. An active project requires at least one allocation, but can have as many as makes sense.

A common configuration is for a Flux project to have a “base” allocation of, for example, 50 cores that lasts for 48 months, and, from time to time, supplementary allocations that add to the base as the project grows.

The total number of cores the members of this example project can use collectively at any one time is 50 and, assuming 4GB RAM per core, the maximum amount of RAM the members of the project can use is 200GB. One scenario is a set of jobs that fit within the memory limits of the allocation, but where the last job in the set would exceed the number of available cores and thus would wait, as shown in the first table below.

An allocation is made in terms of multiplying the cores and the duration (in seconds) of the allocation, and represented as core*seconds. However, the number of cores that can be used at once is a hard limit, so it is very rare that you would run out of core*seconds before the end date of the allocation.
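For example, a 50-core allocation lasting 48 months would be recorded as roughly 50 cores × 48 months × 30 days × 86,400 seconds ≈ 6.2 billion core*seconds (using 30-day months for illustration).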

Flux Billing

The Flux allocations are billed to a U-M shortcode and a brief summary appears on your monthly Statement of Activity. ITS processes the allocations and generates monthly bills as long as you have an active Flux allocation. You can start an allocation at any time during the month and, because bills are generated monthly, you can end an allocation within 30 days of the last bill. Because Flux allocations are billed monthly, it is not possible to pre-pay for Flux allocations; you must have funding available during each month to pay for your Flux allocation.

Flux Job

A Flux job is a compute job that can use any portion of the allocations your project has available. The job is described by a short text file (the PBS file, or batch submission script) that is submitted to the job scheduler. The job scheduler takes into account the job’s requirements, the number of cores available in the Flux project’s allocations, and any other requirements.

Total resources in use   Resources requested by job   Job state (in a 50-core, 200GB allocation)
 9 cores; < 36GB         9 cores; < 4GB per core      job starts
18 cores; < 72GB         9 cores; < 4GB per core      job starts
27 cores; < 108GB        9 cores; < 4GB per core      job starts
36 cores; < 144GB        9 cores; < 4GB per core      job starts
45 cores; < 180GB        9 cores; < 4GB per core      job starts
46 cores; < 184GB        1 core; < 4GB per core       job starts
47 cores; < 188GB        1 core; < 4GB per core       job starts
48 cores; < 192GB        1 core; < 4GB per core       job starts
52 cores; < 208GB        4 cores; < 4GB per core      job waits

Another scenario is a set of jobs with larger memory requirements, where even though there are cores available, the memory associated with the allocation is consumed and jobs are queued.

Total resources in use   Resources requested by job   Job state (in a 50-core, 200GB allocation)
10 cores; 80GB           10 cores; 8GB RAM per core   job starts
20 cores; 160GB          10 cores; 8GB RAM per core   job starts
25 cores; 200GB          5 cores; 8GB RAM per core    job starts
26 cores; 201GB          1 core; 1GB RAM per core     job waits

Depending on the requirements of the job and the state of the system, jobs may wait in a queue while other jobs complete and resources become available. The job is started once all the requirements are met. When the job ends, the Flux allocation(s) are debited the core*seconds actually used by the job.

Integrating the Flux Components

These components are used together to support computational research within the U-M business and technology environment. A Flux user account is used to submit a Flux job to the cluster, where it is authorized by its associated Flux project to debit a Flux allocation and execute on Flux cores. The existing Flux allocations are aggregated and a Flux bill is applied each month to the university account (chartfield combination) specified at the creation of the allocation.

If you have questions, please send email to hpc-support@umich.edu.