Flux FAQs

By February 23, 2016

If these Frequently Asked Questions don’t address your issue, please email hpc-support@umich.edu. If your question concerns a specific job, please include its job ID.

How can I start using Flux?

In order to use Flux, you will first need a User Account. You can apply for a User Account by filling out this form.

A Flux User Account is different from a Flux Account. A Flux User Account is used by a single user to log onto the Flux nodes, whereas a Flux Account is a collection of Flux User Accounts that are associated with one or more Flux allocations.
Flux uses two-factor authentication for security purposes, so you will also need an MToken to log in to one of our login nodes. Once you have a User Account and an MToken, you will be able to copy data to your Flux home directory and run small programs on the Flux login nodes.
You will need access to a Flux Account with an allocation in order to run jobs. A Flux Account must be paid for, so this is typically provided by a faculty member or your school or college.

How do I log on to Flux?

To connect to Flux, use secure shell (ssh) from a terminal on your computer to one of the Flux login nodes. If you are using a Mac- or Linux-based operating system, you can use the default terminal for this. If you are using Windows, you will need to download an ssh client; one popular choice is PuTTY.

If you are not logged on to a computer using your uniqname, you should specify your username when connecting.
For example:

$ ssh uniqname@flux-login.engin.umich.edu

You will be prompted to input your MToken passcode. (Visit the MToken site for more information on obtaining an MToken.) After entering your MToken passcode successfully, you will be prompted to input your password. This is the Kerberos password that you use to log in to other University services.

Figure: Connecting to a login node from a Mac.

Figure: Connecting to a login node with PuTTY on Windows.

What is the difference between a compute node and a login node?

A login node is a computer that you can connect to directly through ssh. The login nodes can be used to copy files to your home directory and to submit jobs to run on the compute nodes. The compute nodes are where the actual jobs run. The scheduler assigns compute nodes to a job automatically after its PBS script has been submitted.

Which Flux Accounts can I use?

If you are a member of the College of Engineering or the College of Literature, Science, and the Arts, you have access to a Flux Account with a College-funded allocation.
For Engineering, the Flux Account name is engin_flux. For LSA, it is lsa_flux.
To view which Flux Accounts you have access to, use the following command on one of the flux-login nodes:

$ mdiag -u <username>

In the output, the ALIST field lists all of the Flux Accounts you are authorized to use. Note that some of these accounts might not have an active allocation. For example, a user with access to engin_flux and FluxTraining_flux would see both of them listed in ALIST. It is important to note that default_flux cannot be used to run jobs; it is simply a placeholder.
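
If you only want the account names, you can filter the mdiag -u output in the shell. The helper below is a hypothetical sketch, not a Flux-provided tool. It assumes the output contains a token of the form ALIST=account1,account2,... on a single line, and it drops default_flux because that account cannot run jobs.

```bash
#!/bin/bash
# Hypothetical helper (not part of Flux): print the Flux Accounts listed in
# ALIST, one per line, skipping the default_flux placeholder.
# Assumes mdiag -u prints a token like ALIST=acct1,acct2,... on one line.
usable_accounts() {
    # Keep only the first ALIST=... token, strip the "ALIST=" prefix,
    # split the comma-separated list into lines, and drop default_flux.
    grep -o 'ALIST=[^[:space:]]*' |
        head -n 1 |
        cut -d= -f2 |
        tr ',' '\n' |
        grep -v '^default_flux$'
}
```

Usage on a login node:

$ mdiag -u uniqname | usable_accounts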

How do I access an existing Flux Account?

To be given access to a Flux Account that your group is already using, please have an administrator of that Flux Account send an email to flux-support@umich.edu requesting that you be added. The administrator will usually be the person who pays for the account, or a delegated manager.

How many jobs can I have queued or running at once?

There can be up to 5,000 jobs per user in each queue (like flux or fluxm). There is no built-in limit to the number of jobs a user can have running at one time, but specific Flux Accounts can have per-user job limits. For example, engin_flux allows a maximum of 10 jobs per user at one time.

What is PBS?

Portable Batch System (PBS) is the system that the cluster uses to schedule jobs. In a PBS script, users tell the scheduler what a job needs, such as the number of processors and the amount of memory. Visit this page for more information about PBS.
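
To show how these pieces fit together, here is a minimal sketch of a complete PBS script. The job name, the account (example_flux), the resource amounts, and the walltime are placeholders, not recommendations; the qos and queue lines follow the rules for standard _flux accounts described in this FAQ. PBS_O_WORKDIR (the directory qsub was run from) and PBS_JOBID are set by PBS when the job runs; the fallbacks let the script also run outside PBS.

```bash
#!/bin/bash
# Minimal PBS job script sketch. The account, job name, and resource values
# are placeholders; adjust them for your own Flux Account and job.
#PBS -N example_job
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -l procs=1,pmem=1000mb
#PBS -l walltime=00:15:00
# Merge the error output into the standard output file.
#PBS -j oe

# Run from the directory the job was submitted from (falls back to the
# current directory when the script is run outside PBS).
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Job ${PBS_JOBID:-not-under-PBS} running on $(uname -n) in $PWD"
```

Save it as, for example, example_job.pbs and submit it from a login node with:

$ qsub example_job.pbs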

What do I change in my PBS script when running on a different Flux Account?

When using standard Flux Accounts, those that end in _flux, you must set the qos parameter and queue parameter to flux in your PBS script.
For example:

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

When using Large Memory Flux Accounts, those that end in _fluxm, you must set the qos to flux and queue to fluxm in your PBS script.

#PBS -A example_fluxm
#PBS -l qos=flux
#PBS -q fluxm

When using GPU Flux Accounts, those that end in _fluxg, you must set the qos to flux and queue to fluxg in your PBS script.
For example:

#PBS -A example_fluxg
#PBS -l qos=flux
#PBS -q fluxg

When using Flux on Demand (FoD) Accounts, those that end in _fluxod, you must set the qos to flux and queue to fluxod in your PBS script.

#PBS -A example_fluxod
#PBS -l qos=flux
#PBS -q fluxod

When using Flux Operating Environment (FoE) Accounts, those that end in _fluxoe, you must set the qos to flux and queue to fluxoe in your PBS script.

#PBS -A example_fluxoe
#PBS -l qos=flux
#PBS -q fluxoe
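
Because the qos is always flux and the queue always matches the account's suffix, these header lines can be generated from the account name. The function below is a hypothetical helper sketched from the naming convention above, not a Flux tool; it also refuses default_flux, which is only a placeholder and cannot run jobs.

```bash
#!/bin/bash
# Hypothetical helper: print the -A, qos, and queue directives for a Flux
# Account, taking the queue from the suffix after the last underscore.
pbs_header_for_account() {
    local account="$1"
    case "$account" in
        default_flux)
            echo "error: default_flux is a placeholder and cannot run jobs" >&2
            return 1
            ;;
        *_flux|*_fluxm|*_fluxg|*_fluxod|*_fluxoe)
            ;;
        *)
            echo "error: '$account' does not end in a known Flux Account suffix" >&2
            return 1
            ;;
    esac
    # Strip everything up to and including the last underscore: example_fluxm -> fluxm
    local queue="${account##*_}"
    # Every account type uses qos=flux; only the queue changes.
    printf '#PBS -A %s\n#PBS -l qos=flux\n#PBS -q %s\n' "$account" "$queue"
}
```

For example, pbs_header_for_account example_fluxg prints the three lines shown above for GPU accounts.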

What security measures are in place for Flux?

The Flux high-performance computing system at the University of Michigan is built to provide a scalable, flexible, reliable, and secure HPC environment that lets researchers match their computing capability and costs to their needs while maintaining the security of their research.

Built-in Security Features

Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:

  • Secure access – All access to Flux is via ssh or Globus. SSH has a long track record as a secure protocol. Globus provides basic security and supports additional security if you need it.
  • Built-in firewalls – All of the Flux computers have firewalls that restrict access to only what is needed.
  • Unique users – Flux adheres to the University guideline of one person per login ID and one login ID per person.
  • Multi-factor authentication (MFA) – For all interactive sessions, Flux requires both a UM Kerberos password and an MToken. File transfer sessions require a Kerberos password.
  • Private Subnets – Other than the login and file transfer computers that are part of Flux, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
  • Flexible data storage – Researchers can control the security of their own data by securing their storage as they require and having it mounted on Flux via NFSv3 or NFSv4. Another option is Flux’s local scratch storage, which is considered secure for many types of data. Note: Flux is not considered secure for data covered by HIPAA.

Flux/Globus & Sensitive Data

To find out what types of data may be processed in Flux or Globus, visit the U-M Sensitive Data Guide to IT Resources.

Additional Security Information

If you require more detailed information on Flux’s security or architecture to support your data management plan or technology control plan, please contact the Flux team at hpc-support@umich.edu.

We know that it’s important for you to understand the protection measures that are used to guard the Flux infrastructure. But since you can’t physically touch the servers or walk through the data centers, how can you be sure that the right security controls are in place?

The answer lies in the third-party certifications and evaluations that Flux has undergone. IIA has evaluated the system, network, and storage practices of Flux and Globus. The evaluation for Flux is published at http://safecomputing.umich.edu/dataguide/?q=node/151 and the evaluation for Globus is published at http://safecomputing.umich.edu/dataguide/?q=node/155.

Shared Security and Compliance Responsibility

Because you’re managing your data in the Flux high-performance computing environment, the security responsibilities will be shared.

Flux operators have secured the underlying infrastructure, and you are responsible for securing anything you put on that infrastructure, as well as for meeting any other compliance requirements. These requirements may come from your grant or funding agency, from data owners or stewards other than yourself, or from state or federal laws and regulations.

The Flux support staff can help manage user lists for data access. For general information on managing file system permissions, see: http://en.wikipedia.org/wiki/File_system_permissions.

Contacting Flux Support

The Flux Support Team encourages communications, including for security-related questions. Please email us at hpc-support@umich.edu.

We have created a PGP key for especially sensitive communications you may need to send.

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1

mQENBFNEDlUBCACvXwy9tYzuD3BqSXrxcAEcIsmmH52066R//RMaoUbS7AcoaF12
k+Quy/V0mEQGv5C4w2IC8Ls2G0RHMJ2PYjndlEOVVQ/lA8HpaGhrSxhY1bZzmbkr
g0vGzOPN87dJPjgipSCcyupKG6Jnnm4u0woAXufBwjN2wAP2E7sqSZ2vCRyMs4vT
TGiw3Ryr2SFF98IJCzFCQAwEwSXZ2ESe9fH5+WUxJ6OM5rFk7JBkH0zSV/RE4RLW
o2E54gkF6gn+QnLOfp2Y2W0CmhagDWYqf5XHAr0SZlksgDoC14AN6rq/oop1M+/T
C/fgpAKXk1V/p1SlX7xL230re8/zzukA5ETzABEBAAG0UEhQQyBTdXBwb3J0IChV
bml2ZXJzaXR5IG9mIE1pY2hpZ2FuIEhQQyBTdXBwb3J0IEdQRyBrZXkpIDxocGMt
c3VwcG9ydEB1bWljaC5lZHU+iQE+BBMBAgAoBQJTRA5VAhsDBQkJZgGABgsJCAcD
AgYVCAIJCgsEFgIDAQIeAQIXgAAKCRDHwuoUZnHdimrSB/4m6P7aQGnsbYVFspJ8
zquGRZd3fDU/IaCvLyjsUN4Qw1KFUmqQjvvfTxix7KjlNMcGy1boUCWKNNk1sFtb
E9Jr2p6Z/M7pm4XWhZIs1UIfHr3XgLdfbeYgXpt4Md2G6ttaXv44D10xL2LYCHE8
DnSVv+2SIG9PhaV+h+aBUo4yKwTwVBZsguU1Z1fsbiu6z6iDrzU2dlQp0NLmw73G
v5HUdYdu/YJdh5frp/2XorLXynrEyCk1SxViXrHY6dc9Y3bUjwl0MOJypLuRhQmj
kVwHIsNsRg1YJ6iyJzom33C7YdRktBiPpstkYDHJf/PVRAw1G4dkyjfUfG2pIoQd
WjOxuQENBFNEDlUBCADNwZ5edW/e08zYFWSGVsdpY4HM2CdsVqkuQru2puHhJqg4
eWS9RAdJ6fWp3HJCDsDkuQr19B3G5gEWyWOMgPJ9yW2tFVCrVsb9UekXAWh6C6hL
Tj+pgVVpNDTYrErYa2nlll0oSyplluVBRlzDfuf4YkHDy2TFd7Kam2C2NuQzLQX3
THhHkgMV+4SQZ+HrHRSoYPAcPb4+83dyQUo9lEMGcRA2WqappKImGhpccQ6x3Adj
/HFaDrFT7itEtC8/fx4UyaIeMszNDjD1WIGBJocOdO7ClIEGyCshwKn5z1cCUt72
XDjun0f1Czl6FOzkG+CHg5mf1cwgNUNx7TlVBFdTABEBAAGJASUEGAECAA8FAlNE
DlUCGwwFCQlmAYAACgkQx8LqFGZx3YrcqggAlKZhtrMDTHNki1ZTF7c7RLjfN17H
Fb342sED1Y3y3Dm0RVSQ2SuUWbezuDwov6CllgQR8SjBZ+D9G6Bt05WZgaILD7H0
LR9+KtBNYjxoVIdNHcGBf4JSL19nAI4AMWcOOjfasGrn9C60SwiiZYzBtwZa9VCi
+OhZRbmcBejBfIAWC9dGtIcPHBVcObT1WVqAWKlBOGmEsj/fcpHKkDpbdS7ksLip
YLoce2rmyjXhFH4GXZ86cQD1nvOoPmzocIOK5wpIm6YxXtYLP07T30022fOV7YxT
mbiKKL2LmxN1Nb/+mf+wIZ5w2ZdDln1bbdIKRHoyS2HyhYuLd1t/vAOFwg==
=yAEg
-----END PGP PUBLIC KEY BLOCK-----

Why can't I connect to Flux from off campus?

Flux does not allow network connections from non-UM networks. To connect to Flux from off-campus, you can use the UM VPN client. Information about the UM VPN client, including terms of use, can be found at http://www.itcs.umich.edu/vpn/.
You can also connect through an intermediary machine, such as login.itd.umich.edu: first connect to it, then connect to Flux from there. For example, you might use PuTTY to connect to login.itd.umich.edu, then from that login machine, use

% ssh flux-login.engin.umich.edu

where the % is the prompt on the login.itd machine.
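
With OpenSSH, the two hops can be combined into one command. This sketch assumes your uniqname is the same on both machines, and the function name is hypothetical. The -t option makes ssh allocate a terminal on login.itd.umich.edu so that the second connection can still prompt for your MToken passcode and password.

```bash
#!/bin/bash
# Hypothetical helper: reach Flux through login.itd.umich.edu in one step.
# -t forces a terminal on the intermediary so the inner ssh can prompt
# for the MToken passcode and Kerberos password.
flux_via_itd() {
    local user="${1:-$USER}"
    ssh -t "${user}@login.itd.umich.edu" ssh "${user}@flux-login.engin.umich.edu"
}
```

Usage from an off-campus terminal: flux_via_itd uniqname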

How can I find out my job's status?

To determine your job’s status use either of the following commands on a login node:

$ qstat -u uniqname
$ qstat <jobid>

The status of the job is found in the column labeled with the letter “S”.
  • R means that the job is currently running.
  • Q means that the job is waiting in the queue.
  • C means that the job has already completed.
  • E means that the job is in the process of exiting.
  • H means that the job is on hold (generally set by the user or by an unfulfilled job dependency).
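
To turn the status letters into words for all of your jobs, you can pipe qstat -u through a small filter. This hypothetical sketch assumes the standard qstat -u layout, in which job lines begin with the numeric job ID and the S column is the next-to-last column (just before Elap Time); check it against the output you actually see.

```bash
#!/bin/bash
# Hypothetical helper: translate a single-letter job state into words.
describe_state() {
    case "$1" in
        R) echo "running" ;;
        Q) echo "queued" ;;
        C) echo "completed" ;;
        E) echo "exiting" ;;
        H) echo "held" ;;
        *) echo "unknown state '$1'" ;;
    esac
}

# Read qstat -u output on stdin and print "jobid state" for each job.
# Job lines start with a digit; header and separator lines do not.
# Assumes S is the next-to-last column, as in the standard qstat -u layout.
summarize_qstat() {
    awk '$1 ~ /^[0-9]/ { print $1, $(NF-1) }' |
        while read -r jobid state; do
            printf '%s %s\n' "$jobid" "$(describe_state "$state")"
        done
}
```

Usage on a login node:

$ qstat -u uniqname | summarize_qstat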

Why isn’t my job running? I queued my job, but it isn’t running and has a status of queued.

Jobs may sit in queued status for a variety of reasons. The scheduler makes batch passes of queued jobs and determines if there are sufficient free resources to run a job, and then runs those that it can. If there are sufficient resources at the time a job is submitted, it may still sit in the queue for up to 15 minutes before the scheduler makes a batch pass and starts the job.
If there are not sufficient resources, the job will sit in the queue until resources open up. The most common limiting resources are processors and memory. For example, if you submit to a Flux Account whose allocation has a total of 10 processors and nine are in use, a job asking for two processors will have to wait in the queue until another processor becomes available. Once a processor is freed, the scheduler will assign the job to the two free processors and the job will run.
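
The processor arithmetic in this example can be written out as a small check. This is a hypothetical sketch that considers processors only; the real scheduler also weighs memory and other limits, and even a job that fits may wait for the next scheduling pass.

```bash
#!/bin/bash
# Hypothetical check: does a request fit in an allocation's free processors?
# Arguments: total processors, processors in use, processors requested.
can_start_now() {
    local total="$1" in_use="$2" requested="$3"
    local free=$(( total - in_use ))
    if [ "$requested" -le "$free" ]; then
        echo "fits: $requested requested, $free free"
    else
        echo "waits: $requested requested, only $free free"
    fi
}
```

With the numbers from the example, can_start_now 10 9 2 reports that the job waits, and can_start_now 10 8 2 reports that it fits.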

Why does my job have a status of Batchhold? I queued it but it isn't running.

Jobs are assigned a Batchhold by the scheduler when they have bad PBS credentials and will not run. Jobs are often given this status when the Flux Account name, qos, or queue are misspelled. Jobs can also be given this status if you try to run on a Flux Account to which you do not have access. If you cannot determine why your job is on Batchhold, please contact us at flux-support@umich.edu with your job number.

How many processors or how much memory does my Flux Account have?

You can check the resources available to a Flux Account with the command:

$ mdiag -a  <accountname_flux>

MAXPROC indicates the total number of processors available to the Flux Account.

MAXMEM indicates the total amount of memory available to the Flux Account, in megabytes.

If a Flux Account limits the number of processors a single user can use at once, that limit is shown as MAXPROC[USER].

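
Both limits apply at once, so a job that requests a lot of memory per processor can run into MAXMEM before MAXPROC. The hypothetical helper below, with made-up numbers, estimates how many processors could run at the same time when each requests pmem_mb megabytes, assuming the account's MAXPROC and MAXMEM are the only limits in play.

```bash
#!/bin/bash
# Hypothetical estimate: processors that can run at once under MAXPROC and
# MAXMEM (in MB) when each processor requests pmem_mb MB.
max_concurrent_procs() {
    local maxproc="$1" maxmem_mb="$2" pmem_mb="$3"
    # Integer division rounds down: only whole processors fit in memory.
    local by_mem=$(( maxmem_mb / pmem_mb ))
    if [ "$by_mem" -lt "$maxproc" ]; then
        echo "$by_mem"
    else
        echo "$maxproc"
    fi
}
```

For instance, with MAXPROC=40 and MAXMEM=160000, max_concurrent_procs 40 160000 8000 gives 20: memory, not processors, is the limit. At the default of 768 MB per core the same account is limited by MAXPROC (40).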

What jobs are currently running on the Flux Account I use?

You can check the jobs that are queued, blocked and running on a Flux Account with the command:

$ showq -w acct=accountname_flux

What does this e-mail mean: “moab job resource violation: job ####### exceeded MEM usage soft limit”?

This message is sent when your job uses more memory than it requested. If you do not request memory explicitly, the default is 768 MB per core.

You can request more memory by adding “#PBS -l pmem=###MB” to your PBS script, which asks for ###MB of memory per process requested. For example, if you ask for 2 nodes with ppn=2 and pmem=3000MB, you will have asked for 2 × 2 × 3000MB = 12000MB of memory in total. The pmem value replaces the default; it is not added to it.
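
The total is simply nodes × ppn × pmem. The hypothetical helper below reproduces that arithmetic so you can check a request before submitting it.

```bash
#!/bin/bash
# Hypothetical helper: total memory (MB) requested by nodes=N:ppn=M when
# each process asks for pmem_mb MB.
total_mem_mb() {
    local nodes="$1" ppn="$2" pmem_mb="$3"
    echo $(( nodes * ppn * pmem_mb ))
}
```

total_mem_mb 2 2 3000 prints 12000, matching the example above. With the default of 768 MB per core, total_mem_mb 2 2 768 prints 3072.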

How can I specify processor and computer layout with PBS?

Sometimes a program will want to have the processors it uses arranged across the computers in a particular way. There are several ways to tell PBS how many processors there are and on what machines they can go. We’ll look at three cases here, starting with the least detailed and proceeding to the most detailed:

  1. The processors can be anywhere;
  2. There must be a minimum of N processors on each computer;
  3. There must be exactly N processors on each computer.

1. If it does not matter how the processors are divided among the computers, you should use the procs property. Using procs usually gives an eligible job the shortest wait in the queue. Flux is a heterogeneous cluster whose computers have varying numbers of processors, so asking for a specific computer/processor combination may delay the start of your job.

If you want N processors spread across the first available processors, regardless of which physical computer they are on, you should use the command:

#PBS -l procs=N

2. If you would like a minimum of M processors on each computer, you should use the nodes property in conjunction with the ppn property.

For example:

#PBS -l nodes=N:ppn=M

Here, ppn=M requests groups of M processors, with each group placed on a single physical computer, and nodes=N sets the number of such groups. Using nodes in conjunction with ppn does not guarantee that the N groups of processors will all be put on separate physical computers. Because of this, you will get at least M processors on each computer you are assigned, but a single computer may receive some multiple of M.

For example:

#PBS -l nodes=3:ppn=4

Here, all three groups of four processors could end up on one computer; each group could end up on a separate computer; or eight processors could end up on one computer while the remaining four end up on another.

3. If you want exactly N processors on each computer, you should use the tpn property in conjunction with procs.

For example:

#PBS -l procs=M,tpn=N

When procs is used together with tpn, procs specifies the total number of processors across all computers, and tpn (tasks per node) specifies how many of them go on each computer.

Assigning “procs=M,tpn=N” says that you want M processors in total, with exactly N processors on each physical computer. This gives you N processors on each of M/N separate physical computers, so M should be a multiple of N.
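
For the procs/tpn combination, the number of physical computers follows directly from the two values. This hypothetical helper computes it and rejects layouts in which procs is not a whole multiple of tpn, since exactly tpn processors could not then be placed on every computer.

```bash
#!/bin/bash
# Hypothetical helper: number of physical computers used by procs=M,tpn=N,
# where every computer runs exactly N processors.
tpn_node_count() {
    local procs="$1" tpn="$2"
    if [ "$tpn" -le 0 ]; then
        echo "error: tpn must be positive" >&2
        return 1
    fi
    if [ $(( procs % tpn )) -ne 0 ]; then
        echo "error: procs ($procs) is not a multiple of tpn ($tpn)" >&2
        return 1
    fi
    echo $(( procs / tpn ))
}
```

For example, tpn_node_count 12 4 prints 3: procs=12,tpn=4 places four processors on each of three computers.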

How do I get my Network File System (NFS) shares, including Value Storage, mounted on Flux?

To get an NFS share mounted on Flux, you will need to contact the administrator of the NFS share and ask them to export it to the following IP ranges:

  • 141.212.30.0/23
  • 10.164.0.0/21
  • 10.224.0.0/21

For Value Storage shares purchased through ITS, the email address is: vstore-admins@umich.edu

Once the NFS share has been exported, please contact us at flux-support@umich.edu to request that we mount it. Please be sure to include the name of the NFS share both in your email to the NFS share administrators and in your email to flux-support@umich.edu.

May I process sensitive data using Flux?

Yes, but only if you use a secure storage solution, such as Mainstream Storage, along with Flux’s scratch storage. Flux’s home directories are provided by Value Storage, which is not an appropriate location for sensitive institutional data.

One possible workflow is to use sftp or Globus to move data between a secure solution and Flux’s scratch storage, which is secure, bypassing your home directory or any of your own Value Storage directories.

Keep in mind that compliance is a shared responsibility. You must also take any steps required by your role or unit to comply with relevant regulatory requirements.

For more information on specific types of data that can be stored and analyzed on Flux, Value Storage, and other U-M services, please see the “Sensitive Data Guide to IT Services” web page on the Safe Computing website: http://safecomputing.umich.edu/dataguide/