LSA’s public Flux allocations

February 24, 2016

Overview

Researchers in the College of Literature, Science, and the Arts have four options for using the Flux High Performance Computing cluster:

Public LSA allocations
    Cost: Free (paid for by LSA)
    Wait for jobs to start: ••••
    Special usage limits: Yes (cores per user, jobs per user, job size, maximum walltime)
    Notes: Only for researchers who do not have access to another allocation for the same Flux service (e.g., Standard Flux).  Resources are shared by researchers College-wide, so there may frequently be waits for jobs to start.

Department or multiple-group allocation
    Cost: $ (paid for by department or cooperating research groups)
    Special usage limits: Optional (up to purchaser)
    Notes: Best value per dollar in most cases.  Resources are shared only between researchers within the department or groups; whether a job waits to start depends on how the allocation has been sized relative to the needs of the researchers.

Private allocation
    Cost: $$$ (paid for by researcher)
    Special usage limits: None
    Notes: Both traditional (monthly) and “on demand” (metered) options are available.  Purchased resources are not shared with other researchers, although jobs may have to wait to start if certain specific resource configurations are requested.

Flux Operating Environment
    Cost: $$$ (paid for by researcher)
    Special usage limits: None
    Notes: Typically used with external grants that require the purchase of computing hardware rather than services.  Researchers purchase specific hardware for their exclusive use for 4 years.  Custom hardware configurations (e.g., amount of memory per node) are possible.

The College of Literature, Science, and the Arts provides three public Flux allocations to LSA researchers at no cost.  A researcher can use one of the public allocations if they do not have access to another allocation for the same Flux service.  For example, an LSA researcher can use lsa_fluxm if they do not have access to another Larger Memory Flux allocation.

lsa_flux
    Service: Standard Flux
    Size: 120 cores
    Usage limits: Size (maximums, per person): normal: 24 cores / 96 GB RAM; non-busy times: 36 cores / 144 GB RAM.  Runtime: maximum 4 core*months remaining across all running jobs (per person).

lsa_fluxm
    Service: Larger Memory Flux
    Size: 56 cores
    Usage limits: Only for jobs that need more memory or cores per node than possible under lsa_flux.  Size: 56 cores / 1400 GB RAM per job.  Walltime: maximum 1 week per job.

lsa_fluxg
    Service: GPU Flux
    Size: 2 GPUs
    Usage limits: Only for jobs that use a GPU.  Size: 1 GPU, 2 cores, 8 GB RAM per person.  Walltime: maximum 3 days per job.

The Flux Hadoop and Flux Xeon Phi services are also available to everyone in LSA as no-cost technology previews.  Descriptions of these services are available on the Systems and Services page.

Uses of these allocations include but are not limited to:

  • Running jobs that fit within the usage limits for the LSA allocations, particularly for researchers or students who do not have access to funding that could be used to purchase their own Flux allocation (for example, graduate and undergraduate students doing their own research).
  • Testing Flux to determine whether to purchase a Flux allocation.  (Note that PIs can also request a one-time, two-week trial allocation for this purpose by contacting hpc-support@umich.edu; trial allocations are 16 cores but are for the exclusive use of the PI’s research group.)
  • Ad hoc experimentation and exploration of questions not necessarily tied to any particular research project, without needing to obtain funding and purchase a Flux allocation first.

The LSA public allocations are intended neither to replace nor to supplement other Flux allocations.  Research groups who need more computation than is provided under the public allocation usage limits, or who need their jobs to start more quickly than they do under the public allocations, should obtain their own Flux allocation.  Shared allocations can also be obtained for use by multiple research groups across departments, centers, institutes, or other units.  Graduate students in particular may want to use Rackham Graduate Student Research Grants to purchase their own private Flux allocation.

Usage limits

The LSA public allocations (lsa_flux, lsa_fluxm, lsa_fluxg) are not meant for use by anyone who has their own Flux allocation, nor by those who have access to another shared allocation such as a departmental allocation or an allocation for their center or institute.

LSA has imposed additional usage limits on its public allocations in order to avoid a single user (or a small group of users) monopolizing the allocations for extended periods of time to the detriment of other researchers who want to use the allocations.

LSA Flux support staff will periodically monitor jobs which are running under the LSA public allocations.  Users who have running jobs exceeding the usage limit will receive an email asking them to delete some of their running jobs.  Users who receive four or more such emails within 120 days may be temporarily or permanently removed from the allocations, at the discretion of LSA Flux support staff.

You can check your current usage of the LSA public allocations at any point in time, to determine if you are under the usage limits, by running the following command:

lsa_flux_check

Only running jobs count against the usage limits; jobs which are idle or on hold do not count against the usage limits.

lsa_flux

Users of lsa_flux can use up to 24 cores or up to 96 GB of memory across all of their running jobs at any point in time.  When there are cores and/or memory that are idle (that is, not being used by other users), then these limits are increased to 36 cores and 144 GB of memory.

Additionally, individual users are restricted to having no more than 4 core*months (2,880 core*hours) worth of jobs running at any one time.  This limit is calculated by summing the product of the remaining walltime and the number of cores for all of a given user’s running jobs (as shown by the command “showq -r -u $USER”).  4 core*months is enough to run a 4-core job for 28 days, an 8-core job for 15 days, a 16-core job for 7 days, or many other combinations.
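
As a quick sanity check, this arithmetic can be reproduced in the shell (a sketch; it uses the 30-day month that matches the 2,880 core*hours figure above):

# 4 core*months expressed as core*hours, using a 30-day month
echo $(( 4 * 30 * 24 ))     # 2880 core*hours
# Each running job counts as cores times remaining walltime (in hours):
echo $(( 4 * 28 * 24 ))     # 4-core job, 28 days remaining: 2688
echo $(( 8 * 15 * 24 ))     # 8-core job, 15 days remaining: 2880
echo $(( 16 * 7 * 24 ))     # 16-core job, 7 days remaining: 2688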

Important note: if a single job requests more than 24 cores or 96 GB of memory (but no more than 36 cores / 144 GB) while lsa_flux is busy, the job will not start until lsa_flux becomes “not busy”, which could take days, weeks, or even longer.  To avoid this, check whether lsa_flux is busy before submitting a job larger than 24 cores / 96 GB by running the following command:

freealloc lsa_flux

If freealloc reports that more cores and more memory are available than you will be requesting in your job, then lsa_flux is not busy and you can request more than 24 cores / 96 GB without delaying your job’s start.

lsa_fluxm

The requested walltime for each job under lsa_fluxm must be no more than 1 week (168 hours).  This permits a single researcher to use the full lsa_fluxm allocation (all 56 cores / 1400 GB RAM) in a single job, but it can also result in very long waits for jobs to start.  Researchers who need jobs to start more quickly should either purchase their own Larger Memory Flux allocation, use lsa_flux (if they need 96 GB RAM or less), or use XSEDE.

Use of lsa_fluxm is restricted to jobs that require more memory or more cores per node than is possible under lsa_flux.

lsa_fluxg

Each user of lsa_fluxg can run one job at a time, using a single GPU, up to 2 cores, and up to 8 GB RAM for a maximum of three days (72 hours).  Use of lsa_fluxg is restricted to jobs that make use of a GPU.
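
For reference, a job header that stays within these limits might look like the following (a sketch; it assumes the Torque-style gpus= node property, so adjust the syntax if your cluster differs):

#PBS -A lsa_fluxg
#PBS -q fluxg
#PBS -l qos=flux
#PBS -l nodes=1:ppn=2:gpus=1
#PBS -l mem=8gb,walltime=72:00:00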

Frequently Asked Questions

Am I able to use the LSA public allocations?

Run the command “mdiag -u your-uniqname-here” on a Flux login node.  If the LSA public allocation names appear as part of the ALIST, then you are able to run jobs under them.

[bjensen@flux-login1 ~]$ mdiag -u bjensen
evaluating user information
Name                      Priority        Flags         QDef      QOSList*        PartitionList Target  Limits
bjensen                          0            -         flux         flux                     -   0.00       -
  GDEF=dada
  EMAILADDRESS=bjensen@umich.edu
  ADEF=default_flux  ALIST=default_flux,lsa_flux,lsa_fluxm,lsa_fluxg,bigproject_flux
[bjensen@flux-login1 ~]$
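
To check just the allocation list, you can filter the output (a sketch using standard grep; $USER expands to your uniqname on the login nodes):

mdiag -u $USER | grep ALIST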

If you are a member of LSA but the public allocation names do not show up in the ALIST for your Flux account, please contact hpc-support@umich.edu and ask to be added to the allocations.

How do I use the LSA public allocations?

To use lsa_flux, specify the following credentials in your PBS script:

#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux

To use lsa_fluxm, specify the following credentials in your PBS script:

#PBS -A lsa_fluxm
#PBS -q fluxm
#PBS -l qos=flux

To use lsa_fluxg, specify the following credentials in your PBS script:

#PBS -A lsa_fluxg
#PBS -q fluxg
#PBS -l qos=flux

For more information about PBS scripts, see the Flux PBS web page.
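
Putting these credentials into context, a complete minimal job script for lsa_flux might look like the following (a sketch; the job name, email address, resource request, and myprogram are placeholders to adapt to your own work):

#!/bin/bash
#PBS -N example_job
#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux
#PBS -l nodes=1:ppn=1,mem=4gb,walltime=24:00:00
#PBS -M bjensen@umich.edu
#PBS -m abe
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
./myprogram

Submit the script with “qsub example_job.pbs” from a Flux login node.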

How can I tell what jobs are using (or waiting to use) one of the allocations?

Run the following command on a Flux login node to see what jobs are using (or waiting to use) lsa_flux:

showq -w acct=lsa_flux

Replace “lsa_flux” above with “lsa_fluxm” or “lsa_fluxg” as desired.

How close am I to the usage limits?

You can check your current usage of the LSA public allocations at any time, to determine whether you are under the usage limits, by running the following command:

lsa_flux_check

My job waits a long time before starting.  What are my options?

Because the LSA allocations are public resources, they can often be in high demand, resulting in jobs taking hours or even days to start, even if each individual user is under the usage limits.  Options for getting jobs to start more quickly include:

  • Purchase a Flux allocation.  LSA provides cost-sharing for Flux allocations for all LSA faculty, students, and staff; regular-memory Flux allocations are thus available to members of LSA for only $6.60/core/month.  Graduate students are encouraged to apply for Rackham Graduate Student Research Grants which may be (relatively) quick and easy to obtain.
  • Ask your department, center, or institute about the possibility of a shared allocation funded by department discretionary funds, individual researcher contributions, or other sources.
  • PIs can apply for a one-time trial allocation on Flux that lasts two weeks; contact hpc-support@umich.edu for more information.
  • Use XSEDE. XSEDE has relatively large (up to 200,000 service unit) startup allocations that are fairly easy to obtain, requiring a CV and 1-page description of the research and how the research will utilize XSEDE.  Research allocation requests are reviewed four times a year and are awarded based on the results shown from the startup and previous research allocations; research allocations can be larger than startup allocations.  For more information, contact hpc-support@umich.edu.

How can I avoid exceeding the usage limits for the LSA public allocations?

The usage limits for the LSA public allocations are enforced automatically wherever possible.  However, if you request more than 4 GB of memory per core under lsa_flux, the automatic enforcement cannot fully account for this, and you will have to manage your usage yourself in order to stay within the usage limits.

A variety of options are available to manage your usage:

  • You can submit a large number of jobs at once, but use PBS job dependencies to divide the jobs into smaller groups, so that each group is under the usage limit for the allocation (see the sketch after this list).  More information is available in the “How to Use PBS Job Dependencies” section of the Flux Torque PBS web page.
  • If you are using PBS job arrays, you can specify a “slot limit” to limit how many of the individual jobs in the array can run simultaneously.  Array jobs are particularly useful if you are doing parameter sweeps or otherwise running the same code many times with different input files or parameters.  To use PBS job arrays with slot limits, add a percent sign followed by the slot limit to the end of the job array specification in your PBS script.  For example, “#PBS -t 1-100%4” will submit the job 100 times, but will ensure that only four of them are running at any point in time (see the array sketch after this list).  More information is available in the “How to Use Job Arrays” section of the Flux Torque PBS web page and the Adaptive Computing web page on job arrays.
  • Submit only a few jobs at a time, staying under the usage limit for concurrently running jobs.  Wait for the jobs to complete before submitting additional jobs.
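
A minimal sketch of the dependency approach, assuming Torque’s “qsub -W depend=afterany:” syntax and hypothetical script names (afterany releases the next job when the previous one finishes, regardless of its exit status):

# qsub prints the job ID, which the next submission depends on
first=$(qsub group1.pbs)
second=$(qsub -W depend=afterany:$first group2.pbs)
third=$(qsub -W depend=afterany:$second group3.pbs)

And a sketch of an array job with a slot limit of 4, assuming your program reads numbered input files (myprogram and its inputs are placeholders):

#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux
#PBS -l nodes=1:ppn=1,walltime=2:00:00
#PBS -t 1-100%4

# $PBS_ARRAYID is the Torque array index (1 through 100 here)
cd $PBS_O_WORKDIR
./myprogram input.$PBS_ARRAYID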

Can I use the LSA public allocations instead of another allocation?

Yes.  You can send email to hpc-support@umich.edu and ask to be removed from the other allocations in order to become eligible to use one or more of the LSA public allocations.  Note that you only need to do this if the other allocation is of the same service type as the LSA public allocation you want to use.  For example, if you are in a Standard Flux allocation named someproject_flux and you want to use lsa_flux, you will need to be removed from someproject_flux first.  However, you can use lsa_fluxm and lsa_fluxg without being removed from someproject_flux, as long as you do not also have access to some other Larger Memory Flux or GPU Flux allocation.

Please send any questions or requests to hpc-support@umich.edu.