Researchers in the College of Literature, Science, and the Arts have four options for using the Flux High Performance Computing cluster:
|Option||Cost||Wait for jobs to start||Special usage limits||Notes|
|Public LSA allocations||Free (paid for by LSA)||••••||Yes (cores per user, jobs per user, job size, maximum walltime)||Only for researchers who do not have access to another allocation for the same Flux service (e.g., Standard Flux). Resources are shared by researchers College-wide, so jobs may frequently wait to start.|
|Department or multiple-group allocation||$ (paid for by department or cooperating research groups)||••||Optional (up to purchaser)||Best value per dollar in most cases. Resources are shared only between researchers within the department or groups; whether a job waits to start depends on how the allocation has been sized relative to the needs of the researchers.|
| ||(paid for by researcher)||•||None||Both traditional (monthly) and “on demand” (metered) options are available. Purchased resources are not shared with other researchers, although jobs may have to wait to start if certain specific resource configurations are requested.|
|Flux Operating Environment||$$$ (paid for by researcher)||•||None||Typically used with external grants that require the purchase of computing hardware rather than services. Researchers purchase specific hardware for their exclusive use for 4 years. Custom hardware configurations (e.g., amount of memory per node) are possible.|
The College of Literature, Science, and the Arts provides three public Flux allocations to LSA researchers at no cost. A researcher can use one of the public allocations if they do not have access to another allocation for the same Flux service. For example, an LSA researcher can use lsa_fluxm if they do not have access to another Larger Memory Flux allocation.
|Allocation name||Service||Size||Usage limits|
|lsa_flux||Standard Flux||120 cores||Size (maximums, per person): Normal: 24 cores / 96 GB RAM. Non-busy times: 36 cores / 144 GB RAM. Runtime: maximum 4 core*months remaining across all running jobs (per person).|
|lsa_fluxm||Larger Memory Flux||56 cores||Only for jobs that need more memory or cores per node than possible under lsa_flux. Size: 56 cores / 1400 GB per job. Walltime: maximum 1 week per job.|
|lsa_fluxg||GPU Flux||2 GPUs||Only for jobs that use a GPU. Size: 1 GPU, 2 cores, 8 GB RAM per person. Walltime: maximum 3 days per job.|
Flux Hadoop is also available to everyone in LSA as a no-cost technology preview. More information is available on the Flux Hadoop page.
Uses of these allocations include but are not limited to:
- Running jobs for individual researchers who fit within the usage limits for the LSA allocations, particularly for individual researchers or students who do not have access to funding that could be used to purchase their own Flux allocation (for example, graduate students and undergraduates doing their own research).
- Testing Flux to determine whether to purchase a Flux allocation. (Note that PIs can also request a one-time, two-week trial allocation for this purpose by contacting email@example.com; trial allocations are 16 cores but are for the exclusive use of the PI’s research group.)
- Experimentation and exploration on an ad hoc basis of questions not necessarily tied to any particular research project, without needing to obtain funding and purchase a Flux allocation first.
The LSA public allocations are intended neither to replace nor to supplement other Flux allocations. Research groups who need more computation than is provided under the public allocation usage limits, or who need their jobs to start running faster than under the public allocations, should obtain their own Flux allocation. Shared allocations can also be obtained for use by multiple research groups across departments, centers, institutes, or other units. Graduate students in particular may want to use Rackham Graduate Student Research Grants to purchase their own private Flux allocation.
The LSA public allocations (lsa_flux, lsa_fluxm, lsa_fluxg) are not meant for use by anyone who has their own Flux allocation, nor by those who have access to another shared allocation such as a departmental allocation or an allocation for their center or institute.
LSA has imposed additional usage limits on its public allocations in order to avoid a single user (or a small group of users) monopolizing the allocations for extended periods of time to the detriment of other researchers who want to use the allocations.
LSA Flux support staff will periodically monitor jobs which are running under the LSA public allocations. Users who have running jobs exceeding the usage limit will receive an email asking them to delete some of their running jobs. Users who receive four or more such emails within 120 days may be temporarily or permanently removed from the allocations, at the discretion of LSA Flux support staff.
You can check your current usage of the LSA public allocations at any point in time, to determine if you are under the usage limits, by running the following command:
Only running jobs count against the usage limits; jobs which are idle or on hold do not count against the usage limits.
Each user of lsa_flux can use up to 24 cores or up to 96 GB of memory across all of their running jobs at any point in time. When cores and/or memory are idle (that is, not being used by other users), these limits are increased to 36 cores and 144 GB of memory.
Additionally, individual users are restricted to having no more than 4 core*months (2,880 core*hours) worth of jobs running at any one time. This limit is calculated by summing the product of the remaining walltime and number of cores for all of a given user’s running jobs, as shown by the command “showq -r -u $USER”. 4 core*months are sufficient to run a 4-core job for 28 days, an 8-core job for 15 days, a 16-core job for 7 days, and many other combinations.
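The core*hours arithmetic described above can be sketched in a few lines of shell. This is only an illustration: the function names `core_hours_in_use` and `within_limit` are hypothetical, the `CORES:HOURS` argument format is an assumption made for the example, and remaining walltime is rounded to whole hours. The 2,880 core*hour figure is the limit stated above.

```shell
#!/bin/bash
# Hypothetical helper: sum cores * remaining hours over a set of running jobs.
# Each argument is CORES:HOURS (whole numbers), e.g. 8:360 for an 8-core job
# with 360 hours of walltime remaining.
core_hours_in_use() {
    local total=0 job cores hours
    for job in "$@"; do
        cores="${job%%:*}"   # text before the colon
        hours="${job##*:}"   # text after the colon
        total=$(( total + cores * hours ))
    done
    echo "$total"
}

# 4 core*months expressed as core*hours, as stated in the text.
LIMIT_CORE_HOURS=2880

# Succeeds (exit status 0) when the given jobs fit within the limit.
within_limit() {
    [ "$(core_hours_in_use "$@")" -le "$LIMIT_CORE_HOURS" ]
}
```

For example, `core_hours_in_use 4:672` covers a 4-core job with 28 days (672 hours) remaining, and `within_limit 16:168 4:100` fails because the two jobs together exceed 2,880 core*hours.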
Important note: if a single job requests more than 24 cores or 96 GB memory (but less than 36 cores / 144 GB) when lsa_flux is busy, then this job will not start to run until lsa_flux becomes “not busy”, which could take days, weeks, or even longer. To avoid this, check whether lsa_flux is “busy” before submitting a job that is larger than 24 cores / 96 GB by running the freealloc command. If freealloc reports that more cores and more memory are available than you will be requesting in your job, then lsa_flux is not busy and you can request more than 24 cores / 96 GB without delaying the start of your job.
The requested walltime for each job under lsa_fluxm must be no more than 1 week (168 hours). This permits a single researcher to use the full lsa_fluxm allocation (all 40 cores / 1 TB RAM) in a single job, but it can also result in very long waits for jobs to start. Researchers who need jobs to start more quickly should either purchase their own Larger Memory Flux allocation, use lsa_flux (if they need 96 GB RAM or less), or use XSEDE.
Use of lsa_fluxm is restricted to jobs that require more memory or more cores per node than is possible under lsa_flux.
Each user of lsa_fluxg can run one job at a time, using a single GPU, up to 2 cores, and up to 8 GB RAM for a maximum of three days (72 hours). Use of lsa_fluxg is restricted to jobs that make use of a GPU.
Run the command “mdiag -u your-uniqname-here” on a Flux login node. If you see the LSA public allocation names as a part of ALIST, then you are able to run jobs under them.
[bjensen@flux-login1 ~]$ mdiag -u bjensen
evaluating user information
Name      Priority  Flags  QDef  QOSList*  PartitionList  Target  Limits
bjensen   0         -      flux  flux      -              0.00    -
  GDEF=dada
  EMAILADDRESSfirstname.lastname@example.org
  ADEF=default_flux
  ALIST=default_flux,lsa_flux,lsa_fluxm,lsa_fluxg,bigproject_flux
[bjensen@flux-login1 ~]$
If you are a member of LSA but the public allocation names do not show up in the ALIST for your Flux account, please contact email@example.com and ask to be added to the allocations.
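The ALIST check described above can also be scripted. The sketch below assumes the `ALIST=` field appears in the `mdiag -u` output as a comma-separated list with no spaces, as in the sample output above; the function name `in_alist` is hypothetical and is not part of Flux or Moab.

```shell
#!/bin/bash
# Hypothetical helper: read `mdiag -u <uniqname>` output on standard input and
# succeed (exit status 0) if the allocation named in $1 appears in ALIST.
in_alist() {
    local wanted="$1"
    grep -o 'ALIST=[^[:space:]]*' |   # isolate the ALIST=... field
        head -n 1 |                   # use the first match only
        cut -d= -f2 |                 # drop the "ALIST=" prefix
        tr ',' '\n' |                 # one allocation name per line
        grep -qx -- "$wanted"         # exact, whole-line match
}
```

On a Flux login node this might be used as `mdiag -u your-uniqname-here | in_alist lsa_flux && echo "lsa_flux is available"`. The exact match means that asking for `lsa_flux` is not satisfied by `lsa_fluxm` alone.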
To use lsa_flux, specify the following credentials in your PBS script:
#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux
To use lsa_fluxm, specify the following credentials in your PBS script:
#PBS -A lsa_fluxm
#PBS -q fluxm
#PBS -l qos=flux
To use lsa_fluxg, specify the following credentials in your PBS script:
#PBS -A lsa_fluxg
#PBS -q fluxg
#PBS -l qos=flux
For more information about PBS scripts, see the Flux PBS web page.
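As a rough sketch, a complete PBS script that uses the lsa_flux credentials might look like the following. Only the three credential lines come from this page; the job name, the resource request (4 cores, 4 GB per core, 24 hours), the other directives, and the commented-out program name are illustrative assumptions chosen to fit within the lsa_flux usage limits.

```shell
#!/bin/bash
# Credentials for the LSA public Standard Flux allocation (from this page).
#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux
# The directives below are illustrative assumptions, not site requirements.
#PBS -N example_job
#PBS -l nodes=1:ppn=4,pmem=4gb,walltime=24:00:00
#PBS -j oe
#PBS -V

# Torque sets PBS_O_WORKDIR to the directory the job was submitted from;
# fall back to the current directory when the script is run outside of PBS.
cd "${PBS_O_WORKDIR:-.}"

CORES_REQUESTED=4   # keep in sync with ppn above
echo "Starting example_job with ${CORES_REQUESTED} cores"

# Replace with the real workload (hypothetical program name):
# ./my_program
```

For lsa_fluxm or lsa_fluxg, swap in the corresponding account and queue lines listed above.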
Run the following command on a Flux login node to see what jobs are using (or waiting to use) lsa_flux:
showq -w acct=lsa_flux
Replace “lsa_flux” above with “lsa_fluxm” or “lsa_fluxg” as desired.
Because the LSA allocations are public resources, they can often be in high demand, resulting in jobs taking hours or even days to start, even if each individual user is under the usage limits. Options for getting jobs to start more quickly include:
- Purchase a Flux allocation. LSA provides cost-sharing for Flux allocations for all LSA faculty, students, and staff; regular-memory Flux allocations are thus available to members of LSA for only $6.60/core/month. Graduate students are encouraged to apply for Rackham Graduate Student Research Grants which may be (relatively) quick and easy to obtain.
- Ask your department, center, or institute about the possibility of a shared allocation funded by department discretionary funds, individual researcher contributions, or other sources.
- PIs can apply for a one-time trial allocation on Flux that lasts two weeks; contact firstname.lastname@example.org for more information.
- Use XSEDE. XSEDE has relatively large (up to 200,000 service unit) startup allocations that are fairly easy to obtain, requiring a CV and 1-page description of the research and how the research will utilize XSEDE. Research allocation requests are reviewed four times a year and are awarded based on the results shown from the startup and previous research allocations; research allocations can be larger than startup allocations. For more information, contact email@example.com.
The usage limits for the LSA public allocations are automatically enforced wherever possible, but you may run into problems and have to manage your usage yourself in order to stay within the usage limits if you are requesting more than 4 GB per core under lsa_flux.
A variety of options are available to manage your usage:
- You can submit a large number of jobs at once, but use PBS job dependencies to divide the jobs into smaller groups, so that each group is under the usage limit for the allocation. More information is available in the “How to Use PBS Job Dependencies” section of the Flux Torque PBS web page.
- If you are using PBS job arrays, you can specify a “slot limit” to limit how many of the individual jobs in the array can run simultaneously. Array jobs are particularly useful if you are doing parameter sweeps or otherwise running the same code many times with different input files or parameters. To use PBS job arrays with slot limits, add a percent sign followed by the slot limit to the end of the job array specification in your PBS script. For example, “#PBS -t 1-100%4” will submit the job 100 times, but will ensure that only four of them will be running at any point in time. More information is available in the “How to Use Job Arrays” section of the Flux Torque PBS web page and the Adaptive Computing web page on job arrays.
- Submit only a few jobs at a time, staying under the usage limit for concurrently running jobs. Wait for the jobs to complete before submitting additional jobs.
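The job array option can be sketched as a small PBS script. The `#PBS -t 1-100%4` line is the slot-limit example from the list above and the credential lines are those for lsa_flux; the job name, resource request, input file naming scheme, and program name are hypothetical. `PBS_ARRAYID` is the variable Torque sets to the index of each array element.

```shell
#!/bin/bash
#PBS -A lsa_flux
#PBS -q flux
#PBS -l qos=flux
# Illustrative job name and resource request (assumptions).
#PBS -N sweep
#PBS -l nodes=1:ppn=1,pmem=4gb,walltime=02:00:00
# 100 array elements, at most 4 running at any point in time.
#PBS -t 1-100%4

# Default to element 1 so the script can be dry-run outside of PBS.
TASK_ID="${PBS_ARRAYID:-1}"

# Hypothetical naming scheme: one parameter file per array element.
INPUT_FILE="params_${TASK_ID}.txt"

cd "${PBS_O_WORKDIR:-.}"
echo "Array element ${TASK_ID} would process ${INPUT_FILE}"

# Replace with the real workload (hypothetical program name):
# ./my_program "${INPUT_FILE}"
```

With one core per element and a slot limit of 4, at most 4 cores and 16 GB are in use at once, which stays under the normal lsa_flux limit of 24 cores / 96 GB.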
Yes. You can send email to firstname.lastname@example.org and ask to be removed from the other allocations in order to become eligible to use one or more of the LSA public allocations. Note that you only need to do this if the other allocation you are in is of the same service type as the LSA public allocation you want to use. For example, if you are in a Standard Flux allocation named someproject_flux and you want to use lsa_flux, you will need to be removed from someproject_flux first. However, you can use lsa_fluxg without being removed from someproject_flux as long as you do not also have access to some other Larger Memory Flux or GPU Flux allocation.
Please send any questions or requests to email@example.com.