
Locker User Guide


Using Locker

Locker is a cost-optimized, high-capacity, large-file storage service for research data. Locker provides high performance for large files and allows investigators across U-M to connect their data to the computing resources necessary for their research, including U-M’s HPC clusters.

Features

  • Locker is available to researchers from any academic unit.
  • Locker can be accessed from Linux and Mac OSX computers via NFS, and from Windows and Mac OSX computers via CIFS.
  • Locker space can be purchased in 1TB increments.
  • Locker uses the Globus File Transfer Service.
  • Locker does not yet provide the option of secure storage for regulated and/or sensitive data, but support is on the roadmap.
  • Locker offers optional replication and snapshots of stored research data.

Getting Started

To Request or Modify a Locker Storage Volume

To request or modify a Locker CIFS or NFS volume, use the order forms linked in the Order Service section below.

Globus Server Endpoint

Locker can be made available on existing ARC-TS Globus servers to provide high-performance transfers, data sharing, and access to Locker from off campus. To access Locker via Globus, request that your Locker volume be added to Globus.

ARC-TS Compute System Support

Locker can be accessed from any ARC-TS compute service that supports the same data classifications as your export. To have your Locker export added to an ARC-TS resource, contact us with the export name and system name. At a minimum, Locker will be available on all login and data transfer nodes.

Mounts will be located at
/nfs/locker/<export-name>/

Research groups may also request system group creation to control group access to Locker volumes.

Optional Features

Replication – (Recommended) Optional second copy of all data in a different geographic location.

Snapshots – (Highly Recommended) Tracking of how data in a volume changes over time, allowing users to recover deleted, modified, or otherwise damaged data.

Access snapshots at:
<mount-location>/.snapshots/<date>
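
For example, a deleted or damaged file can usually be copied back out of a recent snapshot. A minimal sketch, assuming snapshots are enabled on the volume; the date and file paths are placeholders:

# list the available snapshot dates
ls <mount-location>/.snapshots/
# copy the older version of a file back into the live volume
cp <mount-location>/.snapshots/<date>/project/data.csv <mount-location>/project/data.csv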

Using Locker

Mounting on Windows CIFS
Instructions provided when provisioned

Mounting on Linux NFS
Instructions provided when provisioned

Mounting on Apple OSX
Instructions provided when provisioned

Group Access Controls

Linux Set GID

Using Set GID (SGID) on a directory forces all files created in that directory to inherit the group of the parent directory, even if the creating user’s primary or effective group is different. Combined with the creation of a group on shared systems, this means that all files will be created owned by, and accessible (by default) to, members of that group.

# list the groups you belong to
groups
# change the group owner of the shared directory
chgrp <groupname> folder
# set the SGID bit so new files and subdirectories inherit the group
chmod g+s folder
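
A quick check that the setup worked (a sketch; the folder name is illustrative): the SGID bit shows as an ‘s’ in the group permissions, and files created afterwards should carry the directory’s group rather than your primary group.

# the group permissions should now show 's', e.g. drwxrws---
ls -ld folder
# a newly created file inherits the directory's group
touch folder/testfile
ls -l folder/testfile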

Windows AD Groups

Contact hpc-support@umich.edu

Policies

Small File Limitation

Locker’s target audience is research projects with massive data volumes stored in large files. Because of this design, each 1 TByte of Locker capacity provides only 1 million files; e.g., 10 TByte provides 10 million files. This works out to an average file size of 1 MByte.
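
Before requesting space, it can be worth checking whether an existing dataset fits this design. A minimal sketch (the dataset path is a placeholder); if the total size in MBytes divided by the file count is well below 1, the data should be bundled (e.g. with tar) or placed on another service:

# count the files in the dataset
find /path/to/dataset -type f | wc -l
# total size of the dataset
du -sh /path/to/dataset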

Sensitive Data — ePHI/HIPAA/ITAR/EAR/CUI

Locker does not currently support PHI or other regulated and/or sensitive data types. It is scheduled to be reviewed for such support at a later date.

System Abuse

Abuse of Locker, intentional or not, may result in performance or access being limited to preserve performance and access for other users. In the event this happens, staff will contact the affected users to engineer solutions.

Frequently Asked Questions

Q: How do I check Locker space and file usage?
A: From a Linux or OSX terminal, use:

    Space: df -h <mount-path>
    Files: df -h -i <mount-path>

Q: Can Locker be Mounted on All ARC-TS Cluster Compute Nodes?
A: Currently we do not allow Locker to be mounted by very large numbers of clients. This could change in the future, so let us know if this would help you. For now, we recommend using Globus to stage data between cluster scratch and Locker between runs. Globus provides a CLI so these transfers can be scripted, as in the sketch below.
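
A minimal sketch of scripting such a staging step with the Globus CLI; the endpoint UUIDs, paths, and label are placeholders, not real ARC-TS values:

# authenticate once per machine
globus login
# recursively copy input data from the Locker endpoint to cluster scratch
globus transfer <locker-endpoint-uuid>:/example-lab/inputs \
    <cluster-endpoint-uuid>:/scratch/example-lab/run01 \
    --recursive --label "stage inputs for run01"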

Q: Can I Simultaneously Access Locker from Linux and Windows?
A: Currently a Locker volume supports NFS (Linux) or CIFS (Windows); Apple OSX can use either. Simultaneous NFS and CIFS access to the same volume is known as multi-protocol access. Because Linux and Windows have different permission schemes, this is complex to manage; we don’t currently support it on Locker, but we do support it on Turbo. At this time, we recommend using Globus to ease data movement between Locker and systems that cannot mount it natively.

Q: Why can’t we use Locker as general purpose storage?
A: To maintain performance, encryption, and professional support at a low cost, Locker’s design is not well suited to general-purpose primary storage. For that, see the Turbo and MiStorage services.

Q: I deleted data but Locker still reports full?
A: Likely your export has snapshots enabled. Snapshots store changes to Locker exports over time, so deleted data is just ‘moved’ to a snapshot. Eventually snapshots will age out and free space on their own. Snapshot consumption does count against the volume space used. To delete or disable snapshots and free space early, contact support.

Q: I have free space but Locker reports full?
A: Likely you are at your file quota and your average file size is smaller than 1 MByte. This use case is outside Locker’s design, and the small files should move to another storage service.

Q: I don’t see my .snapshots folder?
A: Your volume might not have snapshots enabled. If it does, the folder is hidden; on Linux and OSX terminals, use ls -a to view all files, including hidden ones. How to show hidden files in the OSX and Windows graphical interfaces varies by version and can be found in their documentation and online.

Q: My volume shows 2x the size I requested!
A: The system Locker is built on tracks all copies of data in its file system. If a volume requests replication (two copies of all data), the reported total includes both the primary and the replica copy. Thus 1TB of new data will consume 2TB of Locker space.

Advanced Topics

System Configuration

Locker consists of two DDN GS14KX-E GRIDScaler clusters running IBM Spectrum Scale. Each cluster is located in a different data center, with dedicated fiber for data replication between the two sites. Each GS14KX-E cluster can hold 1680 hard drives, for a usable capacity of 10PB using 8TByte drives. Each hard drive is 7200 RPM and self-encrypting, and drives can be added to the system online. If at capacity, additional GS14KX-E units can be clustered to add performance and capacity.

By not including dedicated metadata or flash/NVMe storage, we are able to keep the cost of Locker lower than other solutions such as Turbo. As a result, Locker will not perform well with small IO operations and is built for capacity; this is why we offer both services. The GS14KX-E does support adding NVMe/flash for metadata and tiering at a later date, should the price of such devices become more reasonable.

Locker is directly connected to the Data Den archive via dedicated data movers, and to the ARC-TS research network by two IBM Cluster Export Services (CES) nodes, or protocol nodes. Each CES node has a 100Gbps network connection, and the pair works in an active-active high-availability configuration. Outside the ARC-TS network, performance is limited to 40Gbps by the campus backbone.

Citing and Grants

Order Service

Locker is now available on a pilot basis. Potential pilot users should contact hpc-support@umich.edu.

The rate for Locker will be $40.09 per terabyte per year.

Contact hpc-support@umich.edu with any questions.

 

To order Locker, the following information is required:

  • Amount of storage needed (1TB increments, 10TB minimum)
  • MCommunity Group name (group members will receive service-related notification, and can request service changes)
  • Shortcode for billing
  • NFS
    • Hostnames or IP addresses for each permitted user on the wired U-M network. (If forward and reverse records exist in DNS, please use the fully qualified hostname. If the records do not exist, provide the IP address.)
    • Numeric user ID of person who will administer the top level Locker directory and grant access to other users
  • CIFS
    • UMROOT AD Group Name
  • Specify if regulated or sensitive data will be used
  • Specify if your Locker account should be accessible on the Flux HPC cluster

Fill out this form to order Locker CIFS.

Fill out this form to order Locker NFS.


Data Den User Guide


Using Data Den

Data Den is a service for preserving electronic data generated from research activities.  It is a low cost, highly durable storage system and is the largest storage system operated by ARC-TS.


Data Den is currently available as a free pilot service.

Order Service

Contact hpc-support@umich.edu and include the following:

  • Amount of storage needed in 1TB increments
  • MCommunity Group name (group members will receive service-related notification, and can request service changes)
  • Numeric user ID of person who will administer the top level directory and grant access to other users

Contact hpc-support@umich.edu with any questions.



Data Den Research Archive


Data Den is a service for preserving electronic data generated from research activities. It is a low cost, highly durable storage system and is the largest storage system operated by ARC-TS.

Data Den is a disk-caching, tape-backed archive optimized for data that is not regularly accessed for extended periods of time (weeks to years). Data Den does not replace active storage services like Turbo and Locker for data that is accessed regularly and frequently for research use.

Data Den can be part of a well-organized data management plan providing international data sharing, encryption, and data durability.

Getting Started

Requesting or Modifying a Data Den Storage Volume

Contact hpc-support@umich.edu

Please include the following:

  • Amount of storage needed in 1TB increments
  • MCommunity Group name (group members will receive service-related notification, and can request service changes)
  • Numeric user ID of person who will administer the top level directory and grant access to other users

Globus Server Endpoint

Data Den supports the use of Globus servers to provide high performance transfers, data sharing and access to Data Den from off campus.  To access Data Den via Globus, request your volume be added to Globus.

Bundling Files

Because of the design of Data Den, projects will often need to be bundled to form larger single-file archives. The most common tool to do this with is tar. tar can also optionally compress the data, though compression can take much longer.

The following command will bundle all files in a directory, store them in the file bundle.tar.bz2, and compress the result with bzip2. It will also create a small text file, bundle.index.txt, that can be stored alongside the bundle to quickly reference which files it contains.

tar -cvjf bundle.tar.bz2 directory | tee bundle.index.txt

To extract the bundle:

tar -xvjf bundle.tar.bz2

Optionally, omit -j to save the time spent compressing, and omit -v to avoid printing the file list as the bundle runs (note that the -v output is what tee captures into bundle.index.txt).
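
To check what a bundle contains without extracting it, search the saved index or list the archive directly (the file name searched for is illustrative):

# fast: search the index file saved alongside the bundle
grep results.csv bundle.index.txt
# slower: list the contents of the archive itself
tar -tvjf bundle.tar.bz2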

Compressing an archive can be accelerated on multi-core systems using pigz and lbzip2.  The following will work on all ARC-TS systems:

tar --use-compress-program=lbzip2 -cvf bundle.tar.bz2 brockp | tee bundle.index.txt

To extract the bundle:

tar --use-compress-program=lbzip2 -xvf bundle.tar.bz2

Policies

Small File Limitation and Optimal Size

Data Den’s underlying technology does not work well with small files. Because of this design, each 1TB of Data Den capacity provides only 10,000 files, and only files 100 MByte or larger are migrated to tape. The optimal file size ranges from 10 to 200 GBytes.
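
To get a sense of which files in an existing export are large enough to migrate, something like the following can help (the path is a placeholder; -size +100M matches files larger than 100 MiB, which is close enough here):

# list files roughly 100 MByte or larger, with their sizes
find /nfs/locker/<export-name> -type f -size +100M -exec ls -lh {} +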

Maximum File Size

The maximum file size is 8 TByte, but files should ideally be no larger than 1 TByte. Larger archives can be split before uploading to Data Den with the split -b 200G filename command.
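
For example, a multi-TByte bundle could be split into 200 GByte pieces before upload and reassembled after retrieval. A sketch with illustrative file names:

# produces bundle.tar.bz2.part-aa, bundle.tar.bz2.part-ab, ...
split -b 200G bundle.tar.bz2 bundle.tar.bz2.part-
# after recalling the pieces, reassemble the original archive
cat bundle.tar.bz2.part-* > bundle.tar.bz2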

Sensitive Data — ePHI/HIPAA/ITAR/EAR/CUI

Data Den does not currently support PHI or other regulated and/or sensitive data types. It is scheduled to be reviewed for such support at a later date.

System Abuse

Abuse of Data Den, intentional or not (generally excessive recalls of data better suited to active storage), may result in performance or access being limited to preserve performance and access for other users. In the event this happens, staff will contact the affected users to engineer more appropriate solutions.

Frequently Asked Questions

Q: How do I check Data Den space and file usage?
A: Contact hpc-support@umich.edu

Q: How can I use Data Den in my data management plan?
A: Data Den can provide low-cost storage, and Globus sharing allows selected data on Data Den to be accessed anywhere in the world on demand.

Q: Why can’t Data Den be accessible directly from hosts or my personal machine?
A: Data Den could appear as a local disk, but this is not currently allowed because it presents many opportunities to disrupt the service for other users. This will be reevaluated in the future.

Q: Can Data Den encryption be used with my restricted data?
A: Always refer to the Sensitive Data Guide as the definitive source for allowed data types. Data Den is slated to support some types of sensitive data but is not currently approved. If your data type is not in the Guide, contact us to review.

Q: Why can’t we use Data Den as general purpose storage?
A: Data Den’s design is inherently linear and cannot respond to user requests as quickly as general-purpose storage does.

Q: I need to retrieve a lot (more than 50) of files from Data Den. Can you help me?
A: Yes, contact hpc-support@umich.edu with the list of files you want to recall, and we will optimize the recall from tape, speeding access.

Q: I use Locker heavily and want to automate moving data as it ages to my Data Den space. Can you help me?
A: Yes. Data Den and Locker are inherently connected.  Any Locker volume can potentially have a custom data migration policy automating data flow to Data Den.  Contact hpc-support@umich.edu to set up a custom migration policy.

Advanced Topics

System Configuration

Data Den consists of two IBM Spectrum Archive (LTFSEE) clusters. Each cluster is connected to a corresponding Locker cluster. Each LTFSEE cluster consists of an IBM TS4500 tape library with 20PB of uncompressed capacity and nine Linear Tape-Open (LTO) 8 drives. Each drive provides up to 360 MByte/s of uncompressed performance. Each library can also add additional frames and drives for new tape technology, increasing capacity and performance.

Each cluster holds a copy of all data, separated by miles, protecting it against fire and other disasters. Each cluster has two data movers whose purpose is to apply policies to specific exports in Locker. As data are written to these exports, the data movers move the largest and oldest files to tape in the TS4500 library, encrypting them in the process. The data mover leaves a 1 MByte stub, preserving the file layout and allowing the first 1 MByte of data to be read without involving the tape. When a migrated file is read, the data mover instructs the TS4500 to recall the data from tape and place it back on the faster Locker disk, transparently to the user.

Thus Data Den is able to ingest new data quickly to disk and, over time, send it to tape as it ages and goes unused. If data is recalled, it remains on disk until it is again migrated by policy.

Citing and Grants

TBD

Data Den is currently available as a free pilot service.

Order Service

Contact hpc-support@umich.edu and include the following:

  • Amount of storage needed in 1TB increments
  • MCommunity Group name (group members will receive service-related notification, and can request service changes)
  • Numeric user ID of person who will administer the top level directory and grant access to other users

Contact hpc-support@umich.edu with any questions.


ARC-TS Storage


Several levels of data storage are provided with an allocation of ARC-TS HPC services, varying by capacity, I/O rate, and longevity of storage.

  • /tmp — Local directory unique to each node; not shared. Best used for high-speed reads and writes of small files (less than 10GB).
  • /home — Shared across the entire cluster, with a quota of 80GB per user. Only for use with currently running jobs.
  • /scratch — Lustre-based parallel file system shared across all Flux nodes. Best used for large reads and writes of very large data files; checkpoint/restart files and large, frequently read or written data sets are common examples, as is code that uses MPI. See the ARC-TS /scratch information page for access and policy details.
  • AFS — A filesystem maintained and backed up by ITS. It is the only storage option available for Flux that is regularly backed up, and is therefore the most secure choice. It is only available on Flux login nodes and can provide up to 10GB of backed-up storage. Best used for storing important files; NOT available for running jobs on compute nodes. See the ARC-TS AFS information page.
  • Turbo — Turbo Research Storage is a high-speed storage service providing NFSv3 and NFSv4 access, available only for research data. Data stored in Turbo can be easily shared with collaborators when used in combination with the Globus file transfer service. Best used for storing research data. See the ARC-TS Turbo page.
  • Long-term storage — Users who need long-term storage can purchase it from ITS MiStorage; once established, it can be mounted on the Flux login and compute nodes. See the ITS MiStorage page.

Researchers in the Medical School and College of Literature, Science, and the Arts can take advantage of free or subsidized storage options through their respective academic units.

Locker Large-File Storage


Locker is a cost-optimized, high-capacity, large-file storage service for research data. Locker provides high performance for large files and allows investigators across U-M to connect their data to the computing resources necessary for their research, including U-M’s HPC clusters.

Locker can only be used for research data. It is tuned for large files (1MB or greater) but is capable of handling small files such as documents, spreadsheets, etc. Locker can be used in combination with the Globus data management sharing system for hosting and sharing data with external collaborators and institutes.

Locker is now available on a pilot basis. Potential pilot users should contact hpc-support@umich.edu.



OSiRIS


Open Storage Research Infrastructure (OSiRIS) is a collaboration between U-M, Wayne State University, Michigan State University and Indiana University to build a distributed, multi-institutional storage infrastructure that will allow researchers at any of our three campuses to read, write, manage and share large amounts of data directly from their computing facility locations on each campus.

By providing a single data infrastructure that supports computational access on the data “in-place,” OSiRIS meets many of the data-intensive and collaboration challenges faced by our research communities and enables these communities to easily undertake research collaborations beyond the borders of their own universities.

OSiRIS will use commercial off-the-shelf hardware coupled with CEPH software to build a high-performance software-defined storage system. The system is composed of a number of building-block components: storage head-nodes with SSDs plus 60-disk SAS shelves, a two-host RHEV cluster, Globus Connect servers, a perfSONAR network monitoring node, and reliable, OpenFlow-capable switches.

OSiRIS will deploy a software-defined storage (e.g., commodity hardware with storage logic abstracted into a software layer) service for our universities using the CEPH Storage Cluster as the primary means of organizing the storage hardware required.

OSiRIS is funded by a grant from the National Science Foundation; the Principal Investigator is Shawn McKee, Research Scientist in the Department of Physics and the Director of the Center for Network and Storage-Enabled Collaborative Computational Science (CNSECCS). CNSECCS and OSiRIS are operated under the auspices of the Michigan Institute for Computational Discovery and Engineering (MICDE).

 

Turbo User Guide


Using Turbo

Turbo is a high-capacity, fast, reliable and secure data storage service for researchers across U-M. Turbo is configured to be easily sharable with on-campus resources such as the Flux HPC cluster, as well as off-campus systems and collaborators. Researchers can purchase space on the service by filling out this form. For questions and support, email hpc-storage@umich.edu. More details can be found on the Turbo Specifications page.

Features

  • Turbo is available to researchers from any academic unit.
  • Turbo can be accessed from Mac OSX (Mavericks and Yosemite), Windows 7+, and Linux computers using NFSv3, NFSv4, and CIFS.
  • Turbo space can be purchased in 1TB increments
  • Turbo uses the Globus File Transfer Service
  • Turbo provides the option of secure storage for regulated and/or sensitive data
  • Turbo allows optional daily snapshots and backups of stored research data

Establish Service

See our Ordering Service page for details on how to request a Turbo volume.

Turbo costs $19.20 per terabyte per month, or $230.40 per terabyte per year, for replicated data. The cost for unreplicated data is $9.60 per terabyte per month, or $115.20 per terabyte per year. A U-M shortcode is required to order.

Permissions and Directories

When a volume is set up, the administrator associated with the account (i.e., specified by the Numeric User ID) will have full control to create directories and set permissions at the top level. This can be done with the standard Unix permission commands.

For assistance with permissions, email hpc-support@umich.edu.

File Storage and Transfer

Accessing Turbo via CIFS

Turbo Research Storage CIFS volumes can be accessed at the following path:

\\VOLUME-NAME.turbo.storage.umich.edu\VOLUME-NAME

 

For multi-protocol volumes, the UNC path is:

\\VOLUME-NAME-win.turbo.storage.umich.edu\VOLUME-NAME 
(e.g. \\flux-support-win.turbo.storage.umich.edu\flux-support)

 

To connect using Windows or Mac clients:

Windows
1. Double click on the “My Computer” icon on your desktop.
2. Click on the “Tools” menu and then select “Map Network Drive…”
3. In the first drop down menu, select the drive letter for the drive mapping. Choose a letter that is not currently in use and comes after the letter H, to avoid conflicts with other drives. In the second box, type the path to the file server as provided above.
4. Make sure to check the “Reconnect at login” box. This option will restore your drive mapping the next time you log in to your computer.
5. Click on the “Connect using a different user name” link underneath the “Reconnect at login” box and use the following settings:
• User Name: UMROOT\uniqname (substituting your actual uniqname for the word “uniqname”)
• Password: your UMICH Kerberos ( aka Level-1) password
6. Click the “OK” button.
7. Click the “Finish” button.

Macintosh
1. In the Finder, click on the “Go” menu and select “Connect to Server…”
2. In the “Server Address” field, type the server name and path: smb://VOLUME_NAME.turbo.storage.umich.edu/VOLUME-NAME
3. Click the “Connect” button.
4. When prompted, enter your uniqname and UMICH Kerberos ( aka Level-1) password.
5. Click the “Connect” button.
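
Terminal users can mount the same share with macOS’s built-in mount_smbfs command. A minimal sketch, assuming an empty mount point at ~/turbo and substituting your own uniqname and volume name:

mkdir -p ~/turbo
# quotes keep the shell from interpreting the semicolon that separates the UMROOT domain from the user name
mount_smbfs '//UMROOT;uniqname@VOLUME-NAME.turbo.storage.umich.edu/VOLUME-NAME' ~/turbo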

Globus

Globus, previously known as Globus Online, is a powerful and easy to use file transfer service that is useful for transferring files virtually anywhere. It is available for Turbo volumes that do not contain protected health information (ePHI).

For more information, please see our Globus File Transfer page.

Flux users can use the Globus service or the Flux transfer hosts.

Lost File Recovery

Turbo volumes that are configured with snapshots will save previous versions of files.  Only files which have been snap-shotted overnight are recoverable.  Files that are lost on the same day they were created may not be recoverable.

From Linux clients: To recover files lost from your directory, navigate to the .snapshot directory at the root of your share.

$ cd /nfs/turbo/flux-admin/.snapshot
$ ls -1
daily_2015-08-24-_23-30
daily_2015-08-25-_23-30
daily_2015-08-26-_23-30
daily_2015-08-27-_23-30
daily_2015-08-28-_23-30
daily_2015-08-29-_23-30
daily_2015-08-30-_23-30

You can navigate to the snapshot directories and copy files back to your file share.
Note: The .snapshot directory may not be visible until you try to enter it with cd.
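
For example, to restore a single file from the most recent snapshot in the listing above (the file path is a placeholder):

cp /nfs/turbo/flux-admin/.snapshot/daily_2015-08-30-_23-30/project/data.csv \
   /nfs/turbo/flux-admin/project/data.csv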

From Windows clients, you can recover lost files from snapshots natively:

  • Open the directory that the deleted file was held in.
  • Right click in the directory where the file or folder was stored and select “Properties”.
  • Click on the “Previous Versions” tab when the Properties window opens.
  • A list of snapshots will be displayed.
  • Select the snapshot from which you wish to restore data.
  • In the new window, locate the file(s) you wish to restore.
  • Drag the file(s) or folder to their correct locations.

Policies

Maintaining the overall stability of the system is paramount to us. System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee support on a 24 hour per day basis. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. For scheduled outages, we will announce them at least one month in advance on the ARC-TS home page; for unscheduled outages we will announce them as quickly as we can, with as much detail as we have, on that same page. You can also follow ARC-TS on Twitter (@arcts_um).

Frequently Asked Questions

How can I start using Turbo?
Visit our Ordering Turbo page for information on how to order.

How can I mount Turbo on Flux?
Be sure to check the “Available to Flux” box when ordering Turbo. To change the configuration of an existing volume, visit the ITS provisioning site.

Can I store sensitive data on Turbo?
Yes, Turbo is approved for certain types of sensitive data. For more information, see the Data Security section of the Turbo Specifications page.

How can I order Turbo if I’m an LSA faculty member, LSA lecturer, or LSA GSRA/GSRI?
Please request Turbo via the LSA Research Storage Portal.

Order Service

To order Turbo, the following information is required:

  • Amount of storage needed (1TB increments)
  • MCommunity Group name (group members will receive service-related notification, and can request service changes)
  • Shortcode for billing
  • NFS
    • Hostnames or IP addresses for each permitted user on the wired U-M network. (If forward and reverse records exist in DNS, please use the fully qualified hostname. If the records do not exist, provide the IP address.)
    • Numeric user ID of person who will administer the top level Turbo directory and grant access to other users
  • CIFS
    • UMROOT AD Group Name
  • Specify if regulated or sensitive data will be used
  • Specify if your Turbo account should be accessible on the Flux HPC cluster

Fill out this form to order Turbo CIFS.

Fill out this form to order Turbo NFS.



Turbo Research Storage


Turbo is a high-capacity, fast, reliable, and secure data storage service that allows investigators across U-M to connect their data to the computing resources necessary for their research, including U-M’s Flux HPC cluster. Turbo supports storage of sensitive data and can be used with ARC-TS’s Armis cluster.

Turbo can only be used for research data. It is tuned for large files (1MB or greater) but is capable of handling small files such as documents, spreadsheets, etc. Turbo in combination with Globus sharing should work well for sharing and hosting data for external collaborators and institutes.

Turbo costs $19.20 per terabyte per month, or $230.40 per terabyte per year, for replicated data. The cost for unreplicated data is $9.60 per terabyte per month, or $115.20 per terabyte per year. A U-M shortcode is required to order.

Researchers in the Medical School and College of Literature, Science, and the Arts can take advantage of free or subsidized storage options through their respective academic units.


Turbo Storage Configuration


Turbo Configuration

ARC-TS Turbo Research Storage provides scalable storage to University of Michigan researchers. The service provides CIFS, NFSv3 and NFSv4 access. Performance is intended to be sufficient for both IO operations and bulk file access, allowing researchers to work with data in place and avoid excessive data staging.

Turbo can only be used for research data. It is tuned for large files (1MB or greater) but is capable of handling small files such as documents, spreadsheets, etc. Turbo in combination with Globus sharing should work well for sharing and hosting data for external collaborators and institutes.

Turbo is available 24 hours a day, 7 days a week. When needed, routine, non-disruptive maintenance will be scheduled between 11 p.m. Saturday and 7 a.m. Sunday. Users will be notified of any planned maintenance involving service disruptions 30 days in advance, through the Turbo user group email and the ARC-TS twitter account (@arcts_um), or via the contact information submitted by users.

For support, email hpc-support@umich.edu.

Backup and Snapshot Capability

The Turbo service provides optional replication and snapshots. Snapshots provide customers with a read-only copy of their data that is frozen at a specific point in time. With replication, data is written to storage at a separate location for purposes of data protection and service continuity.

Snapshot capability may be requested for no additional fee, and is available with the following options:

  • Daily snapshot, retained for 1 day
  • Daily snapshot, retained for 3 days
  • Daily snapshot, retained for 7 days
  • Daily snapshot, retained for 7 days, and 1 bi-weekly snapshot.

Files can be restored from the “.snapshot” hidden directory in users’ top level departmental share. Permissions for that directory are set by the Turbo account holder. For help, email hpc-support@umich.edu.

Data Security

The service offers two security levels, one for regulated and/or sensitive data and one for non-sensitive data.

The more secure service includes safeguards required by HIPAA. Accordingly, you may use it to maintain Protected Health Information (PHI). To satisfy internal HIPAA requirements, consult with Information and Infrastructure Assurance (IIA), and they will work with you to document your data sets and their location. (Contact IIA via the ITS Service Center.) Complying with HIPAA’s requirements is a shared responsibility. Users sharing and storing PHI in Turbo Research Storage are responsible for complying with HIPAA safeguards, including:

  • Using and disclosing only the minimum necessary PHI for the intended purpose.
  • Obtaining all required authorizations for using and disclosing PHI.
  • Ensuring that PHI is seen only by those who are authorized to see it.
  • Following any additional steps required by your unit to comply with HIPAA.
  • Social Security numbers should only be used where required by law or where they are essential for university business processes. If you must use SSNs, it is preferred that you use institutional resources designed to house this data, such as the Data Warehouse. IIA can help you explore appropriate storage locations or work with you to appropriately encrypt the data if those alternatives will not work for you.

For more information on security policies, see the University Standard Practice Guide (PDF) and the IIA Sensitive Data Guide.

System Requirements

The following Operating Systems and mounting options are supported:

Operating System            CIFS    NFSv3   NFSv4   NFSv4 with Kerberos
Linux                       Yes     Yes     Yes     Yes
Mac OSX 10.9 (Mavericks)    Yes     Yes     Yes     Yes
Mac OSX 10.10 (Yosemite)    Yes     Yes     Yes     No
Windows 7+                  Yes     Yes     Yes     No
