Request an Account
Once you have a Gatorlink, you can request an account on the form here: https://www.rc.ufl.edu/get-started/hipergator/request-hipergator-account/
List Pam as your sponsor.
Storage
/blue/soltis
- We currently have ~50TB of space
- We are unlikely to add more; /blue storage is $140/TB/yr!
- This is primarily for current, active projects. If you will not be working on a project for more than a few months, move its files to /orange/soltis.
- Each user has their own directory
- You can check your usage with the blue_quota command (see the example after this list).
- The /blue/soltis/share/ folder can be used for sharing, common data, etc.
- If you have important data that you need regular access to and that should be backed up, keep it in /blue/soltis/share/Soltis_Backup/. This is the only location on /blue/soltis that is backed up in any way.
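A minimal sketch of checking your usage from the command line, assuming (as on HiPerGator) that the quota tools are provided by the ufrc environment module:
module load ufrc    # assumption: blue_quota is supplied by the ufrc module
blue_quota          # prints usage against the /blue/soltis quota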
/orange/soltis
- The /orange filesystem is somewhat slower than /blue, so it is not ideal for active data used by running jobs.
- We currently have ~150TB of space.
- Each user can create their own directory in /orange/soltis/<gatorlink>.
- You can check your usage with the orange_quota command.
- This is a great place to archive data you are not actively working on.
- It is much cheaper than /blue: only $25/TB/yr.
- One copy of all raw sequence data should go in /orange/soltis/SequenceBackup.
- This folder is backed up 2-3 times a week.
- When you receive sequence data, let Matt know where it is located and he will copy it there.
- Data in this folder should be read only.
- Please provide a README file (see the sketch after this list) that describes:
- Taxa and herbarium voucher information
- Library prep type and date
- Sequencing type and date
- Barcode information as applicable
- Other information that will be helpful to someone trying to reuse your data or to you when you go to submit your data to the SRA.
- If you have important data that you do not access regularly, you can put it in /orange/soltis/Backup_and_archive.
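A hypothetical sketch of creating such a README for your sequence data (the file name, fields, and values below are all placeholders):
cat > README.txt << 'EOF'
Taxa / vouchers: <taxon names and herbarium voucher numbers>
Library prep:    <kit or protocol>, <YYYY-MM-DD>
Sequencing:      <platform and read type>, <YYYY-MM-DD>
Barcodes:        <index or barcode scheme, if applicable>
Notes:           <anything useful for reuse or for SRA submission>
EOF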
Suggested file/folder organization
A few suggestions for organizing files and folders (an example layout follows this list):
- One folder per project. When you leave, I will archive your data by compressing each folder in /blue/soltis/user. If you organize data into projects, one compressed archive corresponds to one project.
- Use dates in filenames, ISO format is preferred: YYYY-MM-DD
- No spaces or special characters in filenames.
- Add a README to each folder explaining the contents, work done, links to git repos, etc.
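As a hypothetical example of this layout (the project name and date are made up, and your directory is assumed to be /blue/soltis/<gatorlink>):
cd /blue/soltis/$USER
mkdir -p 2024-03-01_Tragopogon_target_capture/{raw_reads,assembly,phylogeny}
touch 2024-03-01_Tragopogon_target_capture/README.txt    # describe contents, work done, git repos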
Storage Backup
As noted above, the ONLY places that are backed up in any way are:
- /blue/soltis/share/Soltis_Backup/: Use for important data that you need regular access to and that should be backed up.
- /orange/soltis/Backup_and_archive/: Use for important data that you do not access regularly.
- /home: Research Computing does maintain a daily snapshot of your home directory for one week. See this video for accessing your snapshots.
- /orange/soltis/SequenceBackup: For archiving raw sequence data. Should be set to read only to minimize accidental data changes.
When you leave the lab
- When you leave, we can keep your GatorLink active by requesting an affiliation through Research Computing. Please work with Matt to take care of this if you think you will need access to HiPerGator for more than a few months after you leave.
- Please clean up your data! Delete files you no longer need, organize what you do want to keep. Make sure others will be able to understand what each folder contains and where your data are.
- Move files to /orange/soltis/former_members/ and compress. Ask Matt for help.
- If Matt does this, it will be done using the command:
tar cjvf /orange/soltis/former_members/$user/$outname.tar.bz2 $outname
- Where $user is your username and $outname is the folder name.
- To decompress, use
tar xvf file.tar.bz2
- Expect that even if you maintain an active GatorLink, at some point, Matt will archive your data! We typically do not delete things, but each folder in your /blue/soltis/gatorlink folder will be compressed and archived in /orange/soltis/former_members/gatorlink. We simply cannot afford to keep data in /blue forever if it is not actively being used. Current lab members need that space for active research. Please help by taking care of this before you leave!
Running jobs
Most jobs on HiPerGator are submitted to the scheduler to run.
See the UFRC Wiki page with sample SLURM scripts for examples of different types of job scripts.
When requesting resources for your job, it is important to keep in mind that we all share the same resources and there are limits to what is available.
slurmInfo
To view current limits and usage, the slurmInfo command is helpful. Here’s an example of how to run it:
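A minimal sketch, assuming slurmInfo is provided by the ufrc environment module:
module load ufrc    # assumption: slurmInfo ships with the ufrc module
slurmInfo           # reports the group's investment (cores and RAM) and what is currently in use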
Investment details
When this was run, the output showed that the lab had 102 cores and 358 GB of RAM in its investment. Of that, 19 cores and 72 GB of RAM were in use.
When you submit a job, you need to request cores (usually with the --cpus-per-task flag) and RAM (usually with the --mem flag). Please do not request way more cores or RAM than your job will actually use. This wastes resources and prevents others from doing work. See an example below for how we frequently request more RAM than needed.
Everyone is sharing these resources. Please send an email to the lab listserv if you plan on using a large fraction of the available resources.
Burst QOS
In addition to the investment, we can make use of idle resources on the cluster using the burst QOS. Jobs are limited to 4 days (96 hours) and may take longer to run, but we have 9X more resources available (both CPUs and RAM). Large jobs should try to use these resources, especially if they will finish in under 4 days.
To submit a job to the burst QOS, add
#SBATCH --qos=soltis-b
to your job script, or submit your job with
sbatch --qos=soltis-b myscript.sh
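Here is a minimal job-script sketch using the burst QOS; the workload at the end is a placeholder, so swap in your own module loads and commands:
#!/bin/bash
#SBATCH --job-name=burst_example
#SBATCH --qos=soltis-b             # run on burst resources instead of the investment
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=96:00:00            # burst jobs are capped at 4 days
#SBATCH --output=burst_example_%j.log
module load python                 # placeholder module
python my_analysis.py              # placeholder command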
Memory Requests
Our group is commonly limited more by memory than CPUs. This is partially because many of our applications use lots of RAM. But this is also because we sometimes are not careful in specifying reasonable memory requests. As an example, here’s a summary of the jobs run in August and September of 2019:
This summary showed the number of jobs that requested a given fold more memory than was actually used. So, 2,064 jobs (~29%) requested over 1,000 times more RAM than they actually used!
When people have jobs pending because we are limited by RAM, having a job running that is using a tiny fraction of the RAM set aside for it is just wasteful! Yes, you do need to make sure you request more than you need, but 2X is more than enough! 10X is already excessive! Overall, I would read this to suggest that 93% of our jobs are inefficient when it comes to memory requests!
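One way to see how much memory a finished job actually used is SLURM's sacct command (the job ID below is a placeholder):
sacct -j 12345678 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
# Compare MaxRSS (memory actually used) to ReqMem (memory requested);
# for similar future jobs, a --mem request of roughly 1-2X MaxRSS is plenty.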
CPU requests
Serial jobs
Many applications and most Python and R scripts will only use a single core. Please do not request more cores than your job will use.
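For a typical single-core script, a request along these lines is usually enough (the memory and time values are placeholders to adjust for your job):
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4gb
#SBATCH --time=24:00:00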
Threaded applications
Many applications, like assemblers and phylogenetics programs, can use multiple cores, but all of those cores need to be on the same physical server (node). For jobs like this, the resource request should be similar to:
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
This would provide 8 cores for your job to run. Most applications also need to be told how many cores to use. Be sure to specify this. It may be easiest to use the $SLURM_CPUS_ON_NODE variable for this.
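For example, the command in your job script might pass that variable to the program's thread option (my_assembler and --threads are placeholders; check your application's manual for its actual flag):
my_assembler --threads $SLURM_CPUS_ON_NODE input.fastq    # placeholder program and flag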
Please make sure to test that your application will use the cores efficiently. My favorite example of this is blastn. Under many conditions, it actually goes slower with more cores; using more than one not only wastes resources, but slows down searches! Ask people, read manuals, check for yourself!
MPI applications
There are relatively few of these in our research. RAxML-ng is one exception; please be sure to read: https://github.com/amkozlov/raxml-ng/wiki/Parallelization
Matt is also happy to help with job scripts for this application; there are some counterintuitive settings that dramatically impact performance. Doing some preliminary testing is important!
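As a very generic sketch only (not RAxML-ng-specific advice; follow the link above and test before settling on values), an MPI job typically sets --ntasks to the number of MPI ranks and launches the program with srun:
#SBATCH --nodes=2
#SBATCH --ntasks=8                 # number of MPI ranks
#SBATCH --cpus-per-task=4          # threads per rank, if the program also uses threading
#SBATCH --mem-per-cpu=2gb
srun my_mpi_program                # placeholder; srun starts one copy per MPI rank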