Obtaining a Simpool Account
- If you are a member of ACAL actively involved in research, you may be eligible to use the simpool. E-mail an admin with your id_rsa.pub and uniquename to request an account.
- You must have an RSA public key generated on the machine from which you plan to access the simpool. This file is typically called id_rsa.pub and is located in /home/user/.ssh/ after you generate it. You can generate an SSH key on a typical Linux distro by entering the following command (in general, you don't want to use a passphrase when generating your key):
ssh-keygen -t rsa
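The key setup described above can be wrapped in a small helper so that an existing key is never overwritten. This is a minimal sketch, assuming a POSIX shell and OpenSSH; `ensure_ssh_key` is a hypothetical name, and the empty passphrase follows the advice given above.

```shell
#!/bin/sh
# ensure_ssh_key is a hypothetical helper, not part of OpenSSH.
# Optional argument: directory holding the key pair (default: ~/.ssh).
ensure_ssh_key() {
    dir="${1:-$HOME/.ssh}"
    key="$dir/id_rsa"
    if [ ! -f "$key" ]; then
        mkdir -p "$dir" && chmod 700 "$dir"
        # -N "" sets an empty passphrase; -q suppresses progress output.
        ssh-keygen -q -t rsa -N "" -f "$key"
    fi
    # Print the public key, which is the part you send to an admin.
    cat "$key.pub"
}
```

Calling `ensure_ssh_key` with no arguments prints the contents of ~/.ssh/id_rsa.pub, creating the key pair first if it is missing.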
HTCondor is the job scheduling system on the simpool. Please read the documentation thoroughly before attempting to launch jobs on the simpool; the manual has plenty of information about submitting jobs using HTCondor. With HTCondor, jobs are described in a submit file and are submitted using:
condor_submit <submit_file>
There are many variables that can be set inside the submit file, and they are described in the manual. Here, we describe some of the more important ones and give an example submit file. The following submit file, key-value.submit, launches three separate instances of gem5, each with its own command line arguments. Note: in this example, the output and error files will be empty, as gem5 has been directed to pipe its stdout and stderr to a different location. Also, the log files will be placed in whichever directory your jobs are submitted from, unless you specify a path along with the filename in log/output/error.
executable = /home/atgutier/key-value-3d/gem5-memcached/build/ARM/m5-atgutier.fast
universe = vanilla
log = memcached.log.$(Cluster).$(Process)
output = memcached.output.$(Cluster).$(Process)
error = memcached.error.$(Cluster).$(Process)
getenv = true
arguments = -e -r --outdir=/home/atgutier/results/flash_get_64 /home/atgutier/key-value-3d/gem5-memcached/configs/example/fs.py --machine-type=VExpress_EMM --cpu-type=arm_inorder --caches --clock=1GHz --mem-read-latency=25000ns --mem-write-latency=250000ns --kernel=vmlinux-3.3-arm-vexpress-emm-pcie --etherdump=memcached_64.pcap -b memcached-64
queue
arguments = -e -r --outdir=/home/atgutier/results/flash_get_128 /home/atgutier/key-value-3d/gem5-memcached/configs/example/fs.py --machine-type=VExpress_EMM --cpu-type=arm_inorder --caches --clock=1GHz --mem-read-latency=25000ns --mem-write-latency=250000ns --kernel=vmlinux-3.3-arm-vexpress-emm-pcie --etherdump=memcached_128.pcap -b memcached-128
queue
arguments = -e -r --outdir=/home/atgutier/results/flash_get_256 /home/atgutier/key-value-3d/gem5-memcached/configs/example/fs.py --machine-type=VExpress_EMM --cpu-type=arm_inorder --caches --clock=1GHz --mem-read-latency=25000ns --mem-write-latency=250000ns --kernel=vmlinux-3.3-arm-vexpress-emm-pcie --etherdump=memcached_256.pcap -b memcached-256
queue
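The three arguments/queue pairs in this example differ only in the request size (64, 128, 256), so a file like this can be generated rather than written by hand. The following is a minimal sketch, assuming a POSIX shell; `make_submit`, `GEM5_ROOT` and `RESULTS` are illustrative names introduced here, not HTCondor features, and the paths are the ones from the example above.

```shell
#!/bin/sh
# Illustrative variables (not part of HTCondor); paths taken from the example.
GEM5_ROOT=/home/atgutier/key-value-3d/gem5-memcached
RESULTS=/home/atgutier/results

# Print a submit file with one arguments/queue pair per size given as argument.
make_submit() {
    # \$ keeps $(Cluster) and $(Process) literal so HTCondor expands them.
    cat <<EOF
executable = $GEM5_ROOT/build/ARM/m5-atgutier.fast
universe = vanilla
log = memcached.log.\$(Cluster).\$(Process)
output = memcached.output.\$(Cluster).\$(Process)
error = memcached.error.\$(Cluster).\$(Process)
getenv = true
EOF
    for size in "$@"; do
        printf 'arguments = -e -r --outdir=%s/flash_get_%s %s/configs/example/fs.py --machine-type=VExpress_EMM --cpu-type=arm_inorder --caches --clock=1GHz --mem-read-latency=25000ns --mem-write-latency=250000ns --kernel=vmlinux-3.3-arm-vexpress-emm-pcie --etherdump=memcached_%s.pcap -b memcached-%s\n' \
            "$RESULTS" "$size" "$GEM5_ROOT" "$size" "$size"
        echo "queue"
    done
}

# Write the submit file for the three sizes used in the example.
make_submit 64 128 256 > key-value.submit
```

The generated key-value.submit can then be handed to condor_submit. Adding a fourth experiment only requires another size on the last line.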
Submit File Commands
- executable: The executable you want to run. It can be a program binary, a script, etc.
- universe: This is the execution universe in which the job will run. The universe can signify things such as which resources must be available for the job. Because all of the nodes in the simpool contain the same software and have a shared home directory, the vanilla universe is typically used. The different types of universes are described in detail in the HTCondor manual.
- notification: By default HTCondor will send notification e-mails. If you want to disable notifications, set notification = never.
- log: Specifies the log file for the HTCondor job. In the example, $(Cluster) expands to the job's cluster number and $(Process) to the job's process number within that cluster, so each job gets its own log file.
- output: Specifies the stdout file for the HTCondor job. In the example, the job's cluster number and process number are appended to the file name.
- error: Specifies the stderr file for the HTCondor job. In the example, the job's cluster number and process number are appended to the file name.
- getenv: If getenv is set to true, the user's environment at submission time is used as the job's environment when it is launched. If you don't want the entire environment to be preserved, and instead want to specify certain environment variables manually, you can do so with the environment command. This is described in more detail in the manual.
- arguments: The command line arguments to be passed to the program specified in executable.
- queue: Signals HTCondor to queue up the job.
- periodic_remove: Allows you to set conditions for the automatic removal of your jobs. This is particularly useful if you wish to limit the amount of wallclock time your job will run. For example, the following removes a job once it has run for more than runtime seconds, where runtime is a placeholder for the limit in seconds:
periodic_remove = (RemoteWallClockTime - CumulativeSuspensionTime) > runtime
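Since the limit in the periodic_remove expression is given in seconds, it can help to compute it from hours. A small sketch, assuming a POSIX shell; `wallclock_limit` is a hypothetical helper name and the 12-hour limit is only an example value.

```shell
#!/bin/sh
# wallclock_limit is a hypothetical helper: it prints a periodic_remove line
# for a limit given in whole hours.
wallclock_limit() {
    hours="$1"
    seconds=$((hours * 3600))
    echo "periodic_remove = (RemoteWallClockTime - CumulativeSuspensionTime) > $seconds"
}

# Example: limit each job to 12 hours of wallclock time.
wallclock_limit 12
# prints: periodic_remove = (RemoteWallClockTime - CumulativeSuspensionTime) > 43200
```

The printed line is meant to be placed in the submit file before the queue commands it should apply to.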
Useful Condor Commands
- condor_rm: To remove a specific job, give its cluster.proc id (which can be obtained via the condor_q command); to remove all of your jobs, give your user id.
- condor_q: To display all jobs in the HTCondor queue.
- condor_status: To see the status of all nodes available to HTCondor.
- condor_release: To release your held jobs.
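A typical session with the HTCondor command line tools (condor_q, condor_status, condor_rm and condor_release) might look like the sketch below. The job id 1234.0 is a made-up example in cluster.proc form, and the check for condor_q is only there so the script does nothing on a machine without HTCondor installed.

```shell
#!/bin/sh
# Hypothetical job id in cluster.proc form, as printed by condor_q.
job_id="1234.0"

if command -v condor_q >/dev/null 2>&1; then
    condor_q                  # list jobs in the queue
    condor_status             # list the nodes and their current state
    condor_rm "$job_id"       # remove one specific job
    condor_release "$USER"    # release all of your held jobs
    # condor_rm "$USER"       # uncomment to remove ALL of your jobs
else
    echo "HTCondor tools not found; run this on the simpool" >&2
fi
```

The line that removes all of your jobs is left commented out on purpose, since it cannot be undone.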
- The simpool is not a personal data storage device, so you should store anything important on your own machine. Any data you leave on the simpool is subject to deletion at any time, and without warning, if we need to clear up space.
- /home should be considered volatile, as the file system is still very experimental and data on it may be lost at any time. That said, we will do our best to keep it up.
- Clean up after yourself. If you use an outlandish amount of disk space (more than tens of GB), you will be warned and your data may be deleted at an administrator's discretion. Repeat offenders may have their accounts removed.
For returning simpool users
- The OAR batch scheduler has been replaced by HTCondor. Your OAR scripts will not work.
- /z/work is gone; all experiments are now run out of the home directory.
- You need a new user account; the old ones are no longer valid.