An Introduction to HTCondor

The SSCC's HTCondor flock, Condor, makes a tremendous amount of computing power available to SSCC users. Condor can be used to run Stata, Matlab, R, C/C++, and Fortran jobs.

Complete documentation on HTCondor is available from the UW Computer Science Department. This article will give you specific information about our cluster, easy ways we've created to use Condor (including the ability to submit Stata jobs to Condor via the web), and an introduction to some of the basic Condor functions.

The Hardware

SSCC's HTCondor flock is currently made up of the five Linstat servers plus four additional servers dedicated to running jobs submitted to Condor. For details about the flock's specifications, please see Computing Resources at the SSCC. The Condor machines have access to Linux home and project directories just like Linstat. Most Linux programs do not have to be changed at all to run on Condor, though programs written to use Windows Stata, SAS, Matlab, or R will probably require modifications.

The Condor Software

HTCondor is an open source project at the University of Wisconsin's Computer Science Department. HTCondor groups computers into "flocks," and when you submit a job, HTCondor finds an available computer in the flock to run it. Thus you don't need to try to identify which computers are busy and which are not.

In a standard HTCondor flock, high priority jobs can preempt low priority jobs, with the progress of the low priority jobs being "checkpointed" (i.e. their progress is saved). Users who are running lots of jobs have their priority temporarily lowered, ensuring others have a chance to run jobs as well.

Unfortunately, checkpointing does not work with the statistical software used at the SSCC, so we've turned off the entire preemption mechanism. Thus the SSCC's HTCondor flock is not a scheduling system that decides when jobs should run and makes sure everyone can run jobs, but a matchmaking system that matches jobs with available computers. Because preemption is turned off, we must ask users to comply with our Server Usage policy for Condor:

You may submit up to 15 jobs to the SSCC HTCondor flock at any time. You may be able to submit additional jobs depending on how long your jobs will take to run and how many slots are unclaimed at the time you submit them. Use condor_status to find out how many slots are unclaimed.

Time your jobs will take to run    Total number of jobs you may submit
                                   is the maximum of 15 or...
< 3 hours                          The number of unclaimed slots
< 1 day                            3/4 of the number of unclaimed slots
> 1 day                            1/2 of the number of unclaimed slots
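
If you'd rather not do the arithmetic by hand, the table can be written as a small shell function. This is only a sketch: max_condor_jobs is a hypothetical helper, not an SSCC command. It assumes you give it the number of unclaimed slots reported by condor_status and your expected run time in whole hours. It also assumes that fractions are rounded down and that a job of exactly one day falls under the "> 1 day" row, neither of which the policy spells out.

```shell
# Hypothetical helper (not an SSCC command): applies the job-limit table.
# Usage: max_condor_jobs UNCLAIMED_SLOTS EXPECTED_HOURS
# Both arguments must be whole numbers.
max_condor_jobs() {
    unclaimed=$1
    hours=$2
    if [ "$hours" -lt 3 ]; then
        # Less than 3 hours: the number of unclaimed slots
        limit=$unclaimed
    elif [ "$hours" -lt 24 ]; then
        # Less than 1 day: 3/4 of the unclaimed slots (rounded down, an assumption)
        limit=$(( unclaimed * 3 / 4 ))
    else
        # 1 day or more (assumption for exactly 24 hours): 1/2 of the unclaimed slots
        limit=$(( unclaimed / 2 ))
    fi
    # You may always submit at least 15 jobs
    if [ "$limit" -lt 15 ]; then
        limit=15
    fi
    echo "$limit"
}

max_condor_jobs 60 2     # prints 60
max_condor_jobs 60 12    # prints 45
max_condor_jobs 60 48    # prints 30
max_condor_jobs 10 48    # prints 15
```

The last call shows the floor at work: with only 10 unclaimed slots, half of them would be 5, so the limit of 15 applies instead.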

Jobs that Use Multiple Processors

In order to assign jobs to servers efficiently, Condor distinguishes between jobs that use just one processor and thus can share a server with other similar jobs, and jobs that use multiple processors and thus run fastest if they can use all the processors a server has. Jobs submitted using a 'condor' command (as described below) are treated as single-processor jobs, and jobs submitted using a 'condormp' command are treated as multi-processor jobs. This only affects how jobs are allocated: a job submitted using a 'condor' command can still use multiple processors; it just might have to share them with other jobs. (If you had a job that uses multiple processors briefly but spends most of its time using one processor, submitting it using 'condor' might be ideal.)

SSCC's Linux servers have Stata/MP installed, so Stata will always use multiple processors. For other programs, you usually have to explicitly tell them to use multiple processors, but it's possible something like an R package might do that for you. If you're not sure whether your job uses multiple processors, start it on Linstat in background mode and type top to monitor it. If your job has multiple entries in the top output, or a single entry that uses more than 100% CPU time, it uses multiple processors.
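
The "more than 100% CPU" rule can be sketched as a one-line check. cpu_mode is a hypothetical helper, not an SSCC command: you pass it the %CPU value top shows for your job and it tells you which kind of job that suggests. It only covers the single-entry case; if your job shows up as several entries in top, it uses multiple processors regardless of what each entry reports.

```shell
# Hypothetical helper (not an SSCC command).
# Usage: cpu_mode PERCENT_CPU   (the %CPU value top shows for your job)
cpu_mode() {
    # awk handles the decimal values top reports, which plain shell tests cannot
    awk -v cpu="$1" 'BEGIN {
        if (cpu + 0 > 100) print "multi"
        else print "single"
    }'
}

cpu_mode 99.7     # prints single
cpu_mode 385.0    # prints multi
```

A job that prints "multi" here is a candidate for the 'condormp' commands. Keep in mind that a job may use multiple processors only part of the time, so it's worth checking more than once while it runs.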

Easy Ways to Submit Jobs to Condor

You can submit Condor jobs from any Linstat server.

Stata

To submit a Stata job to Condor, type:

condor_stata dofile

where dofile should be replaced by the name of the Stata do file you want to run. (You can also use the same syntax as running a batch job on the server you're using: condor_stata -b do dofile. The result will be the same.) Stata jobs submitted to Condor will use Stata/MP, the multiprocessor version of Stata.

Note that you can also submit Stata jobs to Condor via the web, completely avoiding the need to log into Linstat.

Matlab

To submit a Matlab job to Condor, type:

condor_matlab program.m program.log &

where program should be replaced by the name of the Matlab program you want to run. (The command submitted to the server is actually /software/matlab/bin/matlab -nojvm -nodisplay < program.m > program.log)

If your job uses multiple processors, type:

condormp_matlab program.m program.log &

R

To submit an R job to Condor, type:

condor_R program.R program.log &

where program should be replaced by the name of the R program you want to run. (The command submitted to the server is actually R < program.R > program.log --no-save)

If your job uses multiple processors, type:

condormp_R program.R program.log &

Other Jobs

Use condor_do to run any other simple Linux job. The syntax is simply:

condor_do "command" &

where command is any command you could type at the Linux prompt, including arguments. For example, if you wanted to run an R program called program.R with different arguments than condor_R uses you could type:

condor_do "R < program.R > program.log --vanilla" &

If your job uses multiple processors, type:

condormp_do "command" &
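
The commands in this section follow a simple naming pattern, which the sketch below summarizes. condor_wrapper is a hypothetical helper, not an SSCC command: it only prints the name of the command you would use and does not submit anything. It assumes the only job types are the ones described in this article, and because the article lists a single command for Stata, it always prints condor_stata for Stata jobs.

```shell
# Hypothetical helper (not an SSCC command): prints the name of the submit
# command described in this article. It does not submit a job.
# Usage: condor_wrapper JOB_TYPE MULTI
#   JOB_TYPE is stata, matlab, R, or do; MULTI is yes or no
condor_wrapper() {
    kind=$1
    multi=$2
    case "$kind" in
        stata)
            # The article lists only condor_stata, which runs Stata/MP
            echo "condor_stata" ;;
        matlab|R|do)
            if [ "$multi" = "yes" ]; then
                echo "condormp_$kind"
            else
                echo "condor_$kind"
            fi ;;
        *)
            echo "unknown job type: $kind" >&2
            return 1 ;;
    esac
}

condor_wrapper R yes         # prints condormp_R
condor_wrapper matlab no     # prints condor_matlab
condor_wrapper stata yes     # prints condor_stata
```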

Monitoring the Status of Condor Jobs

Condor will send a message to your preferred email address when your job is complete. There are also two commands that can tell you the status of the Condor flock or your job.

condor_status tells you the state of all the Condor machines, including whether they are available for new jobs.

condor_q tells you the status of all the jobs currently running or waiting to be run, including yours.

Managing Condor Jobs

If you change your mind, condor_rm can remove jobs from the Condor queue. You must be logged into the same Linstat server you used to submit the job in order to remove it.

condor_rm ID

will remove the job with the specified ID. Use condor_q to find the ID of your job.

condor_rm username

will remove all jobs belonging to you. For example, type:

condor_rm 151

to remove job 151, or

condor_rm rdimond

to remove all jobs belonging to rdimond. You cannot remove other people's jobs, for obvious reasons. Note that jobs are marked for removal immediately, but it may be a few minutes before they are actually removed.

Last Revised: 3/20/2017