Operating Condor
Last content update July 4th, 2007

Condor is installed on the cluster, and is configured in such a way that jobs can be submitted from any computer connected to the TERAPIX network, including personal computers. Here is a small cookbook on how to submit jobs to the TERAPIX cluster.
-  First, make sure that Condor is installed and configured on your machine (otherwise follow this link to install and configure Condor). On a Unix system it is generally installed in /opt/condor or /usr/local/condor; from now on we refer to the Condor installation directory as $CONDOR_CONFIG. You can check that it is running with

% ps ax | grep condor

You should see at least one process called condor_master. If this is not the case, you might want to start the Condor system (as root if necessary) using

% $CONDOR_CONFIG/sbin/condor_master
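
The check above can also be scripted. The following is only a minimal sketch in POSIX sh; the function names has_condor_master and check_condor are made up for this page and are not part of Condor. The pattern is written as [c]ondor_master so that the grep process itself, which may show up in the ps listing, is not mistaken for the daemon.

```sh
# Read a process listing on stdin; succeed if a condor_master process is listed.
# (hypothetical helper, not a Condor command)
has_condor_master() {
    # "[c]ondor_master" matches "condor_master" but not grep's own command line
    grep -q '[c]ondor_master'
}

# Print a one-line diagnostic based on the process listing read from stdin.
check_condor() {
    if has_condor_master; then
        echo "condor_master is running"
    else
        echo "condor_master not found"
    fi
}

# Typical use:
#   ps ax | check_condor
```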

-  If the path is correctly set, typing condor_status from the shell should return something like:

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@efigix.ia LINUX       X86_64 Unclaimed  Idle       0.000   975  0+00:20:29
vm2@efigix.ia LINUX       X86_64 Unclaimed  Idle       0.410   975  0+03:25:08
vm3@efigix.ia LINUX       X86_64 Unclaimed  Idle       0.000   975  0+15:29:11
vm4@efigix.ia LINUX       X86_64 Unclaimed  Idle       0.000   975  0+15:28:31
vm1@mix10.iap LINUX       X86_64 Unclaimed  Idle       0.040  4026  0+03:25:07
vm2@mix10.iap LINUX       X86_64 Unclaimed  Idle       0.000  4026  0+15:28:29
...
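
To get a quick overview of what is free, the listing can be summarised per machine. This is a rough sketch that assumes the column layout shown above (State in the 4th column, Activity in the 5th); count_idle_slots is a hypothetical helper name, and machine names appear exactly as condor_status prints them, truncation included.

```sh
# Count Unclaimed/Idle slots per machine from condor_status output on stdin.
# (hypothetical helper; assumes the column layout shown above)
count_idle_slots() {
    awk '$4 == "Unclaimed" && $5 == "Idle" {
             k = split($1, part, "@")           # "vm1@efigix.ia" -> "vm1", "efigix.ia"
             host = (k == 2) ? part[2] : $1     # names without "@" are used as-is
             n[host]++
         }
         END { for (h in n) print h, n[h] }' | sort
}

# Typical use:
#   condor_status | count_idle_slots
```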

-  The command condor_submit is used to send jobs to the cluster. You may either provide the name of a "submission file" as an argument, or pipe its content to condor_submit. The following commented submission file shows how to run the system command ls and return the result in ls.out:

# what to run
executable     = /bin/ls
# standard job (not MPI, etc.)
universe       = vanilla
# command line arguments (separated with spaces)
arguments      = /
# where to write the result (from stdout)
output         = ls.out
# where to write the errors (from stderr)
error          = ls.error
# where to write the Condor log
log            = ls.log
# transfer files instead of relying on NFS or any other shared filesystem
should_transfer_files = YES
# mandatory for transferring files
when_to_transfer_output  = ON_EXIT_OR_EVICT
# go!
queue
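
Piping works well when the submission file is generated on the fly. Here is a minimal sketch in POSIX sh; emit_job is a hypothetical helper written for this page, and its three parameters (executable, arguments, base name for the output files) are our own choice.

```sh
# Print a simple submission file on stdout.
# (hypothetical helper: $1 = executable, $2 = arguments, $3 = base name of output files)
emit_job() {
    exe=$1
    args=$2
    name=$3
    cat <<EOF
executable              = $exe
universe                = vanilla
arguments               = $args
output                  = $name.out
error                   = $name.error
log                     = $name.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
queue
EOF
}

# Typical use (equivalent to the ls example):
#   emit_job /bin/ls / ls | condor_submit
```

Running emit_job alone, without the pipe, is a convenient way to inspect what would be submitted.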

-  Typing condor_q right after submitting the job should show something like:

-- Submitter: kiravix.iap.fr : <194.57.221.16:41872> : kiravix.iap.fr
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              
 39.0   bertin          7/4  16:12   0+00:00:00 I  0   9.8  ls /              

1 jobs; 1 idle, 0 running, 0 held

You may also use condor_q -global to list all jobs currently queued for execution (not only those sent from your machine), condor_q -long to list more details, or condor_q -better-analyze to get some hints if the job is not executed as planned. If everything works well, the job should vanish from the condor_q list after a few seconds, and files called ls.out, ls.error and ls.log should appear in the current directory.
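
For scripts that need to know whether jobs are still pending, the summary line printed by condor_q can be parsed. The sketch below assumes the exact summary format shown above ("N jobs; N idle, N running, N held"), which may differ in other Condor versions; queue_summary is a hypothetical helper name.

```sh
# Turn the condor_q summary line read on stdin into key=value pairs.
# (hypothetical helper; assumes the summary format shown above)
queue_summary() {
    sed -n 's/^\([0-9][0-9]*\) jobs; \([0-9][0-9]*\) idle, \([0-9][0-9]*\) running, \([0-9][0-9]*\) held.*$/total=\1 idle=\2 running=\3 held=\4/p'
}

# Typical use:
#   condor_q | queue_summary
```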

-  Here is a more complex "real-life" example which sends SExtractor as a "cluster of jobs". It queues 3 jobs, and sends the executable as well as configuration files and data:

#
# Condor submission file for PGC morphology
#
executable              = /usr/local/bin/sex
universe                = vanilla
transfer_executable     = True
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
log                     = pgc.log
arguments                = image1.fits -XML_NAME image1.xml -CATALOG_NAME image1.cat
transfer_input_files     = image1.fits,default.sex,default.param,default.conv
queue
arguments                = image2.fits -XML_NAME image2.xml -CATALOG_NAME image2.cat
transfer_input_files     = image2.fits,default.sex,default.param,default.conv
queue
arguments                = image3.fits -XML_NAME image3.xml -CATALOG_NAME image3.cat
transfer_input_files     = image3.fits,default.sex,default.param,default.conv
queue

Note the commas between the names of the files to be transferred, and the queue command that ends each job submission. Files created by the executable are automatically transferred back. The submission file above was actually written by a shell script called sex.x:

#! /bin/tcsh
echo "#"
echo "# Condor submission file for PGC morphology"
echo "#"
echo "executable              = /usr/local/bin/sex"
echo "universe                = vanilla"
echo "transfer_executable     = True"
echo "should_transfer_files   = YES"
echo "when_to_transfer_output = ON_EXIT_OR_EVICT"
echo "log                     = pgc.log"

foreach file ( $* )
set rfile = $file:r:t
echo "arguments                = "$file" -XML_NAME "$rfile".xml -CATALOG_NAME "$rfile".cat"
echo "transfer_input_files     = "$file",default.sex,default.param,default.conv"
#echo "transfer_output_files    =
#echo "output                  = "$rfile".out"
#echo "error                   = "$rfile".error"
echo "queue"
end

Using pipes we can now send SExtractor jobs for all images in the current directory using

% ./sex.x *.fits | condor_submit
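
For machines without tcsh, the same generator can be written in POSIX sh. This is only a sketch of an equivalent, not the script actually used at TERAPIX; sex_submit_file is a hypothetical function name, and the two parameter expansions reproduce what $file:r:t does in tcsh (strip the directory, then the extension).

```sh
# Print a submission file that queues one SExtractor job per image given as argument.
# (hypothetical POSIX sh counterpart of sex.x)
sex_submit_file() {
    cat <<'EOF'
#
# Condor submission file for PGC morphology
#
executable              = /usr/local/bin/sex
universe                = vanilla
transfer_executable     = True
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
log                     = pgc.log
EOF
    for file in "$@"; do
        base=${file##*/}      # strip the directory part   (tcsh :t)
        rfile=${base%.*}      # strip the last extension   (tcsh :r)
        printf 'arguments                = %s -XML_NAME %s.xml -CATALOG_NAME %s.cat\n' "$file" "$rfile" "$rfile"
        printf 'transfer_input_files     = %s,default.sex,default.param,default.conv\n' "$file"
        echo "queue"
    done
}

# Typical use:
#   sex_submit_file *.fits | condor_submit
```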

-  If a job is stuck or takes too long to complete, you might want to remove it from the queue with condor_rm. For instance,

% condor_rm bertin

removes all jobs owned by user bertin. Fortunately, condor_rm can be more selective:

% condor_rm 56.3

will remove only job #56.3.
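
When many jobs are queued, picking the job numbers by hand gets tedious. The sketch below is a dry run: it reads a condor_q listing and only prints the condor_rm commands for jobs in a given state, so that they can be reviewed before being executed. It assumes the column layout of the condor_q output shown earlier (state code in the 6th column, e.g. I for idle); rm_commands_for_state is a hypothetical helper name.

```sh
# Print (without executing) one condor_rm command per job in the given state.
# (hypothetical helper: $1 = state code as shown in the ST column, e.g. I)
rm_commands_for_state() {
    state=$1
    awk -v st="$state" '
        # job lines start with an ID of the form cluster.process, e.g. 39.0
        $1 ~ /^[0-9]+\.[0-9]+$/ && $6 == st { print "condor_rm " $1 }'
}

# Typical use: review the commands first, then pipe them to sh to run them
#   condor_q | rm_commands_for_state I
#   condor_q | rm_commands_for_state I | sh
```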


© Terapix 2003-2011