This tutorial explains how to configure a basic Python environment and launch jobs on the Colosse supercomputer at Calcul Québec. The official Colosse wiki can be found at https://wiki.calculquebec.ca.
Table of contents:
- Connecting to the supercomputer
- File System
- Configuring your development environment
- Submitting jobs
- Other useful info
Connecting to the supercomputer
To connect to Colosse, open a terminal and use the following command.
This will connect you to a login node. You should be greeted with a message like this:
===========================================================================
Vous êtes sur un noeud de login de colosse (Calcul Québec).
- Nous n'effectuons pas de sauvegarde de vos fichiers.
- N'utilisez pas le noeud de login pour executer votre code.
This is a Calcul Québec login node for colosse.
- There is no backup of users files.
- Do not use this node to run code.
Rapportez tout problème à / Report any problems to: email@example.com
Documentation: https://wiki.calculquebec.ca/
Suivre sur Twitter/Follow on Twitter: https://twitter.com/CQ_Colosse
État des serveurs: http://serveurscq.computecanada.ca
===========================================================================
Login nodes are used to prepare/launch/monitor jobs and to move files around. Do not use them to run code.
File System
Colosse has several file systems, each with different properties. In short, I use SCRATCH to run experiments and RAP to store results and data that must not be lost.
- Scratch: This directory is placed on a parallel filesystem, Lustre (for Colosse, Mp2, Ms2 and Cottos) or GPFS (for Briarée and Guillimin). It is generally visible from all nodes. Using it is very fast for large files, but not very efficient for many small files. This is the appropriate place to store large files that you use for a few days or weeks only. Periodically, it may be automatically cleaned (files being deleted).
In your home directory, run the following commands:
mkdir $SCRATCH/$USER
mkdir $RAP/$USER
ln -s $SCRATCH/$USER scratch
ln -s $RAP/$USER rap
This creates symbolic links to your scratch and rap folders, which have complicated paths.
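The same setup can be scripted so that running it twice does no harm. The following is a minimal Python sketch of the steps above, not part of the original tutorial; the function name link_storage and its arguments are hypothetical.

```python
from pathlib import Path


def link_storage(home, roots, user):
    """Create <root>/<user> for each storage root and symlink it from home.

    `roots` maps a link name (e.g. "scratch") to a storage root directory.
    Existing directories and links are left untouched, so reruns are safe.
    Returns the names of the links that were newly created.
    """
    created = []
    for link_name, root in roots.items():
        target = Path(root) / user
        target.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
        link = Path(home) / link_name
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target)  # like `ln -s target link`
            created.append(link_name)
    return created
```

On Colosse, it could be called as link_storage(Path.home(), {"scratch": os.environ["SCRATCH"], "rap": os.environ["RAP"]}, os.environ["USER"]) after importing os, assuming those environment variables are set as in the commands above.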
Colosse provides a lot of preinstalled software, which is made available through modules.
Listing all available modules
Searching for a specific module
module spider [keyword]
For example, running module spider gcc returns:
-----------------
  compilers/gcc:
-----------------
    Versions:
      compilers/gcc/4.5
      compilers/gcc/4.6
      compilers/gcc/4.8
      compilers/gcc/4.8.5
      compilers/gcc/4.9
      compilers/gcc/5.4
To use a module, you must first load it using the following command:
module load [module name]
For example, module load compilers/gcc/4.8.5 loads version 4.8.5 of the gcc compiler.
Loading modules on login
Manually loading modules each time you log in is tedious and can be avoided by using a .bashrc file.
- Copy the bashrc file provided with this tutorial to the root of your home directory.
- Load it by running the following command, which will be run automatically on login.
Configuring your development environment
In addition to the scratch and rap directories, I like to have a dev directory, where I keep all my code repositories. Run the following commands at the root of your home directory.
mkdir dev
mkdir dev/git
Creating a virtual environment
First, create a virtual environment by using the following command at the root of your home directory.
Then, open up your .bashrc file and uncomment the following line in the Software section.
This will load your Python environment when you log in. Now, run source ~/.bashrc followed by which python. The last command should point to an executable in your virtual environment.
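The same check can be made from inside Python, which is handy within a job script. The sketch below is an illustration rather than part of the original tutorial; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Environments made by the venv module (and recent virtualenv releases)
    # keep the original installation in sys.base_prefix; older virtualenv
    # releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


print(sys.executable)   # path of the interpreter that is running
print(in_virtualenv())  # should be True once the environment is active
```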
First, run the following commands.
pip install --upgrade pip
pip install cython
Run the following command, which tells numpy where the MKL library is located.
cat > ~/.numpy-site.cfg << EOF
[mkl]
library_dirs = $MKLROOT/lib/intel64
include_dirs = $MKLROOT/include
mkl_libs = mkl_rt
lapack_libs =
EOF
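Because the EOF delimiter is unquoted, the shell expands $MKLROOT when the file is written, so the file should end up containing absolute paths. The file uses an INI-style layout, so a quick way to confirm its contents before building is to parse it with Python's configparser. This is a minimal sketch: read_mkl_section is a hypothetical helper, and /opt/intel/mkl is only a stand-in for whatever $MKLROOT expands to on your account.

```python
import configparser


def read_mkl_section(text):
    """Parse site.cfg-style text and return the [mkl] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["mkl"])


# Hypothetical contents, with /opt/intel/mkl standing in for $MKLROOT.
sample = """\
[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs =
"""

settings = read_mkl_section(sample)
print(settings["library_dirs"])
print(settings["mkl_libs"])
```

To check the real file, pass it the text of ~/.numpy-site.cfg instead of the sample string. A library_dirs value that starts with /lib/intel64 would suggest that MKLROOT was not set when the file was created.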
Then, go to the ~/dev/git directory and run the following commands.
git clone https://github.com/numpy/numpy.git
cd numpy
python setup.py install
Installing other useful packages
Run the following commands.
pip install ipython scipy scikit-learn h5py pandas
You can install any other package using pip.
Now, your environment is all set and you are ready to launch experiments!
Submitting jobs
Resource allocation project
First, determine what your resource allocation project is by running colosse-info. This prints a lot of information, including your compute allocations. In my case, it prints
RAPI nne-790-aa: 0 used cores / 30 allocated cores (recent history)
RAPI nne-790-ae: 39.2049 used cores / 180 allocated cores (recent history)
RAPI agq-973-aa: 0 used cores / 30 allocated cores (recent history)
RAPI kyk-164-aa: 0 used cores / 30 allocated cores (recent history)
but you might only have one. Pick the allocation you want to use and remember its identifier, e.g., nne-790-ae.
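If you have several allocations, the RAPI lines can be parsed to compare them. The sketch below assumes the line format shown above and uses that output as sample input; the function names are hypothetical, and picking the allocation with the most cores is only one possible choice.

```python
import re

# Matches lines such as:
#   RAPI nne-790-ae: 39.2049 used cores / 180 allocated cores (recent history)
RAPI_LINE = re.compile(
    r"RAPI\s+(?P<rapi>[\w-]+):\s+(?P<used>[\d.]+) used cores / "
    r"(?P<allocated>\d+) allocated cores"
)


def parse_allocations(text):
    """Return a list of (identifier, used_cores, allocated_cores) tuples."""
    return [
        (m.group("rapi"), float(m.group("used")), int(m.group("allocated")))
        for m in RAPI_LINE.finditer(text)
    ]


def largest_allocation(text):
    """Return the identifier of the allocation with the most allocated cores."""
    return max(parse_allocations(text), key=lambda a: a[2])[0]


sample = """\
RAPI nne-790-aa: 0 used cores / 30 allocated cores (recent history)
RAPI nne-790-ae: 39.2049 used cores / 180 allocated cores (recent history)
RAPI agq-973-aa: 0 used cores / 30 allocated cores (recent history)
RAPI kyk-164-aa: 0 used cores / 30 allocated cores (recent history)
"""

print(largest_allocation(sample))  # nne-790-ae
```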
Submitting a job to the scheduler
Now, open the example_job.msub file provided with this tutorial. The file header gives the scheduler some information about your job. For example, the header could be
#!/bin/bash
#PBS -l nodes=2:ppn=8,walltime=24:00:00
#PBS -o stdout.out
#PBS -e stderr.err
#PBS -V
#PBS -N myjob
#PBS -A nne-790-ae
In this case, the requested computing time is 24 hours. The job requires 2 nodes, with 8 CPU cores each. The stdout and stderr streams are redirected to user-specified files. The -V option exports your current environment variables to the job. The name of the job is myjob, and the resource allocation to use is nne-790-ae.
Copy the example_job.msub file to a directory called ~/scratch/example_job. Replace the resource allocation project identifier with yours. Then, submit the job using the following command.
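When launching many experiments, it can be convenient to generate this header instead of editing it by hand. The following is a minimal sketch that reproduces the header shown above; pbs_header is a hypothetical helper, not something provided with the tutorial, and the rest of the job script (the commands to run) still has to be appended.

```python
def pbs_header(name, account, nodes, ppn, walltime,
               stdout="stdout.out", stderr="stderr.err"):
    """Build the header lines of a job script from the given settings."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn={ppn},walltime={walltime}",
        f"#PBS -o {stdout}",   # file receiving the job's stdout
        f"#PBS -e {stderr}",   # file receiving the job's stderr
        "#PBS -V",             # export the current environment to the job
        f"#PBS -N {name}",     # job name
        f"#PBS -A {account}",  # resource allocation project
    ])


print(pbs_header("myjob", "nne-790-ae", nodes=2, ppn=8, walltime="24:00:00"))
```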
Our example job will run for 5 minutes, so it should not be queued for a long time.
Once the job is submitted, you can use the i command to list the jobs that are in the waiting queue, i.e., the IDLE state. The r command shows all the jobs that are running, and the b command shows all the jobs that are blocked, i.e., that the server refuses to run for the moment.
That’s it! You can now submit jobs on Colosse.
Other useful info
Logging in without a password
If you don’t want to type your password every time you connect to Colosse, do the following:
- If you don’t already have an SSH key for your computer, generate one by typing ssh-keygen in a terminal.
- Copy the key to the supercomputer using the following command:
- Enter your password when prompted and you’re done.
Copying files over ssh
To copy files to the supercomputer, you can use the scp utility. For example, you could use this command to copy a directory called mydir on your computer to the myremotedir directory on Colosse.
scp -r mydir firstname.lastname@example.org:myremotedir/
You could use a similar command to get the directory from Colosse:
scp -r email@example.com:myremotedir/mydir/ .