Using Conda Environments
...
Requesting a GPU
==================
You will need to be a member of the gpuq2 queue. Request to be added to gpuq2.
Do Not Run Jupyter on the Login Nodes
Base Conda Environment
Follow Qs 45: How do I install my own conda environment without root access?
Custom Conda Environment
...
source /usr/local/bin/s3proxy.sh
module load anaconda3/2022.10
source activate myenv
conda install -c anaconda jupyter
After installation, do the following to launch an interactive job and start Jupyter on the compute node:
qsub -I -q gpuq2 -l select=1:ncpus=1:ngpus=1:mem=12gb,walltime=0:13:00
jupyter-notebook --no-browser --port=8555 --ip=$(hostname -s)
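Because the interactive job can start on any GPU node, the tunnel command has to name the node you actually landed on. A minimal sketch you could run inside the interactive job, before starting Jupyter, to print the matching tunnel command. The helper name `print_tunnel_cmd` is hypothetical; the login host `clogin1.rcs.griffith.edu.au` and the default port 8555 are taken from the example on this page.

```shell
#!/bin/bash
# Hypothetical helper: print the ssh tunnel command to run on your local machine.
# Arguments (all optional): port, compute node short name, username.
# Defaults: port 8555, this node's short hostname, and your current username.
print_tunnel_cmd() {
    local port="${1:-8555}"
    local node="${2:-$(hostname -s)}"
    local user="${3:-$(whoami)}"
    echo "ssh -N -f -L ${port}:${node}:${port} ${user}@clogin1.rcs.griffith.edu.au"
}

# Usage, inside the interactive job:
# print_tunnel_cmd 8555
```

Copy the printed line into a terminal on your local machine; it is the same command shown in the next step, with the node name already filled in.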
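In a separate terminal on your local machine (Terminal on Mac/Linux, Command Prompt on Windows), type the following to set up an SSH tunnel. Replace <port> with the port Jupyter is using, <node> with the compute node your interactive job is running on (run hostname inside the job to find it), and s123456 with your own username:
ssh -N -f -L <port>:<node>:<port> s123456@clogin1.rcs.griffith.edu.au
e.g. ssh -N -f -L 8555:gn061:8555 s123456@clogin1.rcs.griffith.edu.au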
It should return nothing if successful.
Now open a web browser on your laptop/desktop and paste in the URL printed by Jupyter, using the http://127.0.0.1:... (localhost) form of the URL, including its token.
Running as a batch job
An example PBS script (pbs.jupyter) is as follows:
#!/bin/bash
#PBS -N jupyterN
#PBS -m abe
#PBS -M YourEmail@griffithuni.edu.au
##PBS -N cembd_jlab
##PBS -q workq
#PBS -q gpuq2
#PBS -W group_list=deeplearning -A deeplearning
###Other options group_list=aspen -A aspen
### Number of nodes:Number of CPUs:Number of threads per node.
###If not using a GPU, you should not request ngpus
#PBS -l select=1:ncpus=16:ngpus=1:mem=100gb,walltime=600:00:00
###PBS -l select=1:ncpus=32:ngpus=0:mem=100gb,walltime=300:00:00
### Add current shell environment to job (comment out if not needed)
# The job's working directory
cd $PBS_O_WORKDIR
module load python/3.8.8
module load gcc/4.9.3
source /usr/local/bin/s3proxy.sh
unset PYTHONPATH
source venv_temp_py38/bin/activate
# get tunneling info
XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
cluster="clogin1"
##Please change below port as it may be in use
##choose your own unique port between 8000 and 9999
port=8895
cd $PBS_O_WORKDIR
# print tunneling instructions to tunnel.$JJID.txt (job ID without the .cadmin suffix)
JJID=`echo $PBS_JOBID|sed 's/\.cadmin//g'`
echo -e "
Command to create ssh tunnel:
ssh -N -f -L ${port}:${node}:${port} ${user}@${cluster}.rcs.griffith.edu.au
Use a Browser on your local machine to go to:
localhost:${port}  (prefix w/ https:// if using password)
" >tunnel.$JJID.txt
# load modules or conda environments here
module load anaconda3/2021.11
source activate myenv
# Run Jupyter
jupyter-notebook --no-browser --port=${port=5678
...
} --ip=${node} 2>&1 | tee jupnote.$JJID.log
qsub pbs.jupyter
qsub prints a job ID (e.g. 218157). Once the job is running, listing the files in $PBS_O_WORKDIR (the directory you submitted from) will show two files:
tunnel.JOBID.txt (e.g. tunnel.218157.txt) and jupnote.JOBID.log (e.g. jupnote.218157.log)
The tunnel file contains the ssh command for tunnelling, and the jupnote file contains the actual web address (including the login token).
cat tunnel.JOBID.txt
You will see something like this:
ssh -N -f -L 8895:gn061:8895 s123456@clogin1.rcs.griffith.edu.au
cat jupnote.JOBID.log
You will see something like this:
http://127.0.0.1:8895/?token=9d109cd760cd214d689825d87db60302103712acb4560921
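If you prefer not to read the two files by hand, the lookup can be scripted. A minimal sketch with two hypothetical helpers: `job_number` drops everything after the first dot of the job ID (similar to the `sed 's/\.cadmin//g'` line in the script), and `notebook_url` prints the first 127.0.0.1 URL with a token from the log. The URL pattern is an assumption based on the `http://127.0.0.1:PORT/?token=...` form shown above.

```shell
#!/bin/bash
# Hypothetical helpers for locating the batch job's Jupyter URL.

# Strip the server suffix from a PBS job ID, e.g. "218157.cadmin" -> "218157".
job_number() {
    echo "${1%%.*}"
}

# Print the first http://127.0.0.1:PORT/...token=... URL found in a jupnote log file.
notebook_url() {
    grep -oE 'http://127\.0\.0\.1:[0-9]+/[^ ]*token=[0-9a-f]+' "$1" | head -n 1
}

# Usage, in the directory you submitted from, once the job is running:
# id=$(job_number "$(qsub pbs.jupyter)")
# cat "tunnel.${id}.txt"
# notebook_url "jupnote.${id}.log"
```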
Lastly, run the ssh command from the tunnel file on your local machine, then open a web browser on your laptop/desktop and paste in the URL from the jupnote log.
Note (only if needed): you can instead run one of the following commands on your local machine to set up port forwarding to the GPU node directly.
For GPU node gn060:
ssh -CNL 8889:localhost:8889 s123456@gn060.rcs.griffith.edu.au
For GPU node gn061:
ssh -N -f -L 8889:gn061:8889 -J s123456@clogin1.rcs.griffith.edu.au s123456@gn061
Note that we selected port 8889 in the above commands to connect to the notebook. If you don't specify a port, Jupyter defaults to 8888, but that port may already be in use on either the remote machine or the local one (i.e., your laptop). If the port you select is unavailable, you will get an error message; just pick another one. It is best to keep it above 1024, since lower ports require root privileges.
Consider starting with 8888 and incrementing by 1 if it fails, e.g. try 8888, 8889, 8890 and so on. If you are running on a different port, substitute your port number for 8889.
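The "start at 8888 and increment by 1" search can be automated. A minimal sketch, assuming bash: it uses bash's built-in `/dev/tcp` redirection to test whether anything on this machine already accepts connections on 127.0.0.1, so it only checks the machine it runs on and only the loopback address (a process bound to another interface, such as Jupyter started with `--ip=${node}`, would not be detected). The function name `find_free_port` and the 100-port search limit are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: print the first port, starting at $1 (default 8888),
# on which nothing on this machine accepts connections via 127.0.0.1.
find_free_port() {
    local port="${1:-8888}"
    local last=$(( port + 100 ))   # give up after trying 100 ports
    while [ "$port" -le "$last" ]; do
        # If the connect succeeds, something is already listening: try the next port.
        if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
            echo "$port"
            return 0
        fi
        port=$(( port + 1 ))
    done
    return 1
}

# Usage: port=$(find_free_port 8888) && echo "Using port $port"
```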
FAQ and Troubleshooting
To see which process is listening on a port (e.g. an old tunnel still holding it):
lsof -i tcp:<port>
e.g. for port 8889 the output looks like this:
lsof -i tcp:8889
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ssh 32099 .......................... localhost:ddi-tcp-2 (LISTEN)
ssh 32099 ......................... TCP localhost:ddi-tcp-2 (LISTEN)
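Here ddi-tcp-2 is the service name lsof shows for port 8889 (add -P to display port numbers instead). To kill the process, use the PID from the second column: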
kill -9 32099
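The two steps above can be combined. A minimal sketch, assuming `lsof` is installed on your local machine: `-t` prints only PIDs and `-sTCP:LISTEN` restricts the match to listening sockets. The function name `close_tunnel` is hypothetical. It sends the default SIGTERM, which is normally enough to stop an `ssh -N` tunnel; fall back to `kill -9` only if that fails.

```shell
#!/bin/bash
# Hypothetical helper: stop whatever process is listening on TCP port $1
# (typically a leftover "ssh -N -f -L" tunnel) on this machine.
close_tunnel() {
    local port="$1"
    local pids
    # sort -u removes duplicate PIDs (e.g. one process listening on IPv4 and IPv6).
    pids=$(lsof -t -i "tcp:${port}" -sTCP:LISTEN 2>/dev/null | sort -u)
    if [ -z "$pids" ]; then
        echo "Nothing is listening on port ${port}" >&2
        return 1
    fi
    echo "Stopping PID(s) ${pids} on port ${port}"
    kill $pids   # unquoted on purpose: there may be several PIDs, one per line
}

# Usage: close_tunnel 8889
```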