Installing JupyterHub with Microk8s
keywords: jupyterhub, microk8s, kubernetes, scalable, jupyter, python, nbgitpuller, docker, apache
Introduction
We need a scalable and flexible implementation of Jupyter for education and research. In educational settings, students typically require one or two CPU cores and a small amount of memory, whereas researchers often need many cores and large memory allocations. Our goal is to build a system capable of supporting up to 150 students in a single day, after which it can be reconfigured to serve a smaller group of researchers with higher resource demands.
To achieve this, we are utilizing two of our main departmental servers (jupiter and saturn) for computation, while a virtual server (pluto) manages orchestration and interfacing.
Having successfully used JupyterHub for several years, we chose to continue with it for this project. While JupyterHub scales well within a single server for smaller groups (<30 students), a more robust multi-node solution is needed. Therefore, we are leveraging Kubernetes to distribute JupyterHub and the spawned Jupyter instances across multiple servers.
For simplicity and efficiency, we are using Microk8s, a lightweight yet powerful Kubernetes distribution. Microk8s makes it easy to deploy Kubernetes on multiple nodes and includes all necessary tools for configuration and management, allowing us to quickly set up and maintain our system.
For setting up Microk8s, we followed most of the instructions from the official documentation:
For setting up JupyterHub with Kubernetes, we found the following guide particularly helpful:
This document provides all the steps required to set up the service, including brief explanations of certain choices and their rationale.
Storage
For education, all students receive their own home directory in a shared NFS storage located on jupiter and saturn. To simplify the setup, we are initially configuring the shared storage as local storage rather than NFS. This decision may be revisited in a later phase.
Web interface
The JupyterHub web interface is served via Apache on pluto and is initially configured in Kubernetes using the simplest method: NodePort. In the future, we plan to replace this with Ingress for improved flexibility and scalability.
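Concretely, Apache on pluto has to forward the public hostname to the NodePort that Kubernetes assigns to the JupyterHub proxy service. The sketch below is a hypothetical example rather than our actual configuration: port 30080 is a placeholder for the real NodePort (readable with `kubectl get svc proxy-public` once JupyterHub is installed), and the TLS certificate directives are omitted.

```bash
# Hypothetical sketch of the Apache reverse proxy on pluto (placeholder NodePort 30080,
# TLS directives omitted). The real NodePort is assigned by Kubernetes and can be read with:
#   kubectl get svc proxy-public -o jsonpath='{.spec.ports[0].nodePort}'
sudo tee /etc/apache2/sites-available/jupyterhub.conf > /dev/null <<'EOF'
<VirtualHost *:443>
    ServerName jupyterhub-imphys.tnw.tudelft.nl

    # forward websocket traffic (needed by the Jupyter kernels)
    RewriteEngine On
    RewriteCond %{HTTP:Upgrade} =websocket [NC]
    RewriteRule /(.*) ws://localhost:30080/$1 [P,L]

    ProxyPreserveHost On
    ProxyPass        / http://localhost:30080/
    ProxyPassReverse / http://localhost:30080/
</VirtualHost>
EOF
sudo a2enmod proxy proxy_http proxy_wstunnel rewrite
sudo a2ensite jupyterhub
sudo systemctl reload apache2
```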
Setup and control user

To enable multiple users to maintain this service, we have created a local, independent user `kube` with a separate home directory on each server:

- `/home/kube` on pluto
- `/home/kube_jupiter` on jupiter
- `/home/kube_saturn` on saturn
Docker container

The `jupyter/scipy-notebook` Docker container includes most of the packages needed for education and research. For educational purposes, we need to add the `nbgitpuller` package, which enables students to retrieve course materials via a provided URL. Therefore, we create a new Docker container based on `scipy-notebook`, incorporating `nbgitpuller`.
Authentication
Access to JupyterHub is restricted to users with a valid TU Delft account (netid). Authentication is handled via OpenID, which is supported by TU Delft. We have requested the necessary security credentials to enable this service. At present, it is not possible to restrict access to specific user groups, but we are hopeful that this feature can be implemented in the future.
Instructions

On all servers

- install `microk8s`

  ```bash
  sudo snap install microk8s --classic
  ```
- create `kube` user

  NOTE: the local `kube` users cannot share the home directory!

  on pluto:

  ```bash
  sudo useradd -m -d /home/kube -s /bin/bash kube
  ```

  on jupiter:

  ```bash
  sudo useradd -m -d /home/kube_jupiter -s /bin/bash kube
  ```

  on saturn:

  ```bash
  sudo useradd -m -d /home/kube_saturn -s /bin/bash kube
  ```

  on all servers:

  ```bash
  sudo usermod -aG sudo kube
  sudo usermod -aG microk8s kube
  ```
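  A quick check (not part of the original instructions) that the account exists and has the right group memberships:

  ```bash
  id kube
  ```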
- login as `kube`

  Log in with your own account via ssh and do:

  ```bash
  su - kube
  ```

  Or, if you have sudo rights and do not know the `kube` password:

  ```bash
  sudo -i -u kube
  ```
- setup shell for microk8s

  - fish: `~/.config/fish/conf.d/microk8s.fish`

    ```fish
    alias kubectl='microk8s kubectl'
    alias helm='microk8s helm'
    kubectl completion fish | source
    ```

  - bash: `~/.bashrc`

    ```bash
    alias kubectl='microk8s kubectl'
    alias helm='microk8s helm'
    source <(kubectl completion bash)
    ```

  Log out and log in again to activate.
- check status

  ```bash
  microk8s status
  ```

  If `microk8s` is not yet started:

  ```bash
  microk8s start
  ```

  Set permissions on `~/.kube`:

  ```bash
  sudo chown -R kube:kube $HOME/.kube
  ```

  Then run:

  ```bash
  microk8s inspect
  ```

  Solve any errors you encounter with `inspect`. If needed, you can create `localnode.yaml` with `touch`.

- configure firewall

  ```bash
  sudo ufw allow 16443
  sudo ufw allow 10250
  sudo ufw allow 10255
  sudo ufw allow 12379
  sudo ufw allow 10257
  sudo ufw allow 10259
  sudo ufw allow 19001
  sudo ufw allow 25000
  sudo ufw allow 4789/udp
  sudo ufw enable
  ```
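  After enabling the firewall, it does not hurt to confirm that the rules are active (a quick sanity check, not part of the original instructions):

  ```bash
  sudo ufw status numbered
  ```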
On a computer with Docker installed

As explained above, we need to build a custom Docker container to be used for starting Jupyter. This container must be made available on Docker Hub (docker.io) so the Kubernetes nodes can pull it.
- Create the following `Dockerfile`:

  ```dockerfile
  FROM quay.io/jupyter/scipy-notebook:latest
  RUN pip install --no-cache-dir nbgitpuller
  ```
- Build the Docker container for the right platform:

  ```bash
  docker buildx build --platform linux/amd64 . -t ronligt/jupyter-nbgitpuller:latest
  ```
- Push the container to the repository:

  ```bash
  docker login docker.io
  docker push ronligt/jupyter-nbgitpuller
  ```
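  Optionally, a quick local check (not part of the original instructions) that `nbgitpuller` is actually present in the freshly built image:

  ```bash
  docker run --rm ronligt/jupyter-nbgitpuller:latest pip show nbgitpuller
  ```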
On pluto

- turn off swap:

  ```bash
  sudo swapoff -a
  ```
- enable addons

  ```bash
  microk8s enable dns
  microk8s enable hostpath-storage
  microk8s enable dashboard
  ```
- check dashboard

  ```bash
  microk8s dashboard-proxy
  ```

  Create an ssh tunnel based on the information shown and use the given token to authenticate:

  ```bash
  ssh -L 10443:localhost:10443 pluto
  ```

  (`pluto` is defined in `.ssh/config` with a `proxyjump`.)

  Then visit https://localhost:10443 in your local web browser.
- add nodes

  For each worker node, do the following on pluto:

  ```bash
  microk8s add-node
  ```

  and follow the instructions.
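  `microk8s add-node` prints a `microk8s join` command containing a one-time token; run that command on the worker node (jupiter or saturn). The address and token below are placeholders, and the exact flags come from the printed instructions:

  ```bash
  # on the worker node, paste the command printed by `microk8s add-node` (placeholder values shown)
  microk8s join 192.0.2.10:25000/<token>
  ```

  Afterwards, verify on pluto that all nodes have joined:

  ```bash
  kubectl get nodes
  ```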
- label the worker nodes as such

  ```bash
  kubectl label nodes jupiter-imphys jupyterhub-node=true
  kubectl label nodes saturn-imphys jupyterhub-node=true
  ```

  These labels are used in the Kubernetes configuration to start the JupyterHub and Jupyter pods only on the main servers and not on the control server.
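  To confirm the labels were applied (a quick check, not part of the original instructions):

  ```bash
  kubectl get nodes -l jupyterhub-node=true
  ```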
- create storage for jupyterhub

  `jupyterhub-storage.yaml`:

  ```yaml
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: jupyterhub-user-pv
  spec:
    capacity:
      storage: 50Gi
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-storage
    local:
      path: /data
    nodeAffinity:
      required:
        nodeSelectorTerms:
          - matchExpressions:
              - key: jupyterhub-node
                operator: In
                values:
                  - "true"
  ---
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: jupyterhub-user-pvc
  spec:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: local-storage
    volumeName: jupyterhub-user-pv
  ```

  and then apply the configuration file:

  ```bash
  kubectl apply -f jupyterhub-storage.yaml
  ```
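  A quick sanity check (not part of the original instructions) that the volume exists and the claim is bound:

  ```bash
  kubectl get pv jupyterhub-user-pv
  kubectl get pvc jupyterhub-user-pvc
  ```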
- install jupyterhub with `helm`

  setup the `helm` repository:

  ```bash
  helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
  helm repo update
  ```

  create the configuration file `config.yaml`:

  ```yaml
  hub:
    nodeSelector:
      jupyterhub-node: "true"
    config:
      JupyterHub:
        authenticator_class: generic-oauth
      Authenticator:
        enable_auth_state: true
        allow_all: true
      GenericOAuthenticator:
        client_id: "jupyterhub-imphys.tnw.tudelft.nl"
        client_secret: "<SECRET>"
        oauth_callback_url: "https://jupyterhub-imphys.tnw.tudelft.nl/hub/oauth_callback"
        authorize_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/auth"
        token_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/token"
        userdata_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/userinfo"
        login_service: "TU Delft netID"
        scope:
          - openid
        username_claim: "netid"
  singleuser:
    nodeSelector:
      jupyterhub-node: "true"
    storage:
      type: static
      static:
        pvcName: jupyterhub-user-pvc
        subPath: "{username}"
    image:
      name: ronligt/jupyter-nbgitpuller
      tag: latest
    cmd: null
    memory:
      limit: 2G
      guarantee: 2G
    cpu:
      limit: 2
      guarantee: 2
  proxy:
    service:
      type: NodePort
  ```

  and use this config file with `helm`:

  ```bash
  helm upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub --version=4.1.0 -f config.yaml
  ```

  Later you can edit this `config.yaml` and update with `helm`:

  ```bash
  helm upgrade jupyterhub jupyterhub/jupyterhub -f config.yaml
  ```
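  To check that the hub and proxy pods come up, and to find the NodePort that Apache on pluto forwards to (a sanity check, not part of the original instructions; `proxy-public` is the service name used by the JupyterHub Helm chart):

  ```bash
  kubectl get pods
  kubectl get svc proxy-public
  ```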
Using nbgitpuller

Students can retrieve course material into their home directory using a URL created via this website:

Please read the instructions on this website.
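As an illustration, an nbgitpuller link has the following general form. The hostname matches our hub; the repository, branch, and `urlpath` values here are placeholders that the link-generator website fills in for a real course:

```
https://jupyterhub-imphys.tnw.tudelft.nl/hub/user-redirect/git-pull?repo=https://github.com/example-org/example-course&branch=main&urlpath=lab
```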