Installing JupyterHub with Microk8s

keywords: jupyterhub, microk8s, kubernetes, scalable, jupyter, python, nbgitpuller, docker, apache

Introduction

We need a scalable and flexible implementation of Jupyter for education and research. In educational settings, students typically require one or two CPU cores and a small amount of memory, whereas researchers often need many cores and large memory allocations. Our goal is to build a system capable of supporting up to 150 students in a single day, after which it can be reconfigured to serve a smaller group of researchers with higher resource demands.

To achieve this, we are utilizing two of our main departmental servers (jupiter and saturn) for computation, while a virtual server (pluto) manages orchestration and interfacing.

Having used JupyterHub successfully for several years, we chose to continue with it for this project. A single JupyterHub server works well for smaller groups (fewer than 30 students), but larger classes need a more robust multi-node solution. We therefore use Kubernetes to distribute JupyterHub and the spawned Jupyter instances across multiple servers.

For simplicity and efficiency, we are using Microk8s, a lightweight yet powerful Kubernetes distribution. Microk8s makes it easy to deploy Kubernetes on multiple nodes and includes all necessary tools for configuration and management, allowing us to quickly set up and maintain our system.

For setting up Microk8s, we followed most of the instructions from the official documentation:

https://microk8s.io

For setting up JupyterHub with Kubernetes, we found the following guide particularly helpful:

https://z2jh.jupyter.org/

This document provides all the steps required to set up the service, including brief explanations of certain choices and their rationale.

Storage

For education, every student gets their own home directory on shared NFS storage hosted on jupiter and saturn. To keep the initial setup simple, we configure this storage as local storage instead of NFS, a decision that may be revisited later. Note that with local storage each node has its own data directory, so files written on jupiter are not visible on saturn and vice versa.

Web interface

The JupyterHub web interface is served via Apache on pluto and is initially configured in Kubernetes using the simplest method: NodePort. In the future, we plan to replace this with Ingress for improved flexibility and scalability.

Setup and control user

To enable multiple users to maintain this service, we have created a local, independent user kube with separate home directories on each server:

/home/kube on pluto
/home/kube_jupiter on jupiter
/home/kube_saturn on saturn

Docker container

The jupyter/scipy-notebook Docker image includes most of the packages needed for education and research. For education, we also need the nbgitpuller package, which lets students retrieve course materials via a provided URL. We therefore build a custom image based on scipy-notebook that adds nbgitpuller.

Authentication

Access to JupyterHub is restricted to users with a valid TU Delft account (netID). Authentication is handled via OpenID Connect, which TU Delft supports, and we have requested the necessary client credentials for this service. At present, access cannot be restricted to specific user groups; we hope this becomes possible in the future.

Instructions

On all servers

install microk8s
```
sudo snap install microk8s --classic
```

create kube user

NOTE: the local kube users cannot share a home directory!

on pluto
```
sudo useradd -m -d /home/kube -s /bin/bash kube
```
on jupiter
```
sudo useradd -m -d /home/kube_jupiter -s /bin/bash kube
```
on saturn
```
sudo useradd -m -d /home/kube_saturn -s /bin/bash kube
```
on all servers
```
sudo usermod -aG sudo kube
sudo usermod -aG microk8s kube
```
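Because only the home directory differs between the servers, the user setup above can also be scripted once and run unchanged on every machine. A minimal sketch, assuming the short host names start with pluto, jupiter and saturn (adjust the case patterns if yours differ); the function names are our own:

```shell
# Sketch: create the local kube user with the per-host home directory.
# Assumption: host names start with pluto, jupiter or saturn.

# Print the home directory the kube user should get on host $1.
kube_home_for() {
    case "$1" in
        pluto*)   echo /home/kube ;;
        jupiter*) echo /home/kube_jupiter ;;
        saturn*)  echo /home/kube_saturn ;;
        *)        echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Create the user on the current host and add it to the required groups.
create_kube_user() {
    home=$(kube_home_for "$(hostname -s)") || return 1
    sudo useradd -m -d "$home" -s /bin/bash kube &&
        sudo usermod -aG sudo kube &&
        sudo usermod -aG microk8s kube
}
```

On each server, source the file and call create_kube_user after microk8s has been installed, since the snap provides the microk8s group.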

login as kube

Log in with your own account via ssh and run:
```
su - kube
```
Or, if you have sudo rights and do not know the kube password:
```
sudo -i -u kube
```

set up shell for microk8s

fish: add to ~/.config/fish/conf.d/microk8s.fish

```
alias kubectl='microk8s kubectl'
alias helm='microk8s helm'
kubectl completion fish | source
```

bash: add to ~/.bashrc

```
alias kubectl='microk8s kubectl'
alias helm='microk8s helm'
source <(kubectl completion bash)
```

Log out and log in again to activate.
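When several servers are set up, it is easy to append the bash lines twice. A minimal sketch that adds them only when a marker comment is not yet present; the marker text and function name are our own. It calls microk8s kubectl directly in the completion line, so it does not depend on the alias being expanded there:

```shell
# Sketch: add the microk8s aliases for bash, but only once.
add_microk8s_aliases() {
    rc=${1:-$HOME/.bashrc}
    marker='# microk8s aliases'
    # grep -q fails if the marker (or the file itself) is missing.
    if ! grep -qF "$marker" "$rc" 2>/dev/null; then
        cat >> "$rc" <<'EOF'
# microk8s aliases
alias kubectl='microk8s kubectl'
alias helm='microk8s helm'
source <(microk8s kubectl completion bash)
EOF
    fi
}
```

Call add_microk8s_aliases without arguments to update ~/.bashrc, or pass another file path.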

check status
```
microk8s status
```
If microk8s is not yet started:
```
microk8s start
```
Set permissions on ~/.kube
```
sudo chown -R kube:kube $HOME/.kube
```
```
microk8s inspect
```
Resolve any errors that inspect reports. If needed, you can create an empty localnode.yaml with touch.

configure firewall

```
sudo ufw allow 16443
sudo ufw allow 10250
sudo ufw allow 10255
sudo ufw allow 12379
sudo ufw allow 10257
sudo ufw allow 10259
sudo ufw allow 19001
sudo ufw allow 25000
sudo ufw allow 4789/udp
sudo ufw enable
```
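The firewall rules above can also be applied in one loop. A minimal sketch; the DRY_RUN switch, the variable names and the extra rule that keeps SSH open are our own additions, since enabling ufw over an SSH session without an SSH rule can cut off remote access:

```shell
# Sketch: open the MicroK8s ports from the list above in one loop.
# DRY_RUN=1 prints the ufw commands instead of running them.
MICROK8S_PORTS="16443 10250 10255 12379 10257 10259 19001 25000"

open_microk8s_ports() {
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
    # Our addition: keep SSH reachable before the firewall is switched on.
    $run sudo ufw allow ssh
    for port in $MICROK8S_PORTS; do
        $run sudo ufw allow "$port"
    done
    # UDP port for the VXLAN overlay network between the nodes.
    $run sudo ufw allow 4789/udp
    $run sudo ufw enable
}
```

Run DRY_RUN=1 open_microk8s_ports first to review the commands, then open_microk8s_ports to apply them.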

On a computer with Docker installed

As explained above, we need a custom Docker image for the Jupyter servers. The image must be pushed to a registry under a user account (here Docker Hub) so that the Kubernetes nodes can pull it.

Create the following Dockerfile:

```
FROM quay.io/jupyter/scipy-notebook:latest
RUN pip install --no-cache-dir nbgitpuller
```

Build the image for the servers' platform (linux/amd64):

```
docker buildx build --platform linux/amd64 . -t ronligt/jupyter-nbgitpuller:latest
```

Push the image to Docker Hub:

```
docker login docker.io
docker push ronligt/jupyter-nbgitpuller:latest
```
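The build and push steps can be combined with a quick check that nbgitpuller is importable in the finished image. A minimal sketch; IMAGE, TAG and the DOCKER override are our own variables, docker buildx build --push pushes the image right after building, and you must have run docker login first:

```shell
# Sketch: build, push and smoke-test the notebook image.
# IMAGE, TAG and DOCKER are our own variables; DOCKER=echo gives a dry run.
IMAGE=${IMAGE:-ronligt/jupyter-nbgitpuller}
TAG=${TAG:-latest}
DOCKER=${DOCKER:-docker}

build_and_push_image() {
    # Build for the servers' platform and push straight to the registry.
    $DOCKER buildx build --platform linux/amd64 -t "$IMAGE:$TAG" --push . || return 1
    # Smoke test: the image must be able to import nbgitpuller.
    $DOCKER run --rm --platform linux/amd64 "$IMAGE:$TAG" python -c 'import nbgitpuller'
}
```

Run build_and_push_image in the directory that contains the Dockerfile; a non-zero exit status means either the build/push or the import check failed.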

On pluto

turn off swap:
```
sudo swapoff -a
```
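swapoff -a only lasts until the next reboot. A minimal sketch that keeps swap disabled by commenting out the swap entries in /etc/fstab; it assumes the usual whitespace-separated fstab layout where the third field is the filesystem type, and sed keeps the original as a .bak file:

```shell
# Sketch: keep swap off across reboots by commenting out swap lines in fstab.
# Defaults to /etc/fstab; pass another path to try it on a copy first.
disable_swap_in_fstab() {
    fstab=${1:-/etc/fstab}
    # Prefix '#' to uncommented lines whose third field (the type) is "swap";
    # -i.bak edits in place and keeps the original as $fstab.bak.
    sed -i.bak -E 's/^([^#][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Run it as root and check /etc/fstab before the next reboot; running it twice leaves already commented lines unchanged.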

enable addons

```
microk8s enable dns
microk8s enable hostpath-storage
microk8s enable dashboard
```

check dashboard
```
microk8s dashboard-proxy
```
The command prints a login token. Create an SSH tunnel from your local machine to pluto:
```
ssh -L 10443:localhost:10443 pluto
```
Here, pluto is defined in ~/.ssh/config with a ProxyJump.

Then visit https://localhost:10443 in your local web browser and authenticate with the token.

add nodes

For each worker node, run the following on pluto:
```
microk8s add-node
```
and follow the printed instructions, i.e. run the microk8s join command it prints on the worker node.

label the worker nodes
```
kubectl label nodes jupiter-imphys jupyterhub-node=true
kubectl label nodes saturn-imphys jupyterhub-node=true
```
These labels are used in the configuration of Kubernetes to start the JupyterHub and Jupyter pods only on the main servers and not on the control server.
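Labeling can also be scripted so that running it again is harmless. A minimal sketch; the node list mirrors the two commands above, and KUBECTL (our own variable) defaults to microk8s kubectl but can be set to echo for a dry run:

```shell
# Sketch: label the compute nodes and list the result.
# KUBECTL is our own variable; set KUBECTL=echo for a dry run.
KUBECTL=${KUBECTL:-microk8s kubectl}
JUPYTERHUB_NODES="jupiter-imphys saturn-imphys"

label_jupyterhub_nodes() {
    for node in $JUPYTERHUB_NODES; do
        # --overwrite lets the script run again without errors.
        $KUBECTL label nodes "$node" jupyterhub-node=true --overwrite || return 1
    done
    # Show only the nodes that now carry the label.
    $KUBECTL get nodes -l jupyterhub-node=true
}
```

The final listing should show exactly jupiter-imphys and saturn-imphys, and not pluto.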

create storage for jupyterhub

Create jupyterhub-storage.yaml:

```
apiVersion: v1
kind: PersistentVolume
metadata:
    name: jupyterhub-user-pv
spec:
    capacity:
        storage: 50Gi
    accessModes:
        - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-storage
    local:
        path: /data
    nodeAffinity:
        required:
            nodeSelectorTerms:
                - matchExpressions:
                    - key: jupyterhub-node
                      operator: In
                      values:
                        - "true"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    name: jupyterhub-user-pvc
spec:
    accessModes:
        - ReadWriteMany
    resources:
        requests:
            storage: 10Gi
    storageClassName: local-storage
    volumeName: jupyterhub-user-pv
```

and then apply the configuration file:

```
kubectl apply -f jupyterhub-storage.yaml
```
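A local volume does not create its directory, so /data has to exist on jupiter and saturn before a pod can use it (for example sudo mkdir -p /data on each node). After applying the file, you can wait until the claim is bound. A minimal sketch; the function name, retry count and KUBECTL override are our own choices:

```shell
# Sketch: wait until a PVC reports the phase "Bound".
# KUBECTL is our own variable (default: microk8s kubectl).
KUBECTL=${KUBECTL:-microk8s kubectl}

wait_for_pvc_bound() {
    pvc=${1:-jupyterhub-user-pvc}
    tries=${2:-30}
    phase=
    while [ "$tries" -gt 0 ]; do
        phase=$($KUBECTL get pvc "$pvc" -o jsonpath='{.status.phase}')
        if [ "$phase" = Bound ]; then
            echo "$pvc is Bound"
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    echo "$pvc is not bound (last phase: ${phase:-unknown})" >&2
    return 1
}
```

With the defaults it checks jupyterhub-user-pvc every two seconds for about a minute.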

install jupyterhub with helm

set up the helm repository

```
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo update
```

create configuration file

config.yaml

```
hub:
    nodeSelector:
        jupyterhub-node: "true"

    config:
        JupyterHub:
            authenticator_class: generic-oauth

        Authenticator:
            enable_auth_state: true
            allow_all: true

        GenericOAuthenticator:
            client_id: "jupyterhub-imphys.tnw.tudelft.nl"
            client_secret: "<SECRET>"
            oauth_callback_url: "https://jupyterhub-imphys.tnw.tudelft.nl/hub/oauth_callback"
            authorize_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/auth"
            token_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/token"
            userdata_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/userinfo"
            login_service: "TU Delft netID"
            scope:
                - openid
            username_claim: "netid"

singleuser:
    nodeSelector:
        jupyterhub-node: "true"
    storage:
        type: static
        static:
            pvcName: jupyterhub-user-pvc
            subPath: "{username}"
    image:
        name: ronligt/jupyter-nbgitpuller
        tag: latest
    cmd: null
    memory:
        limit: 2G
        guarantee: 2G
    cpu:
        limit: 2
        guarantee: 2

proxy:
    service:
        type: NodePort
```
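The client secret ends up in plain text in config.yaml. One option is to keep it in a second values file that is not shared; Helm merges all -f files, and later files override earlier ones. A minimal sketch, assuming the secret is supplied in the hypothetical OAUTH_CLIENT_SECRET environment variable:

```shell
# Sketch: write the OAuth client secret to a separate values file so that
# config.yaml can be shared or versioned without it.
# Assumption: the secret is passed in the OAUTH_CLIENT_SECRET variable.
write_secret_values() (
    out=${1:-secrets.yaml}
    # Abort if the secret is missing instead of writing an empty value.
    : "${OAUTH_CLIENT_SECRET:?set OAUTH_CLIENT_SECRET first}"
    # Newly created files get owner-only permissions.
    umask 077
    cat > "$out" <<EOF
hub:
    config:
        GenericOAuthenticator:
            client_secret: "$OAUTH_CLIENT_SECRET"
EOF
)
```

Then add -f secrets.yaml after -f config.yaml in the helm commands. The secret is written inside double quotes, so it must not itself contain a double quote or backslash.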

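and deploy JupyterHub with helm using this config file:

```
helm upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub --version=4.1.0 -f config.yaml
```

Later you can edit config.yaml and apply the changes with helm. Pass the same --version, otherwise helm upgrades to the newest chart version:

```
helm upgrade jupyterhub jupyterhub/jupyterhub --version=4.1.0 -f config.yaml
```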
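To avoid forgetting the chart version on a later upgrade, the helm call can be wrapped with the version kept in one place. The sketch also reads the NodePort that Kubernetes assigned to the proxy-public service created by the chart, which is presumably the port Apache on pluto forwards to. CHART_VERSION, HELM and KUBECTL are our own variables:

```shell
# Sketch: install/upgrade with a pinned chart version and show the NodePort.
# HELM=echo or KUBECTL=echo gives a dry run.
CHART_VERSION=${CHART_VERSION:-4.1.0}
HELM=${HELM:-microk8s helm}
KUBECTL=${KUBECTL:-microk8s kubectl}

# Extra arguments (e.g. another -f values file) are passed on to helm.
deploy_jupyterhub() {
    $HELM upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub \
        --version="$CHART_VERSION" -f config.yaml "$@"
}

# Print the NodePort of the http port of the proxy-public service.
jupyterhub_nodeport() {
    $KUBECTL get svc proxy-public -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'
}
```

The NodePort is chosen by Kubernetes unless you fix it; the chart accepts a fixed value via proxy.service.nodePorts.http in config.yaml.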

Using `nbgitpuller`

Students can pull course materials into their home directory by following a link generated on this website:

https://nbgitpuller.readthedocs.io/en/latest/link.html

Please read the instructions on this website.
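The generated links follow a fixed pattern (hub URL, then /hub/user-redirect/git-pull, then the URL-encoded repo, branch and urlpath query parameters), so they can also be built in a script. A minimal sketch; it assumes students use JupyterLab (lab/tree/...) and that nbgitpuller clones into a folder named after the repository, its default. The function names are our own:

```shell
# Sketch: build an nbgitpuller link from its parts.
# Percent-encode everything except RFC 3986 unreserved characters.
urlencode() (
    LC_ALL=C    # process the input byte by byte
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}
        c=${s%"$rest"}
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Usage: nbgitpuller_link HUB_URL REPO_URL BRANCH PATH_IN_REPO
nbgitpuller_link() (
    hub=$1 repo=$2 branch=$3 path=$4
    # nbgitpuller clones into a folder named after the repository by default.
    repo_name=$(basename "$repo" .git)
    printf '%s/hub/user-redirect/git-pull?repo=%s&branch=%s&urlpath=%s\n' \
        "$hub" "$(urlencode "$repo")" "$(urlencode "$branch")" \
        "$(urlencode "lab/tree/$repo_name/$path")"
)
```

For example, nbgitpuller_link https://jupyterhub-imphys.tnw.tudelft.nl https://github.com/example/course.git main week1/intro.ipynb (a placeholder repository) prints a link that pulls the repository and opens week1/intro.ipynb. Pass the hub URL without a trailing slash, and compare a generated link with the website's output before handing it to students.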