Installing JupyterHub with Microk8s

keywords: jupyterhub, microk8s, kubernetes, scalable, jupyter, python, nbgitpuller, docker, apache

Introduction

We need a scalable and flexible implementation of Jupyter for education and research. In educational settings, students typically require one or two CPU cores and a small amount of memory, whereas researchers often need many cores and large memory allocations. Our goal is to build a system capable of supporting up to 150 students in a single day, after which it can be reconfigured to serve a smaller group of researchers with higher resource demands.

To achieve this, we are utilizing two of our main departmental servers (jupiter and saturn) for computation, while a virtual server (pluto) manages orchestration and interfacing.

Having successfully used JupyterHub for several years, we chose to continue with it for this project. While JupyterHub scales well within a single server for smaller groups (<30 students), a more robust multi-node solution is needed. Therefore, we are leveraging Kubernetes to distribute JupyterHub and the spawned Jupyter instances across multiple servers.

For simplicity and efficiency, we are using Microk8s, a lightweight yet powerful Kubernetes distribution. Microk8s makes it easy to deploy Kubernetes on multiple nodes and includes all necessary tools for configuration and management, allowing us to quickly set up and maintain our system.

For setting up Microk8s, we followed most of the instructions from the official documentation:

For setting up JupyterHub with Kubernetes, we found the following guide particularly helpful:

This document provides all the steps required to set up the service, including brief explanations of certain choices and their rationale.

Storage

For education, each student receives their own home directory on storage shared between jupiter and saturn (NFS). To simplify the setup, we initially configure this shared storage as local storage rather than NFS. This decision may be revisited in a later phase.

Web interface

The JupyterHub web interface is served via Apache on pluto and is initially configured in Kubernetes using the simplest method: NodePort. In the future, we plan to replace this with Ingress for improved flexibility and scalability.

Setup and control user

To enable multiple users to maintain this service, we have created a local, independent user kube with separate home directories on each server:

  • /home/kube on pluto
  • /home/kube_jupiter on jupiter
  • /home/kube_saturn on saturn

Docker container

The jupyter/scipy-notebook Docker container includes most of the packages needed for education and research. For educational purposes, we need to add the nbgitpuller package, which enables students to retrieve course materials via a provided URL. Therefore, we create a new Docker container based on scipy-notebook, incorporating nbgitpuller.

Authentication

Access to JupyterHub is restricted to users with a valid TU Delft account (netid). Authentication is handled via OpenID, which is supported by TU Delft. We have requested the necessary security credentials to enable this service. At present, it is not possible to restrict access to specific user groups, but we are hopeful that this feature can be implemented in the future.
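
As a rough illustration of what the username_claim setting used in config.yaml does: after a successful login, the authenticator reads the user-info response from the identity provider and takes the hub username from the configured claim. The sketch is not the oauthenticator implementation; the function name and the payload values are hypothetical.

```python
def username_from_userinfo(userinfo: dict, claim: str = "netid") -> str:
    """Pick the hub username out of a user-info payload (illustrative only)."""
    username = userinfo.get(claim)
    if not username:
        # Without the claim there is no username to log in with.
        raise ValueError(f"claim {claim!r} missing from user-info response")
    return str(username)


# Hypothetical payload; real responses depend on the identity provider.
example_userinfo = {
    "sub": "0000-hypothetical",
    "netid": "jdoe",
    "email": "j.doe@example.org",
}
print(username_from_userinfo(example_userinfo))  # prints: jdoe
```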

Instructions

On all servers

  1. install microk8s

    sudo snap install microk8s --classic
    
  2. create kube user

    NOTE: the local kube users cannot share the home directory!

    # on pluto
    sudo useradd -m -d /home/kube -s /bin/bash kube
    
    # on jupiter
    sudo useradd -m -d /home/kube_jupiter -s /bin/bash kube
    
    # on saturn
    sudo useradd -m -d /home/kube_saturn -s /bin/bash kube
    
    # on all servers
    sudo usermod -aG sudo kube
    sudo usermod -aG microk8s kube
    
  3. login as kube

    Log in with your own account via ssh and run:

    su - kube
    

    Or, if you have sudo rights and do not know the kube password:

    sudo -i -u kube
    
  4. setup shell for microk8s

    • fish:

      # ~/.config/fish/conf.d/microk8s.fish
      alias kubectl='microk8s kubectl'
      alias helm='microk8s helm'
      kubectl completion fish | source
      
    • bash:

      # ~/.bashrc
      alias kubectl='microk8s kubectl'
      alias helm='microk8s helm'
      source <(kubectl completion bash)
      

    Log out and log in again to activate the aliases.

  5. check status

    microk8s status
    

    If microk8s is not yet started:

    microk8s start
    

    Set permissions on ~/.kube

    sudo chown -R kube:kube $HOME/.kube
    
    microk8s inspect
    

    Resolve any errors that microk8s inspect reports. If needed, you can create an empty localnode.yaml with touch.

  6. configure firewall

    sudo ufw allow 16443
    sudo ufw allow 10250
    sudo ufw allow 10255
    sudo ufw allow 12379
    sudo ufw allow 10257
    sudo ufw allow 10259
    sudo ufw allow 19001
    sudo ufw allow 25000
    sudo ufw allow 4789/udp
    sudo ufw enable
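
To verify from another machine that the firewall rules took effect, a small reachability check can help. This is a minimal sketch using only the Python standard library; the function names are our own, and the host name passed in is whatever resolves to the server being tested. It only covers the TCP ports (4789/udp cannot be probed with a TCP connect), and a negative result can also mean that nothing is listening on that port or that the service is bound to localhost only.

```python
import socket

# TCP ports opened with ufw in the step above.
MICROK8S_TCP_PORTS = [16443, 10250, 10255, 12379, 10257, 10259, 19001, 25000]


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=MICROK8S_TCP_PORTS) -> dict:
    """Map each port to True/False for the given host."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, running check_ports("pluto") on jupiter returns a dictionary from port number to True or False.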
    

On a computer with Docker installed

As explained above, we need to build a custom Docker container that is used to start Jupyter. This container must be made available on Docker Hub (docker.io) via a user account.

  1. Create the following Dockerfile:

    FROM quay.io/jupyter/scipy-notebook:latest
    RUN pip install --no-cache-dir nbgitpuller
    
  2. Build the Docker container for the right platform:

    docker buildx build --platform linux/amd64 . -t ronligt/jupyter-nbgitpuller:latest
    
  3. Push the container to the repository:

    docker login docker.io
    docker push ronligt/jupyter-nbgitpuller
    

On pluto

  1. turn off swap:

    sudo swapoff -a
    
  2. enable addons

    microk8s enable dns
    microk8s enable hostpath-storage
    microk8s enable dashboard
    
  3. check dashboard

    microk8s dashboard-proxy
    

    Create an ssh tunnel using the information that is printed, and use the given token to authenticate.

    ssh -L 10443:localhost:10443 pluto
    

    Here pluto is a host alias defined in ~/.ssh/config with a ProxyJump.

    Then visit this link in your local web browser:

  4. add nodes

    For each worker node, run the following on pluto:

    microk8s add-node
    

    and follow the instructions.

  5. label the worker nodes as such

    kubectl label nodes jupiter-imphys jupyterhub-node=true
    kubectl label nodes saturn-imphys jupyterhub-node=true
    

    These labels are used in the configuration of Kubernetes to start the JupyterHub and Jupyter pods only on the main servers and not on the control server.

  6. create storage for jupyterhub

    # jupyterhub-storage.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
        name: jupyterhub-user-pv
    spec:
        capacity:
            storage: 50Gi
        accessModes:
            - ReadWriteMany
        persistentVolumeReclaimPolicy: Retain
        storageClassName: local-storage
        local:
            path: /data
        nodeAffinity:
            required:
                nodeSelectorTerms:
                    - matchExpressions:
                        - key: jupyterhub-node
                          operator: In
                          values:
                            - "true"
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
        name: jupyterhub-user-pvc
    spec:
        accessModes:
            - ReadWriteMany
        resources:
            requests:
                storage: 10Gi
        storageClassName: local-storage
        volumeName: jupyterhub-user-pv
    

    and then apply the configuration file

    kubectl apply -f jupyterhub-storage.yaml
    
  7. install jupyterhub with helm

    setup helm repository

    helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
    helm repo update
    

    create configuration file

    # config.yaml
    hub:
        nodeSelector:
            jupyterhub-node: "true"
    
        config:
            JupyterHub:
                authenticator_class: generic-oauth
    
            Authenticator:
                enable_auth_state: true
                allow_all: true
    
            GenericOAuthenticator:
                client_id: "jupyterhub-imphys.tnw.tudelft.nl"
                client_secret: "<SECRET>"
                oauth_callback_url: "https://jupyterhub-imphys.tnw.tudelft.nl/hub/oauth_callback"
                authorize_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/auth"
                token_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/token"
                userdata_url: "https://auth-test.tudelft.nl/auth/realms/login-tudelft/protocol/openid-connect/userinfo"
                login_service: "TU Delft netID"
                scope:
                    - openid
                username_claim: "netid"
    
    singleuser:
        nodeSelector:
            jupyterhub-node: "true"
        storage:
            type: static
            static:
                pvcName: jupyterhub-user-pvc
                subPath: "{username}"
        image:
            name: ronligt/jupyter-nbgitpuller
            tag: latest
        cmd: null
        memory:
            limit: 2G
            guarantee: 2G
        cpu:
            limit: 2
            guarantee: 2
    
    proxy:
        service:
            type: NodePort
    

    and use this config file with helm

    helm upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub --version=4.1.0 -f config.yaml
    

    Later you can edit this config.yaml and update with helm:

    helm upgrade jupyterhub jupyterhub/jupyterhub -f config.yaml
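
The nodeSelector and the cpu and memory guarantees in config.yaml together determine how many single-user pods fit on the cluster. The following back-of-the-envelope sketch makes that explicit. The core and memory figures are hypothetical (the real values for jupiter and saturn are not given in this document), "pluto" as a node name is an assumption, and the hub pod and system overhead are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_cores: int      # hypothetical value
    memory_g: int       # hypothetical value, same unit as the guarantee
    labels: dict = field(default_factory=dict)


def selected(node: Node, selector: dict) -> bool:
    """A nodeSelector matches when every key/value pair is among the node labels."""
    return all(node.labels.get(key) == value for key, value in selector.items())


def user_pods_that_fit(node: Node, cpu_guarantee: int, memory_guarantee_g: int) -> int:
    """Guaranteed resources are reserved per pod, so the scarcer one sets the limit."""
    return min(node.cpu_cores // cpu_guarantee, node.memory_g // memory_guarantee_g)


def cluster_capacity(nodes, selector, cpu_guarantee, memory_guarantee_g) -> int:
    return sum(
        user_pods_that_fit(n, cpu_guarantee, memory_guarantee_g)
        for n in nodes
        if selected(n, selector)
    )


NODE_SELECTOR = {"jupyterhub-node": "true"}  # as in config.yaml

NODES = [
    Node("pluto", cpu_cores=4, memory_g=16),  # control server, not labelled
    Node("jupiter-imphys", cpu_cores=128, memory_g=512, labels={"jupyterhub-node": "true"}),
    Node("saturn-imphys", cpu_cores=128, memory_g=512, labels={"jupyterhub-node": "true"}),
]

print(cluster_capacity(NODES, NODE_SELECTOR, cpu_guarantee=2, memory_guarantee_g=2))  # prints: 128
print(cluster_capacity(NODES, NODE_SELECTOR, cpu_guarantee=1, memory_guarantee_g=2))  # prints: 256
```

With these made-up numbers, a guarantee of 2 cores per user leaves room for 128 users, which is short of the 150-student target, whereas a guarantee of 1 core (keeping the limit at 2) would allow 256.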
    

Using nbgitpuller

Students can retrieve data in their home directory using a URL link created via this website:

Please read the instructions on this website.
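
The links have a simple structure, so they can also be generated in a script. This is a minimal sketch, assuming the commonly documented nbgitpuller link form (/hub/user-redirect/git-pull with repo, branch and urlpath query parameters). The hub hostname is the one from config.yaml; the function name, repository URL and notebook path are hypothetical placeholders.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo_url: str, branch: str = "main", urlpath: str = "") -> str:
    """Build a link that pulls repo_url into the user's home directory on login."""
    params = {"repo": repo_url, "branch": branch}
    if urlpath:
        # Page to open after the pull, e.g. a notebook inside the cloned repository.
        params["urlpath"] = urlpath
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


link = nbgitpuller_link(
    "https://jupyterhub-imphys.tnw.tudelft.nl",
    "https://github.com/example/course",        # hypothetical repository
    urlpath="lab/tree/course/intro.ipynb",      # hypothetical notebook path
)
print(link)
```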