Monthly Archives: February 2024

Spark, JupyterHub, Minio, and Helm on Kubernetes

At work we recently adopted Databricks, which uses open source technologies under the hood. That got me wondering whether I could build a Databricks equivalent from open source software in my homelab. Over the Thanksgiving holiday week, I started experimenting with deploying JupyterHub, Minio, and Spark on Kubernetes with Helm. I ended up with a working proof of concept (PoC): Spark jobs launched from a Python Jupyter notebook read raw log data from Minio, ingest those events into a Spark schema, and write the result as a Delta table, which I can then query from a Jupyter notebook.
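To make the flow concrete, here is a minimal sketch of what a notebook cell for that pipeline might look like. It is not the code from my PoC: the endpoint, credentials, bucket paths, and the three-field log schema are hypothetical placeholders, and it assumes the `hadoop-aws` and Delta Lake jars are already available to the Spark session.

```python
def minio_spark_conf(endpoint, access_key, secret_key):
    """Build Spark settings for S3A access to Minio plus Delta Lake support."""
    return {
        # Point the S3A connector at Minio instead of AWS S3.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Minio is usually addressed as host/bucket, not bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        # Enable Delta Lake SQL extensions and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def run_pipeline(conf, raw_path, delta_path):
    """Read raw JSON logs, write them as a Delta table, and read the table back."""
    # Imported here so the config helper above works without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType, StructField, StructType

    builder = SparkSession.builder.appName("minio-delta-poc")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Hypothetical schema; real log events would have their own fields.
    schema = StructType([
        StructField("timestamp", StringType(), True),
        StructField("host", StringType(), True),
        StructField("message", StringType(), True),
    ])

    events = spark.read.schema(schema).json(raw_path)
    events.write.format("delta").mode("overwrite").save(delta_path)
    return spark.read.format("delta").load(delta_path)


# Placeholder values, not real credentials or endpoints.
conf = minio_spark_conf("http://minio.example.local:9000", "ACCESS_KEY", "SECRET_KEY")
```

In a notebook, something like `run_pipeline(conf, "s3a://raw-logs/", "s3a://delta/events")` would return a DataFrame that can be queried directly or registered as a temporary view for SQL.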
