Ingesting BTV’s DC30 dataset into Databricks

Detection engineering teams are increasingly adopting data warehouses because they handle massive data volumes, a capability that traditional SIEM technologies often lack. Because data warehouses already power many organizations’ big-data needs, security teams can use the same platforms as the rest of the business and gain access to valuable business data, such as application logs, for detection purposes. This approach brings security closer to the business by eliminating data silos and disparate toolsets, and it lets detection teams focus on building detections rather than managing logging pipelines.

This blog post walks you through creating the resources, manually or with Terraform, to ingest the DC30 Project Obsidian dataset (produced by Blue Team Village) into Databricks. Using the Databricks Medallion Architecture, we’ll load raw logs from S3 into bronze tables and then normalize them to create silver tables. This post serves as a primer for future articles that will demonstrate how to leverage Databricks for detection engineering.
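
To make the bronze/silver split concrete, here is a minimal plain-Python sketch of the idea. It is not Databricks or Spark code, and it is not the post's actual pipeline: the function names, the sample event, the bucket path, and the silver column names are all hypothetical and chosen only for illustration. The bronze step keeps the raw record untouched and adds ingestion metadata; the silver step parses that record and maps it onto a normalized schema.

```python
import json
from datetime import datetime, timezone


def to_bronze(raw_line, source_path):
    """Bronze: store the raw record as-is, plus ingestion metadata."""
    return {
        "raw": raw_line,
        "source_path": source_path,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def to_silver(bronze_row):
    """Silver: parse the raw JSON and map it onto a normalized schema.

    The field names below are assumptions, not the real dataset's schema.
    """
    event = json.loads(bronze_row["raw"])
    return {
        "event_time": event.get("@timestamp"),
        "host": (event.get("host") or {}).get("name"),
        "event_id": event.get("event_id"),
        "source_path": bronze_row["source_path"],
    }


# Invented sample record and path, for illustration only.
raw_line = '{"@timestamp": "2022-01-01T00:00:00Z", "host": {"name": "wkst01"}, "event_id": 4624}'
bronze = to_bronze(raw_line, "s3://example-bucket/dc30/sample.json")
silver = to_silver(bronze)
print(silver)
```

Keeping the untouched record in the bronze row means the silver mapping can be changed and re-run later without re-reading the source files.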
