Facebook released an awesome open-source tool named Osquery, which is maintained by a thriving community supported by the Linux Foundation and several product leaders such as Kolide, Trail of Bits, and Uptycs. However, Facebook did not release the server component of Osquery, which has led to the creation of many projects: Kolide, Uptycs, Doorman, osctrl, and SGT, just to name a few. Furthermore, not all of these projects support the Osquery file carve functionality; in particular, the open-source version of Kolide Fleet does not. This project set out on a mission to provide an open-source Osquery file carving server for file uploads and downloads that could be used with Kolide.

This blog post provides a deep dive into the architecture of this project, the design decisions behind it, and the lessons I learned as an evolving incident response engineer. This project has been a six-month effort that resulted in 4 blog posts, 3 Udemy certificates/courses, and 3 separate GitHub repos, and the collection of those experiences and research led to the creation of this project. My hope is that it benefits the community and gives Osquery a capability that may not be supported by all fleet managers.

Background

What is Osquery?

Osquery exposes an operating system as a high-performance relational database. This allows you to write SQL-based queries to explore operating system data. With Osquery, SQL tables represent abstract concepts such as running processes, loaded kernel modules, open network connections, browser plugins, hardware events, or file hashes.

What is file carving?

File carving is when you instruct Osquery to TAR up a file on an endpoint and send the TAR archive to a server for further analysis. The goal of this capability is to remove the need for the incident response team to manually obtain the file from a remote machine.
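To make "TAR up a file" concrete, here is a minimal Go sketch that packs a single file into an in-memory TAR archive, reads it back, and hashes content with SHA-256. Osquery itself is written in C++ and builds its carve archives internally, so this only illustrates the archive format using Go's standard archive/tar package; tarSingleFile, untarFirst, and sha256Hex are hypothetical helper names.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"io"
)

// tarSingleFile packs one file's contents into an in-memory TAR archive.
// (Hypothetical helper: Osquery builds its own archive internally.)
func tarSingleFile(name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{
		Name:     name,
		Typeflag: tar.TypeReg,
		Mode:     0o644,
		Size:     int64(len(data)),
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(data); err != nil {
		return nil, err
	}
	// Close pads the entry to a 512-byte boundary and writes the end-of-archive trailer.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// untarFirst reads the first entry back out of a TAR archive.
func untarFirst(archive []byte) (string, []byte, error) {
	tr := tar.NewReader(bytes.NewReader(archive))
	hdr, err := tr.Next()
	if err != nil {
		return "", nil, err
	}
	body, err := io.ReadAll(tr)
	return hdr.Name, body, err
}

// sha256Hex returns the hex-encoded SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}
```

For a real file you would pass the output of os.ReadFile to tarSingleFile; for large files, copying from the opened file into the tar.Writer with io.Copy avoids holding the whole file in memory.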

Architecture

Network diagram

Security over feasibility

This project was designed with security in mind, and therefore running it is not as easy as docker-compose up -d. First, the default NGINX configuration enforces mutual TLS (mTLS) for all attempts to access Kolide or the osquery-file-carve server, so to use this project in its default configuration you need an existing certificate infrastructure. Second, the Vault configuration enforces an HTTPS connection to the Vault server. Third, to request files from the osquery-file-carve server, the user must be authenticated by Vault and granted permission to access that resource.

Overview

Phase 0: Deploy client certificates

Since the Osquery file carve functionality does NOT support any type of authentication, I decided to implement mutual TLS. Mutual TLS ensures that only clients with the proper certificates can communicate with Kolide and the osquery-file-carve server. The osquery-file-carve server can be set up without mutual TLS, but that setup is NOT advised: it allows anyone to upload arbitrary data to the osquery-file-carve server, which can result in a denial of service or, more likely, a full storage backend. If you’re curious about how to set up mutual TLS, please see my blog post: Kolide and Osquery with mutual TLS.

Phase 1: Config Osquery and NGINX

The various components of this architecture act like a symphony: everyone has to be on the same musical page to play the song in harmony. The configurations provided in this project's repo set Osquery to send, and NGINX to receive, 1MB file chunks. If Osquery is configured to send 3MB file chunks but NGINX has a client_max_body_size of 1MB, NGINX will reject every data block. Furthermore, Osquery has another setting called read_max, which sets the maximum size of the files it will interact with. If read_max is set to 100MB and the file you want to upload is 1GB, Osquery will not upload the file. The same limit applies when you ask Osquery to generate a hash for a file larger than read_max.

Phase 3: Kolide request a file carve

Once everything has been configured properly, the incident responder can initiate a file carve from the Kolide console. First, the incident responder finds the machine they want to request a file from. Next, the incident responder creates a file carve request with the following query: SELECT * FROM carves WHERE path=<path to file> AND carve=1;. This process is demonstrated step by step in the “Let’s carve this turkey” section below. Once the request has been submitted to Kolide, the incident responder has to wait for the Osquery client to check into Kolide and pick up the latest task.

Phase 4: Osquery start_upload

When Osquery receives the file carve request, it performs the following actions. First, Osquery calculates the size of the file to determine whether it is smaller than the configured read_max value. Second, Osquery generates a file carve GUID, which is a unique value for that file carve request. Osquery then generates a SHA256 hash of the file, an upload timestamp, the file carve size in bytes after the file has been TARed up, and the block count (the carve size divided by the block size, rounded up). Next, Osquery sends an HTTP POST request to the osquery-file-carve server (/start_uploads) that contains the following data: block_count, block_size, carve_size, and carve_id (the file carve GUID).
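The /start_uploads body described above can be modeled as a small Go struct. This is a minimal sketch, assuming the sizes arrive as JSON numbers and carve_id as a string; startUploadRequest and parseStartUpload are hypothetical names, and any extra fields the agent sends are simply ignored by json.Unmarshal.

```go
package main

import (
	"encoding/json"
	"errors"
)

// startUploadRequest holds the four fields described above. Fields not
// listed here are silently ignored by json.Unmarshal.
type startUploadRequest struct {
	BlockCount int64  `json:"block_count"`
	BlockSize  int64  `json:"block_size"`
	CarveSize  int64  `json:"carve_size"`
	CarveID    string `json:"carve_id"`
}

// parseStartUpload decodes and sanity-checks a /start_uploads body.
func parseStartUpload(body []byte) (startUploadRequest, error) {
	var req startUploadRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.CarveID == "" || req.BlockCount <= 0 || req.BlockSize <= 0 {
		return req, errors.New("start_upload request is missing carve_id or has non-positive sizes")
	}
	return req, nil
}
```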

Before the HTTP POST request reaches the osquery-file-carve server, the TLS connection is evaluated by NGINX. NGINX enforces mutual TLS (mTLS), so every connection must present a valid client certificate to prove it is authorized to access this network resource. Once the mTLS check passes, the request is forwarded to the osquery-file-carve server. The osquery-file-carve server creates a session ID for the file upload request and records that the incoming file carve GUID and the newly generated session ID are related. Therefore, all future HTTP POST requests that include this session ID belong to this file carve.

Lastly, the server returns the session ID to the Osquery client/agent, which uses it for the remainder of the file carve upload process. It’s important to note that the osquery-file-carve server does not implement any authentication of its own, because the Osquery agent does not support it; instead, the server relies on NGINX's mTLS verification.
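The session bookkeeping described above can be sketched in Go as a mutex-guarded map from session ID to carve GUID. This is a minimal in-memory sketch under stated assumptions: the session ID is 16 random bytes from crypto/rand encoded as hex, and the reply is a JSON object with a session_id key (assumed here to be the key the agent reads back). sessionStore and its methods are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"sync"
	"time"
)

type carveSession struct {
	CarveGUID string
	LastSeen  time.Time
}

// sessionStore maps session IDs to carve GUIDs. Upload handlers run
// concurrently, so every access goes through the mutex.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[string]*carveSession
}

func newSessionStore() *sessionStore {
	// The map must be created with make: writing to a nil map panics.
	return &sessionStore{sessions: make(map[string]*carveSession)}
}

// start creates a random 128-bit session ID for a carve and returns it along
// with the JSON reply body.
func (s *sessionStore) start(carveGUID string) (string, []byte, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", nil, err
	}
	id := hex.EncodeToString(raw)

	s.mu.Lock()
	s.sessions[id] = &carveSession{CarveGUID: carveGUID, LastSeen: time.Now()}
	s.mu.Unlock()

	reply, err := json.Marshal(map[string]string{"session_id": id})
	return id, reply, err
}

// lookup returns the carve GUID for a session ID, if the session exists.
func (s *sessionStore) lookup(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	sess, ok := s.sessions[id]
	if !ok {
		return "", false
	}
	return sess.CarveGUID, true
}
```

Using an unguessable random ID matters here: since the upload endpoints have no authentication beyond mTLS, the session ID is the only thing tying a block to a carve.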

Phase 5: Osquery upload_blocks

Once the Osquery agent receives a session ID, it starts to send blocks of data IN ORDER to the server. Each data block is an HTTP POST request that includes the following data: the current block_id, the session_id, and block_data, which is a base64 blob. When the server receives a data block, it verifies that the session ID is valid; if so, it accepts the incoming HTTP POST request, and if not, the request is rejected. Next, the server extracts the various fields of the HTTP POST request.

For each data block received, the server updates the file carve session dictionary (map): it records the latest block_id received, updates the timestamp with the current time to track when the last block arrived, adds the current block_id to a list of all blocks received, and extracts and decodes the block_data field of the HTTP POST request. On the first data block received, the server creates a file stream or a Mongo stream for the file carve upload process. All later data blocks with the corresponding session ID use this stream to write the decoded block_data to the appropriate storage backend.

Lastly, the server evaluates whether all the blocks have been received from the Osquery agent. If not, the server returns a simple HTTP status code of 200 to let the Osquery agent know the block was successfully received and written to the storage backend. If all blocks have been received, the server returns an HTTP status code of 200 along with a JSON blob of {"success": true}.
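The per-block logic of Phase 5 can be sketched in Go as below. It is a minimal sketch, not the project's handler: it assumes the block payload arrives under the JSON key data (adjust the struct tag if your agent version uses a different key), it leaves writing to the file or GridFS stream to the caller, and receiver, accept, and completionBody are hypothetical names. Go's encoding/json decodes a base64 JSON string straight into a []byte field, so no manual base64 call is needed.

```go
package main

import (
	"encoding/json"
	"errors"
	"sync"
	"time"
)

// uploadBlock mirrors one upload_blocks POST body.
type uploadBlock struct {
	BlockID   int64  `json:"block_id"`
	SessionID string `json:"session_id"`
	Data      []byte `json:"data"` // assumption: payload key; base64 is decoded automatically
}

type upload struct {
	blockCount int64
	received   map[int64]bool
	lastSeen   time.Time
}

type receiver struct {
	mu      sync.Mutex
	uploads map[string]*upload
}

var errUnknownSession = errors.New("unknown session_id")

// accept validates the session, records the block, and returns the decoded
// bytes for the caller to write to the storage stream. The bool is true once
// every block of the carve has arrived.
func (r *receiver) accept(body []byte) ([]byte, bool, error) {
	var blk uploadBlock
	if err := json.Unmarshal(body, &blk); err != nil {
		return nil, false, err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	u, ok := r.uploads[blk.SessionID]
	if !ok {
		return nil, false, errUnknownSession
	}
	u.received[blk.BlockID] = true
	u.lastSeen = time.Now()
	return blk.Data, int64(len(u.received)) == u.blockCount, nil
}

// completionBody is the extra JSON sent with the final 200 response.
func completionBody(done bool) []byte {
	if !done {
		return nil
	}
	b, _ := json.Marshal(map[string]bool{"success": true})
	return b
}
```

Tracking received block IDs in a set rather than a counter means a retransmitted block does not make an incomplete carve look finished.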

Phase 6: Vault token request

At this point, the Osquery client has successfully uploaded the file carve to the osquery-file-carve server. To retrieve this file carve from the osquery-file-carve server, the incident responder must first authenticate. This portion of the project took the longest because I didn’t want to pigeonhole the osquery-file-carve server into a single platform/service for authentication. After some research, I decided to add Vault support to the osquery-file-carve server because Vault lets users authenticate using various methods. However, during my research I had a hard time understanding how the various Vault components fit together to create this functionality; check out this blog post for more information.

By implementing the model demonstrated in that blog post, I was able to support GitHub, Auth0, and LDAP authentication for the osquery-file-carve server. First, I intentionally created a blank Vault policy. Policies are normally used to grant users/groups permission to perform certain actions, but I just needed something that says “this user is allowed to request files from the osquery-file-carve server.” Therefore, I created a blank policy named osquery-file-carve-file-requests and attached it to certain users/groups.

When a file request is received by the osquery-file-carve server, it first checks that the Vault token is valid and that the token is associated with the osquery-file-carve-file-requests policy. If the token is invalid or is not associated with that policy, the file request is rejected.
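Assuming the server has already called one of Vault's token lookup endpoints (for example /v1/auth/token/lookup with the responder's token, or /v1/auth/token/lookup-accessor with the accessor) and received a successful response, the remaining policy check is plain JSON handling. A minimal Go sketch: it checks both policies and identity_policies, because a policy attached to a Vault identity group is reported in the latter. The policy name matches the one written in the Vault setup section; tokenLookup and hasPolicy are hypothetical names.

```go
package main

import "encoding/json"

// tokenLookup keeps only the parts of a Vault token lookup response used here.
type tokenLookup struct {
	Data struct {
		Policies         []string `json:"policies"`
		IdentityPolicies []string `json:"identity_policies"`
	} `json:"data"`
}

// requiredPolicy must match the policy name configured for the server.
const requiredPolicy = "osquery-file-carve-file-requests"

// hasPolicy reports whether the looked-up token carries the given policy,
// either directly or through identity (group) membership.
func hasPolicy(lookupJSON []byte, policy string) (bool, error) {
	var resp tokenLookup
	if err := json.Unmarshal(lookupJSON, &resp); err != nil {
		return false, err
	}
	for _, list := range [][]string{resp.Data.Policies, resp.Data.IdentityPolicies} {
		for _, p := range list {
			if p == policy {
				return true, nil
			}
		}
	}
	return false, nil
}
```

Because the policy is blank, it grants nothing inside Vault itself; its presence on the token is used purely as a membership flag by the file carve server.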

Phase 7: Create a file request from Mongo

The file request to the osquery-file-carve server contains the information needed to authenticate the request, along with the file carve GUID. If the file request is successfully authenticated, the server checks that the file carve GUID exists. This check is necessary because the osquery-file-carve server periodically cleans up old file carves (by default, after 30 days), so a GUID may refer to a carve that has already been removed. Next, the osquery-file-carve server creates a stream, reads the file in chunks, and sends each chunk to the incident responder.

Generate certificates

The repo for this project contains self-signed certificates. These certificates should ONLY be used for development purposes and NOT in PRODUCTION. If you use the certificates provided by this repo, you will need to import the device certificate; click here for more information. If you would prefer to generate your own certificates with Vault, check out this blog post. If your organization does not have a certificate manager and you would like to create your own self-signed root CA with OpenSSL for testing, please click here for more information. The rest of this blog post proceeds with the self-signed certificates provided in this repo and assumes mTLS is being enforced.

git clone https://github.com/CptOfEvilMinions/osquerey-file-carve-server
cd osquerey-file-carve-server
The root CA should be placed at conf/tls/root_ca/
The certificate and private key for NGINX for mTLS should be placed at conf/tls/kolide/
The certificate and private key for Osquery for mTLS should be placed at conf/tls/device/

Spin up Docker stack

docker-compose build
docker-compose run --rm kolide fleet prepare db --config /etc/kolide/kolide.yml
1. Initializes the Kolide database
docker-compose up -d

Kolide setup

Set username & password

Open a web browser to https://<IP addr of Docker/FQDN of Kolide>:8443
Enter “admin” for username
Enter password
Enter “<e-mail for admin>” for email
Select “Submit”

Setup organization

Enter “<Org name>” into organization name
Enter “<ORG URL logo>”
Select “Submit”

Set Kolide web address

Enter “https://<FQDN for Kolide or Docker IP addr>:8443”
Select “Submit”
Select “Finish”

Obtain Osquery enroll secret

Select “Add new host” in the top right
Select “show” above the text box
Copy the enroll secret

Vault setup

This blog post assumes you have a Vault instance; if not, please see this blog post. It also assumes you have auth backends configured for authentication and a group set up for the incident responders; if not, please see this blog post.

Create osquery-file-carve-file-requests policy

cd osquerey-file-carve-server
Log into Vault as an admin
vault policy write osquery-file-carve-file-requests conf/vault/osquery-file-carve-file-requests.hcl
1. If you modify the name of the policy you will need to modify the policy value in conf/osquery-file-carve/osquery-file-carve.yml
Attach this policy to your incident responder group

Osquery test client setup on macOS

This blog post assumes you have Osquery set up and communicating with Kolide; if not, please see this blog post. The instructions below and the configurations in this project's repo set up a test instance of Osquery.

brew install osquery
1. Install Osquery
cd osquerey-file-carve-server
echo '<Osquery Enroll Secret>' > conf/osquery/osquery.key
1. Set Osquery Enroll Secret
sed -i '' 's/kolide.hackinglab.local:8443/<Docker IP addr or Kolide FQDN>/g' conf/osquery/osquery.flags
1. Configure the location of Kolide via IP address or FQDN
sudo osqueryd --verbose -flagfile conf/osquery/osquery.flags
1. Spin up a test instance of Osquery

Let’s carve this turkey

Download test file

Open a terminal
cd /tmp
wget http://ipv4.download.thinkbroadband.com/50MB.zip

Initiate file carve

Login into Kolide
Select your newly added host to run a query
Query: SELECT * FROM carves WHERE path like '/tmp/50MB.zip' AND carve=1;
1. This query will NOT return any data to Kolide
Query: SELECT * FROM carves WHERE path like '/tmp/%';

Generate Vault token

Log into Vault with a user that is part of the incident responder group
1. This user must be part of the group that has the osquery-file-carve-file-requests policy
Copy token and token_accessor for later

Download file

Copy File GUID from Kolide query above
Open a terminal
cd /tmp
mv /tmp/50MB.zip /tmp/orig_50MB.zip
curl -k https://<KOLIDE FQDN>/file_request -A "osquery/4.3.0" --cert <file path to device cert> --key <file path to device key> --cacert <file path to root CA cert> -d '{"file_carve_guid": "<File Carve GUID>", "token_accessor": "<token_accessor>", "token": "<token>"}' --output /tmp/<File Carve GUID>.tar
tar -xvf <File Carve GUID>.tar
1. Extracts (un-TARs) the file carve
file /tmp/*50MB.zip
shasum /tmp/*.zip

Discussion

read_max, carver_block_size, and client_max_body_size, OHHH MY

This architecture has three settings that can affect the operation of this application. The first is Osquery's read_max setting, which prevents Osquery from interacting with any file larger than the configured value. By default, the value is set to 50MB, which is generous, but if you want to upload larger files you will need to increase it. The second Osquery setting is carver_block_size, which defines the size of each block/file chunk. For example, our config sets it to 1MB, so a 30MB file is divided into thirty 1MB blocks. The last setting is NGINX's client_max_body_size, which limits the payload size a client can send to the server.
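The block arithmetic is a ceiling division, because the last block may be short. A minimal Go sketch with hypothetical helper names. Since a carve is a TAR archive, its header and trailer bytes make it slightly larger than the original file; assuming the count is taken over the archive size, a 30MB file can therefore need one extra, short block.

```go
package main

// blockCount returns how many blocks of blockSize bytes are needed for a
// carve of carveSize bytes, rounding up so a partial final block counts.
func blockCount(carveSize, blockSize int64) int64 {
	return (carveSize + blockSize - 1) / blockSize
}

// lastBlockSize returns the size of the final block (a full block when
// carveSize divides evenly, 0 for an empty carve).
func lastBlockSize(carveSize, blockSize int64) int64 {
	if carveSize == 0 {
		return 0
	}
	if r := carveSize % blockSize; r != 0 {
		return r
	}
	return blockSize
}
```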

The important thing to understand is that client_max_body_size must be at least as large as carver_block_size, and in practice somewhat larger, because each block is sent base64-encoded inside the body of an HTTP POST request. If a request body exceeds the NGINX threshold, NGINX will drop the incoming payload. The settings configured for this blog are in no way optimized for performance; depending on your goals and infrastructure, they will need to be modified.

Memory consumption

This section of the discussion compares the memory consumption of a Mongo GridFS backend vs. a file system backend. Hands down, the file system backend is far more memory-efficient than GridFS. To test the difference, I created a Python script that mocked the Osquery client. The first test uploaded a 10MB.zip to the osquery-file-carve server with ten threads, meaning 10 file uploads were occurring simultaneously. The second test was exactly the same but with a 100MB.zip test file.

With a Mongo GridFS backend, the first test consumed 430MB and the second test consumed 530MB of memory, for an average of 480MB. The same tests were performed with a file system backend, and the average memory consumption was 125MB. Both backends have their pros and cons. The Mongo backend may consume more resources, but it allows for horizontal scaling and backups. The file system backend makes it hard to scale out this architecture because the files reside on each instance.

Lessons learned

I am currently reading a book called “Cracking the Coding Interview,” and it is a great book. One interesting part of the book is its matrix for describing projects you have worked on, which contains the following sections: challenges, mistakes/failures, enjoyed, leadership, conflicts, and what you’d do differently. I am going to try to use this model at the end of my blog posts to summarize and reflect on the things I learn. I don’t blog to post things that I know; I blog to learn new things and to share the knowledge from my security research.

New skills/knowledge

Read Osquery’s open-source code to understand how it interacts with a file server for file carve uploads
Learned about GoLang mutexes and implemented them to avoid race conditions
Learned about Mongo GridFS and implemented it as a storage backend
Learned how to optimize file uploads and downloads to reduce resource consumption, specifically memory consumption
Learned how to implement JSON web tokens (JWT) with GoLang but removed the capability because it was replaced by Vault
Learned about gRPC and how to configure NGINX as a gRPC proxy
Learned how to generate a root CA and leaf certificates with OpenSSL

Challenges

Learned the hard way that the zero value for a GoLang map is nil. Therefore when I attempted to add items to the map I received an unhelpful error.
Learned the hard way that Osquery includes trailing spaces as part of a config value. My config had --carver_start_endpoint=/start_uploads followed by a trailing space, so Osquery sent its HTTP request to “https://kolide.hackinglab.local:8000/start_uploads ” (trailing space included), which doesn’t exist.
Learned the hard way that Mongo will not create an empty database; according to this Stack Overflow post, it only creates a database once data is written to it.
Go's net/http does not let middleware and the handler both read the same http.Request body, because the body is a stream that can only be consumed once. This made it hard to implement a token/auth validator middleware in GoLang.
Kolide uses HTTP and gRPC for its API, which requires a specific NGINX proxy setup
With Mongo GridFS as the storage backend, memory consumption was 2-4x that of the file system backend.

Enjoyed

Reading Osquery’s open-source code to understand how it interacts with a file server for file carve uploads

What You’d Do Differently

Implement the token/auth validator as middleware and not as a function call.
Reduce the memory consumption of the Mongo GridFS storage backend for file uploads

HoldMyBeer

Cause every great story starts with "Hold my beer"

Setup my GoLang Osquery-file-carving server with Kolide