Create custom Splunk search commands with Python 3

This blog post demonstrates how to create a custom Python search command for Splunk and demystifies common roadblocks such as how to create a custom search command with Python, how to store secrets for a custom search command, and how to install external Python libraries. For each roadblock we will cover the solution with code examples and hands-on exercises. To do this, we must first start with an introduction to the architecture of a custom Python search command.

DISCLAIMER

This blog post provides proofs of concept for how to engineer different custom Python search commands for Splunk. This blog post does not supersede the documentation provided by Splunk. The code being released is licensed under the MIT license.

Background

What is Splunk?

Splunk is an advanced, scalable, and effective technology that indexes and searches log files stored in a system. It analyzes machine-generated data to provide operational intelligence. The main advantage of using Splunk is that it does not need a separate database, because it stores its data in its own indexes. Splunk is software mainly used for searching, monitoring, and examining machine-generated Big Data through a web-style interface.

Splunk captures, indexes, and correlates real-time data in a searchable container from which it can produce graphs, reports, alerts, dashboards, and visualizations. It aims to make machine-generated data available across an organization and is able to recognize data patterns, produce metrics, diagnose problems, and provide intelligence for business operations. Splunk is a technology used for application management, security, and compliance, as well as business and web analytics.

What is a Splunk custom search command?

Custom search commands are user-defined Splunk Search Processing Language (SPL) commands that extend SPL to serve your specific needs. Although Splunk software includes an extensive set of search commands, these existing commands might not meet your exact requirements. Custom search commands let you perform additional data analysis in Splunk Enterprise.

potato potahto – Splunk app

Splunk app?!?!!?!? Don’t you mean Splunk search command? A Splunk search command is really a Python script bundled inside a Splunk app. When Splunk starts it loads all the Splunk apps and in our case it registers the custom search command.

How custom search commands work

This section is copied straight from the Splunk documentation. Custom search commands process data through an external Python script that runs alongside splunkd at search time.

  1. Splunk Enterprise parses each line of SPL and determines whether the search command is custom. Custom search commands are designated by a stanza in the commands.conf file.
  2. If the search command is custom, Splunk Enterprise runs the Python script for the command.
  3. Splunk Enterprise pipes search results through the Python script in chunks through STDIN and writes them out through STDOUT.
  4. After processing the Python script, the search results re-enter the main search pipeline.

The following diagram illustrates how search results exit and re-enter the search pipeline for a custom search command:

Throughout the custom search command process, splunkd and the Python script exchange metadata through a series of getinfo and execute commands.

  1. splunkd sends the getinfo command to request information, including the command type and required fields, from the Python script.
  2. splunkd sends a separate execute command for each chunk of search results in the pipeline.
  3. The Python script processes each chunk of search results.
  4. The Python script sends a response back to splunkd.

After all search results pass through the Python script, splunkd closes the STDIN pipe to terminate the process.
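
To make the getinfo/execute exchange above more concrete, here is a toy model in plain Python. It is not Splunk's wire protocol and it does not use splunkd; the function names run_custom_command and hello_command are hypothetical and exist only to illustrate the order of the steps.

```python
def run_custom_command(chunks, command):
    """Toy model of the exchange: one getinfo, then one execute per chunk."""
    # Step 1: ask the command for its type and required fields.
    info = command("getinfo", None)
    results = []
    for chunk in chunks:
        # Steps 2-4: send each chunk, let the command process it, collect the reply.
        results.extend(command("execute", chunk))
    return info, results


def hello_command(action, chunk):
    """Hypothetical command that tags every event in a chunk."""
    if action == "getinfo":
        return {"type": "streaming", "required_fields": []}
    return [dict(event, hello="world") for event in chunk]


info, results = run_custom_command(
    [[{"uid": "C1"}], [{"uid": "C2"}, {"uid": "C3"}]],
    hello_command,
)
print(info["type"], len(results))
```

In the real thing, splunkd plays the role of run_custom_command and the Splunk Python library handles the exchange for you, so your script only has to supply the per-record logic.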

Splunk app file structure

The diagram above displays the file structure of a Splunk app. For this blog post, we will use the following sub-directories: default/, lib/, bin/, and local/.

  • default/ – The default/ directory contains the original configuration and dashboard files for the app
  • local/ – The local/ directory contains local modifications to the configuration and dashboard files for the app. Files in local/ override the equivalent items in the default/ directory. When an app is upgraded, the default/ configuration can be overwritten but the local/ directory is preserved.
  • lib/ – Store any Python dependencies in lib/, not in bin/, and include splunklib if required.
  • bin/ – Store any Python files that are referenced by a CONF file in bin/
  • default/commands.conf – defines each custom search command as a stanza of setting/value pairs (for example, which Python script to run)
  • local/app.conf – contains the state of a given app in the Splunk platform

Splunk app Python environment (lib/)

The difference between the Python environment located at lib/ bundled with each Splunk app and the Python environment installed on the OS can be tricky to grasp. Typically, the Python library known as Splunklib is installed to lib/ and contains all the necessary code components to build a custom search command with Python. Additionally, this lib/ directory contains all the necessary external Python libraries. In the “Example 2: Community ID search command + external PIP module” section we will provide instructions to install an external Python library to this directory.

When the Python script for a custom search command is executed it runs the local Python interpreter provided by the OS. As you will see in the code examples below, Python will be instructed to load all libraries from the local lib/ directory within the Splunk app.

It should be noted that since Splunklib is not installed globally on the system, each Splunk app contains its own instance. This design decision means that every Splunk app carries a copy of Splunklib, which creates storage overhead on your Splunk server. Based on some research, this design decision was made to reduce dependency conflicts. One advantage of this design is that each app is bundled with all the necessary software to run.
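
As a minimal sketch of how a script can be pointed at the app's lib/ directory, the helper below computes the lib/ path relative to a script in bin/ and puts it at the front of Python's import path. The function name add_app_lib_to_path and the example path are hypothetical; a real script would typically pass its own __file__.

```python
import os
import sys


def add_app_lib_to_path(script_path):
    """Put <app>/lib/ at the front of sys.path, given a script in <app>/bin/."""
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    lib_dir = os.path.normpath(os.path.join(bin_dir, "..", "lib"))
    if lib_dir not in sys.path:
        # Index 0 so the app's bundled libraries win over system-wide ones.
        sys.path.insert(0, lib_dir)
    return lib_dir


# Example path only; inside a real script you would pass __file__ instead.
lib_dir = add_app_lib_to_path("/opt/splunk/etc/apps/hello_world/bin/hello_world.py")
print(lib_dir)
```

Any import statement that runs after this call will search the app's lib/ directory first, which is why the Splunk imports in the examples appear after the path is modified.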

Spin up Splunk with Docker

docker-compose

  1. git clone https://github.com/CptOfEvilMinions/BlogProjects
  2. cd BlogProjects/splunk-custom-search-command-python
  3. docker-compose up

Create new indexes

  1. Open a browser to http://127.0.0.1:8000
    1. Username: admin
    2. Password: Changeme123!
  2. Settings > Data > Indexes
  3. Select “New index” in the top right
    1. Enter “Zeek” for index name
    2. Leave all the settings as default
    3. Select “Save”

Upload test data

  1. Select “Splunk enterprise” in the top left
  2. Select “Add data” at the top
  3. Select “Upload – files from my computer”
  4. Select File
    1. Add Data
      1. Select “zeek_conn.json” from BlogProjects/splunk-custom-search-command-python/sample_logs/zeek_conn.json
    2. Set Source Type
      1. Set  Source type to bro:conn:json
    3. Input settings
      1. Select the Zeek index
    4. Submit
  5. Repeat the steps above for BlogProjects/splunk-custom-search-command-python/sample_logs/zeek_files.json source type set to bro:files:json

Example 1: Building our first Splunk custom search command – Hello World

When learning any new programming language the first thing you program is printing “Hello world!” to the screen. This section will demonstrate how to create a custom Python search command for Splunk that appends “Hello world” to each log entry. This section will also provide a hands-on experience with creating your very first custom search command made with Python.

Step 1: Create Splunk app environment

  1. mkdir hello_world
    1. Create Splunk app directory
  2. cd hello_world
  3. mkdir lib local default bin
    1. Create the app directory structure
  4. pip3 install --target=./lib splunklib
    1. Install Splunklib to the Splunk app Python environment
  5. pip3 install --target=./lib splunk-sdk
    1. Install Splunk SDK to the Splunk app Python environment
  6. touch bin/hello_world.py
  7. Open bin/hello_world.py with your favorite text editor

Step 2: Generate Python code for Splunk app

The goal of this entire blog post is to demonstrate different Python code examples, but for this example we will keep it simple. The screenshot below displays the minimal amount of code needed to make a Splunk streaming command with Python. First, you start by defining the interpreter (line 1) to use when executing this script, like you would with a BASH script. Second, we import the necessary Python libraries (line 2) so that we can instruct Python (line 4) that all library imports from this point forward can be found in ../lib. In the previous section we installed the Splunk Python libraries to the lib/ directory of our app. Third, we import the necessary Splunk Python modules (line 5) for our app; in the examples to come, we will load additional Python modules.

Fourth, a Python class is created (line 8). You can name it anything you would like, but the typical naming convention is class <name>Command(StreamingCommand):. Fifth, the stream function is defined (line 9), and this is the entry point for this script. This function is passed two parameters: self and records. records is an iterable of log events that you can loop over.

In our hello world example, we will iterate over each record (line 10) and append a key-value pair of key: hello and value: world (line 11). The last thing we do in this function is return/yield the record back to Splunk (line 12). The final line for a Python custom search command script is to end with the dispatch function call (line 14). This function takes in several parameters but the one that matters to us is the first parameter which should contain the name of the class created above. As you can see in the screenshot below, the class name is HelloWorldCommand and the dispatch function call contains the same name in the first parameter.
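
For readers following along without the screenshot, here is a sketch of what such a script can look like. It is laid out slightly differently from the screenshot, so the line numbers mentioned in the text will not match. Two things are my own additions and not part of the original: the record logic lives in a plain generator named add_hello so it can be tried outside Splunk, and the Splunklib import is wrapped in try/except so the file still loads when lib/ has not been populated yet.

```python
#!/usr/bin/env python3
import os
import sys

# Load libraries from <app>/lib/ (assumes this script lives in <app>/bin/).
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "lib"))


def add_hello(records):
    """Append the key-value pair hello=world to every record."""
    for record in records:
        record["hello"] = "world"
        yield record


try:
    from splunklib.searchcommands import dispatch, StreamingCommand, Configuration
except ImportError:
    # Splunklib is missing, e.g. when running this file outside the app.
    dispatch = None

if dispatch is not None:

    @Configuration()
    class HelloWorldCommand(StreamingCommand):
        def stream(self, records):
            # Entry point: Splunk passes in the events and we yield them back.
            yield from add_hello(records)

    # The first argument must be the class defined above.
    dispatch(HelloWorldCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```

Keeping the per-record logic in its own function is a design choice, not a requirement; the class body could just as well contain the for loop directly.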

This blog post is only going to cover creating a streaming command, but there are other types. You might be asking at this point: how do I know what code to use for the other Splunk custom command types? I used the code examples in the following GitHub repo to learn what code is required for each custom command type, which can be found here.

Step 3: commands.conf

This config file defines the name of the Splunk custom search command, which is defined at the top within the brackets ([helloworld]). As you can see from the screenshot below our command will be named helloworld. Next, a Python version should be defined since at the time of this writing the default Python version used by Splunk is Python2 which is end-of-life (EOL). For our custom search commands we will be exclusively using Python3.

Next, you need to set chunked to true (chunked = true), which instructs Splunk to stream logs to the Python script in chunks. Lastly, we define the name of the Python script to be executed when the command defined in the brackets above is run. It should be noted that by default Splunk will look in the hello_world/bin/ directory for the Python script, in our case hello_world.py. Our Splunk app only contains a single custom search command and a single Python script, but an app could be bundled with more than one. For additional settings please see this Splunk documentation.

  1. touch default/commands.conf
  2. Add the following code:
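
Based on the settings described above (stanza name, Python version, chunked, and script filename), the file contents are most likely the four lines held in COMMANDS_CONF below. The sketch writes them to default/commands.conf under an app directory; the helper name write_commands_conf is hypothetical, and you can just as well paste the stanza into the file by hand.

```python
import configparser
import os
import tempfile

# Stanza reconstructed from the settings described in the text.
COMMANDS_CONF = """\
[helloworld]
python.version = python3
chunked = true
filename = hello_world.py
"""


def write_commands_conf(app_dir):
    """Write the stanza to <app_dir>/default/commands.conf and return its path."""
    default_dir = os.path.join(app_dir, "default")
    os.makedirs(default_dir, exist_ok=True)
    conf_path = os.path.join(default_dir, "commands.conf")
    with open(conf_path, "w", encoding="utf-8") as handle:
        handle.write(COMMANDS_CONF)
    return conf_path


# Example: write into a scratch directory and read the stanza back.
conf_path = write_commands_conf(tempfile.mkdtemp())
parser = configparser.ConfigParser()
parser.read(conf_path)
print(dict(parser["helloworld"]))
```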

Step 4: Installing the Splunk app

The default location to store Splunk apps is located at $SPLUNK_HOME/etc/apps. When Splunk starts it registers the custom search command and instructs Splunk where the Python script is located to transform the data. The docker-compose for this blog post mounts the local Splunk application directories into the Splunk container at the following path $SPLUNK_HOME/etc/apps/<app_name>. At the start, Splunk will load all the applications from this directory which can be seen from the console if you select “App: Search & reporting” > “Manage apps”.

  1. docker-compose down
  2. docker-compose up

Step 5: Using custom search command

As you can see from the screenshot below there is a new appended key-value pair of key: hello and value: world. This is a very simple example but I hope you can see the limitless potential here.

  1. Open a browser to http://127.0.0.1:8000 and log in
  2. Select “Search & Reporting” on the left
  3. Query: index="zeek" sourcetype="bro:conn:json" | helloworld | table uid, src_ip, dest_ip, hello

 

Example 2: Community ID search command + external PIP module

This example demonstrates how to create a custom Python search command for Splunk that depends on external libraries. It uses the Corelight Python library to generate Community IDs based on the network connection information in Zeek logs.

Code explanation

The code above starts by defining multiple Option settings that let the user specify the field names for the destination IP address (dest_ip), destination port (dest_port), source IP address (src_ip), source port (src_port), and protocol (protocol). Second, this Python script extracts all the necessary values from the Splunk record. It is super important to understand that a field can only be read from a record if it is an extracted field defined by the sourcetype. Our app contains the Zeek sourcetypes so that Splunk extracts all the fields. This custom search command WILL NOT work if the data in the index does not conform to the sourcetype. Once all the fields have been extracted, we use the Corelight library to generate a Community ID. A new field named community_id is added to the record with the generated value.
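
To illustrate what goes into a Community ID, here is a standalone sketch. It is not the Corelight library's code and not the command from this post, which delegates the calculation to the communityid package. It assumes Community ID version 1 with the default seed of 0 and only handles TCP and UDP; the function names and the fields mapping are hypothetical.

```python
import base64
import hashlib
import ipaddress
import struct

# IANA protocol numbers for the two transports this sketch handles.
PROTOCOLS = {"tcp": 6, "udp": 17}


def community_id_v1(src_ip, src_port, dest_ip, dest_port, proto, seed=0):
    """Hash a flow tuple so that both directions produce the same ID."""
    src = (ipaddress.ip_address(src_ip).packed, int(src_port))
    dst = (ipaddress.ip_address(dest_ip).packed, int(dest_port))
    # Put the smaller endpoint first so A->B and B->A hash identically.
    if dst < src:
        src, dst = dst, src
    data = (
        struct.pack("!H", seed)
        + src[0]
        + dst[0]
        + struct.pack("!BBHH", PROTOCOLS[proto.lower()], 0, src[1], dst[1])
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


def add_community_id(record, fields):
    """fields maps option names (e.g. protocol) to field names in the record."""
    record["community_id"] = community_id_v1(
        record[fields["src_ip"]],
        record[fields["src_port"]],
        record[fields["dest_ip"]],
        record[fields["dest_port"]],
        record[fields["protocol"]],
    )
    return record


fields = {"src_ip": "src_ip", "src_port": "src_port", "dest_ip": "dest_ip",
          "dest_port": "dest_port", "protocol": "proto"}
event = {"src_ip": "10.0.0.1", "src_port": "1234",
         "dest_ip": "10.0.0.2", "dest_port": "80", "proto": "tcp"}
print(add_community_id(event, fields)["community_id"])
```

The fields mapping plays the same role as the options in the search command: it lets the caller say which field in the record holds each value, such as protocol=proto for Zeek logs.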

Setup custom search command

  1. docker-compose down
  2. cd BlogProjects/splunk-custom-search-command-python
  3. pip3 install -r community_id_pip_search_command/requirements.txt --target=community_id_pip_search_command/lib
    1. This command will install the external Python libraries (splunk-sdk, splunklib, and communityid) to the app's lib/ directory
  4. docker-compose up

Query time

  1. Log into Splunk
  2. Select “Search & Reporting”
  3. Query: index="zeek" sourcetype="bro:conn:json" | communityid( dest_ip=dest_ip, dest_port=dest_port, protocol=proto, src_ip=src_ip, src_port=src_port ) | table src_ip, src_port, dest_ip, dest_port, proto, community_id

Example 3: Hybrid Analysis search command + Splunk credential store

This example demonstrates how to create a custom Python search command for Splunk that requires one or more secrets. This custom search command will query Hybrid Analysis for the threat score of a particular file hash.

Code explanation

setup.xml

Typically, when you install a Splunk app it will require some setup to have it function as desired. The setup.xml file contains the necessary code to set up a Splunk app, such as asking the user to enter an API key. The XML in the screenshot below requests the user to enter an API key for Hybrid Analysis. The user input is stored in <app_directory>/local/passwords.conf.

passwords.conf

In the previous section, an API key is provided by the user and stored in <app_directory>/local/passwords.conf. Any passwords entered by the user are encrypted using Splunk’s private key located at $SPLUNK_HOME/etc/auth. This private key is unique to each Splunk instance and therefore you can’t copy a passwords.conf from one Splunk instance to another.

prepare()

Below we have a Python function named prepare(), which is the entry point for this script. This function uses the Splunklib library to ask Splunk to fetch the API key for Hybrid Analysis from passwords.conf and return the clear-text password. Once the API key has been retrieved, it is stored in a global Python variable named API_KEY.

The important thing to note in this section is that passwords.conf (screenshot above) sets the secret name to hybridanalysis, and that is the same name being queried on line 74 (screenshot below). Lastly, on lines 78 and 88, if the API key could not be retrieved the Python script should return an error.
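
As a rough sketch of the lookup that prepare() performs, the helper below walks the stored credentials and returns the clear-text value for a given secret name. Splunklib exposes stored credentials through the service object's storage_passwords collection, where each entry has username and clear_password attributes; the helper name load_api_key and the fake service object are my own, used here so the logic can be tried outside Splunk.

```python
from types import SimpleNamespace


def load_api_key(service, secret_name="hybridanalysis"):
    """Return the clear-text secret stored under secret_name, or raise."""
    for credential in service.storage_passwords:
        if credential.username == secret_name:
            return credential.clear_password
    # Failing loudly here lets the search surface an error to the user.
    raise RuntimeError("No secret named %r in the credential store" % secret_name)


# Stand-in for the Splunklib service object, with a made-up secret.
fake_service = SimpleNamespace(
    storage_passwords=[
        SimpleNamespace(username="hybridanalysis", clear_password="s3cr3t"),
    ]
)
API_KEY = load_api_key(fake_service)
```

Inside a search command class, the call would presumably look like load_api_key(self.service), with the result stored in API_KEY as described above.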

stream()

The stream function (line 81) starts the manipulation of log entries. This function is passed two parameters: self and records. records is an iterable of log events that you can loop over. This app iterates over each record (line 82) and requests the threat score from Hybrid Analysis (line 83) to append the threat score to the log record. Lastly, this modified record is returned to Splunk with the new information (line 84).
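
The loop described above can be sketched as a plain generator. The Hybrid Analysis request itself is left out on purpose: lookup is a hypothetical callable that you would replace with the real API call, and fake_lookup returns made-up scores. The default field names sha1 and hash_result are taken from the query used later in this post.

```python
def add_threat_scores(records, lookup, hash_field="sha1", result_field="hash_result"):
    """Yield every record, adding the lookup result when a hash is present."""
    for record in records:
        file_hash = record.get(hash_field)
        if file_hash:
            record[result_field] = lookup(file_hash)
        # Records without a hash pass through unchanged.
        yield record


def fake_lookup(file_hash):
    """Placeholder for the real Hybrid Analysis request; returns a made-up score."""
    return 100 if file_hash.startswith("bad") else 0


events = [{"sha1": "bad123"}, {"sha1": "abc456"}, {"uid": "no-hash"}]
enriched = list(add_threat_scores(events, fake_lookup))
print(enriched)
```

Passing the lookup in as a parameter keeps the record handling separate from the network call, which makes it easier to test the command without an API key.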

Setup custom search command

  1. docker-compose down
  2. cd BlogProjects/splunk-custom-search-command-python
  3. pip3 install -r requirements.txt --target=hybrid_analysis_cred_store_search_command/lib
    1. This command will install the external Python libraries (splunk-sdk, splunklib) to the app's lib/ directory
  4. docker-compose up

Creating a secret

Method 1: Create a secret with a CURL post

This method uses the Splunk API to create a secret that is encrypted by Splunk.

  1. curl -k https://localhost:8089/servicesNS/nobody/<app name>/storage/passwords -u admin:<admin password> -d name=<secret name> -d password=<secret>
  2. docker exec -it --user=0 splunk bash
  3. cat etc/apps/search/local/passwords.conf

Method 2: App setup with setup.xml

This method is an alternative to giving users access to the Splunk API, which should be restricted to those who need it. The Hybrid Analysis custom search command is actually a Splunk app, and this app contains a setup.xml that takes input from users via the web console to generate encrypted secrets.

  1. docker-compose up -d
  2. Log in to Splunk
  3. Select “Manage apps”
  4. Find the “hybrid_analysis_cred_store_search_command” entry and select “Set up”
    1. Enter API key into the Password
  5. docker exec -it --user=0 splunk bash
  6. cat etc/apps/hybrid_analysis_cred_store_search_command/local/passwords.conf

Method 3: Storing the password in plaintext

I am only mentioning this method to let you know that it SHOULD NOT BE USED. Typing the secret directly into passwords.conf stores the password in a text file in plaintext, which is BAD. Use one of the methods demonstrated above for secret storage.

Query time

  • Query 1: index="zeek" sourcetype="bro:files:json" | dedup sha1 | hybridanalysislookup( file_hash=sha1 ) | table sha1, hash_result

Debugging your app

Writing your app as you go

When creating your Python app, first create the necessary Splunk configs so that you can start the Dockerized Splunk. Next, open the Python script with your favorite text editor and edit the file on your host machine. When you're done making changes, save the file and re-run your Splunk query.

Writing output to text files

When an app has an error I like to comment out all the new lines and step through the app one line at a time. I found it super useful to output the values of variables and the changes of state to a text file. First, I would write code similar to the screenshot below. Next, I would run docker exec -it --user=0 splunk bash to enter the Splunk container and then run tail -f /tmp/record.txt.

This method allows me to update my code, re-run my custom search command, and monitor the changed output.
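
A minimal sketch of such a debug helper is shown below. The function name debug_dump is hypothetical; the default path matches the /tmp/record.txt file mentioned above, and the example call writes to a scratch file instead so it can be run anywhere.

```python
import json
import os
import tempfile


def debug_dump(value, path="/tmp/record.txt"):
    """Append one JSON line describing value to the debug file."""
    with open(path, "a", encoding="utf-8") as handle:
        # default=str keeps the dump from failing on non-JSON types.
        handle.write(json.dumps(value, default=str) + "\n")


# Example: dump a record to a scratch file instead of /tmp/record.txt.
scratch = os.path.join(tempfile.mkdtemp(), "record.txt")
debug_dump({"uid": "C1", "hello": "world"}, path=scratch)
```

Calling debug_dump(record) inside the stream function writes one line per event, which is what tail -f then shows as the search runs. Remember to remove these calls once the command works.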

Lessons learned

I am currently reading a book called “Cracking the Coding Interview” and it is a great book. One interesting part of the book is its matrix for describing projects you have worked on, which contains the following sections: challenges, mistakes/failures, enjoyed, leadership, conflicts, and what you would do differently. I am going to try to use this model at the end of my blog posts to summarize and reflect on the things I learn. I don’t blog to post things that I know; I blog to learn new things and to share the knowledge of my security research.

New skills/knowledge

  • Learned how to create a streaming custom search command for Splunk
  • Learned how to install external PIP modules for Splunk apps
  • Learned how to implement the Splunk credential store for Splunk apps

Challenges

  • Understanding how to interact with the Python environment within each Splunk app vs. the Python environment bundled with the OS
