How We Track Mean Time to Patch (4-part blog series)

Posted on 03.15.17 by Greg Pothier

This is the third installment of our four-part blog series on how we use the Tanium Platform in our own organization. In this installment, SecOps Engineer Greg Pothier shows how to use Tanium and Splunk together to deliver patch management data that CSOs and other security leaders can use to assess risk.

How useful would it be to track Mean Time to Patch for every device on your network? What about being able to instantly deliver accurate results about the number of open patches anytime your CIO, CSO, or VP of Ops asks for it?

These were questions we started asking ourselves here at Tanium. The answers hold real business value for our security and IT leaders, who can use the data to monitor our attack surface and patch performance on an ongoing basis.

Most organizations don’t even know the current status of patches across their network. At Tanium, we not only know ours, we also created a metric to track the average time it takes to patch.

This metric is the most accurate way we at Tanium use the Tanium Platform to measure how well our organization performs patch management. Tanium makes it simple to gather accurate results, such as open critical patches and the state of every installed application, across even the largest enterprise. For CSOs and other security leaders, a snapshot of open patches across the enterprise is a key metric because it helps them assess risk. With Tanium, we can go a step further and measure how well we are patching by tracking the average (mean) time it takes to apply patches across the enterprise. Here’s how we do it, using Tanium and Splunk, a data analytics tool.

First, we set up a Tanium Connect connection that sends Splunk the results of our Tanium Question “Get Computer Name and Operating System and Available Patches from all machines.” The operating system is optional; we include it in case we ever want to filter results by operating system in the future.

With the connection created in Tanium Connect, we run it and check Splunk to make sure the results are being processed as expected. We have configured this connection to write to the tanium index, so the search begins with index="tanium" and then filters on the name of the Question. The full base search is:

index="tanium" Question="Get-Computer-Name-and-Computer-Serial-Number-and-Available-Patches-from-all-machines"

This returns every available patch, listed by computer name, including fields such as Title, Severity, and CVE_ID. (Note: We use a table to format the results in the screenshot below for visual consumption only; this formatting is not otherwise necessary.)

Next, we create a visual display of open patches by severity. To make sure we don’t count duplicate results, we pipe the Connect results into the dedup command:

| dedup Computer_Serial_Number, KB_Article, CVE_ID, Title

The dedup command keeps only one result for each unique combination of Computer_Serial_Number, KB_Article, CVE_ID, and Title, so each patch is counted once per machine. With the duplicates removed, we chart the results by severity:

| chart count by Severity

The result is a chart showing the total count for each severity level, as well as each level’s percentage of the total.
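To make the two SPL steps concrete, here is a minimal Python sketch of what dedup followed by chart count by Severity computes. It assumes each result row can be treated as a dictionary keyed by the same field names used in the search; the sample rows are hypothetical, and the code only models the logic, it does not talk to Splunk.

```python
from collections import Counter

# Hypothetical result rows shaped like the Tanium data indexed in Splunk.
# The first two rows are the same patch on the same machine (for example,
# reported by two daily runs) and should only be counted once.
rows = [
    {"Computer_Serial_Number": "SN1", "KB_Article": "KB100", "CVE_ID": "CVE-A", "Title": "Patch A", "Severity": "Critical"},
    {"Computer_Serial_Number": "SN1", "KB_Article": "KB100", "CVE_ID": "CVE-A", "Title": "Patch A", "Severity": "Critical"},
    {"Computer_Serial_Number": "SN1", "KB_Article": "KB200", "CVE_ID": "CVE-B", "Title": "Patch B", "Severity": "Important"},
    {"Computer_Serial_Number": "SN2", "KB_Article": "KB100", "CVE_ID": "CVE-A", "Title": "Patch A", "Severity": "Critical"},
]

DEDUP_FIELDS = ("Computer_Serial_Number", "KB_Article", "CVE_ID", "Title")


def dedup(events, fields):
    """Keep the first row seen for each unique combination of `fields`,
    like `| dedup Computer_Serial_Number, KB_Article, CVE_ID, Title`."""
    seen = set()
    unique = []
    for event in events:
        key = tuple(event.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def count_by_severity(events):
    """Count rows per Severity plus each level's share of the total (in %),
    the numbers behind `| chart count by Severity`."""
    counts = Counter(e["Severity"] for e in events)
    total = sum(counts.values())
    return {sev: (n, round(100 * n / total, 1)) for sev, n in counts.items()}


unique_rows = dedup(rows, DEDUP_FIELDS)
print(count_by_severity(unique_rows))  # {'Critical': (2, 66.7), 'Important': (1, 33.3)}
```

Without the dedup step, the duplicated row would inflate the Critical count to 3, which is exactly the double counting the SPL step guards against.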

Now we have an automated dashboard of currently open patches by severity. We could easily do the same for machines running other operating systems, such as Linux or Mac OS X. Let’s continue toward our main objective: an automated solution that calculates and displays mean time to patch.

Next, we return to our base query and remove the dedup command, because we need every daily report of a patch to find when it was first and last seen. Then we add:

| stats earliest(_time) as First, latest(_time) as Last by Computer_Name, CVE_ID, Severity, Title

This uses the stats command to find the first and last time each patch was reported, per Computer_Name, CVE_ID, Severity, and Title. Because the Question only returns available (uninstalled) patches, a patch stops being reported once it is installed, so the last-seen timestamp approximates when it was applied. Both values are epoch timestamps in seconds, so we subtract them to get how long the patch was open, and then convert that number of seconds into a readable duration string:

| eval diff = Last - First | eval diff2 = tostring(diff, "duration")
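The following Python sketch mirrors the stats earliest/latest and eval diff steps on a few hypothetical daily observations, so the arithmetic is easy to follow. The duration_string helper is our own approximation of Splunk’s tostring(x, "duration"), assuming it renders seconds as HH:MM:SS with a "D+" day prefix once the value reaches a full day (the form the replace() regex later in this post expects); it is not Splunk code.

```python
from collections import defaultdict

DAY = 86400  # seconds per day

# Hypothetical observations: (epoch_seconds, Computer_Name, CVE_ID, Severity, Title).
# A patch stops appearing once installed, so its last sighting
# approximates when it was applied.
observations = [
    (0 * DAY, "HOST1", "CVE-A", "Critical", "Patch A"),
    (1 * DAY, "HOST1", "CVE-A", "Critical", "Patch A"),
    (3 * DAY + 3600, "HOST1", "CVE-A", "Critical", "Patch A"),
    (1 * DAY, "HOST2", "CVE-A", "Critical", "Patch A"),
    (1 * DAY + 5400, "HOST2", "CVE-A", "Critical", "Patch A"),
]


def first_last_by_patch(obs):
    """Mirror `stats earliest(_time) as First, latest(_time) as Last
    by Computer_Name, CVE_ID, Severity, Title`."""
    spans = defaultdict(lambda: [float("inf"), float("-inf")])
    for ts, *key in obs:
        span = spans[tuple(key)]
        span[0] = min(span[0], ts)  # First
        span[1] = max(span[1], ts)  # Last
    return {key: (first, last) for key, (first, last) in spans.items()}


def duration_string(seconds):
    """Approximate tostring(x, "duration") under the assumed format:
    HH:MM:SS, prefixed with "D+" when the duration is a day or longer."""
    days, rem = divmod(int(seconds), DAY)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}+{hms}" if days else hms


for key, (first, last) in first_last_by_patch(observations).items():
    diff = last - first  # eval diff = Last - First (seconds)
    print(key[0], key[1], duration_string(diff))  # eval diff2 = tostring(...)
```

Running it prints HOST1 CVE-A 3+01:00:00 and HOST2 CVE-A 01:30:00. Note that a patch still reported in the most recent run is counted with its duration so far, because its Last value is simply its latest report.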

The results list the computer name, the patch information, and how long each patch was reported as open on each computer. Now we calculate the maximum, minimum, and mean of those durations:

| stats count, max(diff), min(diff), mean(diff)

And that’s it. We now know the maximum, minimum, and mean time that patches stayed open. To make the output easier to read, we rename mean(diff) to average, round off the decimals, and format the value as a string showing days, hours, and minutes:

| rename mean(diff) as average | eval average = round(average, 0) | eval averageTime = tostring(average, "duration") | eval averageTime = replace(averageTime, "(\d+)\+(\d+)\:(\d+)\:(\d+)", "\1d \2h \3min") | table averageTime
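As a rough cross-check of the final step, this Python sketch applies the same sequence to a hypothetical list of diff values: summary statistics, rounding, the assumed duration formatting (HH:MM:SS with a "D+" day prefix, as in the previous sketch), and the same regular expression passed to replace(). The function names are illustrative, not part of Splunk.

```python
import re
from statistics import mean

DAY = 86400  # seconds per day


def duration_string(seconds):
    """Approximate tostring(x, "duration") under the assumed format:
    HH:MM:SS, prefixed with "D+" when the duration is a day or longer."""
    days, rem = divmod(int(seconds), DAY)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}+{hms}" if days else hms


def mean_time_to_patch(diffs):
    """mean(diff) -> round(average, 0) -> duration string -> "Xd Yh Zmin"."""
    average = round(mean(diffs))
    average_time = duration_string(average)
    # Same pattern as the SPL replace(); seconds are dropped. A value under
    # one day has no "D+" part, so the pattern does not match and the
    # HH:MM:SS string passes through unchanged.
    return re.sub(r"(\d+)\+(\d+):(\d+):(\d+)", r"\1d \2h \3min", average_time)


# Hypothetical per-patch open durations in seconds (the diff field).
diffs = [262800, 5400, 86400, 172800]
print(len(diffs), max(diffs), min(diffs), mean(diffs))  # stats count, max, min, mean
print(mean_time_to_patch(diffs))  # 1d 12h 37min
```

Here the mean is 131,850 seconds, which formats as 1+12:37:30 and becomes 1d 12h 37min. If the mean were under one day, the assumed duration format would have no "D+" part, so the regex would leave the HH:MM:SS value as-is.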

In this blog post, we used one saved Tanium Question, sent into Splunk once a day via Tanium Connect, to create two important automated metrics: open patches by severity and mean time to patch. IT leadership can use this information to measure our attack surface and patching performance, both in the current state and over time.

This is part three in a four-part series. In part one, I explored how we track critical vulnerabilities; in part two, I discussed how we track key compliance metrics; in the next, and final, installment, I’ll describe how we use Tanium to track our mean time to respond.
