How We Track Mean Time To Respond (4-part blog series)

Posted on 05.04.17 by Greg Pothier

This is the last in our four-part blog series illustrating how we track critical metrics. In this installment, Tanium SecOps Engineer Greg Pothier discusses how we respond to incidents, offers guidelines on mean time to respond, and explains why rapid response times are key for every business.

We routinely use Tanium in our security hygiene practice to bulk up our defensive posture, using patching as an action as well as a process. Doing so reduces our attack surface and promotes a homogeneous environment.  

Before we pat ourselves on the back for a job well done, though, it’s important to remain vigilant. Attackers are creative, and publicly disclosed breaches have shown that once they are inside a network, they can remain undetected for months.

The impact and cost of a breach are directly associated with the amount of time an attacker remains within a network. As a result, it is critical that we decrease our mean time to remediate as much as possible. To work more efficiently and effectively, we use Tanium to automate as much of the initial detection and alerting process as possible. These automated detections alert us to potentially malicious activity, at which point we begin measuring our response time.

There are three broad phases in the typical attack lifecycle:

  • Initial infection. This is when an attacker first successfully executes malicious code on an endpoint to gain access.
  • Persistence. This is when an attacker has gained access to an endpoint and is able to maintain access across system reboots.
  • Lateral Movement and Achieve Objective. Attackers rarely land where they want. Therefore, once they gain access, attackers typically use that access to move laterally from system to system, collecting more data until they eventually achieve their objective.

Below are examples of how we use Tanium to address activity in each phase of the attack lifecycle, and our expectations for mean time to respond in each phase.

Initial Infection Phase: Catching Malicious Activity

Because Tanium is a platform, we can pivot immediately from detection and alerting to scoping and investigating, all from a single console. We scope every detection and potential initial infection by using Tanium Trace to connect to the endpoint, then record and analyze a timeline of its key forensic data. These forensic artifacts include process execution, file system and registry changes, driver loads, and security events, as well as several other relevant user and system events.

Tanium Trace continuously records endpoint activity, regardless of whether or not the endpoint is online. Trace allows us to apply pattern-of-attack detection for anomalies and suspicious events based on complex attack methodologies.

Examples of some of the more common detection mechanisms include:

  • Script files in suspicious paths;
  • Patterns of process and file events indicative of Microsoft Office macro malware;
  • Adobe Acrobat Reader creating unusual file types; and
  • Command shells spawned by unusual parent applications.
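The last rule in the list above can be illustrated with a minimal sketch. This is not Tanium Trace's detection logic or schema; the event fields, the process lists, and the function name `is_suspicious_spawn` are all assumptions made for illustration.

```python
# Illustrative only: these sets and field names are assumptions,
# not a Tanium Trace schema or rule set.
SHELLS = {"cmd.exe", "powershell.exe"}
UNUSUAL_PARENTS = {"winword.exe", "excel.exe", "outlook.exe", "acrord32.exe"}


def is_suspicious_spawn(parent_name: str, child_name: str) -> bool:
    """Return True when a command shell is launched by an unusual parent."""
    return child_name.lower() in SHELLS and parent_name.lower() in UNUSUAL_PARENTS


# Made-up sample process-creation events.
events = [
    {"parent": "explorer.exe", "child": "cmd.exe"},        # normal user activity
    {"parent": "WINWORD.EXE", "child": "powershell.exe"},  # macro-style behavior
    {"parent": "AcroRd32.exe", "child": "cmd.exe"},        # reader spawning a shell
]

flagged = [e for e in events if is_suspicious_spawn(e["parent"], e["child"])]
for event in flagged:
    print(f"ALERT: {event['parent']} spawned {event['child']}")
```

In practice the parent list would need tuning for each environment, since some legitimate tools do launch shells.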

As we collect evidence artifacts from Trace, we can use Tanium Detect to search for these indicators of compromise (IOCs) across the entire enterprise. Detect ingests IOCs from a variety of sources, including local threat detection repositories, TAXII Streams, and other threat intelligence feeds. Detect also allows us to tailor scans to the state of the endpoint, so we can target precisely by operating system, by user group, or by whether a system has privileged administrators.

For Detect, our internal security team subscribes to several external sources of threat data, including a leading reputation database service and Palo Alto Networks Wildfire. Through Tanium Connect, we can work natively with both products’ APIs to check the hash of every running process, autorun, and loaded driver against dozens of antivirus engines. When a malicious hash is identified, we are immediately alerted and a ticket is automatically generated.

In addition to Detect, we can determine if similar indicators are located elsewhere simply by using Tanium Core to conduct an ad-hoc, enterprise-wide search across all of our endpoints.


We use Tanium to remediate the infected endpoint and, depending on our findings, we can deploy Tanium actions to kill malicious running processes, uninstall rogue applications, and/or patch the endpoint on the spot.

Many incident response investigations conclude at this point. If that is the case, we stop the clock that has been running since initial detection and record how long it took to reach remediation. Our objective is to reduce our mean time to remediate as much as possible.
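The arithmetic behind this metric is simple enough to sketch. The example below assumes each incident is stored with a detection timestamp and an optional remediation timestamp; the record layout, the function name `mean_time_to_remediate`, and the sample data are all made up for illustration and do not come from a Tanium product.

```python
from datetime import datetime, timedelta


def mean_time_to_remediate(incidents):
    """Average the detection-to-remediation interval over closed incidents.

    Incidents that are still open (no remediation time) are skipped.
    Returns None when there is nothing to average.
    """
    durations = [
        i["remediated"] - i["detected"]
        for i in incidents
        if i.get("remediated") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Made-up sample incidents.
incidents = [
    {"detected": datetime(2017, 5, 1, 9, 0), "remediated": datetime(2017, 5, 1, 9, 40)},
    {"detected": datetime(2017, 5, 2, 14, 0), "remediated": datetime(2017, 5, 2, 15, 20)},
    {"detected": datetime(2017, 5, 3, 8, 0), "remediated": None},  # still open
]

print(mean_time_to_remediate(incidents))  # prints 1:00:00
```

Excluding open incidents keeps the average honest about closed work, but it can hide a long-running investigation, so it is worth tracking open incident age separately.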

Persistence: Finding Attackers Where They Lurk

Tanium provides a variety of methods to monitor and detect common persistence mechanisms, including standalone executables, rogue service DLLs, hosted service modules, and COM components.

Tanium’s real-time searching and stacking allow our hunting operations to be efficient and effective when detecting malware persistence via common attacker methods such as Windows Management Instrumentation (WMI) and scheduled tasks. Real-time stacking is also a critical component of threat hunting, providing outlier analysis of all persistence mechanisms.
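Stacking, in the general threat-hunting sense, means counting how many endpoints report each value and then reviewing the rarest values first. The sketch below shows that idea only; it is not how Tanium implements it, and the function name `stack_outliers`, the host names, and the task names are hypothetical.

```python
from collections import Counter


def stack_outliers(observations, max_hosts=1):
    """Return items reported by at most `max_hosts` distinct hosts.

    `observations` is an iterable of (host, item) pairs, for example a
    host name and the name of a scheduled task found on that host.
    """
    unique_pairs = set(observations)  # ignore repeat reports from one host
    hosts_per_item = Counter(item for _host, item in unique_pairs)
    return sorted(item for item, n in hosts_per_item.items() if n <= max_hosts)


# Made-up sample data: one task is common, one appears on a single host.
observations = [
    ("host-01", "GoogleUpdateTaskMachineCore"),
    ("host-02", "GoogleUpdateTaskMachineCore"),
    ("host-03", "GoogleUpdateTaskMachineCore"),
    ("host-02", "SysHelperSvc"),
    ("host-02", "SysHelperSvc"),  # duplicate report, counted once
]

print(stack_outliers(observations))  # prints ['SysHelperSvc']
```

A rare entry is not automatically malicious; it is simply where a reviewer should look first.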

New persistence mechanisms are reviewed and, if found to be malicious, Tanium is used to deploy a quarantine action, isolating the infected endpoint until further analysis and remediation can be completed.

Now that the initial infection has been contained, we need to determine how it occurred in the first place.

Tanium Trace can answer this question by analyzing process execution, providing the associated process tree history, and using the forensic evidence to create indicators of compromise. Those indicators help us answer the next question: Was the same persistence method attempted, and did it succeed, on other endpoints?


We create automated detection actions with Detect by plugging in our IOC or YARA rule and searching across the enterprise for evidence of the malicious activity. We aim to begin scoping an alert with initial triage within the first 15 minutes, and to complete a more detailed gathering of information using the above methods within the first hour.
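These two targets can be checked mechanically once timestamps are recorded for each stage. The sketch below assumes both targets are measured from the moment the alert fires, which the text implies but does not state outright; the function name `check_targets` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Targets taken from the text above; measuring both from the alert
# time is an assumption.
TRIAGE_TARGET = timedelta(minutes=15)
DETAIL_TARGET = timedelta(hours=1)


def check_targets(alerted, triaged, detailed):
    """Report whether each response stage met its target."""
    return {
        "triage_on_time": triaged - alerted <= TRIAGE_TARGET,
        "detail_on_time": detailed - alerted <= DETAIL_TARGET,
    }


# Made-up example: triage took 12 minutes, detailed collection 90 minutes.
result = check_targets(
    alerted=datetime(2017, 5, 4, 10, 0),
    triaged=datetime(2017, 5, 4, 10, 12),
    detailed=datetime(2017, 5, 4, 11, 30),
)
print(result)  # prints {'triage_on_time': True, 'detail_on_time': False}
```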

Lateral Movement and Exfiltration: Stopping Attackers In Their Tracks

During the Lateral Movement and Achieve Objective phase, we use Tanium to detect common attacker domain reconnaissance techniques and create alerts on them. As an example, Tanium alerts us when an attacker, or even a rogue insider, attempts potentially malicious activity such as using net.exe to discover user account groups or to add a new user.
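As a rough illustration of the net.exe example, the sketch below classifies a recorded command line by its arguments. It is a simplified assumption of what such a rule might look like, not Tanium's detection logic; the function name `classify_net_command` and the category labels are hypothetical.

```python
NET_BINARIES = {"net", "net.exe", "net1", "net1.exe"}


def classify_net_command(command_line: str):
    """Label net.exe usage that looks like reconnaissance or account changes.

    Returns a short label, or None when the command line is not of interest.
    """
    tokens = command_line.lower().split()
    if not tokens:
        return None
    executable = tokens[0].rsplit("\\", 1)[-1]  # drop any directory prefix
    if executable not in NET_BINARIES:
        return None
    args = tokens[1:]
    if not args:
        return None
    if args[0] == "user" and "/add" in args:
        return "account_creation"
    if args[0] in {"group", "localgroup"}:
        return "group_modification" if "/add" in args else "group_discovery"
    return None


# Made-up sample command lines.
print(classify_net_command('net.exe group "Domain Admins" /domain'))
print(classify_net_command(r"C:\Windows\System32\net.exe user svc_tmp Passw0rd! /add"))
```

Whitespace splitting is crude and does not handle every quoting style, so a production rule would parse command lines more carefully.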

PowerShell has become a top tool for attackers, and we use Tanium to search for suspicious PowerShell behavior by examining process trees, command lines, or even HTTPS connections. We monitor and alert on type 3 (network) logons, including PsExec remote command execution, WMI remote command execution, suspicious mounting of shares, and scheduled remote tasks. Many attackers prefer tools that reside primarily in memory for stealth, so we use Tanium Incident Response to examine endpoint memory across the entire enterprise at speed.

During attempted exfiltration, attackers will often compress and encrypt data to conceal file contents and file names. We use Tanium Index to search for files with suspicious indicators, including name, path, hash, and header bytes. So, when an attacker attempts to compress and encrypt files and their contents, Tanium Index alerts us to any newly created file containing a suspicious header byte sequence.
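Header-byte matching works because common archive formats begin with a fixed signature, regardless of what the file is named. The sketch below checks a file's leading bytes against a few well-known signatures; it illustrates the general technique rather than Tanium Index itself, and the function name `identify_archive` is hypothetical.

```python
# Well-known leading byte sequences ("magic numbers") for archive formats.
ARCHIVE_SIGNATURES = {
    b"PK\x03\x04": "zip",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"Rar!\x1a\x07": "rar",
    b"\x1f\x8b": "gzip",
}


def identify_archive(header: bytes):
    """Return the archive type implied by the header bytes, or None."""
    for magic, kind in ARCHIVE_SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


# A file renamed to "report.jpg" still starts with its archive signature.
print(identify_archive(b"Rar!\x1a\x07\x00rest-of-file"))  # prints rar
print(identify_archive(b"MZ\x90\x00"))                    # prints None
```

To apply this to files on disk, read only the first few bytes of each file (for example `open(path, "rb").read(8)`) and pass them to the function.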

When responding to potential lateral movement or exfiltration, Tanium affords us several options to push immediate actions to remediate the malicious activity. We can demote and/or delete local accounts with elevated permissions, deploy a package to kill any malicious processes, reset compromised user credentials, and close unauthorized connections/ports.

This workflow of detection in seconds and at scale, built on the ability to drill down into endpoints across the enterprise, keeps our mean time to respond within our targeted range and lets us catch attackers before they achieve their objectives. Tanium as a platform allows us to do in seconds, minutes, and hours what typically takes incident response teams days, weeks, or even months.

This is the last installment in our four-part series. In part one, I reveal how we at Tanium track critical vulnerabilities. In part two, I describe how we use Tanium to track critical compliance metrics. In part three, I talk about how to use Tanium and Splunk together to deliver patch management data. To see how Tanium can help you start measuring the same metrics in your environment, sign up for a proof-of-concept now.
