This is the third and final installment in our three-part EDR Matters blog series. In this series, Tyler Oliver, Director of Tanium’s Endpoint Detection and Response team, shares his personal views on how organizations can quickly and effectively remediate security incidents to return their businesses to normal operations.
Incident remediation is a key piece of the incident response lifecycle. Ensuring that your organization has the necessary tools in place to quickly and effectively contain and remove a threat from your environment is vital to maintaining a secure network.
The primary goal of any incident remediation plan is to restore operations as quickly as possible so the business can get back to revenue-creating activities. This goal sometimes conflicts with the desire of the security team to fully investigate and understand the threat. Having the ability to rapidly scope the incident helps resolve this conflict by allowing the security team to meet the business needs while also gaining valuable insight into the threats being faced. In the long term, your ability to both remediate and investigate fully serves to strengthen your organization’s overall security controls and improve security posture.
When it comes to your organization’s endpoints, effectively scoping an incident prior to remediation may require the ability to quickly access event information and user activity details, or to enumerate unique system artifacts. Quick access to these details enables you to gauge which systems were infected or accessed by an attacker.
If an organization fails to fully scope an incident, the remediation effort may fail to eradicate an attacker’s foothold in the environment, or may allow the attacker to easily regain access if underlying vulnerabilities have not been resolved. An incomplete investigation may also fail to identify the business impact of an incident. For example, an incident could have exposed regulated data or proprietary information, and that exposure could go unnoticed because you lacked the ability to investigate fully.
In parts one and two of our EDR Matters series, we explored the Detection and Investigation phases of incident response, and highlighted why a toolset offering adaptability and speed is critical to your ability to properly scope an incident.
A properly scoped incident will provide you with some or all of the following pieces of information, which can then be put to use in an effective remediation plan:
- Systems infected with malware or tools
- Systems accessed during the incident
- Location and other metadata related to malware or tools
- Compromised accounts (used to move laterally, responsible for malware execution, potentially stolen credentials)
- Application vulnerabilities that led to the incident
- Other system artifacts that need to be cleaned or removed
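To make the link between scoping and planning more concrete, here is a minimal Python sketch that turns findings like those above into a list of short-term and long-term tasks. The `IncidentScope` class, the `build_remediation_plan` function, and the action names (`quarantine`, `suspend_account`, and so on) are hypothetical placeholders, not commands from Tanium or any other product, and the host and account names are made up; a real plan would be driven by your own tooling and approval process.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentScope:
    """Hypothetical container for the findings of a scoped incident."""
    infected_hosts: set[str] = field(default_factory=set)
    accessed_hosts: set[str] = field(default_factory=set)
    malware_paths: set[str] = field(default_factory=set)
    compromised_accounts: set[str] = field(default_factory=set)
    vulnerabilities: set[str] = field(default_factory=set)
    other_artifacts: set[str] = field(default_factory=set)


def build_remediation_plan(scope: IncidentScope) -> list[tuple[str, str, str]]:
    """Turn scoping findings into (term, action, target) tasks.

    Action names are illustrative placeholders, not real product commands.
    """
    tasks: list[tuple[str, str, str]] = []
    for host in sorted(scope.infected_hosts):
        tasks.append(("short", "quarantine", host))
    # Hosts that were accessed but not infected still need evidence collected
    # and reviewed before they are cleared.
    for host in sorted(scope.accessed_hosts - scope.infected_hosts):
        tasks.append(("short", "collect_artifacts", host))
    for path in sorted(scope.malware_paths | scope.other_artifacts):
        tasks.append(("short", "remove_artifact", path))
    for account in sorted(scope.compromised_accounts):
        tasks.append(("short", "suspend_account", account))
        tasks.append(("long", "reset_credentials", account))
    # Vulnerabilities are the long-term piece: patch, then keep verifying.
    for vuln in sorted(scope.vulnerabilities):
        tasks.append(("long", "patch_and_verify", vuln))
    return tasks


if __name__ == "__main__":
    scope = IncidentScope(
        infected_hosts={"ws-014"},
        accessed_hosts={"ws-014", "fs-02"},
        compromised_accounts={"svc-backup"},
        vulnerabilities={"CVE-2017-0144"},
    )
    for term, action, target in build_remediation_plan(scope):
        print(f"{term:5} {action:18} {target}")
```

Keeping accessed-but-clean hosts separate from infected ones mirrors the distinction the list draws between systems infected and systems merely accessed: both need attention, but not the same action.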
After you have completed the initial scoping phase, the details collected during your investigation will help you determine what needs to be remediated or removed from your environment, which forms the basis of your remediation plan. Once you know what to remediate, and have the tools to confirm that your remediation efforts are effective, you can execute the plan. The plan should address both the short-term and long-term aspects of removing a threat.
In the short term, stopping the threat from continuing or spreading is critical. The long-term tasks should provide the assurance that the same threat will not be successful again, or — if it is not entirely preventable — that the impact on the organization will be reduced if the threat should recur.
Short term: plug the leaks
There are many immediate, sometimes automated, steps you can take to reduce the risk or remove the threat. These include quarantining a machine, collecting key system artifacts for analysis, or temporarily suspending a user account until longer-term controls can be implemented. These actions may be seen as outside normal day-to-day operations for a security team. Therefore, they will likely require you to develop an “in case of emergency break glass” process if you don’t already have one in place.
At minimum, your security team should be able to perform the following activities, either as part of regular operations or through a “break glass” process, if they are to remediate threats quickly:
- Execute a system quarantine
- Kill a running malicious process
- Eliminate malware persistence mechanisms (such as in the registry)
- Distribute out-of-band patches in critical scenarios
The above list is by no means an exhaustive one. Rather, it’s intended to provide some thoughts on what capabilities you may need in a critical situation.
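One way to think about a “break glass” process is as an authorization rule: routine actions run under a responder’s normal role, while high-impact actions require an open incident and sign-off from other people. The sketch below is a hypothetical policy written in Python; the role names, action names, and the two-approver rule are assumptions chosen for illustration, not a recommended standard or any product’s access model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: roles allowed to run each action as a routine task.
# An empty set means the action is available only through break glass.
ROUTINE_ROLES: dict[str, set[str]] = {
    "collect_artifacts": {"analyst", "responder"},
    "kill_process": {"responder"},
    "remove_persistence": {"responder"},
    "quarantine": set(),
    "deploy_emergency_patch": set(),
}


@dataclass(frozen=True)
class ActionRequest:
    action: str
    requester: str
    role: str
    incident_id: Optional[str] = None
    approvers: frozenset[str] = frozenset()


def authorize(req: ActionRequest, min_approvers: int = 2) -> bool:
    """Return True if the request may run under the example policy."""
    allowed_roles = ROUTINE_ROLES.get(req.action)
    if allowed_roles is None:
        return False  # unknown actions are denied by default
    if req.role in allowed_roles:
        return True  # routine path: the requester's role already permits it
    # Break-glass path: the request must reference an incident and be
    # approved by enough people other than the requester.
    independent = req.approvers - {req.requester}
    return req.incident_id is not None and len(independent) >= min_approvers


if __name__ == "__main__":
    print(authorize(ActionRequest("kill_process", "alice", "responder")))  # True
    print(authorize(ActionRequest("quarantine", "alice", "responder")))  # False
    print(authorize(ActionRequest("quarantine", "alice", "responder",
                                  incident_id="INC-0001",
                                  approvers=frozenset({"bob", "carol"}))))  # True
```

Excluding the requester from the approver count keeps a single person from approving their own emergency action, which is the main point of having a separate break-glass path.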
Long term: strengthen the dam
In parallel with executing these short-term remediation tasks, it is also essential to develop and initiate long-term changes to mitigate the risk of the same incident recurring in the future. These long-term changes can range from simply patching at-risk systems more frequently to executing a full-scale business transformation around detection, investigation, and response.
All long-term remediation tasks should also be monitored and verified going forward. For example, after you deploy a new patch, you should be able to verify that it remains applied. Likewise, after you disable an account, you should be able to monitor it continuously to ensure it isn’t being used.
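As an illustration of this kind of ongoing verification, the following Python sketch checks two things against inventory and logon data you already collect: which hosts are missing a required patch, and which disabled accounts have logged on since they were disabled. The function name, the input shapes, and the patch identifier `KB-EXAMPLE-1` are hypothetical; in practice the data would come from your endpoint inventory and authentication logs, and the check would run on a recurring schedule.

```python
from datetime import datetime


def find_regressions(
    installed_patches: dict[str, set[str]],
    required_patch: str,
    logons: list[tuple[str, str, datetime]],
    disabled_at: dict[str, datetime],
) -> dict[str, list[str]]:
    """Flag hosts missing a required patch and disabled accounts still in use.

    installed_patches: host -> set of installed patch IDs (hypothetical feed)
    logons: (account, host, timestamp) tuples from authentication logs
    disabled_at: account -> time the account was disabled during remediation
    """
    missing_patch = sorted(
        host for host, patches in installed_patches.items()
        if required_patch not in patches
    )
    # A logon at or after the disable time means the control was bypassed
    # or reverted, so the account needs attention again.
    disabled_account_activity = sorted({
        account
        for account, _host, when in logons
        if account in disabled_at and when >= disabled_at[account]
    })
    return {
        "missing_patch": missing_patch,
        "disabled_account_activity": disabled_account_activity,
    }


if __name__ == "__main__":
    report = find_regressions(
        installed_patches={"ws-014": {"KB-EXAMPLE-1"}, "fs-02": set()},
        required_patch="KB-EXAMPLE-1",
        logons=[("svc-backup", "fs-02", datetime(2018, 1, 12, 3, 15))],
        disabled_at={"svc-backup": datetime(2018, 1, 10, 17, 0)},
    )
    print(report)
    # {'missing_patch': ['fs-02'], 'disabled_account_activity': ['svc-backup']}
```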
Understanding what is required to remediate a threat is only one of many moving pieces related to response. It’s important to actively test your processes and capabilities to confirm they meet not only your expectations, but those of your management team. Over the last several months, we have seen a few scenarios, such as WannaCry, Meltdown, and Spectre, which have given organizations useful information about what works and what does not from a remediation standpoint. Take these events as lessons for situations that are likely to arise in the future.
Incident remediation with confidence: Tanium’s approach
By providing our customers with speed and control of their endpoints, we are empowering their security teams to respond rapidly to threats. By using Tanium to gather information faster, our customers are shortening the “dwell time” between incident and detection and can investigate more quickly. This control gives them the ability to execute response actions across their entire estate, regardless of size, and to deliver ongoing monitoring that confirms their remediation efforts are successful.
For example, Tanium Threat Response provides access to the immediate response mechanisms your security team will need to take action. These actions can then be restricted through a full-featured role-based access control model, granting the right level of access to specific team members or roles. Tanium’s adaptable model for content creation and delivery allows our customers to quickly implement new capabilities above and beyond static response processes. This was highlighted recently during the response to Meltdown and Spectre, with the release of incident-specific response content.
Along with rapid response capabilities, Tanium delivers the ability to implement and monitor long-term response plans. For example, Tanium Patch allows you to evolve your current patching process and gain more visibility into whether patching is or isn’t successful. This basic information can help your security team understand which systems have been at risk and allow them to implement the necessary changes more quickly.
Effective long-term remediation programs should also include ongoing validation that the original mitigating controls are in place and not being circumvented. This validation is not limited to confirming that a deployed patch is still effective; it can extend to more advanced controls. Tanium Comply allows you to identify whether a system is currently vulnerable by applying host-based vulnerability scanning, as well as by actively monitoring for deviations from compliance models. For example, you may currently use a benchmarking framework such as one from the Center for Internet Security (CIS) for guidance on endpoint controls. CIS benchmarks and similar frameworks provide a useful baseline for checking endpoint configuration both now and after remediation-related configuration changes are made.
Be proactive, measure progress
The response process is critical when security controls have failed, but there is a lot that can be done before a detection occurs. For example, you can begin reviewing your patching policy. As you do so, take the following metrics into consideration:
- Mean time to remediation-readiness from the onset of an investigation
- Mean time to enact a remediation strategy across all endpoints
- Types of remediation tasks consuming the most time and creating the most friction
- Rate of re-compromise due to incomplete or failed remediation
- Number of point solutions or technology stacks required to enact the remediation strategy
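Several of these metrics are simple to compute once each incident records a few timestamps. The Python sketch below assumes a hypothetical `IncidentRecord` holding the time an investigation started, the time remediation was ready to execute, the time the last endpoint was remediated, and a flag for re-compromise. It reads “mean time to enact” as readiness-to-completion, which is one reasonable interpretation rather than a standard definition. Friction by task type and the number of point solutions involved are better captured in post-incident reviews and are not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class IncidentRecord:
    """Hypothetical per-incident timestamps used for remediation metrics."""
    investigation_start: datetime
    remediation_ready: datetime     # plan approved and tooling staged
    remediation_complete: datetime  # last in-scope endpoint remediated
    recompromised: bool = False     # same threat regained a foothold later


def remediation_metrics(records: list[IncidentRecord]) -> dict[str, float]:
    """Compute mean times (in hours) and the re-compromise rate."""
    if not records:
        raise ValueError("need at least one incident record")
    hours = 3600.0
    ready = mean(
        (r.remediation_ready - r.investigation_start).total_seconds() for r in records
    )
    enact = mean(
        (r.remediation_complete - r.remediation_ready).total_seconds() for r in records
    )
    return {
        "mean_hours_to_readiness": ready / hours,
        "mean_hours_to_enact": enact / hours,
        "recompromise_rate": sum(r.recompromised for r in records) / len(records),
    }


if __name__ == "__main__":
    t0 = datetime(2018, 1, 1, 9, 0)
    history = [
        IncidentRecord(t0, t0 + timedelta(hours=4), t0 + timedelta(hours=10)),
        IncidentRecord(t0, t0 + timedelta(hours=8), t0 + timedelta(hours=12),
                       recompromised=True),
    ]
    print(remediation_metrics(history))
    # {'mean_hours_to_readiness': 6.0, 'mean_hours_to_enact': 5.0, 'recompromise_rate': 0.5}
```

Tracking these numbers across incidents, rather than for a single event, is what makes trends visible, for example whether re-compromise drops after a change to your patching policy.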
Does your current EDR solution allow you to react rapidly to threats you’ve detected? Can it help you go beyond those alerts with the comprehensive investigative visibility needed to scope a compromise? Is it flexible enough to provide varying response capabilities where necessary? To learn more about these topics, view our webinar, Best Practices for Remediating Cybersecurity Incidents At Scale.
- In part one of our EDR Matters blog series, we expand on the detection phase of the incident response process and detail how the Tanium platform can be used to rapidly apply intelligence for detection.
- In part two, we explore the investigation phase of the process and review how the Tanium platform can be used to gain more control and flexibility.