What is an incident in the world of cybersecurity? NIST provides the following definition: “A computer security incident is a violation or imminent threat of violation of computer security policies, acceptable use policies, or standard security practices.” Examples of cybersecurity incidents include a phishing attempt, a brute-force attack against a service the company runs, and the compromise of a server.
What is a CSIRT? What is a CERT?
Most cybersecurity incidents are quite easy to describe, yet the response to them is generally complex and requires experienced IT people to carry out several actions in a short period of time. This is where a CERT or CSIRT comes in.
A CSIRT is a Computer Security Incident Response Team, and a CERT is a Computer Emergency Response Team. The two terms mean essentially the same thing, but the CERT acronym is a registered trademark of Carnegie Mellon University.
CSIRTs are structured entities that provide services to their customers, which can be the company they belong to or external companies that hire their services. Those services vary greatly from one CSIRT to another. While the core mission of a CSIRT is almost always to coordinate and carry out operational incident response, some teams also provide educational and preventive services.
These teams also vary a lot in staffing: the smallest CSIRTs consist of a couple of people, some of them involved only part-time, while the largest have dozens of employees and can deal with incidents 24/7.
The 6 steps to successful security incident handling
Some incidents require heavy expertise, such as the infamous advanced persistent threats (APTs) behind cyberespionage operations. In those cases, incident handlers need to find the initial compromise of the network, find all malware and tools installed by the attackers (which can be on just one computer out of thousands), find other items like new user accounts created by the attacker in Active Directory, find what data has been exfiltrated from the company, and a lot more.
Those incidents need real expertise from several people working full time on them for days or weeks, in a structured way, to make the most of the time they have.
To help deal with such incidents, the SANS Institute, whose goal is to empower cybersecurity professionals with the practical skills and knowledge they need, has developed a list of six steps for proper incident handling (Figure A). Let’s dive into those steps to see how they help incident response.
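One of the checks mentioned above, spotting user accounts created by the attacker, can be sketched as a comparison between a trusted snapshot and the current account list. This is only a minimal illustration: the function name and the sample accounts are hypothetical, and how the lists are exported from the directory is left out entirely.

```python
def find_unexpected_accounts(baseline, current):
    """Return account names present in `current` but missing from `baseline`.

    Names are compared case-insensitively (an assumption made for this sketch).
    """
    known = {name.lower() for name in baseline}
    return sorted({name.lower() for name in current} - known)


# Hypothetical data: a trusted snapshot taken before the incident,
# and the accounts found in the directory today.
baseline_accounts = ["alice", "bob", "svc_backup"]
current_accounts = ["Alice", "bob", "svc_backup", "svc_update2"]

print(find_unexpected_accounts(baseline_accounts, current_accounts))
# prints ['svc_update2']
```

Any name returned here is only a lead to investigate, since legitimate accounts may also have been created since the snapshot was taken.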
The first step, known as preparation, is the only one that can be carried out without any incident happening; it is therefore worth investing a lot of time in it before anything bad happens in the company.
It consists of making the CSIRT capable of launching any incident response properly and comfortable working on it. This might not be as easy as it sounds, depending on the infrastructure and the company size. Preparation includes:
- Defining policies, rules and practices to guide security processes.
- Developing incident response plans for every kind of incident that might target the company.
- Having a precise communication plan: people to reach internally and externally, how to reach them, etc.
- Keeping incident response tools ready and up to date at all times. This also means spending time testing new tools, selecting new ones and maintaining knowledge about them. All tooling should also be kept in a jump bag that is ready and available to incident handlers as soon as they need to physically travel to another location for incident handling.
- Running regular training exercises on simulated incidents, to ensure every CSIRT member and every required outsider knows how to react and handle cases.
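The communication plan from the list above could, for example, be kept as structured data so that handlers can look up who to reach and how for a given kind of incident. The sketch below is a hypothetical layout, not a prescribed format: the `Contact` fields, the incident type names and every role listed are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    role: str        # who to reach
    channel: str     # how to reach them
    internal: bool   # inside or outside the company


# Hypothetical fallback and plan entries, for illustration only.
DEFAULT_CONTACT = Contact("CSIRT on-call", "phone", internal=True)

COMMUNICATION_PLAN = {
    "phishing": [
        DEFAULT_CONTACT,
        Contact("Help desk lead", "email", internal=True),
    ],
    "server_compromise": [
        DEFAULT_CONTACT,
        Contact("Hosting provider support", "phone", internal=False),
        Contact("Legal counsel", "phone", internal=True),
    ],
}


def contacts_for(incident_type, plan=COMMUNICATION_PLAN):
    """Return the contacts for an incident type, or the fallback if none is planned."""
    return plan.get(incident_type, [DEFAULT_CONTACT])


for contact in contacts_for("server_compromise"):
    scope = "internal" if contact.internal else "external"
    print(f"{contact.role} ({scope}) via {contact.channel}")
```

Falling back to a single on-call contact is a design choice of this sketch: an unplanned incident type still reaches someone instead of returning an empty list.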
In the second step, identification, an incident is discovered or reported to the CSIRT. Several actions are taken here, in particular:
- Identifying the incident precisely, and carefully checking it is actually a real incident and not a false detection.
- Defining the scope of the incident and its investigation.
- Setting up monitoring.
- Detecting incidents by correlating and analyzing data from multiple sources, both on endpoints (monitored activity, event logs, etc.) and on the network (log files, error messages, etc.).
- Assigning incident handlers to the incident.
- Starting to document the case.
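As a small illustration of detecting an incident from log data, the sketch below flags sources that produce many failed logins in a short time, which is one simple sign of a brute-force attack. It assumes the events have already been parsed into `(timestamp, source, success)` tuples; that tuple layout, the function name, the threshold and the window are all hypothetical choices, and a real deployment would tune them and combine several data sources.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_brute_force(events, threshold=5, window=timedelta(minutes=10)):
    """Return sources with at least `threshold` failed logins inside `window`."""
    failures = defaultdict(list)
    for timestamp, source, success in events:
        if not success:
            failures[source].append(timestamp)

    flagged = []
    for source, stamps in failures.items():
        stamps.sort()
        start = 0
        # Sliding window over the sorted failure times of one source.
        for end, stamp in enumerate(stamps):
            while stamp - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.append(source)
                break
    return sorted(flagged)


# Hypothetical events: six rapid failures from one address,
# one failure followed by a success from another.
t0 = datetime(2024, 1, 1, 9, 0)
events = [(t0 + timedelta(seconds=30 * i), "203.0.113.7", False) for i in range(6)]
events += [
    (t0, "198.51.100.4", False),
    (t0 + timedelta(minutes=1), "198.51.100.4", True),
]

print(flag_brute_force(events))
# prints ['203.0.113.7']
```

A flagged source still has to be checked by a handler, in line with the point above about confirming that a detection is a real incident and not a false one.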
The goal of the third step, containment, is to limit the current damage resulting from the incident and prevent any further damage.
The first move is generally to prevent the attacker from communicating any further with the compromised network. This can be done by isolating the network segments or devices affected by the incident.
The second move is to create backups and preserve evidence of the incident for further investigation if the incident turns out to be criminal.
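One common way to support evidence preservation, which is an addition here and not something the steps themselves prescribe, is to record a cryptographic hash of every collected file so that later copies can be shown to be unchanged. The sketch below is a minimal version using SHA-256 from the Python standard library; the function names are hypothetical, and the temporary directory merely stands in for a real evidence folder.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(evidence_dir):
    """Map each file's relative path to its SHA-256 hash."""
    root = Path(evidence_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that are missing or whose hash differs from the first manifest."""
    return sorted(
        name for name, digest in old_manifest.items()
        if new_manifest.get(name) != digest
    )


# Demonstration on a throwaway directory standing in for collected evidence.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "auth.log"
    sample.write_bytes(b"failed login for root\n")
    before = build_manifest(tmp)
    sample.write_bytes(b"tampered\n")
    after = build_manifest(tmp)

print(changed_files(before, after))
# prints ['auth.log']
```

In practice the first manifest would be stored separately from the evidence itself, so that tampering with one does not silently alter the other.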
The final move is to apply fixes to affected systems and devices so they can be brought back online. This means patching vulnerabilities and removing unauthorized access while preparing for the next phase.
Since there is always a chance that several backdoors are in place and one or more have not been found, it is important to act quickly here and move promptly to the next phase.
In the fourth step, eradication, the moment has come to remove all discovered artifacts of the incident and make sure it cannot happen again.
You might think it’s enough to delete all discovered malware and backdoors, change all user passwords, apply security fixes and patch all systems. It is of course the most comfortable and least expensive way for a company to return to normal, but it is not recommended. Depending on how the network is built, which log files exist, which are missing, which might have been tampered with by the attacker, and how stealthy some of the malware has been, an attacker might be able to come back to a system restored this way.
The recommended way to eradicate every trace of the incident is to fully reinstall the affected systems from a safe image and immediately deploy the latest security fixes to them.
In the fifth step, recovery, it is time to bring all the systems back into production, after verifying that they are all patched and hardened where possible.
In some cases, it might mean fully reinstalling Active Directory, changing all employees’ passwords, and doing whatever is possible to prevent the same incident from happening again.
Careful monitoring needs to be defined and started here, for a set period of time, to detect any abnormal behavior.
After several days or weeks spent on an incident, it certainly feels good to know it has been handled properly and that the threat is definitely gone. But one last effort needs to be made, and it is one of the most important: the sixth and final step, lessons learned.
Shortly after the recovery is done and everything is back to normal, all the people involved in the incident should meet and discuss it. What have they learned? What has been difficult? What could be done better the next time a similar incident happens?
All documentation written during the incident should be completed and should answer as many of the what, where, why, how and who questions as possible.
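One lightweight way to check that the documentation covers those questions is to keep them as named fields and list the ones still left blank before the review meeting. The record below is a hypothetical sketch: the class name, the field layout and the sample answers are assumptions, not a standard report format.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentReport:
    what: str = ""    # what happened
    where: str = ""   # which systems or sites were affected
    why: str = ""     # root cause or attacker motive, if known
    how: str = ""     # how the attacker got in and moved around
    who: str = ""     # who was involved or affected


def unanswered(report):
    """Return the names of the questions that are still blank."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


# Hypothetical, partially completed report.
report = IncidentReport(
    what="Web server compromised through an unpatched plugin",
    where="Public web server in the DMZ",
)

print(unanswered(report))
# prints ['why', 'how', 'who']
```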
Every incident should be seen as an opportunity to improve the whole incident handling process in the company.
Disclosure: I work for Trend Micro, but the views expressed in this article are mine.