Cyber Security Security Operations
Security Operations is often contained within a SOC ("Security Operations Center"). Terms are used interchangeably.
Typically the SOC's responsibility is to detect threats in the environment and stop them from developing into expensive problems.
SIEM ("Security Information Event Management")
Most systems produces logs often containing important security information.
An event is simply observations we can determine from logs and information from the network, for example:
- Users logging in
- Attacks observed in the network
- Transactions within applications
An incident is something negative we believe will impact our organization. It might be a definitive threat or the potential of such a threat happening. The SOC should do their best to determine which events can be concluded to actual incidents, which should be responded to.
The SIEM processes alerts based on logs from different sensors and monitors in the network, each which might produce alerts that are important for the SOC to respond to. The SIEM can also try to correlate multiple events to determine an alerts.
SIEM's typically allow events from the following areas to be analyzed:
- Network
- Host
- Applications
Events from the network is the most typical, but least valuable as they don't hold the entire context of what has happened. The network typically reveals who is communicating where, over which protocols, and when, but not the intricate details about what happened, to whom and why.
Host events give more information in regards to what actually happened and to whom. Challenges such as encryption is no longer blurred and more visibility is gained into what is taking place. Many SIEM's are enriched with great details about what happens on the hosts themselves, instead of only from the network.
Events from application is where the SOC typically can best understand what is going on. These events give information about the Triple A, AAA ("Authentication, Authorization and Account"), including detailed information about how the application is performing and what the users are doing.
For a SIEM to understand events from applications it typically requires work from the SOC Team to make the SIEM understand these events, as support is often not included "out-of-the-box". Many applications are proprietary to an organization and the SIEM does not already have an understanding of the data the applications forward.
SOC Staffing
How a SOC is staffed greatly varies based on the requirements and structure of an organization. In this section we take a quick look at typical roles involved in operating a SOC. An overview of potential roles:
As in most organized teams, a role is appointed to lead the department. The SOC Chief determines the strategy and tactics involved to counter threats against the organization.
The SOC Architect is responsible for ensuring the systems, platforms and overall architecture is capable of delivering what the team members require to perform their duties. A SOC Architect will help build correlation rules across multiple points of data and ensures incoming data conforms to the platform requirements.
Analyst Lead is responsible that processes, or playbooks, are developed and maintained to ensure analysts are capable to find the information necessary to conclude alerts and potential incidents.
Level 1 Analysts serve as the first responders to alerts. Their duty is, within their capabilities, to conclude alerts and forward any troubles to a higher level analyst.
Level 2 Analysts are distinguished by having more experience and technical knowledge. They should also ensure any troubles in resolving alerts are forwarded to the Analyst Lead to aid the continuous improvement of the SOC. The Level 2, together with the Analyst Lead, escalates incidents to the Incident Response Team.
The IRT ("Incident Response Team") is a natural extension to the SOC Team. The IRT team is deployed to remediate and solve the issues impacting the organization.
Penetration Testers ideally also support the defense. Penetration Testers have intricate knowledge of how attackers operate and can help in root cause analysis and understanding how break-ins occur. Merging attack and defense teams is often referred to as Purple Teaming and is considered a best-practice operation.
Escalation Chains
Some alerts require immediate actions. It is important for the SOC to have defined a process of whom to contact when different incidents occur. Incidents can occur across many different business units, the SOC should know who to contact, when and on which communication mediums.
Example of an escalation chain for incidents impacting one part of a organization:
- Create an Incident in the appointed Incident Tracking System, assigning it to correct department or person(s)
- If no direct action happens from department/person(s): send SMS and Email to primary contact
- If still no direct action: phone call primary contact
- If still no direct action: call secondary contact
Classification of Incidents
Incidents should be classified according to their:
- Category
- Criticality
- Sensitivity
Depending on the incidents classification and how it is attributed, the SOC might take different measures to solve the issue at hand.
The category of incident will determine how to respond. There exists many kinds of incident and it is important for the SOC to understand what each incident type means for the organization. Example incidents are listed below:
- Inside Hacking
- Malware on Client workstation
- Worm spreading across the network
- Distributed Denial of Service Attack
- Leaked Credentials
The criticality of an incident is determined based on how many systems is impacted, the potential impact of not stopping the incident, the systems involved and many other things. It is important for the SOC to be able to accurately determine the criticality so the incident can be closed accordingly. Criticality is what determines how fast an incident should be responded to.
Should the incident be responded to immediately or can the team wait until tomorrow?
Sensitivity determines who should be notified about the incident. Some incidents require extreme discretion.
SOAR ("Security Orchestration, Automation and Response")
To counter the advancements of threat actors, automation is key for a modern SOC to respond fast enough. To facilitate fast response to incidents, the SOC should have tools available to automatically orchestrate solutions to respond to threats in the environment.
The SOAR strategy means ensuring the SOC can use actionable data to help mitigate and stop threats which are developing more real-time than before. In traditional environments it takes attackers very short time from the time of compromise until they have spread to neighboring systems. Contrary to this it takes organizations typically a very long time to detect threats that have entered their environment. SOAR tries to help solve this.
SOAR includes concepts such as IAC "Infrastructure as Code" to help rebuild and remediate threats. SDN ("Software Defined Networking") to control accesses more fluently and easily, and much more.
What to monitor?
Events can be collected across many different devices, but how do we determine what to collect and monitor? We want the logs to have the highest quality. High fidelity logs that are relevant and identifying to quickly stop the threat actors in our networks. We also want to make it hard for attackers to circumvent the alerts we configure.
If we look at different ways to catch attackers, it becomes evident where we should focus. Here is a list of possible indicators we can use to detect attackers, and how hard it is considered for attackers to change.
Indicator | Difficulty to change |
---|---|
File checksums and hashes | Very Easy |
IP Addresses | Easy |
Domain Names | Simple |
Network and Host Artifacts | Annoying |
Tools | Challenging |
Tactics, Techniques and Procedures | Hard |
File checksums and hashes can be used to identify known pieces of malware or tools used by attackers. Changing these signatures are considered to be trivial for attackers as their code can be encoded and changed in multiple different ways, making the checksums and hashes change.
IP Addresses are also easy to change. Attackers can use IP addresses from other compromised hosts or simply use IP addresses within the jungle of different cloud and VPS ("Virtual Private Server") providers.
Domain Names can also be reconfigured quite easily by attackers. An attacker can configure a compromised system to use a DGA ("Domain Generation Algorithm") to continuously use a new DNS name as time passes. One week the compromised system uses one name, but the next week the name has changed automatically.
Network and Host Artifacts are more annoying to change, as this involves more changes for the attackers. Their utilities will have signatures, like a user-agent or the lack of thereof, that can be picked up by the SOC.
Tools become increasingly harder to change for attackers. Not the hashes of the tools, but how the tools behave and operate when attacking. Tools will be leaving traces in logs, loading libraries and other things which we can monitor to detect these anomalies.
If the defenders are capable of identifying Tactics, Techniques and Procedures threat actors use, it becomes even harder for attackers to get to their objectives. For example, if we know the threat actor likes to use Spear-Phishing and then Pivoting peer-to-peer via to other victim systems, defenders can use this to their advantage. Defenders can focus training to staff at risk for spear-phishing and start implementing barriers to deny peer-to-peer networking.