CHAPTER 8
PREDICTING CYBER THREATS WITH VIRTUAL SECURITY PRODUCTS
8.1 Introduction
Cybersecurity analysts are often presented with suspicious machine activity that does not conclusively indicate compromise, resulting in undetected incidents or costly investigations into the most appropriate remediation actions. There are many reasons for this: deficiencies in the number and quality of security products that are deployed, poor configuration of those security products, and incomplete reporting of product-security telemetry. Managed Security Service Providers (MSSPs), which are tasked with detecting security incidents on behalf of multiple customers, are confronted with these data quality issues, but also possess a wealth of cross-product security data that enables innovative solutions. We use MSSP data to develop Virtual Product, which addresses the aforementioned data challenges by predicting what security events would have been triggered by a security product if it had been present.
This benefits analysts by providing additional, albeit probabilistic, context for existing security incidents and by making questionable security incidents more conclusive. We achieve up to 99% AUC in predicting the incidents that some products would have detected had they been present.
Product Type    Alert Description (Event)
Gateway         TCP Urgent Data Enforcement
Gateway         TCP anomaly
Gateway         TCP Out of Sequence
Gateway         ICMP Echo Request
Windows         Cryptographic operation
Windows         Attempt to unprotect auditable protected data
Windows         Logon attempt using explicit credentials
Windows         Key file operation
Windows         Filter Manager Event 1
Windows         Attempt to register a security event source
Windows         Attempt to unregister a security event source
Windows         Special privileges assigned to new logon
Windows         A privileged service was called
Windows         A network share object was accessed
Firewall        TCP Connection
Firewall        UDP Connection
Proxy           TCP Cache Hit
Proxy           TCP Cache Miss: Non-Cacheable Object
Table 8.1: A long list of inconclusive alerts generated in a real incident involving a machine infected by the infamous Zbot Trojan. These alerts overwhelm a cybersecurity analyst and do not help answer important questions such as: Is this machine compromised? How severe is the attack? What actions should be taken? Our technique, Virtual Product, correctly predicts the presence of the Zbot Trojan, which would have been identified by an AV product had it been installed.
confidence in either pursuing or ignoring a potential compromise is often less than ideal.
The key to improving detection rates in this environment is to learn from the vast amounts of telemetry produced by the prevalent defense-in-depth approach to computer security, wherein multiple security products are deployed alongside each other, producing highly correlated alert data. By studying this data, we are able to accurately predict which security alerts a product would have triggered in a particular situation, even though it was not deployed. A representative example is shown in Table 8.1, wherein security alerts produced by several products hint at the possibility of a security problem but do not present conclusive evidence. Our models, however, are able to correctly predict the presence of the Zeus Trojan (also known as Zbot) as the cause of the anomalous system and network behavior on the machine.
We introduce and formulate the novel problem of Virtual Product, the first known attempt to predict the security events and high-severity incidents that would have been identified by a product if it had been deployed. Given sufficient data from many organizations deploying different sets of security products, we posit it should be possible to predict the events that would have been reported by additional security products that were not deployed. This analysis benefits from the observations that many security products detect the same threats, and that attacks are typically automated and therefore proceed in predictable sequences of behavior.
Figure 8.1 shows how Virtual Product works. We formulate incident data as a large matrix. Each row, called a machine-day, tracks all of the security events that were observed on a particular machine on a given date. Although many entries will be empty, since machines are protected by at most a handful of products, we can predict the likely events that would have been triggered by the products that were not deployed. Security officers can then make a more informed decision about the trade-off between the cost of additional security products and the value they would provide. For the analyst, Virtual Product enriches each incident (i.e., row) with more context to understand the severity of the threat posed by the observed activity.
Figure 8.1: Virtual Product helps our user Sam discover and understand cyber-threats, and informs deployment decisions (e.g., add firewall?) through semi-supervised non-negative matrix factorization on telemetry data from other users (with firewalls deployed). In the data matrix, each row represents a machine-day, and each column records the occurrences of a security event. Missing events from undeployed products are shown as gray blocks. The last column indicates if the firewall has detected an incident. Our virtual firewall serves as a proxy to the actual product and predicts the output Sam may observe (dark green block) if he deploys it.
Our work makes the following contributions:
• Novel Idea of Virtual Product. We introduce the problem of simulating a security product’s individual security events and the security incidents that these events would have raised, had it actually been deployed. We formulate techniques by which the security data managed by Security Incident and Event Managers (SIEMs) and Managed Security Service Providers (MSSPs) on behalf of multiple products can be used for this purpose (see Table 8.2 for definitions of these terms).
• Effective Approach. We provide a practical implementation for this problem by adapting semi-supervised non-negative matrix factorization techniques, which simultaneously address security incident prediction and security event prediction for the absent products with high accuracy.
• Impact to Security Industry. Our Virtual Product model will impact the security industry by increasing company security at significantly reduced costs. We are working towards making Virtual Product events and security incidents available to customers of an MSSP. By deploying Virtual Product on behalf of customers, we provide a new way for them to experience the potential benefits of security products without deploying them, allowing them to make more informed purchasing decisions.
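To make the matrix formulation concrete, the following is a minimal sketch of non-negative matrix factorization fitted only to the observed entries of a toy machine-day matrix. It is not the semi-supervised model proposed in this chapter: the masked multiplicative-update rule, the function names masked_nmf and masked_error, the toy counts, and the choice to append the incident label as the last column are all assumptions made purely for illustration.

```python
import numpy as np

def masked_nmf(X, mask, rank=2, iters=500, eps=1e-9, seed=0):
    """Factor X ~ W @ H with W, H >= 0, fitting only entries where mask == 1.

    Uses multiplicative updates for the masked squared error
    sum(mask * (X - W @ H) ** 2). Hypothetical helper, not the chapter's model.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1  # strictly positive initialization
    H = rng.random((rank, m)) + 0.1
    MX = mask * X                    # unobserved entries contribute nothing
    for _ in range(iters):
        WH = mask * (W @ H)
        H *= (W.T @ MX) / (W.T @ WH + eps)
        WH = mask * (W @ H)
        W *= (MX @ H.T) / (WH @ H.T + eps)
    return W, H

def masked_error(X, mask, W, H):
    """Squared reconstruction error over the observed entries only."""
    return float(np.sum(mask * (X - W @ H) ** 2))

# Toy data (made up). Rows are machine-days. Columns are: two gateway event
# counts, one firewall event count, and a firewall incident flag.
X = np.array([
    [3.0, 1.0, 4.0, 1.0],   # gateway and firewall both deployed
    [0.0, 2.0, 0.0, 0.0],   # gateway and firewall both deployed
    [6.0, 2.0, 8.0, 1.0],   # gateway and firewall both deployed
    [3.0, 1.0, 0.0, 0.0],   # Sam: no firewall, last two entries are unknown
])
mask = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],   # 0 marks entries from the undeployed product
])

W, H = masked_nmf(X, mask)
X_hat = W @ H

# The masked-out entries of Sam's row are the "virtual firewall" predictions.
print("predicted firewall event count:", X_hat[3, 2])
print("predicted firewall incident score:", X_hat[3, 3])
```

Because the mask zeroes out Sam's firewall columns during fitting, the placeholder values stored there never influence the factors; the reconstruction at those positions is driven entirely by machine-days from other users whose gateway activity resembles Sam's.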
To enhance readability, Table 8.2 lists the terminology used in this chapter. The reader may want to return to this table throughout the chapter for the meanings of technical terms and the synonyms used in various contexts of discussion. We proceed by discussing related work in Section 8.2, and present our proposed Virtual Product model in Section 8.3. We evaluate the performance of our algorithm in Section 8.4. Next, we discuss the expected impact of Virtual Product and concrete deployment plans in Section 8.5. Finally, we discuss our findings and conclude in Section 8.6.