Machine learning keeps malware from getting in through security cracks

Algorithms quickly examine data to identify patterns, classify threats so humans can take action

Machine learning has been a hot topic in cybersecurity circles lately.

The cybersecurity sector generates oceans of data. Sorting through threat data to glean useful intelligence is a huge task, akin to finding a needle in a proverbial “needlestack.”

The first level of sorting is definitely beyond human manual capabilities—but it’s perfect for a digital workhorse. Modern computers are terrific at crunching data, recognizing patterns and showing basic trends.

I had several lively discussions about how machine learning is being brought to bear on information security at the recent Black Hat conference in Las Vegas. Here’s a snapshot of one with Liviu Arsene, senior analyst at endpoint security vendor BitDefender. Text edited for clarity and length.

Third Certainty: Why is machine learning coming to the fore?

Arsene: There are about 500 million malware samples out there a year, so it would take hundreds of thousands of security researchers to go through them manually. So that’s why you need to automate the process and use some sort of algorithm or program that can do all that filtering for us—that can identify malicious files by looking for specific patterns.

3C: Use of algorithms in behavioral analysis modeling isn’t new. So why now, why is it all of a sudden a hot topic in the cybersecurity world?

Arsene: The fact we have so many interconnected things right now, so many sensors, means we’re getting a lot of data from everything. But the thing is, how do you actually make sense of all that data? If you talk about the Internet of Things alone, we’re going to have at least 50 billion devices by 2020, so that’s a huge amount of information and a huge amount of potential risk that everybody is exposing themselves to. So you need to have some sort of automated system that can go through all that data, identify malicious patterns, identify attackers.

3C: So are you talking about putting another layer of machine learning on top of algorithms already baked into endpoint security systems, firewalls and so on?

Arsene: When you talk about security we don’t use a single machine-learning algorithm; we actually use several of them. We’re actually trying to separate ransomware from Trojans and Trojans from keyloggers. We do this by designing a machine-learning algorithm that’s specifically built to identify ransomware patterns, keylogger patterns, Trojan patterns and so on. So when analyzing one of those files, you’re actually running it through all those machine-learning algorithms to see which one triggers the highest score for the file.
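The approach Arsene describes, scoring a file with several family-specific models and keeping the highest score, might be sketched roughly as follows. This is an illustration only, not BitDefender's implementation: the feature names, the hand-picked weights and the 0.5 threshold are all assumptions, and in practice each scorer would be a trained model rather than a fixed formula.

```python
from typing import Callable, Dict, Tuple

# Hypothetical behavioral features for one file, each scaled to 0.0-1.0.
Features = Dict[str, float]


def ransomware_score(f: Features) -> float:
    # Illustrative weights, not learned from real data.
    return 0.6 * f.get("files_encrypted_ratio", 0.0) + 0.4 * f.get("deletes_backups", 0.0)


def keylogger_score(f: Features) -> float:
    return 0.7 * f.get("hooks_keyboard", 0.0) + 0.3 * f.get("periodic_small_uploads", 0.0)


def trojan_score(f: Features) -> float:
    return 0.5 * f.get("opens_remote_shell", 0.0) + 0.5 * f.get("downloads_payload", 0.0)


# One specialised scorer per malware family.
DETECTORS: Dict[str, Callable[[Features], float]] = {
    "ransomware": ransomware_score,
    "keylogger": keylogger_score,
    "trojan": trojan_score,
}


def classify(f: Features, threshold: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Run the file through every scorer and keep the highest-scoring family."""
    scores = {name: scorer(f) for name, scorer in DETECTORS.items()}
    family = max(scores, key=scores.get)
    # Assumed cutoff: if no scorer is confident, leave the file unlabeled.
    if scores[family] < threshold:
        return "unknown", scores
    return family, scores


sample = {"files_encrypted_ratio": 0.9, "deletes_backups": 1.0}
label, all_scores = classify(sample)
print(label, all_scores)
```

Keeping one scorer per family means a new family can be added without retraining the others, at the cost of running every scorer on every file.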

3C: So it’s about categorizing malware?

Arsene: Basically you pre-train that specific machine-learning algorithm to understand what the threat is. It learns everything it can about the behavior and features of those files. And it builds a knowledge base, so even if the machine has never seen the code before, the algorithm will be able to understand the way the file interacts with the system. And that’s how it knows it’s malicious and whether to include it in the Trojan cluster or the ransomware cluster.
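One simple way to picture assigning a never-before-seen file to a cluster is nearest-centroid classification: average the behavior vectors of known samples in each family, then place a new file in the family whose average it sits closest to. The sketch below assumes a made-up three-number behavior vector and a tiny invented training set; the interview does not say which technique BitDefender actually uses.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed feature order: [encrypts_files, hooks_keyboard, opens_remote_shell]
Vector = List[float]

# Invented training data for illustration only.
TRAINING: List[Tuple[str, Vector]] = [
    ("ransomware", [1.0, 0.0, 0.1]),
    ("ransomware", [0.9, 0.1, 0.0]),
    ("trojan", [0.0, 0.1, 1.0]),
    ("trojan", [0.1, 0.0, 0.9]),
]


def train_centroids(samples: List[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Average the behavior vectors of each family into one centroid."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def nearest_cluster(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Return the family whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


centroids = train_centroids(TRAINING)
unseen_file = [0.8, 0.0, 0.2]  # behavior of a file absent from the training set
print(nearest_cluster(centroids, unseen_file))
```

The new file matches no training sample exactly, but its behavior is far closer to the ransomware average than to the Trojan one, so it lands in the ransomware cluster.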

3C: What’s the human’s role?

Arsene: Those algorithms are not perfect all the time, and a human needs to fine-tune them and make adjustments to the way the machine makes predictions. You need to fine-tune it in order to serve your purpose. Man and machine have to work together. … It’s a process, a way to identify as many malicious files as possible, and let the humans direct the analysis.

3C: How does BitDefender leverage machine learning?

Arsene: We use machine learning specifically in the cloud for malware. Mostly it’s about malware detection these days when you talk about machine learning. You’re seeing a lot of startups praise their ability to use machine learning to identify threats. We have been using machine learning for at least six years. It’s not about the fact that you’re using it; it’s how you use it and how accurate you are at detecting stuff.
