Machine learning keeps malware from getting in through security cracks

Algorithms quickly examine data to identify patterns, classify threats so humans can take action


Machine learning has been a hot topic in cybersecurity circles lately.

The cybersecurity sector generates oceans of data. Sorting through threat data to glean useful intelligence is a huge task, akin to finding a needle in a proverbial “needlestack.”

The first level of sorting is far beyond what humans can do manually, but it’s perfect for a digital workhorse. Modern computers are terrific at crunching data, recognizing patterns and showing basic trends.


At the recent Black Hat conference in Las Vegas, I had several lively discussions about how machine learning is being brought to bear on information security. Here’s a snapshot of one with Liviu Arsene, senior analyst at endpoint security vendor BitDefender. Text edited for clarity and length.

Third Certainty: Why is machine learning coming to the fore?

Arsene: There are about 500 million malware samples out there a year, so it would take hundreds of thousands of security researchers to go through them manually. That’s why you need to automate the process and use some sort of algorithm or program that can do all that filtering for us, one that can identify malicious files by looking for specific patterns.
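To make the idea of automated filtering concrete, here is a deliberately simple sketch that is not machine learning at all: it scans each sample for fixed byte patterns and keeps only the ones that match, so a human never has to look at the rest. The pattern names and byte strings (`SUSPICIOUS_PATTERNS`, `ransom_note_text`, `keystroke_log_tag`) and the helpers `match_patterns` and `triage` are hypothetical placeholders, not real detection signatures or any vendor’s method.

```python
# Minimal sketch of automated triage by fixed pattern matching (not ML).
# The patterns below are hypothetical placeholders, not real detection signatures.
SUSPICIOUS_PATTERNS: dict[str, bytes] = {
    "ransom_note_text": b"your files have been encrypted",
    "keystroke_log_tag": b"[KEYLOG]",
}


def match_patterns(sample: bytes, patterns: dict[str, bytes]) -> list[str]:
    """Return the names of all patterns found in the sample (case-insensitive)."""
    lowered = sample.lower()
    return [name for name, pattern in patterns.items() if pattern.lower() in lowered]


def triage(samples: dict[str, bytes], patterns: dict[str, bytes]) -> dict[str, list[str]]:
    """Keep only the samples that matched at least one pattern, along with their hits."""
    flagged: dict[str, list[str]] = {}
    for sample_id, data in samples.items():
        hits = match_patterns(data, patterns)
        if hits:
            flagged[sample_id] = hits
    return flagged


if __name__ == "__main__":
    corpus = {
        "a.exe": b"...Your files have been encrypted. Pay up...",
        "b.exe": b"hello world",
        "c.dll": b"[KEYLOG] user typed: password",
    }
    print(triage(corpus, SUSPICIOUS_PATTERNS))
```

Fixed patterns only catch what someone has already written a pattern for, which is exactly the limitation the learned models discussed below are meant to address.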

3C: The use of algorithms in behavioral analysis modeling isn’t new. So why now? Why is it all of a sudden a hot topic in the cybersecurity world?

Arsene: The fact that we have so many interconnected things right now, so many sensors, means we’re getting a lot of data from everything. But how do you actually make sense of all that data? With the Internet of Things alone, we’re going to have at least 50 billion devices by 2020, so that’s a huge amount of information and a huge amount of potential risk that everybody is exposing themselves to. So you need some sort of automated system that can go through all that data, identify malicious patterns and identify attackers.

3C: So are you talking about putting another layer of machine learning on top of algorithms already baked into endpoint security systems, firewalls and so on?

Arsene: In security we don’t use a single machine-learning algorithm; we use several of them. We’re trying to separate ransomware from Trojans, and Trojans from keyloggers. We do this by designing machine-learning algorithms that are each built specifically to identify ransomware patterns, keylogger patterns, Trojan patterns and so on. So when analyzing one of those files, you run it through all those machine-learning algorithms to see which one gives the file the highest score.
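The “one model per family, highest score wins” setup Arsene describes can be sketched in a few lines. In this illustration each family’s model is a logistic scorer over a handful of behavioral features, and the sample is labeled with whichever family scores highest. The feature names, the hand-set weights in `FAMILY_MODELS` and the functions `family_score` and `classify` are all hypothetical: in practice each model would be trained on labeled samples, and the interview does not say which model type BitDefender uses.

```python
import math

# Hypothetical behavioral features from a sandbox run (names are illustrative only):
# [files_encrypted, keystrokes_hooked, remote_connections, ransom_note_written]

# Hypothetical hand-set (weights, bias) pairs standing in for separately trained per-family models.
FAMILY_MODELS: dict[str, tuple[list[float], float]] = {
    "ransomware": ([2.0, -0.5, 0.3, 3.0], -2.0),
    "keylogger": ([-0.5, 3.0, 1.0, -1.0], -2.0),
    "trojan": ([0.2, 0.5, 2.5, -0.5], -2.0),
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def family_score(x: list[float], weights: list[float], bias: float) -> float:
    """Score in (0, 1) from one family's logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def classify(x: list[float]) -> tuple[str, dict[str, float]]:
    """Run the sample through every family model and return the top-scoring family."""
    scores = {fam: family_score(x, w, b) for fam, (w, b) in FAMILY_MODELS.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # A sample that encrypts files and drops a ransom note.
    label, scores = classify([1, 0, 0, 1])
    print(label, {k: round(v, 3) for k, v in scores.items()})
```

Keeping one model per family means each can be retrained or tuned on its own without disturbing the others; the price is that the scores must be comparable across models for the “highest score” rule to be meaningful.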

3C: So it’s about categorizing malware?

Arsene: Basically, you pre-train each machine-learning algorithm to understand what the threat is. It learns everything it can about the behavior and features of those files and builds a knowledge base, so even if the machine has never seen the code before, the algorithm can understand the way the file interacts with the system. That’s how it knows the file is malicious and whether to include it in the Trojan cluster or the ransomware cluster.
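One simple way to see how a model can place code it has never seen is to describe each file by what it does rather than by its bytes, then compare it with what known families do. This sketch uses a nearest-centroid rule, a technique chosen here for illustration and not necessarily what Arsene’s team uses: “pre-training” averages the behavior vectors of labeled samples per family, and a new sample joins the cluster whose average it sits closest to. The training data, feature order and the functions `fit_centroids` and `assign_cluster` are hypothetical.

```python
import math

# Hypothetical labeled training data: behavior vectors from sandbox runs.
# Illustrative feature order: [files_encrypted, keystrokes_hooked, remote_connections, ransom_note_written]
TRAINING: list[tuple[list[float], str]] = [
    ([40, 0, 1, 1], "ransomware"),
    ([55, 0, 2, 1], "ransomware"),
    ([0, 1, 3, 0], "trojan"),
    ([0, 0, 5, 0], "trojan"),
]


def fit_centroids(data: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """'Pre-train' by averaging the behavior vectors of each family."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for vec, label in data:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def assign_cluster(vec: list[float], centroids: dict[str, list[float]]) -> str:
    """Place a never-seen sample in the family whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    centroids = fit_centroids(TRAINING)
    # An unseen sample that encrypts 30 files and drops a ransom note.
    print(assign_cluster([30, 0, 2, 1], centroids))
```

With raw counts like these, a feature with a large range (files encrypted) dominates the distance; a real system would normalize features first.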

3C: What’s the human’s role?

Arsene: Those algorithms are not perfect all the time, and a human needs to fine-tune them and adjust the way the machine makes predictions. You need to fine-tune it to serve your purpose. Man and machine have to work together. … It’s a process, a way to identify as many malicious files as possible and let the humans direct the analysis.
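A common way to split the work between man and machine is to act automatically only on confident predictions and queue the rest for an analyst, whose verdicts are then fed back into training. The sketch below assumes that arrangement. `Prediction`, `TriageQueue`, the `threshold` default and the feedback list are hypothetical names for illustration, not BitDefender’s workflow; the threshold is one of the knobs a human would “fine-tune.”

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    sample_id: str
    label: str
    confidence: float  # model score in [0, 1]


@dataclass
class TriageQueue:
    """Hypothetical human-in-the-loop triage: confident calls are automated, the rest go to analysts."""

    threshold: float = 0.9  # raising it sends more samples to humans; lowering it automates more
    auto_handled: list[Prediction] = field(default_factory=list)
    for_review: list[Prediction] = field(default_factory=list)
    feedback: list[tuple[str, str]] = field(default_factory=list)  # (sample_id, analyst label)

    def route(self, pred: Prediction) -> str:
        """Accept confident predictions automatically; queue uncertain ones for an analyst."""
        if pred.confidence >= self.threshold:
            self.auto_handled.append(pred)
            return "auto"
        self.for_review.append(pred)
        return "review"

    def record_verdict(self, sample_id: str, analyst_label: str) -> None:
        """Store an analyst's verdict for the next training run and clear it from the queue."""
        self.feedback.append((sample_id, analyst_label))
        self.for_review = [p for p in self.for_review if p.sample_id != sample_id]


if __name__ == "__main__":
    queue = TriageQueue(threshold=0.8)
    print(queue.route(Prediction("a.exe", "ransomware", 0.95)))  # confident: handled automatically
    print(queue.route(Prediction("b.exe", "trojan", 0.55)))      # uncertain: sent to an analyst
    queue.record_verdict("b.exe", "keylogger")                   # analyst corrects the label
    print(queue.feedback)
```

The feedback list is where the “fine-tuning” happens over time: corrected labels become new training examples, so the models improve on the cases they previously got wrong.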

3C: How does BitDefender leverage machine learning?

Arsene: We use machine learning specifically in the cloud, for malware. When you talk about machine learning these days, it’s mostly about malware detection. You’re seeing a lot of startups tout their ability to use machine learning to identify threats. We have been using machine learning for at least six years. It’s not about the fact that you’re using it; it’s about how you use it and how accurate you are at detecting threats.
