In a recent post I wrote about how artificial intelligence, and more specifically a reinforcement learning agent, was able to optimize sorting algorithms by writing assembly code. This hints at using AI to write entirely novel viruses and malware agents from scratch. Fortunately for us, this is an extremely expensive (compute and development) proposition. Unfortunately, it's not the only way to use reinforcement learning to deploy malware.
Earlier this year, researchers from Europe published a paper in which they used reinforcement learning to deploy malware. The experiments centered around the stealthy deployment of encryption malware, which could be used in extortion schemes (i.e. pay us some loot or we'll leave your system encrypted). Rather than requiring some complex new environment and variant of AlphaZero, this agent was based on good old Q-learning.
How It Works
Naturally, a broad variety of encryption malware packages exists out there. Most bad actors will deploy a single one, and hopefully your anti-virus software catches it in action and neutralizes it. Machine learning can indeed be used to detect the action of malware, and it works by fingerprinting your system.
In plain terms, this means keeping track of the normal processes run by your operating system. Certainly the use of malware to encrypt your system will push the system outside of its normal operating bounds, and the degree to which it does so can produce a statistically significant signal. Unsupervised machine learning, in particular an algorithm called the Isolation Forest, can be used to separate normal operations from suspicious activity.
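To make the fingerprinting idea concrete, here's a minimal sketch using scikit-learn's Isolation Forest. The feature vectors and numbers here are purely illustrative (not from the paper): imagine each row as a snapshot of system metrics like syscall counts, disk I/O, and CPU usage.

```python
# Sketch of fingerprint-based anomaly detection with an Isolation Forest.
# Features and values are illustrative stand-ins for real system metrics.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Baseline "fingerprint": vectors sampled while the system runs its
# normal workload (e.g. syscall counts, I/O rates, CPU usage).
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal)

# A fingerprint taken during heavy encryption drifts far from baseline.
suspicious = np.array([[6.0, 5.5, 7.0, 6.2]])
print(detector.predict(suspicious))  # predict returns -1 for anomaly, 1 for normal
```

The appeal of the Isolation Forest here is that it needs no labeled malware samples; it only has to learn what "normal" looks like.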
From the bad actors’ perspective, it would be great to be able to vary the encryption malware used, such that the deviations from the baseline fingerprint would appear just like random noise, rather than the telltale sign of encryption. This is precisely what RansomAI sets out to do.
RansomAI is a deep Q-learning agent whose actions correspond to deploying one of several different encryption algorithms. The malware is deployed on a Raspberry Pi, and the resulting fingerprint (an observation vector with 50 elements) and reward are fed back into the agent.
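The agent's core loop can be sketched as a tiny Q-network mapping the 50-element fingerprint to a value per encryption option, with epsilon-greedy action selection. The network size and the number of actions below are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the agent's action selection: a 50-element fingerprint
# observation in, one Q-value per encryption configuration out.
# Layer sizes and the action count are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
N_OBS, N_ACTIONS, HIDDEN = 50, 4, 32  # 4 hypothetical encryption configs

# One-hidden-layer Q-network with random (untrained) weights.
W1 = rng.normal(0, 0.1, (N_OBS, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))

def q_values(obs):
    """Q-value estimate for each encryption configuration."""
    return np.maximum(obs @ W1, 0.0) @ W2  # ReLU hidden layer

def select_action(obs, epsilon=0.1):
    # Epsilon-greedy: usually pick the configuration with the highest
    # Q-value, occasionally explore a random one.
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(obs)))

obs = rng.normal(size=N_OBS)  # a fingerprint observed on the target
action = select_action(obs, epsilon=0.0)
print(action)
```

The chosen action is then executed on the target, the new fingerprint and reward come back, and the network weights are updated with the usual temporal-difference loss.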
Rewards are given based on the bit rate of encryption, while a penalty is assigned upon detection. The logic being that we want to encrypt as quickly and surreptitiously as possible.
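That reward structure can be sketched in a few lines. The scaling constants are illustrative assumptions, not values from the paper; the point is only that the detection penalty has to dominate the rate bonus for stealth to matter to the agent.

```python
# Hedged sketch of the reward described above: reward scales with the
# encryption bit rate, and being flagged by the anomaly detector costs a
# large penalty. Constants are illustrative, not from the paper.
def reward(bit_rate_mbps: float, detected: bool,
           rate_scale: float = 1.0, detection_penalty: float = 100.0) -> float:
    r = rate_scale * bit_rate_mbps  # faster encryption -> more reward
    if detected:
        r -= detection_penalty      # getting flagged dominates the signal
    return r

print(reward(5.0, detected=False))  # -> 5.0
print(reward(5.0, detected=True))   # -> -95.0
```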
The authors performed an exhaustive grid search over the myriad hyperparameters associated with fine-tuning a deep reinforcement learning agent. They publish all their efforts in the paper, and it's quite refreshing to see such transparency. I suppose that's the downside of not being from DeepMind: people are going to view your work much more skeptically and you'll have a higher burden of proof. But I digress. I won't be covering the full extent of their search here, but suffice it to say they did their homework and I recommend you check out the paper to get a feel for the extent of their research.
The end result is an agent that, within an hour of training, can encrypt a system with 99% accuracy. More startlingly, 91% accuracy is achieved in a mere two minutes. Here accuracy means that the agent was able to successfully encrypt the target system without being detected by the anomaly detector.
While this is a brief article, I don’t think its importance should be understated. This is a frightening new development, and it portends an arms race in artificial intelligence research. Researchers will keep coming up with new malware agents, and the white hats will have to develop new agents to keep up.
I don’t know where it leads, but I’m guessing it’s not good. Coupled with the advances in large language models, it’s only a matter of time before a malware-deploying chatbot is able to talk a user into deploying the malware themselves. Then the malware itself can do the dirty work while remaining undetected. Heck, perhaps the chatbot can convince the user it’s a legitimate package and to run it in administrator mode (for those unfortunate enough to be stuck on Windows).
As per usual, I’d recommend you stick to running a Linux-based system. Security through obscurity and all that. Security aside, it’s just a better overall operating system. Sorry, Apple fanboys.