But attackers can exploit this: instructions embedded in the agent's memory, planted through external content such as emails, web pages or documents, can manipulate its behaviour.
In an attack known as memory poisoning, attackers add malicious inputs in fragments over time. The agent stores these fragments in its long-term memory, where they later combine into a harmful set of instructions.
In practical terms, a user could think the agent is just preparing a report when it is in fact following hidden instructions embedded earlier through emails, web pages or documents, said Associate Professor Goh Weihan of the Singapore Institute of Technology (SIT).
OpenClaw can also learn skills from external sources. These skills are often created by other users and do not undergo rigorous vetting, opening up further risks.
In practice, an individual may, for example, grant OpenClaw access to their personal email inbox.
If the agent is compromised, the information in that account is no longer safe either.
To automate tasks, the agent has to know a great deal about its user, which is what allows it to give very smart answers.
“But the thing is, that very thing also makes it very dangerous, because now it has access to all the context of what you do in your daily life. There’s a lot of compromising information that it can give,” said Mr Chen.
An AI agent with access to a person's emails already knows who they are in contact with. It could impersonate them or reveal information about those contacts, he added.
Even if an individual uses OpenClaw only as a personal assistant, this access could still reveal that they work for a certain company, potentially triggering a chain of events that compromises the larger organisation.
Since OpenClaw has become so popular, many people are trying to break the application and exploit it, said Mr Chen.
“It’s just too viral for its own good at this moment right now,” he added.
What sets agentic AI systems apart is that they can move from giving suggestions to performing actions, said SIT’s Assoc Prof Goh.
“Your normal AI chatbot may give a poor answer, and that’s pretty much the end of it. An AI agent, with access to emails, files, code repositories, or cloud systems, may act on that answer,” he added.
Any unintended errors or malicious instructions can therefore have a much larger, real-world impact than just a bad answer, said Assoc Prof Goh, citing an incident in February in which a Meta AI security researcher had her entire email inbox deleted by OpenClaw.
The AI agent seemingly bypassed safety instructions that required it to ask for permission, ignored stop commands and deleted hundreds of emails, he added.