sábado, fevereiro 22, 2025
HomeTechnologyAI Agents Take Control: Exploring Computer-Use Agents

AI Agents Take Control: Exploring Computer-Use Agents



Two years after the generative AI boom really began with the launch of ChatGPT, it no longer seems that exciting to have a phenomenally helpful AI assistant hanging around in your web browser or phone, just waiting for you to ask it questions. The next big push in AI is for AI agents that can take action on your behalf. But while agentic AI has already arrived for power users like coders, everyday consumers don’t yet have these kinds of AI assistants.

That will soon change. Anthropic, Google DeepMind, and OpenAI have all recently unveiled experimental models that can use computers the way people do—searching the web for information, filling out forms, and clicking buttons. With a little guidance from the human user, they can do thinks like order groceries, call an Uber, hunt for the best price for a product, or find a flight for your next vacation. And while these early models have limited abilities and aren’t yet widely available, they show the direction that AI is going.

“This is just the AI clicking around,” said OpenAI CEO Sam Altman in a demo video as he watched the OpenAI agent, called Operator, navigate to OpenTable, look up a San Francisco restaurant, and check for a table for two at 7pm.

Zachary Lipton, an associate professor of machine learning at Carnegie Mellon University, notes that AI agents are already being embedded in specialized software for different types of enterprise customers such as salespeople, doctors, and lawyers. But until now, we haven’t seen AI agents that can “do routine stuff on your laptop,” he says. “What’s intriguing here is the possibility of people starting to hand over the keys.”

AI Agents from Anthropic, Google DeepMind, and OpenAI

Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.” The company stressed that it was giving the models this capability as a public beta test, and that it’s only available to developers who are building tools and products on top of Anthropic’s large language models. Claude navigates by viewing screenshots of what the user sees and counting the pixels required to move the cursor to a certain spot for a click. A spokesperson for Anthropic says that Claude can do this work on any computer and within any desktop application.

Next out of the gate was Google DeepMind with its Project Mariner, built on top of Google’s Gemini 2 language model. The company showed Mariner off in December but called it an “early research prototype” and said it’s only making the tool available to “trusted testers” for now. As another precaution, Mariner currently only operates within the Chrome browser, and only within an active tab, meaning that it won’t run in the background while you work on other tasks. While this requirement seems to somewhat defeat the purpose of having a time-saving AI helper, it’s likely just a temporary condition for this early stage of development.

Finally, in January OpenAI launched its computer-use agent (CUA), called Operator. OpenAI called it a “research preview” and made it available only to users who pay US $200 per month for OpenAI’s premium service, though the company said it’s working toward broader release. Yash Kumar, an engineer on the Operator team, says the tool can work with essentially any website. “We’re starting with the browser because this is where the majority of work happens,” Kumar says. But he notes that “the CUA model is also trained to use a computer, so it’s possible we could expand it” to work with other desktop apps.

Like the others, Operator relies on chain-of-thought reasoning to take instructions and break them down into a series of tasks that it can complete. If it needs more information to complete a task—like, for example, if you prefer to buy red or yellow onions—it will pause and ask for input. It also asks for confirmation before taking a final step, like booking the restaurant table or putting in the grocery order.

Safety Concerns for Computer-Use Agents

Here are some things that computer-use agents can’t yet do: log in to sites, agree to terms of service, solve captchas, and enter credit card or other payment details. If an agent comes up against one of these roadblocks, it hands the steering wheel back to the human user. OpenAI notes that Operator doesn’t take screenshots of the browser while the user is entering login or payment information.

The three companies have all noted that putting an AI in charge of your computer could pose safety risks. Anthropic has specifically raised the concern of prompt injection attacks, or ways in which malicious actors can add something to the user’s prompt to make the model take an unexpected action. “Since Claude can interpret screenshots from computers connected to the internet, it’s possible that it may be exposed to content that includes prompt injection attacks,” Anthropic wrote in a blog post.

CMU’s Lipton says that the companies haven’t revealed much information about the computer-use agents and how they work, so it’s hard to assess the risks. “If someone is getting your computer operator to do something nefarious, does that mean they already have access to your computer?” he wonders, and if so, why wouldn’t the miscreant just take action directly?

Still, Lipton says, with all the actions we take and purchases we make online, “It doesn’t require a wild leap of imagination to imagine actions that would leave the user in a pickle.” For example, he says, “Who will be the first person who wakes up and says, ‘My [agent] bought me a fleet of cars?’”

The Future of Computer-Use Agents

While none of the companies have revealed a timeline for making their computer-use agents broadly available, it seems likely that consumers will begin to get access to them this year—either through the big AI companies or through startups creating cheaper knockoffs.

OpenAI’s Kumar says it’s an exciting time, and that Operator marks a step toward a more collaborative future for humans and AI. “It’s a stepping stone on our path to AGI,” he says, referring to the long-promised dream/nightmare of artificial general intelligence. “The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks.”

If you remember the prescient 2013 movie Her, it seems like we’re edging toward the world that existed at the beginning of the film, before the sultry-voiced Samantha began speaking into the protagonist’s ear. It’s a world in which everyone has a boring and neutral AI to help them read and respond to messages and take care of other mundane tasks. Once the AI companies solidly achieve that goal, they’ll no doubt start working on Samantha.

From Your Site Articles

Related Articles Around the Web

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments