Google DeepMind Introduces Gemini 2.5 Computer Use Model

Praveen Kumar
Oct 08

Google DeepMind has unveiled the Gemini 2.5 Computer Use model, a new addition to the Gemini 2.5 family designed to let AI agents interact directly with computer interfaces much as a person would: by clicking, typing, and scrolling.

The model is available in public preview through the Gemini API on Google AI Studio and Vertex AI. Developers can use it to build UI-controlling agents that carry out multi-step digital tasks. According to Google, it is primarily optimized for web browsers and also shows strong promise on mobile apps.

What Makes Gemini 2.5 Computer Use Special

AI models usually interact with software through structured APIs, but many real-world digital workflows still depend on graphical user interfaces (GUIs): clicking buttons, filling out forms, scrolling pages, or navigating dashboards.

The Gemini 2.5 Computer Use model is purpose-built for this. It enables agents to:

Understand and analyze on-screen elements.
Perform human-like actions such as clicking, typing, and dragging.
Handle login screens and interactive elements like dropdowns or filters.
Request user confirmation for sensitive or high-stakes actions.

This marks a step toward general-purpose AI agents that can operate digital tools and websites with little human intervention, while still deferring to the user on sensitive actions.

How It Works

The new computer_use tool within the Gemini API allows developers to create agents that work in a loop:

The model receives the user request, a screenshot of the environment, and a history of recent actions.
The model processes these inputs and returns the next UI action, such as "click," "type," or "scroll."
The client-side code executes the action and sends back an updated screenshot.
The cycle repeats until the task is completed or stopped.

This loop lets the model carry out multi-step workflows, such as filling in forms, organizing dashboards, or booking appointments, one action at a time.
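The loop described above can be sketched in a few lines of Python. This is only an illustrative outline, not the Gemini API: the names Action, FakeBrowser, scripted_model, and run_agent are hypothetical, the "model" is a stub that replays a fixed plan, and the "screenshot" is a text summary rather than an image. A real agent would presumably replace the stub with a call to the model and the fake browser with a real automation layer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str            # e.g. "click", "type", "scroll", or "done"
    argument: str = ""   # e.g. a target element or the text to type


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; its 'screenshot' is a text summary."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        return f"screen after {len(self.log)} action(s)"

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.name}:{action.argument}")


def scripted_model(request: str, screenshot: str, history: List[Action]) -> Action:
    """Stub for the model call: replays a fixed plan, then signals completion."""
    plan = [
        Action("click", "name_field"),
        Action("type", "Ada"),
        Action("click", "submit"),
    ]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


ModelFn = Callable[[str, str, List[Action]], Action]


def run_agent(request: str, model: ModelFn, browser: FakeBrowser,
              max_steps: int = 10) -> List[Action]:
    """Run the request -> action -> screenshot loop until 'done' or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        # Inputs to the model: the request, a fresh screenshot, and past actions.
        action = model(request, browser.screenshot(), history)
        if action.name == "done":
            break
        # The client executes the action; the next iteration sees the new screen.
        browser.execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent("Fill in the sign-up form", scripted_model, browser):
        print(step.name, step.argument)
```

The max_steps cap is a design choice of this sketch: it keeps a misbehaving model from looping forever, which matters once the stub is replaced by a real model call.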

Benchmark Performance

According to Google DeepMind, Gemini 2.5 Computer Use outperforms leading alternatives on several web and mobile control benchmarks, including:

Online-Mind2Web
WebVoyager
AndroidWorld

According to Google, it delivers over 70% task accuracy at lower latency than the alternatives it was compared against.

Built with Safety in Mind

Given the potential risks of AI systems that control computers, DeepMind has placed a heavy emphasis on responsible deployment.

The model includes built-in safeguards such as:

Per-step safety service that reviews every model action before execution.
User confirmation prompts for high-risk actions like purchases or deletions.
Developer-level controls to restrict or customize behavior for secure use cases.
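To make the three safeguards concrete, here is a minimal Python sketch of how a client might gate each proposed action before running it. It does not reproduce Google's safety service or any Gemini API call: Verdict, review_action, gated_execute, and the HIGH_RISK and BLOCKED sets are hypothetical names chosen for illustration, and the action names in those sets are assumptions.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"      # safe to run immediately
    CONFIRM = "confirm"  # run only if the user agrees
    BLOCK = "block"      # never run


# Assumption: illustrative action names, not a list taken from Google.
HIGH_RISK = {"purchase", "delete"}   # needs explicit user confirmation
BLOCKED = {"disable_security"}       # restricted by the developer


def review_action(action_name: str) -> Verdict:
    """Per-step check applied to every proposed action before execution."""
    if action_name in BLOCKED:
        return Verdict.BLOCK
    if action_name in HIGH_RISK:
        return Verdict.CONFIRM
    return Verdict.ALLOW


def gated_execute(action_name: str,
                  execute: Callable[[str], None],
                  confirm: Callable[[str], bool]) -> bool:
    """Execute the action only if the review allows it; return True if it ran."""
    verdict = review_action(action_name)
    if verdict is Verdict.BLOCK:
        return False
    if verdict is Verdict.CONFIRM and not confirm(action_name):
        return False
    execute(action_name)
    return True


if __name__ == "__main__":
    performed = []
    always_decline = lambda name: False  # simulates a user who says no
    for name in ["click", "purchase", "disable_security"]:
        ran = gated_execute(name, performed.append, always_decline)
        print(name, "ran" if ran else "skipped")
```

In an interactive agent, the confirm callback would prompt the user instead of returning a fixed answer.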

Getting Started

Developers can now explore the Gemini 2.5 Computer Use model through:

Google AI Studio — for quick demos and experimentation.
Vertex AI — for enterprise integration.
Browserbase — for hosted testing and automation.