Gemini analyzes the screen and replies with a function call such as “click,” “type,” or “scroll,” which the client then executes.
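
The client side of that exchange can be pictured as a small dispatcher. The sketch below is only an illustration under stated assumptions: `FunctionCall`, `RecordingClient`, and `execute` are hypothetical names invented here and are not part of any Google SDK, and the client merely records each action instead of driving a real browser.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FunctionCall:
    """Hypothetical stand-in for the model's reply: an action name plus arguments."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


class RecordingClient:
    """Toy client (an assumption for this sketch) that logs actions rather than performing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click at ({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type {text!r}")

    def scroll(self, dy: int) -> None:
        self.log.append(f"scroll by {dy}")


def execute(client: RecordingClient, call: FunctionCall) -> None:
    """Map the action name in the reply to a client method and run it."""
    handlers: Dict[str, Callable[..., None]] = {
        "click": client.click,
        "type": client.type_text,
        "scroll": client.scroll,
    }
    handler = handlers.get(call.name)
    if handler is None:
        # Refuse anything outside the known action set instead of guessing.
        raise ValueError(f"unsupported action: {call.name}")
    handler(**call.args)


if __name__ == "__main__":
    client = RecordingClient()
    # Hand-written replies standing in for what a model might return.
    for reply in (
        FunctionCall("click", {"x": 120, "y": 48}),
        FunctionCall("type", {"text": "hello"}),
        FunctionCall("scroll", {"dy": 300}),
    ):
        execute(client, reply)
    print("\n".join(client.log))
```

In a real agent loop the client would presumably capture a fresh screenshot after each action and send it back to the model; that step is left out here because the text above does not describe it.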
Google unveils Gemini 2.5 Computer Use AI that can click, type, and scroll like humans, outperforming OpenAI and Anthropic in ...