Connecting AI Agents to the Web: A Developer's Journey and the Rise of Computer Use
One major hurdle in AI agent development over the past two years has been giving agents reliable access to the web. Consider an AI agent designed to send emails: how do you connect it to Gmail or Outlook? Through APIs, by driving the websites directly, or with autonomous web agents? This article compares these approaches.
APIs and SDKs: A Limited Approach
Many developers utilize APIs and SDKs. This offers low latency and robust authentication, but limitations exist:
Fortunately, several services offer API call libraries:
However, for universal web service access, we must move beyond APIs.
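To make the limitation concrete, here is a minimal sketch of the API route for the email example, assuming a hand-written tool registry. The names `build_email`, `TOOLS`, and `call_tool` are hypothetical and not taken from any particular agent framework; only the standard-library `EmailMessage` class is real. The sketch builds a message but does not send it, since sending would require a provider API or SMTP credentials.

```python
from email.message import EmailMessage
from typing import Callable, Dict


def build_email(to: str, subject: str, body: str) -> EmailMessage:
    """Build a message object. Sending it would need a provider API or SMTP."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Hypothetical tool registry: the agent can only call what is listed here.
TOOLS: Dict[str, Callable[..., EmailMessage]] = {"send_email": build_email}


def call_tool(name: str, **kwargs) -> EmailMessage:
    """Dispatch a tool call requested by the model."""
    if name not in TOOLS:
        # Every service the agent should reach needs its own integration.
        raise KeyError(f"no API integration for tool: {name}")
    return TOOLS[name](**kwargs)
```

Calling `call_tool("send_email", to="a@example.com", subject="Hi", body="Hello")` returns a populated message, while a request for any tool that was never integrated fails. That gap is what the rest of the article tries to close.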
Website Interaction: The Human Approach
If an AI agent can interact with websites reliably, it can automate any task a human performs on the web. But how?
Many developers initially use browser testing frameworks like Selenium or Playwright. This approach, however, faces challenges:
To address these issues, we experimented with a Browser SDK that lets the agent find elements by natural-language description (e.g., get_element("find the login button")) instead of brittle CSS selectors.
This work, now open-source (Dendrite SDK), is no longer under active development but remains available for study and adaptation. Similar alternatives include:
Computer Use: The Future of Web AI Agents?
Rich Sutton's "Bitter Lesson" argues that general methods which scale with compute ultimately beat approaches built on hand-crafted, task-specific knowledge. Anthropic's Computer Use embodies this principle: it lets LLMs control a computer or browser directly through mouse and keyboard input, with no need for site-specific scripts or API calls. The approach emphasizes general computer skills over task-specific tools, which suggests that the most versatile AI agents will interact with the web the way humans do. Early results show high reliability on complex tasks when prompts are well crafted, often with help from Anthropic's prompt improver.
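The control loop behind this kind of agent can be sketched roughly as follows. This is an illustration under stated assumptions, not Anthropic's API: `Action`, `FakeScreen`, `run_agent`, and `scripted_policy` are all hypothetical names, the "screenshot" is a text summary rather than an image, and the policy is a hard-coded stand-in for the model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Stand-in for a real browser or desktop; it only records actions."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real environment would return image data; this is a text summary.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def run_agent(
    goal: str,
    screen: FakeScreen,
    choose_action: Callable[[str, str], Action],
    max_steps: int = 10,
) -> bool:
    """Observe, ask the policy for one action, apply it, and repeat."""
    for _ in range(max_steps):
        observation = screen.screenshot()
        action = choose_action(goal, observation)
        if action.kind == "done":
            return True
        screen.apply(action)
    return False  # step budget exhausted before the policy reported success


def scripted_policy(goal: str, observation: str) -> Action:
    """Hypothetical placeholder for the model: click a field, then type."""
    steps = {
        "screen after 0 actions": Action("click", x=120, y=48),
        "screen after 1 actions": Action("type", text=goal),
    }
    return steps.get(observation, Action("done"))
```

Nothing in `run_agent` is specific to one website: only the policy decides what to click or type, which is the sense in which the approach trades task-specific tools for general skills. The `max_steps` cap is a design choice to keep a confused policy from looping forever.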
Conclusion: Embracing the Future
While APIs remain valuable, the future likely favors Computer Use-like approaches for most AI agents. If an agent can log in, use a website's search function, and draw conclusions from the top results, why give it access to the entire database through an API? The question for AI developers is whether to embrace this generalizable approach or accept the limitations of more specialized methods.
Note: This is my first dev.to post. Feedback on improving future posts is welcome. Questions on AI agents or AI-driven task automation are also encouraged.