Introduction
Analyzing open source code is one of the best ways to become a better developer. Open source projects often require highly generic implementations, forcing developers to use advanced design patterns and architectural techniques. Understanding these patterns makes everyday business code easier to grasp and write.
Today, we're diving into Stagehand, an AI-powered browser automation library. Unlike traditional frameworks where you write driver.findElement("#submit-btn").click(), Stagehand lets you write natural instructions: stagehand.act("click the submit button"). The key innovation? Your frontend code can change as much as needed - Stagehand uses AI to find elements dynamically, keeping automation working without constant maintenance.
Stagehand provides three core methods:
- act() - Perform actions (clicking, typing, navigating)
- extract() - Extract structured data from pages
- observe() - Plan actions before executing
But here's the architectural challenge: How do you build a library that works with multiple LLM providers (OpenAI, Anthropic, Google) AND allows companies to integrate their own custom LLM gateways?
Companies can't pass API keys directly to third-party libraries. They need LLM gateways - centralized systems that control access, track budgets, enforce rate limits, and maintain compliance. Stagehand needs to work with these custom gateways without knowing their implementation details.
const stagehand = new Stagehand({
env: "LOCAL",
model: "gpt-4.1-mini"
});
await stagehand.init();
await stagehand.act("click the add to cart button");
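The other two methods follow the same natural-language style. Continuing the snippet above, here's a hedged sketch - extract()'s instruction/schema options follow Stagehand's documented zod-based usage, but exact option shapes vary between versions, so treat this as illustrative:

import { z } from "zod";

// extract(): pull structured data out of the page using a zod schema
const cart = await stagehand.extract({
  instruction: "extract the name and price of the item in the cart",
  schema: z.object({ name: z.string(), price: z.string() }),
});

// observe(): ask for candidate actions without executing them
const actions = await stagehand.observe("find buttons that would empty the cart");
console.log(cart, actions);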
The Problem: Integrating Multiple LLM Providers
Every LLM provider has completely different SDKs and response formats. OpenAI returns one structure, Google Gemini returns another, Anthropic Claude yet another. Company LLM gateways might have custom implementations that proxy multiple providers.
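To make the mismatch concrete, here are rough sketches of two of those response shapes (field names are abbreviated from each SDK's documented types and may differ slightly between versions):

// Roughly what OpenAI's chat completion response looks like
type OpenAIResponseShape = {
  id: string;
  choices: { message: { role: string; content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number };
};

// Roughly what Google Gemini's generateContent response looks like
type GeminiResponseShape = {
  candidates: { content: { parts: { text?: string }[] } }[];
  usageMetadata: { promptTokenCount: number; candidatesTokenCount: number };
};

Same information, completely different field names and nesting - which is exactly why a translation layer is needed.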
Why Companies Use LLM Gateways
Large organizations build internal LLM gateways to provide:
- Centralized access control - Determines which teams can use which models
- Budget tracking and limits - Prevents cost overruns
- Logging and compliance - Audits all LLM interactions
- Rate limiting - Prevents abuse
- Multi-provider routing - Routes requests based on business rules
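A request to one of these gateways might look something like this - the endpoint, headers, and body below are invented purely for illustration, and every company's gateway differs:

// Hypothetical internal gateway call; endpoint, headers, and body shape are made up.
const response = await fetch("https://llm-gateway.internal.example.com/v1/chat", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GATEWAY_TOKEN}`, // centralized access control
    "X-Team-Id": "web-automation",                        // budget tracking per team
  },
  body: JSON.stringify({
    model: "gpt-4o", // the gateway may re-route this based on business rules
    messages: [{ role: "user", content: "..." }],
  }),
});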
The challenge: Stagehand can't know how to call every company's custom gateway.
The traditional approach would hard-code one or two providers, leading to vendor lock-in. Stagehand takes a different approach:
The diagram shows Stagehand's decision tree - either use a custom client provided by the user, or select a built-in provider based on model name.
The Solution: Interface-Based Architecture
Stagehand's solution: define a contract that all LLM providers must implement.
Here's the key interface (lib/v3/llm/LLMClient.ts):
export abstract class LLMClient {
public modelName: AvailableModel;
abstract createChatCompletion<T = LLMResponse>(
options: CreateChatCompletionOptions
): Promise<T>;
}
This is the contract: any client must implement createChatCompletion().
Input: standardized CreateChatCompletionOptions (messages, temperature, schemas, tools)
Output: a unified LLMResponse based on OpenAI's response structure. Why OpenAI's format? It's the most widely adopted in the industry, so the shape is already familiar to most developers:
export type LLMResponse = {
  id: string;
  model: string;
  choices: { message: { role: string; content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number; /* ... */ };
};
Every implementation must accept CreateChatCompletionOptions and return LLMResponse or LLMParsedResponse<T>. This abstraction enables everything else.
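For orientation, here's a sketch of what the standardized options might carry. The field names below are inferred from the summary above (messages, temperature, schemas, tools) rather than copied from the real type, so treat them as assumptions:

// Illustrative only; the real CreateChatCompletionOptions in
// lib/v3/llm/LLMClient.ts is richer (it also carries a logger, for example).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type ChatCompletionOptionsSketch = {
  messages: ChatMessage[];
  temperature?: number;
  response_model?: { name: string; schema: unknown }; // structured-output schema (assumed field name)
  tools?: { name: string; description: string; parameters: unknown }[];
};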
Concrete Implementations: The Adapters
OpenAIClient: No Adaptation Needed
export class OpenAIClient extends LLMClient {
  private client: OpenAI;

  async createChatCompletion<T = LLMResponse>({
    options,
    logger,
  }: CreateChatCompletionOptions): Promise<T> {
    // body is assembled from options (messages, temperature, tools, ...)
    const response = await this.client.chat.completions.create(body);
    // OpenAI's response is already in LLMResponse format!
    return response as T;
  }
}
Source: lib/v3/llm/OpenAiClient.ts
Key insight: OpenAI's SDK already returns LLMResponse format, so this is a thin wrapper.
GoogleClient: The Adapter Pattern in Action
export class GoogleClient extends LLMClient {
private client: GoogleGenAI;
async createChatCompletion<T = LLMResponse>({
options,
logger,
}: CreateChatCompletionOptions): Promise<T> {
// Call Google's API
const result = await this.client.models.generateContent({
model: this.modelName,
contents: formattedMessages,
config: generationConfig,
});
// Convert Google's format to OpenAI-compatible LLMResponse format
const llmResponse: LLMResponse = {
// Map Google's response structure to OpenAI format
// result.text -> choices[0].message.content
// result.usageMetadata -> usage tokens
// ...
};
return llmResponse as T;
}
}
Source: lib/v3/llm/GoogleClient.ts
The adapter translates input (Stagehand → Google format) and output (Google → OpenAI-compatible format). This is the Adapter Pattern in action - wrapping different provider APIs behind a unified interface.
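Here's a hedged sketch of what that translation can look like. The field names on Google's result (text, usageMetadata.promptTokenCount, and so on) follow the @google/genai response shape, but treat the exact mapping as an assumption rather than Stagehand's actual code:

// Sketch of the Google -> OpenAI-style mapping described in the comments above.
function toOpenAIShape(
  result: {
    text?: string;
    usageMetadata?: { promptTokenCount?: number; candidatesTokenCount?: number };
  },
  modelName: string,
) {
  const promptTokens = result.usageMetadata?.promptTokenCount ?? 0;
  const completionTokens = result.usageMetadata?.candidatesTokenCount ?? 0;
  return {
    id: `google-${Date.now()}`, // synthesize an id for the OpenAI-style shape
    model: modelName,
    choices: [
      {
        message: { role: "assistant", content: result.text ?? "" },
        finish_reason: "stop",
      },
    ],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  };
}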
The Factory: LLMProvider.getClient()
The Factory Pattern determines which client to instantiate (lib/v3/llm/LLMProvider.ts):
export class LLMProvider {
getClient(modelName: AvailableModel, clientOptions?: ClientOptions): LLMClient {
const provider = modelToProviderMap[modelName]; // "gpt-4o" -> "openai"
switch (provider) {
case "openai":
return new OpenAIClient({ modelName, clientOptions });
case "anthropic":
return new AnthropicClient({ modelName, clientOptions });
case "google":
return new GoogleClient({ modelName, clientOptions });
// ...
}
}
}
The factory centralizes client creation. Adding new providers is just adding a new case.
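The map itself is just a lookup table. An illustrative sketch (the real modelToProviderMap covers far more models; the entries here are examples, not an exhaustive copy):

// Illustrative only: maps a model name to the provider that serves it.
const modelToProviderMapSketch: Record<string, "openai" | "anthropic" | "google"> = {
  "gpt-4o": "openai",
  "gpt-4.1-mini": "openai",
  "claude-3-5-sonnet-latest": "anthropic",
  "gemini-2.0-flash": "google",
};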
Dependency Injection: Custom Clients
In Stagehand's constructor (lib/v3/v3.ts):
export class V3 {
constructor(opts: V3Options) {
if (opts.llmClient) {
// User provided custom client (company gateway)
this.llmClient = opts.llmClient;
} else {
// Use factory to get default client
this.llmClient = this.llmProvider.getClient(
this.modelName,
this.modelClientOptions
);
}
}
}
Dependency Injection: Custom client or factory-created client - both work.
How handlers use it (lib/v3/handlers/actHandler.ts):
export class ActHandler {
private readonly llmClient: LLMClient; // Depends on interface!
private async getActionFromLLM({ llmClient, ... }) {
const response = await actInference({
llmClient, // Don't care which implementation!
instruction,
domElements,
});
}
}
ActHandler depends on the LLMClient interface, not concrete implementations. It doesn't know if it's OpenAI, Google, or a custom gateway. This is programming to interfaces, not implementations - a fundamental principle that enables flexibility and extensibility.
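One practical payoff: because handlers only see the interface, tests can inject a stub. The FakeLLMClient below is invented here (it's not part of Stagehand), and its constructor mirrors the custom-client example later in this article:

// Hypothetical test stub: satisfies the LLMClient contract without calling
// any provider, so handler logic can be exercised offline.
class FakeLLMClient extends LLMClient {
  constructor() {
    super("fake-model"); // assumes the base constructor takes a model name, as in the example below
  }

  async createChatCompletion<T = LLMResponse>(): Promise<T> {
    return {
      id: "fake-response-1",
      model: this.modelName,
      choices: [
        {
          message: { role: "assistant", content: '{"method":"click","selector":"#submit-btn"}' },
          finish_reason: "stop",
        },
      ],
      usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
    } as T;
  }
}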
Extensibility: Building Your Own Adapter
This architecture makes Stagehand extensible for real-world enterprise needs. Companies typically can't use LLM providers directly - they need to integrate with their organization's centralized LLM gateway. These gateways provide access control, budget tracking, compliance logging, and rate limiting. By implementing the LLMClient interface, you can connect Stagehand to your company's gateway, and the library works with your implementation exactly like it works with built-in providers.
Here's a simplified example:
import { Stagehand, LLMClient, LLMResponse } from "@browserbasehq/stagehand";
// Your custom adapter
export class CompanyGatewayClient extends LLMClient {
constructor(config) {
super("custom-model");
}
  async createChatCompletion({ options }): Promise<LLMResponse> {
    // Call your gateway
    const result = await yourGateway.chat(options.messages);

    // Return OpenAI-compatible format
    return {
      id: result.id,
      model: this.modelName,
      choices: [
        { message: { role: "assistant", content: result.text }, finish_reason: "stop" },
      ],
      usage: {
        prompt_tokens: result.tokens,
        completion_tokens: result.completionTokens,
        total_tokens: result.tokens + result.completionTokens,
      },
    };
  }
}
// Use it
const stagehand = new Stagehand({
llmClient: new CompanyGatewayClient({ url: "..." })
});
That's it. Implement one method, return the expected format, and you've extended Stagehand to work with your custom infrastructure. This is the power of interface-based design - the library becomes a platform that adapts to your needs, not the other way around.
Conclusion
Stagehand's architecture shows how deep knowledge of software development patterns solves modern AI integration challenges.
The combination of Strategy, Adapter, Factory, and Dependency Injection patterns creates a flexible, extensible system where:
- Interfaces enable flexibility by depending on abstractions, not implementations
- Adapters unify diverse provider APIs behind a common format
- Factories centralize creation logic in one maintainable place
- Dependency injection allows customization without modifying source code
These aren't AI-specific patterns - they're foundational software engineering principles. Understanding polymorphism, interfaces, and design patterns makes you better at building any system.
Practical value:
- Integrate any LLM provider with minimal code
- Use company gateways for compliance and control
- Keep code maintainable and testable
- Future-proof your automation for providers that don't exist yet