Introduction
Analyzing open source code is one of the best ways to become a better developer. Open source projects often require highly generic implementations, forcing developers to use advanced design patterns and architectural techniques. Understanding these patterns makes everyday business code easier to grasp and write.
Today, we're diving into Stagehand, an AI-powered browser automation library. Unlike traditional frameworks where you write driver.findElement("#submit-btn").click(), Stagehand lets you write natural instructions: stagehand.act("click the submit button"). The key innovation? Your frontend code can change as much as needed - Stagehand uses AI to find elements dynamically, keeping automation working without constant maintenance.
Stagehand provides three core methods:
- act() - Perform actions (clicking, typing, navigating)
- extract() - Extract structured data from pages
- observe() - Plan actions before executing
But here's the architectural challenge: How do you build a library that works with multiple LLM providers (OpenAI, Anthropic, Google) AND allows companies to integrate their own custom LLM gateways?
Companies can't pass API keys directly to third-party libraries. They need LLM gateways - centralized systems that control access, track budgets, enforce rate limits, and maintain compliance. Stagehand needs to work with these custom gateways without knowing their implementation details.
const stagehand = new Stagehand({
env: "LOCAL",
model: "gpt-4.1-mini"
});
await stagehand.init();
await stagehand.act("click the add to cart button");
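The other two methods follow the same natural-language style. Continuing the snippet above, here's a hedged sketch - extract()'s instruction/schema options follow Stagehand's documented zod-based usage, but exact option shapes vary between versions, so treat this as illustrative:

import { z } from "zod";

// extract(): pull structured data out of the page using a zod schema
const cart = await stagehand.extract({
  instruction: "extract the name and price of the item in the cart",
  schema: z.object({ name: z.string(), price: z.string() }),
});

// observe(): ask for candidate actions without executing them
const actions = await stagehand.observe("find buttons that would empty the cart");
console.log(cart, actions);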
The Problem: Integrating Multiple LLM Providers
Every LLM provider has completely different SDKs and response formats. OpenAI returns one structure, Google Gemini returns another, Anthropic Claude yet another. Company LLM gateways might have custom implementations that proxy multiple providers.
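To make the mismatch concrete, here are rough sketches of two of those response shapes (field names are abbreviated from each SDK's documented types and may differ slightly between versions):

// Roughly what OpenAI's chat completion response looks like
type OpenAIResponseShape = {
  id: string;
  choices: { message: { role: string; content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number };
};

// Roughly what Google Gemini's generateContent response looks like
type GeminiResponseShape = {
  candidates: { content: { parts: { text?: string }[] } }[];
  usageMetadata: { promptTokenCount: number; candidatesTokenCount: number };
};

Same information, completely different field names and nesting - which is exactly why a translation layer is needed.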
Why Companies Use LLM Gateways
Large organizations build internal LLM gateways to provide:
- Centralized access control - Determines which teams can use which models
- Budget tracking and limits - Prevents cost overruns
- Logging and compliance - Audits all LLM interactions
- Rate limiting - Prevents abuse
- Multi-provider routing - Routes requests based on business rules
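A request to one of these gateways might look something like this - the endpoint, headers, and body below are invented purely for illustration, and every company's gateway differs:

// Hypothetical internal gateway call; endpoint, headers, and body shape are made up.
const response = await fetch("https://llm-gateway.internal.example.com/v1/chat", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GATEWAY_TOKEN}`, // centralized access control
    "X-Team-Id": "web-automation",                        // budget tracking per team
  },
  body: JSON.stringify({
    model: "gpt-4o", // the gateway may re-route this based on business rules
    messages: [{ role: "user", content: "..." }],
  }),
});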
The challenge: Stagehand can't know how to call every company's custom gateway.
The traditional approach would hard-code one or two providers, leading to vendor lock-in. Stagehand takes a different approach:
The diagram shows Stagehand's decision tree - either use a custom client provided by the user, or select a built-in provider based on model name.
The Solution: Interface-Based Architecture
Stagehand's solution: define a contract that all LLM providers must implement.
Here's the key interface (lib/v3/llm/LLMClient.ts):
export abstract class LLMClient {
public modelName: AvailableModel;
abstract createChatCompletion<T = LLMResponse>(
options: CreateChatCompletionOptions
): Promise<T>;
}
This is the contract: any client must implement createChatCompletion().
Input: standardized CreateChatCompletionOptions (messages, temperature, schemas, tools)
Output: a unified LLMResponse based on OpenAI's response structure. Why OpenAI's format? It's the most widely adopted in the industry, so the shape is already familiar to most developers:
export type LLMResponse = {
  id: string;
  model: string;
  choices: { message: { role: string; content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number; /* ... */ };
};
Every implementation must accept CreateChatCompletionOptions and return LLMResponse or LLMParsedResponse<T>. This abstraction enables everything else.
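For orientation, here's a sketch of what the standardized options might carry. The field names below are inferred from the summary above (messages, temperature, schemas, tools) rather than copied from the real type, so treat them as assumptions:

// Illustrative only; the real CreateChatCompletionOptions in
// lib/v3/llm/LLMClient.ts is richer (it also carries a logger, for example).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type ChatCompletionOptionsSketch = {
  messages: ChatMessage[];
  temperature?: number;
  response_model?: { name: string; schema: unknown }; // structured-output schema (assumed field name)
  tools?: { name: string; description: string; parameters: unknown }[];
};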
Concrete Implementations: The Adapters
OpenAIClient: No Adaptation Needed
export class OpenAIClient extends LLMClient {
  private client: OpenAI;

  async createChatCompletion<T = LLMResponse>({
    options,
    logger,
  }: CreateChatCompletionOptions): Promise<T> {
    // body is assembled from options (messages, temperature, tools, ...)
    const response = await this.client.chat.completions.create(body);
    // OpenAI's response is already in LLMResponse format!
    return response as T;
  }
}
Source: lib/v3/llm/OpenAiClient.ts
Key insight: OpenAI's SDK already returns LLMResponse format, so this is a thin wrapper.
GoogleClient: The Adapter Pattern in Action
export class GoogleClient extends LLMClient {
private client: GoogleGenAI;
async createChatCompletion<T = LLMResponse>({
options,
logger,
}: CreateChatCompletionOptions): Promise<T> {
// Call Google's API
const result = await this.client.models.generateContent({
model: this.modelName,
contents: formattedMessages,
config: generationConfig,
});
// Convert Google's format to OpenAI-compatible LLMResponse format
const llmResponse: LLMResponse = {
// Map Google's response structure to OpenAI format
// result.text -> choices[0].message.content
// result.usageMetadata -> usage tokens
// ...
};
return llmResponse as T;
}
}
Source: lib/v3/llm/GoogleClient.ts
The adapter translates input (Stagehand → Google format) and output (Google → OpenAI-compatible format). This is the Adapter Pattern in action - wrapping different provider APIs behind a unified interface.
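Here's a hedged sketch of what that translation can look like. The field names on Google's result (text, usageMetadata.promptTokenCount, and so on) follow the @google/genai response shape, but treat the exact mapping as an assumption rather than Stagehand's actual code:

// Sketch of the Google -> OpenAI-style mapping described in the comments above.
function toOpenAIShape(
  result: {
    text?: string;
    usageMetadata?: { promptTokenCount?: number; candidatesTokenCount?: number };
  },
  modelName: string,
) {
  const promptTokens = result.usageMetadata?.promptTokenCount ?? 0;
  const completionTokens = result.usageMetadata?.candidatesTokenCount ?? 0;
  return {
    id: `google-${Date.now()}`, // synthesize an id for the OpenAI-style shape
    model: modelName,
    choices: [
      {
        message: { role: "assistant", content: result.text ?? "" },
        finish_reason: "stop",
      },
    ],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  };
}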
The Factory: LLMProvider.getClient()
The Factory Pattern determines which client to instantiate (lib/v3/llm/LLMProvider.ts):
export class LLMProvider {
getClient(modelName: AvailableModel, clientOptions?: ClientOptions): LLMClient {
const provider = modelToProviderMap[modelName]; // "gpt-4o" -> "openai"
switch (provider) {
case "openai":
return new OpenAIClient({ modelName, clientOptions });
case "anthropic":
return new AnthropicClient({ modelName, clientOptions });
case "google":
return new GoogleClient({ modelName, clientOptions });
// ...
}
}
}
The factory centralizes client creation. Adding new providers is just adding a new case.
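The map itself is just a lookup table. An illustrative sketch (the real modelToProviderMap covers far more models; the entries here are examples, not an exhaustive copy):

// Illustrative only: maps a model name to the provider that serves it.
const modelToProviderMapSketch: Record<string, "openai" | "anthropic" | "google"> = {
  "gpt-4o": "openai",
  "gpt-4.1-mini": "openai",
  "claude-3-5-sonnet-latest": "anthropic",
  "gemini-2.0-flash": "google",
};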
Dependency Injection: Custom Clients
In Stagehand's constructor (lib/v3/v3.ts):
export class V3 {
constructor(opts: V3Options) {
if (opts.llmClient) {
// User provided custom client (company gateway)
this.llmClient = opts.llmClient;
} else {
// Use factory to get default client
this.llmClient = this.llmProvider.getClient(
this.modelName,
this.modelClientOptions
);
}
}
}
Dependency Injection: Custom client or factory-created client - both work.
How handlers use it (lib/v3/handlers/actHandler.ts):
export class ActHandler {
private readonly llmClient: LLMClient; // Depends on interface!
private async getActionFromLLM({ llmClient, ... }) {
const response = await actInference({
llmClient, // Don't care which implementation!
instruction,
domElements,
});
}
}
ActHandler depends on the LLMClient interface, not concrete implementations. It doesn't know if it's OpenAI, Google, or a custom gateway. This is programming to interfaces, not implementations - a fundamental principle that enables flexibility and extensibility.
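One practical payoff: because handlers only see the interface, tests can inject a stub. The FakeLLMClient below is invented here (it's not part of Stagehand), and its constructor mirrors the custom-client example later in this article:

// Hypothetical test stub: satisfies the LLMClient contract without calling
// any provider, so handler logic can be exercised offline.
class FakeLLMClient extends LLMClient {
  constructor() {
    super("fake-model"); // assumes the base constructor takes a model name, as in the example below
  }

  async createChatCompletion<T = LLMResponse>(): Promise<T> {
    return {
      id: "fake-response-1",
      model: this.modelName,
      choices: [
        {
          message: { role: "assistant", content: '{"method":"click","selector":"#submit-btn"}' },
          finish_reason: "stop",
        },
      ],
      usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 },
    } as T;
  }
}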
Extensibility: Building Your Own Adapter
This architecture makes Stagehand extensible for real-world enterprise needs. Companies typically can't use LLM providers directly - they need to integrate with their organization's centralized LLM gateway. These gateways provide access control, budget tracking, compliance logging, and rate limiting. By implementing the LLMClient interface, you can connect Stagehand to your company's gateway, and the library works with your implementation exactly like it works with built-in providers.
Here's a simplified example:
import { Stagehand, LLMClient, LLMResponse } from "@browserbasehq/stagehand";
// Your custom adapter
export class CompanyGatewayClient extends LLMClient {
constructor(config) {
super("custom-model");
}
  async createChatCompletion({ options }): Promise<LLMResponse> {
    // Call your gateway
    const result = await yourGateway.chat(options.messages);

    // Return OpenAI-compatible format
    return {
      id: result.id,
      model: this.modelName,
      choices: [
        { message: { role: "assistant", content: result.text }, finish_reason: "stop" },
      ],
      usage: {
        prompt_tokens: result.tokens,
        completion_tokens: result.completionTokens,
        total_tokens: result.tokens + result.completionTokens,
      },
    };
  }
}
// Use it
const stagehand = new Stagehand({
llmClient: new CompanyGatewayClient({ url: "..." })
});
That's it. Implement one method, return the expected format, and you've extended Stagehand to work with your custom infrastructure. This is the power of interface-based design - the library becomes a platform that adapts to your needs, not the other way around.
Conclusion
Stagehand's architecture shows how deep knowledge of software development patterns solves modern AI integration challenges.
The combination of Strategy, Adapter, Factory, and Dependency Injection patterns creates a flexible, extensible system where:
- Interfaces enable flexibility by depending on abstractions, not implementations
- Adapters unify diverse provider APIs behind a common format
- Factories centralize creation logic in one maintainable place
- Dependency injection allows customization without modifying source code
These aren't AI-specific patterns - they're foundational software engineering principles. Understanding polymorphism, interfaces, and design patterns makes you better at building any system.
Practical value:
- Integrate any LLM provider with minimal code
- Use company gateways for compliance and control
- Keep code maintainable and testable
- Future-proof your automation for providers that don't exist yet