You built an AI feature. It works. Your agent returns structured data, your images generate, your transcriptions fire.
Now test it.
Without hitting the API. Without burning credits. Without flaky tests that break because OpenAI had a bad day.
This is where most Laravel developers get stuck. And this is where the AI SDK quietly solves a problem nobody’s talking about.
The old way
Before the SDK, testing AI features meant faking HTTP responses:
Http::fake([
    'api.openai.com/*' => Http::response([
        'id' => 'chatcmpl-abc123',
        'object' => 'chat.completion',
        'choices' => [[
            'index' => 0,
            'message' => [
                'role' => 'assistant',
                'content' => json_encode([
                    'priority' => 'high',
                    'category' => 'billing',
                ]),
            ],
            'finish_reason' => 'stop',
        ]],
        'usage' => [
            'prompt_tokens' => 150,
            'completion_tokens' => 50,
            'total_tokens' => 200,
        ],
    ]),
]);

$result = app(TicketService::class)->classify($ticket);
$data = json_decode($result, true);

$this->assertEquals('high', $data['priority']);

Http::assertSent(function ($request) {
    $body = json_decode($request->body(), true);

    return str_contains($body['messages'][1]['content'] ?? '', 'billing');
});
You’re not testing your business logic. You’re testing OpenAI’s response format. Switch to Anthropic? Rewrite every mock. And if you skip the faking and hit the real API, every test run costs money and fails whenever the provider hiccups.
The SDK’s testing pattern
The Laravel AI SDK ships with fakes that work like every other Laravel fake. Http::fake(), Queue::fake(), Mail::fake() — now Agent::fake(). Same pattern.
TicketClassifier::fake();

$response = $this->post('/api/tickets/classify', [
    'ticket_id' => $ticket->id,
]);

$response->assertOk();

TicketClassifier::assertPrompted(
    fn ($prompt) => $prompt->contains('billing')
);
No HTTP mocks. No fixture files. No API keys in CI. Switch providers without touching tests.
The same fake() and assertion pattern works for every AI operation: agents, images, audio, transcription, embeddings, reranking, files, and vector stores. The official documentation covers the full API for each one.
Here are the real-world scenarios the docs don’t show you.
The setup: a ticket classifier
Every scenario builds on the same feature: a support ticket classifier for a SaaS app. A customer submits a ticket. The agent reads it, assigns a priority, picks a category, and drafts a suggested response.
class TicketClassifier implements Agent, HasStructuredOutput
{
    use Promptable;

    public function instructions(): string
    {
        return <<<'PROMPT'
        You are a support ticket classifier for a SaaS platform.
        Read the ticket content and classify it.

        Assign a priority based on urgency and business impact.
        Pick the most relevant category.
        Draft a short suggested first response for the support team.
        PROMPT;
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'priority' => $schema->enum(['low', 'medium', 'high', 'urgent'])->required(),
            'category' => $schema->enum([
                'billing', 'bug', 'feature-request', 'account', 'general',
            ])->required(),
            'suggested_response' => $schema->string()->required(),
        ];
    }
}
The controller:
class TicketController extends Controller
{
    public function classify(Request $request)
    {
        $ticket = Ticket::findOrFail($request->input('ticket_id'));

        $result = (new TicketClassifier)->prompt(
            "Classify this support ticket:\n\nSubject: {$ticket->subject}\n\nBody: {$ticket->body}"
        );

        $ticket->update([
            'ai_priority' => $result['priority'],
            'ai_category' => $result['category'],
            'ai_suggested_response' => $result['suggested_response'],
        ]);

        return response()->json($ticket->fresh());
    }
}
That’s the foundation. Most tutorials stop here. Here’s what testing actually looks like when your feature hits production.
Testing that your prompt carries the right context
In production, your prompt isn’t just the ticket body. It includes customer context that changes how the agent classifies. Enterprise customer, 5 open tickets, waiting 48 hours. That’s a different priority than a free-tier user who submitted 10 minutes ago.
Here’s what the controller looks like once you’ve shipped v2:
$ticket = Ticket::findOrFail($request->input('ticket_id'));
$customer = $ticket->customer;

$openCount = $customer->tickets()->where('status', 'open')->count();
$waitingHours = $ticket->created_at->diffInHours(now());

$result = (new TicketClassifier)->prompt(<<<PROMPT
    Classify this support ticket.

    Customer plan: {$customer->plan}
    Open tickets: {$openCount}
    Waiting: {$waitingHours} hours

    Subject: {$ticket->subject}

    {$ticket->body}
    PROMPT);

// ... update ticket and return response
The prompt now includes the customer’s plan, their open ticket count, and how long they’ve been waiting. Here’s the test:
public function test_prompt_includes_customer_context(): void
{
    TicketClassifier::fake();

    $customer = Customer::factory()->create(['plan' => 'enterprise']);
    Ticket::factory()->count(3)->for($customer)->create(['status' => 'open']);

    $ticket = Ticket::factory()->for($customer)->create([
        'subject' => 'Cannot export reports',
        'body' => 'The export button returns a 500 error.',
        'status' => 'open',
        'created_at' => now()->subHours(48),
    ]);

    $this->post('/api/tickets/classify', ['ticket_id' => $ticket->id]);

    TicketClassifier::assertPrompted(function ($prompt) {
        return $prompt->contains('enterprise')
            && $prompt->contains('Open tickets: 4')
            && $prompt->contains('48 hours')
            && $prompt->contains('Cannot export reports');
    });
}
This catches a real class of bugs. Someone refactors the controller, changes the query, accidentally drops the customer plan from the prompt. The AI still responds. It just gives worse classifications because it’s missing context. Without this test, you don’t know until customers complain.
The docs show assertPrompted with a simple string match. In practice, you’re asserting that data from multiple sources made it into the prompt correctly. That’s where the bugs hide.
When two agents work together
Most examples stop at one agent, one call. In production, you chain them. One classifies, another acts on the classification.
Add a ResponseDrafter that writes a customer-facing reply based on how the ticket was classified:
class ResponseDrafter implements Agent
{
    use Promptable;

    public function instructions(): string
    {
        return 'Draft a professional, empathetic customer support response. '
            . 'Match the tone to the ticket priority — urgent tickets need immediate reassurance.';
    }
}
The service that chains them:
class TicketService
{
    public function classifyAndDraft(Ticket $ticket): void
    {
        $classification = (new TicketClassifier)->prompt(
            "Classify this ticket:\n\n{$ticket->subject}\n\n{$ticket->body}"
        );

        $ticket->update([
            'ai_priority' => $classification['priority'],
            'ai_category' => $classification['category'],
        ]);

        $draft = (new ResponseDrafter)->prompt(<<<PROMPT
            Write a response for this {$classification['priority']} priority
            {$classification['category']} ticket:

            {$ticket->subject}

            {$ticket->body}
            PROMPT);

        $ticket->update(['ai_draft_response' => $draft]);
    }
}
The test fakes both agents and verifies data flows between them:
public function test_classifies_then_drafts_response(): void
{
    TicketClassifier::fake([
        json_encode([
            'priority' => 'urgent',
            'category' => 'billing',
            'suggested_response' => '...',
        ]),
    ]);

    ResponseDrafter::fake([
        'We sincerely apologize for the billing issue. I have escalated this to our payments team and they will reach out within the hour.',
    ]);

    $ticket = Ticket::factory()->create([
        'subject' => 'Charged $500 instead of $50',
        'body' => 'My card was overcharged by 10x on the last invoice.',
    ]);

    app(TicketService::class)->classifyAndDraft($ticket);

    TicketClassifier::assertPrompted(
        fn ($prompt) => $prompt->contains('Charged $500')
    );

    ResponseDrafter::assertPrompted(function ($prompt) {
        return $prompt->contains('urgent')
            && $prompt->contains('billing')
            && $prompt->contains('Charged $500');
    });

    $ticket->refresh();
    $this->assertEquals('urgent', $ticket->ai_priority);
    $this->assertNotNull($ticket->ai_draft_response);
}
This catches a specific bug: someone changes the service and forgets to pass the classification to the second agent, or passes the wrong field. The ResponseDrafter still runs, but with wrong context. This test proves the data flows correctly between agents. The docs show how to fake one agent. They don’t show multi-agent coordination.
When the AI call fails
OpenAI goes down. Anthropic rate-limits you. This will happen in production. Your test suite should prove your app handles it.
public function test_handles_provider_failure_gracefully(): void
{
    TicketClassifier::fake(function () {
        throw new \RuntimeException('Provider unavailable');
    });

    $ticket = Ticket::factory()->create([
        'subject' => 'Need help with my account',
        'body' => 'Something is wrong with my subscription.',
    ]);

    $response = $this->post('/api/tickets/classify', [
        'ticket_id' => $ticket->id,
    ]);

    $response->assertStatus(503);
    $response->assertJsonFragment([
        'message' => 'AI classification is temporarily unavailable. Our team has been notified.',
    ]);

    $ticket->refresh();
    $this->assertNull($ticket->ai_priority);
    $this->assertNull($ticket->ai_category);
}
Two things matter here. First, no partial state — the ticket isn’t half-classified with a priority but no category. Second, the user gets a clear error, not a 500 with a stack trace.
This test forces you to write the error handling. Wrap the agent call in a try/catch:
try {
    $result = (new TicketClassifier)->prompt(
        "Classify this ticket:\n\n{$ticket->subject}\n\n{$ticket->body}"
    );
} catch (\RuntimeException $e) {
    report($e);

    return response()->json([
        'message' => 'AI classification is temporarily unavailable. Our team has been notified.',
    ], 503);
}
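The controller writes all three fields in a single update() call, which is why the no-partial-state assertion holds. If your handler writes to more than one model, wrapping the persistence in a database transaction preserves the same guarantee. A sketch, assuming the controller context above; the ClassificationLog model is hypothetical, purely to illustrate a second write:

```php
use Illuminate\Support\Facades\DB;

// Sketch: when classification results span multiple models, a transaction
// keeps the "no partial state" guarantee the failure test asserts.
DB::transaction(function () use ($ticket, $result) {
    $ticket->update([
        'ai_priority' => $result['priority'],
        'ai_category' => $result['category'],
    ]);

    // Hypothetical audit record — illustrative only.
    ClassificationLog::create([
        'ticket_id' => $ticket->id,
        'priority'  => $result['priority'],
    ]);
});
```

If either write fails, both roll back, and the failure test above still passes.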
The docs don’t cover error testing at all. Every production AI feature needs it.
Making sure AI doesn’t run when it shouldn’t
This is the one that saves you money.
Your SaaS has three plans. Free users get manual ticket handling. Pro and Enterprise get AI classification. Your ticket creation endpoint auto-classifies for paid users. But free-tier users should never trigger an AI call.
public function test_free_tier_skips_ai_classification(): void
{
    TicketClassifier::fake()->preventStrayPrompts();

    $user = User::factory()->create(['plan' => 'free']);

    $response = $this->actingAs($user)->post('/api/tickets', [
        'subject' => 'Question about pricing',
        'body' => 'What features are included in the Pro plan?',
    ]);

    $response->assertCreated();

    TicketClassifier::assertNeverPrompted();
}
preventStrayPrompts() is the safety net. If any code path triggers this agent without a matching fake, it throws. Combined with assertNeverPrompted(), you have two layers of protection.
Here’s why this test matters: someone adds a new feature that runs all incoming tickets through the classifier, regardless of plan. Without this test, it passes silently. Your next invoice shows 10x the API calls you expected. With preventStrayPrompts, the test explodes the moment an unexpected AI call happens.
The docs mention preventStrayPrompts() as a method. They don’t show you this is how you guard against accidental API spend.
Testing the agent separately from its tools
Your classifier uses a SearchKnowledgeBase tool to find similar resolved tickets before classifying:
class TicketClassifier implements Agent, HasStructuredOutput, HasTools
{
    use Promptable;

    public function tools(): iterable
    {
        return [new SearchKnowledgeBase];
    }

    // ... instructions and schema same as before
}

class SearchKnowledgeBase implements Tool
{
    public function description(): string
    {
        return 'Search resolved tickets for similar issues.';
    }

    public function handle(Request $request): string
    {
        return Ticket::query()
            ->where('status', 'resolved')
            ->where('subject', 'like', "%{$request['query']}%")
            ->limit(3)
            ->get()
            ->map(fn ($t) => "{$t->subject}: {$t->resolution}")
            ->implode("\n");
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}
When you fake the agent, the tools don’t execute. The fake intercepts the entire call before the agent runs. No database queries, no external calls:
public function test_classifies_with_knowledge_base(): void
{
    TicketClassifier::fake();

    $ticket = Ticket::factory()->create([
        'subject' => 'Dashboard not loading after update',
        'body' => 'Since the latest update, my dashboard shows a blank page.',
    ]);

    $this->post('/api/tickets/classify', ['ticket_id' => $ticket->id]);

    TicketClassifier::assertPrompted(
        fn ($prompt) => $prompt->contains('Dashboard not loading')
    );
}
But you should test the tool itself in isolation:
public function test_knowledge_base_returns_similar_resolved_tickets(): void
{
    Ticket::factory()->create([
        'subject' => 'Dashboard blank after v2.1 update',
        'status' => 'resolved',
        'resolution' => 'Clearing browser cache fixes it.',
    ]);

    $tool = new SearchKnowledgeBase;
    $result = $tool->handle(new Request(['query' => 'Dashboard blank']));

    $this->assertStringContainsString('Clearing browser cache', $result);
}
Agent tests and tool tests are separate concerns. The agent test verifies the right prompt was sent. The tool test verifies the tool returns useful results. You test them independently. Same way you’d test a controller separately from the service it calls. The docs don’t make this distinction.
Production checklist
A few things that tie all of this together.
Fake everything in CI. No API keys in your test environment. Tests are deterministic, fast, and free.
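One way to enforce that globally, sketched below under the assumption that the fake()/preventStrayPrompts() API shown earlier behaves the same from a base test case: fake every agent in tests/TestCase.php, so no test can reach a real provider by accident. Tests that legitimately exercise an agent re-fake it with their own canned responses.

```php
// Sketch: global safety net in tests/TestCase.php. Assumes the SDK's
// fake()/preventStrayPrompts() API shown throughout this post.
abstract class TestCase extends BaseTestCase
{
    protected function setUp(): void
    {
        parent::setUp();

        // Any unexpected prompt to these agents now throws,
        // and no CI run ever needs a provider API key.
        TicketClassifier::fake()->preventStrayPrompts();
        ResponseDrafter::fake()->preventStrayPrompts();
    }
}
```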
Use preventStrayPrompts on every faked agent. It catches AI calls you didn’t expect. In a large app, this is your first line of defense against surprise costs.
Validate structured output server-side too. The provider enforces your schema, but add your own validation:
$result = (new TicketClassifier)->prompt("Classify: {$ticket->body}");

$validated = validator($result->toArray(), [
    'priority' => ['required', 'in:low,medium,high,urgent'],
    'category' => ['required', 'in:billing,bug,feature-request,account,general'],
    'suggested_response' => ['required', 'string', 'max:500'],
])->validate();
Run AI jobs on a separate queue:
php artisan queue:work --queue=ai --timeout=180 --memory=512
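In app code, that means dispatching classification as a queued job pinned to that queue. A sketch: ClassifyTicket is a hypothetical job wrapping the TicketService from earlier, and the property values mirror the worker flags above.

```php
// Sketch: a hypothetical ClassifyTicket job pinned to the 'ai' queue,
// so slow provider calls never block the default queue.
class ClassifyTicket implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $queue = 'ai';
    public $timeout = 180;
    public $tries = 3;

    public function __construct(public Ticket $ticket) {}

    public function handle(): void
    {
        app(TicketService::class)->classifyAndDraft($this->ticket);
    }
}

// From the controller:
ClassifyTicket::dispatch($ticket);
```

Separating the queue also lets you tune retries and timeouts for AI work without touching the settings your other jobs rely on.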
What this actually gives you
The docs give you the methods. This post shows you what to actually test.
Test your prompt construction. Verify data flows between agents. Prove your app handles failure. Catch accidental API spend before it hits your invoice.
Your CI runs clean. No API keys. No flaky tests. Ship it.