Streaming responses from LLMs

Explore streaming on Vercel with code samples that work out of the box.

AI providers can be slow when producing responses, but many make their responses available in chunks as they're processed. Streaming enables you to show users those chunks of data as they arrive rather than waiting for the full response, improving the perceived speed of AI-powered apps.
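To illustrate the idea, here's a minimal sketch of how a browser client might read a streamed response chunk by chunk with the Fetch API. The file name and endpoint are illustrative; /api/chat-example is the route built later on this page:
client-example.ts
// Read a streamed response incrementally instead of waiting
// for the full body to arrive
const response = await fetch('/api/chat-example');
const reader = response.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk can be shown to the user as soon as it arrives
  console.log(decoder.decode(value, { stream: true }));
}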

Vercel recommends using Vercel's AI SDK to stream responses from LLMs and AI APIs. It reduces the boilerplate necessary for streaming responses from AI providers.

This example demonstrates a function that sends a message to one of OpenAI's GPT models and streams the response:

  • Install the ai, openai, and zod packages:
    terminal
    pnpm install ai openai zod
  • Add your OpenAI API key to a .env.local file under the name OPENAI_API_KEY, as shown below. See the AI SDK docs for more information on how to do this.
  • Use Node.js 18 or later.
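For example, your .env.local file might contain a single line like this (the value is a placeholder, not a real key):
.env.local
OPENAI_API_KEY=sk-your-key-here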
The following example uses Next.js with the App Router:
app/api/chat-example/route.ts
import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';
 
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
// This method must be named GET
export async function GET() {
  // Make a request to OpenAI's API based on
  // a placeholder prompt
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages: [{ role: 'user', content: 'Say this is a test.' }],
  });
  // Note: avoid iterating over the response here (for example, to log
  // each chunk): the stream can only be consumed once, and it's
  // needed below
  // Convert the response into a friendly text-stream
  const stream = OpenAIStream(response);
  // Respond with the stream
  return new StreamingTextResponse(stream);
}

If you're not using a framework, you must either add "type": "module" to your package.json or change your JavaScript Functions' file extensions from .js to .mjs.
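For reference, a minimal package.json opting into ES modules might look like this:
package.json
{
  "type": "module"
}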

Build your app and visit localhost:3000/api/chat-example. You should see the text "This is a test." in the browser.
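You can also watch the chunks arrive from the command line with curl; the -N flag disables output buffering, so each chunk prints as soon as it's received:
terminal
curl -N http://localhost:3000/api/chat-example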
