AI and human facing each other, with matrix-style floating lines connecting them. The lines transition from digital codes near the AI to written words near the human, illustrating real-time communication between them.

Stream Real-time Feedback with ChatGPT: SSE via Fetch in Node.js

When it comes to optimizing user experience, real-time feedback is crucial. Nobody enjoys watching a spinner for what feels like an eternity while waiting for a response.

Introducing: Server-Sent Events (SSE) via the ChatGPT API's stream parameter.

Animated GIF of a terminal window. The user inputs the command `$ node ask-chatgpt.js "Write an Elevenie about nodejs."` After a brief pause, the output unfolds word by word in real-time, eventually revealing an Elevenie poem about NodeJS: 'NodeJS, Software environment, Running JavaScript, Out of browsers, Evolution.'.

However, implementing this isn't straightforward: the most common way to consume SSE is via GET requests using the EventSource API, while the ChatGPT API expects a POST request. And using SSE with POST requests can be a tad tricky in Node.js.

Note: The ChatGPT API should not be called directly from the browser, as that would expose your API key. Instead, use a Node.js server that tunnels your messages to the ChatGPT API and enriches them with the secret API key.
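To illustrate that tunneling, here's a hedged sketch of a server-side helper that assembles the upstream request and attaches the key. The function name, model, and defaults are made up for this example; any HTTP framework could expose it behind a route:

```javascript
// Hypothetical server-side helper: builds the fetch options for the
// ChatGPT API call. The secret API key is attached here, on the server,
// so it never reaches the browser.
function buildChatGptRequest(message, apiKey = process.env.OPENAI_API_KEY) {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo", // assumed model for this sketch
      stream: true,
      messages: [{ role: "user", content: message }],
    }),
  };
}
```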

Table of Contents

  1. Setting Up the Call to the ChatGPT API
  2. Receiving the Data as a Stream
  3. Processing the Streamed Data
  4. Wrapping Up (Full Code)
  5. Bonus: Async Generators for ChatGPT Streaming

Setting Up the Call to the ChatGPT API

Before we start, make sure you're familiar with the ChatGPT API documentation, especially regarding the stream property. Setting the stream property to true allows partial message deltas to be sent as they become available. When the stream concludes, you'll see a data: [DONE] message.

The response JSON documentation also provides in-depth details on the response properties.

Here's our Node.js implementation:

const chatGptApiUrl = "https://api.openai.com/v1/chat/completions";
const response = await fetch(chatGptApiUrl, {
  method: "POST",
  headers: {
    Authorization: `Bearer <YOUR_API_KEY>`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo-0613",
    stream: true,
    messages: [{ role: "user", content: "Hello AI, I am a human." }],
  }),
});

The code above sets up a POST request to the ChatGPT API with the stream property set to true. This allows the response to be streamed back.

Receiving the Data as a Stream

Let's process this data as it arrives.

const reader = response.body.getReader();
const decoder = new TextDecoder("utf-8");

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const stringDecodedData = decoder.decode(value);
  // ... process stringDecodedData (see next section)
}

For a deeper dive into how to stream server-sent events via native fetch, you can check my previous article on SSE via native fetch.

Processing the Streamed Data

Now, for the fun part:

while (true) {
  // ... stream data via getReader, as shown above

  // Each network chunk can contain several SSE messages, prefixed "data: "
  const dataObjects = stringDecodedData.split("data: ").slice(1);

  for (const dataString of dataObjects) {
    if (dataString.trim() === "[DONE]") break;

    const json = JSON.parse(dataString);
    const responsePart = json.choices[0];

    if (responsePart.finish_reason) {
      process.stdout.write("\n");
    } else {
      // the very first delta carries only the role, so content may be absent
      process.stdout.write(responsePart.delta.content ?? "");
    }
  }
}

In this section, the incoming data is split on "data: " to separate the streamed chunks. Each chunk is then parsed, and unless the stream has finished, its content is printed to the terminal, giving the user real-time feedback.

Understanding the "data: " Split

Server-Sent Events (SSE) use a specific format to send messages. In this format, each message from the server begins with the keyword data:. The content that follows this keyword is the actual message data. So when we receive streamed content, it might look something like this:

data: { "some": "message" }
data: { "another": "message" }

Given this structure, splitting by data: is a convenient way to break up the incoming stream into individual messages or chunks of data that the server is sending.

Since the first element of the split will always be empty (the content starts with data:), we can safely discard it using .slice(1). The remaining chunks are JSON strings, which we can parse into objects. (In rare cases a JSON message may be split across two network chunks; a robust implementation would buffer incomplete messages, which we skip here for simplicity.)
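To make this concrete, here's a small, self-contained sketch of the split-and-parse step. The raw chunk below is made up, but mirrors the shape the API sends:

```javascript
// A made-up raw chunk in the SSE wire format used by the ChatGPT API
const raw =
  'data: {"choices":[{"delta":{"content":"Hello"}}]}\n\n' +
  'data: {"choices":[{"delta":{"content":" world"}}]}\n\n';

// Split into individual messages; the first element is empty, so drop it
const parts = raw.split("data: ").slice(1);

// Parse each message and pull out the text delta
const contents = parts.map((p) => JSON.parse(p).choices[0].delta.content);

console.log(contents); // → [ 'Hello', ' world' ]
```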

Wrapping Up

Here is the full code to stream data from the ChatGPT API to the terminal in real-time:

const message = process.argv[2];

const chatGptApiUrl = "https://api.openai.com/v1/chat/completions";
const response = await fetch(chatGptApiUrl, {
  method: "POST",
  headers: {
    Authorization: "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4-0613",
    stream: true,
    messages: [{ role: "user", content: message }],
  }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder("utf-8");

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const str = decoder.decode(value);
  const dataObjects = str.split("data: ").slice(1);

  for (const dataString of dataObjects) {
    if (dataString.trim() === "[DONE]") break;

    const json = JSON.parse(dataString);
    const responsePart = json.choices[0];

    if (responsePart.finish_reason) {
      process.stdout.write("\n");
    } else {
      process.stdout.write(responsePart.delta.content ?? "");
    }
  }
}

Save this file as ask-chatgpt.js, replace <YOUR_API_KEY> with your actual API key, and run it with the following command:

node ask-chatgpt.js "Hello AI, I am a human."

Since we use Node.js's native fetch() and top-level await, make sure you're running Node.js version 18.14 or above and that the script is treated as an ES module (e.g. add "type": "module" to your package.json, or use the .mjs extension).

Bonus: Async Generators for ChatGPT Streaming

Here's how to use a reusable async-generator function, streamChatgptApi, to stream from the ChatGPT API to the terminal:

import { streamChatgptApi } from "./streamChatgptApi.js";

const message = "Hello AI, I am a human.";
for await (const responsePart of streamChatgptApi(message)) {
  if (responsePart.finish_reason) {
    console.log("Reason:", responsePart.finish_reason);
  } else {
    process.stdout.write(responsePart.delta.content);
  }
}
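The implementation of streamChatgptApi itself lives in the linked gist, but as a sketch it could look like the following; the model name and the apiKey default are assumptions for this example:

```javascript
// Sketch of an async generator that yields response parts from the
// ChatGPT API as they arrive. In streamChatgptApi.js this function
// would be exported. Model and apiKey default are assumptions.
async function* streamChatgptApi(
  message,
  apiKey = process.env.OPENAI_API_KEY
) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      stream: true,
      messages: [{ role: "user", content: message }],
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder("utf-8");

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    for (const dataString of decoder.decode(value).split("data: ").slice(1)) {
      if (dataString.trim() === "[DONE]") return; // end of stream
      yield JSON.parse(dataString).choices[0];
    }
  }
}
```

Because the generator yields each choices[0] object as it arrives, the caller can decide what to do with deltas and finish reasons, as in the usage example above.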

Full code: GitHub gist.