
Beyond Functions: Seamlessly build AI enhanced APIs with OpenAI

Jens Neuse


Today we're announcing the WunderGraph OpenAI integration / Agent SDK to simplify the creation of AI enhanced APIs and AI Agents for Systems Integration on Autopilot. At a high level, this integration enables two things:

  1. Build AI enhanced APIs with OpenAI that return structured data (JSON) instead of plain text
  2. Build AI Agents that can perform complex tasks leveraging your existing REST, GraphQL and SOAP APIs, as well as your databases and other systems

Examples

Before we dive deep into the problem and technical details, let's have a look at two examples.

Example 1: AI Agent creation with OpenAI

Here's a simple example that shows how we can use OpenAI to create an Agent that can call multiple APIs and return structured data (JSON) conforming to our defined API schema.

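Here's a minimal sketch of what such an Operation can look like. The helper names (`openAI.createAgent`, `agent.execWithPrompt`) and the example data sources (a countries API and a weather API) are illustrative; `openAI.parseUserInput` and `out.structuredOutput` are explained in detail later in this post.

```typescript
// .wundergraph/operations/openai/weather.ts
// A sketch of an AI enhanced Operation; function names like
// CountryByCode and weather/GetCityByName are illustrative.
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
	input: z.object({
		country: z.string(),
	}),
	description: 'Returns the weather for the capital of the given country',
	handler: async ({ input, openAI }) => {
		// validate untrusted user input before handing it to the Agent
		const parsed = await openAI.parseUserInput({
			userInput: input.country,
			schema: z.object({
				country: z.string().nonempty(),
			}),
		});
		const agent = openAI.createAgent({
			// two Operations from our API graph, passed as functions
			functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
			// the structured output we expect, defined with zod
			structuredOutputSchema: z.object({
				city: z.string(),
				country: z.string(),
				temperature: z.number(),
			}),
		});
		const out = await agent.execWithPrompt({
			prompt: `What's the weather like in the capital of ${parsed.country}?`,
		});
		// typed according to the zod schema above
		return out.structuredOutput;
	},
});
```

Any client can now call this Operation like a regular JSON API; the response is validated against the zod schema.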

Example 2: OpenAI enhanced API

How about extracting metadata from a website and exposing that functionality as a JSON API? Sounds simple enough, right?

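Here's a sketch of the idea; as in Example 1, the helper names and the web/load_url Operation are illustrative assumptions:

```typescript
// .wundergraph/operations/openai/extract_website_metadata.ts
// A sketch of an Agent that is composed of other Agents.
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
	input: z.object({
		url: z.string(),
	}),
	description: 'Extracts metadata from a website and summarizes its content',
	handler: async ({ input, openAI }) => {
		// validate the URL before passing it to the Agent
		const parsed = await openAI.parseUserInput({
			userInput: input.url,
			schema: z.object({
				url: z.string().url(),
			}),
		});
		const agent = openAI.createAgent({
			functions: [
				{ name: 'web/load_url' },
				// this Operation is itself an Agent under the hood
				{ name: 'openai/summarize_url_content' },
			],
			structuredOutputSchema: z.object({
				title: z.string(),
				description: z.string(),
				summary: z.string(),
			}),
		});
		const out = await agent.execWithPrompt({
			prompt: `Load the website ${parsed.url}, extract its title and description, and summarize its content.`,
		});
		return out.structuredOutput;
	},
});
```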

The second example is a bit more complex, but it shows how you can describe more complex tasks with a prompt and have the AI Agent execute them for you. Additionally, we're passing an Operation as a function to the Agent that is itself another Agent under the hood, meaning that this API is actually composed of multiple Agents.

With these two examples, you should get a good idea of what's possible with the WunderGraph OpenAI integration. Let's now rewind a bit and talk about the problems we're trying to solve here.

The Problem: Building AI enhanced APIs and Agents is challenging

When trying to build AI enhanced APIs and Agents, you'll quickly realize that there are a number of challenges to overcome. Let's quickly define what we mean by AI enhanced APIs and Agents and then talk about those challenges.

What are AI enhanced APIs?

An AI enhanced API is an API that accepts input in a predefined format and returns structured data (e.g. JSON), allowing it to be described using a schema (e.g. OpenAPI, GraphQL, etc.). Tools like ChatGPT are fun to play with, but they're not very useful when you want to build APIs that can be consumed by other systems. So, the bare minimum for an AI enhanced API is that we can describe it using a schema. In our case, we're using JSON Schema, which plays nicely with OpenAPI and OpenAI, as you'll see later.
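For illustration, the response of a hypothetical "summarize" API could be described with a JSON Schema as small as this (the field names are made up for the example):

```json
{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "language": { "type": "string" }
  },
  "required": ["summary"]
}
```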

What are AI Agents?

An AI Agent is a dialog between a large language model (e.g. GPT-3) and a computer program (e.g. a WunderGraph Operation), with the goal of performing a task. The dialog is initiated by a prompt (e.g. a question or a task description). We can provide additional functionality to the Agent by passing functions to it, which we have to describe using a schema as well. Once the dialog is initiated, the Agent can come back to us, asking to execute one of the functions we've provided, and it will supply input that follows the schema we've defined. We execute the function, add the result to the dialog, and the Agent continues until the task is done. Finally, the Agent returns the result to us, ideally in a format that we can describe using a schema.
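To make this dialog concrete, here's a minimal sketch of that loop using the plain OpenAI Node SDK (v4-style chat completions with function calling). The get_capital function is a made-up stand-in, and this is roughly the plumbing that an Agent SDK automates for you:

```typescript
// A minimal sketch of the Agent dialog loop with the OpenAI Node SDK.
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// a function we expose to the model, described with JSON Schema
const functions = [
	{
		name: 'get_capital',
		description: 'Returns the capital city of a country',
		parameters: {
			type: 'object',
			properties: { country: { type: 'string' } },
			required: ['country'],
		},
	},
];

async function getCapital(country: string) {
	// stand-in for a real API call
	return { capital: country === 'Germany' ? 'Berlin' : 'unknown' };
}

async function runDialog(prompt: string) {
	const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [{ role: 'user', content: prompt }];
	for (;;) {
		const res = await client.chat.completions.create({
			model: 'gpt-3.5-turbo',
			messages,
			functions,
		});
		const msg = res.choices[0].message;
		if (!msg.function_call) {
			return msg.content; // the Agent is done, no more function calls
		}
		// the Agent asks us to execute a function, with schema-conforming input
		const args = JSON.parse(msg.function_call.arguments);
		const result = await getCapital(args.country);
		// add the function result to the dialog and continue
		messages.push(msg, {
			role: 'function',
			name: msg.function_call.name,
			content: JSON.stringify(result),
		});
	}
}
```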

Challenges

1. LLMs don't usually return structured data, but plain text

If you've used ChatGPT before, you'll know that it's fun to play with if a powerful enough "Agent" sits in front of it, like a human (you). But what if you want to build an API that can be consumed by other systems? How are services supposed to consume plain text without any structure?

2. Prompt Injection: We cannot trust user input

When building an API, we usually have to deal with user input. We can ask the user to provide a country name as the input to our API, but what if the user provides a prompt instead of a country name that is designed to trick the AI? This is called prompt injection and it's a real problem when building AI enhanced APIs.

3. Pagination & Batching: LLMs can only process a limited number of tokens at once

LLMs are powerful, but not infinitely so. They can only process a limited number of tokens at once. This means that we have to paginate the input, process it in chunks, and then merge the results back together, all in a structured way so that we can parse the result later.
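Conceptually, chunk-and-merge looks like this (a generic sketch, not tied to any particular SDK; the summarize callback stands in for an LLM call):

```typescript
// Split a large document into pages that fit the model's token budget,
// process each page individually, then merge the partial results.
async function summarizeLarge(
	content: string,
	summarize: (text: string) => Promise<string>
): Promise<string> {
	const pageSize = 1024 * 15; // ~15 KB per page
	const pages: string[] = [];
	for (let i = 0; i < content.length; i += pageSize) {
		pages.push(content.slice(i, i + pageSize));
	}
	const partials = await Promise.all(pages.map(summarize));
	// merge step: summarize the concatenation of the partial summaries
	return summarize(partials.join('\n'));
}
```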

4. Composing Agents: We need to be able to compose Agents

You will usually start building lower level Agents that perform a specific task, like loading the content of a website or summarizing the content of a website. Once you have these Agents, you want to be able to compose them to build more powerful higher-level Agents. How can we make it easy to compose AI Agents?

5. LLMs like OpenAI's GPT models cannot call external APIs and databases directly

OpenAI allows you to describe functions that can be called by the Agent. The challenge is that you have to describe these functions using plain JSON Schema. This means that you cannot directly call REST, GraphQL or SOAP APIs, or even databases. You have to describe the function using JSON Schema and then implement a mechanism that calls APIs and databases on behalf of the Agent.

LLMs can generate GraphQL Operations or even SQL statements, but keep in mind that these need to be validated and sanitized before they can be executed.

In addition, requiring an LLM to manually generate GraphQL Operations, REST API calls or SQL statements comes with another problem: you have to describe the GraphQL schema, REST API or database schema, and all of this input counts towards the token limit of the LLM. This means that if you provide a GraphQL schema with 16k tokens to a 16k-limited LLM, there's no space left for the actual prompt. Wouldn't it be nice if we could describe just a few "Operations" that are useful to a specific Agent?

Yes, absolutely! But then there's another problem: How can we describe Operations in a unified way that is compatible with OpenAI but works across different APIs like REST, SOAP, GraphQL and databases?

The Solution: The WunderGraph OpenAI Integration / Agent SDK

Let's now talk about the solution to these problems using the WunderGraph OpenAI integration.

If you're not yet familiar with WunderGraph, it's an Open Source API Integration / BFF (Backend for Frontend) / Programmable API Gateway toolkit. At the core of WunderGraph is the concept of "API Dependency Management / API Composition".

WunderGraph allows you to describe a set of heterogeneous APIs (REST, GraphQL, SOAP, Databases, etc.) using a single schema. From this description, WunderGraph will generate a unified API that you can define "Operations" for.

Operations are the core building blocks for exposing functionality on top of your APIs. An Operation is essentially a function that can be called by a client. Both the input and the output of an Operation are described using JSON Schema. All Operations exposed by a WunderGraph application are described using an OpenAPI Specification (OAS) document or a Postman Collection, so it's easy to consume them from any programming language.
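For example, assuming the default local dev server on port 9991 and an Operation named openai/summarize_url_content, calling it is just an HTTP request (the URL layout shown here is the default convention; adjust it to your setup):

```typescript
// Calling a WunderGraph Operation as a plain JSON API endpoint.
const res = await fetch(
	'http://localhost:9991/operations/openai/summarize_url_content?' +
		new URLSearchParams({ url: 'https://wundergraph.com' })
);
const data = await res.json(); // shape follows the Operation's JSON Schema
```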

Having the "Operations" abstraction on top of your API Dependency Graph allowed us to keep the Agent as simple as it is. All you need to do is add your API dependencies, define a couple of Operations that are useful to your Agent, and pass them along with a prompt to the Agent.

It doesn't matter if an Operation is backed by REST, GraphQL, SOAP, a database or just another TypeScript function: they all look the same to the Agent and follow the same semantics.

Let's now talk about the challenges we've mentioned earlier and how the WunderGraph OpenAI integration solves them.

How the WunderGraph Agent SDK helps you to return structured data from OpenAI

By default, OpenAI will return plain text. So, when OpenAI is done processing our prompt, we'll get back a string of text. How can we turn this into structured data?

Let's recall the Agent definition from earlier:

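Here's the relevant excerpt of the sketch from Example 1 again:

```typescript
// two functions plus a zod schema describing the structured output
const agent = openAI.createAgent({
	functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
	structuredOutputSchema: z.object({
		city: z.string(),
		country: z.string(),
		temperature: z.number(),
	}),
});
const out = await agent.execWithPrompt({
	prompt: `What's the weather like in the capital of ${parsed.country}?`,
});
return out.structuredOutput;
```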

We pass two functions to the Agent and define a schema that describes the output we expect from the Agent using the zod library. Internally, we compile the schema to JSON Schema. Once the Agent is done, we create a new "dialog" asking the Agent to call our "out" function and pass the result to it. To describe the input we expect to receive from the Agent, we use the generated JSON Schema. This prompts the Agent to call our "out" function and pass the result to it in a structured way that we can parse. We can then use the zod library to parse the result and raise an error if it doesn't match the schema we've defined.

As WunderGraph Operations are written in TypeScript, we can infer the TypeScript types from the zod schema description, which means that the result of "out" is typed automatically. More importantly, we also use the TypeScript compiler to infer the response types of Operations in general. So if you return out.structuredOutput from an Operation, another Operation can call it in a type-safe way, or even use it as a function for another Agent.

How the WunderGraph Agent SDK helps you to prevent prompt injection

Let's recall another example from earlier:

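Here's the input-handling excerpt of the Example 1 sketch again:

```typescript
// untrusted user input is parsed and validated before the Agent sees it
const parsed = await openAI.parseUserInput({
	userInput: input.country,
	schema: z.object({
		country: z.string().nonempty(),
	}),
});
// from here on, parsed.country is a validated country name
```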

If we passed the user input directly to our Agent, we would be vulnerable to prompt injection. This means that a malicious user could craft a prompt that causes the Agent to execute arbitrary code.

To prevent this, we first run the user input through the openAI.parseUserInput function. It parses the input into our desired schema and validates it. Furthermore, it checks for prompt injection attacks and throws an error if it detects one.

How the WunderGraph Agent SDK helps you to process large amounts of data

Let's say you'd like to summarize the content of a website. Websites can be of arbitrary length, so we cannot just pass the content of the website to the Agent, because LLMs like GPT have a token limit. Instead, we can split the content into pages, process each page individually, and then combine the results.

Here's an abbreviated example of how you can apply pagination to your Agent:

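In a sketch, that might look like this; the pagination option on the function definition is illustrative, while the page size and count match the behavior described below:

```typescript
// abbreviated sketch: enabling pagination on a function passed to the Agent
const agent = openAI.createAgent({
	functions: [
		{
			name: 'web/load_url',
			// split the function's result into chunks the LLM can handle
			pagination: {
				pageSize: 1024 * 15, // 15 KB per page
				maxPages: 3,
			},
		},
	],
	structuredOutputSchema: z.object({
		summary: z.string(),
	}),
});
const out = await agent.execWithPrompt({
	prompt: `Summarize the content of ${parsed.url}`,
});
```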

In this case, we're dividing the website content into 3 pages, each 15 KB in size. The Agent processes each page individually and then combines the results.

How the WunderGraph Agent SDK helps you to compose multiple Agents

If you recall the second example, we were passing a function named openai/summarize_url_content to our Agent. This Operation contains the logic to summarize the content of a website, and it uses an Agent itself under the hood.

In the prompt to our metadata extraction Agent, we ask it to summarize the content of the website, so our Agent will use the openai/summarize_url_content function to do so.

Since you can wrap an Agent in an Operation, you can easily compose multiple Agents.

The recommended way to do so is to start creating low-level Agents that are capable of doing a single thing. You can then compose these low-level Agents into higher-level Agents that perform two or more tasks, and so on.
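In code, composition is just passing Agent-backed Operations as functions to another Agent (a sketch reusing the illustrative names from the earlier examples):

```typescript
// a higher-level Agent composed from lower-level Agent Operations
const agent = openAI.createAgent({
	functions: [
		{ name: 'web/load_url' },
		{ name: 'openai/summarize_url_content' }, // itself an Agent
	],
	structuredOutputSchema: z.object({
		title: z.string(),
		summary: z.string(),
	}),
});
```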

How the WunderGraph Agent SDK helps you to integrate OpenAI with your existing APIs like REST, GraphQL, SOAP or Databases

As explained earlier, WunderGraph Operations are an abstraction on top of your API Dependency Graph, allowing you to integrate any API into an AI Agent.

You can provide Operations to the Agent in two ways: either as a GraphQL Operation against your API graph, or as a custom TypeScript Operation, which might contain custom business logic, call other APIs, or even call other Agents. Most importantly, we need a way to describe the input and functionality of an Operation to the LLM Agent.

All of this is abstracted away by the WunderGraph Agent SDK and works out of the box. All you need to do is add a description to your Operation and the Agent SDK will take care of the rest.

Here's an example using a GraphQL Operation:

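A sketch of what such an Operation can look like; the countries API and its fields stand in for whatever is in your API graph, and the comment on the variable becomes its description:

```graphql
# .wundergraph/operations/CountryByCode.graphql
query (
    # the two-letter code of the country, e.g. DE for Germany
    $code: ID!
) {
    country: countries_country(code: $code) {
        name
        capital
        currency
    }
}
```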

The Agent SDK will automatically parse the GraphQL Operation and generate a JSON Schema for the input including the description.

Here's an example using a custom TypeScript Operation:

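And a sketch of the TypeScript variant (the helper names are again illustrative):

```typescript
// .wundergraph/operations/openai/summarize_url_content.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
	input: z.object({
		url: z.string(),
	}),
	// this description is what the LLM Agent sees for this Operation
	description: 'Summarize the content of a URL',
	handler: async ({ input, openAI }) => {
		const agent = openAI.createAgent({
			functions: [{ name: 'web/load_url' }],
			structuredOutputSchema: z.object({
				summary: z.string(),
			}),
		});
		const out = await agent.execWithPrompt({
			prompt: `Summarize the content of ${input.url}`,
		});
		return out.structuredOutput;
	},
});
```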

Again, the Agent SDK will parse the TypeScript Operation as well and generate a JSON Schema from the zod schema, adding the description (Summarize the content of a URL) so that the LLM Agent understands what the Operation is doing.

Getting started with the WunderGraph Agent SDK

If you need more info on how to get started with WunderGraph and OpenAI, check out the OpenAI Integration Docs.

PS: Make sure you're not leaking your API key in your GitHub repo!

Conclusion

In this article, we've learned how to use the WunderGraph Agent SDK to build AI Agents on top of your existing APIs and return structured data from OpenAI. We've tackled some of the most common problems when building AI Agents, like prompt injection, pagination, and Agent composition.

If you like the work we're doing and want to support us, give us a star on GitHub.

I'd love to hear your thoughts on this topic, so feel free to reach out to me on Twitter or join our Discord server to chat about it.