Introducing Structured Outputs in the API (2024-08-06)

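OpenAI has introduced Structured Outputs in its API, which makes model outputs conform to a JSON Schema supplied by the developer. The feature improves output reliability and is especially useful for converting unstructured input into structured data. It is available through function calling and through a new option for the response_format parameter, with native support in the Python and Node SDKs. The update improves the accuracy and usefulness of AI in applications that require strict data formats.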

<aside> 💡 Original article: https://openai.com/index/introducing-structured-outputs-in-the-api/

</aside>

Archived copy of the original article (2024-08-07)

We are introducing Structured Outputs in the API: model outputs now reliably adhere to developer-supplied JSON Schemas.

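Last year at DevDay, we introduced JSON mode, a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model's response will conform to a particular schema. Today we are introducing Structured Outputs in the API, a new feature designed to ensure that model-generated outputs exactly match the JSON Schemas provided by developers.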

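Generating structured data from unstructured inputs is one of the core use cases for AI in today's applications. Developers use the OpenAI API to build powerful assistants that can fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long worked around the limitations of LLMs in this area with open-source tooling, prompting, and repeated retries, all to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.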
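The validate-and-retry workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not anything from the original article: `fake_model`, `REQUIRED_KEYS`, and the sample replies are all hypothetical stand-ins for a real model call. It also shows why valid JSON alone is not enough, since the second reply parses cleanly yet has the wrong keys.

```python
import json

REQUIRED_KEYS = {"name", "date"}  # hypothetical fields the application needs


def fake_model(attempt):
    """Stand-in for a model call (hypothetical); replies improve on each attempt."""
    replies = [
        "Sure! Here is the event: name=Launch",       # not JSON at all
        '{"title": "Launch", "when": "2024-08-06"}',  # valid JSON, wrong keys
        '{"name": "Launch", "date": "2024-08-06"}',   # valid JSON, expected keys
    ]
    return replies[min(attempt, len(replies) - 1)]


def parse_with_retries(max_attempts=3):
    """Call the model until its reply parses as JSON and has the required keys."""
    for attempt in range(max_attempts):
        reply = fake_model(attempt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if isinstance(data, dict) and REQUIRED_KEYS.issubset(data):
            return data, attempt + 1
        # Well-formed JSON with the wrong shape: fall through and try again
    raise ValueError("no reply matched the expected shape")


result, attempts = parse_with_retries()
print(result, attempts)
```

With a guarantee that the output matches the schema, the retry loop and the shape check become unnecessary for schema conformance, although application-level validation of the values may still be worthwhile.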

On our evals of complex JSON schema following, our new model gpt-4o-2024-08-06 with Structured Outputs scores a perfect 100%. In comparison, gpt-4-0613 scores less than 40%.

With Structured Outputs, gpt-4o-2024-08-06 achieves 100% reliability in our evals, perfectly matching the output schemas.

How to use Structured Outputs

We are introducing Structured Outputs in the API in two forms:

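1. Function calling: Structured Outputs via tools is available by setting strict: true within your function definition. This feature works with all models that support tools, including all models from gpt-4-0613 and gpt-3.5-turbo-0613 onward. When Structured Outputs is enabled, model outputs will match the supplied tool definition.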

Request

POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": {
              "type": "string",
              "enum": ["orders"]
            },
            "columns": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "id",
                  "status",
                  "expected_delivery_date",
                  "delivered_at",
                  "shipped_at",
                  "ordered_at",
                  "canceled_at"
                ]
              }
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "column": {
                    "type": "string"
                  },
                  "operator": {
                    "type": "string",
                    "enum": ["=", ">", "<", ">=", "<=", "!="]
                  },
                  "value": {
                    "anyOf": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "number"
                      },
                      {
                        "type": "object",
                        "properties": {
                          "column_name": {
                            "type": "string"
                          }
                        },
                        "required": ["column_name"],
                        "additionalProperties": false
                      }
                    ]
                  }
                },
                "required": ["column", "operator", "value"],
                "additionalProperties": false
              }
            },
            "order_by": {
              "type": "string",
              "enum": ["asc", "desc"]
            }
          },
          "required": ["table_name", "columns", "conditions", "order_by"],
          "additionalProperties": false
        }
      }
    }
  ]
}

Output JSON

{
  "table_name": "orders",
  "columns": ["id", "status", "expected_delivery_date", "delivered_at"],
  "conditions": [
    {
      "column": "status",
      "operator": "=",
      "value": "fulfilled"
    },
    {
      "column": "ordered_at",
      "operator": ">=",
      "value": "2023-05-01"
    },
    {
      "column": "ordered_at",
      "operator": "<",
      "value": "2023-06-01"
    },
    {
      "column": "delivered_at",
      "operator": ">",
      "value": {
        "column_name": "expected_delivery_date"
      }
    }
  ],
  "order_by": "asc"
}
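To illustrate why a guaranteed shape matters downstream, here is a rough sketch of how an application might turn arguments like the ones above into a parameterized SQL string. The `build_query` helper and the allowlists are assumptions made for this example and are not part of the original article or the OpenAI SDK. Because the schema leaves the `column` field of a condition as a free string, the sketch still checks it against an allowlist. The `order_by` field is left out because the schema gives only a direction, not a column to sort on.

```python
import json

# Allowlists mirroring the enums in the tool definition (assumed for this sketch).
ALLOWED_TABLES = {"orders"}
ALLOWED_COLUMNS = {"id", "status", "expected_delivery_date", "delivered_at",
                   "shipped_at", "ordered_at", "canceled_at"}
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<=", "!="}

# Tool-call arguments in the shape shown in the output above.
arguments = json.loads("""
{
  "table_name": "orders",
  "columns": ["id", "status", "expected_delivery_date", "delivered_at"],
  "conditions": [
    {"column": "status", "operator": "=", "value": "fulfilled"},
    {"column": "ordered_at", "operator": ">=", "value": "2023-05-01"},
    {"column": "ordered_at", "operator": "<", "value": "2023-06-01"},
    {"column": "delivered_at", "operator": ">",
     "value": {"column_name": "expected_delivery_date"}}
  ],
  "order_by": "asc"
}
""")


def checked(name, allowed):
    """Return name unchanged if it is in the allowlist, otherwise raise."""
    if name not in allowed:
        raise ValueError(f"not allowed: {name}")
    return name


def build_query(args):
    """Render the arguments as a parameterized SQL string plus its parameters."""
    where, params = [], []
    for cond in args["conditions"]:
        left = checked(cond["column"], ALLOWED_COLUMNS)
        op = checked(cond["operator"], ALLOWED_OPERATORS)
        value = cond["value"]
        if isinstance(value, dict):
            # Column-to-column comparison: the value names another column.
            right = checked(value["column_name"], ALLOWED_COLUMNS)
        else:
            # Literal comparison: bind the value as a query parameter.
            right = "?"
            params.append(value)
        where.append(f"{left} {op} {right}")
    columns = ", ".join(checked(c, ALLOWED_COLUMNS) for c in args["columns"])
    sql = f"SELECT {columns} FROM {checked(args['table_name'], ALLOWED_TABLES)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


sql, params = build_query(arguments)
print(sql)
print(params)
```

The code can index into `args` without defensive checks for missing keys only because every property is listed under `required` and `additionalProperties` is false in the tool definition.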

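2. A new option for the response_format parameter: developers can now supply a JSON Schema via json_schema, a new option for the response_format parameter. This is useful when the model is not calling a tool but is instead responding to the user in a structured way. This feature works with our newest GPT-4o models: gpt-4o-2024-08-06, released today, and gpt-4o-mini-2024-07-18. When a response_format is supplied with strict: true, model outputs will match the supplied schema.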

Request