LangChain Output Parsers

2023-11-17

Output parsers convert the text produced by an LLM into structured data.

CommaSeparatedListOutputParser

Parses the output into a List. Its format instructions (the prompt text) are:

def get_format_instructions(self) -> str:
    return (
        "Your response should be a list of comma separated values, "
        "eg: `foo, bar, baz`"
    )

The parse method is:

def parse(self, text: str) -> List[str]:
    """Parse the output of an LLM call."""
    return text.strip().split(", ")
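
To see how this behaves on real replies, here is a minimal standalone sketch that mirrors the one-line parse method above without importing LangChain. The function name `parse_comma_separated` is made up for illustration. It splits on the exact two-character separator `", "`, so a reply that uses bare commas comes back as a single item.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Mirror of the parse method above: strip, then split on ", "."""
    return text.strip().split(", ")


# Well-formed reply: each comma is followed by a space.
print(parse_comma_separated("foo, bar, baz\n"))

# No space after the commas: the separator ", " never matches,
# so the whole reply is returned as one list element.
print(parse_comma_separated("foo,bar,baz"))
```

This is one reason the format instructions spell out an example with a space after each comma.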

DatetimeOutputParser

Parses the output into a datetime.

from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser()
print(output_parser.get_format_instructions())

The code above prints these format instructions:

Write a datetime string that matches the 
            following pattern: "%Y-%m-%dT%H:%M:%S.%fZ". Examples: 477-06-08T11:31:35.756750Z, 245-12-26T14:36:39.117625Z, 711-05-08T07:41:23.815247Z
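
A reply that follows this pattern can be turned into a datetime with the standard library. The sketch below assumes the pattern shown in the prompt above; `parse_datetime_reply` is a hypothetical helper written for illustration, not LangChain's own code.

```python
from datetime import datetime

# Pattern copied from the format instructions above.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime_reply(text: str) -> datetime:
    """Parse an LLM reply that follows DATETIME_FORMAT (hypothetical helper)."""
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


# Example reply with surrounding whitespace, as models often produce.
reply = " 2023-11-17T08:30:15.123456Z\n"
print(parse_datetime_reply(reply))
```

The trailing `Z` is matched as a literal character here, so the result is a naive datetime without timezone information.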

EnumOutputParser

Parses the output into an enum member. The enum's values must be of type str.

Its format instructions are:

def get_format_instructions(self) -> str:
    return f"Select one of the following options: {', '.join(self._valid_values)}"

Example code:

from enum import Enum

from langchain.output_parsers import EnumOutputParser

class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"

parser = EnumOutputParser(enum=Colors)
print(parser.get_format_instructions())

Output:

Select one of the following options: red, green, blue
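
On the parsing side, the reply has to be mapped back to an enum member. The following is a minimal sketch of one way to do that with the standard library alone; `parse_enum_reply` is a hypothetical name, and the error handling is an assumption rather than a copy of LangChain's implementation.

```python
from enum import Enum
from typing import Type


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_enum_reply(enum_cls: Type[Enum], text: str) -> Enum:
    """Look up the enum member whose value equals the stripped reply."""
    try:
        return enum_cls(text.strip())
    except ValueError as exc:
        # List the valid values so the caller can see what was expected.
        valid = ", ".join(member.value for member in enum_cls)
        raise ValueError(f"Reply {text!r} is not one of: {valid}") from exc


print(parse_enum_reply(Colors, " green\n"))
```

Looking members up by value is why the enum values need to be strings: the reply from the model is always text.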

PydanticOutputParser

Parses the LLM's JSON output into an instance of a Pydantic model.

Usage:

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

# The question to ask
actor_query = "Generate the filmography for a random actor."
# Build a parser whose expected JSON structure follows the Actor model
parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=actor_query)

# Print the fully formatted prompt
print(_input.to_string())

Output:

Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "name of an actor", "title": "Name", "type": "string"}, "film_names": {"description": "list of names of films they starred in", "items": {"type": "string"}, "title": "Film Names", "type": "array"}}, "required": ["name", "film_names"]}
```
Generate the filmography for a random actor.

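Apart from the first and last lines, everything in between is the return value of parser.get_format_instructions().

PydanticOutputParser uses the prompt template below. Only the trailing {schema} placeholder is filled in, using the JSON schema of the given model.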

PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {{"properties": {{"foo": {{"title": "Foo", "description": "a list of strings", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of the schema. The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Here is the output schema:
```
{schema}
```
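
The doubled braces in the template are there because it is filled with Python string formatting: `{{` and `}}` come out as literal braces, and only `{schema}` is substituted. The sketch below shows this with a shortened stand-in template and a hand-written schema; both are assumptions for illustration, since the real schema is generated from the model class.

```python
import json

# Shortened stand-in for the template above; doubled braces are literal braces.
TEMPLATE = (
    'Example object: {{"foo": ["bar", "baz"]}}\n'
    "Here is the output schema:\n"
    "{schema}"
)

# Hand-written schema for illustration only.
schema = {
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# str.format replaces {schema} and collapses {{ }} into single braces.
filled = TEMPLATE.format(schema=json.dumps(schema))
print(filled)
```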

OutputFixingParser

In the JSON example above, if the LLM returns malformed output, the LLM can be called again to correct it. For example:

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)
# The mistake: strings inside JSON must use double quotes, not single quotes.
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

parser.parse(misformatted)

# The call above raises an OutputParserException
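
The underlying reason is that JSON itself only allows double-quoted strings. This can be checked with the standard library's json module, independently of LangChain (whose parser reports the failure through its own exception type):

```python
import json

misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
well_formatted = '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'

# Single-quoted strings are not valid JSON, so decoding fails.
try:
    json.loads(misformatted)
    single_quotes_ok = True
except json.JSONDecodeError as exc:
    single_quotes_ok = False
    print(f"single-quoted input rejected: {exc}")

# The same data with double quotes decodes without problems.
data = json.loads(well_formatted)
print(data["film_names"])
```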

Fixing the error with OutputFixingParser:

from dotenv import load_dotenv

# Load environment variables from the .env file, including OPENAI_API_KEY
load_dotenv()

# Use OutputFixingParser to correct the JSON formatting error
from langchain.output_parsers import OutputFixingParser
from langchain.chat_models import ChatOpenAI
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

new_parser.parse(misformatted)
# The output now parses correctly into an Actor instance

# Output:
# Actor(name='Tom Hanks', film_names=['Forrest Gump'])

How OutputFixingParser works:

With PydanticOutputParser alone, the flow is:

1. Build the prompt
2. Call the LLM to get its output
3. Parse the output into the target class

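With OutputFixingParser, the flow becomes:

1. Build the prompt
2. Call the LLM to get its output
3. Parse the output into the target class, but the parse fails
4. Build a fix prompt that includes the parse error message, call the LLM again, and get a new result
5. Parse the new output into the target class

Steps 4 and 5, triggered by the parse failure in step 3, are the logic that OutputFixingParser adds.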
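
The flow with the extra fix step can be sketched in plain Python. This is only an illustration under stated assumptions: `json.loads` stands in for the real parser, `fake_llm` is a stub that replaces an actual model call, and `build_fix_prompt` and `parse_with_fix` are hypothetical names, not LangChain APIs.

```python
import json
from typing import Callable


def build_fix_prompt(instructions: str, completion: str, error: str) -> str:
    """Step 4: combine the instructions, the bad completion and the error."""
    return (
        f"Instructions:\n{instructions}\n"
        f"Completion:\n{completion}\n"
        f"Error:\n{error}\n"
        "Please try again."
    )


def parse_with_fix(
    completion: str, instructions: str, llm: Callable[[str], str]
) -> dict:
    """Steps 3 to 5: parse, and on failure ask the LLM once for a fix."""
    try:
        return json.loads(completion)
    except json.JSONDecodeError as exc:
        fixed = llm(build_fix_prompt(instructions, completion, repr(exc)))
        # A second failure is not caught: there is only one retry.
        return json.loads(fixed)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: always returns valid JSON."""
    return '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'


bad = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
result = parse_with_fix(bad, "Reply with a JSON object.", fake_llm)
print(result["name"])
```

Passing the error text back to the model gives it a concrete hint about what went wrong, instead of simply asking the same question again.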

The prompt that OutputFixingParser builds in step 4 is:

NAIVE_FIX = """Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:"""

NAIVE_FIX_PROMPT = PromptTemplate.from_template(NAIVE_FIX)

The parsing code:

def parse(self, completion: str) -> T:
    try:
        parsed_completion = self.parser.parse(completion)
    except OutputParserException as e:
        new_completion = self.retry_chain.run(
            instructions=self.parser.get_format_instructions(),
            completion=completion,
            error=repr(e),
        )
        parsed_completion = self.parser.parse(new_completion)

    return parsed_completion

The retry_chain used here is an LLMChain.
