Managing Outputs with Output Parsers

Introduction

While the language models can only generate textual outputs, a predictable data structure is always preferred in a production environment. For example, imagine you are creating a thesaurus application and want to generate a list of possible substitute words based on the context. The LLMs are powerful enough to generate many suggestions easily. Here is a sample output from the ChatGPT for several words with close meaning to the term “behavior.”

Here are some substitute words for "behavior":

Conduct
Manner
Demeanor
Attitude
Disposition
Deportment
Etiquette
Protocol
Performance
Actions

The problem is the lack of a method to extract relevant information from the mentioned string dynamically. You might say we can split the response by a new line and ignore the first two lines. However, there is no guarantee that the response have the same format every time. The list might be numbered, or there could be no introduction line.

The Output Parsers help create a data structure to define the expectations from the output precisely. We can ask for a list of words in case of the word suggestion application or a combination of different variables like a word and the explanation of why it fits. The parser can extract the expected information for you.

This lesson covers the different types of parsing objects and the troubleshooting processing.

1. Output Parsers

There are three classes that we will introduce in this section. While the Pydrantic parser is the most powerful and flexible wrapper, knowing the other options for less complicated problems is beneficial. We will implement the thesaurus application in each section to better understand the details of each approach.

1-1. PydanticOutputParser

This class instructs the model to generate its output in a JSON format and then extract the information from the response. You will be able to treat the parser’s output as a list, meaning it will be possible to index through the results without worrying about formatting.

💡
It is important to note that not all models have the same capability in terms of generating JSON outputs. So, it would be best to use a more powerful model (like OpenAI’s DaVinci instead of Curie) to get the most satisfactory result.

This class uses the Pydantic library, which helps define and validate data structures in Python. It enables us to characterize the expected output with a name, type, and description. We need a variable that can store multiple suggestions in the thesaurus example. It can be easily done by defining a class that inherits from the Pydantic’s BaseModel class. Remember to install the required packages with the following command:

pip install -qU langchain-openai
pip install -qU langchain-community
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitue words based on context")

    # Throw error in case of receiving a numbered-list from API
    @validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            if item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

parser = PydanticOutputParser(pydantic_object=Suggestions)

We always import and follow the necessary libraries by creating the Suggestions schema class. There are two essential parts to this class:

  1. Expected Outputs: Each output is defined by declaring a variable with desired type, like a list of strings (: List[str]) in the sample code, or it could be a single string (: str) if you are expecting just one word/sentence as the response. Also, It is required to write a simple explanation using the Field function’s description attribute to help the model during inference. (We will see an example of having multiple outputs later in the lesson)
  2. Validators: It is possible to declare functions to validate the formatting. We ensure that the first character is not a number in the sample code. The function’s name is unimportant, but the @validator decorator must receive the same name as the variable you want to approve. (like @validator(’words’)) It is worth noting that the field variable inside the validator function will be a list if you specify it as one.

We will pass the created class to the PydanticOutputParser wrapper to make it a LangChain parser object. The next step is to prepare the prompt.

from langchain_core.prompts.prompt import PromptTemplate

template = """
Offer a list of suggestions to substitue the specified target_word based the presented context.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format_prompt(
			target_word="behaviour",
			context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

As discussed in previous lessons, the template variable is a string that can have named index placeholders using the following {variable_name} format. The template outlines our expectations for the model, including the expected formatting from the parser and the inputs. The PromptTemplate receives the template string with the details of each placeholder’s type. They could either be 1) input_variables whose value is initialized later on using the .format_prompt() function, or 2) partial_variables to be initialized instantly.

The prompt can send the query to models like GPT using LangChain’s OpenAI wrapper. (Remember to set the OPENAI_API_KEY environment variables with your API key from OpenAI) We are using the Davinci model, one of the more powerful options to get the best results, and set the temperature value to 0, making the results reproducible.

💡
The temperature value could be anything between 0 to 1, where a higher number means the model is more creative. Using larger value in production is a good practice if you deal with tasks requiring creative output.
from langchain_openai import ChatOpenAI

# Before executing the following code, make sure to have
# your OpenAI key saved in the “OPENAI_API_KEY” environment variable.
model = ChatOpenAI(model_name='gpt-4o-mini', temperature=0.0)

output = model(model_input.to_string())

parser.parse(output)
The sample code.
Suggestions(words=['conduct', 'manner', 'action', 'demeanor', 'attitude', 'activity'])
The output.

The parser object’s parse() function will convert the model’s string response to the format we specified. There is a list of words that you can index through and use in your applications.

Multiple Outputs Example

Here is a sample code for Pydantic class to process multiple outputs. It requests the model to suggest a list of words and present the reasoning behind each proposition.

Replace the template variable and Suggestion class with the following codes to run this example. The template changes will ask the model to present its reasoning, and the suggestion class declares a new output named reasons. Also, the validator function manipulates the output to ensure every reasoning ends with a dot. Another use case of the validator function could be output manipulation.

template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitue words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")
    
    @validator('words')
    def not_start_with_number(cls, field):
      for item in field:
        if item[0].isnumeric():
          raise ValueError("The word can not start with numbers!")
      return field
    
    @validator('reasons')
    def end_with_dot(cls, field):
      for idx, item in enumerate( field ):
        if item[-1] != ".":
          field[idx] += "."
      return field
The multiple outputs sample code.
Suggestions(words=['conduct', 'manner', 'demeanor', 'comportment'], reasons=['refers to the way someone acts in a particular situation.', 'refers to the way someone behaves in a particular situation.', 'refers to the way someone behaves in a particular situation.', 'refers to the way someone behaves in a particular situation.'])
The output.

1-2. CommaSeparatedOutputParser

It is evident from the name of this class that it manages comma-separated outputs. It handles one specific case: anytime you want to receive a list of outputs from the model. Let’s start by importing the necessary module.

from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

The parser does not require a setting up step. Therefore it is less flexible. We can create the object by calling the class. The rest of the process for writing the prompt, initializing the model, and parsing the output is as follows.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Prepare the Prompt
template = """
Offer a list of suggestions to substitute the word '{target_word}' based the presented the following text: {context}.
{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format(
  target_word="behaviour",
  context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
)

# Loading OpenAI API
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

# Send the Request
output = model(model_input)
parser.parse(output)
The sample code.
['Conduct',
 'Actions',
 'Demeanor',
 'Mannerisms',
 'Attitude',
 'Performance',
 'Reactions',
 'Interactions',
 'Habits',
 'Repertoire',
 'Disposition',
 'Bearing',
 'Posture',
 'Deportment',
 'Comportment']
The output.

Although most of the sample code has been explained in the previous subsection, two parts might need attention. Firstly, we tried a new format for the prompt’s template to show different ways to write a prompt. Secondly, the use of .format() instead of .format_prompt() to generate the model’s input. The main difference compared to the previous subsection’s code is that we no longer need to call the .to_string() object since the prompt is already in string type.

As you can see, the final output is a list of words that has some overlaps with the PydanticOutputParser approach with more variety. However, requesting additional reasoning information using the CommaSeparatedOutputParser class is impossible.

1-3. StructuredOutputParser

This is the first output parser implemented by the LangChain team. While it can process multiple outputs, it only supports texts and does not provide options for other data types, such as lists or integers. It can be used when you want to receive one response from the model. For example, only one substitute word in the thesaurus application.

from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="words", description="A substitue word based on context"),
    ResponseSchema(name="reasons", description="the reasoning of why this word fits the context.")
]

parser = StructuredOutputParser.from_response_schemas(response_schemas)

The above code demonstrates how to define a schema. However, we are not going to go into details. This class has no advantage since the PydanticOutputParser class provides validation and more flexibility for more complex tasks, and the CommaSeparatedOutputParser option covers more straightforward applications.

2. Fixing Errors

The parsers are powerful tools to dynamically extract the information from the prompt and validate it to some extent. Still, they do not guarantee a response. Imagine a situation where you deployed your application, and the model’s response [to a user’s request] is incomplete, causing the parser to throw an error. It is not ideal! In the following subsections, we will introduce two classes acting as fail-safe. They add a layer on top of the model’s response to help fix the errors.

💡
The following approaches work with the PydanticOutputParser class since it is the only one with a validation method.

2-1. OutputFixingParser

This method tries to fix the parsing error by looking at the model’s response and the previous parser. It uses a Large Language Model (LLM) to solve the issue. We will use GPT-3 to be consistent with the rest of the lesson, but it is possible to pass any supported model. Let’s start by defining the Pydantic data schema and show a sample error that could occur.

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitue words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

missformatted_output = '{"words": ["conduct", "manner"], "reasoning": ["refers to the way someone acts in a particular situation.", "refers to the way someone behaves in a particular situation."]}'

parser.parse(missformatted_output)
Sample code.
The output.
The output.

As you can see in the error message, the parser correctly identified an error in our sample response (missformatted_output) since we used the word reasoning instead of the expected reasons key. The OutputFixingParser class could easily fix this error.

from langchain.llms import OpenAI
from langchain.output_parsers import OutputFixingParser

model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=model)
outputfixing_parser.parse(missformatted_output)
Sample code.
Suggestions(words=['conduct', 'manner'], reasons=['refers to the way someone acts in a particular situation.', 'refers to the way someone behaves in a particular situation.'])
The output.

The from_llm() function takes the old parser and a language model as input parameters. Then, It initializes a new parser for you that has the ability to fix output errors. In this case, it successfully identified the misnamed key and changed it to what we defined.

However, fixing the issues using this class is not always possible. Here is an example of using OutputFixingParser class to resolve an error with a missing key.

missformatted_output = '{"words": ["conduct", "manner"]}'

outputfixing_parser = OutputFixingParser.from_llm(parser=parser, llm=model)

outputfixing_parser.parse(missformatted_output)
Sample code.
Suggestions(words=['conduct', 'manner'], reasons=["The word 'conduct' implies a certain behavior or action, while 'manner' implies a polite or respectful way of behaving."])
The output.

Looking at the output, it is evident that the model understood the key reasons missing from the response but didn’t have the context of the desired outcome. It created a list with one entry, while we expect one reason per word. This is why we sometimes need to use the RetryOutputParser class.

2-2. RetryOutputParser

In some cases, the parser needs access to both the output and the prompt to process the full context, as demonstrated in the previous section. We first need to define the mentioned variables. The following codes initialize the LLM model, parser, and prompt, which were explained in more detail earlier.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# Define data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitue words based on context")
    reasons: List[str] = Field(description="the reasoning of why this word fits the context")

parser = PydanticOutputParser(pydantic_object=Suggestions)

# Define prompt
template = """
Offer a list of suggestions to substitue the specified target_word based the presented context and the reasoning for each word.
{format_instructions}
target_word={target_word}
context={context}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model_input = prompt.format_prompt(target_word="behaviour", context="The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson.")

# Define Model
model = OpenAI(model_name='gpt-3.5-turbo-instruct', temperature=0.0)

Now, we can fix the same missformatted_output using the RetryWithErrorOutputParser class. It receives the old parser and a model to declare the new parser object, as we saw in the previous section. However, the parse_with_prompt function is responsible for fixing the parsing issue while requiring the output and the prompt.

from langchain.output_parsers import RetryWithErrorOutputParser

missformatted_output = '{"words": ["conduct", "manner"]}'

retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=model)

retry_parser.parse_with_prompt(missformatted_output, model_input)
Sample code.
Suggestions(words=['conduct', 'manner'], reasons=["The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson, so 'conduct' is a suitable substitute.", "The students' behaviour was inappropriate, so 'manner' is a suitable substitute."])
Sample output.

The outputs show that the RetryOuputParser has the ability to fix the issue where the OuputFixingParser was not able to. The parser correctly guided the model to generate one reason for each word.

The best practice to incorporate these techniques in production is to catch the parsing error using a try: ... except: ... method. It means we can capture the errors in the except section and attempt to fix them using the mentioned classes. It will limit the number of API calls and avoid unnecessary costs that are associated with it.

Conclusion

We learned how to validate and extract the information in an easy-to-use format from the language models’ responses which are always a string. Additionally, we reviewed LangChain’s fail-safe procedures to guarantee the consistency of the output. Combining these approaches will help us write more reliable applications in production environments. In the next lesson, we will learn how to build a knowledge graph and capture useful information or entities from texts.

In the next lesson we’ll modify the news summarizer we built in the previous module by improving how we manage the prompts.

You can find the code of this lesson in this online notebook.