Introduction
Large Language Models have demonstrated great utility in diverse tasks, from coding assistance and content summarization to answering everyday questions. As our understanding of their strengths and limitations grows, numerous innovative methods for improving these models and expanding their range of tasks have emerged in recent months.
This lesson will delve into some of these advancements. We will learn about the ReAct framework, a prompt-based paradigm designed to synergize reasoning and acting in language models for general task solving.
Additionally, this module will cover the latest upgrades to ChatGPT, including plugin integration. We will also explore new enhancements available through the OpenAI API, such as function calling. This feature enables LLMs to produce structured outputs, further augmenting their reliability and utility in many applications.
Overview of ReAct
This framework aims to enhance the utility of language models by using them to build autonomous agent systems, that is, systems that can operate and make decisions independently. One way to accomplish this is to have the model reason about an input question and its context, create an action plan, and execute it.
In the ReAct framework, language models are prompted to generate verbal reasoning traces (detailed records of the model's thought process) interleaved with actions while accomplishing a task. This process repeats iteratively until the answer to the question is found.
The verbal reasoning step in ReAct allows the model to dynamically create, maintain, and adjust high-level plans for acting, which refers to the execution of specific tasks or actions. When acting, the model can also interact with external environments (like Wikipedia) to incorporate additional information into reasoning.
As a side note, allowing these models to access sources of information like Wikipedia can also reduce the number of hallucinations and biases.
To better understand the framework, let’s look at the example below from the paper.
The figure above provides a comparative analysis of different ways a language model can be used to accomplish a specific task: identifying a device that can control the same program the Apple Remote was first designed to interact with.
- (1a) The model is asked to answer the question directly.
- (1b) The model uses Chain-of-Thought prompting, reasoning about the question before answering.
- (1c) The model follows an act-only process, taking actions without being prompted to reason about the question.
- (1d) The model uses the ReAct framework: it is prompted to reason about the question and to perform specific actions to find the answer.
The authors of ReAct note that the framework follows a general recipe to accomplish various tasks. The main steps are:
- Thought Step: The LLM is prompted to think critically about the task. Given the question, it evaluates which actions might lead to finding the answer.
- Action Step: In this phase, the LLM interacts with an external environment. It can call external APIs to acquire the information it needs.
- Observation Step: After taking action, the LLM receives a result from the external environment. These observations are crucial for the LLM to determine the effectiveness of the action and plan the next steps.
- Next Thought Step: Equipped with the information from the action and observation, the LLM reevaluates the situation. This evaluation allows the model to consider and decide on the subsequent action.
This sequential process continues until the LLM successfully finds the answer.
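To make this loop concrete, here is a minimal sketch of a ReAct-style agent in Python. This is not the paper's implementation: the prompt is simplified (the original uses few-shot exemplars), `search_wikipedia` is a stand-in tool, and it assumes the pre-1.0 `openai` library plus the `wikipedia` package.

```python
import openai     # pre-1.0 openai library
import wikipedia  # pip install wikipedia

# Simplified prompt; the paper's version includes few-shot exemplars.
PROMPT = """Answer the question by interleaving Thought, Action, and Observation steps.
Available action: Search[query] -- looks up a topic on Wikipedia.
When you know the answer, respond with: Finish[answer]

Question: {question}
{trace}"""

def search_wikipedia(query: str) -> str:
    """Stand-in tool: return the first two sentences of a Wikipedia page."""
    return wikipedia.summary(query, sentences=2)

def react(question: str, max_steps: int = 5) -> str:
    trace = ""
    for _ in range(max_steps):
        # Thought + Action step: stop before the model invents its own Observation.
        completion = openai.Completion.create(
            model="text-davinci-002",
            prompt=PROMPT.format(question=question, trace=trace),
            stop=["Observation:"],
            max_tokens=256,
        )
        step = completion["choices"][0]["text"]
        trace += step
        if "Finish[" in step:
            return step.split("Finish[")[1].split("]")[0]
        if "Search[" in step:
            # Action step: query the external environment.
            query = step.split("Search[")[1].split("]")[0]
            # Observation step: feed the result back into the next thought.
            trace += f"\nObservation: {search_wikipedia(query)}\n"
    return "No answer found within the step budget."
```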
ReAct framework in code
Code implementations of the ReAct framework are available for those interested in creating autonomous agents. For a practical demonstration, consider looking at the authors' implementation: a notebook showcasing how to use text-davinci-002 to create an agent that answers questions using Wikipedia as a source of information.
There is also a LangChain implementation of ReAct, which lets you create a capable agent in less time, as LangChain provides many ready-made agent tools.
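For instance, a ReAct-style agent with Wikipedia access takes only a few lines with LangChain's classic agent API. The sketch below assumes a pre-0.1 version of LangChain (imports differ in newer releases), the `wikipedia` package, and an `OPENAI_API_KEY` environment variable:

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

# Completion-style LLM; reads OPENAI_API_KEY from the environment.
llm = OpenAI(temperature=0)

# Wikipedia as the external environment (requires `pip install wikipedia`).
tools = load_tools(["wikipedia"], llm=llm)

# ZERO_SHOT_REACT_DESCRIPTION wires up the Thought/Action/Observation prompting.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

agent.run(
    "What other device can control the program the Apple Remote "
    "was originally designed to interact with?"
)
```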
OpenAI Function calling
OpenAI has recently introduced a function calling feature for their language models through their API.
In an API call, you can describe functions to gpt-3.5-turbo-0613 and gpt-4-0613 and have the model output a JSON object containing arguments to call those functions. Be aware that the Chat Completions API does not call the function; instead, the model generates a JSON object that you can use to call the function in your own code.
To support this feature, OpenAI fine-tuned gpt-3.5-turbo-0613 and gpt-4-0613 to detect when a function should be called (depending on the user input) and to respond with a JSON object that adheres to the function signature. OpenAI has not disclosed how this is implemented, but they may be using a form of prompt engineering similar to ReAct.
With this feature, you can more easily create:
- Chatbots that answer questions by calling external tools, for example to send an email.
- Natural-language interfaces that convert requests such as "Who are my top ten customers this month?" into API calls or database queries.
- Structured-data extractors, for example pulling the names of all the locations mentioned in a Wikipedia article.
Check out this example in the OpenAI documentation to learn how to set up function calling.
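As a rough sketch of that flow, the snippet below uses the pre-1.0 `openai` Python library (matching the `-0613` models); `get_current_weather` is a hypothetical function defined purely for illustration:

```python
import json
import openai  # pre-1.0 openai library

def get_current_weather(city: str) -> str:
    """Hypothetical local function; a real app would call a weather API."""
    return json.dumps({"city": city, "forecast": "sunny", "temperature_c": 22})

# Describe the function so the model knows when and how to call it.
functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. Paris"}
        },
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions
)
message = response["choices"][0]["message"]

if message.get("function_call"):
    # The API never runs the function; it only returns the name and arguments.
    args = json.loads(message["function_call"]["arguments"])
    result = get_current_weather(**args)

    # Send the result back so the model can phrase the final answer.
    messages.append(message)
    messages.append(
        {"role": "function", "name": "get_current_weather", "content": result}
    )
    followup = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613", messages=messages
    )
    print(followup["choices"][0]["message"]["content"])
```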
If your application makes use of LangChain, there is also a way to use function calling; take a look at the documentation here.
ChatGPT Plugins
If you are subscribed to ChatGPT Plus, you can easily augment LLMs with tools for personal or professional use without touching the OpenAI API: OpenAI has made third-party plugins available directly through the chat interface. It is not disclosed how OpenAI implements this, but they may be using a form of prompt engineering similar to ReAct to enable the plugins.
These plugins or tools can give their language models access to more recent, personal, or specific information. Here are some of the most popular use cases plugins can help with.
With these third-party plugins, you can directly upload your documents, such as PDFs, and ask questions about the information in those documents. You can provide a link to a GitHub repository and ask questions or let the language model explain the code to you. There are also plugins to create diagrams, flow charts, or graphs.
Conclusion
In this module, we explored various ways to enhance the capabilities of current language models; these methods also reduce the risk of hallucinations. We learned about the ReAct framework, which empowers language models to act as autonomous agents, and discussed the features available through OpenAI's services, including function calling and third-party plugins. Function calling lets developers plug custom functions into the model's interactions with users, while plugins available through the chat interface let users benefit from tool-augmented OpenAI language models without writing any code.