基于Langchain的Agent实战，实现Agent对话

在这篇文章中，我们将以实战为导向，详细探讨如何实现一个Agent。我们选用的是LangChain框架，并将带领你从零到一地构建一个完整的Agent。然而，在深入了解如何利用LangChain实现Agent之前，我们需要先了解一些基本概念。

模型优化师

1074人浏览 · 2024-10-09 14:54:47

模型优化师 · 2024-10-09 14:54:47 发布

前言

概念

在LangChain系统中，Agent扮演着关键的角色。它是一个决策链，由语言模型和提示驱动，决定下一步的行动。Agent接收三种主要输入：可用调用的函数（Tools）、高级目标（User input）以及为实现目标先前执行的操作与工具输出对（intermediate_steps）。基于这些信息，Agent产生下一步的行动或者最终响应。

为了表达行动，我们使用AgentAction数据类，其中包括应调用的工具名称（tool）和该工具的输入（tool_input）。当任务完成时，Agent会生成AgentFinish数据类，包含一个需要返回给用户的字典（return_values）。

LangChain提供了许多内置的Agent和工具以便使用。不同的Agent具有不同的prompting styles，不同的输入编码方式以及不同的输出解析方式。

AgentExecutor则负责运行Agent，执行所选操作，并将操作结果反馈给Agent。如果遇到复杂情况，如选择不存在的工具、工具错误、无法解析的输出等，AgentExecutor也会进行处理。

以下是其工作流程的伪代码, 可以看到内部是个循环：

next_action = agent.get_action(...)      
while next_action != AgentFinish:		 
    observation = run(next_action)         
    next_action = agent.get_action(..., next_action, observation)  
return next_action

总结来说，代理的工作原理是：使用语言模型来决定接下来的操作（AgentAction），当完成任务时产生结束信号（AgentFinish），并通过记录先前的操作和输出（intermediate_steps）来进行未来的迭代决策。

AgentExecutor 类是LangChain中主要的agent runtime，除此之外，也支持其它实验中的runtimes，比如Plan-and-execute Agent、Baby AGI、Auto GPT。

浅尝辄止

from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function
from langchain.chat_models import ChatOpenAI


from langchain_core.tools import tool



llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "You are very powerful assistant, but bad at calculating lengths of words.",
         ),
        ("user",
         "{input}"),
        MessagesPlaceholder(
            variable_name="agent_scratchpad"),
    ])

# 定义tool
@tool
def get_weather(city: str) -> str:
    """Returns the weather of a city."""
    return "hot"


tools = [get_weather]
# tool 集成, 使用format_tool_to_openai_function将工具函数格式化为OpenAI函数格式
llm_with_tools = llm.bind(
    functions=[
        format_tool_to_openai_function(t) for t in tools])


agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent.invoke({"input": "What's the weather like in Shanghai?", "intermediate_steps": []})

执行之后最终的输出是一个包含有关调用的工具（get_weather）以及相应消息记录的AgentActionMessageLog:

{
  "tool": "get_weather",
  "tool_input": {
    "city": "Shanghai"
  },
  "log": "\nInvoking: `get_weather` with `{'city': 'Shanghai'}`\n\n\n",
  "message_log": [
    {
      "AIMessage": {
        "content": "",
        "additional_kwargs": {
          "function_call": {
            "arguments": "{\"city\":\"Shanghai\"}",
            "name": "get_weather"
          }
        },
        "response_metadata": {
          "token_usage": {
            "completion_tokens": 15,
            "prompt_tokens": 59,
            "total_tokens": 74
          },
          "model_name": "gpt-3.5-turbo",
          "system_fingerprint": null,
          "finish_reason": "function_call",
          "logprobs": null
        },
        "id": "run-9f4a819c-178b-4294-95d7-9988555526a0-0"
      }
    }
  ]
}

我们接下来解释一下这段代码中的关键部分：

format_to_openai_function_messages函数用于将中间步骤的信息转化为适合发送给模型的格式。
OpenAIFunctionsAgentOutputParser负责将模型输出的消息解析为AgentAction/AgentFinish。
虽然我们构建的提示看起来非常简单，其实这得益于OpenAI的Function Calling的优秀性能。在创建提示时，我们只需依赖用户的输入（input）和先前工具调用的结果（agent_scratchpad），无需提供复杂的指令给LLM
每次模型的输入都是一个字典，有两个键input和intermediate_steps，分别代表用户输入和中间步骤结果。这两个键的值通过lambda函数，传入agent 组件第一步的 "input"和 "agent_scratchpad"中（字典表示并行执行）

自定义粗糙的runtime

上面返回的结果只是调用工具的信息，我们可以拿到调用信息后在本地进行函数调用，将结果再给llm让llm进行回答。

from langchain_core.agents import AgentFinish

user_input = "What's the weather like in Shanghai?"
intermediate_steps = []
while True:
    output = agent.invoke(
        {
            "input": user_input,
            "intermediate_steps": intermediate_steps,
        }
    )
    if isinstance(output, AgentFinish):
        final_result = output.return_values["output"]
        break
    else:
        print(f"TOOL NAME: {output.tool}")
        print(f"TOOL INPUT: {output.tool_input}")
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)
        intermediate_steps.append((output, observation))
print(final_result)

输出结果如下：

TOOL NAME: get_weather
TOOL INPUT: {'city': 'Shanghai'}
The weather in Shanghai is hot.

其中intermediate_steps被format更OpenAI的数据格式如下：

[
  {
    "AIMessage": {
      "content": "",
      "additional_kwargs": {
        "function_call": {
          "arguments": "{\"city\":\"Shanghai\"}",
          "name": "get_weather"
        }
      },
      "response_metadata": {
        "token_usage": {
          "completion_tokens": 16,
          "prompt_tokens": 69,
          "total_tokens": 85
        },
        "model_name": "gpt-3.5-turbo",
        "system_fingerprint": None,
        "finish_reason": "function_call",
        "logprobs": None
      },
      "id": "run-f7cd175e-9a04-45b8-b028-dda48e91b108-0"
    }
  },
  {
    "FunctionMessage": {
      "content": "hot",
      "name": "get_weather"
    }  
  }
]

使用AgentExecutor

使用AgentExecutor可以简化这一过程，并且它提供了一些改进的功能，比如帮忙管理intermediate_steps等。使用也很简单，只需要在创建AgentExecutor对象时捆绑agent和tools就行。

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What's the weather like in Shanghai"})

执行结果如下:

> Entering new AgentExecutor chain...

Invoking: `get_weather` with `{'city': 'Shanghai'}`


hotThe weather in Shanghai is hot.

> Finished chain.
{'input': "What's the weather like in Shanghai", 'output': 'The weather in Shanghai is hot.'}

Adding memory

虽然我们现在的agent能够正常运行，但是它无法记住先前的交互信息，这是因为它是“无状态”的。为了解决这个问题，我们可以添加一个内存（memory）机制。具体步骤如下：

在构建提示（prompt）时，添加一个名为memory的变量，这个变量将保存agent之前的交互信息。
设计一种方法来跟踪和更新聊天历史记录（chat history）。

from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are very powerful assistant.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

agent = (
    # 第一个组件的输入字典，多了一个"chat_history"键, 用于传给prompt
        {
            "input": lambda x: x["input"],
            "agent_scratchpad": lambda x: format_to_openai_function_messages(
                x["intermediate_steps"]
            ),
            "chat_history": lambda x: x["chat_history"],
        }
        | prompt
        | llm_with_tools
        | OpenAIFunctionsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

input1 = "What's the weather like in Dangtu"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=input1),
        AIMessage(content=result["output"]),
    ]
)
print(agent_executor.invoke({"input": "is that a real city?", "chat_history": chat_history}))

输出结果如下：

> Entering new AgentExecutor chain...

Invoking: `get_weather` with `{'city': 'Dangtu'}`


hotThe weather in Dangtu is currently hot.

> Finished chain.


> Entering new AgentExecutor chain...
Yes, Dangtu is a real city. It is located in Anhui Province, China. If you would like, I can provide the current weather information for Dangtu.

> Finished chain.
{'input': 'is that a real city?', 'chat_history': [HumanMessage(content="What's the weather like in Dangtu"), AIMessage(content='The weather in Dangtu is currently hot.')], 'output': 'Yes, Dangtu is a real city. It is located in Anhui Province, China. If you would like, I can provide the current weather information for Dangtu.'}

如果希望每次对话时自动将chat_history添加到memory中，可以在创建AgentExecutor对象时，启用memory:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
agent_executor = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=True)

ReAct实战

from langchain import hub
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.tools.render import render_text_description
from langchain.agents.output_parsers import JSONAgentOutputParser
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = hub.pull("hwchase17/react-chat-json")
prompt = prompt.partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)

# 我们需要额外的指导，否则模型有时会忘记如何回应TEMPLATE_TOOL_RESPONSE = """TOOL RESPONSE: 
---------------------
{observation}

USER'S INPUT
--------------------

Okay, so what is the response to my last comment? If using information obtained from the tools you must mention it explicitly without mentioning the tool names - I have forgotten all TOOL RESPONSES! Remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else - even if you just want to respond to the user. Do NOT respond with anything except a JSON snippet no matter what!"""


llm_with_stop = llm.bind(stop=["\nObservation"])

# 创建了一个代理的管道，其中包含了输入处理、prompt、LLM和输出解析器
agent = (
        {
            "input": lambda x: x["input"],
            "agent_scratchpad": lambda x: format_log_to_messages(
            x["intermediate_steps"], template_tool_response=TEMPLATE_TOOL_RESPONSE
        ),
        }
        | prompt
        | llm_with_stop
        | JSONAgentOutputParser()
)


agent_executor = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
print(agent_executor.invoke("What's the weather like in Shanghai?"))

输出结果如下：

> Entering new AgentExecutor chain...
Shanghai is a large city in China, and its weather can vary. To provide an accurate answer, I need to check the current weather conditions.

Action: get_weather
Action Input: Shanghai
Observation: hot
Thought:I now know the final answer.

Final Answer: The weather in Shanghai is currently hot.

> Finished chain.
{'input': "What's the weather like in Shanghai?", 'output': 'The weather in Shanghai is currently hot.'}

上面的代码使用的模型是chat models，也可以换成LLMs，只是prompt和解析器不一样而已。

我们可以加上debug来看看具体的调用流程：

from langchain.globals import set_debug

set_debug(True)

为了防止我们的代理陷入无限循环，我们可以设置一个max_iterations参数来限制代理的迭代次数。默认情况下，代理将返回一个预设的字符串作为其输出。然而，如果您希望代理在达到最大迭代次数后能生成更具意义的响应，您可以选择使用generate方法。此时，代理将在完成所有迭代后，利用语言模型（LLM）生成一个与用户输入相关且自然的回复。

initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=3,
    early_stopping_method="generate",
)