Getting Started with the LangChain Framework: From Basics to Agents

Posted by 虞天 on Wednesday, July 10, 2024

Introduction: Why LangChain?

When building applications on large language models (LLMs), we repeatedly face the following challenges:

  1. Complex flow orchestration: an LLM application usually involves multiple steps, such as prompt engineering, model calls, and result parsing
  2. Difficult tool integration: how do we let the LLM call external tools and APIs?
  3. Complicated state management: keeping context consistent across multi-turn conversations
  4. Poor reusability: similar logic has to be reimplemented in every project

LangChain was created to address these problems. It is a framework for building LLM-powered applications that organizes LLM calls into chain-like structures and ships with a rich tool ecosystem. This article walks you through building intelligent applications with LangChain, step by step.

1. LangChain Core Architecture

LangChain is composed of the following modules:

  • Model I/O: interfaces with language models
  • Retriever: interfaces with application-specific data
  • Memory: persists state across pipeline runs
  • Chain: builds sequences of calls
  • Agent: lets a pipeline choose which tools to use based on high-level instructions
  • Callback: logs and streams the intermediate steps of any pipeline
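Of these modules, Memory is the only one without a dedicated code example later in this article, but the idea is simple: a store that records conversation turns and renders them back into the prompt on the next call. A toy sketch in plain Python (illustrative only, not LangChain's actual ConversationBufferMemory implementation):

```python
# Toy conversation buffer: stores (human, ai) turns and renders them
# back into a prompt prefix, which is what memory classes do for chains.
class ToyConversationBuffer:
    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) pairs

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load_history(self) -> str:
        # Render history the way a chain would splice it into a prompt
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

buffer = ToyConversationBuffer()
buffer.save_context("Hi, I'm Bob.", "Nice to meet you, Bob!")
buffer.save_context("What's my name?", "Your name is Bob.")
print(buffer.load_history())
```

On each call, a memory-aware chain would prepend `load_history()` to the prompt and call `save_context()` with the new turn afterwards.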

2. Quick Start: Your First LangChain Application

Let's start with a simple example to see LangChain's basic workflow:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# 1. Define the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world-class technical documentation writer."),
    ("user", "{input}")
])

# 2. Initialize the model
llm = ChatOpenAI(
    model_name='qwen1.5-32b-chat-int4',
    openai_api_base='http://20.20.136.251:8001/v1',
    openai_api_key='q7r8s9t0-u1v2-w3x4-y5z6-a7b8c9d0e1f2'
)

# 3. Define the output parser
output_parser = StrOutputParser()

# 4. Build the chain: prompt → llm → output_parser
chain = prompt | llm | output_parser

# 5. Run the chain
response = chain.invoke({"input": "how can langsmith help with testing?"})
print(response)

This simple example demonstrates LangChain's core abstraction: LCEL (LangChain Expression Language). The | operator links components together into a processing pipeline.
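The pipe operator works because every LCEL component implements a common Runnable interface whose `__or__` method returns a composed runnable. The idea can be sketched in a few lines of plain Python (a hypothetical minimal version, not LangChain's actual implementation):

```python
# Minimal sketch of pipe-style composition: each step wraps a function,
# and `a | b` returns a new step that runs a, then feeds its output to b.
class ToyRunnable:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))

prompt = ToyRunnable(lambda d: f"Q: {d['input']}")
fake_llm = ToyRunnable(lambda text: text.upper())  # stand-in for a model
parser = ToyRunnable(lambda text: text.strip())

chain = prompt | fake_llm | parser
print(chain.invoke({"input": "hello"}))  # Q: HELLO
```

The real Runnable interface adds batching, streaming, and async variants, but composition follows the same shape.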

3. The Model I/O Module in Depth

Model I/O is the core module for interacting with LLMs directly, and it consists of three key components:

3.1 Prompt Templates: Intelligent Prompt Engineering

PromptTemplate: builds a single string-typed prompt

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}."
)
result = prompt_template.format(adjective="funny", content="chickens")
print(result)  # Tell me a funny joke about chickens.

ChatPromptTemplate: builds conversational prompts with multi-turn support

from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])

messages = chat_template.format_messages(
    name="Bob", 
    user_input="What is your name?"
)

FewShotPromptTemplate: a few-shot prompt template

from langchain.prompts import PromptTemplate, FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

# Define the examples
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
]

# Create the per-example template
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Create the example selector
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=25
)

# Create the few-shot prompt template
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

print(dynamic_prompt.invoke({"adjective": "funny"}).text)
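LengthBasedExampleSelector exists to drop examples when the rendered prompt would get too long. The selection logic amounts to accumulating examples until a word budget is exhausted; a toy sketch of that idea (illustrative only, not the library's actual code):

```python
# Toy length-based selection: keep adding examples while the running
# word count (new input included) stays within a max_length word budget.
def select_examples(examples, new_input, max_length=25):
    selected = []
    used = len(new_input.split())
    for ex in examples:
        cost = len(f"Input: {ex['input']} Output: {ex['output']}".split())
        if used + cost > max_length:
            break
        selected.append(ex)
        used += cost
    return selected

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
]
# A generous budget keeps all three examples; a tight one keeps fewer.
print(select_examples(examples, "funny", max_length=25))
print(select_examples(examples, "funny", max_length=5))
```

The real selector measures length with a configurable `get_text_length` function; word counting is its default.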

3.2 Language Models: Model Invocation Interfaces

LangChain supports several language-model interfaces:

LLM interface: text in, text out

from langchain_community.llms import VLLM

llm = VLLM(model="/path/to/your/model")
response = llm.invoke("how can langsmith help with testing?")

Chat Model interface: a list of messages in, a message out

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatOpenAI(
    model_name='qwen1.5-32b-chat-int4',
    openai_api_base='http://20.20.136.251:8001/v1',
    openai_api_key='q7r8s9t0-u1v2-w3x4-y5z6-a7b8c9d0e1f2'
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]

response = chat.invoke(messages)

3.3 Output Parsers: Parsing Structured Output

PydanticOutputParser: define the output schema with a Pydantic model

from typing import List
from langchain_openai import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator

# Define the output data structure
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

# Initialize the parser
parser = PydanticOutputParser(pydantic_object=Joke)

# Create the prompt template
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Build the chain
chain = prompt | llm | parser

# Run it
result = chain.invoke({"query": "Tell me a joke."})
print(result)

JsonOutputParser: parses JSON-formatted output

from langchain_core.output_parsers import JsonOutputParser
from langchain.prompts import PromptTemplate

parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser
result = chain.invoke({"query": "Tell me a joke."})

StructuredOutputParser: structured output parsing (well suited to less capable models)

from langchain.output_parsers import ResponseSchema, StructuredOutputParser

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question"),
]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | llm | output_parser
result = chain.invoke({"question": "what's the capital of france?"})
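All three parsers follow the same round trip: get_format_instructions() tells the model what to emit, and the parser pulls structured data back out of the raw text, typically from a fenced JSON block. A toy version of that round trip (illustrative only, not the library's implementation):

```python
import json
import re

# The format instructions ask the model for a fenced JSON block; the
# parser finds that block and deserializes it. FENCE is built at runtime
# so no literal triple-backtick appears inside this example.
FENCE = "`" * 3

def format_instructions(field_names):
    fields = ", ".join(f'"{name}": string' for name in field_names)
    return f"Return a markdown code block with a JSON object: {{{fields}}}"

def parse(text):
    match = re.search(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON block found in model output")
    return json.loads(match.group(1))

print(format_instructions(["answer", "source"]))

raw_output = (
    "Here is my answer:\n"
    + FENCE + 'json\n{"answer": "Paris", "source": "general knowledge"}\n' + FENCE
)
print(parse(raw_output))  # {'answer': 'Paris', 'source': 'general knowledge'}
```

PydanticOutputParser adds a validation step on top of this: the decoded dict is fed into the Pydantic model, which raises if a field is missing or malformed.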

4. The Chain Module: Building Complex Pipelines

Chains are the core building blocks of a LangChain application. An LLMChain combines three key components:

  • LLM: the language model as the core reasoning engine
  • Prompt Templates: supply the instructions for the language model
  • Output Parsers: turn the LLM's raw response into a more usable format

4.1 A Basic Chain Example

from langchain.chains import LLMChain

# Create the LLMChain
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    output_parser=output_parser
)

# Run the chain
response = llm_chain.invoke({"input": "Explain quantum computing in simple terms"})

4.2 Sequential Chains (SequentialChain)

from langchain.chains import SequentialChain, SimpleSequentialChain

# Chain 1: generate a recipe name
recipe_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "Suggest a recipe for {ingredient}"
    ),
    output_key="recipe_name"
)

# Chain 2: generate detailed steps
steps_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "Write detailed steps for making {recipe_name}"
    ),
    output_key="steps"
)

# Compose them into a SequentialChain
overall_chain = SequentialChain(
    chains=[recipe_chain, steps_chain],
    input_variables=["ingredient"],
    output_variables=["recipe_name", "steps"]
)

result = overall_chain.invoke({"ingredient": "chicken"})
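The key detail in SequentialChain is the output_key wiring: each step writes its result into a shared variables dict under its output_key, and later steps read their input variables from that dict. A plain-Python sketch of that data flow (hypothetical, not the library's code):

```python
# Toy sequential chain: each step is (function, output_key); results are
# merged into a shared variables dict so later steps can consume them.
def run_sequential(steps, inputs):
    variables = dict(inputs)
    for func, output_key in steps:
        variables[output_key] = func(variables)
    return variables

steps = [
    (lambda v: f"Roast {v['ingredient']}", "recipe_name"),
    (lambda v: f"Step 1: prepare the {v['recipe_name']}.", "steps"),
]
result = run_sequential(steps, {"ingredient": "chicken"})
print(result["recipe_name"])  # Roast chicken
print(result["steps"])        # Step 1: prepare the Roast chicken.
```

This is why `output_variables=["recipe_name", "steps"]` in the example above can expose intermediate results: everything lives in the shared dict.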

4.3 Router Chains (RouterChain)

from langchain.chains.router import MultiPromptChain
from langchain.prompts import PromptTemplate

# Prompt templates for different subject areas
physics_template = """You are a physics expert. Answer physics questions.
Question: {input}"""

math_template = """You are a math expert. Answer math questions.
Question: {input}"""

prompt_infos = [
    {
        "name": "physics",
        "description": "Good for answering physics questions",
        "prompt_template": physics_template
    },
    {
        "name": "math",
        "description": "Good for answering math questions",
        "prompt_template": math_template
    }
]

# Create the router chain
chain = MultiPromptChain.from_prompts(
    llm=llm,
    prompt_infos=prompt_infos,
    default_chain=LLMChain(llm=llm, prompt=PromptTemplate.from_template("{input}"))
)

# Use the router chain
physics_response = chain.invoke({"input": "What is quantum entanglement?"})
math_response = chain.invoke({"input": "What is the derivative of x^2?"})
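Conceptually, MultiPromptChain first asks the LLM which destination's description best matches the input, runs that destination's chain, and falls back to the default chain when nothing matches. A toy sketch with a keyword matcher standing in for the LLM routing step (illustrative only):

```python
# Toy router: pick a destination chain by matching the input against
# per-destination keywords (a stand-in for the LLM routing decision).
ROUTES = {
    "physics": (["quantum", "entanglement", "gravity"],
                lambda q: f"[physics expert] {q}"),
    "math": (["derivative", "integral", "equation"],
             lambda q: f"[math expert] {q}"),
}

def route(question):
    for name, (keywords, handler) in ROUTES.items():
        if any(kw in question.lower() for kw in keywords):
            return handler(question)
    return f"[default] {question}"  # default chain

print(route("What is quantum entanglement?"))   # [physics expert] ...
print(route("What is the derivative of x^2?"))  # [math expert] ...
```

In the real RouterChain the routing decision is itself an LLM call that returns the destination name as structured output, which is more flexible than keyword matching but follows the same dispatch shape.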

5. The Agent Module: Teaching the LLM to Use Tools

Agents are one of LangChain's most powerful features: they let the LLM dynamically choose and invoke tools as needed.

5.1 Creating Your First Agent

Let's build an agent that counts the letters in an English word:
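At its core, an agent is a loop: the model is shown the available tools and their descriptions, it either emits a tool call or a final answer, tool results are fed back in as observations, and the loop repeats. A toy version of that loop with a scripted stand-in for the LLM (illustrative only, not LangChain's AgentExecutor):

```python
# Toy agent loop: a scripted "model" decides between calling a tool and
# answering; the executor dispatches tool calls and feeds results back.
def get_word_length(word: str) -> int:
    return len(word)

TOOLS = {"get_word_length": get_word_length}

def scripted_model(question, observations):
    # Stand-in for the LLM: first request the tool, then answer.
    if not observations:
        return {"tool": "get_word_length", "args": {"word": "color"}}
    return {"answer": f"'color' has {observations[-1]} letters."}

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        decision = scripted_model(question, observations)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        observations.append(result)  # fed back to the model next turn
    raise RuntimeError("Agent did not finish within max_steps")

print(run_agent("How many letters are in 'color'?"))  # 'color' has 5 letters.
```

The real executor differs mainly in who decides: the LLM picks tools and arguments from the tool descriptions, and a step limit guards against infinite loops, just like `max_steps` here.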

import logging
from langchain_openai import ChatOpenAI
from langchain.agents import tool, initialize_agent, AgentType
from langchain_core.messages import SystemMessage

# Configure logging
logging.basicConfig(level=logging.INFO)

# 1. Initialize the language model
llm = ChatOpenAI(
    model_name='qwen1.5-32b-chat-int4',
    openai_api_base='http://20.20.136.251:8001/v1',
    openai_api_key='q7r8s9t0-u1v2-w3x4-y5z6-a7b8c9d0e1f2'
)

# 2. Define a tool
@tool
def get_word_length(word: str) -> int:
    """Return the length of a word."""
    length = len(word)
    logging.info(f"get_word_length tool invoked with word: {word}, length: {length}")
    return length

# 3. Build the tool list
tools = [get_word_length]

# 4. Define the system message. initialize_agent() does not accept a prompt
#    argument; for the OPENAI_FUNCTIONS agent it is passed via agent_kwargs.
system_message = SystemMessage(content=(
    "You are a very capable assistant, but you cannot count word lengths "
    "yourself. You must use the get_word_length tool to solve this problem."
))

# 5. Initialize the agent (note the keyword is `agent`, not `agent_type`)
agent_executor = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    agent_kwargs={"system_message": system_message},
    verbose=True
)

# 6. Run the agent
result = agent_executor.invoke({
    "input": "How many letters are in the word 'color'? You must use the get_word_length tool."
})
logging.info(f"Agent execution result: {result}")

5.2 A Multi-Tool Agent Example

from langchain.agents import Tool
from langchain.utilities import WikipediaAPIWrapper, PythonREPL

# Define several tools
wikipedia = WikipediaAPIWrapper()
python_repl = PythonREPL()

tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for searching information on Wikipedia"
    ),
    Tool(
        name="PythonREPL",
        func=python_repl.run,
        description="Useful for executing Python code"
    ),
    Tool(
        name="Calculator",
        # eval() is unsafe on untrusted input; acceptable only in a demo
        func=lambda x: str(eval(x)),
        description="Useful for mathematical calculations"
    )
]

# Create the multi-tool agent
multi_tool_agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Run a multi-step task
result = multi_tool_agent.invoke(
    "Calculate the factorial of 5 using Python, then search Wikipedia for information about factorials"
)

5.3 Custom Tool Classes

from langchain.tools import BaseTool

class CustomCalculatorTool(BaseTool):
    # Recent LangChain versions are Pydantic-based, so these class
    # attributes need type annotations
    name: str = "AdvancedCalculator"
    description: str = "Performs advanced mathematical operations"
    
    def _run(self, expression: str) -> str:
        """Evaluate a mathematical expression."""
        try:
            result = eval(expression)  # unsafe on untrusted input; demo only
            return f"The result of {expression} is {result}"
        except Exception as e:
            return f"Error: {str(e)}"
    
    async def _arun(self, expression: str) -> str:
        """Asynchronously evaluate a mathematical expression."""
        return self._run(expression)

# Use the custom tool
custom_tool = CustomCalculatorTool()
tools.append(custom_tool)

Summary and What's Next

This article covered LangChain's core concepts and foundational modules:

  1. Model I/O: Prompt Templates, Language Models, Output Parsers
  2. Chain: basic chains, sequential chains, router chains
  3. Agent: basic agents, multi-tool agents, custom tools

With these fundamentals, we can already build simple LangChain applications. The real challenge, however, is composing these modules into a complete, practical intelligent application.

Next up: "LangChain in Practice (Part 2): Building an Intelligent Q&A System"

In the next article we will apply everything covered here to build a complete intelligent question-answering system with the following features:

  • Document loading and preprocessing
  • Vector database construction
  • Semantic retrieval and question answering
  • Multi-turn conversation support
  • Performance optimization and deployment

Through this hands-on project you will learn how to combine LangChain's modules to solve real business problems. Stay tuned!


This is the first post in the LangChain in Practice series, covering the fundamental concepts and core modules. Before moving on to the next, hands-on article, make sure you are comfortable with the code examples and concepts presented here.
