AI Agent Reflexion模式：AI自我反思與迭代優化框架｜多輪評估改進機制深度解析(設計篇) 詳情 - lua,json,ci,Java,後端開發,yyds乾貨盤點 xiaogeit 博客

在這裏插入圖片描述

肖哥彈架構 跟大家“彈彈” 大模型Agent 設計與實戰應用，需要代碼關注

歡迎關注，點贊，留言。

關注公號Solomon肖哥彈架構獲取更多精彩內容

歷史熱點文章

解鎖大語言模型參數：零基礎掌握大型語言模型參數奧秘與實踐指南
6張圖掌握提示詞工程師工作範圍與工作技巧(提示詞原理篇)
一個項目代碼講清楚DO/PO/BO/AO/E/DTO/DAO/ POJO/VO
寫代碼總被Dis：5個項目案例帶你掌握SOLID技巧,代碼有架構風格
里氏替換原則在金融交易系統中的實踐，再不懂你咬我

⚠️ 原創不易搬運必究

Reflexion框架通過"生成→評估→改進"的閉環機制，解決傳統AI單輪問答在準確性、完整性和實用性方面的侷限。本文完整解析其工作代理、反射代理、備用LLM的三角色協作架構，通過RAG技術詳解的實際案例，展示從初稿到最終答案的完整多輪迭代過程。涵蓋模板系統、解析器機制、控制流邏輯等核心技術細節，幫助構建具備自我修正能力的高質量AI問答系統。

一、 Reflexion技術背景

單輪問答的侷限：一次生成常出現事實偏差、覆蓋不全、表達不清等問題，且難以自我修正。
人類審稿的啓發：先產出，再按標準審視與修改，逐輪趨近高質量答案。
工程化需求：需要可控的輪次上限、結構化的評估/改進迴路、可觀測與可恢復。

Reflexion 通過“工作代理 → 反射代理 → 改進”的閉環，讓答案在若干輪內可預測地提升質量，同時與 OxyGent 的權限、追蹤、SSE、斷點等機制協同。

1. Reflexion設計思路

在這裏插入圖片描述

二、業務數據快照：從輸入到最終答案（含多輪反射）

0) 用户輸入

{
  "oxy_request": {
    "caller": "user",
    "callee": "reflexion_flow",
    "arguments": {
      "query": "解釋 RAG 的核心流程，並給出一個簡短示例"
    }
  }
}

1) 第1輪：工作代理 → 反射代理 → （可能）改進

{
  "worker_call": { "callee": "worker_agent", "arguments": { "query": "解釋 RAG..." } },
  "worker_response": { "output": "RAG 是... (初稿)" },

  "evaluation_query": {
    "template": "評估模板(準確性/完整性/清晰度/相關性/實用性)...",
    "filled": "原始問題: 解釋 RAG...\n答案: RAG 是..."
  },
  "reflexion_call": { "callee": "reflexion_agent", "arguments": { "query": "[filled]" } },
  "reflexion_response": {
    "output": {
      "is_satisfactory": false,
      "evaluation_reason": "缺少檢索-重寫-生成的細節",
      "improvement_suggestions": "補充檢索召回與重寫步驟，並提供代碼示例"
    }
  },

  "improvement_query": {
    "template": "改進模板",
    "filled": "[加入改進建議及上一輪答案]"
  }
}

2) 第2輪：根據改進建議再次生成與評估

{
  "worker_response": { "output": "RAG 流程：檢索->重寫->生成；示例: ... (改進稿)" },
  "reflexion_response": {
    "output": {
      "is_satisfactory": true,
      "evaluation_reason": "覆蓋要點且示例清晰",
      "improvement_suggestions": ""
    }
  },
  "final_oxy_response": {
    "state": "COMPLETED",
    "output": "通過 2 輪反射優化後的最終答案:\n\nRAG 流程...",
    "extra": { "reflexion_rounds": 2, "final_evaluation": { "is_satisfactory": true } }
  }
}

若達到 max_reflexion_rounds 仍不滿意，將彙總最後一次答案與評估理由，請備用 LLM 收尾。

三、 Reflexion 思路與步驟

核心思想：生成→評估→改進的閉環迭代，用結構化評估信號指導後續改寫。
參與角色：
- worker_agent：給出當前答案草稿；
- reflexion_agent：按模板評估答案（結構化輸出 ReflectionEvaluation）；
- 備用 LLM：在輪次用盡時做“結合反饋的最終回答”。
數據與狀態：
- original_query/current_query：原始/當前用於生成的提示；
- current_answer：最新草稿；
- evaluation：is_satisfactory/evaluation_reason/improvement_suggestions。
流程：
1. 調用 worker 生成答案並解析；
2. 按評估模板調用 reflexion 得到 ReflectionEvaluation；
3. 若滿意則完成；否則根據建議構造改進提示進入下一輪；
4. 超過 max_reflexion_rounds 用備用 LLM 收尾。

四、 Reflexion流程

sequenceDiagram
    participant User as 用户
    participant Flow as Reflexion流程
    participant Worker as 工作代理
    participant Judge as 反射代理
    participant LLM as 備用LLM

    User->>Flow: 發起請求(query)
    Flow->>Flow: 初始化(current_query=original_query)

    loop 迭代至滿意或達到上限
        Flow->>Worker: call(worker_agent, current_query)
        Worker-->>Flow: 返回答案(current_answer)

        Flow->>Judge: call(reflexion_agent, evaluation_template.format(query, answer))
        Judge-->>Flow: 返回ReflectionEvaluation

        alt 答案滿意
            Flow-->>User: COMPLETED(最終答案 + 評估摘要)
        else 需要改進
            Flow->>Flow: 構造 improvement_query(模板 + 建議 + 上一稿)
            Flow->>Flow: current_query = improvement_query
        end
    end

    alt 達到最大輪次
        Flow->>LLM: call(llm_model, 彙總最後答案與評估)
        LLM-->>Flow: 返回最終總結
        Flow-->>User: COMPLETED(最終總結 + reached_max_rounds)
    end

五、Reflexion架構與落地

1. 公開 API 與配置

入口：Reflexion._execute(oxy_request) -> OxyResponse
關鍵字段：max_reflexion_rounds、worker_agent、reflexion_agent、llm_model
解析擴展：func_parse_worker_response、func_parse_reflexion_response、pydantic_parser_reflexion
模板：evaluation_template、improvement_template

2. 角色契約

Worker：輸入 query，輸出可讀答案文本；解析器可歸一化。
Reflexion：輸入 evaluation_query，輸出 ReflectionEvaluation（結構化）。
LLM 收尾：輸入“原問題+最新答案+評估理由”，輸出最終總結。

3. 錯誤與超時策略

OxyRequest.call 統一處理：超時→FAILED，異常→FAILED，權限不足→SKIPPED。
可選重試：參考 OxyRequest.retry_execute 的語義。

4. 可觀測性與恢復

建議記錄：每輪的 current_answer 摘要與 evaluation；trace/node 消息寫 ES。
斷點恢復：可基於 from_trace_id/restart_node_id 與歷史節點輸出，恢復到上一輪繼續。

5. 性能與安全

控制輪次與模板長度，避免提示過長；current_answer 做摘要。
權限隔離評估代理，評估模板避免泄露敏感上下文。

6. 端到端最小示例

flow = Reflexion(worker_agent="worker_agent", reflexion_agent="reflexion_agent", max_reflexion_rounds=3)
resp = await flow.execute(OxyRequest(caller="user", callee="reflexion_flow", arguments={"query": "解釋 RAG"}))

7. 測試清單

工作代理多樣輸出能被解析；
評估解析文本/Pydantic 兩條路徑；
滿意/不滿意/上限收尾三分支覆蓋；
超時/異常/權限；SSE 流與取消；多會話併發不串線。

六、Reflexion與LLM 交互詳解

詳細展示 Reflexion 流程中每一步與大模型交互的完整請求與響應內容，分解其工作原理。

1. 場景示例

用户輸入問題：

"解釋 RAG 的核心流程，並給出一個簡短示例"

配置參數：

max_reflexion_rounds = 3（最多迭代 3 輪）
worker_agent = "worker_agent"（負責生成答案）
reflexion_agent = "reflexion_agent"（負責評估答案）
llm_model = "gpt-4"（備用 LLM 用於最終收尾）

第 1 輪迭代

1.1 Worker Agent 調用（生成初稿）

請求構造：

# 代碼位置：_execute 方法中
worker_response = await oxy_request.call(
    callee=self.worker_agent,  # "worker_agent"
    arguments={"query": current_query}
)

完整請求內容：

{
  "callee": "worker_agent",
  "caller": "reflexion_flow",
  "arguments": {
    "query": "解釋 RAG 的核心流程，並給出一個簡短示例"
  },
  "trace_id": "trace_abc123",
  "node_id": "node_worker_1"
}

LLM 實際收到的 Prompt（由 worker_agent 內部構造）：

System: You are a helpful AI assistant.

User: 解釋 RAG 的核心流程，並給出一個簡短示例

完整響應內容：

{
  "state": "COMPLETED",
  "output": "RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：\n\n1. 用户提出問題\n2. 系統從知識庫中檢索相關文檔\n3. 將檢索到的文檔和問題一起輸入到生成模型\n4. 生成最終答案\n\n這種方法可以讓模型基於外部知識生成更準確的答案。",
  "extra": {
    "model": "gpt-4",
    "tokens": 156
  }
}

解析後的答案（通過 func_parse_worker_response）：

current_answer = """RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：

1. 用户提出問題
2. 系統從知識庫中檢索相關文檔
3. 將檢索到的文檔和問題一起輸入到生成模型
4. 生成最終答案

這種方法可以讓模型基於外部知識生成更準確的答案。"""

1.2 Reflexion Agent 調用（評估答案）

請求構造：

# 步驟1: 填充評估模板
evaluation_query = self.evaluation_template.format(
    query=original_query, 
    answer=current_answer
)

# 步驟2: 如果使用 Pydantic 解析器，加上格式指令
if self.pydantic_parser_reflexion:
    evaluation_query = self.pydantic_parser_reflexion.format(evaluation_query)

# 步驟3: 調用反射代理
reflexion_response = await oxy_request.call(
    callee=self.reflexion_agent,
    arguments={"query": evaluation_query}
)

完整請求內容：

{
  "callee": "reflexion_agent",
  "caller": "reflexion_flow",
  "arguments": {
    "query": "Please evaluate the quality of the following answer:\n\nOriginal Question: 解釋 RAG 的核心流程，並給出一個簡短示例\n\nAnswer: RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：\n\n1. 用户提出問題\n2. 系統從知識庫中檢索相關文檔\n3. 將檢索到的文檔和問題一起輸入到生成模型\n4. 生成最終答案\n\n這種方法可以讓模型基於外部知識生成更準確的答案。\n\nPlease evaluate based on these criteria:\n1. Accuracy: Is the information correct and factual?\n2. Completeness: Does it fully address the user's question?\n3. Clarity: Is it well-structured and easy to understand?\n4. Relevance: Does it stay focused on the user's needs?\n5. Helpfulness: Does it provide practical value to the user?\n\nReturn your evaluation in the following format:\n{\n  \"is_satisfactory\": boolean,\n  \"evaluation_reason\": \"string\",\n  \"improvement_suggestions\": \"string\"\n}\n\nYour response must be valid JSON matching the schema above."
  },
  "trace_id": "trace_abc123",
  "node_id": "node_reflexion_1"
}

LLM 實際收到的 Prompt（由 reflexion_agent 內部構造）：

System: You are an expert evaluator responsible for assessing answer quality.

User: Please evaluate the quality of the following answer:

Original Question: 解釋 RAG 的核心流程，並給出一個簡短示例

Answer: RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：

1. 用户提出問題
2. 系統從知識庫中檢索相關文檔
3. 將檢索到的文檔和問題一起輸入到生成模型
4. 生成最終答案

這種方法可以讓模型基於外部知識生成更準確的答案。

Please evaluate based on these criteria:
1. Accuracy: Is the information correct and factual?
2. Completeness: Does it fully address the user's question?
3. Clarity: Is it well-structured and easy to understand?
4. Relevance: Does it stay focused on the user's needs?
5. Helpfulness: Does it provide practical value to the user?

Return your evaluation in the following format:
{
  "is_satisfactory": boolean,
  "evaluation_reason": "string",
  "improvement_suggestions": "string"
}

Your response must be valid JSON matching the schema above.

完整響應內容：

{
  "state": "COMPLETED",
  "output": "{\n  \"is_satisfactory\": false,\n  \"evaluation_reason\": \"答案解釋了 RAG 的基本概念，但缺少關鍵技術細節。流程過於簡化，沒有説明檢索的具體方法（如向量檢索、BM25等），也沒有提供用户要求的代碼示例。完整性和實用性不足。\",\n  \"improvement_suggestions\": \"1. 詳細説明檢索步驟，包括向量化、相似度計算等技術細節\\n2. 補充一個具體的代碼示例，展示如何實現簡單的 RAG 系統\\n3. 説明檢索到的文檔如何與 prompt 結合輸入到 LLM\"\n}",
  "extra": {
    "model": "gpt-4",
    "tokens": 203
  }
}

解析後的評估結果（通過 func_parse_reflexion_response）：

evaluation = ReflectionEvaluation(
    is_satisfactory=False,
    evaluation_reason="答案解釋了 RAG 的基本概念，但缺少關鍵技術細節。流程過於簡化，沒有説明檢索的具體方法（如向量檢索、BM25等），也沒有提供用户要求的代碼示例。完整性和實用性不足。",
    improvement_suggestions="1. 詳細説明檢索步驟，包括向量化、相似度計算等技術細節\n2. 補充一個具體的代碼示例，展示如何實現簡單的 RAG 系統\n3. 説明檢索到的文檔如何與 prompt 結合輸入到 LLM"
)

判斷結果：is_satisfactory = False，需要進入第 2 輪改進。

1.3 構造改進 Query（準備第 2 輪）

代碼邏輯：

if evaluation.improvement_suggestions:
    current_query = self.improvement_template.format(
        original_query=original_query,
        improvement_suggestions=evaluation.improvement_suggestions,
        previous_answer=current_answer
    )

改進後的 Query：

解釋 RAG 的核心流程，並給出一個簡短示例

Please improve your previous answer based on the following feedback:
1. 詳細説明檢索步驟，包括向量化、相似度計算等技術細節
2. 補充一個具體的代碼示例，展示如何實現簡單的 RAG 系統
3. 説明檢索到的文檔如何與 prompt 結合輸入到 LLM

Previous answer: RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：

1. 用户提出問題
2. 系統從知識庫中檢索相關文檔
3. 將檢索到的文檔和問題一起輸入到生成模型
4. 生成最終答案

這種方法可以讓模型基於外部知識生成更準確的答案。

第 2 輪迭代

2.1 Worker Agent 調用（生成改進稿）

完整請求內容：

{
  "callee": "worker_agent",
  "caller": "reflexion_flow",
  "arguments": {
    "query": "解釋 RAG 的核心流程，並給出一個簡短示例\n\nPlease improve your previous answer based on the following feedback:\n1. 詳細説明檢索步驟，包括向量化、相似度計算等技術細節\n2. 補充一個具體的代碼示例，展示如何實現簡單的 RAG 系統\n3. 説明檢索到的文檔如何與 prompt 結合輸入到 LLM\n\nPrevious answer: RAG（Retrieval-Augmented Generation）是一種結合檢索和生成的技術。它的核心流程包括：\n\n1. 用户提出問題\n2. 系統從知識庫中檢索相關文檔\n3. 將檢索到的文檔和問題一起輸入到生成模型\n4. 生成最終答案\n\n這種方法可以讓模型基於外部知識生成更準確的答案。"
  },
  "trace_id": "trace_abc123",
  "node_id": "node_worker_2"
}

完整響應內容：

{
  "state": "COMPLETED",
  "output": "# RAG 核心流程詳解\n\nRAG（Retrieval-Augmented Generation）結合了信息檢索與生成模型的優勢，核心流程如下：\n\n## 1. 文檔預處理與索引\n- 將知識庫文檔分塊（chunking）\n- 使用 Embedding 模型（如 OpenAI embeddings）將文檔向量化\n- 存儲到向量數據庫（如 FAISS、Pinecone）\n\n## 2. 用户查詢處理\n- 用户提出問題\n- 將問題向量化（使用相同的 embedding 模型）\n- 通過餘弦相似度或其他距離度量在向量庫中檢索 Top-K 相關文檔\n\n## 3. 上下文組裝與生成\n- 將檢索到的文檔片段與用户問題組合成 prompt\n- 輸入到 LLM（如 GPT-4）生成答案\n- LLM 基於檢索到的上下文生成準確回答\n\n## 簡短代碼示例\n\n```python\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.vectorstores import FAISS\nfrom langchain.llms import OpenAI\n\n# 1. 構建向量庫\nembeddings = OpenAIEmbeddings()\ndocs = [\"文檔1內容\", \"文檔2內容\", \"文檔3內容\"]\nvectorstore = FAISS.from_texts(docs, embeddings)\n\n# 2. 檢索相關文檔\nquery = \"用户的問題\"\nrelevant_docs = vectorstore.similarity_search(query, k=3)\n\n# 3. 組裝 prompt 並生成\ncontext = \"\\n\".join([doc.page_content for doc in relevant_docs])\nprompt = f\"基於以下上下文回答問題：\\n{context}\\n\\n問題：{query}\"\n\nllm = OpenAI()\nanswer = llm(prompt)\nprint(answer)\n```\n\n這個示例展示了 RAG 的完整流程：向量化存儲、相似度檢索、上下文注入生成。",
  "extra": {
    "model": "gpt-4",
    "tokens": 487
  }
}

解析後的答案：

current_answer = """# RAG 核心流程詳解

RAG（Retrieval-Augmented Generation）結合了信息檢索與生成模型的優勢，核心流程如下：

## 1. 文檔預處理與索引
- 將知識庫文檔分塊（chunking）
- 使用 Embedding 模型（如 OpenAI embeddings）將文檔向量化
- 存儲到向量數據庫（如 FAISS、Pinecone）

## 2. 用户查詢處理
- 用户提出問題
- 將問題向量化（使用相同的 embedding 模型）
- 通過餘弦相似度或其他距離度量在向量庫中檢索 Top-K 相關文檔

## 3. 上下文組裝與生成
- 將檢索到的文檔片段與用户問題組合成 prompt
- 輸入到 LLM（如 GPT-4）生成答案
- LLM 基於檢索到的上下文生成準確回答

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI

# 1. 構建向量庫
embeddings = OpenAIEmbeddings()
docs = ["文檔1內容", "文檔2內容", "文檔3內容"]
vectorstore = FAISS.from_texts(docs, embeddings)

# 2. 檢索相關文檔
query = "用户的問題"
relevant_docs = vectorstore.similarity_search(query, k=3)

# 3. 組裝 prompt 並生成
context = "\\n".join([doc.page_content for doc in relevant_docs])
prompt = f"基於以下上下文回答問題：\\n{context}\\n\\n問題：{query}"

llm = OpenAI()
answer = llm(prompt)
print(answer)

這個示例展示了 RAG 的完整流程：向量化存儲、相似度檢索、上下文注入生成。

2.2 Reflexion Agent 調用（再次評估）

完整請求內容：

{
  "callee": "reflexion_agent",
  "caller": "reflexion_flow",
  "arguments": {
    "query": "Please evaluate the quality of the following answer:\n\nOriginal Question: 解釋 RAG 的核心流程，並給出一個簡短示例\n\nAnswer: # RAG 核心流程詳解\n\nRAG（Retrieval-Augmented Generation）結合了信息檢索與生成模型的優勢，核心流程如下：\n\n## 1. 文檔預處理與索引\n- 將知識庫文檔分塊（chunking）\n- 使用 Embedding 模型（如 OpenAI embeddings）將文檔向量化\n- 存儲到向量數據庫（如 FAISS、Pinecone）\n\n## 2. 用户查詢處理\n- 用户提出問題\n- 將問題向量化（使用相同的 embedding 模型）\n- 通過餘弦相似度或其他距離度量在向量庫中檢索 Top-K 相關文檔\n\n## 3. 上下文組裝與生成\n- 將檢索到的文檔片段與用户問題組合成 prompt\n- 輸入到 LLM（如 GPT-4）生成答案\n- LLM 基於檢索到的上下文生成準確回答\n\n## 簡短代碼示例\n\n```python\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.vectorstores import FAISS\nfrom langchain.llms import OpenAI\n\n# 1. 構建向量庫\nembeddings = OpenAIEmbeddings()\ndocs = [\"文檔1內容\", \"文檔2內容\", \"文檔3內容\"]\nvectorstore = FAISS.from_texts(docs, embeddings)\n\n# 2. 檢索相關文檔\nquery = \"用户的問題\"\nrelevant_docs = vectorstore.similarity_search(query, k=3)\n\n# 3. 組裝 prompt 並生成\ncontext = \"\\n\".join([doc.page_content for doc in relevant_docs])\nprompt = f\"基於以下上下文回答問題：\\n{context}\\n\\n問題：{query}\"\n\nllm = OpenAI()\nanswer = llm(prompt)\nprint(answer)\n```\n\n這個示例展示了 RAG 的完整流程：向量化存儲、相似度檢索、上下文注入生成。\n\nPlease evaluate based on these criteria:\n1. Accuracy: Is the information correct and factual?\n2. Completeness: Does it fully address the user's question?\n3. Clarity: Is it well-structured and easy to understand?\n4. Relevance: Does it stay focused on the user's needs?\n5. Helpfulness: Does it provide practical value to the user?\n\nReturn your evaluation in the following format:\n{\n  \"is_satisfactory\": boolean,\n  \"evaluation_reason\": \"string\",\n  \"improvement_suggestions\": \"string\"\n}\n\nYour response must be valid JSON matching the schema above."
  },
  "trace_id": "trace_abc123",
  "node_id": "node_reflexion_2"
}

完整響應內容：

{
  "state": "COMPLETED",
  "output": "{\n  \"is_satisfactory\": true,\n  \"evaluation_reason\": \"答案全面覆蓋了 RAG 的核心流程，包括文檔預處理、向量化、檢索和生成四個關鍵步驟。技術細節充分（embedding、向量數據庫、相似度計算），並提供了一個清晰完整的代碼示例。結構清晰，使用 Markdown 格式易於閲讀。完全滿足用户需求。\",\n  \"improvement_suggestions\": \"\"\n}",
  "extra": {
    "model": "gpt-4",
    "tokens": 178
  }
}

解析後的評估結果：

evaluation = ReflectionEvaluation(
    is_satisfactory=True,
    evaluation_reason="答案全面覆蓋了 RAG 的核心流程，包括文檔預處理、向量化、檢索和生成四個關鍵步驟。技術細節充分（embedding、向量數據庫、相似度計算），並提供了一個清晰完整的代碼示例。結構清晰，使用 Markdown 格式易於閲讀。完全滿足用户需求。",
    improvement_suggestions=""
)

判斷結果：is_satisfactory = True，答案滿意，流程結束！

2.3 返回最終結果

代碼邏輯：

if evaluation.is_satisfactory:
    logger.info(f"Answer satisfactory after {current_round + 1} rounds")
    return OxyResponse(
        state=OxyState.COMPLETED,
        output=f"Final answer optimized through {current_round + 1} rounds of reflexion:\n\n{current_answer}",
        extra={
            "reflexion_rounds": current_round + 1,
            "final_evaluation": evaluation.dict(),
        },
    )

完整返回內容：

{
  "state": "COMPLETED",
  "output": "Final answer optimized through 2 rounds of reflexion:\n\n# RAG 核心流程詳解\n\nRAG（Retrieval-Augmented Generation）結合了信息檢索與生成模型的優勢，核心流程如下：\n\n## 1. 文檔預處理與索引\n- 將知識庫文檔分塊（chunking）\n- 使用 Embedding 模型（如 OpenAI embeddings）將文檔向量化\n- 存儲到向量數據庫（如 FAISS、Pinecone）\n\n## 2. 用户查詢處理\n- 用户提出問題\n- 將問題向量化（使用相同的 embedding 模型）\n- 通過餘弦相似度或其他距離度量在向量庫中檢索 Top-K 相關文檔\n\n## 3. 上下文組裝與生成\n- 將檢索到的文檔片段與用户問題組合成 prompt\n- 輸入到 LLM（如 GPT-4）生成答案\n- LLM 基於檢索到的上下文生成準確回答\n\n## 簡短代碼示例\n\n```python\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.vectorstores import FAISS\nfrom langchain.llms import OpenAI\n\n# 1. 構建向量庫\nembeddings = OpenAIEmbeddings()\ndocs = [\"文檔1內容\", \"文檔2內容\", \"文檔3內容\"]\nvectorstore = FAISS.from_texts(docs, embeddings)\n\n# 2. 檢索相關文檔\nquery = \"用户的問題\"\nrelevant_docs = vectorstore.similarity_search(query, k=3)\n\n# 3. 組裝 prompt 並生成\ncontext = \"\\n\".join([doc.page_content for doc in relevant_docs])\nprompt = f\"基於以下上下文回答問題：\\n{context}\\n\\n問題：{query}\"\n\nllm = OpenAI()\nanswer = llm(prompt)\nprint(answer)\n```\n\n這個示例展示了 RAG 的完整流程：向量化存儲、相似度檢索、上下文注入生成。",
  "extra": {
    "reflexion_rounds": 2,
    "final_evaluation": {
      "is_satisfactory": true,
      "evaluation_reason": "答案全面覆蓋了 RAG 的核心流程，包括文檔預處理、向量化、檢索和生成四個關鍵步驟。技術細節充分（embedding、向量數據庫、相似度計算），並提供了一個清晰完整的代碼示例。結構清晰，使用 Markdown 格式易於閲讀。完全滿足用户需求。",
      "improvement_suggestions": ""
    }
  }
}

2. 特殊情況：達到最大輪次仍不滿意

假設經過 3 輪迭代後，is_satisfactory 仍然是 False，則會觸發備用 LLM 收尾機制。

2.1 備用 LLM 調用

代碼邏輯：

# 構造最終彙總 prompt
final_query = f"""
Original user question: {original_query}

Latest answer attempt: {current_answer}

Latest evaluation feedback: {evaluation.evaluation_reason}

Please provide the best possible final answer considering all the feedback above.
"""

final_messages = [
    Message.system_message(
        "You are tasked with providing the best possible answer based on previous attempts and feedback."
    ),
    Message.user_message(final_query),
]

final_response = await oxy_request.call(
    callee=self.llm_model,  # "gpt-4"
    arguments={"messages": [msg.to_dict() for msg in final_messages]},
)

完整請求內容：

{
  "callee": "gpt-4",
  "caller": "reflexion_flow",
  "arguments": {
    "messages": [
      {
        "role": "system",
        "content": "You are tasked with providing the best possible answer based on previous attempts and feedback."
      },
      {
        "role": "user",
        "content": "\nOriginal user question: 解釋 RAG 的核心流程，並給出一個簡短示例\n\nLatest answer attempt: [第3輪的答案內容]\n\nLatest evaluation feedback: 答案雖然詳細，但代碼示例過於依賴特定框架，缺乏通用性。\n\nPlease provide the best possible final answer considering all the feedback above.\n"
      }
    ]
  },
  "trace_id": "trace_abc123",
  "node_id": "node_finalizer"
}

完整響應內容：

{
  "state": "COMPLETED",
  "output": "# RAG（檢索增強生成）完整指南\n\n## 核心流程\n\nRAG 是一種將外部知識檢索與大語言模型生成能力結合的技術架構：\n\n### 1. 離線準備階段\n- **文檔分割**：將知識庫切分為適當大小的文本塊\n- **向量化**：使用 embedding 模型轉換為向量表示\n- **索引構建**：存儲到向量數據庫便於快速檢索\n\n### 2. 在線查詢階段\n- **查詢向量化**：將用户問題轉為向量\n- **相似度檢索**：找出最相關的 Top-K 文檔片段\n- **上下文增強**：將檢索結果與問題組合\n- **LLM 生成**：基於增強上下文生成準確答案\n\n## 通用實現示例（不依賴特定框架）\n\n```python\nimport numpy as np\nfrom openai import OpenAI\n\nclient = OpenAI()\n\n# 1. 構建知識庫向量索引\ndef build_index(documents):\n    embeddings = []\n    for doc in documents:\n        response = client.embeddings.create(\n            input=doc,\n            model=\"text-embedding-ada-002\"\n        )\n        embeddings.append(response.data[0].embedding)\n    return np.array(embeddings), documents\n\n# 2. 檢索相關文檔\ndef retrieve(query, embeddings, documents, top_k=3):\n    query_response = client.embeddings.create(\n        input=query,\n        model=\"text-embedding-ada-002\"\n    )\n    query_vec = np.array(query_response.data[0].embedding)\n    \n    # 計算餘弦相似度\n    similarities = np.dot(embeddings, query_vec)\n    top_indices = np.argsort(similarities)[-top_k:][::-1]\n    \n    return [documents[i] for i in top_indices]\n\n# 3. RAG 生成\ndef rag_generate(query, knowledge_base):\n    embeddings, documents = build_index(knowledge_base)\n    relevant_docs = retrieve(query, embeddings, documents)\n    \n    context = \"\\n---\\n\".join(relevant_docs)\n    prompt = f\"\"\"根據以下參考資料回答問題：\n\n{context}\n\n問題：{query}\n\n請基於上述資料提供準確答案。\"\"\"\n    \n    response = client.chat.completions.create(\n        model=\"gpt-4\",\n        messages=[\n            {\"role\": \"system\", \"content\": \"你是一個基於提供資料回答問題的助手。\"},\n            {\"role\": \"user\", \"content\": prompt}\n        ]\n    )\n    \n    return response.choices[0].message.content\n\n# 使用示例\nknowledge = [\n    \"Python 是一種高級編程語言\",\n    \"RAG 結合了檢索和生成技術\",\n    \"向量數據庫用於高效相似度搜索\"\n]\n\nanswer = rag_generate(\"什麼是 RAG？\", knowledge)\nprint(answer)\n```\n\n這個示例使用原生 API 實現，展示了 RAG 的完整數據流：文檔向量化 → 查詢檢索 → 上下文注入 → 生成答案。",
  "extra": {
    "model": "gpt-4",
    "tokens": 612
  }
}

最終返回：

{
  "state": "COMPLETED",
  "output": "Answer after 4 rounds of reflexion attempts:\n\n[備用 LLM 生成的最終答案]",
  "extra": {
    "reflexion_rounds": 4,
    "final_evaluation": {
      "is_satisfactory": false,
      "evaluation_reason": "答案雖然詳細，但代碼示例過於依賴特定框架，缺乏通用性。",
      "improvement_suggestions": "使用更通用的實現方式"
    },
    "reached_max_rounds": true
  }
}

3. 關鍵技術點總結

3.1. Worker Agent 的作用

負責生成答案草稿
接收 query 參數（第一輪是原始問題，後續輪是改進 query）
輸出可讀的答案文本
可通過 func_parse_worker_response 自定義解析邏輯

3.2. Reflexion Agent 的作用

負責評估答案質量
接收填充好的 evaluation_template（包含問題和答案）
輸出結構化的 ReflectionEvaluation 對象
關鍵字段：
- is_satisfactory：布爾值，決定是否繼續迭代
- evaluation_reason：評估理由
- improvement_suggestions：改進建議（用於下一輪）

3.3. 模板系統

evaluation_template：

用途：評估答案質量
變量：{query}, {answer}
輸出：指導 reflexion agent 返回結構化評估

improvement_template：

用途：構造改進 query
變量：{original_query}, {improvement_suggestions}, {previous_answer}
輸出：讓 worker agent 理解需要改進的方向

3.4. 解析器系統

Pydantic Parser：

自動在 prompt 中添加 JSON schema 説明
自動解析 LLM 返回的 JSON 字符串
提供類型驗證和錯誤處理

自定義解析器：

func_parse_worker_response：處理 worker 返回（如提取特定格式）
func_parse_reflexion_response：處理 reflexion 返回（如文本格式評估）

3.5. 控制流邏輯

for round in range(max_rounds + 1):
    # 1. Worker 生成答案
    answer = worker_agent(current_query)
    
    # 2. Reflexion 評估答案
    evaluation = reflexion_agent(evaluation_template.format(query, answer))
    
    # 3. 判斷是否滿意
    if evaluation.is_satisfactory:
        return answer  # 提前結束
    
    # 4. 構造改進 query
    if round < max_rounds:
        current_query = improvement_template.format(...)
    else:
        # 達到上限，調用備用 LLM
        return finalizer_llm(original_query, answer, evaluation)

3.6. 數據流轉

用户問題
   ↓
[Round 1]
   ↓ current_query (原始問題)
Worker Agent → answer_1
   ↓
Reflexion Agent → evaluation_1 (不滿意)
   ↓ 生成 improvement_query_1
[Round 2]
   ↓ current_query (包含改進建議)
Worker Agent → answer_2
   ↓
Reflexion Agent → evaluation_2 (滿意)
   ↓
返回最終答案

4. 可觀測性建議

為了更好地理解和調試 Reflexion 流程，建議記錄以下內容：

# 每一輪的完整追蹤
{
  "round": 2,
  "worker": {
    "request": {"query": "..."},
    "response": {"output": "..."},
    "parsed_answer": "..."
  },
  "reflexion": {
    "request": {"query": "...評估模板..."},
    "response": {"output": "{\"is_satisfactory\": true, ...}"},
    "parsed_evaluation": {
      "is_satisfactory": true,
      "evaluation_reason": "...",
      "improvement_suggestions": ""
    }
  },
  "decision": "COMPLETED"
}

這樣可以清晰地看到每一輪的輸入輸出，便於分析哪個環節出現問題。

5. 小結總結

Reflexion 的核心價值在於：

結構化迭代：不是簡單重試，而是基於評估反饋進行有針對性的改進
可控收斂：通過 max_reflexion_rounds 限制成本和延遲
可擴展性：通過模板、解析器、自定義 agent 適配不同場景
可觀測性：每一步的輸入輸出都可被追蹤和記錄

通過理解完整的 LLM 交互過程，你可以：

優化模板提示詞以提高評估準確性
調整解析器以支持不同的輸出格式
定製 worker/reflexion agent 以適配特定領域
監控和分析每輪迭代的效果

博客 / 詳情