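Using Hugging Face Models With Spring AI and Ollama

1. Overview

Artificial intelligence is changing the way we build web applications. Hugging Face is a popular platform that hosts a large collection of open-source, pre-trained LLMs.

We can use Ollama, an open-source tool, to run LLMs on our local machine. It supports running GGUF-format models from Hugging Face.

In this tutorial, we'll explore how to use Hugging Face models with Spring AI and Ollama. We'll build a simple chatbot using a chat completion model and implement semantic search using an embedding model.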

2. Dependencies

Let's start by adding the necessary dependency to our project's pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-M6</version>
</dependency>

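The Ollama starter dependency helps us establish a connection with the Ollama service. We'll use it to pull and run our chat completion and embedding models.

Since the current version, 1.0.0-M6, is a milestone release, we also need to add the Spring Milestones repository to our pom.xml: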

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

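This repository is where milestone versions are published, as opposed to the standard Maven Central repository.

3. Setting up Ollama With Testcontainers

To facilitate local development and testing, we'll use Testcontainers to set up the Ollama service.

3.1. Test Dependencies

First, let's add the necessary test dependencies to our pom.xml: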

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-testcontainers</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <scope>test</scope>
</dependency>

We import the Spring AI Testcontainers dependency for Spring Boot and the Ollama module of Testcontainers.

3.2. Defining Testcontainers Beans

Next, let's create a @TestConfiguration class that defines our Testcontainers beans:

@TestConfiguration(proxyBeanMethods = false)
class TestcontainersConfiguration {
    @Bean
    public OllamaContainer ollamaContainer() {
        return new OllamaContainer("ollama/ollama:0.5.4");
    }

    @Bean
    public DynamicPropertyRegistrar dynamicPropertyRegistrar(OllamaContainer ollamaContainer) {
        return registry -> {
            registry.add("spring.ai.ollama.base-url", ollamaContainer::getEndpoint);
        };
    }
}

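When creating the OllamaContainer bean, we specify the Ollama image version, 0.5.4, which was the latest stable release at the time of writing.

Then, we define a DynamicPropertyRegistrar bean to configure the base-url of the Ollama service. This allows our application to connect to the started Ollama container.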
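
Note that we pass the method reference ollamaContainer::getEndpoint rather than a computed value. The registrar stores a supplier, so the endpoint can be resolved later, once the container has started and its mapped port is known. The plain-Java sketch below illustrates only this idea. FakeContainer and LazyPropertyRegistry are hypothetical stand-ins, not Spring or Testcontainers classes, and the port number is made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a container whose port is only known after start().
class FakeContainer {
    private Integer mappedPort; // null until the container is "started"

    void start() {
        mappedPort = 32768; // illustrative port number
    }

    String getEndpoint() {
        if (mappedPort == null) {
            throw new IllegalStateException("container not started");
        }
        return "http://localhost:" + mappedPort;
    }
}

// Hypothetical registry that stores suppliers and resolves them on demand.
class LazyPropertyRegistry {
    private final Map<String, Supplier<Object>> suppliers = new LinkedHashMap<>();

    void add(String name, Supplier<Object> valueSupplier) {
        suppliers.put(name, valueSupplier); // nothing is evaluated yet
    }

    Object resolve(String name) {
        return suppliers.get(name).get(); // evaluated only when requested
    }
}
```

Registering the supplier before calling start() works in this sketch because getEndpoint() is invoked only inside resolve().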

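3.3. Using Testcontainers During Development

While Testcontainers is primarily used for integration testing, we can use it during local development too.

To achieve this, we'll create a separate main class in our src/test/java directory: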

public class TestApplication {
    public static void main(String[] args) {
        SpringApplication.from(Application::main)
          .with(TestcontainersConfiguration.class)
          .run(args);
    }
}

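We create a TestApplication class and, inside its main() method, start our main Application class with the TestcontainersConfiguration class.

This setup lets us run our Spring Boot application and have it connect to the Ollama service started via Testcontainers.

4. Using a Chat Completion Model

Now that we have our local Ollama container set up, let's use a chat completion model to build a simple chatbot.

4.1. Configuring Chat Model and Chatbot Beans

Let's start by configuring a chat completion model in our application.yaml file: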

spring:
  ai:
    ollama:
      init:
        pull-model-strategy: when_missing
      chat:
        options:
          model: hf.co/microsoft/Phi-3-mini-4k-instruct-gguf

To configure a Hugging Face model, we use the format hf.co/{username}/{repository}. Here, we specify the GGUF version of the Phi-3-mini-4k-instruct model provided by Microsoft.
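
To make the naming format concrete, here's a minimal sketch that builds and parses such a model name. HuggingFaceModelRef is a hypothetical helper we introduce purely for illustration. Spring AI and Ollama only need the final string:

```java
// Hypothetical helper; Spring AI and Ollama only need the final string.
record HuggingFaceModelRef(String username, String repository) {

    private static final String PREFIX = "hf.co/";

    // Builds the model name in the hf.co/{username}/{repository} format.
    String toModelName() {
        return PREFIX + username + "/" + repository;
    }

    // Parses a model name back into its parts, rejecting other formats.
    static HuggingFaceModelRef parse(String modelName) {
        if (!modelName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Expected prefix " + PREFIX + ": " + modelName);
        }
        String[] parts = modelName.substring(PREFIX.length()).split("/");
        if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank()) {
            throw new IllegalArgumentException("Expected hf.co/{username}/{repository}: " + modelName);
        }
        return new HuggingFaceModelRef(parts[0], parts[1]);
    }
}
```

For our configured model, parse() yields the username microsoft and the repository Phi-3-mini-4k-instruct-gguf.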

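Using this model isn't a strict requirement for our implementation. We recommend setting up the codebase locally and trying out other chat completion models.

Additionally, we set the pull-model-strategy to when_missing. This ensures that Spring AI pulls the specified model if it's not available locally.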
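
The decision behind this strategy can be sketched in a few lines of plain Java. This is a hypothetical model of the behavior, not Spring AI's actual code. We also assume the other strategies, always and never, behave as their names suggest:

```java
import java.util.Set;

// Hypothetical model of the pull decision; not Spring AI's actual code.
enum PullStrategy { ALWAYS, WHEN_MISSING, NEVER }

class ModelPullDecision {

    // Returns true when the model should be downloaded before use.
    static boolean shouldPull(PullStrategy strategy, Set<String> localModels, String model) {
        return switch (strategy) {
            case ALWAYS -> true;
            case WHEN_MISSING -> !localModels.contains(model);
            case NEVER -> false;
        };
    }
}
```

With WHEN_MISSING, a model that's already present locally is reused, which avoids repeating a large download on every startup.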

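On configuring a valid model, Spring AI automatically creates a bean of type ChatModel, allowing us to interact with the chat completion model.

Let's use it to define the additional beans required for our chatbot: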

@Configuration
class ChatbotConfiguration {
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient
          .builder(chatModel)
          .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
          .build();
    }
}

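First, we define a ChatMemory bean using the InMemoryChatMemory implementation. It maintains the conversation context by storing the chat history in memory.

Next, using the ChatMemory and ChatModel beans, we create a bean of type ChatClient, which is our main entry point for interacting with the chat completion model.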
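
To build some intuition for what an in-memory chat memory does, here's a simplified sketch that keeps one message list per conversation ID. ConversationMemorySketch is a hypothetical class of our own. Spring AI's InMemoryChatMemory stores richer message objects and differs in detail:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified memory; not thread-safe and not Spring AI's implementation.
class ConversationMemorySketch {
    private final Map<String, List<String>> history = new HashMap<>();

    // Appends a message to the conversation identified by conversationId.
    void add(String conversationId, String message) {
        history.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    // Returns up to lastN of the most recent messages, oldest first.
    List<String> get(String conversationId, int lastN) {
        List<String> messages = history.getOrDefault(conversationId, List.of());
        int from = Math.max(0, messages.size() - lastN);
        return List.copyOf(messages.subList(from, messages.size()));
    }
}
```

Because each conversation has its own list, messages from one conversation never leak into the context of another.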

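4.2. Implementing a Chatbot

With our configuration in place, let's create a ChatbotService class. We'll inject the ChatClient bean we defined earlier to interact with our model.

But first, let's define two simple records to represent the chat request and response: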

record ChatRequest(@Nullable UUID chatId, String question) {}

record ChatResponse(UUID chatId, String answer) {}

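The ChatRequest contains the user's question and an optional chatId that identifies an ongoing conversation.

Similarly, the ChatResponse contains the chatId and the chatbot's answer.

Now, let's implement the intended functionality: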

public ChatResponse chat(ChatRequest chatRequest) {
    UUID chatId = Optional
      .ofNullable(chatRequest.chatId())
      .orElse(UUID.randomUUID());
    String answer = chatClient
      .prompt()
      .user(chatRequest.question())
      .advisors(advisorSpec ->
          advisorSpec
            .param("chat_memory_conversation_id", chatId))
      .call()
      .content();
    return new ChatResponse(chatId, answer);
}

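If the incoming request doesn't contain a chatId, we generate a new one. This allows the user to start a new conversation or continue an existing one.

We pass the user's question to the chatClient bean and set the chat_memory_conversation_id parameter to the resolved chatId to maintain the conversation history.

Finally, we return the chatbot's answer along with the chatId.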
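
The chatId resolution step can also be isolated and checked on its own. The sketch below is a small variation that uses orElseGet(), so a random UUID is only generated when the request carries none. ChatIdResolver is a hypothetical helper name:

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper that isolates the chatId resolution step.
class ChatIdResolver {

    // Reuses the caller's ID when present; otherwise starts a new conversation.
    static UUID resolve(UUID requestedChatId) {
        return Optional
          .ofNullable(requestedChatId)
          .orElseGet(UUID::randomUUID);
    }
}
```

Passing an existing ID returns it unchanged, while passing null yields a fresh ID on every call.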

4.3. Interacting With Our Chatbot

Now that we've implemented our service layer, let's expose a REST API on top of it:

@PostMapping("/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest chatRequest) {
    ChatResponse chatResponse = chatbotService.chat(chatRequest);
    return ResponseEntity.ok(chatResponse);
}

We'll use the above API endpoint to interact with our chatbot.

Let's use the HTTPie CLI to start a new conversation:

http POST :8080/chat question="Who wanted to kill Harry Potter?"

We send a simple question to the chatbot. Let's see what we receive as a response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Lord Voldemort, also known as Tom Riddle, wanted to kill Harry Potter because of a prophecy that foretold a boy born at the end of July would have the power to defeat him."
}

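The response contains a unique chatId along with the chatbot's answer to our question.

Let's continue this conversation by sending a follow-up question using the chatId from the above response: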

http POST :8080/chat chatId="7b8a36c7-2126-4b80-ac8b-f9eedebff28a" question="Who should he have gone after instead?"

Let's see whether the chatbot can maintain the context of our conversation and provide a relevant response:

{
    "chatId": "7b8a36c7-2126-4b80-ac8b-f9eedebff28a",
    "answer": "Based on the prophecy's criteria, Voldemort could have targeted Neville Longbottom instead, as he was also born at the end of July to parents who had defied Voldemort three times."
}

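As we can see, the chatbot does indeed maintain the conversation context, as it refers to the prophecy discussed in the previous message.

The chatId remains the same, indicating that the follow-up answer is a continuation of the same conversation.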
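
If we prefer to call the endpoint from Java instead of HTTPie, we could build the same requests with the JDK's java.net.http API. The following is a minimal sketch. It assumes the application listens on localhost:8080, and ChatRequestSketch with its hand-rolled JSON rendering is a hypothetical helper, where a real client would typically use a JSON library:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; assumes the application runs on localhost:8080.
class ChatRequestSketch {

    // Builds a POST request for /chat; chatId may be null for a new conversation.
    static HttpRequest build(String chatId, String question) {
        return HttpRequest
          .newBuilder(URI.create("http://localhost:8080/chat"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(toJson(chatId, question)))
          .build();
    }

    // Minimal JSON rendering; only escapes backslashes and double quotes.
    static String toJson(String chatId, String question) {
        String questionField = "\"question\":\"" + escape(question) + "\"";
        if (chatId == null) {
            return "{" + questionField + "}";
        }
        return "{\"chatId\":\"" + escape(chatId) + "\"," + questionField + "}";
    }

    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

We can then send the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while the application is running.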

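5. Using an Embedding Model

Moving on from the chat completion model, we'll now use an embedding model to implement semantic search over a small dataset of quotes.

We'll fetch the quotes from an external API, store them in an in-memory vector store, and perform a semantic search.

5.1. Fetching Quote Records From an External API

For our demonstration, we'll use the QuoteSlate API to fetch quotes.

Let's create a QuoteFetcher utility class for this: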

class QuoteFetcher {
    private static final String BASE_URL = "https://quoteslate.vercel.app";
    private static final String API_PATH = "/api/quotes/random";
    private static final int DEFAULT_COUNT = 50;

    public static List<Quote> fetch() {
        return RestClient
          .create(BASE_URL)
          .get()
          .uri(uriBuilder ->
              uriBuilder
                .path(API_PATH)
                .queryParam("count", DEFAULT_COUNT)
                .build())
          .retrieve()
          .body(new ParameterizedTypeReference<>() {});
    }
}

record Quote(String quote, String author) {}

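Using RestClient, we invoke the QuoteSlate API with a default count of 50 and use ParameterizedTypeReference to deserialize the API response into a list of Quote records.

5.2. Configuring and Populating an In-Memory Vector Store

Now, let's configure an embedding model in our application.yaml: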

spring:
  ai:
    ollama:
      embedding:
        options:
          model: hf.co/nomic-ai/nomic-embed-text-v1.5-GGUF

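We use the GGUF version of the nomic-embed-text-v1.5 model provided by nomic-ai. Again, feel free to try this implementation with a different embedding model.

After specifying a valid model, Spring AI automatically creates a bean of type EmbeddingModel for us.

Let's use it to create a vector store bean: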

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
    return SimpleVectorStore
      .builder(embeddingModel)
      .build();
}

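For our demonstration, we create a bean of the SimpleVectorStore class. It's an in-memory implementation that emulates a vector store using the java.util.Map class.

Now, to populate our vector store with quotes during application startup, we'll create a VectorStoreInitializer class that implements the ApplicationRunner interface: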

@Component
class VectorStoreInitializer implements ApplicationRunner {
    private final VectorStore vectorStore;

    // standard constructor

    @Override
    public void run(ApplicationArguments args) {
        List<Document> documents = QuoteFetcher
          .fetch()
          .stream()
          .map(quote -> {
              Map<String, Object> metadata = Map.of("author", quote.author());
              return new Document(quote.quote(), metadata);
          })
          .toList();
        vectorStore.add(documents);
    }
}

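In our VectorStoreInitializer, we autowire an instance of VectorStore.

Inside the run() method, we use our QuoteFetcher utility class to retrieve a list of Quote records. Then, we map each quote into a Document and set the author field as metadata.

Finally, we store all the documents in our vector store. When we invoke the add() method, Spring AI automatically converts our plaintext content into a vector representation before storing it. We don't need to explicitly convert it using the EmbeddingModel bean.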
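
The "embed, then store" behavior of add() can be sketched with a map-backed store that takes an embedding function. Both classes below are hypothetical. In particular, ToyEmbedder just counts characters, whereas a real embedding model returns a learned vector with hundreds of dimensions:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified store; not how SimpleVectorStore is actually written.
class EmbeddingOnAddStore {
    private final Function<String, double[]> embedder;
    private final Map<String, double[]> vectorsByText = new LinkedHashMap<>();

    EmbeddingOnAddStore(Function<String, double[]> embedder) {
        this.embedder = embedder;
    }

    // Converts the text to a vector first, then stores the pair.
    void add(String text) {
        vectorsByText.put(text, embedder.apply(text));
    }

    double[] vectorOf(String text) {
        return vectorsByText.get(text);
    }
}

// Toy embedding: counts of vowels, consonants, and spaces.
class ToyEmbedder {
    static double[] embed(String text) {
        double vowels = 0, consonants = 0, spaces = 0;
        for (char c : text.toLowerCase().toCharArray()) {
            if ("aeiou".indexOf(c) >= 0) {
                vowels++;
            } else if (Character.isLetter(c)) {
                consonants++;
            } else if (c == ' ') {
                spaces++;
            }
        }
        return new double[] {vowels, consonants, spaces};
    }
}
```

The caller only ever passes text to add(), and the store takes care of producing the vector, mirroring how we never touch the EmbeddingModel bean ourselves.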

5.3. Testing Semantic Search

With our vector store populated, let's validate our semantic search functionality:

private static final int MAX_RESULTS = 3;

@ParameterizedTest
@ValueSource(strings = {"Motivation", "Happiness"})
void whenSearchingQuotesByTheme_thenRelevantQuotesReturned(String theme) {
    SearchRequest searchRequest = SearchRequest
      .builder()
      .query(theme)
      .topK(MAX_RESULTS)
      .build();
    List<Document> documents = vectorStore.similaritySearch(searchRequest);

    assertThat(documents)
      .hasSizeBetween(1, MAX_RESULTS)
      .allSatisfy(document -> {
          String author = String.valueOf(document.getMetadata().get("author"));
          assertThat(author)
            .isNotBlank();
      });
}

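Here, we pass a few common quote themes to our test method using @ValueSource. We then create a SearchRequest object with the theme as the query and MAX_RESULTS as the maximum number of results to return.

Next, we call the similaritySearch() method of our vectorStore bean with our searchRequest. Similar to the add() method of the VectorStore, Spring AI converts our query into its vector representation before querying the vector store.

The returned documents will contain quotes that are semantically related to the given theme, even if they don't contain the exact keyword.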
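
To see how a similarity search can rank results once everything is a vector, here's a minimal sketch of a top-K search. We assume cosine similarity as the ranking measure, which is a common choice for embeddings. TopKSearchSketch is a hypothetical class, and the two-dimensional vectors we'd feed it are purely illustrative:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a top-K similarity search over precomputed vectors.
class TopKSearchSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|); assumes equal-length, non-zero vectors.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the keys of the topK stored vectors most similar to the query vector.
    static List<String> search(Map<String, double[]> store, double[] query, int topK) {
        return store.entrySet().stream()
          .sorted(Comparator.comparingDouble(
              (Map.Entry<String, double[]> entry) -> cosineSimilarity(entry.getValue(), query))
            .reversed())
          .limit(topK)
          .map(Map.Entry::getKey)
          .toList();
    }
}
```

Since the ranking depends only on the angle between vectors, two texts can match without sharing any keyword, as long as the embedding model places them close together.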

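6. Conclusion

In this article, we've explored using Hugging Face models with Spring AI.

Using Testcontainers, we set up the Ollama service, creating a local test environment.

First, we used a chat completion model to build a simple chatbot. Then, we implemented semantic search using an embedding model.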
