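A Guide to OpenAI Moderation Models in Spring AI

1. Introduction

We use Spring AI together with OpenAI's Moderation model to detect harmful or sensitive content in text. The Moderation model analyzes the input and flags categories such as self-harm, violence, hate speech, or sexual content.

In this tutorial, we'll learn how to build a moderation service and integrate it with the Moderation model.

2. Dependencies

Let's add the spring-ai-starter-model-openai dependency: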

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

With it, we get the OpenAI chat client along with support for moderation model requests.

3. Configuration

Next, we'll configure our Spring AI client:

spring:
  ai:
    openai:
      api-key: ${OPEN_AI_API_KEY}
      moderation:
        options:
          model: omni-moderation-latest

We've specified the API key and the moderation model name. Now we can start using the moderation API.
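
The api-key property above refers to an environment variable named OPEN_AI_API_KEY. As a minimal sketch, assuming we want to fail early with a clear message when that variable is unset, we could run a check like the one below. ApiKeyCheck and requireApiKey() are hypothetical names used for illustration and aren't part of Spring AI:

```java
import java.util.Map;

// Hypothetical helper, not part of Spring AI: checks that the environment
// variable referenced by ${OPEN_AI_API_KEY} is present and non-blank.
class ApiKeyCheck {

    static final String KEY_NAME = "OPEN_AI_API_KEY";

    // Returns the trimmed key, or throws if it is missing or blank.
    static String requireApiKey(Map<String, String> env) {
        String key = env.get(KEY_NAME);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(KEY_NAME + " is not set");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        // System.getenv() exposes the process environment as a map.
        String key = requireApiKey(System.getenv());
        System.out.println("API key found, length: " + key.length());
    }
}
```

Passing the environment in as a map keeps the check easy to exercise in a unit test without touching the real process environment.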

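4. Moderation Categories

Here are the moderation categories available to us:

Hate: content that expresses or promotes hatred based on protected characteristics.
Hate/Threatening: hateful content that also includes threats of violence or serious harm.
Harassment: language that harasses, bullies, or targets an individual or a group.
Harassment/Threatening: harassment that includes explicit threats or an intent to cause harm.
Self-Harm: content that promotes or depicts acts of self-harm.
Self-Harm/Intent: content in which someone expresses an intention to harm themselves.
Self-Harm/Instructions: content that provides instructions, methods, or encouragement for self-harm.
Sexual: explicit sexual content or the promotion of sexual services.
Sexual/Minors: any sexual content involving minors, which is strictly prohibited.
Violence: content that depicts or describes death, violence, or physical injury.
Violence/Graphic: vivid or graphic depictions of serious injury, death, or severe harm.
Illicit: content that describes, instructs on, or promotes illegal activities.
Illicit/Violent: illicit content that also contains violent elements.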
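
The list above can be sketched as an enum, which makes the parent and subcategory relationship explicit. This is an illustration only: ModerationCategory is a hypothetical type (Spring AI exposes the flags through its own Categories class), and the lowercase, slash-separated names are assumed to match the keys in the OpenAI moderation response:

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical enum for illustration; not a Spring AI type.
enum ModerationCategory {
    HATE("hate"),
    HATE_THREATENING("hate/threatening"),
    HARASSMENT("harassment"),
    HARASSMENT_THREATENING("harassment/threatening"),
    SELF_HARM("self-harm"),
    SELF_HARM_INTENT("self-harm/intent"),
    SELF_HARM_INSTRUCTIONS("self-harm/instructions"),
    SEXUAL("sexual"),
    SEXUAL_MINORS("sexual/minors"),
    VIOLENCE("violence"),
    VIOLENCE_GRAPHIC("violence/graphic"),
    ILLICIT("illicit"),
    ILLICIT_VIOLENT("illicit/violent");

    private final String apiName;

    ModerationCategory(String apiName) {
        this.apiName = apiName;
    }

    String apiName() {
        return apiName;
    }

    // A name containing "/" is a subcategory of the part before the slash.
    Optional<ModerationCategory> parent() {
        int slash = apiName.indexOf('/');
        if (slash < 0) {
            return Optional.empty();
        }
        return fromApiName(apiName.substring(0, slash));
    }

    // Looks up a category by its assumed API name.
    static Optional<ModerationCategory> fromApiName(String name) {
        return Arrays.stream(values())
          .filter(category -> category.apiName.equals(name))
          .findFirst();
    }
}
```

For example, ModerationCategory.HATE_THREATENING.parent() resolves to HATE, while top-level categories such as VIOLENCE have no parent.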

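5. Building a Moderation Service

Now, let's build the moderation service. In this service, we'll take messages submitted by users and validate them against the different categories using the moderation model.

5.1. TextModerationService

Let's start by building the TextModerationService: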

@Service
public class TextModerationService {

    private final OpenAiModerationModel openAiModerationModel;

    @Autowired
    public TextModerationService(OpenAiModerationModel openAiModerationModel) {
        this.openAiModerationModel = openAiModerationModel;
    }

    public String moderate(String text) {
        ModerationPrompt moderationRequest = new ModerationPrompt(text);
        ModerationResponse response = openAiModerationModel.call(moderationRequest);
        Moderation output = response.getResult().getOutput();

        return output.getResults().stream()
          .map(this::buildModerationResult)
          .collect(Collectors.joining("\n"));
    }
}

Here, we've used the OpenAiModerationModel. We send it a ModerationPrompt containing the text to moderate, and then build the result from the model's response. Now, let's create the buildModerationResult() method:

private String buildModerationResult(ModerationResult moderationResult) {
    Categories categories = moderationResult.getCategories();

    String violations = Stream.of(
        Map.entry("Sexual", categories.isSexual()),
        Map.entry("Hate", categories.isHate()),
        Map.entry("Harassment", categories.isHarassment()),
        Map.entry("Self-Harm", categories.isSelfHarm()),
        Map.entry("Sexual/Minors", categories.isSexualMinors()),
        Map.entry("Hate/Threatening", categories.isHateThreatening()),
        Map.entry("Violence/Graphic", categories.isViolenceGraphic()),
        Map.entry("Self-Harm/Intent", categories.isSelfHarmIntent()),
        Map.entry("Self-Harm/Instructions", categories.isSelfHarmInstructions()),
        Map.entry("Harassment/Threatening", categories.isHarassmentThreatening()),
        Map.entry("Violence", categories.isViolence()))
      .filter(entry -> Boolean.TRUE.equals(entry.getValue()))
      .map(Map.Entry::getKey)
      .collect(Collectors.joining(", "));

    return violations.isEmpty()
      ? "No category violations detected."
      : "Violated categories: " + violations;
}

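We've obtained the categories from the moderation result and created a map entry for each category with its violation flag. We then keep only the flagged categories and join their names. If no category is violated, we return a default text response instead.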
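
To see the filter-and-join step in isolation, here's a minimal sketch that runs without Spring AI. ViolationFormatter is a hypothetical class, and the flags are passed in as a plain map that stands in for the Categories object:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for buildModerationResult(), working on a plain map.
class ViolationFormatter {

    static String format(Map<String, Boolean> flags) {
        String violations = flags.entrySet().stream()
          // Boolean.TRUE.equals() treats a null flag as "not violated".
          .filter(entry -> Boolean.TRUE.equals(entry.getValue()))
          .map(Map.Entry::getKey)
          .collect(Collectors.joining(", "));

        return violations.isEmpty()
          ? "No category violations detected."
          : "Violated categories: " + violations;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is stable.
        Map<String, Boolean> flags = new LinkedHashMap<>();
        flags.put("Sexual", false);
        flags.put("Harassment", true);
        flags.put("Violence", true);

        System.out.println(format(flags)); // prints "Violated categories: Harassment, Violence"
    }
}
```

The order of names in the output follows the order of the entries, which is why the service lists categories in the order they appear in its Stream.of() call.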

5.2. TextModerationController

Before building the controller, let's create a ModerateRequest, which we'll use to send the text for moderation:

public class ModerateRequest {

    private String text;

    // getters and setters
}

Now, let's create the TextModerationController:

@RestController
public class TextModerationController {

    private final TextModerationService service;

    @Autowired
    public TextModerationController(TextModerationService service) {
        this.service = service;
    }

    @PostMapping("/moderate")
    public ResponseEntity<String> moderate(@RequestBody ModerateRequest request) {
        return ResponseEntity.ok(service.moderate(request.getText()));
    }
}

Here, we've taken the text from the ModerateRequest and passed it to our TextModerationService.
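
To call this endpoint from outside the application, we could use the JDK's HttpClient. The following is a sketch under stated assumptions: ModerateClient is a hypothetical class, and http://localhost:8080 is only a guess at where the application runs:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the /moderate endpoint.
class ModerateClient {

    // Builds the POST request; baseUrl is an assumption about the deployment.
    static HttpRequest buildRequest(String baseUrl, String text) {
        String json = "{\"text\": \"" + escape(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/moderate"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(json))
          .build();
    }

    // Minimal escaping of backslashes and double quotes only;
    // a JSON library would be the safer choice for arbitrary input.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8080", "Please review me");
        HttpResponse<String> response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The request body has the same shape as the ModerateRequest class, with a single text field.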

5.3. Testing the Behavior

Finally, let's test our moderation service:

@AutoConfigureMockMvc
@ExtendWith(SpringExtension.class)
@EnableAutoConfiguration
@SpringBootTest
@ActiveProfiles("moderation")
class ModerationApplicationLiveTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void givenTextWithoutViolation_whenModerating_thenNoCategoryViolationsDetected() throws Exception {
        String moderationResponse = mockMvc.perform(post("/moderate")
            .contentType(MediaType.APPLICATION_JSON)
            .content("{\"text\": \"Please review me\"}"))
          .andExpect(status().isOk())
          .andReturn()
          .getResponse()
          .getContentAsString();

        assertThat(moderationResponse).contains("No category violations detected");
    }
}

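We've sent a text that doesn't violate any category, and the service has confirmed that. Now, let's test how it behaves when some categories are violated: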

@Test
void givenHarassingText_whenModerating_thenHarassmentCategoryShouldBeFlagged() throws Exception {
    String moderationResponse = mockMvc.perform(post("/moderate")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"text\": \"You're really Bad Person! I don't like you!\"}"))
      .andExpect(status().isOk())
      .andReturn()
      .getResponse()
      .getContentAsString();

    assertThat(moderationResponse).contains("Violated categories: Harassment");
}

As we can see, the harassment category was flagged as expected. Now, let's check whether our service can handle multiple violations:

@Test
void givenTextViolatingMultipleCategories_whenModerating_thenAllCategoriesShouldBeFlagged() throws Exception {
    String moderationResponse = mockMvc.perform(post("/moderate")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"text\": \"I hate you and I will hurt you!\"}"))
      .andExpect(status().isOk())
      .andReturn()
      .getResponse()
      .getContentAsString();

    assertThat(moderationResponse).contains("Violated categories: Harassment, Harassment/Threatening, Violence");
}

We've sent a text containing multiple violations. As we can see, the service response confirms violations in three categories.
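
The exact-string assertion above depends on the order in which the service lists the categories. As a sketch of an alternative, assuming the response keeps the "Violated categories: " prefix and the ", " separator, a caller could parse a response line back into a list and compare its elements regardless of order. ModerationResponseParser is a hypothetical helper:

```java
import java.util.List;

// Hypothetical helper that parses one line of the service's text response.
class ModerationResponseParser {

    static final String PREFIX = "Violated categories: ";

    // Returns the category names from a line, or an empty list when the
    // line doesn't start with the expected prefix.
    static List<String> violatedCategories(String line) {
        if (!line.startsWith(PREFIX)) {
            return List.of();
        }
        return List.of(line.substring(PREFIX.length()).split(", "));
    }

    public static void main(String[] args) {
        List<String> categories = violatedCategories(
          "Violated categories: Harassment, Harassment/Threatening, Violence");
        System.out.println(categories.containsAll(List.of("Violence", "Harassment")));
    }
}
```

With the parsed list, a test can check containsAll() for the expected categories instead of matching one fixed string.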

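6. Conclusion

In this article, we've reviewed the integration of the OpenAI Moderation model with Spring AI. We've explored the moderation categories and built a service that moderates incoming text. This service can become part of a more complex system that works with user content. For example, we could connect it to a chat moderation bot to help us control the quality of the conversations under our articles.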
