This tutorial walks you through building a complete AI streaming Q&A application from scratch, integrating the following components:
- Spring Boot + Spring AI as the service framework
- MCP (Model Context Protocol) for unified management of local and cloud-hosted large models
- DeepSeek-R1-7B, a high-performance open-source Chinese model (OpenAI API compatible)
- SSE (Server-Sent Events) for real-time streaming responses between backend and frontend
- Ollama (optional) for a more convenient way to deploy DeepSeek-R1-7B and expose an OpenAI-compatible API
Recommended model deployment: running DeepSeek-R1-7B with Ollama
Install Ollama
Visit: https://ollama.com
```bash
# macOS / Linux install
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download and run the official installer
```
Pull the model (using DeepSeek as the example)
```bash
ollama pull deepseek-r1:7b
```
You can also load other models, such as llama3, qwen:chat, yi:34b, phi3, mistral, and so on.
Start Ollama
```bash
ollama run deepseek-r1:7b
```
Ollama automatically serves an OpenAI-style API (http://localhost:11434/v1/chat/completions), with stream: true supported.
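For reference when we parse the stream on the backend later: the reply arrives as SSE `data:` lines in the OpenAI chat-completions format, each chunk carrying a token in `choices[0].delta.content`, ending with a `[DONE]` sentinel. The payloads below are illustrative, trimmed to the fields we actually read:

```text
data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}

data: {"choices":[{"delta":{"content":"lo"},"index":0}]}

data: [DONE]
```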
Building the SSE streaming service in Spring Boot
Add dependencies (pom.xml)
```xml
<dependencies>
    <!-- Spring MVC: provides SseEmitter for server-sent events -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Jackson: parse the streamed JSON chunks -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <!-- WebFlux: provides the reactive WebClient used to call Ollama -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <!-- Lombok: needed by the request DTO below (@Data, constructors) -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```
WebClient configuration class
```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient() {
        return WebClient.builder()
                .baseUrl("http://localhost:11434/v1") // Ollama API address
                .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .build();
    }
}
```
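If you'd rather not hardcode the Ollama address, a minimal variation reads it from configuration instead. The property name `ollama.base-url` below is an assumption made for this sketch, not something Spring or Ollama defines:

```java
// Sketch: same bean, with the base URL externalized to application.properties,
// e.g. ollama.base-url=http://localhost:11434/v1
// "ollama.base-url" is a hypothetical property name chosen for illustration.
// Requires: import org.springframework.beans.factory.annotation.Value;
@Bean
public WebClient webClient(
        @Value("${ollama.base-url:http://localhost:11434/v1}") String baseUrl) {
    return WebClient.builder()
            .baseUrl(baseUrl)
            .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
            .build();
}
```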
Request body wrapper
```java
import java.util.List;
import java.util.Map;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatCompletionRequest {
    private String model;
    private List<Map<String, String>> messages; // [{role, content}, ...]
    private boolean stream = true;              // ask the server to stream tokens
    private double temperature = 0.7;
}
```
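To see the wire format this class produces, here is a quick throwaway sketch (the class name `RequestShapeDemo` is invented for illustration, and Jackson's property order may differ from the comment):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

// Throwaway sketch: print the JSON body the DTO serializes to.
public class RequestShapeDemo {
    public static void main(String[] args) throws Exception {
        ChatCompletionRequest req = new ChatCompletionRequest();
        req.setModel("deepseek-r1:7b");
        req.setMessages(List.of(Map.of("role", "user", "content", "Hello")));
        System.out.println(new ObjectMapper().writeValueAsString(req));
        // Expected shape (property order may vary):
        // {"model":"deepseek-r1:7b","messages":[{"role":"user","content":"Hello"}],"stream":true,"temperature":0.7}
    }
}
```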
DeepSeek-R1-7B API wrapper (supporting stream: true)
```java
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@Service
public class DeepSeekStreamingService {

    @Autowired
    private WebClient webClient;

    // Reuse one ObjectMapper: it is thread-safe and cheaper than creating one per chunk.
    private final ObjectMapper mapper = new ObjectMapper();

    public void streamChat(String userPrompt, SseEmitter emitter) {
        ChatCompletionRequest request = new ChatCompletionRequest();
        request.setModel("deepseek-r1:7b");
        request.setStream(true);
        request.setMessages(List.of(
                Map.of("role", "user", "content", userPrompt)
        ));

        webClient.post()
                .uri("/chat/completions")
                .bodyValue(request)
                .accept(MediaType.TEXT_EVENT_STREAM)
                .retrieve()
                // The SSE decoder usually hands us each event's payload with the
                // "data:" prefix already stripped; strip it defensively anyway
                // in case a raw line comes through.
                .bodyToFlux(String.class)
                .doOnNext(chunk -> {
                    try {
                        String payload = chunk.startsWith("data:")
                                ? chunk.replaceFirst("data: *", "")
                                : chunk;
                        if (payload.contains("[DONE]")) {
                            emitter.send(SseEmitter.event().data("[DONE]"));
                            emitter.complete();
                        } else {
                            String token = parseTokenFromJson(payload);
                            if (!token.isEmpty()) {
                                emitter.send(SseEmitter.event().data(token));
                            }
                        }
                    } catch (Exception e) {
                        emitter.completeWithError(e);
                    }
                })
                .doOnError(emitter::completeWithError)
                .subscribe();
    }

    // Each streamed chunk looks like {"choices":[{"delta":{"content":"<token>"}}]},
    // so we walk choices[0].delta.content and default to "" when absent.
    private String parseTokenFromJson(String json) {
        try {
            JsonNode node = mapper.readTree(json);
            return node.path("choices").get(0).path("delta").path("content").asText("");
        } catch (Exception e) {
            return "";
        }
    }
}
```
Controller exposing the SSE endpoint
```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
@RequestMapping("/api/ai")
public class ChatSseController {

    @Autowired
    private DeepSeekStreamingService streamingService;

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter stream(@RequestParam("prompt") String prompt) {
        SseEmitter emitter = new SseEmitter(0L); // 0 = never time out
        streamingService.streamChat(prompt, emitter);
        return emitter;
    }
}
```
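Optionally, you can register lifecycle callbacks on the emitter so that completions, timeouts, and client disconnects are cleaned up explicitly. A minimal sketch (logging via System.out just for illustration):

```java
// Sketch: harden the emitter against client disconnects and timeouts.
SseEmitter emitter = new SseEmitter(0L);
emitter.onCompletion(() -> System.out.println("SSE stream completed"));
emitter.onTimeout(emitter::complete); // close cleanly if a timeout is ever configured
emitter.onError(e -> System.err.println("SSE stream error: " + e.getMessage()));
```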
Frontend: consuming SSE in JavaScript for streaming display
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>AI Streaming Q&A</title>
</head>
<body>
    <input id="prompt" placeholder="Type your question...">
    <button onclick="startStream()">Ask</button>
    <div id="result"></div>

    <script>
        function startStream() {
            const prompt = document.getElementById('prompt').value;
            // EventSource only supports GET, so the prompt travels as a query parameter.
            const eventSource = new EventSource(`/api/ai/stream?prompt=${encodeURIComponent(prompt)}`);
            document.getElementById('result').innerHTML = '';
            eventSource.onmessage = function (event) {
                if (event.data === '[DONE]') {
                    eventSource.close();
                } else {
                    document.getElementById('result').innerHTML += event.data;
                }
            };
        }
    </script>
</body>
</html>
```
Summary
With the steps above, we have:
- Deployed and run the DeepSeek-R1-7B model locally with Ollama
- Wrapped the OpenAI-compatible streaming API (stream: true) in Spring Boot
- Implemented backend SSE push with real-time token rendering on the frontend
- Enabled ChatGPT-style conversations backed by an open-source Chinese model