This tutorial walks you from 0 to 1 through building a complete AI streaming Q&A application, combining the following components:
- Spring Boot + Spring AI as the service framework
- MCP (Model Context Protocol) for unified management of local and cloud LLMs
- DeepSeek-R1-7B, a high-performance open-source Chinese LLM (OpenAI API compatible)
- SSE (Server-Sent Events) for real-time streaming between back end and front end
- Ollama (optional) for conveniently running DeepSeek-R1-7B behind an OpenAI-compatible endpoint
Recommended deployment: running DeepSeek-R1-7B with Ollama
Install Ollama
Visit: https://ollama.com
```bash
# macOS / Linux install
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the official installer from the website
```
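After installation, a quick sanity check confirms the CLI is on your PATH (the `-v` flag prints the installed version):

```bash
ollama -v
```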
Pull the model (DeepSeek as the example)
```bash
ollama pull deepseek-r1:7b
```
You can also load other models, such as llama3, qwen:chat, yi:34b, phi3, mistral, and more.
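Once the pull completes, you can list the locally available models to confirm the exact tag you will reference from Spring Boot below:

```bash
ollama list
```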
Start Ollama
```bash
ollama run deepseek-r1:7b
```
Ollama automatically serves an OpenAI-style endpoint (http://localhost:11434/v1/chat/completions) that supports stream: true.
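Before writing any Java, it helps to probe the endpoint directly. Below is a minimal smoke test, assuming the deepseek-r1:7b tag pulled above; with stream: true the response arrives as SSE `data:` lines, each carrying a JSON chunk whose choices[0].delta.content holds the next token, terminated by `data: [DONE]` (the annotated output shape is illustrative and abbreviated):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:7b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
# Expected output shape (abbreviated):
# data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hi"}}]}
# data: ...
# data: [DONE]
```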
Spring Boot: the SSE streaming service
Add dependencies (pom.xml)
```xml
<dependencies>
    <!-- Servlet stack: REST controllers and SseEmitter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- JSON parsing of the streamed chunks -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <!-- Pulled in only for the reactive WebClient -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
```
Note that with both starters on the classpath, Spring Boot defaults to the servlet stack; spring-boot-starter-webflux is included only for the WebClient used below.

WebClient configuration class
```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient webClient() {
        return WebClient.builder()
                .baseUrl("http://localhost:11434/v1") // Ollama's OpenAI-compatible API
                .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .build();
    }
}
```
Request body wrapper
```java
import java.util.List;
import java.util.Map;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class ChatCompletionRequest {
    private String model;
    private List<Map<String, String>> messages; // [{role, content}, ...]
    private boolean stream = true;              // ask the server to stream tokens
    private double temperature = 0.7;
}
```
DeepSeek-R1-7B service wrapper (supports stream: true)
```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@Service
public class DeepSeekStreamingService {

    // ObjectMapper is thread-safe; reuse one instance instead of creating one per chunk
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Autowired
    private WebClient webClient;

    public void streamChat(String userPrompt, SseEmitter emitter) {
        ChatCompletionRequest request = new ChatCompletionRequest();
        request.setModel("deepseek-r1:7b");
        request.setStream(true);
        request.setMessages(List.of(
                Map.of("role", "user", "content", userPrompt)
        ));

        webClient.post()
                .uri("/chat/completions")
                .body(BodyInserters.fromValue(request))
                .accept(MediaType.TEXT_EVENT_STREAM)
                .retrieve()
                // WebClient's SSE decoder normally strips the "data:" prefix, so each
                // element is one event payload: a JSON chunk or the "[DONE]" sentinel.
                .bodyToFlux(String.class)
                .doOnNext(chunk -> {
                    try {
                        if (chunk.contains("[DONE]")) {
                            emitter.send(SseEmitter.event().data("[DONE]"));
                            emitter.complete();
                        } else {
                            // Defensive: strip a leading "data:" in case the raw line comes through
                            String payload = chunk.startsWith("data:") ? chunk.substring(5).trim() : chunk;
                            String token = parseTokenFromJson(payload);
                            emitter.send(SseEmitter.event().data(token));
                        }
                    } catch (Exception e) {
                        emitter.completeWithError(e);
                    }
                })
                .doOnError(emitter::completeWithError)
                .subscribe();
    }

    // Extracts choices[0].delta.content from one streamed chunk.
    private String parseTokenFromJson(String json) {
        try {
            JsonNode node = MAPPER.readTree(json);
            return node.path("choices").path(0).path("delta").path("content").asText("");
        } catch (Exception e) {
            return ""; // malformed or empty chunk: emit nothing
        }
    }
}
```
Controller exposing the SSE endpoint
```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
@RequestMapping("/api/ai")
public class ChatSseController {

    @Autowired
    private DeepSeekStreamingService streamingService;

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public SseEmitter stream(@RequestParam("prompt") String prompt) {
        SseEmitter emitter = new SseEmitter(0L); // 0 = never time out
        streamingService.streamChat(prompt, emitter);
        return emitter;
    }
}
```
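With the application started (assuming Spring Boot's default port 8080), you can watch the stream from a terminal before any front end exists; curl's -N flag disables output buffering so tokens print as they arrive:

```bash
curl -N "http://localhost:8080/api/ai/stream?prompt=Hello"
```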
Front-end JavaScript: consuming the SSE stream

Note that EventSource only issues GET requests, which is why the prompt is passed as a query parameter.

```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>AI Streaming Q&A</title>
</head>
<body>
<input id="prompt" placeholder="Type your question...">
<button onclick="startStream()">Ask</button>
<div id="result"></div>
<script>
    function startStream() {
        const prompt = document.getElementById('prompt').value;
        const eventSource = new EventSource(`/api/ai/stream?prompt=${encodeURIComponent(prompt)}`);
        const result = document.getElementById('result');
        result.textContent = '';
        eventSource.onmessage = function (event) {
            if (event.data === '[DONE]') {
                eventSource.close();
            } else {
                // textContent (not innerHTML) so tokens containing "<" render safely
                result.textContent += event.data;
            }
        };
        // Without close(), EventSource would keep auto-reconnecting after an error
        eventSource.onerror = function () {
            eventSource.close();
        };
    }
</script>
</body>
</html>
```
Summary
Following these steps, we have:
- Deployed and run the DeepSeek-R1-7B local LLM with Ollama
- Wrapped the OpenAI-compatible streaming API (stream: true) in Spring Boot
- Implemented back-end SSE push with real-time token rendering on the front end
- Built ChatGPT-style conversation on top of an open-source Chinese model