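## CosyVoice Engine Integration Guide
This document explains how to use the CosyVoice engine for speech synthesis in this project.
### Prerequisites
1. A local CosyVoice API service is deployed.
2. API endpoint: `http://192.168.1.200:8000/tts/zero_shot`
3. The `httpx` dependency is installed.
### Quick Start
#### Option 1: Create the engine through the factory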
```python
import asyncio
from tts.factory import TTSEngineFactory


async def main():
    # Create a CosyVoice engine instance
    engine = TTSEngineFactory.create("cosyvoice")
    # Synthesize speech (sample text: "Hello, this is speech synthesized by CosyVoice.")
    text = "你好,这是 CosyVoice 合成的语音。"
    audio = await engine.synthesize(
        text=text,
        voice="your_speaker_id",  # replace with a real speaker ID
    )
    # Save the audio
    with open("output.wav", "wb") as f:
        f.write(audio.getvalue())


asyncio.run(main())
```
#### Option 2: Use the CosyVoice engine directly
```python
import asyncio
from tts.cosyvoice_engine import CosyVoiceEngine


async def main():
    # Create an engine instance; the API URL and timeout can be customized
    engine = CosyVoiceEngine(
        api_url="http://192.168.1.200:8000/tts/zero_shot",
        timeout=30.0,
    )
    try:
        # Synthesize speech (sample text: "Hello, this is test text.")
        text = "你好,这是测试文本。"
        audio = await engine.synthesize(
            text=text,
            voice="female_standard_speaker",
        )
        # Save or process the audio
        with open("output.wav", "wb") as f:
            f.write(audio.getvalue())
    finally:
        # Close the connection even if synthesis fails
        await engine.close()


asyncio.run(main())
```
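
Both quick-start examples write the result to `output.wav` without inspecting it. As a sanity check, the sketch below reads basic properties from the returned stream using only the standard-library `wave` module. `describe_wav` is a hypothetical helper, not part of the project, and it assumes the service returns uncompressed PCM WAV data, which the `.wav` filename suggests but the source does not state.

```python
import io
import wave


def describe_wav(audio: io.BytesIO) -> dict:
    """Return basic properties of an in-memory WAV stream.

    Hypothetical helper: assumes `audio` holds PCM WAV data.
    Raises wave.Error if the bytes are not a readable WAV file.
    """
    audio.seek(0)  # read from the start regardless of the current position
    with wave.open(audio, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }
    audio.seek(0)  # leave the stream rewound for the caller
    return info
```

Calling `describe_wav(audio)` right after `engine.synthesize(...)` would catch an empty or non-WAV response before it is written to disk.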
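### API Parameters
#### Synthesis interface (`synthesize`)
**Required parameters:**
- `text` (str): The text to synthesize
- `voice` (str): Speaker ID (`zero_shot_spk_id`)
**Optional parameters:**
- `language` (str): Language code, default "zh-CN"
- `rate` (float): Speaking rate, default 1.0 (not yet supported)
- `pitch` (float): Pitch, default 1.0 (not yet supported)
**Returns:**
- `BytesIO`: A byte stream object containing the audio data
**Raises:**
- `ValueError`: If the `voice` argument is empty or the API returns an error
- `httpx.RequestError`: Network connection error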
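
The two exception types call for different handling: a `ValueError` signals a request that would fail again if repeated, while a network error may be transient. The following is a minimal sketch of a retry wrapper under that assumption. `synthesize_with_retry` and its parameters are hypothetical and not part of the project. To keep the block runnable without `httpx`, `retry_on` defaults to the built-in `ConnectionError`; with the real engine you would presumably pass `retry_on=(httpx.RequestError,)`.

```python
import asyncio
from io import BytesIO


async def synthesize_with_retry(
    engine,
    text: str,
    voice: str,
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple = (ConnectionError,),
) -> BytesIO:
    """Call engine.synthesize, retrying only on the exception types in retry_on.

    Hypothetical helper. ValueError (empty voice or an API-level error) is not
    retried by default because repeating the same request would fail the same way.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return await engine.synthesize(text=text, voice=voice)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```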
### CosyVoice API Request Example
```bash
curl -X POST "http://192.168.1.200:8000/tts/zero_shot" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "你好,世界",
    "zero_shot_spk_id": "female_standard_speaker"
  }'
```
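
The same request can be built from Python. The sketch below uses only the standard library (`urllib.request`) rather than the project's `httpx`-based engine, and `build_zero_shot_request` is a hypothetical name. It constructs the request without sending it, so it makes no claim about the response format.

```python
import json
import urllib.request

DEFAULT_API_URL = "http://192.168.1.200:8000/tts/zero_shot"


def build_zero_shot_request(
    text: str, speaker_id: str, api_url: str = DEFAULT_API_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    if not speaker_id:
        raise ValueError("speaker_id (zero_shot_spk_id) must not be empty")
    payload = {"text": text, "zero_shot_spk_id": speaker_id}
    # ensure_ascii=False keeps Chinese text readable; the body is UTF-8 encoded.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` would send it. Reading the reply as raw audio bytes is an assumption, since the source does not describe the response body.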
### Configuring CosyVoice
To change the API URL or the timeout, use one of the following:
1. **Environment variables** (recommended)
```python
import os
from tts.cosyvoice_engine import CosyVoiceEngine
api_url = os.getenv("COSYVOICE_API_URL", "http://192.168.1.200:8000/tts/zero_shot")
timeout = float(os.getenv("COSYVOICE_TIMEOUT", "30"))
engine = CosyVoiceEngine(api_url=api_url, timeout=timeout)
```
2. **Configuration file** (see `config/app.py`)
```python
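from tts.cosyvoice_engine import CosyVoiceEngine


class CosyVoiceConfig:
    API_URL = "http://192.168.1.200:8000/tts/zero_shot"
    TIMEOUT = 30.0


# Pass the values explicitly: an instance's __dict__ does not contain class
# attributes, and the constructor expects lowercase keyword names.
engine = CosyVoiceEngine(
    api_url=CosyVoiceConfig.API_URL,
    timeout=CosyVoiceConfig.TIMEOUT,
)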
```
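
The two approaches above can be combined into one validated settings object. This is a sketch only: `CosyVoiceSettings` is a hypothetical class, not part of the project, and it assumes the engine constructor accepts the `api_url` and `timeout` keywords used earlier in this guide.

```python
import os
from dataclasses import asdict, dataclass
from typing import Mapping, Optional

DEFAULT_API_URL = "http://192.168.1.200:8000/tts/zero_shot"


@dataclass(frozen=True)
class CosyVoiceSettings:
    """Hypothetical settings holder; field names match the constructor kwargs."""

    api_url: str = DEFAULT_API_URL
    timeout: float = 30.0

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "CosyVoiceSettings":
        """Read COSYVOICE_API_URL and COSYVOICE_TIMEOUT, validating both."""
        env = os.environ if env is None else env
        api_url = env.get("COSYVOICE_API_URL", DEFAULT_API_URL)
        timeout = float(env.get("COSYVOICE_TIMEOUT", "30"))
        if not api_url.startswith(("http://", "https://")):
            raise ValueError(f"COSYVOICE_API_URL is not an HTTP(S) URL: {api_url!r}")
        if timeout <= 0:
            raise ValueError("COSYVOICE_TIMEOUT must be positive")
        return cls(api_url=api_url, timeout=timeout)

    def as_kwargs(self) -> dict:
        """Return a dict suitable for CosyVoiceEngine(**settings.as_kwargs())."""
        return asdict(self)
```

Failing at startup on a malformed URL or a non-positive timeout is usually easier to diagnose than a connection error on the first synthesis request.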
### FastAPI Integration Example
Using CosyVoice in an API route:
```python
from fastapi import APIRouter, HTTPException
from tts.factory import TTSEngineFactory

router = APIRouter(prefix="/api/v1/tts", tags=["tts"])


@router.post("/cosyvoice/synthesize")
async def synthesize_with_cosyvoice(text: str, speaker_id: str):
    """
    Synthesize speech with CosyVoice.

    Args:
        text: The text to synthesize
        speaker_id: Speaker ID

    Returns:
        Metadata about the synthesized audio (status, size, content type)
    """
    try:
        engine = TTSEngineFactory.create("cosyvoice")
        audio = await engine.synthesize(text=text, voice=speaker_id)
        return {
            "status": "success",
            "audio_size": len(audio.getvalue()),
            "content_type": "audio/wav",
        }
    except ValueError as e:
        # Invalid input (for example, an empty speaker ID) maps to 400
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail="TTS synthesis failed") from e
```
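### Speaker ID Reference
Examples of common speaker IDs (adjust to match your deployment):
- `female_standard_speaker`: Standard female voice
- `female_gentle_speaker`: Gentle female voice
- `male_standard_speaker`: Standard male voice
- `male_gentle_speaker`: Gentle male voice
The actual speaker IDs depend on how your CosyVoice service is configured.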
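
Because valid IDs vary by deployment, it can help to check a speaker ID locally before sending a request. The sketch below is a hypothetical helper (`resolve_speaker` and `KNOWN_SPEAKERS` are not part of the project); the set simply mirrors the example IDs listed above and would need to be replaced with the IDs your service actually exposes.

```python
from typing import AbstractSet, Optional

# Example IDs copied from the list above; replace with your deployment's IDs.
KNOWN_SPEAKERS = frozenset(
    {
        "female_standard_speaker",
        "female_gentle_speaker",
        "male_standard_speaker",
        "male_gentle_speaker",
    }
)


def resolve_speaker(
    speaker_id: Optional[str], known: AbstractSet[str] = KNOWN_SPEAKERS
) -> str:
    """Return a cleaned speaker ID, or raise ValueError listing the valid choices."""
    cleaned = (speaker_id or "").strip()
    if not cleaned:
        raise ValueError("voice (zero_shot_spk_id) is required for CosyVoice")
    if cleaned not in known:
        choices = ", ".join(sorted(known))
        raise ValueError(f"unknown speaker ID {cleaned!r}; expected one of: {choices}")
    return cleaned
```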
### Troubleshooting
#### Problem 1: "Failed to connect to CosyVoice API"
**Causes:**
- The CosyVoice service is not running
- The API URL is misconfigured
- Network connectivity problems
**Solution:**
```bash
# Check whether the service is running
curl -X POST http://192.168.1.200:8000/tts/zero_shot \
  -H "Content-Type: application/json" \
  -d '{"text":"test","zero_shot_spk_id":"test"}'
# Check network connectivity
ping 192.168.1.200
```
#### Problem 2: "voice (zero_shot_spk_id) is required for CosyVoice"
**Cause:** The `voice` argument was not provided.
**Solution:** Make sure `voice` is passed when calling `synthesize()`.
```python
audio = await engine.synthesize(
    text="测试",  # sample text: "test"
    voice="valid_speaker_id",  # provide a valid speaker ID
)
```
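#### Problem 3: HTTP errors (400, 500, etc.)
**Cause:** The API returned an error response.
**Solution:**
- Check that the text is correctly formatted
- Verify that the speaker_id is valid
- Check the CosyVoice service logs for detailed error information
### Performance Optimization
1. **Connection reuse**: Creating engine instances through the factory allows HTTP connections to be reused
2. **Timeout configuration**: Adjust the `timeout` parameter to suit network conditions
3. **Asynchronous processing**: Use the async interface to avoid blocking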
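
The points above can be combined when synthesizing several texts at once: share one engine and run the requests concurrently, with a cap so the service is not flooded. This is a minimal sketch, assuming only that the engine exposes the async `synthesize(text=..., voice=...)` method described earlier. `synthesize_many` and `max_concurrency` are hypothetical names.

```python
import asyncio
from io import BytesIO
from typing import List, Sequence


async def synthesize_many(
    engine, texts: Sequence[str], voice: str, max_concurrency: int = 4
) -> List[BytesIO]:
    """Synthesize several texts concurrently with one shared engine.

    Results are returned in the same order as `texts`.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _one(text: str) -> BytesIO:
        async with semaphore:  # at most max_concurrency requests in flight
            return await engine.synthesize(text=text, voice=voice)

    # gather preserves input order even when tasks finish out of order
    return list(await asyncio.gather(*(_one(t) for t in texts)))
```

A suitable `max_concurrency` depends on how many parallel requests the CosyVoice service can handle, which the source does not specify.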
### Related Files
- `tts/cosyvoice_engine.py`: CosyVoice engine implementation
- `tts/factory.py`: TTS engine factory class
- `tts/base.py`: TTSEngine abstract base class
- `tts/examples.py`: Usage examples
### More Information
- [TTS Architecture](../docs/TTS_ARCHITECTURE.md)
- [TTS Implementation Guide](../docs/TTS_IMPLEMENTATION_SUMMARY.md)