Fish Speech

Fish Speech TTS 服务部署说明。

部署tts推理

git clone https://github.com/fishaudio/fish-speech.git

1. 安装环境

# 创建一个 python 3.10 虚拟环境, 你也可以用 virtualenv
conda create -n fish-speech python=3.10
conda activate fish-speech

# 安装 pytorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

# (Ubuntu / Debian 用户) 安装 sox + ffmpeg
apt install libsox-dev ffmpeg

# (Ubuntu / Debian 用户) 安装 pyaudio
apt install build-essential \
    cmake \
    libasound-dev \
    portaudio19-dev \
    libportaudio2 \
    libportaudiocpp0
    
# 安装 fish-speech
pip3 install -e .[stable]

2. 下载模型文件

在fish-speech项目下执行 huggingface-cli download fishaudio/fish-speech-1.5 –local-dir checkpoints/fish-speech-1.5

3. 建立克隆声音目录

在fish-speech项目下新建/references/test文件夹，将音频文件放到该目录下，并建一个同名的lab文件，将字幕放到该文件里。

4. 启动api服务

python -m tools.api_server \
    --listen 0.0.0.0:8080 \
    --llama-checkpoint-path "checkpoints/fish-speech-1.5" \
    --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
    --decoder-config-name firefly_gan_vq \
    --compile \
    --half

5. 接口说明

5.1 Text-to-Speech

endpoint: /v1/tts

POST:

{
  "text": "string",
  "chunk_length": 200,
  "format": "wav",
  "references": [],
  "reference_id": null,
  "seed": null,
  "use_memory_cache": "off",
  "normalize": true,
  "streaming": false,
  "max_new_tokens": 1024,
  "top_p": 0.7,
  "repetition_penalty": 1.2,
  "temperature": 0.7
}