Repost

Facebook open-sources VoiceLoop, a method for synthesizing speech in multiple speakers' voices

 

PyTorch implementation of the method described in the paper Voice Synthesis for in-the-Wild Speakers via a Phonological Loop.

VoiceLoop is a neural text-to-speech (TTS) system that can transform text into speech in voices sampled in the wild. Some demo samples can be found here.

Quick Start

Follow the instructions in Setup and then simply execute:

python generate.py  --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth

The results will be placed in models/vctk/results. The command generates 2 samples.

You can also generate the same text but with a different speaker, specifically:

python generate.py  --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 18 --checkpoint models/vctk/bestmodel.pth

This generates the same sentence in the voice of speaker 18.

[Figure: attention plot for the generated sample]
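To compare several voices on the same text, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_generate_cmd is hypothetical, and it uses only the flags that appear in the commands above (--npz, --spkr, --checkpoint).

```python
# Hypothetical helper: builds the argv list for one generate.py call.
# Only the flags shown in the Quick Start commands are used.
def build_generate_cmd(npz, speaker, checkpoint):
    return [
        "python", "generate.py",
        "--npz", npz,
        "--spkr", str(speaker),
        "--checkpoint", checkpoint,
    ]


NPZ = "data/vctk/numpy_features_valid/p318_212.npz"
CHECKPOINT = "models/vctk/bestmodel.pth"

# Same text, two speaker ids (13 and 18, as in the commands above).
commands = [build_generate_cmd(NPZ, spkr, CHECKPOINT) for spkr in (13, 18)]

for cmd in commands:
    # Print the command line; to actually run it from the repository root,
    # pass the list to subprocess.check_call(cmd) instead.
    print(" ".join(cmd))
```

Keeping the command as a list of arguments avoids shell quoting issues if it is later handed to subprocess.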

Setup

Requirements: Linux/OSX, Python2.7 and PyTorch 0.1.12. The current version of the code requires CUDA support for training. Generation can be done on the CPU.

git clone https://github.com/facebookresearch/loop.git
cd loop
pip install -r scripts/requirements.txt

Data

The data used to train the models in the paper can be downloaded via:

bash scripts/download_data.sh

The script downloads and preprocesses a subset of VCTK. This subset contains speakers with American accents.

The dataset was preprocessed using Merlin: from each audio clip we extracted vocoder features using the WORLD vocoder. After downloading, the dataset will be located under the data subfolder as follows:

loop
└── data
    └── vctk
        ├── norm_info
        │   └── norm.dat
        ├── numpy_features
        │   ├── p294_001.npz
        │   ├── p294_002.npz
        │   └── ...
        └── numpy_features_valid
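Each utterance is stored as a NumPy .npz archive. The source does not list which arrays these archives contain, so the sketch below does not assume any names: it simply reports whatever arrays it finds. The function name describe_npz is hypothetical, and the demo archive with arrays "features" and "labels" is made up for illustration. If the real files hold pickled object arrays, np.load would additionally need allow_pickle=True.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    info = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            info[name] = (array.shape, str(array.dtype))
    return info


# Demo on a throwaway archive; the array names here are hypothetical and
# are NOT the names used by the VCTK feature files.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(demo_path,
         features=np.zeros((3, 2), dtype=np.float32),
         labels=np.arange(5))
demo_info = describe_npz(demo_path)
for name in sorted(demo_info):
    print(name, demo_info[name])
```

On a downloaded dataset, the same function could be pointed at a real file, for example describe_npz("data/vctk/numpy_features_valid/p318_212.npz").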

The preprocessing pipeline can be executed using the following script by Kyle Kastner: https://gist.github.com/kastnerkyle/cc0ac48d34860c5bb3f9112f4d9a0300.

Pretrained Models

Pretrained models can be downloaded via:

bash scripts/download_models.sh

After downloading, the models will be located under subfolder models as follows:

loop
├── data
├── models
    ├── vctk
    │   ├── args.pth
    │   └── bestmodel.pth
    └── vctk_alt

SPTK and WORLD

Finally, speech generation requires SPTK 3.9 and the WORLD vocoder, as in Merlin. To download the executables:

bash scripts/download_tools.sh

This results in the following subdirectories:

loop
├── data
├── models
├── tools
    ├── SPTK-3.9
    └── WORLD
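After running the three download scripts, it can be useful to confirm that everything landed where the trees above say it should. The following is a small sketch under that assumption: the list of paths is copied from the directory trees in this document, and the function name missing_paths is hypothetical.

```python
import os

# Paths taken from the directory trees above, relative to the repository root.
EXPECTED_PATHS = [
    "data/vctk/norm_info/norm.dat",
    "data/vctk/numpy_features_valid",
    "models/vctk/args.pth",
    "models/vctk/bestmodel.pth",
    "tools/SPTK-3.9",
    "tools/WORLD",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    return [p for p in expected
            if not os.path.exists(os.path.join(root, p))]


# Run from the repository root; an empty list means nothing is missing.
print(missing_paths("."))
```

Any path reported here points at the download script that should be rerun: download_data.sh for data, download_models.sh for models, download_tools.sh for tools.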

Training

To train a new model on vctk, first train the model using a noise level of 4 and an input sequence length of 100:

python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90

Then, continue training the model using a noise level of 2, on full sequences:

python train.py --expName vctk_noise_2 --data data/vctk --checkpoint checkpoints/vctk/bestmodel.pth --noise 2 --seq-len 1000 --epochs 90
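The two commands above form a simple schedule: a first stage with more noise and short sequences, then a second stage that resumes from the first stage's checkpoint with less noise and long sequences. The sketch below writes that schedule down as data and rebuilds the same command lines. The names STAGES and build_train_cmd are hypothetical; the flags and values are exactly those shown above.

```python
# Two-stage schedule, with values copied from the training commands above.
STAGES = [
    {"expName": "vctk", "noise": 4, "seq_len": 100, "checkpoint": None},
    {"expName": "vctk_noise_2", "noise": 2, "seq_len": 1000,
     "checkpoint": "checkpoints/vctk/bestmodel.pth"},
]


def build_train_cmd(stage, data="data/vctk", epochs=90):
    """Return the argv list for one train.py call (hypothetical helper)."""
    cmd = ["python", "train.py", "--expName", stage["expName"], "--data", data]
    if stage["checkpoint"]:
        # Only the second stage resumes from an earlier checkpoint.
        cmd += ["--checkpoint", stage["checkpoint"]]
    cmd += ["--noise", str(stage["noise"]),
            "--seq-len", str(stage["seq_len"]),
            "--epochs", str(epochs)]
    return cmd


for stage in STAGES:
    # Print each command; the stages must run in order, since the second
    # one loads the checkpoint written by the first.
    print(" ".join(build_train_cmd(stage)))
```

Note that the second stage reads checkpoints/vctk/bestmodel.pth, so the first stage has to finish before it starts.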

Citation

If you find this code useful in your research, please cite:

@article{taigman2017voice,
  title           = {Voice Synthesis for in-the-Wild Speakers via a Phonological Loop},
  author          = {Taigman, Yaniv and Wolf, Lior and Polyak, Adam and Nachmani, Eliya},
  journal         = {ArXiv e-prints},
  archivePrefix   = "arXiv",
  eprinttype      = {arxiv},
  eprint          = {1707.06588},
  primaryClass    = "cs.CL",
  year            = {2017},
  month           = jul,
}

License

Loop has a CC-BY-NC license.

 

Code: https://github.com/facebookresearch/loop

Paper: https://arxiv.org/abs/1707.06588

 
