wav2letter: Deep-Learning-Based Speech Recognition
Published: 2019-06-29


wav2letter

wav2letter is a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research. The original authors of this implementation are Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve, Neil Zeghidour, and Vitaliy Liptchinsky.

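wav2letter implements the architectures proposed in "Wav2Letter: an End-to-End ConvNet-based Speech Recognition System" and "Letter-Based Speech Recognition with Gated ConvNets".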

If you want to get started transcribing speech right away, we provide a pre-trained model for the LibriSpeech dataset.

Papers

Our approach is detailed in two scientific contributions:

@article{collobert:2016,
  author    = {Ronan Collobert and Christian Puhrsch and Gabriel Synnaeve},
  title     = {Wav2Letter: an End-to-End ConvNet-based Speech Recognition System},
  journal   = {CoRR},
  volume    = {abs/1609.03193},
  year      = {2016},
  url       = {http://arxiv.org/abs/1609.03193},
}

and

@article{liptchinsky:2017,
  author    = {Vitaliy Liptchinsky and Gabriel Synnaeve and Ronan Collobert},
  title     = {Letter-Based Speech Recognition with Gated ConvNets},
  journal   = {CoRR},
  volume    = {abs/1712.09444},
  year      = {2017},
  url       = {http://arxiv.org/abs/1712.09444},
}

If you use wav2letter or related pre-trained models, then please cite one of these papers.

Requirements

  • A computer running macOS or Linux.
  • Torch. We detail in the following how to install it.
  • For training on CPU: Intel MKL.
  • For training on GPU: NVIDIA CUDA and cuDNN.
  • For reading audio files: - should be available in any standard distribution.
  • For standard speech features: - should be available in any standard distribution.

Installation

MKL

If you plan to train on CPU, it is highly recommended to install Intel MKL.

Update your .bashrc file with the following:

# We assume Torch will be installed in $HOME/usr.
# Change according to your needs.
export PATH=$HOME/usr/bin:$PATH

# This is to detect MKL during compilation
# but also to make sure it is found at runtime.
INTEL_DIR=/opt/intel/lib/intel64
MKL_DIR=/opt/intel/mkl/lib/intel64
MKL_INC_DIR=/opt/intel/mkl/include

if [ ! -d "$INTEL_DIR" ]; then
    echo "$ warning: INTEL_DIR out of date"
fi
if [ ! -d "$MKL_DIR" ]; then
    echo "$ warning: MKL_DIR out of date"
fi
if [ ! -d "$MKL_INC_DIR" ]; then
    echo "$ warning: MKL_INC_DIR out of date"
fi

# Make sure MKL can be found by Torch.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INTEL_DIR:$MKL_DIR
export CMAKE_LIBRARY_PATH=$LD_LIBRARY_PATH
export CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:$MKL_INC_DIR

LuaJIT + LuaRocks

The following installs LuaJIT and LuaRocks locally in $HOME/usr. If you want a system-wide installation, remove the -DCMAKE_INSTALL_PREFIX=$HOME/usr option.

git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install
cd ../..

In the next sections, we assume luarocks and luajit are in $PATH. If they are not (and assuming you installed them locally in $HOME/usr), you can run ~/usr/bin/luarocks and ~/usr/bin/luajit instead.

If you plan to use the wav2letter decoder, you will need KenLM.

KenLM requires Boost.

# make sure boost is installed (with system/thread/test modules)
# actual command might vary depending on your system
sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-test-dev

Once boost is properly installed, you may install KenLM:

wget https://kheafield.com/code/kenlm.tar.gz
tar xfvz kenlm.tar.gz
cd kenlm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 4
make install
cp -a lib/* ~/usr/lib # libs are not installed by default :(
cd ../..

OpenMPI and TorchMPI

If you plan to use multi-CPU/GPUs (and/or multi-machines), you will need OpenMPI and TorchMPI.

Disclaimer: we strongly encourage you to compile OpenMPI yourself. The OpenMPI binaries shipped by standard distributions vary widely in their compilation flags, and certain flags are crucial to successfully compile and run TorchMPI.

First install OpenMPI:

wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
# configure lives in the parent directory; --with-cuda points to a
# site-specific CUDA install, adjust it to yours
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm \
  --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install

Note: this works the same with openmpi-3.0.0.tar.bz2, but --enable-mpi-thread-multiple must then be removed.

You may now install TorchMPI:

MPI_CXX_COMPILER=$HOME/usr/bin/mpicxx ~/usr/bin/luarocks install torchmpi

Torch and other Torch packages

luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn # for GPU support

wav2letter packages

git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# Assuming here you got KenLM in $HOME/kenlm
# And only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..

Training wav2letter models

Data pre-processing

The data folder contains a number of scripts for preprocessing various datasets. For now we provide only LibriSpeech and TIMIT.

Below is an example of how to preprocess the LibriSpeech ASR corpus:

cd ~ # the archives unpack into ./LibriSpeech
wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz
# repeat for train-clean-100, train-clean-360, train-other-500, dev-other, test-clean, test-other
luajit ~/wav2letter/data/librispeech/create.lua ~/LibriSpeech ~/librispeech-proc
luajit ~/wav2letter/data/utils/create-sz.lua \
  ~/librispeech-proc/train-clean-100 ~/librispeech-proc/train-clean-360 \
  ~/librispeech-proc/train-other-500 ~/librispeech-proc/dev-clean \
  ~/librispeech-proc/dev-other ~/librispeech-proc/test-clean ~/librispeech-proc/test-other
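The "repeat for ..." step can be scripted. What follows is a sketch, not something shipped with wav2letter. The names `LIBRISPEECH_SUBSETS`, `librispeech_url` and `fetch_librispeech` are hypothetical. It assumes every subset follows the same `http://www.openslr.org/resources/12/<subset>.tar.gz` URL pattern as `dev-clean`. Run it from `$HOME` so the archives unpack into `~/LibriSpeech`, which is where `create.lua` reads them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wav2letter): download and unpack every
# LibriSpeech subset used in this README.

# The seven subsets later passed to create-sz.lua.
LIBRISPEECH_SUBSETS=(
  train-clean-100 train-clean-360 train-other-500
  dev-clean dev-other test-clean test-other
)

# Assumption: every subset shares the URL pattern of dev-clean.
librispeech_url() {
  printf 'http://www.openslr.org/resources/12/%s.tar.gz\n' "$1"
}

# Download each archive (wget -c resumes partial downloads) and unpack it
# into ./LibriSpeech; stop at the first failure.
fetch_librispeech() {
  local subset
  for subset in "${LIBRISPEECH_SUBSETS[@]}"; do
    wget -c "$(librispeech_url "$subset")" || return 1
    tar xfz "$subset.tar.gz" || return 1
  done
}
```

Usage: `cd ~ && fetch_librispeech`, then run the two `luajit` commands above.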

Training

mkdir -p ~/experiments
luajit ~/wav2letter/train.lua --train \
  -rundir ~/experiments -runname hello_librispeech \
  -arch ~/wav2letter/arch/librispeech-glu-highdropout \
  -lr 0.1 -lrcrit 0.0005 -gpu 1 \
  -linseg 1 -linlr 0 -linlrcrit 0.005 \
  -onorm target -nthread 6 \
  -dictdir ~/librispeech-proc -datadir ~/librispeech-proc \
  -train train-clean-100+train-clean-360+train-other-500 \
  -valid dev-clean+dev-other \
  -test test-clean+test-other \
  -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 \
  -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05

Training on multiple GPUs

Use OpenMPI to spawn multiple training processes, one per GPU:

mpirun -n 2 --bind-to none  ~/TorchMPI/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 ...

We assume here mpirun is in $PATH.
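With one process per GPU, the `-n` value given to `mpirun` should equal the number of GPUs you want to use. The sketch below assumes a single machine with NVIDIA GPUs. It derives `-n` from `nvidia-smi -L`, which prints one `GPU <index>: ...` line per device. `count_gpus` and `launch_mpi_training` are hypothetical helper names, and any further training flags are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of wav2letter or TorchMPI).

# Count the "GPU <index>: ..." lines read from standard input,
# e.g. the output of `nvidia-smi -L`.
count_gpus() {
  grep -c '^GPU [0-9]'
}

# Spawn one training process per visible GPU on this machine.
# The remaining train.lua flags (-arch, -datadir, ...) are passed as "$@".
launch_mpi_training() {
  local ngpu
  ngpu=$(nvidia-smi -L | count_gpus)
  if [ "$ngpu" -lt 1 ]; then
    echo "no GPU found" >&2
    return 1
  fi
  mpirun -n "$ngpu" --bind-to none ~/TorchMPI/scripts/wrap.sh \
    luajit ~/wav2letter/train.lua --train -mpi -gpu 1 "$@"
}
```

Call `launch_mpi_training` followed by the same flags as the single-GPU training command.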

Running the decoder (inference)

A few pre-processing steps are needed before running the decoder.

We first create a dictionary of letters, which includes the special repetition letters we use in wav2letter:

cat ~/librispeech-proc/letters.lst >> ~/librispeech-proc/letters-rep.lst && echo "1" >> ~/librispeech-proc/letters-rep.lst && echo "2" >> ~/librispeech-proc/letters-rep.lst

We then get a language model and pre-process it. Here we use a pre-trained LibriSpeech language model, but one can also train one's own with KenLM. The pre-processing converts words to lower case and writes their letter transcriptions, including the repetition letters, to a dictionary dict.lst. The script might warn you about words that are incorrectly transcribed because there are not enough repetition letters (here 2, set with -r 2). This is not a problem in our case, as these words are rare.

wget -P ~ http://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz
luajit ~/wav2letter/data/utils/convert-arpa.lua \
  ~/3-gram.pruned.3e-7.arpa.gz ~/3-gram.pruned.3e-7.arpa ~/dict.lst \
  -preprocess ~/wav2letter/data/librispeech/preprocess.lua \
  -r 2 -letters ~/librispeech-proc/letters-rep.lst

Note: one can use the pre-trained 4-gram language model 4-gram.arpa.gz instead; pre-processing will take longer.
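Instead of a downloaded model, one can train a language model with KenLM. The sketch below assumes KenLM's `lmplz` binary (built in the KenLM step) is in `$PATH` and that the corpus is plain text with one sentence per line. `train_ngram_lm` is a hypothetical helper. It gzips the ARPA output so the result has the same form as the downloaded `3-gram.pruned.3e-7.arpa.gz` and can be passed to `convert-arpa.lua` in its place. We have not checked whether `convert-arpa.lua` also accepts uncompressed input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wav2letter): train an n-gram language
# model with KenLM's lmplz, assumed to be in $PATH.
# Usage: train_ngram_lm CORPUS OUTPUT_PREFIX [ORDER]
# CORPUS: plain text, one sentence per line (an assumption of this sketch).
# Produces OUTPUT_PREFIX.arpa.gz, the same form as the downloaded LM.
train_ngram_lm() {
  local corpus=$1 prefix=$2 order=${3:-3}
  if [ ! -r "$corpus" ]; then
    echo "cannot read corpus: $corpus" >&2
    return 1
  fi
  # lmplz reads text on stdin and writes an ARPA model on stdout.
  lmplz -o "$order" < "$corpus" > "$prefix.arpa" || return 1
  # Compress in place: replaces PREFIX.arpa with PREFIX.arpa.gz.
  gzip -f "$prefix.arpa"
}
```

For example, `train_ngram_lm ~/my-corpus.txt ~/my-3gram 3` writes `~/my-3gram.arpa.gz`, which would then replace `~/3-gram.pruned.3e-7.arpa.gz` as the first argument of `convert-arpa.lua`.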

Optional: subsequent loading of the language model can be made faster by converting it to a binary format with KenLM (we assume here KenLM is in your $PATH).

build_binary ~/3-gram.pruned.3e-7.arpa ~/3-gram.pruned.3e-7.bin

We can now generate emissions for a particular trained model by running test.lua on a dataset. The script also displays the Letter Error Rate (LER) and Word Error Rate (WER), the latter computed directly from the acoustic model output, without any post-processing (in particular, without a language model).

luajit ~/wav2letter/test.lua ~/experiments/hello_librispeech/001_model_dev-clean.bin -progress -show -test dev-clean -save
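WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. LER is the same computation over letters. The sketch below is a standalone illustration of that standard definition. `wer` is a hypothetical helper, and we have not checked how `test.lua` tokenizes or normalizes transcripts, so its figures may differ slightly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wav2letter): word error rate of one
# hypothesis against one reference, using the standard definition
#   WER = (substitutions + deletions + insertions) / reference words.
# Prints 0.0000 for an empty reference (a convention of this sketch).
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")   # reference words
    m = split(hyp, h, " ")   # hypothesis words
    # d[i, j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        cost = ((r[i] "") == (h[j] "")) ? 0 : 1   # force string comparison
        best = d[i-1, j] + 1                                     # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1           # insertion
        if (d[i-1, j-1] + cost < best) best = d[i-1, j-1] + cost # match or substitution
        d[i, j] = best
      }
    }
    printf "%.4f\n", (n > 0 ? d[n, m] / n : 0)
  }'
}
```

For example, `wer "the cat sat" "the cat sit"` prints `0.3333` (one substitution over three reference words).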

Once the emissions are stored, the decoder can be run to compute the WER obtained by constraining the decoding with a particular language model:

luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show \
  -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst \
  -lm ~/3-gram.pruned.3e-7.arpa -lmweight 3.1639 \
  -beamsize 25000 -beamscore 40 -nthread 10 -smearing max
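The `-lmweight` value above belongs to this particular acoustic model and language model. With your own models, it is typically tuned on held-out data such as dev-clean. The sketch below reruns `decode.lua` with the flags shown above for several weights and keeps one log per weight. `sweep_lmweight`, the weight grid and the log file names are illustrative assumptions, and you still read the WER from each log yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wav2letter): run decode.lua once per
# language-model weight, reusing the flags of the command above.
# Usage: sweep_lmweight EXPDIR SUBSET WEIGHT...
sweep_lmweight() {
  local expdir=$1 subset=$2 w
  shift 2
  for w in "$@"; do
    # One log per weight, written to the current directory.
    luajit ~/wav2letter/decode.lua "$expdir" "$subset" -show \
      -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst \
      -lm ~/3-gram.pruned.3e-7.arpa -lmweight "$w" \
      -beamsize 25000 -beamscore 40 -nthread 10 -smearing max \
      > "decode-$subset-lmweight-$w.log" 2>&1 || return 1
  done
}
```

For example, `sweep_lmweight ~/experiments/hello_librispeech dev-clean 2 2.5 3 3.5` writes `decode-dev-clean-lmweight-2.log` and so on. Tuning on a dev set keeps the test sets untouched for the final evaluation.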

Pre-trained models

We provide a fully pre-trained model for LibriSpeech:

wget https://s3.amazonaws.com/wav2letter/models/librispeech-glu-highdropout.bin

To transcribe speech with this model, you need to follow the relevant parts of this README: the requirements, installation, data pre-processing and decoder sections.

NOTE: the model was pre-trained on Facebook infrastructure, so you need to run test.lua with slightly different parameters to use it:

luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai

Join the wav2letter community

  • Facebook page:
  • Google group:
  • Contact:

See the for how to help out.

Reposted from: https://my.oschina.net/u/2306127/blog/1600966
