Any pre-trained TensorFlow models on speech/voice data?

Hi All,

I have been looking for TensorFlow models pre-trained on speech
data, preferably in JS or Python, that I can use to extract
embeddings for streaming or recorded audio up to 1 min long.

I intend to use the embeddings as an input to my machine
learning pipeline.

So far, I have found only this:

https://github.com/tensorflow/tfjs-models/tree/master/speech-commands

This model is trained to classify 20 voice commands, so I suspect
its embeddings may not have enough discriminative power to
identify, say, phonemes, or 1000 words each from English, French
and a few other popular languages.

I am not worried about the embedding-to-word mapping. At this
stage, I am happy to use the embeddings to compute a similarity
score between two sound samples. For example, I am not worried
about resolving the confusion between ‘red’ and ‘read’ (past
tense). In fact, ‘I read a red book’ and ‘Eye red a read buk’
should result in a 95+% match.
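
To make the similarity idea concrete, here is a minimal sketch of the kind of score I have in mind. It assumes some model has already turned each clip into a list of per-frame embedding vectors; the mean pooling, the cosine measure and the function names (`mean_pool`, `cosine_similarity`, `clip_similarity`) are my own illustrative choices, not part of any particular library, and the toy vectors stand in for real embeddings.

```python
import math


def mean_pool(frames):
    """Average per-frame vectors into one fixed-length clip embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an all-zero embedding as "no match"
    return dot / (norm_a * norm_b)


def clip_similarity(frames_a, frames_b):
    """Compare two clips of possibly different lengths via pooled embeddings."""
    return cosine_similarity(mean_pool(frames_a), mean_pool(frames_b))


# Toy stand-in data: two "clips" with different numbers of frames.
clip_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
clip_b = [[2.0, 0.0, 2.0]]
score = clip_similarity(clip_a, clip_b)
print(f"similarity: {score:.3f}")
```

Under this sketch, a "95+% match" would mean a score above 0.95. Whether mean pooling keeps enough detail for a clip up to 1 min long is something I would still have to check.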

Any hints or pointers are greatly appreciated. Perhaps there are
simpler ways to achieve the same thing.

submitted by /u/akshayxyz

