Last active
February 14, 2021 18:04
A collection of CMU Sphinx reference links
# Create background noise profile from mp3
/usr/bin/sox noise.mp3 -n noiseprof noise.prof
# Remove noise from mp3 using profile
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21
# Remove silence from mp3
/usr/bin/sox input.mp3 output.mp3 silence -l 1 0.3 5% -1 2.0 5%
# Remove noise and silence in a single command
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5%
# Batch process files last modified more than 30 minutes ago
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -exec sox -S --multi-threaded --buffer 131072 {} /path/to/output/{} noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5% \;
# Remove insignificant files (smaller than 500k)
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -size -500k -delete
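The batch command above writes to /path/to/output/{} and so assumes the matching subdirectories already exist there. A minimal sketch of one way around that, assuming two hypothetical helpers (out_path and clean_one are not part of SoX or of the original notes):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map ./sub/dir/file.mp3 to <outdir>/sub/dir/file.mp3.
out_path() {
  local src=${1#./}      # drop the leading "./" that find prints
  local outdir=${2%/}    # drop one trailing slash, if any
  printf '%s/%s\n' "$outdir" "$src"
}

# Hypothetical helper: noise reduction plus silence removal for one file,
# creating the destination directory first.
# Arguments: <input file> <output root> <noise profile>
clean_one() {
  local dst
  dst=$(out_path "$1" "$2")
  mkdir -p "$(dirname "$dst")"
  sox "$1" "$dst" noisered "$3" 0.21 silence -l 1 0.3 5% -1 2.0 5%
}
```

Possible usage, with the same placeholder paths as above: find . -type f -name "*.mp3" -mmin +30 -print0 | while IFS= read -r -d '' f; do clean_one "$f" /path/to/output noise.prof; done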
CMU Sphinx
http://cmusphinx.sourceforge.net/wiki/tutorialam
http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html
PocketSphinx
http://ghatage.com/2012/12/voice-to-text-in-linux-using-pocketsphinx/
http://ghatage.com/2012/12/make-pocketsphinx-recognize-new-words/
Language model adaptation:
http://pwnetics.wordpress.com/2011/07/01/sphinx-4-language-model-adaptation/
1. Convert WAV files to the standard input format expected by Sphinx:
Input File : 'resampled.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
2. Command to convert a single file:
Run: sox [input.wav] -r 16k -e signed -b 16 -c 1 [output.wav]
Short: sox [input.wav] -r 16k [output.wav]
Before:
[vi@Manlab wav]$ file khong8k.wav
khong8k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
[vi@Manlab wav]$ soxi khong8k.wav
Input File : 'khong8k.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:02.62 = 20939 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
Full command:
[vi@Manlab wav]$ sox khong8k.wav -r 16k -e signed -b 16 -c 1 khong16k.wav
Short form, sufficient for the input above because it is already 16-bit signed mono:
[vi@Manlab wav]$ sox khong8k.wav -r 16k khong16k.wav
After:
[vi@Manlab wav]$ file khong16k.wav
khong16k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
[vi@Manlab wav]$ soxi khong16k.wav
Input File : 'khong16k.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
3. Shell batch:
[vi@Manlab wav]$ for i in test/* ; do echo "$i" ; done;
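Before converting a whole directory it can help to check which files already match the target format shown in step 1. A rough sketch, assuming soxi is installed; is_sphinx_format and report_format are hypothetical helper names, not part of SoX:

```shell
#!/usr/bin/env bash
# Pure comparison: is (rate, channels, bits) the 16 kHz, mono, 16-bit layout?
is_sphinx_format() {
  [ "$1" = "16000" ] && [ "$2" = "1" ] && [ "$3" = "16" ]
}

# Query one file with soxi (-r sample rate, -c channels, -b bits per sample)
# and print whether it still needs to be converted.
report_format() {
  if is_sphinx_format "$(soxi -r "$1")" "$(soxi -c "$1")" "$(soxi -b "$1")"; then
    echo "ok: $1"
  else
    echo "convert: $1"
  fi
}
```

Combined with the loop above: for i in test/* ; do report_format "$i" ; done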
# Strip the first 10 characters (the "huanluyen_" prefix) from each matching file name
for i in huanluyen_diadiem* ; do mv -- "$i" "${i:10}" ; done;
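The ${i:10} offset only works while the prefix is exactly 10 characters long. A small sketch of an alternative that names the prefix explicitly via ${name#prefix}; strip_prefix is a hypothetical helper introduced here for illustration:

```shell
#!/usr/bin/env bash
# Print the file name with a leading prefix removed.
# If the name does not start with the prefix, it is printed unchanged.
strip_prefix() {
  local name=$1 prefix=$2
  printf '%s\n' "${name#"$prefix"}"
}
```

Possible usage: for i in huanluyen_diadiem* ; do mv -- "$i" "$(strip_prefix "$i" huanluyen_)" ; done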
Error: the recording device cannot be opened when using pocketsphinx_continuous:
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Apr 3 2012, AT: 17:50:38
ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
FATAL_ERROR: "continuous.c", line 246: Failed to open audio device
Solutions:
1. Install the ALSA development packages and recompile sphinxbase
Run: yum install alsa-*
2. If you still get the error: ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
Then run: "modprobe snd_pcm_oss" as root
3. If you then get a different error: ad_oss.c(99): Audio device(/dev/dsp) busy
Then close all applications that are recording or otherwise using the audio device
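The three cases above can be told apart before launching pocketsphinx_continuous. A minimal diagnostic sketch, assuming a Linux system where lsmod is available; check_audio_device and oss_module_loaded are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Report the state of an audio device node (default: /dev/dsp).
check_audio_device() {
  local dev=${1:-/dev/dsp}
  if [ ! -e "$dev" ]; then
    echo "missing"       # corresponds to "No such file or directory"
  elif [ ! -r "$dev" ]; then
    echo "unreadable"    # exists, but the current user cannot read it
  else
    echo "present"
  fi
}

# Succeeds if the OSS emulation module named in solution 2 is loaded.
oss_module_loaded() {
  lsmod 2>/dev/null | grep -q '^snd_pcm_oss'
}
```

If check_audio_device prints "missing" and oss_module_loaded fails, solution 2 is the likely fix. For the "busy" case, a tool such as fuser (run as fuser -v /dev/dsp, assuming it is installed) can list the processes holding the device.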
noiseprof [profile-file]
Calculate a profile of the audio for use in noise reduction. See the description of the noisered effect for details.
noisered [profile-file [amount]]
Reduce noise in the audio signal by profiling and filtering. This effect is moderately effective at removing consistent background noise such as hiss or hum. To use it, first run SoX with the noiseprof effect on a section of audio that ideally would contain silence but in fact contains noise - such sections are typically found at the beginning or the end of a recording. noiseprof will write out a noise profile to profile-file, or to stdout if no profile-file or if '-' is given. E.g.
sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
To actually remove the noise, run SoX again, this time with the noisered effect; noisered will reduce noise according to a noise profile (which was generated by noiseprof), from profile-file, or from stdin if no profile-file or if '-' is given. E.g.
sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
How much noise should be removed is specified by amount - a number between 0 and 1 with a default of 0.5. Higher numbers will remove more noise but present a greater likelihood of removing wanted components of the audio signal. Before replacing an original recording with a noise-reduced version, experiment with different amount values to find the optimal one for your audio; use headphones to check that you are happy with the results, paying particular attention to quieter sections of the audio.
On most systems, the two stages - profiling and reduction - can be combined using a pipe, e.g.
sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
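The advice to experiment with different amount values can be sketched as a loop that writes one candidate file per amount so they can be compared by ear. This is only an illustration; candidate_name and try_amounts are hypothetical helpers, and the amounts in the usage line are arbitrary:

```shell
#!/usr/bin/env bash
# Build an output name such as speech_0.3.wav from speech.wav and 0.3.
candidate_name() {
  local base=${1%.*} ext=${1##*.}
  printf '%s_%s.%s\n' "$base" "$2" "$ext"
}

# Run noisered once per amount, leaving the original file untouched.
# Arguments: <input file> <noise profile> <amount>...
try_amounts() {
  local src=$1 profile=$2 amount
  shift 2
  for amount in "$@"; do
    sox "$src" "$(candidate_name "$src" "$amount")" noisered "$profile" "$amount"
  done
}
```

Possible usage: try_amounts speech.wav speech.noise-profile 0.1 0.2 0.3 0.5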
#!/usr/bin/env bash
usage="Help, usage: sphinx4Normalize -i /path/to/audio/input/ -o /path/to/audio/output/ [-t wav|mp3]"
# Get the number of arguments via $#
if [ $# -eq 0 ]
then
    echo "$usage"
    exit 128
fi
# Walk through the argument list, consuming it with shift
input=""
output=""
fileout=""
type=wav
while [ "$1" != "" ]; do
    case $1 in
        -i | -di | --input )
            shift
            input=$1
            ;;
        -o | -do | --output )
            shift
            output=$1
            ;;
        -t | --type )
            shift
            # Any value other than wav or mp3 is ignored and the default is kept
            case $1 in
                wav|mp3)
                    type=$1
                    ;;
            esac
            ;;
        -h | --help )
            echo "$usage"
            exit 0
            ;;
        * )
            echo "$usage"
            exit 1
    esac
    shift
done
# Both the input and the output directory are required
if [ -z "$input" ] || [ -z "$output" ] ; then
    echo "$usage"
    exit 128
fi
for i in "$input"/*"$type" ; do
    # Skip the unexpanded pattern when no file matches
    [ -e "$i" ] || continue
    fileout="$output/$(basename "$i")"
    echo "Processing $i"
    # Convert to 16 kHz, 16-bit signed, mono
    sox "$i" -r 16k -e signed -b 16 -c 1 "$fileout"
    echo "Output: $fileout"
done
echo "Complete!"
A fairly concise article on speech recognition:
http://web.science.mq.edu.au/~cassidy/comp449/html/comp449.html