Home >Backend Development >Python Tutorial >How to use python to implement speech recognition function under Linux

How to use python to implement speech recognition function under Linux

WBOY
WBOYforward
2023-05-11 08:04:051593browse

Introduction to how speech recognition works

Speech recognition originated from research done at Bell Labs in the early 1950s. Early speech recognition systems could only recognize a single speaker and a vocabulary of only about a dozen words. Modern speech recognition systems have come a long way to recognize multiple speakers and have large vocabularies that recognize multiple languages.
The first part of speech recognition is of course speech. Through the microphone, speech is converted from physical sound into electrical signals, and then into data through an analog-to-digital converter. Once digitized, several models can be applied to transcribe audio to text.
Most modern speech recognition systems rely on Hidden Markov Models (HMM). Its working principle is: the speech signal can be approximated as a stationary process on a very short time scale (such as 10 milliseconds), that is, a process whose statistical characteristics do not change with time.
Many modern speech recognition systems use neural networks before HMM recognition to simplify the speech signal through feature transformation and dimensionality reduction techniques. Voice activity detectors (VAD) can also be used to reduce the audio signal to parts that may only contain speech.
Fortunately for Python users, some speech recognition services are available online through APIs, and most of them also provide Python SDKs.

Choose the right python speech recognition package

There are some ready-made speech recognition packages in PyPI. These include:
apiai
google-cloud-speech
pocketsphinx
SpeechRcognition
watson-developer-cloud
wit
Some software packages (such as wit and apiai) provide some Built-in capabilities beyond basic speech recognition, such as natural language processing to identify speaker intent. Other software packages, such as Google Cloud Speech, focus on speech-to-text conversion.
Among them, SpeechRecognition stands out for its ease of use.
Recognizing speech requires audio input, and retrieving the audio input in SpeechRecognition is very simple. It does not require building a script to access the microphone and process the audio file from scratch. It only takes a few minutes to automatically complete the retrieval. and run.

Installing SpeechRecognition

SpeechRecognition is compatible with Python2.6, 2.7 and 3.3, but some additional installation steps are required if used in Python 2. You can use the pip command to install SpeechRecognition from the terminal: pip3 install SpeechRecognition

After the installation is complete, you can open the interpreter window to verify the installation:

How to use python to implement speech recognition function under Linux

Note: Do not close this session, you will use it in the next few steps.
If you are dealing with existing audio files, just call SpeechRecognition directly, paying attention to some dependencies of the specific use case. Also note, install the PyAudio package to get microphone input

Recognizer class

The core of SpeechRecognition is the recognizer class.
The main purpose of the Recognizer API is to recognize speech. Each API has a variety of settings and functions to recognize the speech of the audio source. Here I choose recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx (supports offline Speech recognition)
Then we need to install PocketSphinx through the pip command. During the installation process, a large series of errors in red fonts are also prone to occur.

Use of audio files

Download related audio files and save them to a specific directory (save directly to the ubuntu desktop)
Note:
AudioFile class can be performed through the path of the audio file Initializes and provides a context manager interface for reading and processing file contents.
SpeechRecognition currently supports the following file types:

  • WAV: must be in PCM/LPCM format

  • AIFF

  • AIFF-CFLAC: must be the initial FLAC format; OGG-FLAC format is not available

English speech recognition

After completing the above basic work , you can perform English speech recognition.
(1) Open the terminal
(2) Enter the directory where the voice test file is located (the blogger’s is the desktop)
(3) Open the python interpreter
(4) Enter the relevant commands as shown below

How to use python to implement speech recognition function under Linux

Finally, you can see the speech-to-text content (this they’ll smell...). In fact, the effect is very good! Because it's in English and there's no noise.

The impact of noise on speech recognition

Noise does exist in the real world, all recordings have some degree of noise, and unprocessed noise can destroy the accuracy of speech recognition applications sex.
By trying to transcribe the effect is not good, we can try calling the adjust_for_ambient_noise() command of the Recognizer class.

Use of microphone

To use SpeechRecognizer to access the microphone, you must install the PyAudio package.
If you are using Debian-based Linux (such as Ubuntu), you can use apt to install PyAudio: sudo apt-get install python-pyaudio python3-pyaudioYou may still need to enable pip3 install pyaudio after the installation is complete. , especially when running virtually.
After installing pyaudio, you can use python to generate voice input and generate related files.
Notes on the use of pocketsphinx:
Supported file formats: wav
Decoding requirements for audio files: 16KHZ, mono
Use python to implement recording and generate related files. The program code is as follows:

from pyaudio import PyAudio, paInt16
import numpy as np
import wave
class recoder:
     NUM_SAMPLES = 2000   
     SAMPLING_RATE = 16000  
     LEVEL = 500     
     COUNT_NUM = 20   
     SAVE_LENGTH = 8     
     Voice_String = []
     def savewav(self,filename):
         wf = wave.open(filename, 'wb')
         wf.setnchannels(1)
         wf.setsampwidth(2)
         wf.setframerate(self.SAMPLING_RATE)
         wf.writeframes(np.array(self.Voice_String).tostring())
         wf.close()
     def recoder(self):
         pa = PyAudio()
         stream = pa.open(format=paInt16, channels=1, rate=self.SAMPLING_RATE, input=True,frames_per_buffer=self.NUM_SAMPLES)
         save_count = 0
         save_buffer = []
         while True:
            string_audio_data = stream.read(self.NUM_SAMPLES)
            audio_data = np.fromstring(string_audio_data, dtype=np.short)
            large_sample_count = np.sum(audio_data > self.LEVEL)
            print(np.max(audio_data))
            if large_sample_count > self.COUNT_NUM:
                save_count = self.SAVE_LENGTH
            else:
                save_count -= 1
            if save_count < 0:
                save_count = 0
            if save_count > 0:
                save_buffer.append(string_audio_data )
            else:
                if len(save_buffer) > 0:
                    self.Voice_String = save_buffer
                    save_buffer = []
                    print("Recode a piece of voice successfully!")
                    return True
		 else:
                    return False
if __name__ == "__main__":
    r = recoder()
    r.recoder()
    r.savewav("test.wav")

Note: Be sure to pay attention to spaces when implementing using the python interpreter! ! !
The final generated file is in the directory where the Python interpreter session is located. You can play it through play to test it. If play is not installed, you can install it through the apt command.

Chinese speech recognition

After completing the previous work, we have a certain understanding of the speech recognition process, but as a Chinese, we have to do a Chinese speech recognition. !

We need to download the corresponding Mandarin admission and language model from the CMU Sphinx speech recognition toolkit.

How to use python to implement speech recognition function under Linux

The words marked in the picture are Mandarin! Download the relevant speech recognition toolkit.

But we need to convert zh_broadcastnews_64000_utf8.DMP into language-model.lm.bin, and then unzip zh_broadcastnews_16k_ptm256_8000.tar.bz2 to get the zh_broadcastnews_ptm256_8000 folder.
Learn from the blogger’s method just now and find the speech_recognition folder under Ubuntu. There may be many friends who cannot find the relevant folders, but they are actually under hidden files. You can click on the three bars in the upper right corner of the folder. As shown in the picture below:

How to use python to implement speech recognition function under Linux

Then check Show hidden files, as shown in the picture below:

How to use python to implement speech recognition function under Linux

Then You can find it by following the following directories:

How to use python to implement speech recognition function under Linux

## and then rename the original

en-US to en-US-bak , create a new folder en-US, change the extracted zh_broadcastnews_ptm256_8000 to acoustic-model, and change chinese.lm.bin to language -model.lm.bin, change the suffix of pronounciation-dictionary.dic to dict, and copy these three files to en-US. At the same time, copy LICENSE.txt in the original en-US file directory to the current folder. Finally, there are the following files in this folder:

How to use python to implement speech recognition function under Linux

Then we can record a voice file ("test.wav") through the microphone

In this Open the python interpreter in the file directory and enter the following content:

How to use python to implement speech recognition function under Linux

and you will see the output, but I am talking about two Chinas, and I also tested other discovery and recognition effects. very bad! ! !


Small-scale Chinese recognition

The effect provided by the official one is so poor that it is almost unusable! Then I thought of an optimization method after reading many articles, but it is only suitable for small-scale recognition! Some commands and the like should be fine, but chatting and the like may not work so well.

Find the 4 folders you just copied. There is a folder of pronounciation-dictionary.dict. After opening it, the following content is found:

How to use python to implement speech recognition function under Linux

It feels like this content is similar to There is a big gap between the words used in a dictionary and the words used in daily communication. Then we can just change it to the words we are used to! With the idea of ​​giving it a try, the result is really good. The recognition effect is really good!

My approach is:
(1) Keep the content above the red mark in the picture, and delete the content below the red mark. Of course, for insurance reasons, it is recommended that you back up this file!
(2) Enter the content you want to identify below the red line! (Input according to the rules, different from pinyin!!!) The situation of new pneumonia has been getting better recently. The most heard sentence is "Come on, China". So today's content is to convert "Come on, China" into text! I hope school can start soon, hahahaha.

How to use python to implement speech recognition function under Linux

3) Enter the following:

How to use python to implement speech recognition function under Linux

Speech synthesis

My personal understanding of speech synthesis is text-to-speech. However, in this sentence you can set client = AipSpeech(APP_ID, API_KEY, SECRET_KEY) result = client.synthesis('Hello Baidu', 'zh', 1, { 'vol': 5,'spd': 3 ,'pit':9,'per': 3})Volume, tone, speed, male/female/loli/carefree.

The above is the detailed content of How to use python to implement speech recognition function under Linux. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:yisu.com. If there is any infringement, please contact admin@php.cn delete