Make a Voice-Controlled Audio Player with the Web Speech API
Key Takeaways
This article was peer reviewed by Edwin Reynoso and Mark Brown. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!
The Web Speech API is a JavaScript API that enables web developers to integrate speech recognition and synthesis into their web pages.
There are many reasons to do this. For example, to enhance the experience of people with disabilities (particularly users with visual impairments, or users with limited hand mobility), or to allow users to interact with a web application while performing another task, such as driving.
If you have never heard of the Web Speech API, or you would like a quick primer, it might be a good idea to read Aurelio De Rosa's articles Introducing the Web Speech API, Speech Synthesis API, and the Talking Form idea.
Browser manufacturers have only recently begun to implement both the Speech Recognition API and the Speech Synthesis API. Support for these APIs is still far from universal, so if you are following along with this tutorial, make sure to use a supported browser.
In addition, the Speech Recognition API currently requires an internet connection, as the speech is passed over the wire and the results are returned to the browser. If the connection uses HTTP, the user must grant the site access to their microphone on every request. If the connection uses HTTPS, this only needs to be done once.
A speech recognition library can help us manage the complexity and ensure we stay forward compatible. For example, when another browser starts supporting the Speech Recognition API, we won't have to worry about adding vendor prefixes.
Annyang is one such library, and it is very easy to work with.
To initialize Annyang, we add its script to our website:
<code class="language-javascript"></code>
We can check if the API is supported like this:
<code class="language-javascript">if (annyang) { /*逻辑*/ }</code>
and add commands using an object with the command names as keys and the callbacks as values:
<code class="language-javascript">var commands = { 'show divs': function() { $('div').show(); }, 'show forms': function() { $("form").show(); } };</code>
Finally, we just add the commands and start the speech recognition with:
<code class="language-javascript">annyang.addCommands(commands); annyang.start();</code>
In this article, we will build a voice-controlled audio player. We will use both the Speech Synthesis API (to tell the user which song is playing, or that a command was not recognized) and the Speech Recognition API (to convert voice commands into strings that trigger different application logic).
The great thing about an audio player that uses the Web Speech API is that users can browse to other pages in their browser, or minimize the browser and do something else, while still being able to switch between songs. If we have a lot of songs in the playlist, we could even request a specific song without searching for it manually (provided we know its name or artist, of course).
We will not rely on a third-party library for the speech recognition, as we want to show how to work with the API without adding extra dependencies to our projects. The voice-controlled audio player will only work in browsers that support the interimResults attribute. The latest version of Chrome should be a safe bet.
As always, you can find the full code on GitHub, as well as a demo on CodePen.
Let's start with a static playlist. It consists of an object that contains different songs in an array. Each song is a new object containing the path to the file, the singer's name, and the name of the song:
<code class="language-javascript">var data = { "songs": [ { "fileName": "https://www.ruse-problem.org/songs/RunningWaters.mp3", "singer" : "Jason Shaw", "songName" : "Running Waters" }, ...</code>
We should be able to add a new object to the songs array and have the new song automatically included in our audio player.
Now let's look at the player itself. This will be an object containing the player's audio data and the methods that operate on it. We'll begin with the audioData object, which describes the player's state. It is relatively simple:
<code class="language-javascript">var audioPlayer = { audioData: { currentSong: -1, songs: [] },</code>The
currentSong
attribute refers to the index of the song the user is currently in. This is useful, for example, when we have to play the previous/next song or stop/pause song.
The songs array contains all the songs that the user has listened to. This means that the next time the user plays the same song, we can load it from the array without downloading it again.
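A sketch of how this caching could work (loadSong is an assumed helper name; the real code lives in the repository):

<code class="language-javascript">loadSong: function(index) {
  // reuse the cached Audio object if this song has been played before
  if (typeof this.audioData.songs[index] === 'undefined') {
    // first listen: download the file once and store it at the same index
    this.audioData.songs[index] = new Audio(data.songs[index].fileName);
  }
  return this.audioData.songs[index];
},</code>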
You can view the full code here.
The UI will consist of a list of available commands, a list of available tracks, and a context box to inform the user about the current action and the previous command. I won't go into detail about the UI methods, but will give a brief overview of each. You can find the code for these methods here.
This will iterate over the playlist we declared earlier and append the song's name, along with the artist's name, to the list of available tracks.
This indicates which song is currently playing (by marking it green and adding a pair of headphones next to it), as well as which songs have already been played.
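Since the project already uses jQuery, a marking method along these lines seems plausible (the class names and markup are assumptions, not the repository's exact code):

<code class="language-javascript">markCurrentSong: function(index) {
  // clear any previous highlight and headphone icon
  $('.song-item').removeClass('current played');
  // everything before the current song has already been played
  $('.song-item').slice(0, index).addClass('played');
  // mark the current song green; CSS adds the headphone icon to .current
  $('.song-item').eq(index).addClass('current');
},</code>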
This indicates to the user that a song is playing or has ended. It does so via the changeStatusCode method (which adds this information to the context box) and by announcing the change through the Speech Synthesis API.
As mentioned above, this updates the status message in the context box (for example, to indicate that a new song is playing) and uses the speak method to announce the change to the user.
A small helper function to update the last command box.
A small helper function to hide or display the spinner icon (which indicates to the user that their voice command is currently being processed).
The player will be responsible for what you might expect, namely: starting, stopping and pausing playback, as well as moving back and forth between tracks. Again, I won't go into these methods in detail, but would rather point you towards our codebase on GitHub.
This checks whether the user has already listened to a song. If not, it starts the song; otherwise it just calls the playSong method we discussed earlier on the currently cached song. That song is located in audioData.songs and corresponds to the currentSong index.
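Under the same assumptions as the loadSong sketch above, the play method might look like this:

<code class="language-javascript">play: function() {
  // fetch (or reuse) the Audio object for the current index and play it
  var song = this.loadSong(this.audioData.currentSong);
  this.playSong(song);
},</code>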
This pauses a song, or stops it completely (returning the playback time to the beginning of the song), depending on what is passed as the second parameter. It also updates the status code to notify the user that the song has been stopped or paused.
This pauses or stops the song based on its first and only parameter, as in the sketch below.
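A sketch under the assumption that the helper takes a single boolean flag (the method name is illustrative):

<code class="language-javascript">pauseOrStop: function(stop) {
  var song = this.audioData.songs[this.audioData.currentSong];
  song.pause();
  if (stop) {
    // a full stop also rewinds the track to the beginning
    song.currentTime = 0;
  }
  this.changeStatusCode(stop ? 'Song stopped' : 'Song paused');
},</code>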
This checks whether the previous song is cached. If so, it pauses the current song, decrements currentSong and plays the song at the new index. If the new song is not in the array, it does the same thing, but it first loads the song from the file name/path corresponding to the decremented currentSong index.
If the user has listened to a song before, this method will try to pause it. If there is a next song in our data object (i.e. our playlist), it loads and plays it. If there is no next song, it just changes the status code and informs the user that they have reached the last song.
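A sketch of that "next" logic, reusing the helpers sketched earlier (again, names are assumptions):

<code class="language-javascript">next: function() {
  // pause whatever is currently playing, if anything
  if (this.audioData.currentSong > -1) {
    this.pauseOrStop(false);
  }
  if (this.audioData.currentSong + 1 < data.songs.length) {
    this.audioData.currentSong++;
    this.play();
  } else {
    // no next song: tell the user they have reached the end of the playlist
    this.changeStatusCode('Reached the last song');
  }
},</code>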
This takes a keyword as an argument and performs a linear search over the song names and artists, then plays the first match.
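A sketch of such a linear search (searchSpecificSong is an assumed name):

<code class="language-javascript">searchSpecificSong: function(keyword) {
  keyword = keyword.toLowerCase();
  for (var i = 0; i < data.songs.length; i++) {
    // match the keyword against both the song name and the artist
    if (data.songs[i].songName.toLowerCase().indexOf(keyword) > -1 ||
        data.songs[i].singer.toLowerCase().indexOf(keyword) > -1) {
      this.audioData.currentSong = i;
      return this.play();
    }
  }
  this.changeStatusCode('Song not found');
},</code>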
The Speech Synthesis API is surprisingly easy to implement. In fact, it only takes two lines of code to get a web application talking to users:
<code class="language-javascript"></code>
What we are doing here is creating an utterance object containing the text we wish to be spoken. The speechSynthesis interface (available on the window object) is responsible for processing this utterance object and controlling the playback of the resulting speech.
Go ahead and try it in your browser. It's that simple!
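The utterance can also be tuned before it is spoken. The standard interface exposes the rate, the pitch and the list of installed voices, for example:

<code class="language-javascript">var utterance = new SpeechSynthesisUtterance('Hello');
utterance.rate = 1.2;   // slightly faster than the default
utterance.pitch = 0.9;  // slightly lower pitch
// pick one of the voices installed in the browser, if any are loaded yet
var voices = window.speechSynthesis.getVoices();
if (voices.length) {
  utterance.voice = voices[0];
}
window.speechSynthesis.speak(utterance);</code>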
We can see this in action in our speak method, which reads aloud the message passed as an argument. A sketch of how it might look (the exact implementation is in the repository):
<code class="language-javascript">if (annyang) { /*逻辑*/ }</code>
If a second argument (scope) is passed, then after the message has been spoken we call the play method on scope (which will be an Audio object).
This method is not that exciting. It takes a command as an argument and calls the appropriate method to respond to it. It uses a regular expression to check whether the user wants to play a specific song; otherwise, it enters a switch statement to test different commands. If none corresponds to the received command, it informs the user that the command was not understood.
You can find its code here.
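A sketch of how such a dispatcher might be structured (the command strings and method names are assumptions based on the description above; it returns true when a command matched, which the recognition handler further down relies on):

<code class="language-javascript">handleCommand: function(command) {
  // "play" followed by anything names a specific song or artist
  var match = command.match(/^play (.+)$/);
  if (match) {
    this.searchSpecificSong(match[1]);
    return true;
  }
  switch (command) {
    case 'play':     this.play();             break;
    case 'pause':    this.pauseOrStop(false); break;
    case 'stop':     this.pauseOrStop(true);  break;
    case 'next':     this.next();             break;
    case 'previous': this.previous();         break;
    default:
      // none of our commands matched this transcript
      this.changeStatusCode('Command not recognized');
      return false;
  }
  return true;
}</code>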
So far, we have a data object representing our playlist, and an audioPlayer object representing the player itself. Now we need to write some code to recognize and handle user input. Note that this will only work in WebKit browsers.
The code needed to get a user talking to your app is as simple as before (a minimal sketch):
<code class="language-javascript">var commands = { 'show divs': function() { $('div').show(); }, 'show forms': function() { $("form").show(); } };</code>
This will prompt the user to allow the page to access their microphone. If the user allows access, they can start talking, and when they stop, the onresult event will be fired, making the results of the speech capture available as a JavaScript object.
Reference: HTML5 Speech Recognition API
We can implement this in our application along the following lines (a sketch; the full implementation is linked at the end of this section):
<code class="language-javascript">annyang.addCommands(commands); annyang.start();</code>
As you can see, we test for the presence of webkitSpeechRecognition on the window object. If it exists, we can get started; otherwise, we inform the user that their browser does not support it. If all goes well, we then set a few options. Among them, lang is an interesting one which can improve the recognition results based on the user's locale.
Then, we declare handlers for the onresult and onend events, before kicking things off with the start method.
When the speech recognizer produces a result, we want to do a few things, at least in the context of the current speech recognition implementation and our needs. Every time there is a result, we want to save it in an array and set a timeout of three seconds, so that the browser can collect any further results. Once the three seconds are up, we want to use the collected results, looping over them in reverse order (newer results have a better chance of being accurate) and checking whether any recognized transcript contains one of our available commands. If so, we execute the command and restart the speech recognition. We do this because waiting for a final result can take up to a minute, which would make our audio player seem unresponsive and pointless, since it would be faster to simply click a button.
<code class="language-javascript"></code>
Since we are not using a library, we have to write more code to set up our speech recognizer, looping through each result and checking whether its transcript matches a given keyword.
Finally, we restart the speech recognition as soon as it ends:
<code class="language-javascript">if (annyang) { /*逻辑*/ }</code>
You can view the full code for this section here.
That's it. We now have a fully functional, voice-controlled audio player. I strongly recommend downloading the code from GitHub and playing around with it, or checking out the CodePen demo. I have also made available a version that is served over HTTPS.
I hope this practical tutorial has served as a good introduction to what is possible with the Web Speech API. I think we will see the use of this API grow as the implementations stabilize and new features are added. For example, I can imagine a future version of YouTube that is completely voice-controlled, where we could watch other users' videos, play specific songs and move between songs, all with voice commands.
The Web Speech API could also bring improvements to many other areas, or open up new possibilities. For example, browsing emails, navigating websites, or searching the web, all by voice.
Are you using this API in your projects? I'd love to hear from you in the comments below!
The Web Speech API is a powerful tool that allows developers to integrate speech recognition and synthesis into their web applications. In a voice-controlled audio player, the API works by converting spoken commands into text that the application can then interpret and execute. For example, if the user says "play", the API converts that to text, and the application understands it as the command to start playing audio. This process relies on sophisticated algorithms and machine learning techniques to accurately recognize and interpret human speech.
Voice-controlled audio players have several advantages. First, they provide a hands-free experience, which is especially useful when users are busy with other tasks. Second, they can improve accessibility for users with reduced mobility, who may have difficulty using traditional controls. Finally, they offer a novel and engaging user experience that can make your app stand out from the competition.
Most modern web browsers support at least part of the Web Speech API, with Google Chrome offering the most complete implementation; support in browsers such as Mozilla Firefox and Microsoft Edge varies, particularly for speech recognition. It is always best to check browser compatibility before integrating the API into your application, as support differs between versions and platforms.
You can improve the accuracy of speech recognition by using a high-quality microphone, reducing background noise, and setting the recognizer's language to match your users' speech. Additionally, you can implement error handling in your application to deal with unrecognized commands and provide feedback to users.
Yes, you can customize the voice commands in a voice-controlled audio player. This is done by defining your own set of commands in your application code, which are then matched against the text that the Web Speech API recognizes. This allows you to tailor the user experience to your specific needs and preferences.
Yes, the Web Speech API supports multiple languages. You can specify a language in the API settings, and it will recognize and synthesize speech in that language. This makes it a versatile tool for developing applications for international audiences.
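For example, both the recognizer and the utterance accept a BCP 47 language tag:

<code class="language-javascript">// recognize German speech (WebKit-only constructor, as discussed above)
var recognizer = new webkitSpeechRecognition();
recognizer.lang = 'de-DE';

// speak a reply in German
var utterance = new SpeechSynthesisUtterance('Hallo');
utterance.lang = 'de-DE';
window.speechSynthesis.speak(utterance);</code>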
The Web Speech API is designed with security in mind. It transmits voice data over a secure connection and does not store any personal information. However, as with any web technology, it is important to follow security best practices, such as keeping your software up to date and protecting your applications from common web vulnerabilities.
While the Web Speech API is primarily designed for use in web applications, it can also be used in mobile applications through web views. However, for native mobile applications, you may want to consider platform-specific speech recognition APIs, which may offer better performance and integration.
While the Web Speech API is a powerful tool, it does have some limitations. For example, speech recognition requires an internet connection to work, and its accuracy can be affected by factors such as background noise and the user's accent. Additionally, support for the API varies between web browsers and platforms.
To get started with the Web Speech API, you need a basic understanding of JavaScript and web development. You can then browse the API documentation, which provides detailed information about its features and how to use them. There are also many online tutorials and examples available to help you learn how to integrate the API into your own applications.