Web Speech API Developer's Guide: What it is and how it works
Translator|Li Rui
Reviewer|Sun Shujuan
The Web Speech API is a web technology that lets you incorporate speech data into applications. It converts speech to text, and text to speech, in the browser.
The Web Speech API was introduced by the W3C community in 2012. Ten years later, it is still under development and browser support remains limited.
The API supports both short input, such as a single spoken command, and long, continuous input. The capacity for extended dictation makes it well suited to integration with the Applause app, while short input works well for language translation.
Speech recognition has had a huge impact on accessibility. Users with disabilities can use voice to browse the web more easily. Therefore, this API could be the key to making the web friendlier and more efficient.
Text-to-speech and speech-to-text functionality is handled by two interfaces: speech synthesis and speech recognition.
In the speech recognition interface, the user speaks into the microphone, and the speech recognition service then checks the words against its own grammar.
The API protects the user's privacy by first requesting permission to access the user's voice through the microphone. If the page using the API uses the HTTPS protocol, permission is only requested once. Otherwise, the API will ask in each instance.
The user's device may already include a speech recognition system, such as Siri on iOS or Android Speech. When the speech recognition interface is used, that default system is used. Once the speech has been recognized, it is converted and returned as a text string.
In "one-shot" speech recognition, the recognition ends as soon as the user stops speaking. This is useful for short commands, such as searching the web for an application testing site or making a phone call. In "continuous" recognition, the user must manually end the recognition using the "Stop" button.
Currently, speech recognition in the Web Speech API is officially supported in only two browsers: Chrome for desktop and Chrome for Android. Chrome requires the prefixed interface, webkitSpeechRecognition.
However, the Web Speech API is still in the experimental stage and the specification is subject to change. You can check whether the current browser supports this API by searching for the webkitSpeechRecognition object.
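The support check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name getSpeechRecognitionConstructor and its scope parameter are hypothetical, introduced only so the lookup can also run outside a browser.

```javascript
// Returns the speech recognition constructor if the environment exposes one,
// or null otherwise. `scope` defaults to the global object (window in a browser).
function getSpeechRecognitionConstructor(scope = globalThis) {
  // Chrome exposes the prefixed name; the unprefixed name is checked first
  // in case a browser provides it.
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

const RecognitionCtor = getSpeechRecognitionConstructor();
if (RecognitionCtor) {
  console.log("Speech recognition is available.");
} else {
  console.log("Speech recognition is not supported in this environment.");
}
```

Returning null rather than throwing lets the page fall back to a plain text input when the API is missing.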
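Let's look at a new constructor: SpeechRecognition(). In Chrome it is exposed under the prefixed name webkitSpeechRecognition.
var speechRecognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();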
Next, look at the callbacks for certain events:
(1) onstart: triggered when the speech recognizer starts listening and recognizing speech. A message can be displayed to notify the user that the device is listening.
(2) onend: triggered each time the speech recognition session ends.
(3) onerror: triggered whenever a speech recognition error occurs. The event uses the SpeechRecognitionErrorEvent interface (named SpeechRecognitionError in older drafts of the specification).
(4) onresult: triggered when the speech recognition object obtains a result. It returns both interim and final results. The event uses the SpeechRecognitionEvent interface.
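A rough sketch of wiring up the four callbacks above is shown here. The function name attachHandlers and the log parameter are hypothetical helpers for illustration; the handler property names and the event.error field come from the API itself.

```javascript
// Attaches the four callbacks described above to a recognizer instance.
// `log` is a hypothetical reporting function; it defaults to console.log.
function attachHandlers(recognizer, log = console.log) {
  recognizer.onstart = () => log("Listening...");
  recognizer.onend = () => log("Recognition ended.");
  // event.error holds a short error code such as "no-speech" or "not-allowed".
  recognizer.onerror = (event) => log("Recognition error: " + event.error);
  recognizer.onresult = (event) => {
    // Take the most recent result and its most likely alternative.
    const latest = event.results[event.results.length - 1];
    log("Heard: " + latest[0].transcript);
  };
  return recognizer;
}

// Browser usage (assumes Chrome's prefixed constructor is present):
// const recognizer = attachHandlers(new webkitSpeechRecognition());
// recognizer.start();
```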
The SpeechRecognitionEvent object contains the following data:
(1) results[i]: an array of speech recognition result objects; each element represents one recognized word or phrase.
(2) resultIndex: the index of the current recognition result.
(3) results[i][j]: the j-th alternative for the recognized word; the first alternative is the most likely one.
(4) results[i].isFinal: a Boolean value showing whether the result is interim or final.
(5) results[i][j].transcript: Text representation of the word.
(6) results[i][j].confidence: The probability that the result is correct (value range is from 0 to 1).
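To show how these fields fit together, here is a small sketch that separates final text from interim text. The function name collectTranscripts is hypothetical, and the code only assumes the event shape listed above, so it can be tried with a plain mock object as well as a real event.

```javascript
// Splits the results of a SpeechRecognitionEvent-like object into final and
// interim text. Only entries from event.resultIndex onward are read.
function collectTranscripts(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0]; // the first alternative is the most likely one
    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}
```

Inside an onresult handler, the final text would typically be appended to the saved transcript, while the interim text is shown as provisional feedback and replaced on the next event.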
So, what properties should be configured on the speech recognition object? Take a look below.
(1) Continuous vs One-Shot
You decide whether the speech recognition object should keep listening until you turn it off, or whether it only needs to recognize a short phrase. The continuous property defaults to false.
Suppose the technology is used for note-taking, integrated with an inventory tracking template. You need to be able to talk for long stretches, with enough time to pause without the app going back to sleep. Set continuous to true as follows:
speechRecognition.continuous = true;
(2) Language
Which language should the object recognize? If the browser is set to English by default, English is selected automatically. However, locale codes (such as en-US) can also be used.
Additionally, the user can be allowed to select the language from a menu:
speechRecognition.lang = document.querySelector("#select_dialect").value;
(3) Interim Results
Interim results refer to results that are not yet complete or final. By setting this property to true, you can cause the object to display temporary results as feedback to the user:
speechRecognition.interimResults = true;
(4) Start and Stop
If the speech recognition object is configured as "continuous", set the onclick properties of the start and stop buttons as follows:
document.querySelector("#start").onclick = () => {
  speechRecognition.start();
};
document.querySelector("#stop").onclick = () => {
  speechRecognition.stop();
};
This lets the user control when the browser starts "listening" and when it stops.
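The three configuration properties covered above (continuous, lang, interimResults) can be applied in one place. The following is only a sketch: configureRecognizer and its options object are hypothetical names, and leaving lang untouched when no value is given is a design choice made here so the browser's own default stays in effect.

```javascript
// Applies the configuration properties discussed above to a recognizer.
// The option names mirror the API's property names; the helper itself is
// a hypothetical convenience for this example.
function configureRecognizer(recognizer, options = {}) {
  const { continuous = false, interimResults = false, lang } = options;
  recognizer.continuous = continuous;         // keep listening until stopped
  recognizer.interimResults = interimResults; // report provisional results
  if (lang) {
    recognizer.lang = lang;                   // e.g. "en-US"
  }
  return recognizer;
}

// Browser usage for a dictation-style setup (assumes Chrome's prefixed constructor):
// const dictation = configureRecognizer(new webkitSpeechRecognition(), {
//   continuous: true,
//   interimResults: true,
//   lang: "en-US",
// });
```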
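So far we have taken an in-depth look at the speech recognition interface, its methods, and its properties. Now let's explore the other side of the Web Speech API.
Speech synthesis is also known as text-to-speech (TTS). It takes text from an application, converts it to speech, and plays it through the device's speaker.
Speech synthesis can be used for anything from driving directions, to reading out lecture notes for online courses, to screen reading for users with visual impairments.
In terms of browser support, speech synthesis in the Web Speech API is available in Firefox on desktop and mobile from Gecko 42+. However, permission must be enabled first. Firefox OS 2.5+ supports speech synthesis by default; no permission is needed. Chrome and Chrome for Android, version 33 and later, also support speech synthesis.
So how do you get the browser to speak? The main controller interface for speech synthesis is SpeechSynthesis, but several related interfaces are also needed, for example for the voices used for output. Most operating systems have a default speech synthesis system.
Put simply, you first create an instance of the SpeechSynthesisUtterance interface. It contains the text the service will read, along with information such as language, volume, pitch, and rate. Once these are specified, the instance is placed in a queue that tells the browser what to say and when.
Assign the text to be spoken to its text property, as follows: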
newUtterance.text =
Unless otherwise specified with the .lang property, the language defaults to the language of the app or browser.
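As an illustration of the steps above, the sketch below builds an utterance and sets its optional properties. The helper name buildUtterance, its options object, and the injectable UtteranceCtor parameter are assumptions made for this example (the last one only so the code can run outside a browser); the property names and value ranges in the comments follow the API.

```javascript
// Builds a SpeechSynthesisUtterance-like object from text and options.
// `UtteranceCtor` is injectable for testing; in a browser it defaults to
// the global SpeechSynthesisUtterance constructor.
function buildUtterance(text, options = {}, UtteranceCtor = globalThis.SpeechSynthesisUtterance) {
  const utterance = new UtteranceCtor(text);
  if (options.lang) utterance.lang = options.lang;                     // e.g. "en-US"
  if (options.volume !== undefined) utterance.volume = options.volume; // 0 to 1
  if (options.pitch !== undefined) utterance.pitch = options.pitch;    // 0 to 2
  if (options.rate !== undefined) utterance.rate = options.rate;       // 0.1 to 10
  return utterance;
}

// Browser usage: add the utterance to the speech queue.
// speechSynthesis.speak(buildUtterance("Hello there", { lang: "en-US", rate: 1.2 }));
```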
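After the site loads, the voiceschanged event can fire. To change the browser's default voice, use the getVoices() method of SpeechSynthesis, which returns all available voices.
The available voices depend on the operating system. Google has its own default set of voices, as does macOS. Finally, use the Array.find() method to select the preferred voice.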
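Voice selection with Array.find() might look like the sketch below. The function name pickVoice and its fallback order are assumptions for illustration; the lang and default fields are properties of the voice objects returned by getVoices().

```javascript
// Picks the first voice whose language tag matches `lang`, falling back to
// the voice flagged as default, or null when no voice qualifies.
function pickVoice(voices, lang) {
  return (
    voices.find((voice) => voice.lang === lang) ||
    voices.find((voice) => voice.default) ||
    null
  );
}

// Browser usage: the voice list may be empty until voiceschanged fires.
// speechSynthesis.onvoiceschanged = () => {
//   const voice = pickVoice(speechSynthesis.getVoices(), "en-GB");
//   console.log(voice ? voice.name : "No voice available");
// };
```

The chosen voice can then be assigned to the utterance's voice property before the utterance is queued.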
Customize the SpeechSynthesisUtterance as needed. You can start, stop, and pause the queue, or change the speaking speed ("rate").
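Pausing and resuming the queue can be sketched with a small toggle. The function name togglePause and its return values are hypothetical; the paused and speaking properties and the pause() and resume() methods belong to the SpeechSynthesis interface.

```javascript
// Toggles between paused and speaking for a SpeechSynthesis-like object.
// Returns the action taken so a caller can update a button label.
function togglePause(synth) {
  // Check `paused` first: a paused queue can still report speaking as true.
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle";
}

// Browser usage:
// document.querySelector("#pause").onclick = () => togglePause(speechSynthesis);
// To empty the queue entirely, call speechSynthesis.cancel().
```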
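When should you use the Web Speech API? The technology is fun to use but still evolving. Even so, there are many potential use cases. Integrating the API can help modernize IT infrastructure, and it is worth looking at which aspects of the Web Speech API are ripe for improvement.
Speaking into a microphone is faster and more efficient than typing. In today's fast-paced working life, people may need to access web pages on the go.
It is also a good way to reduce administrative workload. Improvements in speech-to-text technology could significantly cut the time spent on data entry tasks. Speech-to-text can be integrated into audio and video conferencing to speed up note-taking during meetings.
As mentioned above, speech-to-text (STT) and text-to-speech (TTS) are both excellent tools for users with disabilities or support needs. In addition, users who have difficulty writing or spelling for any reason can express themselves better through speech recognition.
In this way, speech recognition technology can be a great equalizer on the internet. Encouraging the use of these tools in the office also promotes workplace accessibility.
The Web Speech API can be a powerful tool for language translation because it supports both speech-to-text (STT) and text-to-speech (TTS). At present, not every language is available. This is one area where the Web Speech API has not yet reached its full potential.
One drawback is that the API requires an internet connection to work. The browser sends the input to its servers, which then return the result. This limits the environments in which the Web Speech API can be used.
Incredible progress has been made in improving the accuracy of speech recognizers. Users may still run into occasional difficulties, for example with technical jargon, other specialized vocabulary, or dialects. However, as of 2022, speech recognition software has reached human-level accuracy.
Although the Web Speech API is still experimental, it can be an amazing addition to a website or app. From tech companies to marketers, every workplace can use this API to improve efficiency. With just a few simple lines of JavaScript, you can open up a whole new world of accessibility.
Speech recognition can make it easier and more efficient for users to browse the web, and we look forward to seeing this technology grow and develop quickly!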
Original link: https://dzone.com/articles/the-developers-guide-to-web-speech-api-what-is-it