Web Speech API Developer's Guide: What it is and how it works

王林

Apr 11, 2023 pm 07:22 PM


Translator: Li Rui

Reviewer: Sun Shujuan

The Web Speech API is a web technology that lets you incorporate speech data into web applications. Through the browser, it can convert speech to text and text to speech.

The Web Speech API was introduced by the W3C community in 2012. Ten years later, the API is still under development, and browser compatibility remains limited.

The API supports both short input fragments, such as a spoken command, and long, continuous input. Its extensive dictation capabilities make it a good fit for integration with Applause apps, while short inputs are well suited to language translation.

Speech recognition has had a huge impact on accessibility. Users with disabilities can use voice to browse the web more easily. Therefore, this API could be the key to making the web friendlier and more efficient.

Speech-to-text and text-to-speech functionality is handled by two interfaces: speech recognition (speech-to-text) and speech synthesis (text-to-speech).

1. Speech recognition

In the speech recognition interface, the user speaks into the microphone, and the speech recognition service then checks what was said against its own grammar.

The API protects user privacy by first asking for permission to access the microphone. If the page using the API is served over HTTPS, permission is requested only once; otherwise, the browser asks every time.

The user's device may already include a speech recognition system, such as Siri on iOS or Android's built-in voice recognition. When the speech recognition interface is used, the device's default system handles the speech. Once the speech is recognized, it is returned as a text string.

In "one-shot" speech recognition, the recognition ends as soon as the user stops speaking. This is useful for short commands, such as searching the web for an application testing site or making a phone call. In "continuous" recognition, the user must manually end the recognition using the "Stop" button.

Currently, speech recognition in the Web Speech API is officially supported by only two browsers: Chrome for Desktop and Chrome for Android. Chrome requires the prefixed interface (webkitSpeechRecognition).

However, the Web Speech API is still experimental, and the specification is subject to change. You can check whether the current browser supports the API by testing for the webkitSpeechRecognition object.
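A minimal feature-detection sketch, assuming the code normally runs in a browser and receives `window` as `scope` (taking it as a parameter also lets the check run outside a browser). The helper name `getSpeechRecognitionConstructor` is hypothetical; it prefers the unprefixed `SpeechRecognition` and falls back to Chrome's `webkitSpeechRecognition`.

```javascript
// Return the speech recognition constructor available in `scope`
// (normally `window`), or null if the browser has no support.
function getSpeechRecognitionConstructor(scope = globalThis) {
  // Prefer the standard name; fall back to Chrome's prefixed interface.
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a page you might use it like this:
// const Recognition = getSpeechRecognitionConstructor(window);
// if (Recognition === null) {
//   console.warn("Speech recognition is not supported in this browser.");
// } else {
//   const speechRecognition = new Recognition();
// }
```

Returning null instead of throwing leaves the page free to hide its microphone button or show a text input instead.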

2. Speech recognition attributes

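Start by creating a new SpeechRecognition object. Because Chrome exposes the constructor only under its prefixed name, fall back to webkitSpeechRecognition:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const speechRecognition = new SpeechRecognition();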

Next, look at the callbacks for several events:

(1) onstart: called when the speech recognizer starts listening for and recognizing speech. You can use it to display a message telling the user that the device is listening.

(2) onend: called every time a speech recognition session ends.

(3) onerror: called whenever a speech recognition error occurs. The handler receives an event that implements the SpeechRecognitionErrorEvent interface.

(4) onresult: called when the speech recognition object obtains a result, which may be interim or final. The handler receives an event that implements the SpeechRecognitionEvent interface.
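The four handlers can be wired up as in the sketch below. It assumes `recognizer` is a SpeechRecognition instance and that `setStatus` is a hypothetical callback that updates the page (for example, a status line). The error codes in the table (`"not-allowed"`, `"no-speech"`, `"audio-capture"`, `"network"`) are values of the `error` property on SpeechRecognitionErrorEvent; the messages themselves are illustrative.

```javascript
// Human-readable messages for a few SpeechRecognitionErrorEvent.error codes.
const ERROR_MESSAGES = {
  "not-allowed": "Microphone permission was denied.",
  "no-speech": "No speech was detected. Please try again.",
  "audio-capture": "No microphone was found.",
  "network": "A network error occurred while contacting the recognition service.",
};

// Attach status-reporting handlers to a recognizer.
// `setStatus` is a hypothetical UI callback that receives a string.
function attachStatusHandlers(recognizer, setStatus) {
  recognizer.onstart = () => setStatus("Listening...");
  recognizer.onend = () => setStatus("Stopped listening.");
  recognizer.onerror = (event) => {
    // Fall back to the raw error code for anything not in the table.
    setStatus(ERROR_MESSAGES[event.error] || `Recognition error: ${event.error}`);
  };
  recognizer.onresult = (event) => {
    // Show the most likely alternative ([0]) of the latest result.
    const latest = event.results[event.results.length - 1];
    setStatus(`Heard: ${latest[0].transcript}`);
  };
}
```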

The SpeechRecognitionEvent object contains the following data:

(1) results[i]: a list of speech recognition result objects; each element represents one recognized segment of speech.

(2) resultIndex: the index of the lowest result that changed in this event.

(3) results[i][j]: the j-th alternative for result i; the first alternative (j = 0) is the most likely one.

(4) results[i].isFinal: a Boolean value indicating whether the result is interim or final.

(5) results[i][j].transcript: the text of the alternative.

(6) results[i][j].confidence: The probability that the result is correct (value range is from 0 to 1).
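The sketch below shows one way to read these fields inside an onresult handler. It walks the results from `resultIndex` onward, adds final results to the final text, and collects the rest as interim text, using only the top alternative (`[0]`). The function name `splitTranscripts` is hypothetical; in a browser, `event` would be a real SpeechRecognitionEvent.

```javascript
// Split the results carried by a SpeechRecognitionEvent into final and
// interim text, using the most likely alternative ([0]) of each result.
function splitTranscripts(event) {
  let finalText = "";
  let interimText = "";
  // Results before resultIndex were already delivered by earlier events.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0];
    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}

// Example use in a page (assumes a recognizer named speechRecognition):
// let transcript = "";
// speechRecognition.onresult = (event) => {
//   const { finalText, interimText } = splitTranscripts(event);
//   transcript += finalText;
//   console.log(transcript + interimText);
// };
```

Keeping the final text in a running variable and re-rendering the interim text on every event avoids printing the same words twice.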

Which properties should be configured on the speech recognition object? The main ones are described below.

(1) Continuous vs One-Shot

You decide whether the speech recognition object should keep listening until it is turned off, or only recognize a short phrase. This is controlled by the continuous property, which defaults to false (one-shot).

Suppose the technology is used to take notes for an inventory tracking template. The user needs to be able to talk for long periods, with time to pause, without the app going back to sleep. In that case, set continuous to true:

speechRecognition.continuous = true;

(2) Language


Which language should the object recognize? If the browser is set to English by default, English is selected automatically. However, you can also use language tags with region codes, such as "en-US" or "en-GB".

Additionally, the user can be allowed to select the language from a menu:

speechRecognition.lang = document.querySelector("#select_dialect").value;
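Because the menu value comes from the page, it can help to normalize it before assigning it to `lang`. This sketch assumes the menu values are BCP 47 language tags such as "en-US"; it uses the standard `Intl.getCanonicalLocales()` to canonicalize the tag and falls back to a default when the value is empty or malformed. The helper name `resolveRecognitionLang` and the `"en-US"` default are illustrative assumptions.

```javascript
// Canonicalize a BCP 47 language tag (e.g. "en-us" -> "en-US").
// Returns `fallback` when the value is empty or not a valid tag.
function resolveRecognitionLang(value, fallback = "en-US") {
  if (!value) {
    return fallback;
  }
  try {
    // Throws a RangeError for malformed tags such as "en_US".
    return Intl.getCanonicalLocales(value)[0];
  } catch {
    return fallback;
  }
}

// In a page:
// speechRecognition.lang = resolveRecognitionLang(
//   document.querySelector("#select_dialect").value
// );
```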

(3) Interim Results

Interim results are results that are not yet final. Setting the interimResults property to true makes the object return these provisional results, which you can display to the user as feedback:

speechRecognition.interimResults = true;

(4) Start and Stop

If the recognition object has been configured as "continuous", set the onclick properties of the Start and Stop buttons as follows:

JavaScript

document.querySelector("#start").onclick = () => {
  speechRecognition.start();
};

document.querySelector("#stop").onclick = () => {
  speechRecognition.stop();
};

This lets the user control when the browser starts "listening" and when it stops.
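Calling start() on a recognizer that is already running throws an InvalidStateError, so with separate Start and Stop buttons it can help to track whether recognition is active. A minimal sketch, assuming `recognizer` is a SpeechRecognition instance; the `createToggle` name and its return shape are hypothetical. It clears the active flag from the recognizer's own "end" event rather than from the Stop click, so it stays correct when recognition ends on its own.

```javascript
// Guard Start/Stop clicks so start() is never called on a running
// recognizer. `recognizer` is assumed to be a SpeechRecognition instance.
function createToggle(recognizer) {
  let active = false; // true from a successful start() until "end" fires

  // addEventListener leaves any onstart/onend handlers already set intact.
  recognizer.addEventListener("end", () => {
    active = false;
  });

  return {
    start() {
      if (active) return false; // start() would throw InvalidStateError
      recognizer.start(); // if this throws, `active` stays false
      active = true;
      return true;
    },
    stop() {
      if (!active) return false; // nothing to stop
      recognizer.stop(); // "end" fires once the service has stopped
      return true;
    },
    isActive: () => active,
  };
}

// In a page:
// const toggle = createToggle(speechRecognition);
// document.querySelector("#start").onclick = () => toggle.start();
// document.querySelector("#stop").onclick = () => toggle.stop();
```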

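Having covered the speech recognition interface, its methods, and its properties, it is time to explore the other side of the Web Speech API.

3. Speech synthesis

Speech synthesis is also known as text-to-speech (TTS). It means taking text from an application, converting it into speech, and playing it through the device's speaker.

You can use speech synthesis for everything from driving directions, to reading lecture notes aloud for online courses, to screen reading for visually impaired users.

In terms of browser support, speech synthesis in the Web Speech API has been available in Firefox desktop and mobile since Gecko 42+. However, the permission must be enabled first. Firefox OS 2.5+ supports speech synthesis by default; no permission is required. Chrome and Android 33+ also support speech synthesis.

So how do you make the browser talk? The main controller interface for speech synthesis is SpeechSynthesis, but several related interfaces are needed, such as one representing the voice used for output. Most operating systems have a default speech synthesis system.

Put simply, you first create an instance of the SpeechSynthesisUtterance interface. It contains the text the service will read, along with information such as language, volume, pitch, and rate. After specifying these, you put the instance into a queue that tells the browser what to say and when.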

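Assign the text to be spoken to its text property, as follows:

const newUtterance = new SpeechSynthesisUtterance();
newUtterance.text = "Text to read aloud";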

Unless you specify otherwise with the .lang property, the language defaults to that of the application or browser.
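A minimal sketch of creating an utterance with these settings, assuming a browser that exposes `SpeechSynthesisUtterance` and `window.speechSynthesis`. The `buildUtterance` helper and its option names are hypothetical; it clamps volume, pitch, and rate to the ranges the specification defines (0 to 1, 0 to 2, and 0.1 to 10, each defaulting to 1). The constructor is a parameter only so the helper can be exercised outside a browser.

```javascript
// Keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Create an utterance with the given text and voice settings.
// `Utterance` defaults to the browser's SpeechSynthesisUtterance.
function buildUtterance(text, options = {}, Utterance = globalThis.SpeechSynthesisUtterance) {
  const { lang, volume = 1, pitch = 1, rate = 1 } = options;
  const utterance = new Utterance(text);
  if (lang) utterance.lang = lang; // otherwise the app/browser default applies
  utterance.volume = clamp(volume, 0, 1);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.rate = clamp(rate, 0.1, 10);
  return utterance;
}

// In a page, speak() adds the utterance to the queue:
// window.speechSynthesis.speak(buildUtterance("Hello!", { lang: "en-US", rate: 1.2 }));
```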

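After the page loads, the voiceschanged event may fire. To change the browser's default voice, use the getVoices() method of SpeechSynthesis, which returns all available voices.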

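The available voices depend on the operating system: Google has its own default set of voices, as does macOS. Finally, use the Array.find() method to select the preferred voice.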
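Because getVoices() can return an empty list until the browser has loaded its voices, a common pattern is to wait for the voiceschanged event before choosing. This sketch assumes `synth` is `window.speechSynthesis`; the helper names `pickVoice` and `loadVoices` and the preference order (exact name, then language, then the default voice) are illustrative choices, not part of the API.

```javascript
// Choose a voice from a list of SpeechSynthesisVoice-like objects using
// Array.find(): exact name first, then language, then the default voice.
function pickVoice(voices, { name, lang } = {}) {
  return (
    (name && voices.find((v) => v.name === name)) ||
    (lang && voices.find((v) => v.lang === lang)) ||
    voices.find((v) => v.default) ||
    voices[0] ||
    null
  );
}

// Resolve with the voice list, waiting for "voiceschanged" if it is empty.
function loadVoices(synth) {
  const voices = synth.getVoices();
  if (voices.length > 0) return Promise.resolve(voices);
  return new Promise((resolve) => {
    synth.addEventListener("voiceschanged", () => resolve(synth.getVoices()), { once: true });
  });
}

// In a page:
// loadVoices(window.speechSynthesis).then((voices) => {
//   const utterance = new SpeechSynthesisUtterance("Hello!");
//   utterance.voice = pickVoice(voices, { lang: "en-GB" });
//   window.speechSynthesis.speak(utterance);
// });
```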

Customize the SpeechSynthesisUtterance as needed. You can start, stop, and pause the queue, or change the speaking speed (the "rate").
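The queue is controlled through methods on `speechSynthesis`: speak() adds an utterance, pause() and resume() suspend and continue playback, and cancel() empties the queue. The sketch below is a hypothetical pause/resume toggle built on the read-only `speaking` and `paused` properties; `synth` is assumed to be `window.speechSynthesis`.

```javascript
// Toggle between pausing and resuming speech. Returns the action taken.
function togglePause(synth) {
  // Check `paused` first: `speaking` stays true while speech is paused.
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken
}

// Stop everything: cancel() removes all queued utterances.
function stopSpeaking(synth) {
  synth.cancel();
}

// In a page:
// document.querySelector("#pause").onclick = () => togglePause(window.speechSynthesis);
```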

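4. Pros and cons of the Web Speech API

When should you use the Web Speech API? The technology is fun to use but still evolving. Even so, there are many potential use cases. Integrating the API can help modernize IT infrastructure, and it is worth understanding which aspects of the Web Speech API are mature and which still need improvement.

1. Improved productivity

Speaking into a microphone is faster and more efficient than typing. In today's fast-paced working life, people may need to access web pages on the go.

It can also help reduce administrative workload. Improvements in speech-to-text technology have the potential to significantly cut the time spent on data-entry tasks. Speech-to-text can also be integrated into audio and video conferencing to speed up meeting note-taking.

2. Accessibility

As mentioned above, speech-to-text (STT) and text-to-speech (TTS) are both excellent tools for users with disabilities or support needs. In addition, users who have difficulty writing or spelling, for whatever reason, can express themselves better through speech recognition.

In this way, speech recognition technology can be a great equalizer on the internet. Encouraging the use of these tools in the office can also improve workplace accessibility.

3. Translation

The Web Speech API could become a powerful language translation tool because it supports both speech-to-text (STT) and text-to-speech (TTS). Currently, not every language is available, which is one area where the Web Speech API has not yet reached its full potential.

4. Offline functionality

One drawback is that the API needs an internet connection to work properly. At present, the browser sends the input to its servers, which then return the result. This limits the environments in which the Web Speech API can be used.

5. Accuracy

Speech recognizers have become dramatically more accurate. Users may still occasionally run into difficulties, for example with technical terms and other specialized vocabulary, or with dialects. However, by 2022, speech recognition software had reached human-level accuracy.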

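5. Conclusion

Although the Web Speech API is still experimental, it can be a remarkable addition to a website or application. All kinds of workplaces, from tech companies to marketers, can use this API to improve efficiency. With just a few lines of JavaScript, you can open up a whole new world of accessibility.

Speech recognition can make browsing the web easier and more efficient, and we look forward to seeing this technology grow and develop quickly!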

Original article: https://dzone.com/articles/the-developers-guide-to-web-speech-api-what-is-it


This article is reproduced from 51CTO.COM.
