
Talking Web Pages and the Speech Synthesis API


Core points

  • The Speech Synthesis API allows a website to provide information to users by reading text aloud, which can greatly help visually impaired users and multitasking users.
  • The Speech Synthesis API provides a variety of methods and properties to customize the speech output, such as the language, the speaking rate, and the pitch. It also includes methods to start, pause, resume, and stop the synthesis process (see the minimal sketch after this list).
  • At the time of writing, the Speech Synthesis API is fully supported only by Chrome 33, with partial support in Safari on iOS 7. The API needs wider browser support before it can be relied on in production websites.
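As a quick preview of what the rest of the article covers, here is a minimal sketch of speaking a sentence with a custom language, rate, and pitch. It assumes a browser that implements window.speechSynthesis and SpeechSynthesisUtterance; the text and values are arbitrary examples.

```javascript
// Minimal sketch: speak a sentence, assuming the browser supports the API.
if ('speechSynthesis' in window) {
  var utterance = new SpeechSynthesisUtterance('Hello from the Speech Synthesis API');
  utterance.lang = 'en-GB';  // language of the voice
  utterance.rate = 1.2;      // speaking rate (1 is the default)
  utterance.pitch = 1;       // pitch (0–2, 1 is the default)

  window.speechSynthesis.speak(utterance);

  // The synthesis process can be paused and resumed at any time:
  // window.speechSynthesis.pause();
  // window.speechSynthesis.resume();
}
```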

A few weeks ago, I briefly discussed NLP and its related technologies. When dealing with natural language, there are two distinct yet complementary aspects to consider: automatic speech recognition (ASR) and text-to-speech (TTS). In the article introducing the Web Speech API, I discussed the Web Speech API, an API that provides speech input and text-to-speech output features in a web browser. You may have noticed that I only covered how to implement speech recognition in a website, not speech synthesis. In this article, we'll fill that gap and describe the Speech Synthesis API. Speech recognition gives users, especially those with disabilities, the chance to provide information to a website. Recall the use cases I highlighted:

> In a website, users could navigate pages or populate form fields using their voice. Users could also interact with a page while driving, without taking their eyes off the road. These are not trivial use cases.

Therefore, we can think of speech recognition as a channel from the user to the website. Speech synthesis, on the other hand, gives websites the ability to provide information to users by reading text aloud. This is especially useful for blind people and, more generally, for those with visual impairments.

There are as many use cases for speech synthesis as there are for speech recognition. Think of the systems implemented in some new cars that read your texts or emails so you don't have to take your eyes off the road. Visually impaired people who use computers are familiar with software like JAWS, which reads aloud whatever is on the desktop, allowing them to perform their tasks. These applications are great, but they are expensive. Thanks to the Speech Synthesis API, we can help the people who use our websites, regardless of whether they have disabilities or not.

As an example, imagine you're writing a post on your blog (as I'm doing right now), and to improve its readability you split it into paragraphs. Isn't this a good chance to use the Speech Synthesis API? Indeed, we could program our website so that, once a user hovers over (or focuses on) a piece of text, a speaker icon appears on the screen. If the user clicks the icon, we call a function that synthesizes the text of that paragraph. This is a non-trivial improvement. Even better, it introduces very little overhead for us as developers, and no overhead at all for our users. A basic implementation of this concept is shown in the sketch below.
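The following is a minimal sketch of that idea. It assumes a browser that supports the API; the button label and the read-aloud-button class name are purely illustrative, and in a real page you would only reveal the button on hover or focus, as described above.

```javascript
// Minimal sketch: add a "read aloud" button to every paragraph (assumes API support).
var paragraphs = document.querySelectorAll('p');

Array.prototype.forEach.call(paragraphs, function(paragraph) {
  var button = document.createElement('button');
  button.textContent = 'Read aloud';        // stands in for the speaker icon
  button.className = 'read-aloud-button';   // hypothetical class name, style as needed

  button.addEventListener('click', function() {
    // Synthesize the text of this specific paragraph.
    var utterance = new SpeechSynthesisUtterance(paragraph.textContent);
    window.speechSynthesis.speak(utterance);
  });

  paragraph.appendChild(button);
});
```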
Now that we have a better idea of the use cases for this API, let's look at its methods and properties.

Methods and attributes

The Speech Synthesis API defines an interface called SpeechSynthesis, whose structure is shown here. As in the previous article, this one won't cover all of the properties and methods described in the specification, because it would be too much for a single article. However, we'll explain enough elements to let you easily understand the ones not covered.

The SpeechSynthesisUtterance object

The first object we need to get to know is the SpeechSynthesisUtterance object. It represents the utterance (that is, the text) that the synthesizer will read aloud. This object is very flexible and can be customized in several ways. Besides the text, we can set the language, the speaking rate, and even the pitch used to pronounce the text. The following is a list of its properties:

- text – A string that specifies the utterance (text) to be synthesized.
- lang – A string representing the language of the speech synthesis (for example "en-GB" or "it-IT").
- voiceURI – A string that specifies the location of the speech synthesis service the web application wishes to use.
- volume – A number representing the volume of the speech. It ranges from 0 (minimum) to 1 (maximum) inclusive, and the default value is 1.
- rate – A number representing the speaking rate. It is relative to the default rate of the voice. The default value is 1. A value of 2 means the utterance will be spoken at twice the default speed. Values below 0.1 or above 10 are disallowed.
- pitch – A number representing the pitch of the voice. It ranges from 0 (minimum) to 2 (maximum) inclusive. The default value is 1.

To instantiate this object, we can either pass the text to synthesize as a constructor argument, or omit the text and set it later. The following code shows an example of the first case.

```javascript
// Create the utterance object
var utterance = new SpeechSynthesisUtterance('My name is Aurelio De Rosa');
```

The second case constructs a SpeechSynthesisUtterance and then assigns the parameters, as shown below.

```javascript
// Create the utterance object
var utterance = new SpeechSynthesisUtterance();
utterance.text = 'My name is Aurelio De Rosa';
utterance.lang = 'it-IT';
utterance.rate = 1.2;
```

Some of the event handlers exposed by this object are:

- onstart – Sets a callback that is fired when the synthesis starts.
- onpause – Sets a callback that is fired when the speech synthesis is paused.
- onresume – Sets a callback that is fired when the synthesis is resumed.
- onend – Sets a callback that is fired when the synthesis ends.

The SpeechSynthesisUtterance object allows us to set the text to be read aloud and to configure how it will be spoken. At the moment, though, we've only created an object representing the utterance; we still need to hand it to the synthesizer.

The SpeechSynthesis object

The SpeechSynthesis object doesn't need to be instantiated. It belongs to the window object and can be used directly. This object exposes several methods, such as:

- speak() – Accepts a SpeechSynthesisUtterance object as its only parameter. This method is used to synthesize an utterance.
- cancel() – Stops the synthesis process immediately.
- pause() – Pauses the synthesis process.
- resume() – Resumes the synthesis process.

Another interesting method is getVoices(). It doesn't accept any arguments and is used to retrieve the list (an array) of voices available in the specific browser. Each entry in the list provides information such as name (a mnemonic name that gives developers a hint about the voice, for example "Google US English"), lang (the language of the voice, for example it-IT), and voiceURI (the location of the speech synthesis service for this voice).

Important note: in Chrome and Safari, the voiceURI property is called voice. So, the demo we build in this article uses voice instead of voiceURI.

Browser compatibility

Unfortunately, at the time of writing, the only browsers that support the Speech Synthesis API are Chrome 33 (full support) and Safari on iOS 7 (partial support).

Demo

This section provides a simple demo of the Speech Synthesis API. The page lets you enter some text and have it synthesized. In addition, you can set the rate, the pitch, and the language you want to use. You can also stop, pause, or resume the synthesis of the text at any time using the respective buttons provided.

Before attaching the listeners to the buttons, we test for the implementation, because support for this API is still very limited. The test is very simple and consists of the following code:

```javascript
if (window.SpeechSynthesisUtterance === undefined) {
  // Not supported
} else {
  // Read my text
}
```

If the test fails, we show the message "API not supported". Once support has been verified, we dynamically load the available voices into the select box placed in the markup. Note that the getVoices() method has an issue in Chrome (#340160), so I created a workaround for it using setInterval(). We then attach a handler to each button so that it can trigger its specific action (play, stop, and so on).

A live demo of the code is available here. This demo, together with all the others I've built so far, is also available in my HTML5 API demo repository.

```html

charset="UTF-8"> name="viewport" content="width=device-width, initial-scale=1.0"/>

>Speech Synthesis API Demo>
  • { -webkit-box-sizing: border-box; -moz-box-sizing: border-box; box-sizing: border-box; }
<code>  body
  {
    max-width: 500px;
    margin: 2em auto;
    padding: 0 0.5em;
    font-size: 20px;
  }

  h1,
  .buttons-wrapper
  {
    text-align: center;
  }

  .hidden
  {
    display: none;
  }

  #text,
  #log
  {
    display: block;
    width: 100%;
    height: 5em;
    overflow-y: scroll;
    border: 1px solid #333333;
    line-height: 1.3em;
  }

  .field-wrapper
  {
    margin-top: 0.2em;
  }

  .button-demo
  {
    padding: 0.5em;
    display: inline-block;
    margin: 1em auto;
  }
  </style>
</head>
<body>
  <h1>Speech Synthesis API</h1>

  <h3>Play area</h3>
  <form action="" method="get">
    <label for="text">Text:</label>
    <textarea id="text"></textarea>
    <div class="field-wrapper">
      <label for="voice">Voice:</label>
      <select id="voice"></select>
    </div>
    <div class="field-wrapper">
      <label for="rate">Rate (0.1 - 10):</label>
      <input type="number" id="rate" min="0.1" max="10" value="1" step="any" />
    </div>
    <div class="field-wrapper">
      <label for="pitch">Pitch (0.1 - 2):</label>
      <input type="number" id="pitch" min="0.1" max="2" value="1" step="any" />
    </div>
    <div class="buttons-wrapper">
      <button id="button-speak-ss" class="button-demo">Speak</button>
      <button id="button-stop-ss" class="button-demo">Stop</button>
      <button id="button-pause-ss" class="button-demo">Pause</button>
      <button id="button-resume-ss" class="button-demo">Resume</button>
    </div>
  </form>

  <span id="ss-unsupported" class="hidden">API not supported</span>

  <h3>Log</h3>
  <div id="log"></div>
  <button id="clear-all" class="button-demo">Clear all</button>

  <script>
  // Test browser support
  if (window.SpeechSynthesisUtterance === undefined) {
    document.getElementById('ss-unsupported').classList.remove('hidden');
    ['button-speak-ss', 'button-stop-ss', 'button-pause-ss', 'button-resume-ss'].forEach(function(elementId) {
      document.getElementById(elementId).setAttribute('disabled', 'disabled');
    });
  } else {
    var text = document.getElementById('text');
    var voices = document.getElementById('voice');
    var rate = document.getElementById('rate');
    var pitch = document.getElementById('pitch');
    var log = document.getElementById('log');

    // Workaround for a Chrome issue (#340160 - https://code.google.com/p/chromium/issues/detail?id=340160)
    var watch = setInterval(function() {
      // Load all voices available
      var voicesAvailable = speechSynthesis.getVoices();

      if (voicesAvailable.length !== 0) {
        for(var i = 0; i < voicesAvailable.length; i++) {
          voices.innerHTML += '<option value="' + voicesAvailable[i].lang + '" ' +
                              'data-voice-uri="' + voicesAvailable[i].voiceURI + '">' +
                              voicesAvailable[i].name +
                              (voicesAvailable[i].default ? ' (default)' : '') +
                              '</option>';
        }

        clearInterval(watch);
      }
    }, 1);

    document.getElementById('button-speak-ss').addEventListener('click', function(event) {
      event.preventDefault();

      var selectedVoice = voices.options[voices.selectedIndex];

      // Create the utterance object setting the chosen parameters
      var utterance = new SpeechSynthesisUtterance();

      utterance.text = text.value;
      utterance.voice = selectedVoice.getAttribute('data-voice-uri');
      utterance.lang = selectedVoice.value;
      utterance.rate = rate.value;
      utterance.pitch = pitch.value;

      utterance.onstart = function() {
        log.innerHTML = 'Speaker started' + '<br>' + log.innerHTML;
      };

      utterance.onend = function() {
        log.innerHTML = 'Speaker finished' + '<br>' + log.innerHTML;
      };

      window.speechSynthesis.speak(utterance);
    });

    document.getElementById('button-stop-ss').addEventListener('click', function(event) {
      event.preventDefault();

      window.speechSynthesis.cancel();
      log.innerHTML = 'Speaker stopped' + '<br>' + log.innerHTML;
    });

    document.getElementById('button-pause-ss').addEventListener('click', function(event) {
      event.preventDefault();

      window.speechSynthesis.pause();
      log.innerHTML = 'Speaker paused' + '<br>' + log.innerHTML;
    });

    document.getElementById('button-resume-ss').addEventListener('click', function(event) {
      event.preventDefault();

      if (window.speechSynthesis.paused === true) {
        window.speechSynthesis.resume();
        log.innerHTML = 'Speaker resumed' + '<br>' + log.innerHTML;
      } else {
        log.innerHTML = 'Unable to resume. Speaker is not paused.' + '<br>' + log.innerHTML;
      }
    });

    document.getElementById('clear-all').addEventListener('click', function() {
      log.textContent = '';
    });
  }
  </script>
</body>
</html>
```

Conclusion

This article introduced the Speech Synthesis API, an API that synthesizes text and improves the overall experience for our websites' users, especially those with visual impairments. As we've seen, this API exposes several objects, methods, and properties, but it isn't difficult to use. Unfortunately, at the moment its browser support is very poor, with Chrome and Safari being the only browsers that support it. Hopefully more browsers will follow their lead, allowing you to realistically consider using it on your website. I've decided to. Don't forget to play with the demo, and please leave a comment if you liked the article. I'd really love to hear your opinion.

Frequently Asked Questions (FAQ) about Talking Web Pages and the Speech Synthesis API

What is the speech synthesis API and how does it work?

The Speech Synthesis API is a web-based interface that allows developers to integrate text-to-speech functionality into their applications. It works by converting written text into spoken words using a computer-generated voice: the text is broken down into speech components, which are then synthesized into speech. The API provides a range of voices and languages to choose from, allowing developers to customize the speech output to suit their needs.

How do I implement the speech synthesis API in a web application?

Implementing the Speech Synthesis API in your web application involves a few steps. First, create a new SpeechSynthesisUtterance instance and set its text property to the text you want read aloud. You can then set other properties, such as the voice, pitch, and rate, to customize the speech output. Finally, call the speak() method of the SpeechSynthesis interface to start the synthesis.
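A minimal sketch of these steps, assuming a browser that supports the API (the text and property values are arbitrary examples):

```javascript
// Minimal sketch of the steps described above (assumes browser support).
var utterance = new SpeechSynthesisUtterance();
utterance.text = 'Welcome to my website';  // the text to read aloud
utterance.lang = 'en-GB';                  // language
utterance.rate = 1;                        // speaking rate
utterance.pitch = 1;                       // pitch

// Hand the utterance to the synthesizer.
window.speechSynthesis.speak(utterance);
```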

Can I customize the voice and language of voice output?

Yes, the Speech Synthesis API provides a range of voices and languages to choose from. You can set them via the voice and lang properties of the SpeechSynthesisUtterance instance. The API also lets you adjust the pitch and rate of the voice to further customize the output.
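Below is a minimal sketch of selecting a voice by language. It assumes a browser that follows the specification, where getVoices() already returns the populated list and utterance.voice accepts a SpeechSynthesisVoice object (older Chrome and Safari builds used the voice/voiceURI naming quirk and had the getVoices() timing issue discussed earlier).

```javascript
// Minimal sketch: pick an Italian voice, if one is available (assumes spec behavior).
var utterance = new SpeechSynthesisUtterance('Il mio nome è Aurelio De Rosa');
utterance.lang = 'it-IT';

var voices = window.speechSynthesis.getVoices();
for (var i = 0; i < voices.length; i++) {
  if (voices[i].lang === 'it-IT') {
    utterance.voice = voices[i];  // a SpeechSynthesisVoice object, per the specification
    break;
  }
}

window.speechSynthesis.speak(utterance);
```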

What are the limitations of the speech synthesis API?

While the speech synthesis API is a powerful tool, it does have some limitations. For example, voice and language availability may vary by browser and operating system. Additionally, the quality of voice output may vary and may not always sound natural. Furthermore, this API does not provide control over the pronunciation of a particular word or phrase.

How do I handle errors when using the Speech Synthesis API?

The Speech Synthesis API provides an error event that you can listen for. This event is fired when an error occurs during speech synthesis. You can handle it by adding an event listener to the SpeechSynthesisUtterance instance (or by setting its onerror property) and providing a callback function that will be invoked when the event fires.
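A minimal sketch of listening for the error event, assuming a browser that supports the API (the logged message is just an example):

```javascript
// Minimal sketch: log synthesis errors (assumes browser support).
var utterance = new SpeechSynthesisUtterance('Some text to read aloud');

utterance.onerror = function(event) {
  // Browsers that implement SpeechSynthesisErrorEvent expose an error code on the event.
  console.error('Speech synthesis error:', event.error);
};

window.speechSynthesis.speak(utterance);
```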

Can I pause and resume speech output?

Yes, the Speech Synthesis API provides pause() and resume() methods that you can use to control the speech output. You call these methods on the SpeechSynthesis interface to pause and resume the speech.
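A minimal sketch, assuming an utterance is already being spoken and that the calls below are wired to UI controls such as the demo's Pause and Resume buttons:

```javascript
// Minimal sketch: pause and resume an ongoing utterance (assumes browser support).
window.speechSynthesis.speak(new SpeechSynthesisUtterance('A long piece of text to read aloud'));

// Later, for example from button handlers:
window.speechSynthesis.pause();      // pauses the current utterance

if (window.speechSynthesis.paused) {
  window.speechSynthesis.resume();   // continues from where it was paused
}
```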

Is the Speech Synthesis API supported in all browsers?

The Speech Synthesis API is now supported in most modern browsers, including Chrome, Firefox, Safari, and Edge. However, the availability of voices and languages may vary depending on the browser and the operating system.

Can I use the Speech Synthesis API in my mobile application?

Yes, the Speech Synthesis API can be used in mobile web applications. However, the availability of voices and languages may vary depending on the mobile operating system.

How do I test the Speech Synthesis API?

You can test the Speech Synthesis API by creating a simple web page that uses the API to convert written text into speech. You can then try different voices, languages, pitches, and rates to see how they affect the speech output.

Where can I find more information about the Speech Synthesis API?

You can find more information about the Speech Synthesis API in the official documentation provided by the World Wide Web Consortium (W3C). There are also many online tutorials and articles that provide detailed explanations and examples of how to use the API.

