Create a Text-to-Speech Chrome Extension

Jennifer Aniston

Feb 18, 2025, 11:30 AM

Core points

This article explains how to create a Chrome text-to-speech (TTS) extension that uses the HTML5 speech synthesis API, or a third-party API as a fallback, to convert highlighted text or clipboard content into speech.

A Chrome extension usually contains a manifest file (metadata), images (such as the extension's icon), HTML files, JavaScript files, and other resources (such as style sheets).

The TTS extension waits for the user to click its icon or press a hotkey (Shift+Y), and then converts the highlighted text or clipboard content to speech.

The code consists of a background script and a content script. It requests permission to access the active tab and the user's clipboard, and it includes methods for checking for highlighted text or clipboard content, initializing the extension, adding the hotkey, and converting text to speech.

If the HTML5 speech synthesis API is not available, the extension falls back to a third-party API such as Voice RSS. The extension also includes a workaround for a bug that makes Chrome stop speaking after 200-300 words.

This article was peer reviewed by Marc Towler. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!

Text to speech (also known as speech synthesis or TTS) is the artificial production of human speech. It is nothing new: according to Wikipedia, people have been trying to build machines that produce human speech for at least a thousand years.

TTS is becoming more and more common in our lives, and everyone can benefit from it. We will demonstrate this by creating a Chrome extension that converts text to speech. HTML5 brought us a speech synthesis API that allows any web application to convert an arbitrary text string into speech and play it to its users at no cost.

Chrome extensions usually contain the following:

Manifest file (a required file containing metadata)
Image (such as icon for extension)
HTML file (for example, a popup window that appears when the user clicks on the extension's icon)
JavaScript files (such as content and/or background scripts that will be explained later)
Any other resources that the application may use (such as style sheets)

About the Page to Speech Extension

Given the popularity of Chrome and the rise of TTS, we will create a Chrome extension that converts text to speech. The extension waits for the user to click its icon or press a hotkey (Shift+Y). It then looks for text the user has highlighted on the page they are currently viewing and, failing that, for text copied to their clipboard. If it finds anything, it first tries to convert it to speech using the HTML5 speech synthesis API, and if that API is not available, it calls a third-party API.

Chrome Extension Basics

Every Chrome extension requires a file named manifest.json. The manifest is a JSON file containing data that is critical to the extension, from its name, description, icons, and author to data that defines its requirements: which websites the extension should be able to run on (these become permissions that the user must grant) and which files to run when the user browses a particular website.

{
  "manifest_version": 2,

  "name": "Page to Speech",
  "description": "This extension will produce English speech to whatever text you highlight on a webpage. Highlight text and click the extension's icon",
  "author": "Ivan Dimov",
  "version": "1.0",
  "icons": { 
    "16": "icon16.png",
    "48": "icon48.png",
    "128": "icon128.png"
  },

Our manifest begins with the name, description, author, version, and icons of the extension. You can provide several icons of different sizes in the icons object.

  "background": {
    "scripts": ["background.min.js"]
  },
  "content_scripts": [
    {
      "matches": ["http://*/*", "https://*/*"],
      "js": ["polyfill.min.js", "ext.min.js"],
      "run_at": "document_end"
    }
  ],

Then, we define a background script called background.min.js in the background object (note that we are using minified files). Background scripts are long-running scripts that keep running until the user's browser is closed or the extension is disabled.

After that, we have a content_scripts array that instructs Chrome to load two JavaScript files on every page the user visits, because of the wildcard patterns "http://*/*" and "https://*/*". Unlike background scripts, content scripts can access the DOM of the actual website the user is visiting. Content scripts can both read and modify the DOM of any page they are injected into. Therefore, our polyfill.min.js and ext.min.js will be able to read and modify all data on every web page.

  "browser_action": {
    "default_icon": "speech.png"
  },
  "permissions": [
    "activeTab",
    "clipboardRead"
  ]
}

Finally, the browser_action object sets the icon shown in the browser toolbar, and the permissions array requests access only to the web page the user currently has open (the active tab). We also request a permission called clipboardRead, which allows us to read the user's clipboard (so that we can convert its contents to speech).

Writing the Page to Speech Chrome Extension

First, we create our one and only background script, which hooks up an event listener that fires when the user clicks the extension's icon. When this happens, we call our sendMessage function, which uses the chrome.tabs.sendMessage(tabId, message, callback) method to send a message to our content script (the content script can read the DOM and find out what the user has highlighted and/or what the user has placed on the clipboard). We use the chrome.tabs.query method to find the currently open tab, because that is the tab we are interested in and have access to. Its arguments include a callback that is called with the tabs matching the query.

chrome.browserAction.onClicked.addListener(function (tab) {
    //fired when the user clicks on the ext's icon
    sendMessage();
});
function sendMessage() {
  chrome.tabs.query({active: true, currentWindow: true}, function(tabs){
    chrome.tabs.sendMessage(tabs[0].id, {action: "pageToSpeech"}, function(response) {});
  });
}

The longer part is our content script. We create an object to hold some data related to the extension and then define our initialization method.

initialize: function() {
    if (!pageToSpeech.hasText()) { return;}
    if (!pageToSpeech.trySpeechSynthesizer()) {
        pageToSpeech.trySpeechApi();
    }
},

This method checks whether the user has highlighted any text or has anything in the clipboard; if not, it simply returns. Otherwise, it tries to produce speech using the HTML5 speech synthesis API. If that fails, it finally tries a third-party API.

The method that checks for text (hasText) does several things. It tries to get an object containing the highlighted text with the built-in getSelection() method and converts it to a string with toString(). Then, if no text is highlighted, it tries to find text in the user's clipboard. It does this by adding an input element to the page, focusing it, triggering a paste with execCommand('paste'), and saving the text pasted into that input in a property. It then clears the input. In either case, it returns whatever it found.
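
The listing for this step is not reproduced here, so the following is only a minimal sketch of the lookup just described, not the extension's actual source. The function names (getSelectedText, getClipboardText, findText) are hypothetical, and the window and document objects are passed in as parameters so that the logic can be tried outside a browser. It also assumes that execCommand('paste') is permitted, which depends on the clipboardRead permission.

```javascript
// Hypothetical helpers illustrating the lookup order: selection first, clipboard second.

function getSelectedText(win) {
  // getSelection() returns a Selection object; toString() yields the highlighted text.
  return win.getSelection().toString().trim();
}

function getClipboardText(doc) {
  // Paste the clipboard into a temporary input element and read the value back.
  var input = doc.createElement('input');
  doc.body.appendChild(input);
  input.focus();
  doc.execCommand('paste');
  var text = input.value;
  input.value = '';              // clear the input once the text has been saved
  doc.body.removeChild(input);   // assumption: this sketch removes the helper element
  return text.trim();
}

function findText(win, doc) {
  // Highlighted text wins; the clipboard is only consulted when nothing is selected.
  return getSelectedText(win) || getClipboardText(doc);
}

// In a content script this could be called as: findText(window, document)
```

An empty string from findText means there is nothing to speak, which is the case in which initialize returns early.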

To let the user trigger text-to-speech with a hotkey (hard-coded as Shift+Y), we initialize an array and set up an event listener for the onkeydown and onkeyup events. In the listener, we store a boolean at the index corresponding to the keyCode of the key involved; the boolean is the result of comparing the event type, e.type, with keydown. As a result, whenever a key is pressed, the value at that key's index is set to true, and whenever a key is released, it is set back to false. So if indexes 16 and 84 both hold true, we know that the user is pressing our hotkey, and we start the text-to-speech conversion. Note that keyCode 16 is Shift but 84 is T; the keyCode for Y is 89, so the indexes and the stated hotkey do not agree.
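
The key-tracking idea can be sketched as follows. This is an illustration under stated assumptions rather than the extension's own addHotkeys method: createHotkeyTracker is a hypothetical helper, and the key codes are passed in as a parameter instead of being hard-coded.

```javascript
// Hypothetical helper: returns one handler meant to serve both keydown and keyup.
function createHotkeyTracker(keyCodes, onHotkey) {
  var activeKeys = [];
  return function (e) {
    // true while the key is held down, false once it is released
    activeKeys[e.keyCode] = e.type === 'keydown';
    var allDown = keyCodes.every(function (code) {
      return activeKeys[code] === true;
    });
    if (allDown) {
      onHotkey();
    }
  };
}

// In a content script the handler would be attached to both events, for example:
//   var handler = createHotkeyTracker([16, 89], function () { pageToSpeech.initialize(); });
//   window.onkeydown = window.onkeyup = handler;
// (16 is Shift and 89 is Y; pageToSpeech is the content script object from this article.)
```

Because the browser repeats keydown events while a key is held, the callback can fire more than once per press; the speech code copes with that by stopping any speech already in progress.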

To convert text to speech, we rely on the trySpeechSynthesizer() method. If HTML5 speech synthesis exists in the user's browser (window.speechSynthesis), we know the user can use it, so we check whether speech is currently in progress (tracked by the pageToSpeech.data.speechInProgress boolean). If it is, we stop the current speech, because trySpeechSynthesizer is about to start a new one and we do not want two voices speaking at once. We then set speechInProgress to true, and when the speech finishes, we set the property back to false.
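
A minimal sketch of that flow is shown below. It is not the article's original listing: the function takes the window object and a state object as parameters (both hypothetical choices made so the logic can be exercised with stubs), and it speaks the text in a single utterance without any chunking.

```javascript
// Sketch: returns false when speech synthesis is unavailable so the caller can fall back.
function trySpeechSynthesizer(text, win, state) {
  if (!win.speechSynthesis) {
    return false;
  }
  if (state.speechInProgress) {
    // Stop whatever is being spoken so that two voices never overlap.
    win.speechSynthesis.cancel();
  }
  state.speechInProgress = true;

  var utterance = new win.SpeechSynthesisUtterance(text);
  state.currentUtterance = utterance;
  utterance.onend = function () {
    // Ignore end events from an utterance that has since been replaced.
    if (state.currentUtterance === utterance) {
      state.speechInProgress = false;
    }
  };
  win.speechSynthesis.speak(utterance);
  return true;
}

// In a content script: trySpeechSynthesizer(text, window, pageToSpeech.data)
```

The boolean return value matches how initialize uses the method: a false result is what triggers the third-party fallback.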

I won't go into detail about why we use speechUtteranceChunker, but it works around a Chrome bug that stops speech synthesis after 200-300 words have been spoken. Basically, it splits our text string into many smaller chunks (120 words in our case) and calls the speech synthesis API with one chunk after another.
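
To illustrate the chunking idea only, here is a simplified sketch. It is not the speechUtteranceChunker used by the extension: splitIntoChunks and speakInChunks are hypothetical names, the split is done on whitespace, and the synthesizer and utterance constructor are passed in as parameters.

```javascript
// Split text into pieces of at most wordsPerChunk words.
function splitIntoChunks(text, wordsPerChunk) {
  var words = text.split(/\s+/).filter(Boolean);
  var chunks = [];
  for (var i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(' '));
  }
  return chunks;
}

// Speak the chunks one after another; the next chunk starts when the previous one ends.
function speakInChunks(text, wordsPerChunk, synth, Utterance, onDone) {
  var chunks = splitIntoChunks(text, wordsPerChunk);
  (function speakNext(index) {
    if (index >= chunks.length) {
      if (onDone) { onDone(); }
      return;
    }
    var utterance = new Utterance(chunks[index]);
    utterance.onend = function () { speakNext(index + 1); };
    synth.speak(utterance);
  })(0);
}

// In the browser:
//   speakInChunks(text, 120, window.speechSynthesis, SpeechSynthesisUtterance, callback);
```

Splitting on word boundaries can cut a sentence in half, which may produce an audible pause; a more careful splitter would prefer to break at punctuation.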

Finally, if the HTML5 speech synthesis API is not available, we try a third-party API. We use the same property to know whether we need to stop audio that is already playing. We then create a new Audio object directly and pass it the URL of the API endpoint, since the demo API we chose streams the audio directly. We only have to pass the API key and the text to be converted. We also check whether the audio triggers an error. If it does, we show the user an alert saying that we cannot help at the moment (the API we tested this code with, Voice RSS, allows 300 requests on its free tier).
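
The fallback might look roughly like the sketch below. The endpoint and the key, hl, and src query parameters reflect the Voice RSS HTTP API as I understand it and should be checked against its documentation; the function names, the state object, and the injected Audio constructor and notify callback are all hypothetical.

```javascript
// Build the request URL; the parameter names are an assumption to verify with Voice RSS.
function buildSpeechUrl(apiKey, text) {
  return 'https://api.voicerss.org/?key=' + encodeURIComponent(apiKey) +
    '&hl=en-us&src=' + encodeURIComponent(text);
}

function trySpeechApi(text, apiKey, state, AudioCtor, notify) {
  if (state.speechInProgress && state.audio) {
    state.audio.pause();   // stop audio that is already playing
  }
  var audio = new AudioCtor(buildSpeechUrl(apiKey, text));
  audio.onended = function () {
    if (state.audio === audio) { state.speechInProgress = false; }
  };
  audio.onerror = function () {
    if (state.audio === audio) { state.speechInProgress = false; }
    notify('Sorry, we cannot convert your text to speech right now.');
  };
  state.audio = audio;
  state.speechInProgress = true;
  audio.play();
  return audio;
}

// In a content script:
//   trySpeechApi(text, 'YOUR_API_KEY', pageToSpeech.data, Audio, function (m) { alert(m); });
```

Embedding an API key in a content script exposes it to anyone who inspects the extension, so a sketch like this suits a demo rather than a production service.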

Finally, outside of any local scope, we call the addHotkeys method, which starts waiting for the user to press the hotkey, and we set up a listener that waits for a message from the background script. When the right message arrives (the background script sends the action pageToSpeech) or the hotkey is pressed, we initialize the text-to-speech conversion.
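
The receiving side of that message could be sketched as follows. The handleMessage helper is hypothetical, the action name is taken from the background script shown earlier, and chrome.runtime.onMessage is used here as the messaging entry point, which is an assumption about how the listener was registered.

```javascript
// Hypothetical helper: runs the callback only for the message the background script sends.
function handleMessage(msg, startSpeech) {
  if (msg && msg.action === 'pageToSpeech') {
    startSpeech();
    return true;
  }
  return false;
}

// Register the listener only where the extension API exists (that is, inside Chrome).
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(function (msg) {
    // pageToSpeech is the content script object described in this article.
    handleMessage(msg, function () { pageToSpeech.initialize(); });
  });
}
```

Checking the action field keeps the content script from reacting to unrelated messages that other parts of the extension might send later.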

Conclusion

Voilà, we have a nice Chrome extension that converts text to speech. The concepts here can be used to create Chrome extensions for different purposes. Have you built any interesting Chrome extensions, or do you want to build one? Please let me know in the comments!

If you like this idea and want to develop it further, you can find the complete code in our GitHub repository. If you want to test it, you can find a production version of the extension in the Chrome Web Store.

References: https://www.php.cn/link/b8b0e04211dce1c104dfcdb685c9b9ad https://www.php.cn/link/e417baa9cdf34202f71b55a27da899e8

Text to Speech Chrome Extension FAQ

How do I install a text-to-speech Chrome extension?

Installing a text-to-speech Chrome extension is easy. First, open Google Chrome and go to the Chrome Web Store. In the search bar, enter the name of the extension you want to install, such as "Read Aloud" or "Text-to-Speech (TTS)". Select the extension in the search results and click the "Add to Chrome" button. A pop-up will ask for confirmation; click "Add extension". The extension will be installed and its icon will appear on your browser toolbar.

Can I customize the voice in a text-to-speech Chrome extension?

Yes, most text-to-speech Chrome extensions let you customize the voice. You can usually choose from a variety of voices, including male and female voices in different accents and languages. To do so, click the extension's icon on the browser toolbar and open the Settings or Options menu, where you should find options for changing the voice, speed, pitch, and volume.

Are text-to-speech Chrome extensions free to use?

Many text-to-speech Chrome extensions are free to use, but some charge a small fee for premium features. These may include additional voices, ad-free use, or saving audio files. Be sure to check the extension's details in the Chrome Web Store before installing.

Can I use a text-to-speech Chrome extension offline?

Some text-to-speech Chrome extensions can be used offline, but not all of them. It depends on how the extension is designed. If offline use is important to you, check the extension's description in the Chrome Web Store or its settings after installation.

How do I use a text-to-speech Chrome extension?

To use a text-to-speech Chrome extension, first navigate to the web page you want read aloud. Then click the extension's icon on the browser toolbar. Some extensions start reading the page immediately, while others require you to select the text you want read. You can usually pause, resume, or stop the reading with controls in the extension's pop-up.

Can I use a text-to-speech Chrome extension on any website?

Most text-to-speech Chrome extensions should work on any website, but there are exceptions. Some websites may have compatibility issues with certain extensions, and extensions may be unable to read certain types of content, such as images or videos. If you run into problems, try a different extension or contact the extension's developer for support.

Is my data safe with a text-to-speech Chrome extension?

Most text-to-speech Chrome extensions should respect your privacy and not collect or share your data without your consent. However, it is best to check an extension's privacy policy before installing it. If you are not comfortable with the policy, consider looking for another extension.

Can I change the speech rate in a text-to-speech Chrome extension?

Yes, most text-to-speech Chrome extensions let you adjust the speech rate. This can usually be done in the extension's settings or options menu, where you can typically choose from a range of speeds, from very slow to very fast.

Can I use a text-to-speech Chrome extension in other browsers?

Text-to-speech Chrome extensions are designed to run in Google Chrome and may not run in other browsers. However, many developers also publish versions of their extensions for other browsers, such as Firefox or Edge. Check the developer's website or the extension store for the browser in question to see whether a version is available.

Can I use a text-to-speech Chrome extension on my mobile device?

Chrome for Android and iOS has generally not supported extensions, so a text-to-speech Chrome extension will usually not be available there. If mobile use is important to you, check the extension's description in the Chrome Web Store.
