Create a Text-to-Speech Chrome Extension

Jennifer Aniston
2025-02-18 11:30:16

Key Takeaways

This article explains how to create a text-to-speech (TTS) extension for the Chrome browser that uses the HTML5 speech synthesis API or a third-party API to convert highlighted text or clipboard content into speech.

Chrome extensions usually contain a manifest file (a metadata file), images (such as the extension's icon), HTML files, JavaScript files, and other resources (such as style sheets).

The TTS extension waits for the user to click its icon or press a hotkey (Shift+Y), and then converts the highlighted text or clipboard content to speech.

The extension's code consists of a background script and a content script, permissions to access the active tab and the user's clipboard, and methods that check for highlighted text or clipboard content, initialize the extension, add the hotkey, and convert text to speech.

If the HTML5 speech synthesis API is not available, the extension uses a third-party API such as Voice RSS to convert text to speech. The extension also includes a workaround for a bug in which Chrome stops speaking after 200-300 words.

This article was peer-reviewed by Marc Towler. Thanks to all of SitePoint's peer reviewers for making SitePoint content the best it can be!

Text to speech (also known as speech synthesis or TTS) is the artificial production of human speech. It is nothing new: according to Wikipedia, people have been trying to build machines that produce human speech for at least a thousand years.

TTS is becoming more and more common in our lives, and everyone can benefit from it. We will demonstrate this by creating a Chrome extension that converts text to speech. HTML5 brought us a speech synthesis API that allows any web application to convert an arbitrary string of text into speech and play it to users for free.

Chrome extensions usually contain the following:

  1. Manifest file (required file containing metadata)
  2. Image (such as icon for extension)
  3. HTML file (for example, a popup window that appears when the user clicks on the extension's icon)
  4. JavaScript files (such as content and/or background scripts that will be explained later)
  5. Any other resources that the application may use (such as style sheets)

About the Page to Speech Extension

Given the popularity of Chrome and the rise of TTS, we will create a Chrome extension that converts text to speech. The extension waits for the user to click its icon or press a hotkey (Shift+Y), and then tries to find what the user has highlighted on the page they are currently viewing or, failing that, what is copied to their clipboard. If anything is found, it first tries to convert it to speech using the HTML5 speech synthesis API, and if that API is not available, it calls a third-party API.

Chrome Extension Basics

Each Chrome extension requires a file named manifest.json. The manifest is a JSON file containing data that is critical to the extension, from its name, description, icons, and author to the data that defines its requirements: which websites the extension should be able to run on (these are permissions the user must grant) and which files to run when the user browses a specific website.

<code class="language-json">{
  "manifest_version": 2,

  "name": "Page to Speech",
  "description": "This extension will produce English speech to whatever text you highlight on a webpage. Highlight text and click the extension's icon",
  "author": "Ivan Dimov",
  "version": "1.0",
  "icons": { 
    "16": "icon16.png",
    "48": "icon48.png",
    "128": "icon128.png"
  },
</code>

Our manifest begins with the name, description, author, version, and icons of the extension. You can provide icons in several sizes in the icons object.

<code class="language-json"> "background": {
    "scripts": ["background.min.js"]
  },
  "content_scripts": [
    {
      "matches": ["http://*/*", "https://*/*"],
      "js": [ "polyfill.min.js", "ext.min.js"],
      "run_at": "document_end"
    }],
</code>

Then, we define a background script called background.min.js in the background object (note that we are using minified files). Background scripts are long-running scripts that keep running until the user's browser is closed or the extension is disabled.

After that, we have a content_scripts array that instructs Chrome to load two JavaScript files on every page the user visits, because of the wildcards "http://*/*" and "https://*/*". Unlike background scripts, content scripts can access the DOM of the actual website the user is visiting. Content scripts can both read and modify the DOM of any web page they are injected into. Therefore, our polyfill.min.js and ext.min.js will be able to read and modify all data on every web page.

<code class="language-json">  "browser_action": {
    "default_icon": "speech.png"
  },
   "permissions": [
     "activeTab",
     "clipboardRead"
    ]
}
</code>

Finally, the browser_action object sets the icon shown in the toolbar, and the permissions array requests access only to the web page the user currently has open (activeTab). We also request another permission, clipboardRead, which allows us to read the user's clipboard (so we can convert its contents to speech).

Writing the Page to Speech Chrome Extension

First, we create our only background script, which hooks up an event listener that fires when the user clicks the extension's icon. When this happens, we call our sendMessage function, which uses the chrome.tabs.sendMessage(tabId, message, callback) method to send a message to our content script (the content script can read the DOM and find out what the user has highlighted and/or placed on the clipboard). We use the chrome.tabs.query method to find the currently open tab, because that is the tab we are interested in and the one we have access to. Its arguments include a callback that is called with the tabs matching the query.

<code class="language-javascript">chrome.browserAction.onClicked.addListener(function (tab) {
    //fired when the user clicks on the ext's icon
    sendMessage();
});
function sendMessage() {
  chrome.tabs.query({active: true, currentWindow: true}, function(tabs){
    chrome.tabs.sendMessage(tabs[0].id, {action: "pageToSpeech"}, function(response) {});
  });
}
</code>

Now, the lengthier part is our content script. We create an object to hold some data related to the extension and then define our initialization method.

<code class="language-javascript">initialize: function() {
    if (!pageToSpeech.hasText()) { return;}
    if (!pageToSpeech.trySpeechSynthesizer()) {
        pageToSpeech.trySpeechApi();
    }
},
</code>

This method returns early if the user has not highlighted any text and has nothing in the clipboard. Otherwise, it tries to generate speech using the HTML5 speech synthesis API. If that fails, it falls back to a third-party API.

The method that checks for text, hasText(), performs several actions. It tries to get an object containing the highlighted text using the built-in getSelection() method and converts it to a string with toString(). If no text is highlighted, it tries to find text in the user's clipboard. It does this by adding an input element to the page, focusing it, triggering a paste with execCommand('paste'), and saving the text pasted into that input in a property. Then it clears the input. In either case, it returns what it found.
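
The text lookup described here can be sketched roughly as follows. This is only an illustration, not the extension's actual method: the function name findTextToSpeak and its win/doc parameters are hypothetical (they stand in for window and document so the logic can be run outside a browser), and the sketch removes the temporary input instead of keeping it around.

```javascript
// Hypothetical sketch of a hasText-style lookup. `win` and `doc` stand in
// for the browser's `window` and `document` objects.
function findTextToSpeak(win, doc) {
  // 1. Prefer whatever the user has highlighted on the page.
  const selected = win.getSelection().toString().trim();
  if (selected) {
    return selected;
  }

  // 2. Otherwise paste the clipboard into a temporary input and read it back.
  const input = doc.createElement('input');
  doc.body.appendChild(input);
  input.focus();
  doc.execCommand('paste'); // needs the clipboardRead permission from the manifest
  const pasted = input.value.trim();

  // 3. Clean up: clear the input and take it out of the page again.
  input.value = '';
  doc.body.removeChild(input);
  return pasted;
}
```

In a content script this would be called as findTextToSpeak(window, document); an empty string means there is nothing to read.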

To let the user trigger the conversion with a hotkey (hard-coded as Shift+Y), we initialize an array and set up a listener for the onkeydown and onkeyup events. In the listener, we store a boolean at the index matching the keyCode of the key involved; the value is the result of comparing the event type, e.type, with keydown. So whenever a key is pressed, the value at its index is set to true, and whenever a key is released, the value is set back to false. If both indexes 16 and 84 hold true values, we know the user is pressing our hotkey, so we start the text-to-speech conversion. (Note that keyCode 16 is Shift but 84 is the T key; Y is 89, so a Shift+Y hotkey should check indexes 16 and 89.)
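
A minimal sketch of this key-tracking idea is shown below. The names createHotkeyTracker and onHotkey are hypothetical, and the default key codes assume Shift (16) plus Y (89) rather than the 84 mentioned in the text.

```javascript
// Hypothetical helper: returns one handler meant for both keydown and keyup.
// Default key codes are an assumption: 16 = Shift, 89 = Y.
function createHotkeyTracker(onHotkey, keyCodes = [16, 89]) {
  const activeKeys = []; // activeKeys[keyCode] is true while that key is held down

  return function handleKeyEvent(e) {
    // true on keydown, false on keyup
    activeKeys[e.keyCode] = e.type === 'keydown';

    // Fire only when every key of the combination is currently held.
    if (keyCodes.every((code) => activeKeys[code] === true)) {
      onHotkey();
    }
  };
}

// In a content script the same handler is attached to both events.
if (typeof window !== 'undefined') {
  const handler = createHotkeyTracker(() => console.log('hotkey pressed'));
  window.onkeydown = window.onkeyup = handler;
}
```

Because keydown repeats while a key is held, the callback can fire more than once per press; stopping any speech already in progress before starting again keeps that harmless.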

To convert text to speech, we rely on the trySpeechSynthesizer() method. If HTML5 speech synthesis exists in the user's browser (window.speechSynthesis), we know the user can use it, so we check whether speech is currently in progress (tracked by the pageToSpeech.data.speechInProgress boolean). If it is, we stop the current speech, because trySpeechSynthesizer is about to start new speech and we do not want two voices playing at once. We then set speechInProgress to true, and when the speech finishes, we set the property back to false.

Now, I don't want to go into detail about why we use speechUtteranceChunker, but it is a workaround for a bug in which Chrome stops speech synthesis after 200-300 words. Basically, it splits our text string into many smaller chunks (120 words in our case) and calls the speech synthesis API with one chunk after another.
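
The chunking idea can be illustrated with the simplified sketch below. It is a stand-in written for this explanation, not the speechUtteranceChunker itself: splitIntoChunks, speakInChunks, and the synth/Utterance parameters are hypothetical names, with the parameters standing in for window.speechSynthesis and window.SpeechSynthesisUtterance.

```javascript
// Split text into chunks of at most `maxWords` words each.
function splitIntoChunks(text, maxWords = 120) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

let activeRun = 0; // identifies the most recent speakInChunks call

// Speak the chunks one after another. Returns false if speech synthesis
// is unavailable, so the caller can fall back to another method.
function speakInChunks(text, synth, Utterance, onDone) {
  if (!synth || !Utterance) {
    return false;
  }
  const run = ++activeRun;
  synth.cancel(); // stop any speech that is already in progress

  const chunks = splitIntoChunks(text);
  const speakNext = (index) => {
    if (run !== activeRun) return; // a newer call has taken over
    if (index >= chunks.length) {
      if (onDone) onDone(); // e.g. set speechInProgress back to false
      return;
    }
    const utterance = new Utterance(chunks[index]);
    utterance.onend = () => speakNext(index + 1);
    synth.speak(utterance);
  };
  speakNext(0);
  return true;
}
```

In the browser this would be called as speakInChunks(text, window.speechSynthesis, window.SpeechSynthesisUtterance, callback).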

Finally, if the HTML5 speech synthesis API is not available, we try a third-party API. We use the same property to know whether we need to stop audio that is already playing. We then create a new Audio object and pass it the URL of the API endpoint, since the demo API we chose streams the audio directly. We only have to pass the API key and the text to convert. We also check whether the audio fires an error; in that case, we simply show the user an alert saying that we cannot help at this time (we tested the code with this specific API, Voice RSS, which allows 300 requests on its free tier).
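
A rough sketch of this fallback follows. The function names are hypothetical, and the endpoint and the key, hl, and src query parameters are assumptions based on Voice RSS's public HTTP API, so they should be checked against its documentation. AudioCtor stands in for the browser's Audio constructor.

```javascript
// Build the request URL (assumed endpoint and parameter names, see above).
function buildVoiceRssUrl(apiKey, text, language = 'en-us') {
  return 'https://api.voicerss.org/' +
    '?key=' + encodeURIComponent(apiKey) +
    '&hl=' + encodeURIComponent(language) +
    '&src=' + encodeURIComponent(text);
}

// Play the streamed audio; `onError` is called if loading or playback fails.
function speakWithApi(apiKey, text, AudioCtor, onError) {
  const audio = new AudioCtor(buildVoiceRssUrl(apiKey, text));
  audio.onerror = onError;
  const result = audio.play();
  // In modern browsers play() returns a promise that rejects on failure.
  if (result && typeof result.catch === 'function') {
    result.catch(onError);
  }
  return audio;
}
```

In the browser this could be called as speakWithApi('YOUR_API_KEY', text, Audio, () => alert('Sorry, we cannot help at this time.')), keeping the returned audio object so that it can be paused before new speech starts.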

Finally, outside of any local scope, we call the addHotkeys method, which starts waiting for the user to press the hotkey, and we set up a listener that waits for a message from the background script. When the correct message arrives (the background script shown earlier sends the action pageToSpeech) or the hotkey is pressed, we initialize the text-to-speech conversion object.
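
The message listener can be sketched as below. createMessageHandler and startSpeech are hypothetical names; the action value pageToSpeech matches what the background script shown earlier sends, and chrome.runtime.onMessage is the standard extension API for receiving messages in a content script.

```javascript
// Hypothetical factory: builds a listener that starts speech for the right action.
function createMessageHandler(startSpeech) {
  return function onMessage(request, sender, sendResponse) {
    // Ignore any message that is not meant for us.
    if (request && request.action === 'pageToSpeech') {
      startSpeech();
    }
  };
}

// Register the listener only where the extension APIs actually exist.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(
    createMessageHandler(() => {
      // In the extension this is where pageToSpeech.initialize() would run.
    })
  );
}
```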

Conclusion

Voilà, we have a nice Chrome extension that converts text to speech. The concepts here can be used to create Chrome extensions for other purposes. Have you built any interesting Chrome extensions, or do you want to build one? Please let me know in the comments!

If you like this idea and want to develop it further, you can find the complete code in our GitHub repository. If you want to test it, you can find a production version of the extension in the Chrome Web Store.

References: https://www.php.cn/link/b8b0e04211dce1c104dfcdb685c9b9ad https://www.php.cn/link/e417baa9cdf34202f71b55a27da899e8

Text-to-Speech Chrome Extension FAQ

How do I install a text-to-speech Chrome extension?

Installing a text-to-speech Chrome extension is easy. First, open Google Chrome and navigate to the Chrome Web Store. In the search bar, enter the name of the extension you want to install, such as "Read Aloud" or "Text-to-Speech (TTS)". Select the extension in the search results and click the "Add to Chrome" button. A pop-up window will ask for confirmation; click "Add extension". The extension will be installed and its icon will appear on your browser toolbar.

Can I customize the voice in a text-to-speech Chrome extension?

Yes, most text-to-speech Chrome extensions let you customize the voice. You can usually choose from a variety of voices, including male and female voices in different accents and languages. To customize the voice, click the extension icon on the browser toolbar and navigate to the Settings or Options menu. There you should find options to change the voice, speed, pitch, and volume.

Are text-to-speech Chrome extensions free to use?

Many text-to-speech Chrome extensions are free to use, but some charge a small fee for advanced features. These may include additional voices, ad-free use, or saving audio files. Be sure to check the extension's details in the Chrome Web Store before installing.

Can I use a text-to-speech Chrome extension offline?

Some text-to-speech Chrome extensions work offline, but not all of them do. It depends on how the extension is designed. If offline use is important to you, check the extension's description in the Chrome Web Store or its settings after installation.

How do I use a text-to-speech Chrome extension?

To use a text-to-speech Chrome extension, first navigate to the web page you want read aloud. Then click the extension icon on the browser toolbar. Some extensions start reading the page immediately, while others require you to select the text you want read. You can usually pause, resume, or stop reading with the controls in the extension's pop-up window.

Can I use a text-to-speech Chrome extension on any website?

Most text-to-speech Chrome extensions should work on any website, though there are exceptions. Some websites may have compatibility issues with certain extensions, and extensions may not be able to read certain types of content, such as images or videos. If you have problems, try a different extension or contact the extension's developer for support.

Is my data safe with a text-to-speech Chrome extension?

Most text-to-speech Chrome extensions should respect your privacy and not collect or share your data without your consent. However, it is best to check the extension's privacy policy before installing. If you are not satisfied with the policy, consider looking for another extension.

Can I change the speech speed in a text-to-speech Chrome extension?

Yes, most text-to-speech Chrome extensions let you adjust the speech speed. This can usually be done in the extension's settings or options menu, where you can typically choose from a range of speeds, from very slow to very fast.

Can I use a text-to-speech Chrome extension in other browsers?

Text-to-speech Chrome extensions are designed to run in Google Chrome and may not run in other browsers. However, many developers also publish versions of their extensions for other browsers, such as Firefox or Edge. Check the developer's website or the extension store for those browsers to see whether a version is available.

Can I use a text-to-speech Chrome extension on my mobile device?

Generally not: Chrome on Android and iOS does not support extensions from the Chrome Web Store, so most text-to-speech Chrome extensions are only available on desktop. If mobile use is important to you, check whether the developer offers a separate mobile app or a version for a mobile browser that supports extensions.
