Implementing a conversational robot with Raspberry Pi

Home

Backend Development

PHP Tutorial

Implementing a conversational robot with Raspberry Pi_PHP tutorial

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 12, 2016 am 08:57 AM

android

Using Raspberry Pi to implement a conversational robot

Recently, I used Raspberry Pi to implement a robot that can talk to people. Let’s give a brief introduction.

Raspberry Pi is the world's most popular microcomputer motherboard and a leader in open source hardware. It is designed for student computer programming education, is only the size of a credit card, and is affordable. Supports operating systems such as linux (debian). The most important thing is that the information is complete and the community is active.
I am using Raspberry Pi B version. The basic configuration is Broadcom BCM2836 processor, 4-core 900M clock speed, and 1G RAM.

My goal is to make a robot that can talk to people, which requires the robot to have input devices and output devices. The input device is a microphone, and the output can be HDMI, headphones or speakers. I used speakers here. Below is a photo of my Raspberry Pi. The 4 USB interfaces are respectively connected to wireless network cards, wireless keyboards, microphones, and audio power supplies.

We can divide the robot’s conversation into three parts: listening, thinking, and speaking.
"Listening" means recording what people say and converting it into words.
"Thinking" means giving different outputs based on different inputs. For example, if the other party says "it's time now", you can reply "it's xx o'clock xx minutes Beijing time".
"Speak" means converting text into speech and playing it back.

These three parts involve a lot of speech recognition, speech synthesis, artificial intelligence and other technologies, which require a lot of time and effort to research. Fortunately, some companies have opened interfaces for customers to use. Here, I chose Baidu’s API. The implementation of these three parts is explained below.

"Listen"

The first thing is to record what people say. I used the arecord tool. The command is as follows:

arecord -D "plughw:1" -f S16_LE -r 16000 test.wav

Among them, the -D parameter is followed by the recording device and the microphone is connected. Finally, there are 2 devices on the Raspberry Pi: internal device and external USB device. plughw:1 represents using the external device. -f indicates the recording format, and -r indicates the sound sampling frequency. Since the Baidu speech recognition mentioned later has requirements for the audio file format, we need to record it into a format that meets the requirements. In addition, I did not specify the recording time here, it will continue recording until the user presses ctrl-c. The recorded audio file is saved as test.wav.
Next, we need to convert the audio into text, that is, speech recognition (asr). Baidu's voice open platform provides free services and supports REST API
For documentation, see: http://yuyin.baidu. com/docs/asr/57
The process is basically to obtain the token, send the voice information, voice data, token, etc. that need to be recognized to Baidu's speech recognition server, and then the corresponding text can be obtained. Because the server supports REST API, we can use any language to implement the client code. Here I use python

<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li># coding: utf-8<br /> </li><li><br /></li><li>import urllib.request<br /></li><li>import json<br /></li><li>import base64<br /></li><li>import sys<br /></li><li><br /></li><li>def get_access_token():<br /></li><li>url = "https://openapi.baidu.com/oauth/2.0/token"<br /></li><li>grant_type = "client_credentials"<br /></li><li>client_id = "xxxxxxxxxxxxxxxxxx"<br /></li><li>client_secret = "xxxxxxxxxxxxxxxxxxxxxx"<br /></li><li><br /></li><li>url = url + "?" + "grant_type=" + grant_type + "&" + "client_id=" + client_id + "&" + "client_secret=" + client_secret<br /></li><li><br /></li><li>resp = urllib.request.urlopen(url).read()<br /></li><li>data = json.loads(resp.decode("utf-8"))<br /></li><li>return data["access_token"]<br /></li><li><br /></li><li><br /></li><li>def baidu_asr(data, id, token):<br /></li><li>speech_data = base64.b64encode(data).decode("utf-8")<br /></li><li>speech_length = len(data)<br /></li><li><br /></li><li>post_data = {<br /></li><li>"format" : "wav",<br /></li><li>"rate" : 16000,<br /></li><li>"channel" : 1,<br /></li><li>"cuid" : id,<br /></li><li>"token" : token,<br /></li><li>"speech" : speech_data,<br /></li><li>"len" : speech_length<br /></li><li>}<br /></li><li><br /></li><li>url = "http://vop.baidu.com/server_api"<br /></li><li>json_data = json.dumps(post_data).encode("utf-8")<br /></li><li>json_length = len(json_data)<br /></li><li>#print(json_data)<br /></li><li><br /></li><li>req = urllib.request.Request(url, data = json_data)<br /></li><li>req.add_header("Content-Type", "application/json")<br /></li><li>req.add_header("Content-Length", json_length)<br /></li><li><br /></li><li>print("asr start request\n")<br /></li><li>resp = urllib.request.urlopen(req)<br /></li><li>print("asr finish request\n")<br /></li><li>resp = resp.read()<br /></li><li>resp_data = json.loads(resp.decode("utf-8"))<br /></li><li>if resp_data["err_no"] == 0:<br /></li><li>return resp_data["result"]<br /></li><li>else:<br /></li><li>print(resp_data)<br /></li><li>return None<br /></li><li><br /></li><li>def asr_main(filename):<br /></li><li>f = open(filename, "rb")<br /></li><li>audio_data = f.read()<br /></li><li>f.close()<br /></li><li><br /></li><li>#token = get_access_token()<br /></li><li>token = "xxxxxxxxxxxxxxxxxx"<br /></li><li>uuid = "xxxx"<br /></li><li>resp = baidu_asr(audio_data, uuid, token)<br /></li><li>print(resp[0])<br /></li><li>return resp[0] </li></ol>

"Thinking"
Here I use Turing from Baidu api store robot. Its documentation can be found at: http://apistore.baidu.com/apiworks/servicedetail/736.html
Its use is very simple and will not be described in detail here. The code is as follows:

<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>import urllib.request<br /> </li><li>import sys<br /></li><li>import json<br /></li><li><br /></li><li>def robot_main(words):<br /></li><li>url = "http://apis.baidu.com/turing/turing/turing?"<br /></li><li><br /></li><li>key = "879a6cb3afb84dbf4fc84a1df2ab7319"<br /></li><li>userid = "1000"<br /></li><li><br /></li><li>words = urllib.parse.quote(words)<br /></li><li>url = url + "key=" + key + "&info=" + words + "&userid=" + userid<br /></li><li><br /></li><li>req = urllib.request.Request(url)<br /></li><li>req.add_header("apikey", "xxxxxxxxxxxxxxxxxxxxxxxxxx")<br /></li><li><br /></li><li>print("robot start request")<br /></li><li>resp = urllib.request.urlopen(req)<br /></li><li>print("robot stop request")<br /></li><li>content = resp.read()<br /></li><li>if content:<br /></li><li>data = json.loads(content.decode("utf-8"))<br /></li><li>print(data["text"])<br /></li><li>return data["text"]<br /></li><li>else:<br /></li><li>return None</li></ol>

"Speaking"
first needs to convert text into speech, that is, speech synthesis (tts). Then play the sound.
Baidu's voice open platform provides a tts interface, and can configure male and female voices, intonation, speaking speed, and volume. The server returns audio data in mp3 format. We write the data to the file in binary format.
For details, see http://yuyin.baidu.com/docs/tts/136
The code is as follows:

<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li># coding: utf-8<br /> </li><li><br /></li><li>import urllib.request<br /></li><li>import json<br /></li><li>import sys<br /></li><li><br /></li><li>def baidu_tts_by_post(data, id, token):<br /></li><li>post_data = {<br /></li><li>"tex" : data,<br /></li><li>"lan" : "zh",<br /></li><li>"ctp" : 1,<br /></li><li>"cuid" : id,<br /></li><li>"tok" : token,<br /></li><li>}<br /></li><li><br /></li><li>url = "http://tsn.baidu.com/text2audio"<br /></li><li>post_data = urllib.parse.urlencode(post_data).encode('utf-8')<br /></li><li>#print(post_data)<br /></li><li>req = urllib.request.Request(url, data = post_data)<br /></li><li><br /></li><li>print("tts start request")<br /></li><li>resp = urllib.request.urlopen(req)<br /></li><li>print("tts finish request")<br /></li><li>resp = resp.read()<br /></li><li>return resp<br /></li><li><br /></li><li>def tts_main(filename, words):<br /></li><li>token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"<br /></li><li>text = urllib.parse.quote(words)<br /></li><li>uuid = "xxxx"<br /></li><li>resp = baidu_tts_by_post(text, uuid, token)<br /></li><li><br /></li><li>f = open("test.mp3", "wb")<br /></li><li>f.write(resp)<br /></li><li>f.close() </li></ol>

After getting the audio file, you can use the mpg123 player to play it.

mpg123 test.mp3

Integration
Finally, combine these three parts.
You can first integrate the python-related code into main.py, as follows:

<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>import asr<br /> </li><li>import tts<br /></li><li>import robot<br /></li><li><br /></li><li>words = asr.asr_main("test.wav")<br /></li><li>new_words = robot.robot_main(words)<br /></li><li>tts.tts_main("test.mp3", new_words) </li></ol>

Then use the script to call related tools:

#! /bin/bash
arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
python3 main.py
mpg123 test .mp3

Okay, now you can talk to the robot. Run the script, say something into the microphone, then press ctrl-c, and the robot will reply to you.

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the concept of session locking.Apr 29, 2025 am 12:39 AM

Sessionlockingisatechniqueusedtoensureauser'ssessionremainsexclusivetooneuseratatime.Itiscrucialforpreventingdatacorruptionandsecuritybreachesinmulti-userapplications.Sessionlockingisimplementedusingserver-sidelockingmechanisms,suchasReentrantLockinJ

Are there any alternatives to PHP sessions?Apr 29, 2025 am 12:36 AM

Alternatives to PHP sessions include Cookies, Token-based Authentication, Database-based Sessions, and Redis/Memcached. 1.Cookies manage sessions by storing data on the client, which is simple but low in security. 2.Token-based Authentication uses tokens to verify users, which is highly secure but requires additional logic. 3.Database-basedSessions stores data in the database, which has good scalability but may affect performance. 4. Redis/Memcached uses distributed cache to improve performance and scalability, but requires additional matching

What is the full form of PHP?Apr 28, 2025 pm 04:58 PM

The article discusses PHP, detailing its full form, main uses in web development, comparison with Python and Java, and its ease of learning for beginners.

How does PHP handle form data?Apr 28, 2025 pm 04:57 PM

PHP handles form data using $\_POST and $\_GET superglobals, with security ensured through validation, sanitization, and secure database interactions.

What is the difference between PHP and ASP.NET?Apr 28, 2025 pm 04:56 PM

The article compares PHP and ASP.NET, focusing on their suitability for large-scale web applications, performance differences, and security features. Both are viable for large projects, but PHP is open-source and platform-independent, while ASP.NET,

Is PHP a case-sensitive language?Apr 28, 2025 pm 04:55 PM

PHP's case sensitivity varies: functions are insensitive, while variables and classes are sensitive. Best practices include consistent naming and using case-insensitive functions for comparisons.

How do you redirect a page in PHP?Apr 28, 2025 pm 04:54 PM

The article discusses various methods for page redirection in PHP, focusing on the header() function and addressing common issues like "headers already sent" errors.

Explain type hinting in PHPApr 28, 2025 pm 04:52 PM

Article discusses type hinting in PHP, a feature for specifying expected data types in functions. Main issue is improving code quality and readability through type enforcement.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks agoByDDD

InZoi: How To Apply To School And University

3 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

2 weeks agoByDDD

Roblox: Dead Rails – How To Summon And Defeat Nikola Tesla

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

SublimeText3 Chinese version

Chinese version, very easy to use

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Hot Topics

Where is the login entrance for gmail email?

7801

1644

1402

1299

1236