So fast! Transcribe the speech in a video to text in a few minutes, with fewer than 10 lines of code

Hello everyone, I am Kite

Two years ago, converting audio and video files into text was hard to do. Now it takes just a few minutes.

It is said that, to obtain training data, some companies have crawled the videos on short-video platforms such as Douyin and Kuaishou wholesale, extracted the audio, and converted it to text for use as a training corpus for large models.

If you need to convert video or audio files to text, try the open source solution presented today. One use for it: finding the exact time at which a line of dialogue appears in a film or TV program.

Without further ado, let’s get to the point.

Whisper

The solution is Whisper, open-sourced by OpenAI. It is written in Python, of course. You install a few packages, write a few lines of code, and wait a while (how long depends on your machine's performance and the length of the audio or video); then the text comes out. It's that simple.

GitHub repository: https://github.com/openai/whisper

Faster-Whisper

Although Whisper is already quite simple, it is still not streamlined enough for programmers, who tend to prefer simplicity and efficiency. Installing and calling Whisper is relatively easy, but you still need to install PyTorch, ffmpeg, and even Rust separately.

That is where Faster-Whisper comes in: it is faster and leaner than Whisper. Faster-Whisper is not a simple wrapper around Whisper but a reimplementation of OpenAI's Whisper model using CTranslate2, an efficient inference engine for Transformer models.

In short, it is faster than Whisper: the official claim is 4-8 times faster. It supports not only GPUs but also CPUs, and it even runs on my beat-up Mac.

GitHub repository: https://github.com/SYSTRAN/faster-whisper
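The GPU and CPU support mentioned above comes down to two constructor arguments, `device` and `compute_type`. As a minimal sketch, the hypothetical helper below (not part of faster-whisper) picks between the three combinations that appear in the project's own example; whether a GPU is present is passed in as a plain flag rather than detected, which is an assumption made to keep the sketch self-contained.

```python
def pick_model_options(has_cuda_gpu: bool, low_gpu_memory: bool = False) -> dict:
    """Return keyword arguments for WhisperModel (hypothetical helper).

    The three combinations are the ones shown in the faster-whisper example:
    FP16 on GPU, INT8 on GPU, and INT8 on CPU.
    """
    if not has_cuda_gpu:
        # CPU-only machines (for example, a Mac without CUDA) use INT8.
        return {"device": "cpu", "compute_type": "int8"}
    if low_gpu_memory:
        # INT8 weights with FP16 computation on the GPU.
        return {"device": "cuda", "compute_type": "int8_float16"}
    return {"device": "cuda", "compute_type": "float16"}


options = pick_model_options(has_cuda_gpu=False)
print(options)  # {'device': 'cpu', 'compute_type': 'int8'}
```

The result can then be unpacked into the constructor, as in `WhisperModel("large-v3", **options)`.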

Using it takes only two steps.

  1. Install the dependency package:
pip install faster-whisper
  2. Write the code:
from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Yes, it's that simple.
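Because every segment carries a start time, an end time, and its text, the dialogue-search use case mentioned at the beginning is only a few more lines. The sketch below is an illustration, not the author's code: `Segment` is a stand-in for the objects yielded by `model.transcribe`, and `find_phrase` and `format_timestamp` are hypothetical helper names. It runs on hand-written sample data so that it works without a model.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """Stand-in for a transcribed segment: start/end in seconds, plus text."""
    start: float
    end: float
    text: str


def find_phrase(segments: Iterable[Segment], phrase: str) -> List[Segment]:
    """Return the segments whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Sample data made up for this example.
sample = [
    Segment(0.0, 4.2, " Hello everyone, welcome back."),
    Segment(4.2, 9.8, " Today we talk about speech recognition."),
    Segment(3725.0, 3730.5, " Speech models keep getting faster."),
]

for hit in find_phrase(sample, "speech"):
    print(format_timestamp(hit.start), hit.text.strip())
```

With real output, pass `list(segments)` from `model.transcribe` instead of `sample`; the helpers only rely on the `start`, `end`, and `text` attributes.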

What can it be used for?

As it happens, a friend wants to make short videos built on "chicken soup for the soul" style inspirational quotes, taken from interviews with famous people. But he didn't want to watch the whole interviews; he wanted the fastest way to get the text and then read it, because reading is much faster than watching a video, and text can also be searched.

I'd just say: if you don't even have the dedication to watch a video all the way through, how can you run an account well?

So I built a tool for him, using Faster-Whisper.

Client

The client is written in Swift and only supports Mac.

  1. Select a video;
  2. Click "Extract Text", which calls the Python interface; this takes a while;
  3. The parsed text is loaded, along with the start and end time of each segment;
  4. Select a start time and an end time;
  5. Click the "Export" button to export the video clip.
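The article does not show how the Swift client cuts the clip in the last step. One common way to do it, sketched here in Python as an assumption rather than the author's method, is to hand the selected start and end times to the ffmpeg command-line tool. `build_clip_command` is a hypothetical helper and the file names are placeholders.

```python
from typing import List


def build_clip_command(source: str, start: float, end: float, output: str) -> List[str]:
    """Build an ffmpeg argument list that copies [start, end] of source into output.

    Times are in seconds, as reported by the transcription segments.
    """
    if end <= start:
        raise ValueError("end time must be later than start time")
    duration = end - start
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-ss", f"{start:.2f}",     # seek to the start time in the input
        "-i", source,
        "-t", f"{duration:.2f}",   # keep this many seconds
        "-c", "copy",              # copy streams without re-encoding
        output,
    ]


command = build_clip_command("interview.mp4", 12.5, 30.0, "clip.mp4")
print(" ".join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed. Stream copy is fast but cuts at keyframes, so the clip boundaries can be slightly off; dropping `-c copy` re-encodes and cuts more precisely at the cost of speed.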

Server

The server is Python, of course: the model is wrapped with Flask and exposed as an HTTP interface.

from flask import Flask, request, jsonify
from faster_whisper import WhisperModel

app = Flask(__name__)

model_size = "large-v2"
model = WhisperModel(model_size, device="cpu", compute_type="int8")


@app.route('/transcribe', methods=['POST'])
def transcribe():
    # Get the file path from the request
    file_path = request.json.get('filePath')

    # Transcribe the file; the initial prompt "简体" ("Simplified") is
    # presumably there to nudge the output toward Simplified Chinese
    segments, info = model.transcribe(file_path, beam_size=5, initial_prompt="简体")

    segments_copy = []
    # Write UTF-8 explicitly so non-ASCII text is saved correctly on any platform
    with open('segments.txt', 'w', encoding='utf-8') as file:
        for segment in segments:
            line = "%.2fs|%.2fs|[%.2fs -> %.2fs]|%s" % (segment.start, segment.end, segment.start, segment.end, segment.text)
            segments_copy.append(line)
            file.write(line + '\n')

    # Prepare the response
    response_data = {
        "language": info.language,
        "language_probability": info.language_probability,
        "segments": []
    }
    for segment in segments_copy:
        response_data["segments"].append(segment)

    return jsonify(response_data)


if __name__ == '__main__':
    app.run(debug=False)
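On the calling side, a client has to POST a JSON body with a `filePath` field to `/transcribe` and then split each returned line on `|`. The real client is written in Swift and is not shown, so the following is only a Python sketch of that exchange. `build_transcribe_request` and `parse_segment_line` are hypothetical names, the base URL assumes Flask's default development address, and the file path is a placeholder.

```python
import json
import urllib.request
from typing import NamedTuple


class ParsedSegment(NamedTuple):
    start: float
    end: float
    text: str


def build_transcribe_request(file_path: str,
                             base_url: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build (but do not send) the POST request the /transcribe route expects."""
    body = json.dumps({"filePath": file_path}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_segment_line(line: str) -> ParsedSegment:
    """Parse one 'start|end|[start -> end]|text' line produced by the server."""
    # Split at most three times so a '|' inside the text is preserved.
    start, end, _label, text = line.split("|", 3)
    return ParsedSegment(float(start.rstrip("s")), float(end.rstrip("s")), text.strip())


request = build_transcribe_request("/path/to/interview.mp4")
# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
#     segments = [parse_segment_line(line) for line in payload["segments"]]

example = parse_segment_line("0.00s|4.20s|[0.00s -> 4.20s]| Hello everyone")
print(example.start, example.end, example.text)
```

Note that the server reads `filePath` from its own file system, so this design only works when the client and server run on the same machine or share storage.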

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.