Unlocking the Power of a Local Voice Assistant: A Step-by-Step Guide
The rise of multimodal large language models (LLMs) has transformed how we interact with AI, making voice-based interaction practical. While OpenAI's voice-enabled ChatGPT is a convenient solution, building a local voice assistant offers stronger guarantees. This guide walks through constructing such an assistant on a standard CPU-based machine.

Three key advantages drive the appeal of a local voice assistant:

- Enhanced data privacy: audio and text never leave your machine.
- Unlimited API calls: everything runs locally, with no usage fees or rate limits.
- Customization: the underlying models can be fine-tuned for specific needs.
The project comprises four core components:

1. Audio recording: the `sounddevice` library captures microphone input and saves it as a 16 kHz mono WAV file, matching what the Whisper.cpp model expects:
<code class="language-python">import sounddevice as sd import wave import numpy as np sampling_rate = 16000 # Matches Whisper.cpp model recorded_audio = sd.rec(int(duration * sampling_rate), samplerate=sampling_rate, channels=1, dtype=np.int16) sd.wait() audio_file = "<path>/recorded_audio.wav" with wave.open(audio_file, "w") as wf: wf.setnchannels(1) wf.setsampwidth(2) wf.setframerate(sampling_rate) wf.writeframes(recorded_audio.tobytes())</path></code>
2. Speech-to-text: Whisper.cpp transcribes the recording locally using the `ggml-base.en.bin` model, invoked as a subprocess:
<code class="language-python">import subprocess WHISPER_BINARY_PATH = "/<path>/whisper.cpp/main" MODEL_PATH = "/<path>/whisper.cpp/models/ggml-base.en.bin" try: result = subprocess.run([WHISPER_BINARY_PATH, "-m", MODEL_PATH, "-f", audio_file, "-l", "en", "-otxt"], capture_output=True, text=True) transcription = result.stdout.strip() except FileNotFoundError: print("Whisper.cpp binary not found. Check the path.")</path></path></code>
3. Response generation: Ollama serves the lightweight `qwen:0.5b` model locally. A helper function, `run_ollama_command`, wraps the CLI call, and a regex strips Whisper's bracketed timestamp prefixes before the text is sent to the model (a small demo of the regex follows the block):
<code class="language-python">import subprocess import re def run_ollama_command(model, prompt): try: result = subprocess.run(["ollama", "run", model], input=prompt, text=True, capture_output=True, check=True) return result.stdout except subprocess.CalledProcessError as e: print(f"Ollama error: {e.stderr}") return None matches = re.findall(r"] *(.*)", transcription) concatenated_text = " ".join(matches) prompt = f"""Please ignore [BLANK_AUDIO]. Given: "{concatenated_text}", answer in under 15 words.""" answer = run_ollama_command(model="qwen:0.5b", prompt=prompt)</code>文本到語音轉換:
<code class="language-python">import nemo_tts import torchaudio from io import BytesIO try: fastpitch_model = nemo_tts.models.FastPitchModel.from_pretrained("tts_en_fastpitch") hifigan_model = nemo_tts.models.HifiGanModel.from_pretrained("tts_en_lj_hifigan_ft_mixerttsx") fastpitch_model.eval() parsed_text = fastpitch_model.parse(answer) spectrogram = fastpitch_model.generate_spectrogram(tokens=parsed_text) hifigan_model.eval() audio = hifigan_model.convert_spectrogram_to_audio(spec=spectrogram) audio_buffer = BytesIO() torchaudio.save(audio_buffer, audio.cpu(), sample_rate=22050, format="wav") audio_buffer.seek(0) except Exception as e: print(f"TTS error: {e}")</code>
That completes a local voice assistant, built with an LLM and neural networks, running entirely on a CPU laptop.