WasapiLoopbackCapture internal audio recognition gives gibberish, and text even when no audio is playing

2023-12-09

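I finally built a program that uses NAudio to listen to the internal audio loopback and output the recognized text. The problem is that it listens and always says, for example: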

Recognized text: had
Recognized text: had
Recognized text: had
Recognized text: had
Recognized text: had had phone Le K add phone Laton
Recognized text: had phone looked had phone looked had phone looked had phone looked zone
Recognized text: had phone lines to had, had phone looked had phone looked had phone line had phone
Recognized text: had phone line had phone looked had phone
Recognized text: had phone looked had phone looked had phone line had phone
Recognized text: had phone looked had phone look to had pot they had phone lit only had phone
Recognized text: had phone line had phone looked had phone line to had to had phone
Recognized text: had phone line had phone looked had phone looked had phone
Recognized text: had phone line had phone looked had phone looked had phone line 10 only T had phone
Recognized text: had phone line had
Recognized text: had phone line had phone looked had phone line had
Recognized text: had phone Le tone looked had
Recognized text: had phone looked had phone looked had phone
Recognized text: had phone line had phone line had phone licked had phone
Recognized text: had phone lines to had popped the own

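and similar nonsense. Even when I pause the audio, it just shows "Recognized text: had" or "an" over and over. When I unpause the audio, it never manages to recognize the internal audio correctly. Is there a way to fix this, or at least to get a wav of what it is trying to send to the Microsoft speech recognizer?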

using System;
using System.Speech.Recognition;
using NAudio.Wave;
using NAudio.CoreAudioApi.Interfaces;

using NAudio.CoreAudioApi;
using System.IO;
using System.Speech.AudioFormat;
using NAudio.Wave.SampleProviders;
using NAudio.Utils;
using System.Threading;
using System.Collections.Generic;

namespace SpeechRecognitionApp
{
    class SpeechStreamer : Stream
    {
        private AutoResetEvent _writeEvent;
        private List<byte> _buffer;
        private int _buffersize;
        private int _readposition;
        private int _writeposition;
        private bool _reset;

        public SpeechStreamer(int bufferSize)
        {
            _writeEvent = new AutoResetEvent(false);
            _buffersize = bufferSize;
            _buffer = new List<byte>(_buffersize);
            for (int i = 0; i < _buffersize; i++)
                _buffer.Add(new byte());
            _readposition = 0;
            _writeposition = 0;
        }

        public override bool CanRead
        {
            get { return true; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return true; }
        }

        public override long Length
        {
            get { return -1L; }
        }

        public override long Position
        {
            get { return 0L; }
            set { }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            return 0L;
        }

        public override void SetLength(long value)
        {

        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int i = 0;
            while (i < count && _writeEvent != null)
            {
                if (!_reset && _readposition >= _writeposition)
                {
                    _writeEvent.WaitOne(100, true);
                    continue;
                }
                buffer[i] = _buffer[_readposition + offset];
                _readposition++;
                if (_readposition == _buffersize)
                {
                    _readposition = 0;
                    _reset = false;
                }
                i++;
            }

            return count;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            for (int i = offset; i < offset + count; i++)
            {
                _buffer[_writeposition] = buffer[i];
                _writeposition++;
                if (_writeposition == _buffersize)
                {
                    _writeposition = 0;
                    _reset = true;
                }
            }
            _writeEvent.Set();

        }

        public override void Close()
        {
            _writeEvent.Close();
            _writeEvent = null;
            base.Close();
        }

        public override void Flush()
        {

        }
    }

    class FakeStreamer : Stream
    {
        public bool bExit = false;
        Stream stream;
        Stream client;
        public FakeStreamer(Stream client)
        {
            this.client = client;
            this.stream = client;
        }
        public override bool CanRead
        {
            get { return stream.CanRead; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return stream.CanWrite; }
        }

        public override long Length
        {
            get { return -1L; }
        }

        public override long Position
        {
            get { return 0L; }
            set { }
        }
        public override long Seek(long offset, SeekOrigin origin)
        {
            return 0L;
        }

        public override void SetLength(long value)
        {
            stream.SetLength(value);
        }
        public override int Read(byte[] buffer, int offset, int count)
        {
            int len = 0, c = count;
            while (c > 0 && !bExit)
            {
                //try {
                    len = stream.Read(buffer, offset, c);
                /*}
                catch (Exception e)
                {
                    Console.WriteLine("ouch");
                }
                if (!client.Connected || len == 0)
                {
                    //Exit read loop
                    return 0;
                }*/
                offset += len;
                c -= len;
            }
            return count;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            stream.Write(buffer, offset, count);
        }

        public override void Close()
        {
            stream.Close();
            base.Close();
        }

        public override void Flush()
        {
            stream.Flush();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {

            // Create an in-process speech recognizer for the en-US locale.  
            using (
            SpeechRecognitionEngine recognizer =
              new SpeechRecognitionEngine(
                new System.Globalization.CultureInfo("en-US")))
            {

                // Create and load a dictation grammar.  
                recognizer.LoadGrammar(new DictationGrammar());

                // Add a handler for the speech recognized event.  
                recognizer.SpeechRecognized +=
                  new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

                // Configure input to the speech recognizer.  
                //recognizer.SetInputToDefaultAudioDevice();  
                WasapiLoopbackCapture capture = new WasapiLoopbackCapture();
                BufferedWaveProvider WaveBuffer = new BufferedWaveProvider(capture.WaveFormat);
                WaveBuffer.DiscardOnBufferOverflow = true;
                //WaveBuffer.ReadFully = false;
                WaveToSampleProvider sampleStream = new WaveToSampleProvider(WaveBuffer);
                StereoToMonoSampleProvider monoStream = new StereoToMonoSampleProvider(sampleStream)
                {
                    LeftVolume = 1f,
                    RightVolume = 1f
                };

                //Downsample to 8000 https://stackoverflow.com/questions/48233099/capture-audio-from-wasapiloopbackcapture-and-convert-to-mulaw
                WdlResamplingSampleProvider resamplingProvider = new WdlResamplingSampleProvider(monoStream, 16000);
                SampleToWaveProvider16 ieeeToPcm = new SampleToWaveProvider16(resamplingProvider);
                var arr = new byte[128];
                Stream captureConvertStream = new System.IO.MemoryStream();
                capture.StartRecording();
                //outputStream = new MuLawConversionProvider(ieeeToPcm);

                Stream captureStream = new System.IO.MemoryStream();
                //Stream buffStream = new FakeStreamer(captureStream);
                capture.DataAvailable += (s, a) =>
                {
                    //It is getting here.
                    //captureStream.Write(a.Buffer, 0, a.BytesRecorded);
                    WaveBuffer.AddSamples(a.Buffer, 0, a.BytesRecorded);
                };
                Console.WriteLine(capture.WaveFormat.AverageBytesPerSecond);
                Console.WriteLine(capture.WaveFormat.BitsPerSample);
                //var newFormat = new WaveFormat(8000, 16, 1);
                //using (var conversionStream = new WaveFormatConversionStream(newFormat, capture)
                //capture.StartRecording();
                //using (var resampler = new MediaFoundationResampler(new NAudio.Wave.RawSourceWaveStream(captureStream, capture.WaveFormat), newFormat))
                //{
                    //resampler.ResamplerQuality = 60;
                    //WaveFileWriter.WriteWavFileToStream(captureConvertStream, resampler);
                    //recognizer.SetInputToDefaultAudioDevice();
                    //Stream buffStream = new FakeStreamer(captureConvertStream);
                    Stream buffStream = new SpeechStreamer(2048);
                    //recognizer.SetInputToWaveStream(buffStream);
                    recognizer.SetInputToAudioStream(buffStream, new SpeechAudioFormatInfo(
                        16000, AudioBitsPerSample.Eight, AudioChannel.Mono));

                    // Start asynchronous, continuous speech recognition.  
                    recognizer.RecognizeAsync(RecognizeMode.Multiple);

                    /*System.Threading.Thread.Sleep(5000);
                    works when playing anything
                    var floata = new float[128];
                    while(monoStream.Read(floata, 0, floata.Length) > 0 )
                    {
                        Console.WriteLine(arr.Length);
                    }*/
                    while (ieeeToPcm.Read(arr, 0, arr.Length) > 0)
                    {
                        //Console.Write("Writing PCM ");
                        //Console.WriteLine(arr.Length);
                        //captureConvertStream.Write(arr, 0, arr.Length);
                        buffStream.Write(arr, 0, arr.Length);
                    }
                    Console.WriteLine("end");

                    /*capture.StartRecording();
                    //Never getting to the resampler, the read is always zero!? even if waiting 5s for the audio to buffer.
                    System.Threading.Thread.Sleep(5000);
                    var arr = new byte[128];
                    while (resampler.Read(arr, 0, arr.Length) > 0)
                    {
                        captureConvertStream.Write(arr, 0, arr.Length);
                        Console.WriteLine("Never getting here");
                    }
                    // Keep the console window open.  
                    while (true)
                    {
                        Console.ReadLine();
                    }*/
                //}
            }
        }

        // Handle the SpeechRecognized event.  
        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Recognized text: " + e.Result.Text);
        }
    }
}

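The SpeechStreamer class has some problems, and I cannot really see what its purpose is. I did try it. Also, looking at a wave file dump from your implementation, the audio is really choppy, with long pauses between the samples. This may be what is throwing the speech recognizer off. Here is an example: the Windows volume adjustment sound as captured by your code.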

As you may hear, it is very choppy, with a lot of silence in between. The speech recognition part recognizes this as: "ta ta ta ta ta ta ta..."
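
A separate thing that may be worth checking, based only on reading the posted code and not on running it: SampleToWaveProvider16 emits 16-bit samples at the 16000 Hz the resampler is set to, while the SpeechAudioFormatInfo passed to SetInputToAudioStream declares AudioBitsPerSample.Eight. The small sketch below only does the byte-rate arithmetic for those two formats and for the 2048-byte SpeechStreamer buffer. The class and method names are made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of NAudio or System.Speech.
static class AudioFormatMath
{
    // Bytes per second of uncompressed PCM audio.
    public static int BytesPerSecond(int sampleRate, int bitsPerSample, int channels)
    {
        return sampleRate * (bitsPerSample / 8) * channels;
    }

    // Milliseconds of audio that fit into a buffer of the given size.
    public static double BufferMilliseconds(int bufferBytes, int bytesPerSecond)
    {
        return 1000.0 * bufferBytes / bytesPerSecond;
    }
}
```

With the values from the question, BytesPerSecond(16000, 16, 1) is 32000 for what the pipeline produces, BytesPerSecond(16000, 8, 1) is 16000 for what the recognizer is told to expect, and BufferMilliseconds(2048, 32000) is 64, so the ring buffer holds only about 64 ms of audio before the writer wraps around. If the mismatch is real, declaring AudioBitsPerSample.Sixteen would be the first thing to try.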

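I had to rewrite your code a bit to dump a wave file, since the Read method of your SpeechStreamer causes an eternal loop when it is used to read the stream's contents.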
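
For reference, the posted Read applies offset to the ring buffer index (_buffer[_readposition + offset]) instead of to the destination array, always returns count, and keeps waiting once the reader has caught up with the writer. Below is a minimal sketch of how such a blocking ring buffer stream could look. RingBufferStream is a hypothetical name, and blocking the writer when the buffer is full (rather than overwriting unread data) is my own design choice. I have not verified it against SpeechRecognitionEngine.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical replacement for SpeechStreamer: a blocking ring buffer as a Stream.
class RingBufferStream : Stream
{
    private readonly byte[] _buffer;
    private readonly object _lock = new object();
    private int _readPos;   // index of the next byte to read
    private int _count;     // number of unread bytes currently stored
    private bool _closed;

    public RingBufferStream(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        _buffer = new byte[capacity];
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    // Length, Position and Seek behave as in the original class.
    public override long Length { get { return -1L; } }
    public override long Position { get { return 0L; } set { } }
    public override long Seek(long offset, SeekOrigin origin) { return 0L; }
    public override void SetLength(long value) { }
    public override void Flush() { }

    // Blocks until count bytes were copied or the stream was closed.
    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        lock (_lock)
        {
            while (total < count)
            {
                while (_count == 0 && !_closed)
                    Monitor.Wait(_lock);
                if (_count == 0) break; // closed and drained: end of stream

                int n = Math.Min(count - total, _count);
                for (int i = 0; i < n; i++)
                {
                    // offset applies to the destination, not to the ring index
                    buffer[offset + total + i] = _buffer[_readPos];
                    _readPos = (_readPos + 1) % _buffer.Length;
                }
                _count -= n;
                total += n;
                Monitor.PulseAll(_lock); // wake a writer waiting for free space
            }
        }
        return total; // bytes actually copied
    }

    // Blocks while the buffer is full instead of overwriting unread bytes.
    public override void Write(byte[] buffer, int offset, int count)
    {
        int written = 0;
        lock (_lock)
        {
            while (written < count)
            {
                while (_count == _buffer.Length && !_closed)
                    Monitor.Wait(_lock);
                if (_closed) throw new ObjectDisposedException("RingBufferStream");

                int n = Math.Min(count - written, _buffer.Length - _count);
                for (int i = 0; i < n; i++)
                {
                    _buffer[(_readPos + _count) % _buffer.Length] = buffer[offset + written + i];
                    _count++;
                }
                written += n;
                Monitor.PulseAll(_lock); // wake a reader waiting for data
            }
        }
    }

    protected override void Dispose(bool disposing)
    {
        lock (_lock)
        {
            _closed = true;
            Monitor.PulseAll(_lock); // release any blocked reader or writer
        }
        base.Dispose(disposing);
    }
}
```

Because Write blocks when the buffer is full, the reader and the writer need to run on different threads, and a single Write larger than the capacity only completes once a reader drains the buffer.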

To dump a wave file, you could do the following:

var buffer = new byte[2048];
using (var writer = new WaveFileWriter("tmp.wav", ieeeToPcm.WaveFormat))
{
    // buffStream is changed to a MemoryStream for this to work.
    buffStream.Seek(0, SeekOrigin.Begin);

    int bytesRead;
    while ((bytesRead = buffStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Write only the bytes actually read, not the whole buffer.
        writer.Write(buffer, 0, bytesRead);
    }
}

Alternatively, you can do it while reading from the SampleToWaveProvider16:

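// Disposing the writer at the end of the using block finalizes the WAV file.
using (var writer = new WaveFileWriter("dump.wav", ieeeToPcm.WaveFormat))
{
    int bytesRead;
    while ((bytesRead = ieeeToPcm.Read(arr, 0, arr.Length)) > 0)
    {
        if (Console.KeyAvailable && Console.ReadKey().Key == ConsoleKey.Escape)
            break;
        // Forward only the bytes actually read.
        buffStream.Write(arr, 0, bytesRead);
        writer.Write(arr, 0, bytesRead);
    }
}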

I just added the ability to hit Escape to exit the loop.

Now I do wonder why you are using NAudio. Why not use the methods native to the System.Speech API?

using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Threading;

class Program
{
    private static ManualResetEvent _done;
    static void Main(string[] args)
    {
        _done = new ManualResetEvent(false);

        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.SpeechRecognized += RecognizedSpeech;
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            _done.WaitOne();
        }
    }

    private static void RecognizedSpeech(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text.Contains("exit"))
        {
            _done.Set();
        }

        Console.WriteLine(e.Result.Text);
    }
}