Problem
My project consists of a desktop application that records audio in real time, and I intend to receive live recognition feedback from the API for that audio. With a microphone, a real-time implementation using Microsoft's new Speech-to-Text API is trivial; my scenario differs only in that my data is written to a MemoryStream object.
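For reference, the trivial microphone case looks roughly like the sketch below (based on the SDK samples; the subscription key and region are placeholders, and I use RecognizeOnceAsync just to keep it short):

private static async Task RecognizeFromMicrophoneAsync()
{
    // Placeholder key and region
    var config = SpeechConfig.FromSubscription("<SubscriptionKey>", "westus");

    // The default microphone is used as the audio source
    using (var audioConfig = AudioConfig.FromDefaultMicrophoneInput())
    using (var recognizer = new SpeechRecognizer(config, audioConfig))
    {
        // Single-utterance recognition; continuous recognition is wired up the same way
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine(result.Text);
    }
}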
API support
The article https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-audio-input-streams explains how to use the API's Recognizer (link: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.recognizer) with custom audio streams, which always requires an implementation of the abstract class PullAudioInputStream (link: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.audio.pullaudioinputstream) in order to create the required AudioConfig object using the CreatePullStream method (link: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.audio.audioinputstream.createpullstream). In other words, to meet my requirement, a callback interface has to be implemented.
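As I understand the documentation, the wiring looks roughly like this (a sketch only; MyPullCallback and the format values are stand-ins for my actual callback and constants):

// Sketch of the pull-stream wiring described in the how-to article
var format = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond: 16000, bitsPerSample: 16, channels: 1);
PullAudioInputStream pullStream = AudioInputStream.CreatePullStream(format, new MyPullCallback());
AudioConfig audioConfig = AudioConfig.FromStreamInput(pullStream);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);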
Implementation attempt
Since my data is written to a MemoryStream (and the library I use only records to a file or to a Stream object), in the code below I simply copy the buffer into the implementing class (maybe in a sloppy way?) to work around the mismatch between the signatures.
class AudioInputCallback : PullAudioInputStreamCallback
{
    private readonly MemoryStream memoryStream;

    public AudioInputCallback(MemoryStream stream)
    {
        this.memoryStream = stream;
    }

    // Called by the SDK whenever it wants more audio data
    public override int Read(byte[] dataBuffer, uint size)
    {
        return this.Read(dataBuffer, 0, dataBuffer.Length);
    }

    // Delegates directly to the wrapped MemoryStream
    private int Read(byte[] buffer, int offset, int count)
    {
        return memoryStream.Read(buffer, offset, count);
    }

    public override void Close()
    {
        memoryStream.Close();
        base.Close();
    }
}
The Recognizer is implemented as follows:
private SpeechRecognizer CreateMicrosoftSpeechRecognizer(MemoryStream memoryStream)
{
    var recognizerConfig = SpeechConfig.FromSubscription(SubscriptionKey, @"westus");
    recognizerConfig.SpeechRecognitionLanguage =
        _programInfo.CurrentSourceCulture.TwoLetterISOLanguageName;

    // Constants are used as constructor params
    var format = AudioStreamFormat.GetWaveFormatPCM(
        samplesPerSecond: SampleRate, bitsPerSample: BitsPerSample, channels: Channels);

    // Implementation of PullAudioInputStreamCallback
    var callback = new AudioInputCallback(memoryStream);
    AudioConfig audioConfig = AudioConfig.FromStreamInput(callback, format);

    // Actual recognizer is created with the required objects
    SpeechRecognizer recognizer = new SpeechRecognizer(recognizerConfig, audioConfig);

    // Event subscriptions. Most handlers are implemented for debugging purposes only.
    // A log window outputs the feedback from the event handlers.
    recognizer.Recognized += MsRecognizer_Recognized;
    recognizer.Recognizing += MsRecognizer_Recognizing;
    recognizer.Canceled += MsRecognizer_Canceled;
    recognizer.SpeechStartDetected += MsRecognizer_SpeechStartDetected;
    recognizer.SpeechEndDetected += MsRecognizer_SpeechEndDetected;
    recognizer.SessionStopped += MsRecognizer_SessionStopped;
    recognizer.SessionStarted += MsRecognizer_SessionStarted;

    return recognizer;
}
How the data is fed to the recognizer (using CSCore):
MemoryStream memoryStream = new MemoryStream(_finalSource.WaveFormat.BytesPerSecond / 2);
byte[] buffer = new byte[_finalSource.WaveFormat.BytesPerSecond / 2];

_soundInSource.DataAvailable += (s, e) =>
{
    int read;
    _programInfo.IsDataAvailable = true;
    // Writes to MemoryStream as the event fires
    while ((read = _finalSource.Read(buffer, 0, buffer.Length)) > 0)
        memoryStream.Write(buffer, 0, read);
};

// Creates MS recognizer from MemoryStream
_msRecognizer = CreateMicrosoftSpeechRecognizer(memoryStream);

// Initializes loopback capture instance
_soundIn.Start();

await Task.Delay(1000);

// Starts recognition
await _msRecognizer.StartContinuousRecognitionAsync();
Outcome
When the application runs, I get no exceptions and no response from the API other than SessionStarted and SessionStopped, as shown in the image of my application's log window below.
I would welcome suggestions for a different approach to the implementation, since I suspect there is a timing issue between the recording's DataAvailable event and the moment data is actually sent to the API, which causes it to drop the session prematurely. With no detailed feedback on why my request is unsuccessful, I can only guess at the reason.
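For example, one alternative I have been considering is pushing the captured bytes to the SDK directly from the DataAvailable handler instead of buffering them in a MemoryStream. A rough, untested sketch of what I mean, reusing the same format constants and CSCore fields as above:

// Untested sketch of a push-stream alternative; SampleRate, BitsPerSample and Channels
// are the same constants used in CreateMicrosoftSpeechRecognizer
var format = AudioStreamFormat.GetWaveFormatPCM(SampleRate, BitsPerSample, Channels);
var pushStream = AudioInputStream.CreatePushStream(format);
var audioConfig = AudioConfig.FromStreamInput(pushStream);

_soundInSource.DataAvailable += (s, e) =>
{
    int read;
    // Push each captured block straight to the SDK instead of buffering it
    while ((read = _finalSource.Read(buffer, 0, buffer.Length)) > 0)
        pushStream.Write(buffer, read);
};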