I'm transcribing a microphone stream with Microsoft's JavaScript Speech SDK. Both the recording and the transcription are done through the Speech SDK, and I can't find a way to access and save the recorded audio once recording has finished.

The code that creates the recognizer and records:
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

// to start the recording
recognizer.startContinuousRecognitionAsync(
  () => {
    portFromCS.postMessage({ type: "started", data: "" });
  },
  err => {
    recognizer.close();
  },
);

// used after user input to stop the recording
recognizer.stopContinuousRecognitionAsync(
  () => {
    window.console.log("successfully stopped");
    // TODO: somehow need to save the file
  },
  err => {
    window.console.log("error on stop", err);
  },
);
The documentation (https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/?view=azure-node-latest) is fairly poor, and I can't find a built-in way to access the raw audio through their SDK. Is my only option to record with two audio streams and save the file from a separate recording stream? What would that imply?
The SDK doesn't save the audio, and there is no built-in facility for it.

In version 1.11.0 a new API was added on the Connection object that lets you observe the messages sent to the service; from those you can extract the audio and assemble a wave file yourself.

Here's some TypeScript that does this:
import * as SpeechSdk from "microsoft-cognitiveservices-speech-sdk";
import * as fs from "fs";
const filename: string = "input.wav";
const outputFileName: string = "out.wav";
const subscriptionKey: string = "<SUBSCRIPTION_KEY>";
const region: string = "<SUBSCRIPTION_REGION>";
const speechConfig: SpeechSdk.SpeechConfig = SpeechSdk.SpeechConfig.fromSubscription(subscriptionKey, region);
// Load the audio from a file. In a browser you could instead use
// const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromDefaultMicrophoneInput();
const fileContents: Buffer = fs.readFileSync(filename);
const inputStream: SpeechSdk.PushAudioInputStream = SpeechSdk.AudioInputStream.createPushStream();
const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromStreamInput(inputStream);
inputStream.write(fileContents);
inputStream.close();
const r: SpeechSdk.SpeechRecognizer = new SpeechSdk.SpeechRecognizer(speechConfig, audioConfig);
const con: SpeechSdk.Connection = SpeechSdk.Connection.fromRecognizer(r);
let wavFragmentCount: number = 0;
const wavFragments: { [id: number]: ArrayBuffer; } = {};
con.messageSent = (args: SpeechSdk.ConnectionMessageEventArgs): void => {
  // Only record outbound audio messages that have data in them.
  if (args.message.path === "audio" && args.message.isBinaryMessage && args.message.binaryMessage !== null) {
    wavFragments[wavFragmentCount++] = args.message.binaryMessage;
  }
};
r.recognizeOnceAsync((result: SpeechSdk.SpeechRecognitionResult) => {
  // Find the length of the audio sent.
  let byteCount: number = 0;
  for (let i: number = 0; i < wavFragmentCount; i++) {
    byteCount += wavFragments[i].byteLength;
  }

  // Output array.
  const sentAudio: Uint8Array = new Uint8Array(byteCount);
  byteCount = 0;
  for (let i: number = 0; i < wavFragmentCount; i++) {
    sentAudio.set(new Uint8Array(wavFragments[i]), byteCount);
    byteCount += wavFragments[i].byteLength;
  }

  // Patch the sizes in the wave header: the RIFF chunk size is the file
  // size minus 8 bytes, the data chunk size is the file size minus the
  // 44-byte header.
  const view = new DataView(sentAudio.buffer);
  view.setUint32(4, byteCount - 8, true);
  view.setUint32(40, byteCount - 44, true);

  // Write the audio back to disk.
  fs.writeFileSync(outputFileName, sentAudio);
  r.close();
});
It loads the audio from a file so that I could test it in NodeJS rather than a browser, but the core part is the same.
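In the browser (the question's scenario) the same Connection hook applies; only the save step changes, since `fs` isn't available. Below is a minimal sketch, not part of the SDK API: `assembleWav` reuses the `wavFragments` map built by the `messageSent` handler above, and `saveWav` is a hypothetical helper that downloads the bytes via a Blob.

```javascript
// Assemble the captured "audio" message fragments (ArrayBuffers keyed
// 0..n-1, the first of which contains the WAV header the SDK sends)
// into a single byte array.
function assembleWav(wavFragments, wavFragmentCount) {
  let byteCount = 0;
  for (let i = 0; i < wavFragmentCount; i++) {
    byteCount += wavFragments[i].byteLength;
  }

  const sentAudio = new Uint8Array(byteCount);
  let offset = 0;
  for (let i = 0; i < wavFragmentCount; i++) {
    sentAudio.set(new Uint8Array(wavFragments[i]), offset);
    offset += wavFragments[i].byteLength;
  }

  // Patch the placeholder sizes in the WAV header: RIFF chunk size is
  // file size minus 8 bytes, data chunk size is file size minus the
  // 44-byte header.
  const view = new DataView(sentAudio.buffer);
  view.setUint32(4, byteCount - 8, true);
  view.setUint32(40, byteCount - 44, true);
  return sentAudio;
}

// Hypothetical browser-only save step: wrap the bytes in a Blob and
// trigger a download through a temporary anchor element.
function saveWav(sentAudio, fileName) {
  const blob = new Blob([sentAudio], { type: "audio/wav" });
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = fileName;
  a.click();
  URL.revokeObjectURL(a.href);
}
```

The `messageSent` hook should work the same way with `startContinuousRecognitionAsync`: fragments simply keep accumulating, and you would call `assembleWav` and `saveWav` from the success callback of `stopContinuousRecognitionAsync`.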