This post is part 2 of a 3-part series on Converting Speech to Text in Power Apps.
In our previous post, we created a Canvas app and configured it with formulas to capture audio from the Microphone control in Power Apps.
To reduce the complexity, we have divided this article into three parts:
- Design a Canvas app with the Microphone control to capture audio.
- Create an Azure Function that uses FFmpeg to convert the microphone audio captured in WEBM format to WAV format.
- Create a Power Automate flow that builds an HTML file from the text returned by the Speech to Text action.
In this post, we will create an Azure Function that uses FFmpeg to convert audio captured by the Microphone control in Power Apps from WEBM to WAV format.
- Looking at the heading, you might be wondering what all these odd-looking names (WEBM, WAV, FFmpeg) are.
- To answer this, let’s backtrack a bit, so you get the gist of why we’re building our solution this way.
WEBM and WAV formats
- WEBM is an audiovisual media format.
- When you record audio with the Microphone control in Power Apps on Windows, it is encoded in WEBM format (other platforms, such as iOS, may use a different format).
- If you record audio in our Power App and check the ‘AudioJSON’ label, you can see that the recorded audio is indeed in WEBM format.
- WAV is an audio file format created by Microsoft and IBM; it usually holds uncompressed (PCM) audio.
- To consume the Azure Cognitive Services Speech to Text service, you need to pass it audio in either WAV or OGG format, as shown below.
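To see where the WEBM bytes come from, here is a minimal sketch that splits a data URI, like the value shown in the ‘AudioJSON’ label, into its MIME type and raw audio bytes. It assumes the label holds a base64 data URI of the form `data:audio/webm;base64,...` (as produced by `JSON(..., JSONFormat.IncludeBinaryData)`); the `DecodeDataUri` helper and the sample value are hypothetical.

```csharp
using System;

// Hypothetical helper: splits "data:<mime>;base64,<payload>" into its MIME type
// and decoded bytes. Assumes the base64 form of a data URI.
(string MimeType, byte[] Bytes) DecodeDataUri(string dataUri)
{
    // JSON() output is itself a JSON string, so strip surrounding quotes if present.
    var s = dataUri.Trim().Trim('"');
    const string prefix = "data:";
    const string base64Marker = ";base64";
    var comma = s.IndexOf(',');
    if (!s.StartsWith(prefix, StringComparison.Ordinal) || comma < 0)
        throw new FormatException("Not a data URI.");

    // The header looks like "audio/webm;base64" (it may also carry codec parameters).
    var header = s.Substring(prefix.Length, comma - prefix.Length);
    if (!header.EndsWith(base64Marker, StringComparison.Ordinal))
        throw new FormatException("Only base64 data URIs are handled here.");

    var mimeType = header.Substring(0, header.Length - base64Marker.Length);
    var bytes = Convert.FromBase64String(s.Substring(comma + 1));
    return (mimeType, bytes);
}

// Illustrative value only: "GkXfow==" is base64 for the 4-byte EBML magic 1A 45 DF A3
// that every WEBM file starts with.
var (mime, audio) = DecodeDataUri("\"data:audio/webm;base64,GkXfow==\"");
Console.WriteLine($"{mime}, {audio.Length} bytes"); // audio/webm, 4 bytes
```

The MIME type in the prefix is what reveals the recording format, which is why checking the label is enough to confirm that the audio is WEBM.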
Why Only WAV or OGG Formats?
We pass in WAV or OGG files because, if you pass a WEBM audio file to the Speech to Text Cognitive Service, it returns an error stating that ‘This audio cannot be converted to text since WEBM is an unsupported format’.
So, to avoid this problem, we bring in FFmpeg to the rescue!
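Before deciding whether a file needs converting, it can help to check which container it actually is. The sketch below is an assumption-laden helper, not part of the original solution: `DetectAudioFormat` and `NeedsConversion` are hypothetical names, and detection relies on well-known leading "magic" bytes (the EBML header `1A 45 DF A3` for WEBM and other Matroska files, `RIFF`....`WAVE` for WAV, and `OggS` for OGG).

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: guesses the audio container from its leading "magic" bytes.
string DetectAudioFormat(byte[] data)
{
    // True when `magic` occurs in `data` starting at `offset`.
    bool Matches(byte[] magic, int offset) =>
        data.Length >= offset + magic.Length &&
        data.Skip(offset).Take(magic.Length).SequenceEqual(magic);

    byte[] Ascii(string s) => Encoding.ASCII.GetBytes(s);

    // EBML header ID used by WEBM (and other Matroska files).
    if (Matches(new byte[] { 0x1A, 0x45, 0xDF, 0xA3 }, 0)) return "webm";
    // RIFF container whose form type at offset 8 is "WAVE".
    if (Matches(Ascii("RIFF"), 0) && Matches(Ascii("WAVE"), 8)) return "wav";
    // Every OGG page starts with the capture pattern "OggS".
    if (Matches(Ascii("OggS"), 0)) return "ogg";
    return "unknown";
}

// Only WAV and OGG go straight to Speech to Text; anything else needs FFmpeg first.
bool NeedsConversion(byte[] data) => DetectAudioFormat(data) is not ("wav" or "ogg");

var webmHeader = new byte[] { 0x1A, 0x45, 0xDF, 0xA3, 0x01 };
Console.WriteLine(DetectAudioFormat(webmHeader)); // webm
Console.WriteLine(NeedsConversion(webmHeader));   // True
```

Sniffing the bytes rather than trusting a file extension also covers the case where another device (for example iOS) sends a different format.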
- FFmpeg is an open-source command-line tool for converting audio and video between formats.
- We’re going to use FFmpeg to convert the microphone audio from WEBM to a WAV file, so we can pass that file to the Azure Speech to Text Cognitive Service.
Simply put, we’re going to use an Azure Function to build a simple API that converts a WEBM file to a WAV file for us. This API uses FFmpeg to do the actual conversion.
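At its core, the API just runs `ffmpeg -i input output`. Here is a minimal sketch of building that argument string; the `BuildFfmpegArguments` helper is hypothetical, and the extra `-y` (overwrite the output without prompting) and `-ac`/`-ar` (channel count and sample rate) flags are assumptions, chosen because the Speech to Text REST API for short audio works with 16 kHz mono PCM.

```csharp
using System;

// Hypothetical helper: builds FFmpeg arguments that convert any input FFmpeg can
// read into a WAV file. The .wav extension on the output selects the WAV muxer,
// whose default audio codec is 16-bit little-endian PCM (pcm_s16le).
string BuildFfmpegArguments(string inputPath, string outputPath, int sampleRate = 16000, int channels = 1)
{
    // -y: overwrite an existing output file instead of waiting for an interactive prompt.
    // -i: the input file; -ac / -ar: resample to the given channel count and rate.
    return $"-y -i \"{inputPath}\" -ac {channels} -ar {sampleRate} \"{outputPath}\"";
}

var ffmpegArgs = BuildFfmpegArguments(@"C:\temp\in.webm", @"C:\temp\out.wav");
Console.WriteLine(ffmpegArgs);
// -y -i "C:\temp\in.webm" -ac 1 -ar 16000 "C:\temp\out.wav"
```

Quoting both paths matters because temp folders on Windows can contain spaces.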
Setting up the Azure Function and FFmpeg to convert WEBM audio to WAV:
- Navigate to the Azure portal (portal.azure.com).
- From the left-hand navigation pane, select ‘Function App’ to start creating a new Azure Function.
- On the page that opens next, click ‘+ New’ and fill in the details to create a new Function App.
- Create a separate, dedicated resource group to keep our app clean and organised.
- Give the Function App a meaningful name.
- Make sure to select .NET Core as the Runtime stack; you can select any region of your choice.
- Next, create a storage account and select Windows as the operating system (the FFmpeg path used later, under D:\home, exists only on Windows).
- Select a plan type based on your requirements; here I’m going with the ‘Consumption’ plan.
- Once you’ve made your selections, click Review + create, which starts provisioning the Function App for you.
- Next, we need to set up the Cognitive Services. Go back to the home page of the Azure portal and search for Cognitive Services. Click Cognitive Services and, under it, look for Speech Services as shown.
- Once you locate it, click Create.
- Fill in the Name, Location and Pricing tier as shown below and click Create.
- Once the deployment succeeds, you will get a key and an endpoint. Make sure you copy them and keep them handy, because you’re going to need them later.
- Now that we’ve created both the Function App and a Speech Service, it’s time to set up the Azure Function that will run FFmpeg and perform the audio format conversion.
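The key and the region of the Speech resource are what a caller needs to reach Speech to Text. As a sketch of how they are used, the code below builds (but does not send) a request to the REST API for short audio. The endpoint format, the `Ocp-Apim-Subscription-Key` header and the `audio/wav; codecs=audio/pcm; samplerate=16000` content type follow that API's documentation as I understand it; the `BuildSpeechRequest` helper, the region `westeurope` and the placeholder key are hypothetical, so check the values against your own resource.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: builds (does not send) a Speech to Text request for short audio.
HttpRequestMessage BuildSpeechRequest(string region, string key, byte[] wavBytes, string language = "en-US")
{
    var uri = $"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1" +
              $"?language={Uri.EscapeDataString(language)}";
    var request = new HttpRequestMessage(HttpMethod.Post, uri);
    // The key copied from the Speech resource after deployment.
    request.Headers.Add("Ocp-Apim-Subscription-Key", key);
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    request.Content = new ByteArrayContent(wavBytes);
    // Added without validation because "audio/pcm" is not a plain header token.
    request.Content.Headers.TryAddWithoutValidation("Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");
    return request;
}

var req = BuildSpeechRequest("westeurope", "<your-speech-key>", new byte[44]);
Console.WriteLine(req.RequestUri);
```

Sending it is then a matter of `await new HttpClient().SendAsync(req)`; in this series, that call is made from Power Automate instead.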
Go back to the Function App and click on New function as highlighted below:
- When asked for a template, choose HTTP trigger and set the Authorization level to Anonymous.
- Once done, click on Create.
- We don’t have to write the code from scratch (or write much of anything, for that matter). I have added exception handling to the code so that you can see the error if it fails in your environment at any stage.
- John Liu (Microsoft MVP) has simplified things for us, as he has already shown how to run FFmpeg inside an Azure Function using C#.
- Here is the code, based on his brilliant GitHub sample, that we can borrow:
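using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
public static HttpResponseMessage Run(Stream req, TraceWriter log)
{
    // Unique temp file names for the uploaded WEBM audio and the converted WAV output.
    var baseName = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
    var temp = baseName + ".webm";
    var tempOut = baseName + ".wav";
    try
    {
        // Save the request body (the WEBM audio) so FFmpeg can read it from disk.
        using (var fs = File.Create(temp))
        {
            req.CopyTo(fs);
        }
        log.Info($"Input length: {new FileInfo(temp).Length}");
        var psi = new ProcessStartInfo();
        // ffmpeg.exe is uploaded to this folder through Kudu.
        psi.FileName = @"D:\home\site\wwwroot\ConvertAudioFormatUsingFFMpeg\ffmpeg.exe";
        // -y overwrites without prompting; -ac 1 -ar 16000 produces mono 16 kHz PCM,
        // the format the Speech to Text REST API for short audio expects.
        psi.Arguments = $"-y -i \"{temp}\" -ac 1 -ar 16000 \"{tempOut}\"";
        psi.RedirectStandardError = true;
        psi.UseShellExecute = false;
        log.Info($"Args: {psi.Arguments}");
        using (var process = Process.Start(psi))
        {
            // FFmpeg logs to stderr; read it all so a full pipe buffer cannot stall the process.
            var ffmpegLog = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0 || !File.Exists(tempOut))
            {
                log.Error($"FFmpeg failed: {ffmpegLog}");
                var error = new HttpResponseMessage(HttpStatusCode.InternalServerError);
                error.Content = new StringContent(ffmpegLog);
                return error;
            }
        }
        var bytes = File.ReadAllBytes(tempOut);
        log.Info($"Output length: {bytes.Length}");
        var response = new HttpResponseMessage(HttpStatusCode.OK);
        response.Content = new ByteArrayContent(bytes);
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        return response;
    }
    catch (Exception ex)
    {
        // Report the failure instead of swallowing it and failing later on a missing WAV file.
        log.Error(ex.ToString());
        var error = new HttpResponseMessage(HttpStatusCode.InternalServerError);
        error.Content = new StringContent(ex.Message);
        return error;
    }
    finally
    {
        // Remove the temp files whether or not the conversion succeeded.
        if (File.Exists(temp)) File.Delete(temp);
        if (File.Exists(tempOut)) File.Delete(tempOut);
    }
}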
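Because the trigger is anonymous, any client can call the function with a plain HTTP POST whose body is the WEBM audio and get WAV bytes back. A minimal client sketch, assuming the default `api` route prefix of Azure Functions; the app name `my-audio-func`, the function name and the `ConvertWebmToWavAsync` helper are hypothetical placeholders.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Builds the URL of an HTTP-triggered function, assuming the default "api" route prefix.
Uri FunctionUri(string appName, string functionName) =>
    new Uri($"https://{appName}.azurewebsites.net/api/{functionName}");

// Hypothetical client: posts WEBM bytes as the raw request body and returns the WAV bytes.
async Task<byte[]> ConvertWebmToWavAsync(HttpClient client, Uri functionUri, byte[] webm)
{
    using var content = new ByteArrayContent(webm);
    content.Headers.ContentType = new MediaTypeHeaderValue("audio/webm");
    using var response = await client.PostAsync(functionUri, content);
    // Surface the function's 500 response (with the FFmpeg error) as an exception.
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsByteArrayAsync();
}

var uri = FunctionUri("my-audio-func", "ConvertAudioFormatUsingFFMpeg");
Console.WriteLine(uri);
// With a deployed function:
// var wav = await ConvertWebmToWavAsync(new HttpClient(), uri, File.ReadAllBytes("recording.webm"));
```

Power Automate’s HTTP action does the same thing: a POST to this URL with the decoded audio as the body.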
- If you take a close look at the code above, you will see that there is a directory path as noted below:
@"D:\home\site\wwwroot\ConvertAudioFormatUsingFFMpeg\ffmpeg.exe";
- This is the directory where we’re going to place the FFmpeg files, so that the Azure Function can run ffmpeg.exe when converting between audio formats.
- So how do we get files into this file system? Kudu will help us with this.
- The advanced tools for App Service (also known as Kudu) provide access to advanced administrative features of your function app.
- From Kudu, you can view system information, app settings, environment variables, site extensions, HTTP headers and server variables.
- You can also launch Kudu by browsing to the SCM endpoint of your function app, typically https://<your-function-app-name>.scm.azurewebsites.net.
- Navigate back to your Function app and click on Function App settings.
- Once there, open Platform features and select Kudu.
- On the page that opens, select Debug console and then CMD to browse the Kudu directory structure.
- Now let’s add the FFmpeg files to the function app through Kudu so that the Azure Function can use them.
- First, move to “D:\home\site\wwwroot\” and create a new folder named “ConvertAudioFormatUsingFFmpeg”.
- Download an FFmpeg build for Windows (the download page on ffmpeg.org links to Windows builds), unzip it and copy the files to “D:\home\site\wwwroot\ConvertAudioFormatUsingFFmpeg”. Make sure ffmpeg.exe sits directly in that folder, because that is the path the function code uses.
- Place all the files as shown below.
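If you would rather script the upload than drag files into the Kudu console, Kudu also exposes a virtual file system (VFS) REST API rooted at D:\home. The sketch below maps a D:\home path to its VFS URL and builds (but does not send) a PUT that uploads a file. It assumes the `/api/vfs/` route, basic authentication with the app’s deployment credentials and Kudu’s requirement of `If-Match: *` before overwriting a file; `ToVfsUri`, `BuildUpload`, the app name and the credentials are hypothetical.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Maps a path under D:\home (the root Kudu exposes) to its VFS API URL.
Uri ToVfsUri(string appName, string windowsPath)
{
    const string home = @"D:\home\";
    if (!windowsPath.StartsWith(home, StringComparison.OrdinalIgnoreCase))
        throw new ArgumentException(@"Path must be under D:\home.");
    var segments = windowsPath.Substring(home.Length)
        .Split('\\')
        .Select(s => Uri.EscapeDataString(s));
    return new Uri($"https://{appName}.scm.azurewebsites.net/api/vfs/{string.Join("/", segments)}");
}

// Builds (does not send) a PUT that uploads a file with deployment credentials.
HttpRequestMessage BuildUpload(Uri vfsUri, byte[] fileBytes, string user, string password)
{
    var request = new HttpRequestMessage(HttpMethod.Put, vfsUri) { Content = new ByteArrayContent(fileBytes) };
    var token = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}"));
    request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);
    // "If-Match: *" lets Kudu overwrite a file that already exists.
    request.Headers.IfMatch.Add(EntityTagHeaderValue.Any);
    return request;
}

var ffmpegUri = ToVfsUri("my-audio-func", @"D:\home\site\wwwroot\ConvertAudioFormatUsingFFmpeg\ffmpeg.exe");
Console.WriteLine(ffmpegUri);
// https://my-audio-func.scm.azurewebsites.net/api/vfs/site/wwwroot/ConvertAudioFormatUsingFFmpeg/ffmpeg.exe
```

Sending the request with `HttpClient.SendAsync` would place ffmpeg.exe in the same folder the function code points to.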
That’s it. Your Azure Function is now ready and waiting for an audio sample!
Now that our Azure Function is up and running, let’s go ahead and configure a Power Automate flow to call the Azure Function and the Cognitive Services API and get the speech converted to text. We will then enrich the output text with HTML and convert it to PDF using the Muhimbi “Convert HTML to PDF” action in the next post.
Your style is really unique in comparison to other people I have read stuff from. I appreciate you posting when you’ve got the opportunity. Guess I will just bookmark this blog.
Thank you so much Darnell. I appreciate you liking my style. I have pretty interesting blog posts coming up, stay tuned !!!
Thanks for the brief article, but for some reason this did not work for me. The code in the Azure function was failing with the error: Could not find file ‘D:\local\Temp\tmpCD73.tmp.wav’. Any help would be much appreciated.
Hello buddy, I guess you are doing something wrong. Can you add some screenshots and email them to me at ykamdar3@gmail.com? I will take a look and let you know the details.
Experiencing the same issue here
Can you please try the code from this repo – https://github.com/markharrison/azfunction-webm-to-wav
Did you ever get this figured out? I am getting the same error.
Hello buddy, can you try out the code from this repo https://github.com/markharrison/azfunction-webm-to-wav
I finally got this working tweaking some of Mark Harrison’s code to work as an Azure Function. https://github.com/hbouwers/webmtowav
Thanks for your profound explanation. But how do you deploy the C# code to Azure Functions?
Had a couple of issues with the function code – made a few tweaks :
https://github.com/markharrison/azfunction-webm-to-wav
Hello buddy !! Awesome. Seems the issue is intermittent, the same code works for me but for some it is giving problems. Let me redirect other folks to try your stuff.
Thanks Yash for this excellent Tutorial – but I think I missed something: How to deploy the c# code for the Azure Function?
I missed something: How do I deploy the mentioned C# code?
Note: Running the Power App on Windows, the mic control will record in WEBM format, but if you run it on Android (?) or iOS (AAC), it’s a different format.
Hello Craig, thank you so much for your response. You can always modify the code to convert audio from any format to WAV. That’s the whole point: you can run the Power Automate flow once to see which audio format the device produces, and then modify the code to convert from that format to WAV.
Thank you so much for your response on this blog post. I did not know that the format on iOS is AAC.
Keep reading !!!
The download link https://ffmpeg.zeranoe.com/builds/ doesn’t work. Can you please help?