Create an Azure Function to convert Microphone Audio captured in WEBM format to WAV format using FFmpeg

This post is part 2 of a 3-part series on Converting Speech to Text in Power Apps.

In our previous post we created a Canvas app and configured it with formulas to capture audio from the Microphone control in Power Apps.

To reduce the complexity, we have divided this article into three parts:

  1. Design a Canvas App with The Microphone Control to capture Audio.
  2. Create an Azure Function to convert Microphone Audio captured in WEBM format to WAV format using FFmpeg.
  3. Create a Power Automate (Flow) to create an HTML file, using the text obtained from the output of the Speech to Text action.

In this post we will create an Azure Function that uses FFmpeg to convert the audio captured by the Microphone control in Power Apps from WEBM to WAV format.

  • Looking at the heading, you might be wondering what all these odd acronyms (WEBM, WAV, FFmpeg) mean.
  • To answer this, let’s backtrack a bit, so you get the gist of why we’re driving our solution in this particular way.

WEBM and WAV formats:

  • WebM is an audiovisual media file format.
  • Whenever you record audio with the Microphone control in Power Apps on Windows, it gets encoded in WEBM format. Other platforms may produce a different format.
  • If you go ahead and record audio in our Power App and check the ‘AudioJSON’ label, you can see that the recorded audio is actually in WEBM format.
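If you would rather confirm the format from the bytes themselves than from the label, the first few bytes of a file usually give it away. The following is only a rough sketch: the AudioSniffer class and DetectFormat method are hypothetical names made up for this illustration, and the check only looks at well-known file signatures (note that the EBML signature is shared by WebM and Matroska files).

```csharp
public static class AudioSniffer
{
    // Returns "webm", "wav", "ogg", or "unknown" based on the leading bytes.
    public static string DetectFormat(byte[] data)
    {
        if (data == null || data.Length < 4) return "unknown";

        // WebM (and Matroska) files start with the EBML header ID 1A 45 DF A3.
        if (data[0] == 0x1A && data[1] == 0x45 && data[2] == 0xDF && data[3] == 0xA3)
            return "webm";

        // WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
        if (HasAsciiTag(data, 0, "RIFF") && HasAsciiTag(data, 8, "WAVE"))
            return "wav";

        // Ogg streams start with the capture pattern "OggS".
        if (HasAsciiTag(data, 0, "OggS"))
            return "ogg";

        return "unknown";
    }

    // True when the ASCII characters of tag appear in data at the given offset.
    private static bool HasAsciiTag(byte[] data, int offset, string tag)
    {
        if (data.Length < offset + tag.Length) return false;
        for (int i = 0; i < tag.Length; i++)
        {
            if (data[offset + i] != (byte)tag[i]) return false;
        }
        return true;
    }
}
```

Calling AudioSniffer.DetectFormat on the decoded microphone bytes should report "webm" for a recording made as described above.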


  • WAV is an audio file format created by Microsoft and IBM that typically stores raw, uncompressed audio.
  • To consume the Azure Cognitive Speech to Text service, you need to pass audio to the service in either WAV or OGG format.


Why Only WAV or OGG Formats?

We pass in WAV or OGG files because, if you pass a WEBM audio file to the Speech to Text Cognitive Service, it returns an error stating that ‘This audio cannot be converted to text since WEBM is an unsupported format’.

So, to avoid this problem, FFmpeg comes to our rescue!

  • FFmpeg is basically an audio and video converter.
  • We’re going to use FFmpeg to convert the Microphone audio in WEBM format to an audio file in WAV format, so we can pass that file to the Azure Speech to Text Cognitive Service.

Simply put, we’re going to make use of an Azure Function to build a simple API, which will do the work of converting a WEBM file to a WAV file for us. This API will use FFmpeg to do the actual conversion.
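To make the conversion step a little more concrete, here is a minimal sketch of how the FFmpeg command-line arguments could be put together. The FfmpegArgs class is a hypothetical helper, not part of the code used later in this post, which simply passes the input and output paths. The defaults of 16 kHz, mono, 16-bit PCM are an assumption about what the Speech service works best with; the flags themselves (-y, -i, -ac, -ar, -c:a) are standard FFmpeg options.

```csharp
public static class FfmpegArgs
{
    // Builds the argument string for converting an input file to PCM WAV.
    // sampleRate and channels are assumed defaults (16 kHz, mono).
    public static string Build(string inputPath, string outputPath,
                               int sampleRate = 16000, int channels = 1)
    {
        // -y        overwrite the output file if it already exists
        // -i        input file
        // -ac / -ar number of audio channels / sample rate in Hz
        // -c:a      audio codec (pcm_s16le = 16-bit little-endian PCM)
        return $"-y -i \"{inputPath}\" -ac {channels} -ar {sampleRate} -c:a pcm_s16le \"{outputPath}\"";
    }
}
```

For example, FfmpegArgs.Build("in.webm", "out.wav") produces the arguments you would otherwise type after ffmpeg.exe on the command line.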

Setting up the Azure function and FFmpeg to convert Audio with WEBM format to WAV format:

  • Navigate to
  • From the left hand navigation section, select the ‘Function App’ to start creating a new Azure function.


  • On the page that opens next, click on ‘+ New’ and fill in the details to create a new ‘Function App’.
  • Create a separate, dedicated resource group to keep our app clean and organised.
  • Give the Function App a meaningful name.
  • Make sure to select .NET Core as the Runtime stack. You can select any region of your choice.


  • Next, create a storage account and select Windows as the operating system.
  • Select a plan type based on your requirements. Here I’m choosing to go ahead with the ‘Consumption’ plan.


  • Once you’ve made your selections, click on Review + Create, which starts provisioning the Function App for you.
  • Next, we need to start setting up The Cognitive Services. Go back to the home page of The Azure Portal and search for Cognitive Services. Click on the Cognitive Services and under it look for Speech Services as shown.
  • Once you locate it, click on Create.


  • Fill in the details for Name and Location along with the Pricing tier as shown below and click on Create.


  • Once the deployment is successful, you will get a Key and an endpoint. Make sure you copy both and keep them handy, because you’re going to need them later.
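To give an idea of where the Key ends up, here is a hedged sketch of a request to the Speech to Text REST API for short audio. In this series the call is actually made from Power Automate in the next post, so treat this purely as an illustration. The SpeechRequest class is a hypothetical name, and the URL shape and the region parameter are assumptions based on the documented short-audio endpoint; the endpoint shown in your own resource may look different.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class SpeechRequest
{
    // Builds (but does not send) a short-audio recognition request.
    // region, key and language are placeholders you supply yourself.
    public static HttpRequestMessage Build(string region, string key,
                                           byte[] wavBytes, string language = "en-US")
    {
        var uri = new Uri(
            $"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1" +
            $"?language={Uri.EscapeDataString(language)}");

        var request = new HttpRequestMessage(HttpMethod.Post, uri);

        // The subscription key from the portal goes into this header.
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        request.Content = new ByteArrayContent(wavBytes);
        // Added without validation because the codecs parameter contains a slash.
        request.Content.Headers.TryAddWithoutValidation(
            "Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");

        return request;
    }
}
```

The returned message can be passed to HttpClient.SendAsync once you have real WAV bytes and your own key.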


  • Now that we’ve created both the Function App and a Speech Service, it’s time to set up the Azure Function that will call the FFmpeg application to carry out the audio format conversion.

Go back to the Function App and click on New function as highlighted below:


  • When asked for the template, click on HTTP Trigger and select the Authorization level as Anonymous.
  • Once done, click on Create.


  • We don’t have to write the code from scratch (or even write anything, for that matter).
  • John Liu (Microsoft MVP) has simplified things for us, as he has already run FFmpeg inside an Azure Function using C#.
  • Here is the GitHub reference for his brilliant code that we can borrow. I have added exception handling to the code so that you can catch the error if it fails at any stage in your environment.

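#r "Microsoft.WindowsAzure.Storage"

using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Microsoft.WindowsAzure.Storage.Blob;

public static HttpResponseMessage Run(Stream req, TraceWriter log)
{
    // Temporary input/output files plus a scratch directory for this request.
    var temp = Path.GetTempFileName() + ".webm";
    var tempOut = Path.GetTempFileName() + ".wav";
    var tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
    Directory.CreateDirectory(tempPath);

    try
    {
        // Save the posted WEBM audio to disk so that FFmpeg can read it.
        using (var ms = new MemoryStream())
        {
            req.CopyTo(ms);
            File.WriteAllBytes(temp, ms.ToArray());
        }

        var bs = File.ReadAllBytes(temp);
        log.Info($"Renc Length: {bs.Length}");

        var psi = new ProcessStartInfo();
        psi.FileName = @"D:\home\site\wwwroot\ConvertAudioFormatUsingFFmpeg\ffmpeg.exe";
        psi.Arguments = $"-i \"{temp}\" \"{tempOut}\"";
        psi.RedirectStandardOutput = true;
        psi.RedirectStandardError = true;
        psi.UseShellExecute = false;
        log.Info($"Args: {psi.Arguments}");

        // Run FFmpeg and wait for it to finish before reading the output file.
        var process = Process.Start(psi);
        var ffmpegOutput = process.StandardError.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0) log.Info($"FFmpeg failed: {ffmpegOutput}");
    }
    catch (Exception ex)
    {
        // Surface the failure in the function log and to the caller.
        log.Info($"Conversion failed: {ex.Message}");
        return new HttpResponseMessage(HttpStatusCode.InternalServerError);
    }

    var bytes = File.ReadAllBytes(tempOut);
    log.Info($"Renc Length: {bytes.Length}");

    var response = new HttpResponseMessage(HttpStatusCode.OK);
    response.Content = new StreamContent(new MemoryStream(bytes));
    response.Content.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");

    // Clean up the temporary files and the scratch directory.
    File.Delete(temp);
    File.Delete(tempOut);
    Directory.Delete(tempPath, true);

    return response;
}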
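Once the function is deployed, any HTTP client can call it by posting the WEBM bytes as the request body and reading the WAV bytes from the response. Below is a minimal sketch of such a caller. ConvertClient is a hypothetical helper written for this illustration, and the function URL is something you copy from your own Function App; in this series the actual caller will be Power Automate.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class ConvertClient
{
    // Builds the POST request that carries the WEBM audio in its body.
    public static HttpRequestMessage BuildRequest(Uri functionUrl, byte[] webmBytes)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, functionUrl);
        request.Content = new ByteArrayContent(webmBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("audio/webm");
        return request;
    }

    // Sends the audio to the function and returns the converted WAV bytes.
    public static async Task<byte[]> ConvertAsync(HttpClient client, Uri functionUrl, byte[] webmBytes)
    {
        using (var request = BuildRequest(functionUrl, webmBytes))
        using (var response = await client.SendAsync(request))
        {
            // Throws if the function returned an error status such as 500.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

A quick way to try it is to read a recorded .webm file with File.ReadAllBytes, pass it to ConvertAsync, and write the result to a .wav file.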


  • If you take a close look at the code above, you will see that it references the directory path D:\home\site\wwwroot\ConvertAudioFormatUsingFFmpeg\.


  • This is nothing but the directory where we’re going to download and place FFmpeg files, so that the Azure Function can make use of these files while running a conversion between the audio formats.
  • So how does this file system become available? Kudu will help us with this.
  • The advanced tools for App Service (also known as Kudu) provide access to advanced administrative features of your function app.
  • From Kudu you can manage system information, app settings, environment variables, site extensions, HTTP headers, and server variables.
  • You can also launch Kudu by browsing to the SCM endpoint for your function app, which typically has the form https://<your-function-app>.scm.azurewebsites.net.
  • Navigate back to your Function App and click on Function App settings.


  • Once there, click on the Platform features and select Kudu.


  • On the page that opens up, select the Debug console and then CMD to open up the Kudu directory structure.


  • Now let’s go ahead and add the FFmpeg files through Kudu so that the Azure Function can make use of them.
  • First move to the location “D:\home\site\wwwroot\” and create a new folder named “ConvertAudioFormatUsingFFmpeg”.
  • Download the FFmpeg executable file for Windows from here, unzip it, and move the files to the location “D:\home\site\wwwroot\ConvertAudioFormatUsingFFmpeg”.
  • Place all the files as shown below.


That’s it. Your Azure Function is now ready and waiting for an audio sample!
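As an optional hardening step, you could check that ffmpeg.exe is really where the function expects it before starting a conversion, which makes a missing upload much easier to diagnose. The sketch below is only an illustration: FfmpegLocator is a hypothetical helper, and reading the HOME environment variable (with D:\home as a fallback) is an assumption about how App Service on Windows exposes the home directory.

```csharp
using System;
using System.IO;

public static class FfmpegLocator
{
    // Builds the expected ffmpeg.exe path under the given home directory.
    public static string GetPath(string homeDirectory)
    {
        return Path.Combine(homeDirectory, "site", "wwwroot",
                            "ConvertAudioFormatUsingFFmpeg", "ffmpeg.exe");
    }

    // Assumption: HOME points at the app's home directory; otherwise use D:\home.
    public static string GetPathFromEnvironment()
    {
        var home = Environment.GetEnvironmentVariable("HOME") ?? @"D:\home";
        return GetPath(home);
    }

    // Throws a descriptive error if the executable has not been uploaded.
    public static void EnsureExists(string ffmpegPath)
    {
        if (!File.Exists(ffmpegPath))
        {
            throw new FileNotFoundException(
                "ffmpeg.exe was not found; check the Kudu upload step.", ffmpegPath);
        }
    }
}
```

Calling FfmpegLocator.EnsureExists at the start of the function turns a confusing missing output file into a clear message about the missing executable.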

Now that we have our Azure Function up and waiting for an audio sample, let’s go ahead and configure a Power Automate (Flow) to call the Azure Function and the Cognitive Services API and get the speech converted to text. In the next post, we will then enrich the output text with HTML and convert it to PDF using the Muhimbi “Convert HTML to PDF” action.



20 thoughts on “Create an Azure Function to convert Microphone Audio captured in WEBM format to WAV format using FFmpeg”

  1. Thanks for the brief article but for some reason this did not work for me . Code in azure function was failing with error Could not find file ‘D:\local\Temp\tmpCD73.tmp.wav’. Any help would be much appreciated.


    1. Hello buddy !! Awesome. Seems the issue is intermittent, the same code works for me but for some it is giving problems. Let me redirect other folks to try your stuff.


  2. Thanks Yash for this excellent Tutorial – but I think I missed something: How to deploy the c# code for the Azure Function?


  3. Note: Running the Power App on Windows, the mic control will record in WEBM format, but if you run it on Android (?) or iOS (AAC), it’s a different format.


    1. Hello Craig,thank you so much for your response. You can always modify the code to convert Audio from one format to Wav format. That’s the whole point, you can run the Power Automate once and that tells you which format audio the device is giving out and then just modify the code to convert it from that format to WAV.

      Thank you so much for your response on this blog post. I did not know that for IOS the format is AAC.

      Keep reading !!!

