Convert Speech to Text in Power Apps

This post is part 1 of a 3-part series on Converting Speech to Text in Power Apps.

In this series, we will convert speech recorded with the Power Apps Microphone control to text using Azure Cognitive Services.

This is an advanced topic with a practical business use: it lets a Power User consume the Speech API in Azure Cognitive Services to convert speech to text.

To reduce the complexity, we will divide this series into three parts:

  1. Design a Canvas App with the Microphone control to capture audio.
  2. Create an Azure Function to convert the audio captured in Power Apps from WebM to WAV format using FFmpeg.
  3. Create a Power Automate (Flow) that generates an HTML file from the text returned by the Speech to Text action.
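
Part 2 covers the FFmpeg conversion in detail. As a rough preview, the Python sketch below only builds the FFmpeg command line that such a conversion might run. It assumes the downstream Speech to Text step wants mono, 16-bit PCM WAV at 16 kHz, a common input format for Azure speech-to-text; the helper name build_ffmpeg_args and the file names are hypothetical, and the Azure Function in part 2 may use different settings.

```python
import shlex
from typing import List


def build_ffmpeg_args(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg argument list that converts a WebM recording to WAV.

    Hypothetical helper: mono, 16-bit PCM and the default 16 kHz rate are an
    assumption about what the downstream Speech to Text step expects.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", src,                # input: the WebM file captured by the Microphone control
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target sample rate
        "-c:a", "pcm_s16le",      # encode as signed 16-bit little-endian PCM
        dst,                      # output: FFmpeg picks the WAV container from the .wav extension
    ]


if __name__ == "__main__":
    args = build_ffmpeg_args("recording.webm", "recording.wav")
    # Print the command as it would be typed in a shell. To actually run it,
    # pass `args` to subprocess.run(args, check=True) with FFmpeg on PATH.
    print(shlex.join(args))
```

Returning an argument list rather than a single string keeps file names with spaces safe when the list is handed to subprocess.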

Prerequisites-

Before you begin, please make sure the following prerequisites are in place:

Designing a Canvas App with the Microphone Control to Capture Audio

Step 1- Creating the basic structure of the Canvas App

  • Go to powerapps.com, sign in with your work or school account, click Apps in the left navigation bar, then click ‘+Create’ and select ‘Canvas app from blank’.
  • Specify the name and format of the app, and click ‘Create’.
  • Add a Microphone control to the screen.

  • Now add a Button to the Canvas.

  • Next, rename the Button to ‘Submit’.

  • Add two Labels to the Canvas; we will use them as containers to hold values.

  • Rename Label 1 to ‘AudioJSON’ and Label 2 to ‘TextOutput’.

  • The app now contains the Microphone control, the ‘Submit’ button, and the two labels.

  • Now that we have the outer body ready, let’s go ahead and configure our components with formulas.

Step 2- Configuring the components with formulas

  • We will first configure a collection called “AudioCollection” to hold the recorded audio sample.
  • Then, we’ll create a variable ‘JSONValue’ and set it to the JSON representation of the audio, with the binary data included.
  • Select the ‘OnStop’ property of the Microphone control and add the following formula to it:

ClearCollect(AudioCollection,Microphone1.Audio);Set(JSONValue,JSON(AudioCollection,JSONFormat.IncludeBinaryData));
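
To see what ends up in JSONValue, here is a small Python sketch that decodes it outside Power Apps. It assumes that ClearCollect stores the single audio value in a one-column collection whose column is named Value, and that JSONFormat.IncludeBinaryData encodes the in-memory recording as a data URI of the form data:<mime type>;base64,<payload>. The function name parse_audio_collection is hypothetical.

```python
import base64
import json
from typing import List, Tuple


def parse_audio_collection(json_text: str) -> List[Tuple[str, bytes]]:
    """Decode the text Power Apps produces for JSONValue.

    Assumes the shape [{"Value": "data:<mime>;base64,<payload>"}, ...], i.e. a
    single-column collection whose column is named Value.
    """
    samples = []
    for record in json.loads(json_text):
        data_uri = record["Value"]
        # Split "data:audio/webm;base64" from the base64 payload.
        header, payload = data_uri.split(",", 1)
        if not header.startswith("data:") or not header.endswith(";base64"):
            raise ValueError(f"unexpected data URI header: {header!r}")
        mime_type = header[len("data:"):-len(";base64")]
        samples.append((mime_type, base64.b64decode(payload)))
    return samples


if __name__ == "__main__":
    # A tiny stand-in for a real recording: the 4-byte WebM/EBML magic number.
    fake = base64.b64encode(b"\x1a\x45\xdf\xa3").decode("ascii")
    label_text = json.dumps([{"Value": f"data:audio/webm;base64,{fake}"}])
    for mime, audio in parse_audio_collection(label_text):
        print(mime, len(audio), "bytes")  # audio/webm 4 bytes
```

With a real recording the payload is much larger, and the MIME type can vary with the device or browser that captured the audio.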

  • Select the ‘Text’ property of the ‘AudioJSON’ label and set it to Text(JSONValue) to display the value set earlier.

  • Now it’s time to add a Power Automate (Flow) to our Power Apps.
  • The Action menu has an option to add ‘Power Automate’ to your existing app. To do this, click the ‘Power Automate’ option.

  • Then, click ‘Create a new flow’.

  • Name the Power Automate (Flow) “ConvertSpeechToText”, the name used in the ‘Submit’ button formula later, and add ‘PowerApps’ as the trigger.
  • Once that is done, add a ‘Compose’ action and select ‘Ask in PowerApps’ from the Dynamic content pane.
  • Click Save.

Note – We completed these steps in the Power Automate (Flow) only so that the flow can be added to our Power Apps. Later in this series, we will add more actions to the flow to carry out the Speech to Text conversion.

  • Finally, select the ‘OnSelect’ property of the ‘Submit’ button and add the following formula:

Set(DesiredOutput, 'ConvertSpeechToText'.Run(JSON(First(AudioCollection), JSONFormat.IncludeBinaryData)));
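
The .Run() call sends the flow a single string: the JSON of one record rather than the whole collection. The Python sketch below, again assuming the {"Value": "data:...;base64,..."} shape, shows how a later step (such as the Azure Function in part 2) could turn that string back into raw bytes and sanity-check that they are WebM, whose files begin with the EBML magic number 1A 45 DF A3. decode_flow_input and looks_like_webm are hypothetical names, not part of Power Apps or Power Automate.

```python
import base64
import json

# WebM files are Matroska/EBML containers, which start with these four bytes.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"


def decode_flow_input(flow_input: str) -> bytes:
    """Turn the string passed to 'ConvertSpeechToText'.Run(...) back into audio bytes.

    Assumes the input is the JSON of a single record, {"Value": "data:...;base64,..."},
    as produced by JSON(First(AudioCollection), JSONFormat.IncludeBinaryData).
    """
    record = json.loads(flow_input)
    # Drop the "data:<mime>;base64" prefix and keep only the payload.
    _, payload = record["Value"].split(",", 1)
    return base64.b64decode(payload)


def looks_like_webm(audio: bytes) -> bool:
    """Cheap format check: compare the first four bytes with the EBML magic number."""
    return audio[:4] == EBML_MAGIC


if __name__ == "__main__":
    fake_recording = EBML_MAGIC + b"\x00" * 8  # stand-in, not a playable file
    flow_input = json.dumps(
        {"Value": "data:audio/webm;base64," + base64.b64encode(fake_recording).decode("ascii")}
    )
    audio = decode_flow_input(flow_input)
    print(len(audio), looks_like_webm(audio))  # 12 True
```

A magic-number check is only a quick guard against passing the wrong kind of file to FFmpeg; it does not validate the whole recording.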

  • This formula makes the ‘Submit’ button trigger the Power Automate (Flow) named ‘ConvertSpeechToText’, created earlier, through its .Run() method, passing the JSON of the first audio sample in AudioCollection. Whatever the flow returns is stored in the variable DesiredOutput.

Now that our Power App is ready, in the next post we will create and configure an Azure Function that converts the audio captured in Power Apps from WebM to WAV format using FFmpeg.


10 thoughts on “Convert Speech to Text in Power Apps”

  1. Went through your blogs and I can surely say each and every concept has been explained in a very simple manner. If we go through your blogs and try to implement that content with a real time example it surely connects all the dots which makes it very easy for any beginner to understand the concept in a better way.

    Awaiting for the further blogs. Keep up the good work❤

  2. Great article!
    I am a Powerapps beginner and following this article to get myself familiar. There were a few places where I am seeing some errors:
    1. As per the instructions, I created a new Power Automate workflow called “ConvertSpeechToText”. This flow has PowerApps as a trigger and then Compose action with one argument (ask in powerapps). After this, I edited the OnSelect event of the Submit button copying the following code: Set(DesiredOutput,’ConvertSpeechToText’.Run(JSON(First(AudioCollection),JSONFormat.IncludeBinaryData)));
    However, after this, shows an error in the above formula. I have verified/ reentered all the syntax, including different function names. The issue is with incompatible context variable types in this formula. Is there something I am missing here?
    2. In part 3 of this series, can you please elaborate on the following para: “In the ‘ParseJSON’ action, click on ‘use sample payload to generate schema’, it should open a Modal dialog box, where you then paste the content from the label AudioJSON (refer to part 1 of this blog series)and click on Done” . … The newbie question is on what contents from AudioJSON label is it instructing to paste here. I see the “Generate from sample button” , but which sample needs to be copied from AudioJSON? All I have in that label is Text(JSONValue) (I did not run the code). OR are you suggesting to copy the json as in the screenshot provided after that para?
    thanks

    • Hello Buddy, a good way to debug would be to enter only the following formula, 'ConvertSpeechToText'.Run(JSON(First(AudioCollection), JSONFormat.IncludeBinaryData)), and check whether the error disappears. For the next question, yes, just paste the JSON as in the screenshot provided after that paragraph. If you still have issues, please let me know and we will resolve this in a screen-share session.

  3. Great article! Is it possible to show an example of using Text to Speech. i.e. Using Azure Cognitive Services to read the content of a text field or label.
