---
title: include file
description: Java Recognize action quickstart
services: azure-communication-services
author: Kunaal
ms.service: azure-communication-services
ms.subservice: call-automation
ms.date: 11/20/2023
ms.topic: include
ms.author: kpunjabi
---
## Prerequisites
- Azure account with an active subscription. For details, see [Create an account for free](https://azure.microsoft.com/pricing/purchase-options/azure-account?cid=msft_learn).
- Azure Communication Services resource. See [Create an Azure Communication Services resource](../../../quickstarts/create-communication-resource.md?tabs=windows&pivots=platform-azp).
- Create a new web service application using the [Call Automation SDK](../../../quickstarts/call-automation/callflows-for-customer-interactions.md).
- [Java Development Kit](/java/azure/jdk/?preserve-view=true&view=azure-java-stable) version 8 or above.
- [Apache Maven](https://maven.apache.org/download.cgi).
### For AI features
- Create and connect [Azure AI services to your Azure Communication Services resource](../../../concepts/call-automation/azure-communication-services-azure-cognitive-services-integration.md).
- Create a [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains) for your Azure AI services resource.
## Technical specifications
The following parameters are available to customize the Recognize function:
| Parameter | Type | Default (if not specified) | Description | Required or Optional |
| ------- | --- | ------------------------ | --------- | ------------------ |
| `Prompt` <br/><br/> *(For details, see [Customize voice prompts to users with Play action](../play-ai-action.md))* | FileSource, TextSource | Not set | The message to play before recognizing input. | Optional |
| `InterToneTimeout` | TimeSpan | 2 seconds <br/><br/>**Min:** 1 second <br/>**Max:** 60 seconds | Limit in seconds that Azure Communication Services waits for the caller to press another digit (inter-digit timeout). | Optional |
| `InitialSegmentationSilenceTimeoutInSeconds` | Integer | 0.5 second | How long the Recognize action waits for input before considering it a timeout. See [How to recognize speech](/azure/ai-services/speech-service/how-to-recognize-speech). | Optional |
| `RecognizeInputsType` | Enum | dtmf | Type of input that is recognized. Options are `dtmf`, `choices`, `speech`, and `speechordtmf`. | Required |
| `InitialSilenceTimeout` | TimeSpan | 5 seconds<br/><br/>**Min:** 0 seconds <br/>**Max:** 300 seconds (DTMF) <br/>**Max:** 20 seconds (Choices) <br/>**Max:** 20 seconds (Speech) | How much nonspeech audio is allowed before a phrase, after which the recognition attempt ends in a "no match" result. See [How to recognize speech](/azure/ai-services/speech-service/how-to-recognize-speech). | Optional |
| `MaxTonesToCollect` | Integer | No default<br/><br/>**Min:** 1 | Number of digits a developer expects as input from the participant. | Required |
| `StopTones` | IEnumeration\<DtmfTone\> | Not set | The digit participants can press to escape out of a batch DTMF event. | Optional |
| `InterruptPrompt` | Bool | True | Whether the participant can interrupt the playMessage by pressing a digit. | Optional |
| `InterruptCallMediaOperation` | Bool | True | If this flag is set, it interrupts the current call media operation. For example, if any audio is being played, it interrupts that operation and initiates recognize. | Optional |
| `OperationContext` | String | Not set | String that developers can pass with the action, useful for storing context about the events they receive. | Optional |
| `Phrases` | String | Not set | List of phrases that associate to the label. Hearing any of these phrases results in a successful recognition. | Required |
| `Tone` | String | Not set | The tone to recognize if user decides to press a number instead of using speech. | Optional |
| `Label` | String | Not set | The key value for recognition. | Required |
| `Language` | String | En-us | The language that is used for recognizing speech. | Optional |
| `EndSilenceTimeout` | TimeSpan | 0.5 second | The final pause of the speaker used to detect the final result that gets generated as speech. | Optional |
>[!NOTE]
>In situations where both DTMF and speech are in the `recognizeInputsType`, the recognize action acts on the first input type received. For example, if the user presses a keypad number first then the recognize action considers it a DTMF event and continues listening for DTMF tones. If the user speaks first then the recognize action considers it a speech recognition event and listens for voice input.
## Create a new Java application
In your terminal or command window, navigate to the directory where you would like to create your Java application. Run the `mvn` command to generate the Java project from the maven-archetype-quickstart template.
```console
mvn archetype:generate -DgroupId=com.communication.quickstart -DartifactId=communication-quickstart -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false
```
The `mvn` command creates a directory with the same name as the `artifactId` argument. The `src/main/java` directory contains the project source code, the `src/test/java` directory contains the test source, and the `pom.xml` file is the project's Project Object Model (POM).
Update your application's POM file to use Java 8 or higher.
```xml
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```
## Add package references
In your POM file, add the following reference for the project:
**azure-communication-callautomation**
``` xml
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-communication-callautomation</artifactId>
    <version>1.0.0</version>
</dependency>
```
## Establish a call
By this point you should be familiar with starting calls. For more information about making a call, see [Quickstart: Make an outbound call](../../../quickstarts/call-automation/quickstart-make-an-outbound-call.md). You can also use the code snippet provided here to understand how to answer a call.
``` java
CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint("https://sample-cognitive-service-resource.cognitiveservices.azure.com/");
AnswerCallOptions answerCallOptions = new AnswerCallOptions("<Incoming call context>", "<https://sample-callback-uri>").setCallIntelligenceOptions(callIntelligenceOptions);
Response<AnswerCallResult> answerCallResult = callAutomationClient
    .answerCallWithResponse(answerCallOptions)
    .block();
```
## Call the recognize action
When your application answers the call, you can provide information about recognizing participant input and playing a prompt.
### DTMF
``` java
var maxTonesToCollect = 3;
String textToPlay = "Welcome to Contoso, please enter 3 DTMF.";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeDtmfOptions(targetParticipant, maxTonesToCollect)
    .setInitialSilenceTimeout(Duration.ofSeconds(30))
    .setPlayPrompt(playSource)
    .setInterToneTimeout(Duration.ofSeconds(5))
    .setInterruptPrompt(true)
    .setStopTones(Arrays.asList(DtmfTone.POUND));
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
log.info("Start recognizing result: " + recognizeResponse.getStatusCode());
```
For speech-to-text flows, the Call Automation Recognize action also supports the use of [custom speech models](/azure/machine-learning/tutorial-train-model). Features like custom speech models can be useful when you're building an application that needs to listen for complex words that the default speech-to-text models might not understand. One example is when you're building an application for the telemedicine industry and your virtual agent needs to be able to recognize medical terms. You can learn more in [Create a custom speech project](/azure/ai-services/speech-service/speech-services-quotas-and-limits).
### Speech-to-Text Choices
``` java
var choices = Arrays.asList(
    new RecognitionChoice()
        .setLabel("Confirm")
        .setPhrases(Arrays.asList("Confirm", "First", "One"))
        .setTone(DtmfTone.ONE),
    new RecognitionChoice()
        .setLabel("Cancel")
        .setPhrases(Arrays.asList("Cancel", "Second", "Two"))
        .setTone(DtmfTone.TWO)
);
String textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeChoiceOptions(targetParticipant, choices)
    .setInterruptPrompt(true)
    .setInitialSilenceTimeout(Duration.ofSeconds(30))
    .setPlayPrompt(playSource)
    .setSpeechLanguages("en-US", "es-ES", "hi-IN")
    .setSentimentAnalysisEnabled(true)
    .setOperationContext("AppointmentReminderMenu")
    // Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
```
### Speech-to-Text
``` java
String textToPlay = "Hi, how can I help you today?";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(1000))
    .setPlayPrompt(playSource)
    .setOperationContext("OpenQuestionSpeech")
    // Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
```
### Speech-to-Text or DTMF
``` java
var maxTonesToCollect = 1;
String textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(targetParticipant, maxTonesToCollect, Duration.ofMillis(1000))
    .setPlayPrompt(playSource)
    .setInitialSilenceTimeout(Duration.ofSeconds(30))
    .setInterruptPrompt(true)
    .setOperationContext("OpenQuestionSpeechOrDtmf")
    // Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
```
> [!Note]
> If parameters aren't set, the defaults are applied where possible.
### Real-time language identification (Preview)
With the addition of real-time language identification, developers can automatically detect spoken languages to enable natural, human-like communications and eliminate manual language selection by end users.
``` java
String textToPlay = "Hi, how can I help you today?";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOptions(participant, Duration.ofSeconds(15))
    .setPlayPrompt(playSource)
    .setInterruptPrompt(false)
    .setInitialSilenceTimeout(Duration.ofSeconds(15))
    .setSentimentAnalysisEnabled(true)
    .setSpeechLanguages("en-US", "es-ES", "hi-IN")
    .setOperationContext("OpenQuestionSpeech")
    // Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
```
>[!Note]
> **Language support limits**
>
> When using the `Recognize` API with Speech as the input type:
> - You can specify **up to 10 languages** using `setSpeechLanguages(...)`.
> - Be aware that using more languages may **increase the time** it takes to receive the `RecognizeCompleted` event due to additional processing.
>
> When using the `Recognize` API with **choices**:
> - Only **up to 4 languages** are supported.
> - Specifying more than 4 languages in choices mode may result in errors or degraded performance.
### Sentiment Analysis (Preview)
The Recognize API supports sentiment analysis when using speech input. Track the emotional tone of conversations in real time to support customer and agent interactions, and enable supervisors to intervene when necessary. It can also be useful for routing, personalization or analytics.
``` java
String textToPlay = "Hi, how can I help you today?";
var playSource = new TextSource()
    .setText(textToPlay)
    .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOptions(participant, Duration.ofSeconds(15))
    .setPlayPrompt(playSource)
    .setInterruptPrompt(false)
    .setInitialSilenceTimeout(Duration.ofSeconds(15))
    .setSentimentAnalysisEnabled(true)
    .setSpeechLanguages("en-US", "es-ES", "hi-IN")
    .setOperationContext("SpeechContext");
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .startRecognizingWithResponse(recognizeOptions)
    .block();
```
## Receiving recognize event updates
Developers can subscribe to `RecognizeCompleted` and `RecognizeFailed` events on the registered webhook callback. Use this callback with business logic in your application to determine next steps when one of the events occurs.
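The deserialization examples that follow assume you already have an `acsEvent` object parsed from the callback payload. The following minimal sketch shows one way to obtain it, assuming a Spring Boot REST controller and the SDK's `CallAutomationEventParser`; the `CallbackController` class and the `/api/callbacks` route are illustrative and should match the callback URI you registered when answering the call.
``` java
// A minimal sketch, assuming a Spring Boot controller; the route below is illustrative.
import java.util.List;
import com.azure.communication.callautomation.CallAutomationEventParser;
import com.azure.communication.callautomation.models.events.CallAutomationEventBase;
import com.azure.communication.callautomation.models.events.RecognizeCompleted;
import com.azure.communication.callautomation.models.events.RecognizeFailed;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CallbackController {
    @PostMapping("/api/callbacks")
    public ResponseEntity<Void> handleCallbacks(@RequestBody String requestBody) {
        // Parse the raw callback payload into strongly typed Call Automation events.
        List<CallAutomationEventBase> events = CallAutomationEventParser.parseEvents(requestBody);
        for (CallAutomationEventBase acsEvent : events) {
            if (acsEvent instanceof RecognizeCompleted || acsEvent instanceof RecognizeFailed) {
                // Apply your business logic here, as shown in the deserialization examples below.
            }
        }
        return ResponseEntity.ok().build();
    }
}
```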
### Example of how you can deserialize the *RecognizeCompleted* event:
``` java
if (acsEvent instanceof RecognizeCompleted) {
    RecognizeCompleted event = (RecognizeCompleted) acsEvent;
    RecognizeResult recognizeResult = event.getRecognizeResult().get();
    if (recognizeResult instanceof DtmfResult) {
        // Take action on collected tones
        DtmfResult dtmfResult = (DtmfResult) recognizeResult;
        List<DtmfTone> tones = dtmfResult.getTones();
        log.info("Recognition completed, tones=" + tones + ", context=" + event.getOperationContext());
    } else if (recognizeResult instanceof ChoiceResult) {
        ChoiceResult collectChoiceResult = (ChoiceResult) recognizeResult;
        String labelDetected = collectChoiceResult.getLabel();
        String phraseDetected = collectChoiceResult.getRecognizedPhrase();
        String languageIdentified = collectChoiceResult.getLanguageIdentified();
        log.info("Recognition completed, labelDetected=" + labelDetected + ", phraseDetected=" + phraseDetected + ", context=" + event.getOperationContext());
        log.info("Language Identified: " + languageIdentified);
        if (collectChoiceResult.getSentimentAnalysisResult() != null) {
            log.info("Sentiment: " + collectChoiceResult.getSentimentAnalysisResult().getSentiment());
        }
    } else if (recognizeResult instanceof SpeechResult) {
        SpeechResult speechResult = (SpeechResult) recognizeResult;
        String text = speechResult.getSpeech();
        String languageIdentified = speechResult.getLanguageIdentified();
        log.info("Recognition completed, text=" + text + ", context=" + event.getOperationContext());
        log.info("Language Identified: " + languageIdentified);
        if (speechResult.getSentimentAnalysisResult() != null) {
            log.info("Sentiment: " + speechResult.getSentimentAnalysisResult().getSentiment());
        }
    } else {
        log.info("Recognition completed, result=" + recognizeResult + ", context=" + event.getOperationContext());
    }
}
```
### Example of how you can deserialize the *RecognizeFailed* event:
``` java
if (acsEvent instanceof RecognizeFailed) {
    RecognizeFailed event = (RecognizeFailed) acsEvent;
    if (ReasonCode.Recognize.INITIAL_SILENCE_TIMEOUT.equals(event.getReasonCode())) {
        // Take action for time out
        log.info("Recognition failed: initial silence time out");
    } else if (ReasonCode.Recognize.SPEECH_OPTION_NOT_MATCHED.equals(event.getReasonCode())) {
        // Take action for option not matched
        log.info("Recognition failed: speech option not matched");
    } else if (ReasonCode.Recognize.DMTF_OPTION_MATCHED.equals(event.getReasonCode())) {
        // Take action for incorrect tone
        log.info("Recognition failed: incorrect tone detected");
    } else {
        log.info("Recognition failed, result=" + event.getResultInformation().getMessage() + ", context=" + event.getOperationContext());
    }
}
```
### Example of how you can deserialize the *RecognizeCanceled* event:
``` java
if (acsEvent instanceof RecognizeCanceled) {
    RecognizeCanceled event = (RecognizeCanceled) acsEvent;
    log.info("Recognition canceled, context=" + event.getOperationContext());
}
```
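A recognize operation ends with `RecognizeCanceled` when it's stopped before completing, for example when your application cancels in-flight media operations. As a minimal sketch, assuming the async call media client's `cancelAllMediaOperations` method, a cancellation could be triggered like this:
``` java
// Cancel any in-progress media operations on the call, including an active recognize.
// The recognize operation then completes with a RecognizeCanceled event on your callback.
callAutomationClient.getCallConnectionAsync(callConnectionId)
    .getCallMediaAsync()
    .cancelAllMediaOperations()
    .block();
```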