Watson Speech To Text Application [Unique Features & Pricing]

Watson Speech to Text is IBM's natural language processing system for transcribing audio. It is not a consumer end-user application; it is an enterprise-level system designed to be used through APIs and code.

Businesses can use it to convert hours of audio into text very quickly. This guide covers its features and the benefits it can bring to you.

Also to note: Watson Text to Speech is a separate application that converts written words into realistic speech, which you can learn more about here

Plan and Pricing

The free Lite plan includes 500 minutes of audio per month. If you want to process more than that, the Plus plan offers rates as low as $0.01 per minute.

The Premium plan requires contacting sales and gives large companies more capacity and data protection.
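
As a rough illustration of the pricing above, the sketch below picks a plan and estimates a monthly bill. It assumes the Plus plan bills every minute at a flat $0.01 with no free allowance, which the plan description does not confirm; the function and constant names are hypothetical.

```python
LITE_FREE_MINUTES = 500       # monthly allowance of the free Lite plan
PLUS_RATE_PER_MINUTE = 0.01   # "as low as" rate, treated as flat here (assumption)


def estimate_monthly_cost(minutes):
    """Return (plan, estimated cost in USD) for a month's audio volume."""
    if minutes <= LITE_FREE_MINUTES:
        # The whole month fits inside the free allowance.
        return ("Lite", 0.0)
    # Assumption: Plus charges for every minute, with no free tier carried over.
    return ("Plus", round(minutes * PLUS_RATE_PER_MINUTE, 2))


if __name__ == "__main__":
    for minutes in (300, 6000):
        plan, cost = estimate_monthly_cost(minutes)
        print(f"{minutes} minutes -> {plan} plan, about ${cost:.2f}")
```

Actual invoices depend on IBM's current rate tiers, so treat the result as an order-of-magnitude estimate.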




Plans compared: Lite, Plus, Premium and Deploy Anywhere

Pricing per month: Premium and Deploy Anywhere require contacting sales

The plans are compared on:

  • Minutes per month
  • Concurrent transcriptions
  • Pre-trained speech models
  • Speech customization and training
  • Noise detection and speaker diarization
  • Numeric redaction, smart formatting, word spotting and filtering
  • Data isolation, end-to-end encryption and HIPAA readiness
  • Running on any cloud, including IBM, Amazon, Google, Microsoft or on-premises


Main Features - What It Can Do For You

Among audio-to-text transcription software, IBM Watson Speech to Text is an enterprise-level system that offers 5 major benefits to businesses beyond accurate audio transcription and speech recognition.

  • Integration with interactive voice response (IVR) call systems through IBM Voice Agent, for example to answer common call queries
  • Call analytics: mining conversation logs to identify call patterns, complaints, sentiment and more
  • AI-powered agent assist that transcribes audio and searches relevant data within seconds, for example for ID verification
  • Cloud computing: everything is done in the cloud
  • Data protection for large organizations

Business Applications That Can Benefit From Watson Speech To Text

The IBM Watson Speech to Text platform suits organizations looking to integrate speech recognition technology into their business. Typical applications include:

  • Call centers of various sizes using IVR systems
  • Technical support systems
  • Billing
  • CRM
  • Machine learning to improve customer interactions
  • Liability detection
  • Predicting industry disruption

Audio Transcription 

Convert audio to text - IBM Watson Speech to Text automatically picks up a speaker's voice and accurately transcribes what is being said into text.

Noise Reduction - The system automatically filters out background noise to isolate high-fidelity audio for transcription and to minimize errors.

Multi-Speaker detection - The system can identify up to 6 different speakers in the same recording and transcribe them all. This feature is called speaker diarization.

Filter Words and Content - Professionals can use the keyword spotting feature to detect words and phrases and decide whether to keep or remove them.

Improved Accuracy - The software recognizes a broad general vocabulary. In addition, Watson supports grammars for all the languages it recognizes.
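
To make speaker diarization more concrete, here is a small sketch that groups transcribed words into speaker turns. It assumes a response shaped like the service's JSON output when speaker labels are requested (word timestamps plus a speaker_labels list). The sample data is made up for illustration, and words_by_speaker is a hypothetical helper, not part of any IBM SDK.

```python
def words_by_speaker(response):
    """Group transcribed words into consecutive speaker turns."""
    # Map each word's start time to the speaker assigned to it.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }
    turns = []
    for result in response.get("results", []):
        timestamps = result["alternatives"][0].get("timestamps", [])
        for word, start, _end in timestamps:
            speaker = speaker_at.get(start)
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)        # same speaker keeps talking
            else:
                turns.append((speaker, [word]))  # a new turn begins
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Illustrative response only; real output carries more fields.
sample = {
    "results": [{
        "alternatives": [{
            "transcript": "hello there hi ",
            "timestamps": [["hello", 0.0, 0.4], ["there", 0.4, 0.8], ["hi", 1.2, 1.5]],
        }],
        "final": True,
    }],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0, "confidence": 0.6},
        {"from": 0.4, "to": 0.8, "speaker": 0, "confidence": 0.6},
        {"from": 1.2, "to": 1.5, "speaker": 1, "confidence": 0.5},
    ],
}

if __name__ == "__main__":
    for speaker, text in words_by_speaker(sample):
        print(f"Speaker {speaker}: {text}")
```

Matching words to labels by start time keeps the sketch short; a production version would likely tolerate small timing differences between the two lists.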

Language Support

  • Arabic (Modern Standard)
  • Chinese (Mandarin)
  • Czech
  • Dutch (Belgian and Netherlands)
  • English (Australian, Indian, United Kingdom, and United States)
  • French (Canadian and France)
  • German
  • Hindi (Indian)
  • Italian
  • Japanese
  • Korean
  • Portuguese (Brazilian)
  • Spanish (Castilian and Latin American)

API Support

IBM's Watson Speech to Text service gives developers and businesses with an API key the ability to transcribe spoken audio in different languages.

The API can be used in mobile apps, web apps and Python projects.
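
A minimal sketch of what such an API call might look like from Python, using only the standard library. It builds, but does not send, a request to the service's /v1/recognize endpoint with keyword spotting and speaker labels switched on. The service URL and API key below are placeholders, the 0.5 keyword threshold is an arbitrary choice, and build_recognize_request is a hypothetical helper rather than part of an IBM SDK.

```python
import base64
import urllib.parse
import urllib.request


def build_recognize_request(service_url, api_key, audio_bytes,
                            content_type="audio/mp3",
                            keywords=None, speaker_labels=False):
    """Prepare a POST request for the Speech to Text recognize endpoint."""
    params = {}
    if keywords:
        # Keyword spotting needs both the keyword list and a threshold.
        params["keywords"] = ",".join(keywords)
        params["keywords_threshold"] = "0.5"  # arbitrary value for the sketch
    if speaker_labels:
        params["speaker_labels"] = "true"     # turns on speaker diarization

    url = service_url.rstrip("/") + "/v1/recognize"
    if params:
        url += "?" + urllib.parse.urlencode(params)

    # IAM API keys are sent as HTTP Basic auth with the user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    headers = {"Content-Type": content_type, "Authorization": f"Basic {token}"}
    return urllib.request.Request(url, data=audio_bytes, method="POST", headers=headers)


# Placeholder values; substitute the URL and key from your own service instance.
req = build_recognize_request(
    "https://example.invalid/instances/YOUR_INSTANCE_ID",
    "YOUR_API_KEY",
    b"fake audio bytes",
    keywords=["billing", "refund"],
    speaker_labels=True,
)

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any audio leaves your machine.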

Check out the Watson API SDK here

Build A Speech To Text Service Using The Watson STT API

This code imports the Watson PHP SDK and uses it to create a Speech to Text service client. Then it sends an audio file to the service for transcription and prints the transcript if the request was successful.

Copy the following PHP code:


<?php
// load Composer's autoloader, which makes the Watson PHP SDK classes available
require_once 'vendor/autoload.php';

use WatsonSDK\Services\SpeechToText;
use WatsonSDK\Common\WatsonCredential;

// replace with your Watson Speech to Text API key
$apiKey = 'YOUR_API_KEY';

// create a new Watson Speech to Text service client
$speechToText = new SpeechToText(WatsonCredential::initWithAPIKey($apiKey));

// replace with the path to your audio file
$audioFile = 'path/to/audio.mp3';

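// send the audio file to the Watson Speech to Text service for transcription
// (passing the file contents as the first argument is an assumption;
// check the recognize() signature of your SDK version)
$response = $speechToText->recognize(
    file_get_contents($audioFile),
    ['content_type' => 'audio/mp3']
);

// check for errors
if ($response->getStatusCode() == 200) {
    // if successful, get the transcript
    $transcript = $response->getContent()['results'][0]['alternatives'][0]['transcript'];

    // print the transcript
    echo $transcript;
} else {
    // if there was an error, print the error message
    echo 'Request failed with status code ' . $response->getStatusCode();
}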