Identify speakers and languages in audio/video transcripts

Vocapia’s VoxSigma speech-to-text software suite offers large-vocabulary continuous speech recognition in multiple languages for a variety of audio data types. It transcribes large quantities of audio and video documents, such as broadcast data, either in batch mode or in real time. Beyond transcription, it provides audio segmentation and partitioning, speaker identification and diarization, and language identification, turning raw audio into structured, searchable XML documents that give users access to the content of video documents. The suite is available as a web service through a REST API over HTTPS that offers full speech transcription, audio indexing and speech-text alignment. Applications include broadcast and telephone data mining, speech analytics, media monitoring, media asset management, speech transcription and subtitling. Speech recognition is available for more than 82 languages, and clients can create models for their desired language set.

Ai Promptly

Featured on March 10, 2022


