Today we are really excited to announce the launch of Voicegain Whisper, an optimized version of Open AI's Whisper speech recognition/ASR model that can be accessed using Voicegain APIs. The same APIs currently process over 60 million minutes of audio every month for leading enterprises in the US, including Samsung, Aetna, and several Fortune 100 enterprises.

Generative AI developers now have access to a well-tested, accurate, affordable, and accessible transcription API. They can integrate Voicegain Whisper APIs with LLMs like GPT 3.5 and 4 (from Open AI), PaLM2 (from Google), Claude (from Anthropic), LLAMA 2 (open source from Meta), and their own private LLMs to power generative conversational AI apps.

OpenAI Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The architecture of the model is based on an encoder-decoder transformer, and it has shown significant performance improvements over previous models because it was trained on a variety of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.

Open AI has open-sourced several versions of the Whisper model. With today's release Voicegain supports Whisper-medium, Whisper-small, and Whisper-base. Voicegain now supports transcription in the more than 99 languages supported by Whisper.

There are four main reasons for developers to use Voicegain Whisper over other offerings:

1. Support for Private Cloud/On-Premise deployment (integrate with Private LLMs)

While developers can use Voicegain Whisper on our multi-tenant cloud offering, a big differentiator for Voicegain is our support for the Edge. The Voicegain platform has been architected and designed for single-tenant private cloud and datacenter deployment. In addition to the core deep-learning-based speech-to-text model, our platform includes our REST API services, logging and monitoring systems, auto-scaling, and offline task and queue management. Today the same APIs enable Voicegain to process over 60 million minutes a month. We can bring this practical, real-world experience of running AI models at scale to our developer community. Since the Voicegain platform is deployed on Kubernetes clusters, it is well suited for modern AI SaaS product companies and enterprises that want to integrate with their private LLMs.

2. Affordable pricing - 40% less expensive than Open AI

At Voicegain, we have optimized Whisper for higher throughput. As a result, we are able to offer access to the Whisper model at a price that is 40% lower than what Open AI offers.

3. Enhanced features for Contact Centers & Meetings

Voicegain also offers critical features for contact centers and meetings. Our APIs support two-channel stereo audio, which is common in contact center recording systems. Word-level timestamps are another important feature of our API, needed to map audio to text. Another feature of the Voicegain models - enhanced diarization, a required feature for contact center and meeting use cases - will soon be made available on Whisper.

4. Premium support and uptime SLAs

We also offer premium support and uptime SLAs for our multi-tenant cloud offering. These APIs today process over 60 million minutes of audio every month for our enterprise and startup customers.

It has been over 8 months since we published our last speech recognition accuracy benchmark (described here). Back then the results were as follows (from most accurate to least): Microsoft and Google Enhanced (a close 2nd), then Voicegain and Amazon (also a close 4th), and then, far behind, Google Standard.

We have repeated the test using the same methodology as before: take 44 files from the Jason Kincaid data set and 20 files published by rev.ai, and remove all files where the best recognizer could not achieve a Word Error Rate (WER) lower than 20%. Last time we removed 10 files, but this time, as the recognizers improved, only 8 files had a WER higher than 20%. The files removed fall into 3 categories:
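The benchmark above is scored with Word Error Rate. As a quick illustration, here is a minimal sketch of the standard WER computation (word-level Levenshtein edit distance divided by the number of reference words); the function name is ours for illustration, not part of any Voicegain API:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

A perfect transcript scores 0.0; the 20% cutoff used in the benchmark corresponds to a returned value of 0.2.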
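Word-level timestamps make it possible to map a span of audio back to the words spoken in it. Here is a minimal sketch of that mapping, assuming a hypothetical list of word records with start/end times in seconds (the actual response shape of the Voicegain API may differ):

```python
# Hypothetical word-timestamp records, as a transcription API might return them
words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
    {"word": "again", "start": 1.2, "end": 1.6},
]

def words_in_window(words, t0, t1):
    """Return the words whose spoken span overlaps the audio window [t0, t1]."""
    return [w["word"] for w in words if w["start"] < t1 and w["end"] > t0]

print(words_in_window(words, 0.0, 1.0))  # words spoken in the first second
```

The same overlap test works in the other direction: given a word of interest, its `start`/`end` values locate the audio snippet to play back.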
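Two-channel stereo recordings typically carry the agent and the caller on separate channels, so each channel can be transcribed and attributed on its own. A minimal sketch of deinterleaving 16-bit PCM stereo frames into two mono streams, using only the Python standard library (illustrative only, not Voicegain API code):

```python
import array

def split_stereo(frames: bytes) -> tuple[bytes, bytes]:
    """Deinterleave 16-bit little-endian stereo PCM into (left, right) mono PCM."""
    samples = array.array("h")      # signed 16-bit samples
    samples.frombytes(frames)
    left = array.array("h", samples[0::2])   # even samples: left channel
    right = array.array("h", samples[1::2])  # odd samples: right channel
    return left.tobytes(), right.tobytes()
```

Each returned byte string can then be submitted as an independent mono stream, which is how per-speaker transcripts are usually produced from contact-center recordings.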