Many people have worked on machine transcription over the years. James and Janet Baker are partly responsible for the current ‘real’ artificial intelligence revolution. They created a program that did real voice-to-text transcription, including non-trained transcription. As I’ve mentioned in previous posts, once this has been done, the world is open to creating a semantic brain of sorts. All that is required is for computer scientists and knowledge engineers to take the next necessary steps: to empower individuals to contribute to the collective knowledge using voice. The Bakers’ solution was the first usable consumer product for voice to text.
I used the Bakers' programs 15+ years ago and marveled at their accuracy. I often wished they had made the technology available to developers long ago, but they didn't. It was mostly a desktop typing tool. Eventually the company that took over the transcription technology from the Bakers (an unfortunate experience for the Bakers), now Nuance, started to make the transcription functions available to developers. But they made it very expensive. A poor decision. Nuance could have taken an Amazon AWS approach to pricing and could have had a monopoly on the market. Too late. They forced others to solve the problem, and to solve it better, using newer machine learning techniques. Now Google, Microsoft, IBM, Amazon, Apple, and other companies provide transcription at low cost and at large scale. Perhaps it’s karma coming back to Nuance for what happened to the Bakers.
Having worked on natural language processing over the years, I find the Bakers inspiring in a way. Through hard work, they made anything with voice possible long ago. It just never came together for them.
Interacting by voice is the ultimate in frictionless human-computer interaction.
Now, with deep machine learning, voice interaction will accelerate!