Speech recognition systems rely on sophisticated statistical models to convert speech to text accurately. These models use probability to determine the most likely words for a given stretch of audio.
Under the hood, VoiceboxMD digitizes the audio by taking precise measurements of the sound wave at frequent intervals. The system filters the digitized audio to remove unwanted noise and, in some cases, to separate it into different frequency bands. During this step VoiceboxMD also normalizes the sound, adjusting it to a constant volume level. The audio may also need to be temporally aligned so that speech delivered at different speeds can be compared.
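As a rough illustration of the sampling and normalization steps, here is a minimal Python sketch. It is not VoiceboxMD's actual code: the function names, the 16 kHz sample rate, and the choice of simple peak normalization are all assumptions made for the example, and a synthetic sine tone stands in for microphone input.

```python
import math

def sample_tone(freq_hz, duration_s, sample_rate, amplitude):
    """Measure a sine wave at evenly spaced instants.

    Hypothetical stand-in for a microphone: a real system would read
    these measurements from an audio device instead.
    """
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def peak_normalize(samples, target_peak=1.0):
    """Scale the samples so the loudest one has magnitude target_peak.

    Peak normalization is an assumption here; it is only one of several
    ways to bring audio to a constant level.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure silence: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

# A quiet 440 Hz tone, 10 ms long, sampled 16,000 times per second.
quiet = sample_tone(440.0, 0.01, 16000, 0.2)
normalized = peak_normalize(quiet)
```

After normalization, a quiet recording and a loud recording of the same sound end up at the same level, so later stages do not have to account for how close the speaker was to the microphone.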
Your speech is then divided into small segments, as short as a few hundredths or even thousandths of a second.
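The segmentation step can be sketched as follows. The 25 ms segment length and 10 ms step are common choices in speech processing but are assumptions here, as is the function name; the values VoiceboxMD actually uses are not stated.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Cut a list of samples into short, overlapping segments.

    frame_ms and hop_ms are illustrative defaults, not values taken
    from VoiceboxMD.
    """
    frame_len = sample_rate * frame_ms // 1000   # samples per segment
    hop_len = sample_rate * hop_ms // 1000       # samples between segment starts
    frames = []
    # Stop once a full-length segment no longer fits.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of (silent) audio at 16,000 samples per second.
one_second = [0.0] * 16000
frames = split_into_frames(one_second, 16000)
```

Overlapping the segments means a sound that straddles a boundary still appears whole in at least one of them.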
We match these segments to phonemes, the smallest units of sound that distinguish one word from another, and examine each phoneme in the context of the phonemes around it. Medical terms, like all words, are composed of sequences of phonemes. We run the contextual phoneme plot through a complex statistical model and compare it to an extensive library of known words, phrases, and sentences stored in our database. We then determine what you were most likely saying and either output it as text or issue a computer command.
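To make the idea of "most likely" concrete, here is a deliberately tiny Python sketch of matching phoneme evidence against a word library. Everything in it is hypothetical: the three-word lexicon, the phoneme labels, the probabilities, and the scoring rule are invented for illustration, and it ignores phoneme context entirely. A production model is far more elaborate.

```python
import math

# Hypothetical pronunciation library: word -> phoneme sequence.
LEXICON = {
    "dose": ["D", "OW", "S"],
    "doze": ["D", "OW", "Z"],
    "nose": ["N", "OW", "Z"],
}

def score_word(phonemes, observations, floor=1e-6):
    """Log-probability that a word's phonemes produced the observations.

    observations holds one dict per audio segment, mapping each candidate
    phoneme to its estimated probability. Phonemes missing from a dict get
    a small floor probability so the logarithm stays defined.
    """
    if len(phonemes) != len(observations):
        return float("-inf")  # this toy model cannot align different lengths
    return sum(math.log(obs.get(p, floor))
               for p, obs in zip(phonemes, observations))

def best_match(observations, lexicon=LEXICON):
    """Return the library word with the highest score."""
    return max(lexicon, key=lambda word: score_word(lexicon[word], observations))

# Made-up evidence for three consecutive segments.
observations = [
    {"D": 0.7, "N": 0.3},
    {"OW": 0.9, "AA": 0.1},
    {"S": 0.6, "Z": 0.4},
]
word = best_match(observations)
```

With these invented numbers, "dose" scores highest because each of its phonemes is the most probable one in its segment. Summing logarithms rather than multiplying raw probabilities keeps the arithmetic stable when sequences get long.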
A Voice Profile is created for each user, which allows the system to keep learning and to improve its accuracy for that user over time.