In a nutshell, speech recognition systems rely on powerful and complex statistical models. These models use probability and mathematical functions to determine the most likely interpretation of what was said.
What makes us so great is our technology. Under the hood, VoiceboxMD digitizes your voice by taking precise measurements of the sound wave at frequent intervals. The system then filters the digitized audio to remove unwanted noise and, in some cases, to separate it into different frequency bands. During this process, VoiceboxMD also normalizes the sound, adjusting it to a constant volume level. The audio may also need to be temporally aligned, since people do not always speak at the same speed.
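To make these preprocessing steps concrete, here is a minimal Python sketch. It is not VoiceboxMD's actual pipeline: the 16 kHz sample rate, the moving-average filter, the two-band split, and peak normalization are all assumptions chosen for simplicity, and the hypothetical `sample_wave` helper generates a synthetic tone in place of real microphone input. Temporal alignment is not shown.

```python
import math


# Hypothetical helper: a synthetic tone stands in for a microphone signal.
def sample_wave(freq_hz, sample_rate_hz, duration_s):
    """Measure a sine wave at evenly spaced instants (sampling)."""
    n = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def moving_average_filter(samples, width):
    """Simple low-pass filter: each output is the mean of up to `width` recent inputs.

    Averaging smooths out rapid, high-frequency fluctuations such as hiss.
    Assumption: a production system would use a more carefully designed filter.
    """
    out = []
    acc = 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= width:
            acc -= samples[i - width]  # drop the sample leaving the window
        out.append(acc / min(i + 1, width))
    return out


def split_bands(samples, width):
    """Split audio into a low band (smoothed) and a high band (the remainder)."""
    low = moving_average_filter(samples, width)
    high = [x - lo for x, lo in zip(samples, low)]
    return low, high


def normalize_peak(samples, target=1.0):
    """Scale the signal so its loudest sample has magnitude `target`."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    return [x * target / peak for x in samples]


wave = sample_wave(440, 16_000, 0.01)  # 10 ms of audio at an assumed 16 kHz rate
quiet = [0.1 * x for x in wave]  # a recording made too quietly
loud = normalize_peak(quiet, target=0.5)  # brought up to a fixed level
low, high = split_bands(loud, width=4)
```

Because the high band is defined as whatever the low-pass filter removed, adding the two bands back together recovers the normalized signal.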
Next, your speech is divided into small segments as short as a few hundredths, or even thousandths, of a second.
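Segmentation of this kind is often done with short, overlapping frames. The sketch below assumes 25 ms frames taken every 10 ms, which are common defaults in speech recognition toolkits but are not VoiceboxMD's documented settings; the function name `frame_signal` is hypothetical.

```python
def frame_signal(samples, sample_rate_hz, frame_ms=25.0, hop_ms=10.0):
    """Cut audio into short, overlapping segments (frames).

    Assumption: 25 ms frames every 10 ms are typical choices, used here
    only for illustration.
    """
    frame_len = round(sample_rate_hz * frame_ms / 1000)  # samples per frame
    hop = round(sample_rate_hz * hop_ms / 1000)  # samples between frame starts
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    # Only complete frames are kept; a trailing partial frame is dropped.
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


one_second = [0.0] * 16_000  # placeholder audio at an assumed 16 kHz rate
frames = frame_signal(one_second, 16_000)
```

At 16 kHz, each 25 ms frame holds 400 samples, and one second of audio yields 98 complete frames. The overlap means no sound falls on a hard boundary between segments.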
We then match these segments to phonemes, the basic units of sound that distinguish one word from another, and examine each phoneme in the context of the phonemes around it. Medical terms, like all words, are built from sequences of phonemes. We run this sequence of phonemes in context through a complex statistical model and compare it to an extensive library of known words, phrases, and sentences stored in our database. Finally, we determine what you said and either output it as text or issue a computer command.
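A toy version of this final step is sketched below, assuming a context-dependent ("triphone") labeling and a simple Bayes-style score: the log of a word's prior probability plus the log probability of each phoneme-in-context given its segment. The two-word lexicon, the priors, and the per-segment probabilities are invented for illustration, and the one-segment-per-phoneme assumption is a large simplification; real recognizers search over many possible alignments. None of this describes VoiceboxMD's actual model.

```python
import math


def to_triphones(phones):
    """Label each phoneme with its neighbors as "left-center+right".

    Silence ("sil") pads the start and end of the word.
    """
    padded = ["sil", *phones, "sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def word_log_score(pron, segment_probs, prior, floor=1e-6):
    """Log of P(word) times the product of P(phoneme in context | segment).

    Simplifying assumption: exactly one segment per phoneme.
    """
    units = to_triphones(pron)
    if len(units) != len(segment_probs):
        return float("-inf")  # this word cannot explain this many segments
    score = math.log(prior)
    for unit, probs in zip(units, segment_probs):
        score += math.log(probs.get(unit, floor))  # unseen units get a tiny floor
    return score


def recognize(segment_probs, lexicon, priors):
    """Return the lexicon word with the highest combined score."""
    return max(lexicon, key=lambda w: word_log_score(lexicon[w], segment_probs, priors[w]))


# Hypothetical two-word lexicon with ARPAbet-style pronunciations.
lexicon = {"ulna": ["AH", "L", "N", "AH"], "ulcer": ["AH", "L", "S", "ER"]}
priors = {"ulna": 0.5, "ulcer": 0.5}
# Made-up acoustic-model output: probabilities of context-dependent units per segment.
segment_probs = [
    {"sil-AH+L": 0.9},
    {"AH-L+N": 0.5, "AH-L+S": 0.4},
    {"L-N+AH": 0.3, "L-S+ER": 0.6},
    {"N-AH+sil": 0.2, "S-ER+sil": 0.7},
]
best = recognize(segment_probs, lexicon, priors)
```

Here "ulcer" wins even though the second segment slightly favors the "ulna" reading, because the later segments support "ulcer" much more strongly. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long utterances.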