Project Overview
Improve state-of-the-art ASR systems so that they accurately transcribe dysarthric English speech.
Motivation
Over 250 million people have non-standard speech patterns that make it difficult for them to use ASR systems.
Our goal is to build robust speech recognizers that can accurately transcribe atypical speech and also produce a clear audio rendering of the user's input voice sample.

Many communication tools used by speakers with non-standard speech patterns, such as those with dysarthria, have some redundancy built into them. For example, augmentative and alternative communication (AAC) software often presents the same information as:
- An icon that represents the object of interest
- An orthographic (textual) representation of the words that describe the object of interest
- An audio rendition of the words used to describe the object
AAC icons are often not very expressive, and it can be easier for the user simply to speak in order to communicate. However, as the image above shows, state-of-the-art speech recognizers often break when presented with atypical speech.

Foundation models for speech recognition and voice conversion
Towards this end, we are focusing on some of the latest deep learning-based speech recognition and voice conversion models. We seek to transfer knowledge to these tasks by leveraging representations learned with semi-supervised learning.
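One common way to realize this kind of transfer is to keep a pretrained speech encoder frozen and fine-tune only a lightweight task head (for ASR, typically a CTC classifier) on the target speech. The sketch below illustrates that pattern in PyTorch; it is a minimal illustration under our own assumptions, not the project's actual training code, and the small `nn.Sequential` encoder is a stand-in for a real pretrained model such as wav2vec 2.0.

```python
# Minimal sketch: freeze a pretrained encoder, fine-tune a CTC head.
# The encoder here is a hypothetical stand-in; in practice it would be
# a large model pretrained with semi- or self-supervised learning.
import torch
import torch.nn as nn

class CTCFineTuner(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():      # freeze pretrained weights
            p.requires_grad = False
        self.head = nn.Linear(hidden_dim, vocab_size)  # trainable CTC head

    def forward(self, feats):                    # feats: (batch, time, hidden_dim)
        with torch.no_grad():
            reps = self.encoder(feats)           # frozen representations
        return self.head(reps).log_softmax(-1)   # per-frame log-probs over vocab

# Stand-in "pretrained" encoder operating on 64-dim frame features.
encoder = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
model = CTCFineTuner(encoder, hidden_dim=64, vocab_size=32)

feats = torch.randn(2, 50, 64)                   # 2 utterances, 50 frames each
log_probs = model(feats).transpose(0, 1)         # CTCLoss expects (time, batch, vocab)
targets = torch.randint(1, 32, (2, 10))          # dummy label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 50),
                           target_lengths=torch.full((2,), 10))
loss.backward()

# After backward, only the head holds gradients; the encoder stays fixed.
trainable = [n for n, p in model.named_parameters() if p.grad is not None]
```

Freezing the encoder keeps the representations learned from large amounts of typical speech intact while the small head adapts to the target domain; with more dysarthric training data, one could also unfreeze and fine-tune the upper encoder layers.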