New AI features make audio editing easier – and more accessible for everyone 

March 21, 2024

Tags: AI & Machine Learning, Audio

The team at Adobe Research has helped develop four new AI-driven features that are changing the way people edit audio in Premiere Pro: Enhance Speech, Language Detection for Text-Based Editing, Filler Word Detection in Text-Based Editing, and Audio Category Tagging.

Each of the new features deploys AI to understand audio and streamline tedious tasks so that creators—whether they’re seasoned pros or novices—can turn their ideas into high-quality audio and video to share with their audiences.  

Here’s a closer look at each of the new features. 

Enhance Speech: Studio-quality sound with just one click 

Once a user captures the perfect audio moment, there’s the work of cleaning it up—removing background noise and reverberation and balancing the sound. “With the old tools, getting a good result from this process required a very deep level of expertise in applying audio filters and chaining different audio operations together with lots of different parameters. It’s a very intricate process,” explains Justin Salamon, Senior Research Scientist for Adobe Research. But with Enhance Speech (a feature users can find on the Essential Sound Panel), a single click produces audio that sounds as clear as if it were recorded in a studio.  

The technology behind Enhance Speech was years in the making. To start, researchers at Adobe trained a neural network on millions of pairs of before-and-after audio recordings, refining the algorithm until it could transform a recording marred by noise, reverberation, and EQ distortion into studio-quality audio. From there, the team published their research and then revealed an early version of the work in Project Awesome Audio, a 2019 MAX Sneak.

“Our mission is to make Adobe products sound great,” says Jiaqi Su, an Audio Research Scientist who started working on Enhance Speech as an intern and then returned to Adobe Research full-time after completing her PhD. “It’s not just for professional users—we’re considering people who don’t have access to expensive studios and microphone setups. Now you can record yourself with a laptop or iPhone and with one click, we can give you professional studio-quality audio that’s clean, clear, and perceptually pleasing.” 

Enhance Speech, which is also integrated in Adobe Podcast, has reached nearly a million monthly active users across Adobe products. “People are using it to simplify their workflows when editing interviews, presentations, and lectures. And we’ve even heard from users who are taking old tape recordings from 20 years ago—recordings that are poor quality because of the limitations of the devices—and using Enhance Speech to revive the voices of family members. It’s just invaluable,” explains Su.

Filler Word Detection: A simple way to automatically find and remove the uhs and ums 

When Premiere Pro introduced Text-Based Editing, it changed the way many people edit audio and video. Users can now start with an automatic transcription of their dialogue and then edit it just as they would any other text document—and the edits are automatically reflected in the video. Once users began editing with text, they had a new request—a quick and easy feature that could find filler words, like “uh” and “um,” and remove them.  

“This is a task video editors do all the time when they’re working with interviews and dialogue,” says Salamon. “Historically, the only way to do it was to listen through the whole thing meticulously and go one-by-one to carefully edit them out, which is a very tedious, not-fun-at-all job.” 

So the Adobe Research team built an AI model that finds filler words. With Filler Word Detection, users can easily see all of the fillers in the transcript and delete them instantly. They simply open a transcribed sequence, then choose Filler from the Transcript Panel. “It used to take minutes or hours. Now, you can just remove all of the filler words with a click and you’re good to go,” adds Salamon.  
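The idea of finding fillers in a transcript and deleting them in one pass can be illustrated with a toy sketch. This is not Adobe's model — the real feature uses a trained AI detector — but a simple regex-based version shows the shape of the task, with hypothetical function names:

```python
import re

# Toy illustration only: real filler-word detection uses a trained model,
# not a fixed word list, and edits propagate back to the video timeline.
FILLERS = {"uh", "um", "er", "ah"}

def find_fillers(transcript: str):
    """Return (character offset, word) pairs for each filler in the transcript."""
    return [(m.start(), m.group())
            for m in re.finditer(r"\b\w+\b", transcript)
            if m.group().lower() in FILLERS]

def remove_fillers(transcript: str) -> str:
    """Delete filler words (and a trailing comma, if any), then tidy whitespace."""
    cleaned = re.sub(r"\b(?:uh|um|er|ah)\b[,]?\s*", "", transcript,
                     flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", cleaned).strip()

print(remove_fillers("So, um, we can, uh, just remove these."))
# -> So, we can, just remove these.
```

The two-step structure mirrors the workflow in the product: first surface every filler so the editor can review them, then remove them all at once.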

Language Detection for Text-Based Editing: A behind-the-scenes feature that knows the language of your dialogue 

Language Detection is another feature that helps make Text-Based Editing even more powerful. Instead of manually selecting the language of a clip to launch a transcription, Language Detection automatically determines the language without any input from the user.  
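To give a flavor of what automatic language identification involves, here is a deliberately naive sketch that guesses a language from stopword overlap. The production feature relies on a trained model analyzing the audio itself; the word lists and function name below are illustrative assumptions:

```python
# Toy language identification by stopword overlap -- for illustration only.
# Real systems (including Premiere Pro's) use trained models on the audio.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "es": {"el", "la", "y", "de", "que"},
    "fr": {"le", "la", "et", "de", "est"},
}

def detect_language(text: str) -> str:
    """Pick the language whose stopword list overlaps the text the most."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

print(detect_language("the quick brown fox is one of the best"))  # -> en
```

Even this crude version shows why the feature removes friction: the user never has to answer a question the system can answer for itself.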

“This feature is under the hood, so it’s nearly invisible to users. But its power is in reducing friction, making editing that much easier,” says Salamon.  

In the course of solving the language detection and filler word problems, the Adobe Research team (including several talented interns) produced novel research, filed several patents, and published papers to share their findings. “It’s so nice when we can solve a problem while innovating and contributing back to our research community,” notes Salamon.

Audio Category Tagging: AI that understands your audio so it can offer the most useful editing controls 

When it’s time to edit an audio clip, Premiere Pro users previously had to select the clip’s audio type themselves in order to see the most relevant controls in the Essential Sound Panel. With the new Audio Category Tagging feature, Premiere Pro detects whether a clip is dialogue (speech), music, sound effects, or ambience, and automatically surfaces the most useful controls via new interactive audio badges in the timeline. The feature saves time and helps new users discover more editing controls: in Premiere Pro’s beta, it has already driven increased discovery and use of the Essential Sound Panel. 
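Conceptually, once a clip is tagged, the feature boils down to a lookup from detected category to relevant controls. The sketch below makes that mapping concrete; the control names are illustrative stand-ins, not Premiere Pro's actual API:

```python
# Hypothetical mapping from a detected audio category to surfaced controls.
# Category and control names are illustrative, not Premiere Pro's API.
CONTROLS = {
    "dialogue": ["Loudness", "Repair", "Clarity", "Enhance Speech"],
    "music": ["Loudness", "Duration", "Ducking"],
    "sfx": ["Loudness", "Pan", "Reverb"],
    "ambience": ["Loudness", "Stereo Width", "Ducking"],
}

def controls_for(clip_category: str):
    """Return the editing controls to surface for a tagged clip."""
    return CONTROLS.get(clip_category.lower(), [])

print(controls_for("Dialogue"))
```

The AI does the hard part (classifying the audio); the payoff for users is that the right controls appear without any manual tagging.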

The technology behind Audio Category Tagging grew out of long-running research inside Adobe. “We were building models that can use AI to really understand what’s going on in audio,” says Oriol Nieto, Senior Audio Research Engineer. “So when we discovered that Premiere Pro needed a way to detect different types of audio, it just made sense to transfer the technology into the product. It’s one more way we can reduce barriers to the technology and help newcomers with their creative process.” 

“With each of these new audio AI features, we were aiming toward the same goal,” adds Nieto. “We wanted to help users be more efficient and spend less time on things that are dull or tedious—so they have more time for their creative work. If we’ve done that, it’s a great win for us.” 

Wondering what else is happening inside Adobe Research? Check out our latest news here. 
