Microsoft Creates AI That Can Generate a Realistic Singing Voice

The company's DeepSinger could have implications for the music industry

AI is getting better at singing. Getty Images
By Patrick Kulp

Microsoft is showing that artificial intelligence can realistically carry a tune with DeepSinger, an experimental system described in research published this week.

The company’s researchers trained the system on data from music websites, using machine learning to separate the vocal tracks from the instrumentals, then had it sing lyrics a cappella in both Chinese and English. While still far from a commercial product, the technology, along with similar text-to-speech and vocal synthesis software, could eventually reshape parts of the music industry, enabling better vocal touch-up tools or supplemental tracks but also opening the door to deepfakes and copyright battles.


The research builds on other breakthroughs in generative audio AI in recent months. Chief among them was Jukebox, a system released earlier this year by OpenAI, the Microsoft-backed research group, which can produce fully formed tracks in a range of musical genres.

Meanwhile, AI-generated narration and vocal mimicry have improved to the point where Jay-Z’s record label recently threatened legal action against a YouTube channel specializing in these kinds of audio deepfakes, alleging that it unlawfully imitated the rapper’s voice.

Applying this kind of technology to singing rather than ordinary speech poses extra challenges, however: the system has to hit specific pitches and rhythms, and there are few large, publicly available singing datasets. The researchers addressed these issues with algorithms that separated the vocals from the instrumentals in songs pulled from tens of thousands of sites, then extracted the duration of each unit of sound in the resulting vocal tracks.

While generative AI like this isn’t close to transforming the music industry yet, companies are starting to experiment with the role it could play in creative audio work. Visual editing platform PicsArt recently launched a feature that uses AI to auto-generate backing tracks in different genres, sidestepping royalty fees, and agency Space150 used similar technology to imitate Travis Scott’s voice for an experimental creative campaign.

Patrick Kulp (@patrickkulp) is an emerging tech reporter at Adweek.