Audio is experiencing a Cambrian moment, growing faster now than it ever has before. Apple is expected to sell over 100 million AirPods this year, up from 60 million in 2019. Roughly one in four adults owns a smart speaker, with a third of those owners having three or more devices in their homes. Sixty-two percent of Americans use a voice-operated personal assistant. And if those statistics aren’t convincing enough, just look at the podcasting industry: monthly podcast listeners grew by 16% year-over-year, surpassing 100 million Americans for the very first time.
Just as the Internet, mobile, and cloud gave rise to new innovation across consumer and enterprise technology, we believe audio will fundamentally change how people connect and businesses operate.
So why is audio so special?
“Audio is the hottest format of all. Human speech burns with information. It resolves uncertainty and communicates meaning more than any other format. Audio is how you communicate what you really mean. Intonation, emphasis, innuendo and all.” — Colin Keeley
Audio is powerful, and it touches every aspect of our lives. It connects us — to our families, and to strangers across the world. It tells us stories, and it keeps us company as we commute to work.
The world’s biggest technology companies and social media platforms are all beginning to embrace audio as a core feature. Apple, Amazon, and Google all have their own voice assistants (Siri, Alexa, and Google Assistant, respectively), and Twitter rolled out audio tweets this past year. Twitter is also currently testing a new Audio Spaces beta, which would allow users to engage in live conversations on the app. Furthermore, Spotify acquired podcast network Gimlet and podcast creation platform Anchor last year in deals worth roughly $340 million combined.
Audio, in many forms, has been a well-established medium since the dawn of the telephone and radio, but it’s also the most fragmented and difficult to study at scale. Now, we believe we’ve reached a point where advancements in software can help us separate the signal from the noise.
“More importantly, technology is now affecting the audio category faster than ever before. The diversity of its revenue models, content, and delivery has never been greater. This is inspiring and healthy. And there is a lot more to come.” — Matthew Ball
We’re in the golden age of audio for communication technology. But why is this happening now? We’ve outlined three main reasons:
1. Technological Innovation
Natural language processing (NLP), speech recognition, and speech-to-text technologies are reaching a tipping point. Automatic speech recognition accuracy rates have surpassed 95%. This accuracy, combined with improved latency, allows for near real-time voice recognition and transcription, making it easier to search, personalize, and monetize audio. These technological innovations are thanks in part to speech recognition tools, like portfolio company Rev, which are making it possible to use voice as a computing interface and affordably index audio content.
Most humans can talk faster than they type, and with the rise of voice assistants and automated speech-to-text technology, we can now interact with our devices faster and more efficiently than ever before. These innovations have created a conversational relationship between people and devices, serving as an entryway for consumers to access additional products and services from a company. “Lack of a device boundary means that voice assistants can spread more easily, can provide different types of value than the previous platforms, and can offer an entry point for new providers that don’t have assets from the previous eras,” writes Bret Kinsella in the Harvard Business Review.
Another exciting trend we’re seeing is the rise of spatial audio, which positions sounds in 3D space. For example, this technology enables a person to hear a noise as if it’s just behind them and to the right, as if they were in a surround sound movie theatre. At WWDC 2020, Apple announced that iOS apps will soon be able to use motion data from your AirPods Pro to enable head-tracked spatial audio. This technology will provide a more immersive experience for music, films, and 3D games, and will open doors in entirely different mediums, like navigation and group voice chat.
2. New Hardware Platforms
Today, over 500 million devices in the U.S. can be directly controlled with voice. These voice-first interfaces have become mainstream, including connected cars (like Bessemer portfolio company Otonomo), smart speakers, and AirPods (perhaps the most ubiquitous example to date). Google even reported that 27% of the global online population uses voice search on mobile.
The smart speaker revolution has unlocked an opportunity to engage and communicate by voice: we’ve all heard our fair share of “Hey Siri!” and “Alexa, what’s the weather?” Today, smart speakers are owned by an estimated 60 million people in the U.S. TV manufacturers are even installing microphones for far-field voice recognition, in an effort to make TVs function like smart home speakers.
The “always on” nature of these listening devices is leading to a new and ubiquitous form of ambient computing. Importantly, they also represent new distribution channels that will be exploited by a new generation of startups.
3. New Modes of Interaction
Audio-first modes of interaction represent the third major platform and UI shift of the past three decades, following the Internet, the PC, and the mobile eras. The current shift to voice has increased consumers’ familiarity with and acceptance of audio, shown through users’ growing comfort with speaking to smart assistants, like Siri or Alexa. It’s also enabled more short-form texting inputs: for a culture obsessed with multitasking, audio is our new best friend.
Last decade, mobile apps were the remote control for the real world. This decade, voice is the remote control.
The advent of always-on, two-way voice communication channels not only makes voice assistants like Siri and Alexa more accessible and utilitarian, but also opens the door to new social behavior. Already, a myriad of social audio networks have taken 2020 by storm. This is what continues to excite us about companies like Discord and other emerging audio social networks, including Clubhouse and Chalk (broadcast), Rodeo and Soapbox (small group), Dialup (one-to-one), and more. We’re even seeing vertical social networks in this space begin to materialize, such as Locker Room for sports, Roadtrip for music, and Quilt for meaningful connection and self-care.
Note: The explosion of social audio deserves an analysis of its own. In a more thorough breakdown, Rex Woodbury dissects the opportunity for a new platform to emerge in audio, arguing that the platform will be highly social and will rely on users to create and distribute content.
Request for Startups
“Audio listening has always adapted to the times, and that’s one of its main benefits.” — James Cridland, radio futurologist
At Bessemer, we’re on the hunt for new audio-first products. Some areas we’re excited about include:
The Audio Creation Suite
During a gold rush, sell shovels. Some of the most exciting startups we’re seeing in audio serve as underlying infrastructure for creation. This opportunity has proven lucrative across other rich content types like photo and video; for example, photo creation/sharing tools were democratized by platforms like Canva and Instagram, and video by the likes of YouTube, Loom, and TikTok.
We’re bullish about tools that expand the funnel of audio creators. Earlier in this piece, we mentioned Spotify’s strategic acquisition of Anchor, which brought Anchor’s suite of tools for podcast creation and distribution to Spotify’s community of over 200 million users. Another example in the audio creation suite is Descript, which makes editing audio and video content as easy as editing a word document, and provides collaborative tools for podcast and media creation. (Descript also acquired Lyrebird, which built editing tools for speech synthesis.) In music, we see startups like Splice, a collaborative tool for music production, often referred to as a “GitHub for musicians.”
Beyond creation, there are exciting innovations for fan engagement and content monetization. Audio has been notoriously under-monetized relative to other forms of media. So far, we’ve seen products like Glow, which offers a way to distribute and monetize premium podcast content.
Audio for the Masses
With an influx of audio creation tools, it’s never been easier to make and distribute audio content. But as a consumer, it’s never been harder to distinguish the best content from the crowd. We see an opportunity for a “TikTok for Audio,” which we believe could help solve this problem.
TikTok is powerful in its features for both creation and discovery. On the creation front, it has enabled anyone to become a filmmaker, similar to how Instagram widened the funnel for photographers. The social graph on TikTok has allowed users to share videos with their friends and greater network, paving the way for power users to become influencers in their own right. And on the discovery front, TikTok is almost too good to be true: just look at “Dreams” by Fleetwood Mac, which catapulted back onto the charts more than four decades after its initial success, thanks to a viral TikTok that paired a morning skateboard session and a bottle of Ocean Spray with the classic hit.
Following these themes, a “TikTok for Audio” could solve some of the problems surrounding audio discovery and engagement. So far, we’ve been excited by the entrance of startups like Shuffle, which helps podcast fans share the best ideas and stories locked up in long podcasts.
Audio at Work
Voice technology is just beginning to permeate the workplace. Knowledge work is increasingly remote and solo, and both open offices and remote work benefit from audio solutions. (AirPods are a survival tool; we already knew that, but it’s worth noting that they give employees more mobility and free up their hands, too.)
But this theme isn’t limited to music and podcast streaming: voice experiences at work are becoming widespread. Specifically, we’re intrigued by innovations in asynchronous and spatial audio that help create a sense of presence in a digitally native workplace. For example, Yac utilizes asynchronous voice messaging to help cut down on meetings and calls, and Slashtalk decentralizes conversations so remote teams can move faster. In spatial audio, companies like Branch are creating virtual HQs to bring serendipity back into the workday.
“There’s a temptation in our networked age to think that ideas can be developed by email and iChat. That’s crazy. Creativity comes from spontaneous meetings, from random discussions. You run into someone, you ask what they’re doing, you say ‘wow,’ and soon you’re cooking up all sorts of ideas.” — Steve Jobs
Beyond this, advances in audio and voice-based technologies are reinventing sales, marketing, and customer relationship management software. Gong, a voice intelligence company, uses real-time natural language processing to coach salespeople and other customer service reps and to suggest relevant information to them. (Fun fact: Gong’s model was trained on the full run of Seinfeld, in an effort to teach the platform about awkward conversations and sarcasm.)
The Future of Audio
“If I have seen further, it is by standing on the shoulders of giants.” — Isaac Newton
Innovation doesn’t — and shouldn’t — occur in a vacuum. The iPod was built upon the work of Kane Kramer, who took inspiration from the Sony Walkman. Facebook is presumed to have been partially inspired by Friendster. Similarly, many successful companies grew efficiently in their early years by standing on the shoulders of giants and exploiting creative distribution loopholes. Airbnb scraped Craigslist to hack marketplace growth, Zynga leveraged Facebook’s social graph and newsfeed, and Yelp built its user base by capitalizing on Google search algorithms. This trend will be no different in audio. We’re only scratching the surface: as new platforms, technological innovations, and modes of communication emerge, audio is a space to watch — or should we say, listen?
Are you building something in audio? Let us know by emailing Talia Goldberg (email@example.com) and Gaby Goldberg (firstname.lastname@example.org).