
Data About Music: The Pandora Approach vs. The Echo Nest and Co.

With its manual approach to cataloguing music, Pandora is betting on "less but better," while many other services have their metadata captured algorithmically:

On the surface, it appears that Pandora, despite its head start of at least a decade on most of its competitors, is facing an uphill battle. It’s not just the number of services that are potentially coming online but also the number of songs and the amount of data those services are offering up to consumers. The Echo Nest, which powers personalization for Spotify (among others), boasts data on more than 35 million songs (which is why Spotify’s library is able to hold more than 20 million songs). Upstart Senzari says its new MusicGraph platform contains around 20 million songs. Gracenote has data on more than 130 million.

Pandora, by contrast, has “more than 1 million” songs in its library. If consumers want unlimited choice in the music they can listen to, they probably don’t want Pandora. And the company is fine by that — or so it says. “I agree that more data is better, but that’s different than more content is better,” Pandora chief scientist and vice president of playlists Eric Bieschke told me in a recent interview (before Gracenote announced its Rhythm data platform, I should note).

Metadata APIs will keep growing in importance. Pandora's metadata is an exception: because of its small volume, I don't believe it will play an important role in the future. The more people use Spotify and similar services, and the more those services can learn from that usage feedback to improve their own metadata, the better their metadata will become.

Both approaches can coexist peacefully. The automated, data-driven approach will become the more important one, though, and strong positive economies of scale will come into play here as well.

What is fascinating is the variety of sources The Echo Nest now analyzes:

As of May, The Echo Nest, which was co-founded by MIT-trained machine-listening experts Tristan Jehan and Brian Whitman, was indexing about 10 million documents (e.g., blogs, articles and reviews) a day relating to music. This is on top of the couple of million songs it’s able to analyze per week using its machine listening system. Lucchese said the company was researching all sorts of new capabilities, such as mechanically determining valence (i.e., the mood of a song) and the regional differences in how people classify or access music.

Even Pandora now analyzes the data generated by people using its web radio:

Over the years, Pandora and its recommendation algorithms have actually evolved quite a bit. For the first several years of its existence, the company was focused solely on the Music Genome Project, which was the engine that powered recommendations when Pandora first began doing streaming radio in 2005. Around 2007, the company first realized it was sitting on a valuable collection of data about how individual users listened to music.

“It turns out that 35 billion thumbs is a gold mine of data about people’s personal music preferences,” Bieschke said.

This holds for all streaming services: the more listeners they have and the longer those listeners stay, the better the services' knowledge of music in general becomes.