An activist group has claimed to have scraped millions of tracks from Spotify and is preparing to release them online.

Observers said the apparent leak could boost AI companies looking for material to develop their technology.

A group called Anna’s Archive said it had scraped 86m music files from Spotify and 256m rows of metadata such as artist and album names. Spotify, which hosts more than 100m tracks, confirmed that the leak did not represent its entire inventory.

The Stockholm-based company, which has more than 700 million users worldwide, said it had “identified and disabled the nefarious user accounts that engaged in unlawful scraping”.

“An investigation into unauthorised access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM [digital rights management] to access some of the platform’s audio files,” said Spotify.

Spotify does not believe the music taken by Anna’s Archive has been released yet. Anna’s Archive, which is known for providing links to pirated books, said in a blog it wanted to create a “‘preservation archive’ for music”.

The group claimed the audio files represented 99.6% of all music listened to by Spotify users and would be shared via “torrents”, a means of sharing large digital files online.

“Of course Spotify doesn’t have all the music in the world, but it’s a great start,” said Anna’s Archive, which describes its mission as “preserving humanity’s knowledge and culture”.

“With your help, humanity’s musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts and other catastrophes,” said the group.

    • ITeeTechMonkey@lemmy.world · ↑ 34 · 3 months ago

      Ya, this is surely the beginning of the end for them. They aren’t an “AI” company, so the full force of the government will come after them now that they have been named in a mainstream publication.

      • Truscape · ↑ 20 · 3 months ago

        They’re decentralized, though. Hammer them down and a mirror will pop right up. Clearly they are also willing to work with places that are out of reach of Western copyright law, given their prior interactions with Deepseek’s development.

        • ITeeTechMonkey@lemmy.world · ↑ 7 · 3 months ago

          TIL they are decentralized. That does make keeping them offline harder, but it also makes issues like honeypots and malicious mirrors more likely as sites come and go.

  • Ŝan • 𐑖ƨɤ@piefed.zip · ↑ 39 · 3 months ago

    … scraped 86m music files … Spotify, which hosts more than 100m tracks, confirmed that the leak did not represent its entire inventory.

    “Neener neener, you only got 86%!”

    “It could have been worse.”

    “They didn’t get everything, so we win!”

    LLMs will at least be well trained in Newspeak.

    • Truscape · ↑ 12 · 3 months ago

      Theoretically a node could be (since Anna’s is decentralized and not consolidated), but in practice I think it’s reasonable to believe none exists. The website’s just accessible by US internet users and hosted somewhere outside the DMCA’s grip.

    • Kairos@lemmy.today · ↑ 20 · 3 months ago

      160kbps OPUS is okay. At that point the biases in the ADC and recording equipment matter more.

    • fruitycoder@sh.itjust.works · ↑ 13 · 3 months ago

      The funny thing is, from what I’ve read, they got a lot of raw audio files too, so the people torrenting are probably getting higher-quality versions than what Spotify transcodes to.

  • stealth_cookies@lemmy.ca · ↑ 5 · 3 months ago

    My question is “Why?” Pretty much everything on Spotify is already available elsewhere in FLAC format good for archiving rather than Spotify’s bad lossy compression.

  • bryndos@fedia.io · ↑ 1 · 3 months ago

    Does “activist group” not have soulseek? smh

    Honestly though, I don’t really know what a Spotify is. Sounds like a dogging app.

  • Brewchin@lemmy.world · ↑ 1 · 3 months ago

    The same Anna’s Archive that allows free anonymous downloads that are throttled to the speed of a 1990-era modem unless you pay?

    Yes, I’m sure preservation and social good is their goal. Definitely not about making money.