• koper@feddit.nl
      arrow-up
      12
      ·
      3 months ago

      Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to “conserv[ing] space” through format conversion and found it transformative.

      • ChicoSuave@lemmy.world
        arrow-up
        5
        ·
        3 months ago

        It’s literally the process that allows digitized media to be safe to possess. Someone read the FBI warnings before movies on VHS. This is some corporate malicious compliance and what the law looks like when taken to an absurd extreme.

    • MartianSands@sh.itjust.works
      arrow-up
      5
      ·
      3 months ago

      That depends on whether you consider an LLM to be reading the text, or reproducing it.

      Outside of the kind of malfunctions caused by overfitting, like when the same text appears again and again in the training data, it’s not difficult to construct an argument that an LLM does the former, not the latter.

      • awesomesauce309@midwest.social
        English
        arrow-up
        4
        ·
        3 months ago

        It’s rare that a person on social media understands that LLMs turn their input into predictive weights and do not selectively copy and paste out of it.

      • Arthur Besse@lemmy.ml (OP)
        English
        arrow-up
        3
        ·
        3 months ago

        models can and do sometimes produce verbatim copies of individual items in their training data, and more frequently produce outputs that are close enough to them that they would clearly constitute copyright infringement if a human produced them.

        the argument that models are not derivative works of their training data is absurd, and the fact that it is being accepted by courts is yet another confirmation that the “justice system” is anything but just and the law simply doesn’t apply when there is enough money at stake.