cross-posted from: https://ibbit.at/post/219495
From Fark.com RSS via this RSS feed. Fark comments are available here.
---
By Wednesday morning, Anthropic representatives had used a copyright takedown request to force the removal of more than 8,000 copies and adaptations of the raw Claude Code instructions - known as source code - that developers had shared on programming platform GitHub.
It later narrowed its takedown request to cover just 96 copies and adaptations, saying its initial ask had reached more GitHub accounts than intended.
Source [web-archive]
---
Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model’s weights during training, and whether those memorized data can be extracted in the model’s outputs.
While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models… We investigate this question using a two-phase procedure…
We evaluate our procedure on four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, and we measure extraction success with a score computed from a block-based approximation of longest common substring…
Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs…
…we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984…
Source: https://arxiv.org/pdf/2601.02671
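For anyone wondering what a longest-common-substring score looks like in practice, here is a rough sketch. It is not the paper's procedure: the abstract only says the score comes from a "block-based approximation," and the details are not quoted here. This version computes the exact longest shared run at the word level using Python's standard `difflib`. The function names and the normalization by reference length are my own assumptions for illustration.

```python
from difflib import SequenceMatcher


def longest_common_run(reference: str, output: str) -> int:
    """Length, in words, of the longest contiguous run shared by both texts."""
    ref_words = reference.split()
    out_words = output.split()
    # autojunk=False keeps difflib from discarding very frequent words
    # on long inputs, which would distort the match length.
    matcher = SequenceMatcher(None, ref_words, out_words, autojunk=False)
    match = matcher.find_longest_match(0, len(ref_words), 0, len(out_words))
    return match.size


def overlap_score(reference: str, output: str) -> float:
    """Hypothetical score: longest shared run as a fraction of the reference."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 0.0
    return longest_common_run(reference, output) / ref_len


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog"
    generated = "he said the quick brown fox ran over the lazy cat"
    print(longest_common_run(book, generated))
    print(overlap_score(book, generated))
```

A score near 1.0 would mean the model output contains almost the entire reference as one unbroken run, while a score near 0 means only short fragments match. Whitespace tokenization is the crudest possible choice here; a real evaluation would need to normalize punctuation and casing first.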
Phase 1: Steal IP
Phase 2: Claim IP rights over stolen IP
Phase 3: Sell IP rights under Chapter 7
It’s legal if you’re rich. Like most things.
It’s too late

That sketch was such a great use of the talents of one of the world’s (deservedly) most respected thespians 😄
Fuck AI

“I need a good dopamine boost…”
fuck companies
Dude… imagine if companies in real life had the ability to remove your memories of their products because they don't like their intellectual property existing in certain demographics.
Kinda feels like Total Recall…
They’d do it if they could
There is a sci-fi Tom Scott video with more or less that premise.
Standard US tech industry. Copy and steal in a mad dash to monopolize the market. Sue everyone else into oblivion. Crown yourself the inventor and the winner.
Nah. Any use is “fair use”, thanks
Big brain moment. If I create a license that makes it illegal to train AI on, and then prove they trained AI on it, can I do a takedown notice of their AI system or training model for copyright infringement?
Serious question. Any “legal smart” people out there know? lol
I think the GPL already does that, if they don't disclose the copyright holder or if they change the license for their product (Claude etc. is just software that uses the original code to function).
I wish we stopped blaming LLMs for using commercial training data, it gives me a valid excuse to use BitTorrent nowadays. Also I enjoy watching companies fighting each other.
Trouble is, it also gives them an excuse to "launder" copyleft software to use in proprietary ways.
qBittorrent is the best
Maybe next time don’t ship your source maps if you care so much?
Fark still exists?!
If only I could read it… endless captcha loops for me.
Suchir had to die, for these fuckers to have a single neuron activation. … reality’s cooked











