

The claim was “Yet most AI models can recite entire Harry Potter books if prompted the right way, so that’s all bullshit.”
In this test, they did not get a model to produce an entire book, even with the right prompt.
For context: those two sentences are 46 tokens / 210 characters, per https://platform.openai.com/tokenizer.
So 50 tokens is just about two sentences; this comment itself is about 42 tokens.
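If you want to check counts yourself without the web page, here's a minimal sketch using OpenAI's tiktoken package (my assumption; the web tokenizer uses the same encodings, so the numbers should roughly match):

    # Minimal token-counting sketch (assumes the tiktoken package is installed).
    import tiktoken

    # "o200k_base" is the encoding used by newer OpenAI models; adjust if needed.
    enc = tiktoken.get_encoding("o200k_base")

    text = "Yet most AI models can recite entire Harry Potter books if prompted the right way."
    tokens = enc.encode(text)
    print(f"{len(tokens)} tokens, {len(text)} characters")

Different encodings give slightly different counts, so treat the numbers as ballpark figures.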

Note: if we throw out copyright, it's OK for LLM companies to just steal any and all data under the sun. Example: Anthropic buys books, digitizes them, and then uses that data to train its LLMs.
It’s a nasty side effect.
On the other hand, no company can claim ownership of anything you've generated with its LLMs.