

The claim was “Yet most AI models can recite entire Harry Potter books if prompted the right way, so that’s all bullshit.”
In this test, they did not get a model to produce an entire book, even with the right prompt.
For context: those two sentences are 46 tokens / 210 characters, per https://platform.openai.com/tokenizer.
So 50 tokens is just about two sentences; this comment itself is about 42 tokens.
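If you want to check counts yourself without the web page, here's a minimal sketch using OpenAI's tiktoken package (my assumption; the web tokenizer uses the same encodings, so the numbers should roughly match):

    # Minimal token-counting sketch (assumes the tiktoken package is installed).
    import tiktoken

    # "o200k_base" is the encoding used by newer OpenAI models; adjust if needed.
    enc = tiktoken.get_encoding("o200k_base")

    text = "Yet most AI models can recite entire Harry Potter books if prompted the right way."
    tokens = enc.encode(text)
    print(f"{len(tokens)} tokens, {len(text)} characters")

Different encodings give slightly different counts, so treat the numbers as ballpark figures.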

Note: if we throw out copyright, it's OK for LLM companies to just steal any and all data under the sun. Example: Anthropic buys books, digitizes them, and then uses that data to train its LLMs.
It’s a nasty side effect.
On the other hand, no company can claim ownership of anything you've generated with its LLMs.