Yes, but how much of the training data is synthetic? Because I expect this startup has no idea. Microsoft uses ML to crawl files on OneDrive to build aggregate models of document types, then uses those for LLM training.
It’s just all slop all the way down, huh? Just a fuzzy picture of a fuzzy picture hit with the “sharpen” filter 20 times?