• MrScottyTay@sh.itjust.works
      14 days ago

      Sadly I think it’s more that there isn’t really a standard way to buy books and other media in bulk at the scale AI training usually requires. So the companies realise they can save both time and money by just pirating, after calculating the risk of a fine. It’s just a bonus that they usually get away with it and that the fines would likely be cheaper than a legit transaction. But I do think it’s the bulk data packaging that makes piracy look more attractive to them from the get-go.

      Heck, even video game publishers often source the ROMs for their official re-releases from pirated copies, because pirates are better at preserving data and keeping it in a nice, friendly format. It’s easier to search for it on the web and download it than it is to go into their own archives and rip it themselves, if they even still have original copies, cause they sure as hell didn’t keep their source code.

    • UnspecificGravity@piefed.social
      14 days ago

      It’s not stealing when corpos do it.

      Meta torrented their training data from The Pirate Bay. Hell, Spotify initially built their catalog from pirated music. They all do this shit. Corporations are built to steal our shit and sell it back to us. This isn’t any different from pumping oil out of public lands and selling it back to us.

        • Grimy@lemmy.world
          14 days ago

          In his June ruling, Judge Alsup agreed with Anthropic’s argument, stating the company’s use of books by the plaintiffs to train their AI model was acceptable.

          “The training use was a fair use,” he wrote. “The use of the books at issue to train Claude and its precursors was exceedingly transformative.”

          However, the judge ruled that Anthropic’s use of millions of pirated books to build its models – books that websites such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) copied without getting the authors’ consent or giving them compensation – was not.

          Pirating isn’t, but training on copyrighted works is fair use; you just have to buy them.