
Authors Are Furious After Finding Their Works on Huge List of Books Used To Train AI

A new search tool lets authors check whether their books appear in a dataset used to train artificial intelligence, and many authors are infuriated to find that they do. The dataset, known as Books3, contains a staggering 191,000 copyrighted books, which were used to train a large language model developed by Meta. This was done without the knowledge or consent of thousands of authors, whose works were stolen, copied, and fed into a machine that now relies on all that copyrighted material to generate text in response to users' prompts.


The dataset has already sparked a lawsuit against Meta from writers Richard Kadrey, Sarah Silverman, and Christopher Golden. The lawsuit alleges that Meta secretly used copyrighted works obtained through a pirating website to train its AI. It further asserts that training AI on these books constitutes copyright infringement because the machine can't function or output any text without the expressive information it draws from copyrighted materials. The case may finally produce definitive answers about whether training AI on existing works constitutes copyright infringement.

Many artists and writers believe it is copyright infringement for their works to be used to train AI, even if the text the AI generates doesn't resemble their works. It's not just a matter of copyright infringement but also of consent. These writers did not consent to their work being used for this purpose. It was done secretly, with Meta even falsely claiming that it trained its AI only on publicly available, open-source materials, which copyrighted books are not. Meanwhile, Meta offered these authors no compensation for using their works. Further lawsuits may follow now that a new search tool allows more authors to find out whether their works were used in the dataset.

Authors react to finding their works in Books3 dataset

The Atlantic's Alex Reisner has designed a new search tool that lets authors search the Books3 dataset to see if their works were used. Reisner was able to identify 183,000 of the books in the dataset, along with their authors. Using this information, he created a tool that lets authors search their names to find out whether their books are among those 183,000. However, it's important to remember that the list is incomplete and that there is a very small chance of false positives. Many authors have already begun using it, though, and the reactions have been swift and furious.

The authors largely express the same outrage and disgust. Many of them have devoted their lives to writing and spent years getting their books on shelves, only to learn that those books are being used to train AI. Not only that, but many writers are already concerned about how AI threatens their careers. Now, without their consent, their works are being used to help train and advance the very machines they fear could one day replace them. Other authors pointed to the many problems plaguing the writing industry, where many writers are underpaid and barely earn enough to survive. On top of all that, they now have to deal with their work being stolen from them without compensation.

One particularly interesting perspective came from Kelly Jensen, who noted that the same books of hers that were stolen for the dataset are also being banned and challenged. Book bans have been sweeping across the nation, subjecting many authors to censorship and stripping away their artistic freedom. It's horrific to think that Jensen's actual book isn't allowed on bookshelves in some states, while pieces of it can potentially be stolen and repurposed, then make their way to a library in a different form to profit some thieving AI developer.

All of these authors' reactions are understandable and valid. They already face many challenges, from book bans to unlivable wages. Now AI developers, expecting to make enormous amounts of money from their AI systems, have added another injustice to writers' plates. These writers did not give companies like Meta, Bloomberg, and OpenAI permission to use their works, and they did not want these companies to use them. One would think these demands are reasonable and easy to respect. Yet AI developers have blatantly disregarded them and violated these authors' life's work. Sadly, many of these authors may not have the resources to take legal action. We can only hope that the current lawsuits against these companies will give thousands of authors some form of justice.

(featured image: josefkubes / Getty)

Rachel Ulatowski
Rachel Ulatowski is a Staff Writer for The Mary Sue, who frequently covers DC, Marvel, Star Wars, literature, and celebrity news. She has over three years of experience in the digital media and entertainment industry, and her works can also be found on Screen Rant, JustWatch, and Tell-Tale TV. She enjoys running, reading, snarking on YouTube personalities, and working on her future novel when she's not writing professionally. You can find more of her writing on Twitter at @RachelUlatowski.