Thousands of writers including Nora Roberts, Viet Thanh Nguyen, Michael Chabon and Margaret Atwood have signed a letter asking artificial intelligence companies like OpenAI and Meta to stop using their work without permission or compensation. "It says it's not fair to use our stuff in your AI without permission or payment," said Mary Rasenberger, CEO of The Authors Guild. But some literary figures are willing to fight the tech companies in court. Authors including Sarah Silverman, Paul Tremblay and Mona Awad recently signed on as plaintiffs in class action lawsuits alleging Meta and/or OpenAI trained their AI programs on pirated copies of their works.
AG Recommends Clause in Publishing and Distribution Agreements Prohibiting AI Training Uses

The Authors Guild has drafted a new model clause to prohibit the use of an author’s work for training artificial intelligence technologies without the author’s express permission. Authors and agents can request this clause be added to their contracts.
I think currently the AI companies are passing the buck onto the creator of the AI-generated content. So the AI companies are saying, we're not infringing on copyright when we feed this book or that book into our machines, but if you use the AI output, then you should seek permission from us. https://nolongerset.com/chatgpt-copyright/
Would it bother you if your published book was used as text data in an AI language model without your permission?
It should be fraud, really. When you buy a book, you own the copy, but copyright law forbids you from doing certain things, like making copies or using it for any sort of commercial purpose. AI training is a commercial purpose because the models make these companies money. So I'm not sure why it's legal. And even if it does become illegal, how do you prove that they did it? The AI scientists don't care. They're here to make history and a bunch of money in the process. Our wishes as authors don't matter to them. We're nothing in their eyes. Contrary to what OpenAI's name suggests, their models are closed-source, so there's no way to tell what data they trained ChatGPT on.
How did these authors find out their works were used to train AI? How do we, as writers, find out if our own work has been used like this? I think this AI thing is so wrong when it comes to writing. It seems like if you use AI, you might not even know whose work you're stealing, but I think you should care. I've also heard that any prompt or piece of writing you upload to these AI "writing" programs is retained and used to train the model further. I won't touch this AI stuff as a writer. I've read about it, but I have zero interest in trying it out, personally.
You can ask the AI to quote from the books. If it can produce accurate or close-enough sentences, it was trained on that data. But it isn't a reliable indicator, as the creators can re-train the model to dodge regurgitating the original material. OpenAI has demonstrated, over and over, that they can influence what their models will and won't say.
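The quotation test described above can be sketched as a rough heuristic: prompt the model for a passage, then measure how closely its answer matches the real text. This is only an illustration, not proof of training-data membership; the function name, the 0.8 threshold, and the example strings are all made up for the demo.

```python
from difflib import SequenceMatcher

def quotation_probe(model_output: str, known_passage: str,
                    threshold: float = 0.8) -> bool:
    """Return True if the model's attempted quotation is close enough to
    the real passage to suggest the text may have been in its training data.
    The 0.8 threshold is an arbitrary illustrative choice."""
    ratio = SequenceMatcher(None, model_output.lower(),
                            known_passage.lower()).ratio()
    return ratio >= threshold

# Hypothetical comparison: an exact reproduction vs. a loose paraphrase.
real = "It was the best of times, it was the worst of times."
verbatim = "It was the best of times, it was the worst of times."
paraphrase = "Those days were simultaneously wonderful and terrible."

print(quotation_probe(verbatim, real))    # -> True (identical text)
print(quotation_probe(paraphrase, real))  # -> False (low similarity)
```

As the post notes, a negative result means little: a model can be tuned to refuse or garble quotations from material it was nonetheless trained on.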
A lot of us probably have some of our writing online that could easily have been absorbed by AI. And, again, if you upload anything to an AI, it will use it to keep training itself, along with whatever it produces as a result. This video was pretty interesting. Wanted to share it.
It's not a database. A database is a file structure that can be queried by an accompanying piece of software to process data (fetch it, show it, remove it, etc.). AI uses a "dataset". It's something like a database, but it doesn't work the same way. What the AI keeps after training isn't the dataset itself but a convoluted structure of data distilled from sources like novels, articles, blogs, and any other piece of digital text they feed it. It's not easily readable and accessible like a traditional database. But the AI "understands" it, of course, so the obvious thing is to use the AI itself as an interface to that data by simply asking it, as I said.
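The database-vs-dataset distinction the post draws can be made concrete with a toy sketch. Everything here is illustrative (the dictionary, the key names, and the stand-in weights bear no resemblance to how any real system stores data):

```python
# A database stores records verbatim and returns them on an exact query.
library_db = {
    "moby-dick:ch1": "Call me Ishmael. Some years ago...",
}
print(library_db["moby-dick:ch1"])  # the stored text comes back unchanged

# A trained model keeps no such records. After training, the source text
# itself is gone; what remains is a pile of numeric weights distilled
# from it.
weights = [0.12, -0.98, 0.33]  # stand-in for billions of learned parameters

# There is no weights["moby-dick:ch1"] lookup. The only interface to what
# the model "knows" is prompting it and reading what it generates, which
# is why the post suggests asking the AI itself.
```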
This question of AI training and copyright is one of the core issues in the Writers Guild strike. I support their position.
(I accidentally posted this in the other thread too but meant this one.) Some of you may already know this or have tried it, but I asked ChatGPT about some of these topics:

Source of Training Data (Default, GPT-3.5)

User: Do you remember the source of text you were trained on?

ChatGPT: I don't have access to my training data, but I was trained on a mixture of licensed data, data created by human trainers, and publicly available data. My training also involved large-scale datasets obtained from books, websites, and other texts to develop a wide-ranging understanding of human language.

User: What do you mean by you don't have access to your training data? Do you not search your training data to provide humanlike responses to the questions that people ask you?

ChatGPT: You're correct, I use the knowledge and patterns learned during my training to generate responses. However, I don't have the ability to search or directly access specific details about the individual sources that were used in my training. My responses are based on a mixture of general knowledge and information from various sources, but I can't pinpoint the exact origin of any particular piece of information.
This whole lawsuit business is stupid. They don't have the first clue as to how LLMs work. Suing the AI companies is akin to suing John Q. Public, professor in Bumbfuck, Idaho, for using Stephen King's On Writing to teach his class full of twenty-something-year-old brains. LLMs are just really large brains being taught. Right now, feeding these brains *insert random books here* is akin to teaching a two-year-old to talk. No one has stolen anything.
Honestly, no. Simply because my husband has a Master's in Machine Learning and has explained in detail what LLMs are and how they learn.
You know, that's a very common comparison. Often in the form of "well, what about human artists that read books?" I think the important differences are lost when that simplification of terminology is used, when machine learning is explained as if the machine is a human. No two-year-old is fed thousands of books line by line, nor is one digitally omnipotent, for one. Another fact is that machine learning will never have the same in/out functionality as a human being. It's human beings that have the right to consume literature and be inspired by it, not anything else.

At the same time, when the author of Tardigrade Boyfriend 14: Small Butts in Venice complains that AI-generated works aren't real art and are destined to hurt the industry, I'm hardly sympathetic. AI's literary deficiencies mean it's not a threat to anything other than the most predictable of formats and language, and who needs those anyway?

Edit: to bolster my point about the human/AI distinction rather than similarity, consider how much energy a child would need to consume to compete with ChatGPT: https://www.washington.edu/news/2023/07/27/how-much-energy-does-chatgpt-use/#:~:text=Overall%2C%20this%20can%20lead%20to,of%20over%201%2C000%20U.S.%20households.
In your example, unlike with machine learning, Mr. King will be paid for his work writing the book, through royalties on the sale of the textbooks. Using an author's work without compensating them is theft. At a minimum, these companies should be required to purchase the book, so the author receives something for their work. And that's completely ignoring the infringement of the work's copyright; the copyright should entitle the author to a fee. What these companies are doing is comparable to buying a DVD and then playing it publicly while charging an entrance fee.
Surely, the holder of a copyright has the right to determine how their book will be used. The lawsuit doesn't say "you can never use our books" - it says "you cannot use our books without permission or compensation."
That’s the thing… these authors have been compensated. 98% of these datasets were BOUGHT in bulk. Also, once a book is bought, we relinquish any control.
No, the professor isn’t having the students buy the book; he’s simply teaching from it, and yet no one is rushing to sue him… because it would be frivolous.
What’s being consumed/used isn’t the same as the output being generated. My analogy is accurate: LLMs are akin to that two-year-old.
Thing is, they weren’t bought in bulk at all. What has authors annoyed is that the content was scraped without permission or compensation. If someone wants an AI to read one of my books after purchasing it, that's not a problem, so long as it doesn’t regurgitate it verbatim for a user claiming it to be original work. But that’s not what’s happening here.

The fact that your husband has a degree in machine learning doesn't make him or you an unquestioned authority on everything to do with AI, nor does it trump the very many experts included in the preparation of the lawsuit in question.