Tech companies agree to feed news stories to AI bots. That’s a great concern for journalists
On Nov. 30, 2022, San Francisco-based OpenAI released ChatGPT, a chatbot capable of generating text almost indistinguishable from that written by a human. It wasn’t perfect (the bot had a tendency to “hallucinate” facts), but it still prompted journalists to wonder whether the bot might eventually take their jobs.
At least journalists could take comfort in the fact that they had something ChatGPT didn’t: currency. Back in 2022, the AI chatbot couldn’t access real-time information, having been trained only on information up to September 2021.
That all changed in September 2023, when OpenAI announced that the chatbot was now up to date. Users could ask the bot questions about current affairs.
Later that year, the New York Times sued OpenAI and Microsoft, alleging in a court filing that the tech companies had unlawfully used the Times’ copyrighted articles to train their software. Microsoft and OpenAI sought to “free-ride on the Times’ massive investment in its journalism … without permission or payment,” reads the complaint.
The lawsuit added another layer of complexity to the discussion surrounding journalism and AI: the technology posed a threat not only to individual journalists and their livelihoods, but also to the field itself, by providing free access to hard-won articles.
The chatbot’s outputs were “similar, but not identical” to Times articles, responded OpenAI in a motion. “Contrary to the allegations in the Complaint, ChatGPT is not in any way a substitute for a subscription to the New York Times,” the motion notes. “One cannot use ChatGPT to serve up Times articles at will.”
The motion added that it is “perfectly lawful to use copyrighted content as part of a technological process that (as here) results in the creation of new, different and innovative products.”
Journalism advocates are not convinced.
Generative AI is a threat to journalism and everything it stands for, said Courtney Radsch, director of the Center for Journalism and Liberty at the Open Markets Institute, a Washington, D.C.-based non-profit. In an interview with the Star, Radsch argued that AI companies are operating on a “model of theft,” and getting away with it.
Legislators need to step up, Radsch said; otherwise journalism’s business model, and with it democracy, will be at risk.
What’s the problem?
Artificial intelligence systems are built on the theft of intellectual property from publishers — from the news sector.
Journalism is a core and fundamentally important part of the training sets and models that underlie AI systems. It is essential to making AI applications such as search, chatbots and other products and services that rely on contextually accurate, timely, relevant information.
And yet (large tech companies) are building their products and services on a model of theft. They are creating value for their investors, creating their product using journalism, and yet they have not paid licensing fees, they have not received permission, and we know that this information is being taken from news outlets and journalists, even when it was put behind a paywall.
If we don’t address this fundamental problem with how the AI market is developing, a series of problems follows. There will be no business model for journalism, and this critical democratic institution will devolve.
Not having journalism (in AI training) is also not the answer. If you don’t include news and quality information in AI systems, if they’re trained on too much synthetic data, the systems will degrade and potentially collapse. And they will regurgitate problematic information.
How does this trend affect the average consumer?
The average consumer relies on journalism for a wide variety of things. In places where you don’t have journalism, there is more corruption, more polarization, less civic discourse. People are less informed, less able to participate as active citizens in their democracy.
I feel like it’s a little bit far removed from their realities because it’s hard for people to even understand the value of journalism in their communities. A lot of times people don’t necessarily know the role that journalism plays in making their communities safer or more accountable.
Do you see any benefits of AI for journalists?
AI has been in newsrooms for a long time. Newsrooms and journalists have been using AI to help with translation and transcription. There’s all sorts of potential for AI in terms of investigative work such as information analysis and retrieval. But I think that integrating generative AI into the newsroom in ways that replace human creativity or human ingenuity and intelligence is problematic.
The problem is that these companies are creating partnerships with news organizations to integrate their tools into the newsroom. It’s problematic because it deepens dependence on the same big tech platforms that stripped newsrooms of their value and constrained their independence in the social media era.
Just playing devil’s advocate here: Why is it so bad if journalism jobs are replaced by AI?
AI is based on prediction; it enforces a mediocrity and genericness because, again, it’s the prediction of what is most likely to occur in that string of text, or that relationship between an image and a word. I get concerned about what that’s going to do in terms of outputs … cutting off the outer edges, the extremes. You’re going to see this vast middling of human creativity.
If you’re creating an AI, (you need to ask:) who gets published? Which ideas get published? It’s overwhelmingly white, overwhelmingly male. There’s already broad familiarity with the idea of bias and how it gets replicated in these systems. It goes way beyond just, say, racial bias; there’s ideological bias too.
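As a rough illustration of the prediction point above: a greedy text generator always selects the single most probable next word, so distinctive phrasing at the “outer edges” of the distribution never surfaces. This minimal Python sketch uses words and probabilities invented purely for the example:

```python
# Invented example of greedy next-token prediction: the decoder always
# picks the single most probable continuation, so vivid phrasings in the
# tail of the distribution are never chosen.
candidates = {
    "said": 0.41,       # generic, high-probability continuation
    "remarked": 0.22,
    "thundered": 0.03,  # distinctive phrasing sits at the "outer edges"
    "quipped": 0.02,
}

def greedy_next_token(distribution):
    """Return the most probable token: the middle of the distribution wins."""
    return max(distribution, key=distribution.get)

print(greedy_next_token(candidates))  # -> said
```

Real systems sample with some randomness, but the pull toward high-probability, middle-of-the-distribution text is the tendency Radsch describes.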
Would you say generative AI is more of a threat than an advantage?
The fact is that generative AI is out there. It is a fundamental threat to the continuation of the news industry as an independent sector of the economy, and I think that it could pose a threat to journalism as a profession and as a human endeavour, if it gets too integrated.
At the same time, there are huge opportunities to do investigations and collaborations and make sense of and discover things in data, and I think that’s really interesting. I don’t think that journalism has the option to just opt out, although I think there will be news outlets and journalists that decide: our value proposition, our uniqueness, is going to be that we are human-first, human-centred, human-only.
What should we do?
We need policymakers to step up and do their jobs. They need to clarify that copyright applies to the data used to train and develop AI systems. We need them to clarify that stealing this data is not fair use. And we need them to impose appropriate market constraints on the power of these tech companies to just completely reconfigure our entire economic and political system by rolling out these really powerful and dangerous products that are not ready for prime time.
In no other industry would you be able to put out a product that plagiarizes, defames and libels, and produces unsafe guidance and advice. Most other industries have testing requirements or safety requirements. They have licensing requirements. And yet somehow we’ve decided that because it’s tech, they’re free from any sort of traditional constraints or safeguards.
Any thoughts on the lawsuit between the New York Times and OpenAI/Microsoft?
I think it’s great. I think we need a lot more lawsuits like that. What I really like about the OpenAI lawsuit is that the New York Times is asking for the destruction of the model. I think that’s very important because we have to take back power in order to control and regulate market actors.
The Federal Trade Commission in the U.S. has already used algorithmic destruction. It has forced companies that have created algorithms based on ill-gotten data or privacy-violating data to destroy the algorithms built on that data. We should absolutely be doing the same thing for these large language models, because they’re not safe and are incredibly disruptive, and there are no safeguards and no constraints.
Is it too late to fix the problem?
I definitely think there’s still time to address the core and fundamental problem; you can certainly require retroactive payment to news publishers. Then you would essentially compel the companies to develop the technology to trace provenance and authenticity. That has all sorts of benefits.
Provenance is being able to tell where the data comes from. And if you can tell where it came from, then you can determine: is it really what it claims to be? If you can trace the original author, website or database of that work, then you can trace authenticity. It will allow you to figure out who should be compensated for the use of that data.
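To make the provenance idea concrete, here is a deliberately simplified, hypothetical sketch in Python: each work is registered under a cryptographic hash of its content along with its source, so a verbatim copy can later be traced back to the outlet that should be credited. A real provenance system would need fuzzy matching and signed metadata rather than exact hashes; the outlet and headline below are illustrative, not real.

```python
import hashlib

# Hypothetical provenance registry: content hash -> originating outlet.
registry = {}

def register(text: str, source: str) -> None:
    """Record where a piece of text originally came from."""
    registry[hashlib.sha256(text.encode()).hexdigest()] = source

def trace(text: str) -> str:
    """Look up a verbatim text and return its source, if known."""
    return registry.get(hashlib.sha256(text.encode()).hexdigest(), "unknown origin")

register("City council approves the transit budget.", "Example Gazette")
print(trace("City council approves the transit budget."))  # -> Example Gazette
```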
Any final thoughts?
My big concern is that we’re seeing big tech make deals with a handful of big media and everyone else is getting left behind. I know everyone is out for themselves, but that is not in the interest of the news ecosystem, or our society more broadly. What we need is a collective industry approach.
Even the biggest, best papers and news outlets depend on local news that percolates up the ecosystem. That’s why we have to have public policy options. Self-regulation and independent licensing deals are not sufficient, and will ultimately be more harmful than helpful.
This article was first reported by The Star