Cory Doctorow on the Right — and Wrong — Way to Criticize AI

Cory Doctorow

I went to the Consumer Electronics Show (CES) in Las Vegas this year with Ed Zitron. He brought some of his favorite tech critics to go and make fun of it for his podcast. Everything at CES this year was just a chatbot integrated into a thing: a chatbot in a toy, a chatbot in an appliance, a chatbot in a brick wall. Our question for all of them was what they’ll do if OpenAI goes under, or if OpenAI starts charging a hundred times what they do now for tokens. And they all said they’d switch to Chinese models — which, so long as you never ask your robot companion about Tiananmen Square, may or may not work.

But the point is that if you can use a Chinese model, you can use a local model. The heart of AI mania is not just a bet that you can use automation to replace a worker but also that automation can be proprietary to the firm you’re investing in. If you can use automation to replace a worker, but the firm that generates that automation cannot capture the value of that displaced worker’s salary, then that’s important economically — and it’s important for the workers certainly — but I don’t really understand what the investment story is there. If that was your investor pitch, I don’t know where you’d get the $2 or 3 trillion that Sam Altman says that you’d need to spend in order to make the industry actually do what it claims it can do.

Cory Doctorow

The argument goes that AI companies that take works to train on are breaking copyright law as it stands, and I think many people do not understand how weak and contentious a legal argument that is.

You can break AI training into three steps, all of which I think are arguably legal under copyright, and all of which are used for legitimate activities that I think most of us are very happy exist in the world.

The first step is scraping the internet and making transient copies of words.
If you can only scrape the web with explicit permission from the copyright holders to the works you’re scraping, then Google is the last search engine we’ll ever get, because no one else will have the capital and the reputation to get those permissions. We would also lose our archives. Making copies of, say, a corporate website before and after the Trump administration comes in and seeing everything they’ve changed about DEI and labor policy and fairness — that’s a socially useful activity that you only get if you’re allowed to scrape the web.

Step two is performing a mathematical analysis on a work. In the case of a large language model, it’s counting words, how far apart those words are, and how often one appears near another — within one word, within two words, and so on. You don’t need a copyright holder’s permission to derive facts from a creative work. You can count all the adjectives in the lyrics on a CD or make a dictionary citing where each word was first used, and all of that would be extinguished if we created a new regime where those activities require permission.

Labor rights didn’t arise because we got labor law. Labor rights arose because we asserted the rights, and then the law followed.

And then, the final piece of making a model is publishing the facts. Software is a literary work; that’s why it’s covered by copyright. A model is a literary work full of facts about other literary works. It’s basically the proximity of words in some abstruse vector space that is populated by counting all the words in everything we could find. And again, publishing compendia of facts about copyrighted works is not a thing that you need copyright permission for.

Some people may disagree with me, and even people who agree with my legal analysis might wonder if we can solve our problems by wordsmithing a law that would preserve all of these beneficial activities and still prohibit the creation of AI models.
And my answer to that is no.

We have been expanding copyright for forty years. Copyright covers more kinds of works — it covers more uses of those works — and statutory damages are higher and easier to secure. The media industry that pushed for those copyrights is larger and more profitable than it’s ever been, and the share of income going to creative workers is lower than it’s ever been.

The answer to this seeming riddle is that giving more bargainable rights to creative workers in a market dominated by five publishers, four studios, three labels, two companies that control all the apps, and one company that controls all the e-books and audiobooks is like giving your bullied kid more lunch money. There is no amount of lunch money you can give that kid that will get them lunch, because the bullies will take it away.

The media industry is somewhat explicit about this. When Midjourney was sued by Disney and Universal, I got a press release from the CEO of the Recording Industry Association of America that basically said, “We’re really disappointed that Midjourney took all these creative works from media companies rather than licensing them, because we could have just done a partnership.” It’s not that media companies don’t want to use AI to replace creative workers; it’s that they want to get paid for the training data and, presumably, to have some guardrails on the model that gets produced.

Really, any time you’re pushing for legislative action that your boss likes, you should ask yourself if you’re on the right side.
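
As an aside on step two above: the word counting Doctorow describes (how often one word appears within one word, within two words, and so on, of another) can be illustrated with a toy sketch. The function name `cooccurrence_counts`, the `max_distance` parameter, and the sample sentence are all hypothetical, and this is a simplified picture of the counting he describes, not how any particular model is actually trained.

```python
from collections import Counter


def cooccurrence_counts(text, max_distance=2):
    """Count how often one word is followed by another at each distance.

    Returns a Counter keyed by (first_word, second_word, distance), where
    distance 1 means "the very next word", 2 means "two words later", etc.
    """
    words = text.lower().split()
    counts = Counter()
    for i, word in enumerate(words):
        for distance in range(1, max_distance + 1):
            j = i + distance
            if j < len(words):
                counts[(word, words[j], distance)] += 1
    return counts


# Hypothetical sample text; any string of words would do.
sample = "the cat sat on the mat and the cat slept"
counts = cooccurrence_counts(sample)

# The result holds only tallies about the text, e.g. how many times
# "cat" directly follows "the".
print(counts[("the", "cat", 1)])
```

The output is a table of numbers about the text (which words appear near which, and how often), which is the sense in which the passage above calls this step deriving facts from a work.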
