Context Window 37
It’s been a significant week of contrasting AI developments on both sides of the Atlantic. In the US, the administration moved swiftly toward a more deregulated environment, signalling its preference for Silicon Valley-friendly policies on AI training and licensing. Europe, meanwhile, saw new releases emphasising transparency, environmental responsibility and public benefit, underlining a growing divergence in approaches and creating important challenges and opportunities for global publishers.

The US government published its AI action plan, which was broadly as expected but did not address copyright or intellectual property. However, President Trump gave a clear sense of the administration’s view in a speech the next day, stating that paid licensing of training data was not viable. His argument came straight from Silicon Valley talking points, in particular the view that a model training on copyrighted material is no different to a human learning from a book.

How the administration’s view affects ongoing litigation is unclear, but late last week, the judge in the Bartz v. Anthropic case granted a motion to allow a class action representing US authors and rights holders whose work was downloaded from shadow libraries—nearly 7 million books. The scale of the class presents questions: the Authors Alliance highlights some of the practical challenges here. If your work is part of this class, you may be contacted soon and face important decisions.

A Swiss consortium announced the release of a new, open LLM for the public good, developed in compliance with Swiss copyright and data protection laws and the EU AI Act. The training data set will be transparent and reproducible, and comprises over 1,500 languages, so the model may offer both a more ethical alternative to existing LLMs and global utility. Much remains unclear, including the name of the model and its performance versus larger peers, but this is a really distinctive approach and will be worth following.
In a second European development, French LLM Mistral released a comprehensive, peer-reviewed study of its environmental footprint. It shows that generating a single page of text represents about 1.14g CO2e—the equivalent of about a minute of watching streaming media, which few of us would think twice about. The same prompt uses about 0.05l of water, an order of magnitude better than previous estimates. It’s also likely that Mistral’s report understates the wider impact, as it won’t have the full economies of scale of larger models, but hopefully its bigger competitors will release similar data.

Substack published a new survey of how writers and publishers on the platform are using AI—interesting for other publishers because of what an important channel Substack has become for book authors: “Based on our results, a typical AI-using publisher is 45 or over, more likely to be a man, and tends to publish in categories like Technology and Business. He’s not using AI to generate full posts or images. Instead, he’s leaning on it for productivity, research, and to proofread his writing.” (I feel seen.)

For any publishers thinking about their use of AI, Matt Ballantine has published a good framework for selecting the right approach to AI, based on four options: plan, follow, tinker, adapt. Matt argues innovation typically comes from tinkering and adapting, but most businesses prefer planning and following. Read his piece for some smart thinking on how to resolve that disconnect.

Thanks to friend and subscriber Matt Haslum for sharing the new Shutterstock/D&AD report on AI and creativity. This is largely aimed at the wider advertising and design sectors. Pages 48–49 include a useful set of transferable creative principles. For publishers, embracing the report’s findings—“push for nuance, weirdness and edge”—could help content remain distinctive in an increasingly algorithmic media landscape.
New research analyses 500k prompts across a range of LLMs to show where AI models draw their responses. This is a fundamental question for anyone wanting their content to be visible within model outputs. Over a quarter of the data sources cited were journalistic content: if you want to influence outputs, that’s where to start. However, academic and research content, including journals and books, didn’t rank among the top three sources for any of the half dozen query types examined.

I came across an interesting approach to AI innovation this week: a new incubator programme pairing AI startups with writers and editors, largely from news publishing, who can help them tell their story. This takes quite a punchy view of the value of narrative, but it’ll be interesting to see what the incubator produces.

For academic and university press readers, the American Association of University Professors published a major report on AI, with clear recommendations around professional development, governance and transparency.

Finally, the coolest AI development I saw this week is Google’s new model Aeneas, which interprets damaged and incomplete Roman inscriptions and fills in missing words and context. Though niche, it’s a powerful illustration of how AI can extend the value and longevity of specialist or archival content. Caecilius is in the training data, not just in horto.
This was originally published in my email newsletter. To receive weekly updates on how AI is affecting the publishing industry, sign up here.