Context Window 5

This edition examines the ALCS report on authors and AI training, Pleias’s openly-licensed and energy-efficient LLMs, new content deals from OpenAI/Future Plc and Bertelsmann/ElevenLabs, the risks of AI-generated podcasts, and practical ideas for working creatively with AI.

The big publishing news this week was the publication of the Authors’ Licensing and Collecting Society report A Brave New World, based on a survey of more than 13,500 authors. The report rightly calls for transparency and remuneration when authors’ works are used in training AI, but its calls for attribution and credit do not fully address the difficulty of attributing specific influences within a vast training corpus. Most AI systems do not retain or reference individual works directly; instead they derive patterns from large datasets, which makes it nearly impossible to trace a specific contribution back to an individual author. As Benedict Evans argued over a year ago, training data is valuable in aggregate. This practical challenge points toward a need for compensation models that don’t rely solely on direct attribution, such as collective licensing.

Interestingly, at the same time as this debate, European AI company Pleias released a new family of LLMs which stand out for being trained exclusively on openly licensed data, and for their energy efficiency: the most capable of the models has approximately 12% of the carbon footprint of Meta’s Llama 3.2. It’s encouraging to see these factors emerging as points of competitive differentiation.

Two interesting deals were announced in the last week. OpenAI is working with the FTSE 250 publisher Future Plc. The deal integrates Future content with ChatGPT search outputs, and includes use of OpenAI tools for sales, marketing and publicity. The more interesting aspect to me is the development of chat interfaces for specific content brands, such as Tom’s Hardware. I spoke to another publisher this week which is shifting from viewing AI as a productivity enabler and/or licensing opportunity to integrating it with their core product. We’ll see more of this trend in 2025.

The other big announcement was Bertelsmann’s global deal with AI audio platform ElevenLabs: 36 business units within the media group are already using ElevenLabs tools, and the platform is setting the pace for audio AI developments.

On the subject of audio, CJR ran an interesting piece this week on the risks of AI-generated podcasts being used for misinformation. The ease with which such podcasts can be produced highlights the growing responsibility of publishers to ensure trustworthiness in AI-generated content.

Incidentally, if you haven’t tried this yourself, Google’s NotebookLM can produce a reasonably effective podcast-style audio discussion from minimal input data. I recently used this as part of a consulting project, and there’s a good tutorial here if you want to give this a try.

Concluding with a practical focus, this HBR piece has twelve ideas for working creatively with AI. I particularly liked number eight, using AI to get a response from a different perspective (asking AI to take the opposing side of an argument is one of the things I find it most useful for).

This was originally published in my email newsletter. To receive weekly updates on how AI is affecting the publishing industry, sign up here.

Written on December 6, 2024