Context Window 6
This edition rounds up a packed week of tech news—OpenAI’s Sora rollout, Google’s Gemini 2 and Deep Research, and Reddit’s new AI search tool—alongside new public-domain training datasets, AI deal trackers, the impact of AI clauses in publishing contracts, and a four-part model of how organisations use AI.
It’s been a big week on the technology side. OpenAI is halfway through its seasonal 12 days of product announcements: a full round-up will follow after day 12, but the biggest announcement at the time of writing was the rollout of its Sora video creation tool in most markets except the UK and Europe, owing to regulatory issues (the Online Safety Act in the UK and the Digital Services Act/GDPR in Europe).

Meanwhile, Google released Gemini 2, which offers improved multimodality, audio and image creation, and support for AI agents. There’s also a new tool, Deep Research, intended to facilitate researching and compiling complex reports, which looks directly relevant to various publishing use cases. Historically, I’ve seen Gemini as something of an also-ran (despite running my business on Google Workspace): for most tasks I’ve defaulted to ChatGPT or Claude, and I can’t justify subscriptions to every major model. But this release is a good prompt to revisit Gemini’s capabilities.

The tech announcement this week that most interested me from a publishing perspective was Reddit’s release of an AI tool to search and summarise content from its platform. First, Reddit’s data has previously been licensed to developers including Google and OpenAI, and the new tool shows that extracting value from licensing is not incompatible with using the same data for product development. Second, on the wisdom-of-crowds front, access to a platform that generates 500m+ posts annually is a really interesting market research tool for publishers, both for identifying trends and for understanding sentiment. At times you can debate the level of wisdom involved on Reddit, but it’s a very large crowd…

Closer to publishing, there were a couple of interesting announcements about training data this week. Harvard Law School’s new Institutional Data Initiative is creating a dataset of over 1 million public domain books from its collection; the HLS announcement doesn’t mention this, but the project is supported by Google and OpenAI. Similarly, Authors Alliance, supported by the Mellon Foundation, has announced plans for a public interest training commons. Beyond their immediate application to model development, robust public domain datasets undermine the position taken by some AI developers that scraping copyrighted content is a necessity for model training. Not that this will stop licensing or litigation any time soon.

I’ve just completed an end-of-year update for a client and found a couple of good resources to bookmark: Press Gazette has a tracker of which media companies have done deals and which have gone to law; it doesn’t really cover books and journals, but Ithaka has its own useful deal tracker for scholarly publishing.

Having found the Authors Alliance Substack through the announcement above, I was also really interested in their post on how AI clauses in publishing contracts could restrict scholarly research. The post includes a practical proposal for a contractual framework, which would be a good starting point for a discussion.

Finally, for anyone interested in a deeper read on how AI changes management, Johns Hopkins academic Henry Farrell has published a really good piece setting out a four-part model of how AI is used in organisations: micro-tasks, knowledge maps, prayer wheels and translation. It’s worth reading in full, as a short summary won’t do it justice. However, all four concepts resonate with organisational behaviours I’ve observed in publishing.
This was originally published in my email newsletter. To receive weekly updates on how AI is affecting the publishing industry, sign up here.