Context Window 30
Thirty weeks of doing this: thanks for sticking with me. Someone asked this week if this newsletter will be going paid, and to be really clear: NO. I enjoy researching and writing it, and it's a wonderful way to start conversations. However, if it's something that you find useful, please do share it with your colleagues. Personal referrals make a huge difference to me, and every one is genuinely appreciated.

A couple of data points on how publishing is using AI. The Bookseller published the first results from its recent survey of publishing workers, which showed a high level of concern about AI: 58% of respondents were concerned and only 18% optimistic (interestingly, almost an exact reversal of the proportions in a recent survey on Spanish-language publishing). The Bookseller research also showed 20% of respondents using AI on a daily basis, compared to roughly 10% in the general economy. However, as I highlighted on Bluesky, a 100-person, self-selecting sample raises real questions about representativeness. Best to treat this as directional rather than definitive.

One of my issues with the Bookseller data is that it doesn't break down responses by publishing area or segment. It's long been my experience that academic publishers and university presses are ahead of their trade counterparts, and this fabulous compilation of 167 academic publishing AI use cases by Helen King confirms that. From metadata to accessibility to peer review, it's a great benchmark for any academic publisher.

From 1995 to 2019, Mary Meeker's Internet Trends report was an annual fixture for analysts and investors. It relaunched this week in a new format: 340 pages of research and data on how AI is being used across the economy.

OpenAI announced a slew of new business features for ChatGPT, including integrations with SharePoint, Teams, Outlook, Gmail, Google Drive and Dropbox, and the ability to record meetings. All of these are highly useful for publishers looking to bring their own data into their AI model's context window (there's a minimal sketch of that underlying pattern below). The documentation is available here. But this level of integration with core systems will require customer trust in data security and control, and on that front OpenAI hit a setback…

As part of the ongoing New York Times copyright litigation, a judge has ordered OpenAI to preserve logs and data, including chats deleted by customers or held in temporary, private sessions. Beyond chat interactions, this also affects third-party services built on OpenAI's API, which may now conflict with some platforms' data-deletion guarantees and customer instructions. Does your business have a retention policy or guidelines for AI, and are any of your third-party agreements affected by this?

If you want maximal control of your data, one option is to run LLMs locally on your own computer rather than using online services. I spoke to a UK independent publisher recently that was taking that approach, though it isn't one for the unskilled or faint-hearted. If a hardcore approach appeals, there's a tutorial and a link to resources here (and a second sketch below of what running a model locally can look like).

Interesting data here showing that in recent weeks Reddit has jumped to become the second-largest source of cited data in ChatGPT, behind only Wikipedia (I can't quite bring myself to describe it as a source of truth, as the original article does). If you want to influence the model output, shape the training data… Reddit's data licensing deal with OpenAI may also explain the latest piece of litigation: Reddit is suing Anthropic for unauthorised use of its data.
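To make the "context window" point above concrete: this is not OpenAI's new connectors, just a minimal sketch of the underlying pattern of putting your own text into a model's context alongside a question, using the standard OpenAI Python SDK. The file name, model choice and question are placeholders of mine, not anything from OpenAI's announcement.

```python
# Minimal sketch: pull your own document into the model's context window.
# Assumes the official OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY set in the environment. The file name is hypothetical.
from openai import OpenAI

client = OpenAI()

# In a connector setup this text would come from SharePoint, Drive, etc.;
# here a local file stands in for "your own data".
with open("rights_report.txt", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system", "content": "Answer using only the supplied document."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: What are the key findings?"},
    ],
)
print(response.choices[0].message.content)
```

The connectors automate the fetching step, but the trade-off flagged above is the same: whatever you place in the context window is data you are sending to the provider.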
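And picking up the local-LLM point: a minimal sketch of what "running a model on your own machine" can look like in practice, assuming the llama-cpp-python package and an open-weights model file in GGUF format that you've already downloaded (the file name here is a placeholder). This is illustrative, not the linked tutorial's method.

```python
# Minimal sketch of fully local inference with llama-cpp-python
# (pip install llama-cpp-python). No data leaves your machine.
from llama_cpp import Llama

# Path to a downloaded GGUF model file; this name is hypothetical.
llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise the case for local LLMs in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The appeal for the publisher I spoke to is exactly what the comment says: prompts, documents and outputs all stay on hardware you control, at the cost of setup effort and weaker models than the frontier services.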
I spoke at a webinar for the Open Institutional Publishing Association this week, where a wide-ranging discussion took in the interesting question of how AI training and content sits alongside copyright and Creative Commons licenses. I'm really grateful to fellow panelist Jane Secker for sharing this link to CC resources on the subject.

For UK readers, Politico has a good roundup of how the Data (Use and Access) bill has become a legislative pain for the government, and in time possibly for creators too: it suggests that, having taken rigid positions, some rightsholder groups may struggle to sell an eventual compromise to their stakeholders… For context, any UK or other sovereign regulation of AI is likely to attract the attention of the US government, which this week rebadged the US AI Safety Institute as the Center for AI Standards and Innovation; its remit includes "guard[ing] against burdensome and unnecessary regulation of American technologies by foreign governments".

Finally, the most thought-provoking thing I read this week was this piece by a developer on how AI is useful and why his smart, thoughtful friends are wrong about its utility. I think it's interesting to read across from this to professional writing. AI-generated text will never compete with great writing (the author John Scalzi made this point this week). But just as this piece argues that some code isn't meant to be beautiful, a lot of writing rests not on great craft or artisanship but on speed and availability. Maybe AI has little to offer great writers. But a lot of writing is done by people who don't find it easy, for a whole variety of reasons. Yes, there are profound questions about copyright and training data. But framing AI as a thieving stochastic parrot doesn't fully capture the upside for a lot of users. Which brings us full circle to one of the respondents quoted in that Bookseller research: "If you've never experienced what it's like to constantly feel your brain 'has let you down' in a neurotypical world, [you] can't understand how impactful AI is. I personally think it's life changing for people who are neurodivergent."
This was originally published in my email newsletter. To receive weekly updates on how AI is affecting the publishing industry, sign up here.