Context Window 8
For anyone taking stock of AI developments as the year turns, Simon Willison’s end-of-year roundup is phenomenally good. I recommend reading it in full for a comprehensive view of what happened and why it matters, but I have highlighted several points below that are particularly relevant to publishers.
The cost of using LLMs has dropped: GPT-4o now costs $2.50 per million input tokens, versus $30 for the less capable GPT-4 a year ago. Willison gives a worked example of generating image descriptions: the cost to generate nearly 70,000 descriptions is now under $2.00. Beyond that specific opportunity, which is very relevant to any publisher working on image accessibility, falling costs make it easier for publishers of all sizes to experiment with AI. The cost reduction is driven in part by increased model efficiency, which also reduces the environmental impact of training models. It doesn’t mitigate the impact of increased datacentre usage, though if Willison is correct, some of those new datacentres may not be needed (I really liked his comparison with the rail bubble in the nineteenth century).
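To make that figure concrete, here is a rough back-of-envelope sketch of the arithmetic. The per-image token counts and the per-million-token prices are illustrative assumptions (roughly where the cheapest multimodal models sat in late 2024), not quoted rates; the point is simply that bulk image description now costs single-digit dollars.

```python
# Back-of-envelope cost estimate for bulk image description with an LLM.
# All numbers below are illustrative assumptions, not quoted prices.

def bulk_description_cost(
    num_images: int,
    input_tokens_per_image: int,
    output_tokens_per_image: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated total cost in dollars for describing num_images images."""
    input_cost = num_images * input_tokens_per_image * input_price_per_million / 1_000_000
    output_cost = num_images * output_tokens_per_image * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Assume ~70,000 images, ~260 input tokens and ~100 output tokens per image,
# and budget-model pricing of about $0.04 per million input tokens and
# $0.15 per million output tokens.
total = bulk_description_cost(70_000, 260, 100, 0.04, 0.15)
print(f"Estimated total: ${total:.2f}")  # a little under $2 at these assumptions
```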
Increasingly, model developers are using AI-generated data to train models, without seeing the degradation in performance that some experts feared. For publishers, the questions are whether this lessens the use of copyrighted material and, in turn, how it affects the licensing value of content. I suspect specialist datasets hold their value, but there could be lower demand for more general corpora: “Careful design of the training data that goes into an LLM appears to be the entire game for creating these models. The days of just grabbing a full scrape of the web and indiscriminately dumping it into a training run are long gone.”

Twelve years after it stopped publishing its print edition, Encyclopedia Britannica is getting more users than ever and is using AI across a range of functions. For me, this New York Times profile raised more questions than it answered (for example, fact-checking feels like a super high-risk use case), but it’s an interesting case study of successful digital transformation.
In my training courses, I talk about AI applications sitting somewhere on a 2x2 grid, where one axis runs from general to industry-specific and the other from automating to generative. Along similar lines, this piece usefully breaks down the differences between generative and analytical AI, and where each approach is best employed.
Finally, a small productivity tip for ChatGPT users. In mid-December, OpenAI rolled out a new feature called Projects, which lets users arrange chats into folders for individual projects and set custom instructions on a project-by-project basis. Having everything organised by project (or book, or client) is a nice step forward; the main limitation is that it is currently for individual use only. With the ability to share projects across users within a publisher, this would be ideal for organising AI use per book. Let’s hope that comes in 2025.
Thanks as ever for your attention and all good wishes for your publishing this year.
This was originally published in my email newsletter. To receive weekly updates on how AI is affecting the publishing industry, sign up here.