Recent Posts

Context Window 35

This edition covers Doug Shapiro’s media-trends slide deck and the questions it poses for any creative business, fresh Cambridge University Press polling on public support for AI training payments, an EU antitrust complaint against Google’s AI Overviews and grim data on news click-throughs, OpenAI and partners providing AI training to 400,000 US teachers, hidden prompts being inserted in academic papers to game AI reviewers, Bloomberg on the “tiny teams” era and Anthropic’s Project Vend experiment as a counterpoint, and research showing managers using AI to make decisions about their direct reports. ​ The always-excellent Storythings newsletter linked to this media trends presentation from industry veteran Doug Shapiro. The focus is on the movie business, but there are plenty of transferable insights for publishers. It’s worth browsing, but if you want a quick tl;dr, slides 60, 64 and 65 set out some essential questions for creators in all media. Short term—what happens if GenAI enables over 80% of the quality at under 10% of the cost (p60)? Longer term—in the same way that the web was more than just flat documents online, and film became more than plays performed in front of a camera, what does AI production enable in terms of production and storytelling that isn’t possible today (pp64-65)? ​ Cambridge University Press & Assessment released the results of new opinion polling showing that more than two thirds of the public back making technology companies pay for content used to train their AI models. Only 9% of respondents disagreed. ​ We’ve become used to litigation against AI companies on copyright grounds, but a new legal fight opened with a group of independent online publishers filing an antitrust complaint against Google’s AI Overviews. Incidentally, the Independent Publishing Alliance, which filed the complaint, is not connected to the similarly named Independent Publishers Guild in the books and journals space. ​ To underscore the significance of the complaint, new research showed that the proportion of news searches that don’t result in a click through to a publisher site grew from 56% to 69% in a year: in simple terms, consumers are searching for a topic, reading the AI summary in Google and going no further. ​ OpenAI announced a partnership with Anthropic, Microsoft and two of the largest teaching unions in America which will deliver AI training to 400,000 teachers. Skills development is a huge opportunity for educational publishers and arguably one which shouldn’t be left to the AI platforms themselves. ​ Nikkei found that research papers from more than a dozen academic institutions contained hidden instructions to AI models to positively review the papers, typically obfuscated through white text on white backgrounds, or tiny font sizes. That researchers feel it’s worth taking this step is a recognition that reviewers and publishers as well as authors may be using AI in their work. Extending this out into other areas of publishing such as trade, I wonder what proportion of manuscript submissions already contain hidden instructions: if authors believe that publishers or agents will evaluate using AI, the incentives are the same… Has anyone looked at this? ​ Three linked takes on AI agents this week. Bloomberg ran an essay on how AI tools and agents are underpinning an era of “tiny teams”, where bragging rights for startups are determined not just by valuation but by how few people delivered a project. However, as a corrective to the idea that AI agents are ready to displace employment, Anthropic ran an experiment using their Claude AI model to manage a small shop at their San Francisco office. I guess it speaks well of their humility that they published the results as a case study: a couple of kids with a lemonade stall would not have made some of the mistakes the AI did. Vaughn Tan has a great piece exploring this and arguing that even simple businesses depend on inherently human decisions. ​ A frankly depressing piece of research shows that 60% of managers use AI to make decisions about their direct reports, including determining compensation, layoffs and terminations. Two thirds of those using AI had not received any formal training, and more than one in five let AI make final decisions without human review. This is idiotic and won’t end well. I look forward to the inevitable lawsuits and managers’ embarrassing prompt histories surfacing in discovery. ​ Finally, on a lighter note, an example of a ghost in the AI machine. One of my friends asked ChatGPT to recommend speakers for a literary event and got the response screenshotted below. Carmen Callil would indeed have been an excellent speaker but for being dead—though apparently her spirit is influential…

11 July 2025 | Read More

Context Window 34

Happy Friday—though I’m writing this the day before on the way back from a great day at the Publishers Licensing Services Conference in London. The agenda and other speakers were superb, offering plenty of food for thought. It was also great to meet Helen King, whose PubTech Radar newsletter I’ve really enjoyed recently (do sign up for it!) Connecting with Helen via Bluesky commentary on the conference felt nostalgically like the Twitter backchannel at publishing events in the early 2010s. Thanks to PLS and the IPG for the invitation to speak. ​ My presentation at the conference was on AI threats and opportunities for publishers, which I structured as fifteen observations from the last two and a half years (slides here if you’re interested). ​ Given the scale of the issues, many presentations took a big-picture perspective, so as a change of focus I particularly enjoyed a more tactical session on principles for successful AI licensing with subscriber Clare Hodder of Rightszone and Adele Parker of Taylor and Francis. Their best practices included being clear on what purpose content is being licensed for: foundational training, fine tuning and model evaluation, or reference/retrieval augmented generation. Good advice for all of us. ​ One of the points that I referenced in my presentation was an open letter to the publishers of America which appeared on LitHub this week. It’s worth reading in full as a barometer of author sentiment, though, as ever, sentiment isn’t monolithic. The letter itself was a mix of unarguable points such as transparency, a primary focus on Big New York publishing (culturally it felt a little bit like the View of the World from 9th Avenue), and a set of demands that would be hard, perhaps impossible, for any publisher to acquiesce to. I came across it through Richard Charkin, who did a good job highlighting its limitations here. I waded in here. ​ In my response to Richard, I mentioned this recent piece by Steven Johnson on how he uses NotebookLM in his research as an example of author opinion being varied. For transparency, Johnson collaborated with Google on developing NotebookLM, so he’s partly talking his own book. But it’s an interesting and hopeful vision that he sets out. ​ Significant news this week: Cloudflare, which hosts about 20% of the web, announced that it would block AI crawlers by default unless publishers are compensated (it had previously provided blocking as an opt-in). Cloudflare will monetise through a pay-per-crawl system, creating a new revenue stream for publishers. Lots of news publishers have signed up; fewer from books, but O’Reilly Media, often a bellwether for new tech and platform developments, is one of them. If you’re not sure about the cost/benefit analysis of this, consider this comparison: the data show that Google crawls websites about fourteen times for each referral it sends; the equivalent ratio for OpenAI is 1,700:1, and for Anthropic 73,000:1. That’s a lot of bandwidth cost for little traffic. ​ I’ve written about Vibecoding before: that is, creating code and applications using natural language prompts to LLMs. I’ve been pleasantly surprised by what I’ve been able to do, and it’s been a hit with many publishing clients who’ve developed niche applications. As an example of the genre, this video is pretty awesome: developing a complete custom app, including integration with third-party APIs, using nothing more than Slack messages as instructions. It’s not for complete beginners, but anyone branching out from basic scripts and macros will find plenty of inspiration. ​ There are further questions about editorial processes at a major scientific publisher after multiple hallucinated citations in a computer science textbook. If not quite as egregious as prompt artefacts in published text, it’s still a really bad look. ​ I’ve featured research on the use of AI in education recently; to give a qualitative perspective alongside the statistics, I recommend this first person piece in the Guardian. My eldest daughter is in the same disrupted cohort as the author, and I recognise many of the issues raised from recent teaching experience. It’s deeply relevant to educational publishers, lecturers, parents and citizens. ​ Tim Harford’s column in the FT this week looks at employment impacts from AI, based on new MIT research. Most jobs are collections of tasks, some of which can be performed by AI. The big question is which tasks it takes. If it takes away the core task, the worker is left with lower status and compensation. On the other hand, if takes away routine tasks, the worker is left with more time for higher value activities.

04 July 2025 | Read More

Context Window 33

It’s been a really significant week for legal developments: while the newsletter has more of a copyright focus than usual, the courtroom updates are balanced with some really interesting technical developments from Creative Commons, Anthropic and others (skip down if you’re less interested in the legalities). It points to the fact that, however long a road to a settled legal and licensing position, there are immediate practical uses for AI in publishing. ​ There were separate rulings in Bartz v. Anthropic and Kadrey v. Meta that the use of copyrighted material for AI training is, in certain circumstances, Fair Use. Regarding training, Judge William Alsup described the author position as “no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act”. However, despite that apparent similarity, the rulings differed in important aspects and there is enough nuance for both sides of the issue to claim positive aspects from them. There’s an excellent side-by-side comparison of the judgements here. ​ While finding that the ultimate use of copyrighted material may be Fair Use, Bartz v. Anthropic also suggested that the initial collection of books for training represented copyright infringement, and there will be a trial to assess damages for this, which could be considerable. If what this establishes is a precedent that training is Fair Use but that material should be legally obtained, commercial and practical arguments for both publishers and developers point strongly towards a collective licensing model of the sort being pursued by PLS, CLA and ALCS. ​ Intrinsic to these debates is the question of whether AI training is transformative, and whether the replication of verbatim sections of books is an aberrant behaviour or a common issue. In that context, new research showed a Meta model reproducing over 40% of the text of the first Harry Potter, but highly inconsistent results across different models and books. There’s a great explainer here. ​ Meanwhile, a completely new author lawsuit was filed against Microsoft. ​ In a parallel development, photo library Getty dropped some aspects of its UK litigation against Stability AI, though litigation continues on other aspects, and in the US. But it emphasises the importance for plaintiffs of pursuing precise, winnable arguments rather than broad claims. ​ Meanwhile, taking a different approach to content and rights, Creative Commons announced a new project, CC Signals, to allow publishers to communicate their preferences on how data is reused by AI developers. This feels more aspirational than necessarily enforceable, but for Open Access publishers in particular, it’s an interesting development. ​ Changing gear, on a super practical level, the in-built AI functions within Google Workspace are getting more and more useful. There’s a great and very practical set of examples of AI functions in spreadsheets here. ​ Anthropic released new features allowing users to create AI-powered apps (“interactive artefacts”) using Claude. For tasks that require structured interaction, like workflows, data analysis or working with content, this would offer publishers simple creation and greater control of user experience. I could see this being really useful for creating simple internal apps for publishers, as well as reader-facing experiences. ​ This kind of app is particularly interesting to me having read this piece by ProPublica’s Ben Werdmuller arguing that AI features should be developed further down the stack, in models, browsers or operating systems: “Publisher websites and apps are not destinations in themselves and no amount of AI will make them so… My proposal is this: you should consider what’s actually the most useful experience for the user, rather than what furthers your own interests, and make a bet on that, instead.” ​ Finally, not specifically related to publishing, but for general interest, games have often been one of the ways that we understand progress with AI: think of Garry Kasparov’s chess matches against IBM Deep Blue in the late nineties, or Go champion Lee Sedol losing to AlphaGo in 2016. Chess and Go are interesting use cases because while they are highly complex, the moves available each turn are clearly bounded by rules. So I was really interested to read a series of articles on using LLMs to play the classic boardgame Diplomacy, which has a significantly more complex set of interactions based on negotiation and deceit (it was famously the favourite game of public figures, including JFK and Henry Kissinger). AI media company Every pitted a series of LLMs against one another, finding that they displayed some of the worst of human behaviour. And they also wrote a detailed guide to how they achieved it.

27 June 2025 | Read More

Context Window 32

This edition covers Amazon CEO Andy Jassy’s bullish AI update and what billions of agents mean for supplier teams, Turing Institute/LEGO research on how children and teachers are using generative AI, a PLS consultation on licensing for AI training, new image-generation tools from Midjourney and Adobe Firefly, ChatGPT’s new Record mode on Mac, Reddit’s Community Intelligence product, a New Yorker piece on what AI is doing to reading, and the resignation of UK PM AI Adviser Matt Clifford.

20 June 2025 | Read More

Context Window 31

It’s Friday 13th. Unlucky for AI image platform Midjourney, which is about to find out why “don’t mess with The Mouse” (or an earthier paraphrase) is a popular aphorism in media law. Luckier for publishers, with some powerful new tools this week: in particular, Google’s Deep Research could be a game changer for production of ancillary content. ​ James Butcher’s always-excellent Journalogy newsletter looks at the documents from the recent Springer Nature AGM, including the section on executive compensation. The company’s Chief Publishing Officer achieved 150% of target for “market performance and usage of AI in publishing processes”; the Chief Operating Officer also hit their goal for “editor satisfaction and implementation of new AI tools.” James makes the point that “what gets measured gets done.” The corollary is that what isn’t measured and incentivised will struggle to get traction. In my experience, too many businesses expect AI—and digital innovation more broadly—to be an extra on top of the day job. But unless it’s built into performance targets and compensation—measured against real business outcomes, as at Springer—it risks being just another talking shop. Are you building AI experiments into your goals and incentives? ​ Bad news for AI image generator Midjourney, which was hit this week by a lawsuit from Disney and Universal. It’s hard to imagine litigants with deeper pockets or greater determination to protect their IP, and the filing highlights some pretty hubristic actions by Midjourney (including failing to respond to prior legal correspondence). On the other hand, if a Fair Use defence prevails in this case, it’s likely to hold more broadly. ​ Google released its own Deep Research tool in Gemini, which is able to research online information and files uploaded by users. The most useful aspect is the ability to create new media based on the research: with a single click, it produces infographics, audio overviews and quizzes. I’ve been really impressed by the results. For educational and non-fiction publishers, it could streamline creation of ancillary content from manuscripts. ​ Thanks to subscriber Alex Boden for recommending Prompt Maker: a custom GPT that takes basic prompts and develops them into more sophisticated, structured instructions. Worth trying if you’re having issues with a particular task or are looking to level up your prompting. ​ OpenAI released its latest model, o3-pro, which it recommends for uses where reliability matters more than speed. The benchmark performance is hugely impressive, as is an 80% reduction in cost for the existing o3 model, suggesting deeper economies of scale than many expected. ​ In parallel with the new release, Sam Altman published a new essay on the trajectory for AI. The general tone is big picture boosterism: it’s crying out for footnotes and data sources. But it contains one specific claim that caught my eye: that a single ChatGPT query uses only ~0.34 watt-hours of energy—that is, roughly ten times more efficient than Andy Masley’s recent, already revisionist estimate. On the face of it, this is really positive news, and may explain why significant price reductions are possible, though if the picture is truly better than expected, there’s no reason AI companies shouldn’t publish detailed environmental reports. ​ If you’re looking for an alternative to o3, French AI company Mistral released its first reasoning model, Magistral. In a competitive market, its positioning is interesting, highlighting traceability and audibility of its reasoning as making it particularly suitable for regulated industries. These compliance features could make it particularly attractive for scientific, medical, and legal publishers working with sensitive content. ​ Publisher Keith Riegert did a superb presentation at the US Book Show last week on how he is using AI in his business. He shares some highly practical advice on what to try right now, and predicts a New York Times bestseller from an agentic AI publisher by 2030. My bet’s sooner—2027. What’s your over/under? ​ Finally, the first step to figuring out where we’ll be in two years is establishing where we are right now. My friends at BISG are running research on that topic: if you are a North American publisher, please take ten minutes to complete their survey.

13 June 2025 | Read More

Context Window 30

Thirty weeks of doing this—thanks for sticking with me. Someone asked this week if this newsletter will be going paid, and to be really clear: NO. I enjoy researching and writing it, and it’s a wonderful way to start conversations. However, if it’s something that you find useful, please do share it with your colleagues. Personal referrals make a huge difference to me, and every one is genuinely appreciated. ​ A couple of data points on how publishing is using AI. The Bookseller published the first results from its recent survey of publishing workers. This showed a high level of concern about AI with 58% concerned and only 18% optimistic (interestingly almost an exact reversal of proportions compared to a recent survey on Spanish-language publishing). The Bookseller research also showed 20% of respondents using AI on a daily basis, compared to roughly 10% in the general economy. However, as I highlighted on Bluesky, a 100-person, self-selecting sample raises real questions about representativeness. Best to treat this as directional rather than definitive. ​ One of my issues with the Bookseller data is that it doesn’t break down responses by publishing area or segment. It’s long been my experience that academic publishers and university presses are ahead of their trade counterparts, and this fabulous compilation of 167 academic publishing AI use cases by Helen King confirms that. From metadata to accessibility to peer review, it’s a great benchmark for any academic publisher. ​ Between 1995-2019, Mary Meeker’s Internet Trends report was an annual fixture for analysts and investors. It relaunched this week in a new format: 340 pages of research and data on how AI is being used across the economy. ​ OpenAI announced a slew of new business features for ChatGPT, including integrations with Sharepoint, Teams, Outlook, Gmail, Google Drive and Dropbox, and the ability to record meetings. All of which are highly useful for publishers looking to bring their own data into their AI model’s context window. The documentation is available here. But this level of integration with core systems will require customer trust in data security and control, and on that front OpenAI hit a setback… ​ As part of the ongoing New York Times copyright litigation, OpenAI has been ordered by a judge to preserve logs and data, including those deleted by customers or in temporary, private chats. In addition to chat interactions, this also affects third party services using OpenAI’s API, which may now conflict with some platforms’ data deletion guarantees and customer instructions. Does your business have retention policy/guidelines for AI, and are any of your third party agreements affected by this? ​ If you want maximal control of your data, one option is to run LLMs locally on your computer rather than using online services. I actually spoke to a UK independent publisher recently that was taking that approach, though it isn’t one for the unskilled or faint hearted. If a hardcore approach appeals though, there’s a tutorial and link to resources here. ​ Interesting data here which shows that in recent weeks, Reddit has jumped up to become the second largest source of cited data on ChatGPT, behind only Wikipedia (I can’t quite bring myself to describe it as a source of truth, as the original article does). If you want to influence the model output, shape the training data… ​ Reddit’s data licensing deal with OpenAI may also explain the latest piece of litigation: Reddit is suing Anthropic for unauthorised use of its data. ​ I spoke at a webinar for the Open Institutional Publishing Association this week, which in a wide-ranging discussion addressed the interesting question of how AI training and content sits alongside copyright and Creative Commons licenses. I’m really grateful to fellow panelist Jane Secker for sharing this link to CC resources on the subject. ​ For UK readers, Politico has a good roundup of how the Data (Use and Access) bill has become a legislative pain for the government, but in time possibly also for creators—it suggests that having taken rigid positions, some rightsholder groups may struggle to sell an eventual compromise to their stakeholders… ​ For context though, any UK or other sovereign regulation of AI is likely to attract the attention of the US government, which this week [rebadged the US AI Safety Institute as the Center for AI Standards and Innovation: its remit includes “guard[ing] against burdensome and unnecessary regulation of American technologies by foreign governments”]16. ​ Finally, the most thought provoking thing I read this week was this piece by a developer on how AI is useful and why his smart, thoughtful friends are wrong about its utility. I think it’s interesting to read across from this to professional writing. AI generated text will never compete with great writing (the author John Scalzi made this point this week). But just as this piece argues that some code isn’t meant to be beautiful, a lot of writing is not based on great craft or artisanship, but speed and availability. Maybe AI has little to offer great writers. But a lot of writing is done by people who don’t find it easy, for a whole variety of reasons. Yes, there are profound questions about copyright and training data. But the framing of AI as a thieving stochastic parrot doesn’t fully capture the upside for a lot of users. Back full circle to one of the respondents quoted in that Bookseller research—”If you’ve never experienced what it’s like to constantly feel your brain ‘has let you down’ in a neurotypical world can’t understand how impactful AI is. I personally think it’s life changing for people who are neurodivergent.”

06 June 2025 | Read More

Context Window 29

Maybe there will be a quiet week for AI and publishing. If so, we haven’t seen it yet. Across a range of stories this week, the clear theme is that while AI might automate the output, the value lies in the inputs: good content, sound metadata, thoughtful contracts and sound human judgement. ​ OpenAI has published a guide to identifying and scaling AI use cases. It’s a fantastically practical document that I’m surprised isn’t being promoted more, and the underlying methodology works regardless of what AI model you’re using. It’s not publishing specific and needs a layer of industry experience for full utility—just as well for consultants like me—but I’ll be using it as a conversation starter with all my clients. ​ Accessibility expert Simon Mellins, whose podcast I recently appeared on, shared this excellent case study on using AI to write alt text for images, and which models do it best—something every publisher should be thinking about with EU Accessibility Act deadline next month. ​ Amazon’s latest AI innovation is short audio summaries of product features and customer reviews. These are rolling out now on selected US store listings, presumably headed for wider use over time. There seems to be little opportunity to directly shape those summaries, but publishers can influence the inputs to them through good product metadata and encouraging helpful customer reviews (e.g. through Vine)—hopefully areas you are already active in. ​ Bridging ecommerce and news, the New York Times announced a major content licensing deal with Amazon AI platforms. This includes news coverage and model training, but for Amazon a lot of the value will be in the Times’s content from specialist verticals such as Cookery and product review site Wirecutter. ​ This year the Pulitzer Prize required entrants to disclose any use of AI, and Nieman Lab has an interesting review of how newsrooms used it, across one winner and three finalists. It’s interesting that the focus here is very much on more traditional data science applications, though there are some good generative AI use cases as well. ​ Oxford University Press is the latest publisher to use AI for manuscript assessment and editorial workflow, signing a deal with Hum for its Alchemist Review product. Like similar trade-focused tools such as my friends at Storywise, the emphasis is on productivity and decision-support, not taking decisions away from editors. ​ Nature has a fun interactive feature on AI ethics and how it should be used for academic writing and peer review: take the quiz and find out how your views compare with the panel of 5,000 researchers that Nature asked about the issue. ​ AI platform Poe has published some useful trend data on which models its customers are selecting across key categories such as reasoning, text, image and video generation—useful if you’re currently considering a choice of model, as the data indicate where each major platform is strongest. ​ The latest legal skirmish between publishers and AI developers saw a California federal judge opine that Anthropic’s initial copying of books for training data was a violation of copyright law, but that subsequent use of that material was Fair Use. This underlines the importance for rights holders of successfully demonstrating harm from copyright infringement, something that courts have not been completely convinced by (see previous coverage of Universal Music v. Anthropic and Raw Story Media v. OpenAI). ​ “AI first puts humans first”: digital publishing OG Tim O’Reilly has a thoughtful essay repudiating the view he sees from many in Silicon Valley of AI as an opportunity to put people out of work. His argument is that companies that use AI to cut costs will be outcompeted by those that use it to expand human capabilities (unstated, but the implication is that companies that don’t use AI at all are in danger of being outcompeted by both camps). It’s a valuable point of view given his long experience and the fact that he relates the real opportunity to examples from his publishing business. These are zero-to-one use cases such as translation and ancillary content (it’s also really interesting that he notes that O’Reilly pays authors royalties on AI derived products such as quizzes, summaries and audio).

30 May 2025 | Read More

Context Window 28

This edition covers the US House passing a tax bill with a ten-year moratorium on state-level AI regulation, OpenAI’s continued lobbying for Fair Use, Anthropic’s new Claude 4 hybrid models, the announcements from Google I/O including SynthID Detector, Shopify’s AI enhancements and MCP integration, Anil Dash on Model Context Protocol as a Web 2.0 moment, a new paper on AI in higher education, the Chicago Sun Times’s hallucinated book list, and Steven Bartlett’s fully AI-written-and-voiced podcast.

23 May 2025 | Read More