NPAI Co-Lab: Exploring the value of first-party data with Big Local News
Interrogating the idea of extracting insights from first-party data with AI at the annual Story Discovery at Scale conference
At the heart of the NPAI Co-Lab is a big bet: That AI can help local and small newsrooms make smarter and more ethical use of the data they already have about their audiences to build better products for them as a result. This month, we tested this hypothesis at the Story Discovery at Scale conference hosted by Big Local News at Stanford University and gained valuable new insights.
It was a unique opportunity to interrogate the many facets of our core focus: that first-party data – the information we get directly from users in the form of comments, surveys, incoming emails, customer service inquiries, and more – holds significant strategic value, and that AI tools are key to unlocking it.
Over four hours of discussion across two days, our working group – which included journalists and product leaders from the Texas Tribune, Gannett, the American Press Institute, The Baltimore Banner, Minnesota Star Tribune, University of Southern California, Outlier Media and more – mapped the landscape of first-party audience data, examined how news organizations have used it, and explored the kinds of creative solutions that could help newsrooms make the most of their qualitative audience data.
Not only did the lively conversation validate the concept and inform the solutions we will explore, but it also expanded our coalition of partners (I like to call them Co-Lab-orators) to help us achieve our goal, which is to provide resources – written both in text and in code – that extract strategic insight from first-party data with AI technology.
It’s amazing how much can be achieved in two days of in-person collaboration (👀 insert shameless plug for the NPA Summit in October), and we appreciate the contributions of everyone who joined the discussion.
So what did we learn?
There is a there there
First, the group addressed the central question of whether this is a valuable problem to solve: Are there actionable insights contained in first-party audience data, and is extracting them the type of problem that AI is well suited to solve?
A series of anecdotes from people who had mined their qualitative audience data for learnings that led to more engagement with their journalism confirmed that the answer is yes:
Combing through hundreds of listener emails in a radio station’s unmonitored email inbox for actionable insights and to identify superfans
Identifying key audience interests in questions submitted about local elections to inform a new voter guide strategy
Using inputs submitted via a newsroom’s chatbot or SMS channel to identify new story ideas and coverage areas to expand
All of these examples describe actionable insights drawn from qualitative first-party data that can take multiple people many hours to parse manually for common themes. That’s not an approach that scales.
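To make the manual-parsing problem concrete, here is a toy sketch of the kind of theme-surfacing described above. It uses simple keyword counting as a stand-in for the AI model that would actually do this work; the sample messages and function name are invented for illustration.

```python
from collections import Counter
import re

# Hypothetical sample of audience messages (invented for illustration).
messages = [
    "When is early voting open in my district?",
    "Love the voter guide -- can you cover school board races too?",
    "Why was my street left off the road construction map?",
    "More coverage of school board meetings, please!",
]

def surface_themes(texts, top_n=3):
    """Count recurring keywords across messages as a crude proxy
    for the theme extraction an AI model would perform."""
    stopwords = {"the", "is", "in", "my", "can", "you", "of", "was",
                 "a", "too", "why", "when", "off", "more", "please"}
    words = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stopwords and len(word) > 3:
                words[word] += 1
    return words.most_common(top_n)

print(surface_themes(messages))  # "school" and "board" rise to the top
```

In practice an AI model would group semantically related messages ("school board races" and "school board meetings") far more robustly than keyword counts, but even this sketch shows why doing the same job by hand across hundreds of emails doesn’t scale.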
And with the scale of insight that AI can enable, a number of valuable outcomes become possible. In this sense, our discussion validated another opportunity we see in the Co-Lab’s work: by uniting these useful but under-used sources of data, a news organization can not only develop better strategies for serving its audience, but also better segment that audience for targeted appeals and for distribution of its most impactful work.
It’s a recognition that, especially for local news organizations, success depends not on reaching the largest audience but on reaching the best audience. Using AI to mine insights from qualitative data directly supports that goal, and we’re excited to prove the concept with the NPAI Co-Lab.
Solutions must scale – responsibly
In thinking about how to scale the concept with a product or platform that can plug into various data sources and analyze them in real time, we discussed a blue-sky vision: a platform that integrates all sources of qualitative audience data into a sort of AI-enhanced CRM (Customer Relationship Management) tool. Such a system would help ensure a critical mass of data – often a challenge for smaller newsrooms – from which to derive actionable insights. It’s an ambitious idea, with plenty of complexity, that Co-Lab-orators Ariel Zirulnick and Ben Werdmuller originally explored at last year’s NPA Summit.
A first step on this journey—now ranked among the NPAI Co-Lab’s priorities—is to develop a data model for qualitative audience data that is built around the needs of local newsrooms, including editorial standards for privacy, security, and responsible use of data.
Put simply, this means deciding what information a newsroom should collect and organize about its audience; think of it like setting up the columns in a spreadsheet. Each row represents an individual audience member, and each column captures a different detail about them, such as how they engage with stories or what topics they care about. With a clear and thoughtful structure, newsrooms can bring in data from different sources to populate the various columns and start to see meaningful patterns that help guide editorial and product decisions.
For just one example, in a typical data model, there might be one field for a person’s location. But local newsrooms serve people in very specific geographic contexts—down to the neighborhood, school district, or even street level. So instead of one general “location” field, a local newsroom might want several: where a person lives, where their kids go to school, where their elderly parent lives, or where they work. This richer information helps local outlets better understand what’s relevant to their audience and serve them more meaningfully.
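A hedged sketch of what such a data model might look like in code, with the single "location" field split into the richer, locally meaningful fields described above. All field names here are invented for illustration, not a standard the Co-Lab has adopted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema sketch for a local newsroom's audience data model.
@dataclass
class AudienceMember:
    member_id: str
    # Multiple location fields instead of one generic "location",
    # reflecting the specific geographic contexts local newsrooms serve.
    home_neighborhood: Optional[str] = None
    school_district: Optional[str] = None   # where their kids go to school
    caregiving_area: Optional[str] = None   # e.g. where an elderly parent lives
    work_area: Optional[str] = None
    # Engagement signals, gathered and stored only with consent.
    topics_of_interest: list[str] = field(default_factory=list)
    consented_to_data_use: bool = False

reader = AudienceMember(
    member_id="abc123",
    home_neighborhood="Pilsen",
    school_district="District 299",
    topics_of_interest=["school board", "transit"],
    consented_to_data_use=True,
)
```

Every field is optional by design: an audience member shares only what they choose to, and the explicit consent flag reflects the privacy-first approach described below.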
At the same time, this kind of personal data – especially when it touches on home addresses, family connections, or caregiving responsibilities – is highly sensitive. It must be handled with great care, not just to comply with data protection laws, but to uphold the trust and safety of the communities these newsrooms serve. That’s why the NPAI Co-Lab is centering privacy, transparency, and consent in the design of any data model we support.
Plenty remains to be defined about what a “well-defined schema” looks like, including how such a system can uphold editorial ethics and security standards without sacrificing valuable specificity. If you would like to help us figure it out, or to join the Co-Lab as one of our newsroom data partners, please let us know by filling out our expression of interest form. This is just one of several projects we plan to pursue as we break down this concept.
Tech solutions aren’t enough
Having discussed the high-level shape of AI solutions for extracting strategic insights from first-party user data, the group turned to the other factors that need to be in place to ensure those insights are actionable. Three significant conclusions emerged: an organization needs a well-articulated strategy, leaders who can leverage insights across editorial, revenue, and audience teams, and a broader culture of listening.
While a well-articulated strategy is critical to any organization’s success in general, the focus it provides is essential when interrogating user data: a model needs to understand what outcomes it is optimizing for and who the target audience is. As you can imagine, the many things a newsroom might “know” about its users will include plenty of information that has no bearing on strategic priorities. Strategy is what helps an organization know which signals to listen to and which to ignore.
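The filtering role strategy plays can be sketched in a few lines. The priority topics and signal counts here are hypothetical, invented purely to illustrate the point that volume alone doesn’t make a signal worth acting on.

```python
# Hypothetical illustration: a strategy expressed as a set of priority
# topics, used to decide which audience signals deserve attention.
STRATEGIC_PRIORITIES = {"local elections", "public schools", "housing"}

# (topic surfaced from audience data, number of mentions) -- invented numbers.
signals = [
    ("local elections", 42),
    ("celebrity news", 95),
    ("public schools", 18),
]

# Keep only the signals that map onto strategic priorities; the
# high-volume but off-strategy signal ("celebrity news") is ignored.
actionable = [(topic, n) for topic, n in signals if topic in STRATEGIC_PRIORITIES]
print(actionable)
```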
Beyond strategy, newsrooms need leaders who are empowered to bridge gaps across teams that are sometimes quite separate from each other to get the right value from strategic insights. Some occasions will call for a solution in a marketing campaign or revenue-generating product; others will call for a new approach to editorial priorities or distribution plans. Without the right people in place, implementing solutions can be even more challenging than identifying the insights in the first place.
Culture is the other factor at play here, as taking an audience-first approach to editorial strategy can represent a cultural shift for some newsrooms. In product work, we are used to centering our thinking around user needs to ensure that any solutions we build are as widely adopted as possible. That “culture of listening” needs to become an organizational priority for the strategic value of audience data to be realized.
And there are always risks
There is a clear case to be made for the value of AI tools in the success and sustainability of the modern newsroom, but there are of course risk factors to consider when zooming out and considering trends across journalism and the larger news-consuming public.
A key assumption we are making is that a newsroom’s first-party data represents a unique and competitive advantage. But this is a big assumption that demands validation: what if the same strategic insights can be derived from a sample (random or targeted) of publicly available information about news consumers and their needs and preferences? And could the rise of generative AI – especially conversational interfaces that replace traditional search – undermine the value and availability of first-party data in journalism by reducing direct engagement between audiences and news organizations?
As we continue this work, we’re interested in exploring where first-party data makes a measurable difference, and where it may be more of a perceived advantage than a practical one. The NPAI Co-Lab will pursue these questions, among others, so if you are interested in working with us as we explore this emerging application of AI technology for news, please let us know by filling out our expression of interest form and we’ll be in touch! If you work in a local newsroom, you can also contribute to this work by taking our survey on how your organization uses first-party data – and if you don’t, what stands in the way of that.
You can also join the NPAI in our Slack community (#ai-tools) or in person at the NPA Summit in Chicago on October 23-24.