Future Tools: Some thoughts on the future of CAR


I had the privilege of speaking on a panel with Sarah Cohen and Steve Doig last week in Baltimore about the future of computer-assisted reporting. Whoever thought I even belonged in the same room as those two gave me way more credit than I deserved.

But in preparing for that panel, I got to thinking: What skills and software tools are we going to be using in 10 years? What skills should we start learning now if we want to be prepared for the future? Or better yet: What types of problems in newsgathering and investigations could technology best help solve?

Sarah can probably answer this question better than I could. In her new role as a Knight Chair at Duke University, her job will be – as she describes it – to find ways to reduce the costs of high-quality investigative journalism using technology. And if you think about it, the way newsrooms are shrinking, that idea provides a solid framework for the skills and technologies we should invest our time in learning.

Think of the problems we’ve already addressed: Databases make us more efficient by letting us search large numbers of records almost instantly. Programming makes us more efficient by eliminating repetitive tasks. But now we’re confronted with new problems: Compensating for lost institutional knowledge, for example. Or combing through thousands of pages of documents quickly. Or finding obscure trends in large datasets without the luxury of time.

One answer to all three of those problems: data-mining and text-mining.

If you haven’t heard me extol the virtues of data-mining before, here’s a brief overview: By applying techniques like cluster analysis, regression and modeling, along with exploratory techniques like Benford’s Law, we can find patterns that not even the most hardened and experienced community observers would see.
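To make one of those exploratory techniques concrete, here is a minimal Python sketch of a Benford’s Law check. Benford’s Law says that in many naturally occurring sets of numbers, the share of values whose leading digit is d is about log10(1 + 1/d), so roughly 30 percent start with a 1. The function names and the sample numbers below are my own illustrations, not taken from any particular tool or dataset.

```python
import math
from collections import Counter

# Benford's Law: expected share of values whose first significant digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the first nonzero digit of a number, or None if the value is zero."""
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    return None


def digit_shares(values):
    """Observed share of each leading digit 1-9, ignoring zero values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        (counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
        for d in range(1, 10)
    )


# Illustrative inputs only (assumptions, not real records):
# powers of two are known to track Benford's Law closely,
# while an even spread of three-digit amounts does not.
benford_like = [2 ** n for n in range(1000)]
evenly_spread = list(range(100, 1000))

for name, values in [("powers of two", benford_like), ("even spread", evenly_spread)]:
    print(name, round(benford_chi_square(values), 1), digit_shares(values))
```

A large chi-square value is a reason to ask questions about a set of figures, not proof that anything is wrong. Plenty of legitimate datasets, such as amounts capped within a narrow range, do not follow Benford’s Law.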

One example: Not long ago, the University of Missouri teamed up with an organization called CommunityKnowledgebase, LLC, which is headed by University of Wisconsin professor Lew Friedland. Software developed by the company mined the archives of the Columbia Missourian and discovered a pattern of land acquisition outside of Columbia that appeared to be coordinated by several major developers over a period of years. Not even the Missourian’s experienced city editors realized the land grab was happening, but the software picked it up based on patterns over a vast landscape of seemingly unrelated documents and data. That story was relayed to me by Missourian Executive Editor for Innovation Tom Warhover last year, so I might have butchered the details, but you get the idea.

Another example: A small team made up of Brant Houston, ProPublica’s Jennifer LaFleur, David Donald of the Center for Public Integrity, IRE’s own Jaimi Dowdell and me has been conducting an experiment using software developed for the digital humanities to spot trends in SEC documents. The software, known as Meandre, simplifies text-mining by wrapping it in an intuitive interface designed for academics.

Mining structured data can yield similar gains. Earlier this year, I used the cluster analysis features of the SPSS software package to find previously unnoticed trends in a database of state hunting and fishing licenses. Open-source statistical packages like R have equally useful data-mining modules, which can help us find patterns without even knowing what we’re looking for.
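For readers who have never seen cluster analysis in action, here is a rough Python sketch of k-means, one common clustering method. It is not the SPSS procedure I used, and the license-holder figures are invented for illustration. Each record is a pair of made-up numbers (say, licenses bought per year and miles traveled), and the algorithm groups similar records without being told what the groups should be.

```python
import random


def squared_distance(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Average position of a non-empty list of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=50, seed=0):
    """Group points into k clusters; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen records
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_center(point, centers)].append(point)
        # Step 2: move each center to the average of its assigned points.
        new_centers = [
            centroid(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so we are done
            break
        centers = new_centers
    labels = [nearest_center(point, centers) for point in points]
    return centers, labels


# Hypothetical records: (licenses per year, miles traveled). Invented data.
license_holders = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(license_holders, k=2)
print(labels)
```

On real data, the columns usually need to be put on a comparable scale first, and the number of clusters is a judgment call. The groups the software finds are leads to report out, not findings in themselves.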

It sounds like science fiction, but with even a basic grounding in computer programming and statistics, it’s all within reach. That’s the first step in preparing for the future of CAR. We don’t need to invent the tools, but we must know how to use them.

Making sense of the flood of data we get from government agencies is critical, and emerging technology will increasingly have to fill the gaps left by the expertise and time we are losing as the business continues its free-fall.

As Google’s Jonathan Rosenberg wrote earlier this year: “When every business has free and ubiquitous data, the ability to understand it and extract value from it becomes the complementary scarce factor.”

I say ditto for newsrooms.


The National Institute for Computer-Assisted Reporting is a joint program of
Investigative Reporters and Editors, Inc., and the Missouri School of Journalism.

141 Neff Annex, Missouri School of Journalism, Columbia MO, 65211, Tel. 573-882-2042, Fax 573-884-5544

All Rights Reserved