Tools for Data Journalism

Published April 2026 · Last updated April 2026

Data journalism has five stages: acquire, clean, analyze, visualize, publish. Every stage has free tools, many of them open-source, that are good enough for Pulitzer-level work. This guide covers the best tools for each stage and how they fit together.

The data journalism pipeline

Every data story follows the same path. You get data (FOIA requests, public databases, scraped records). You clean it (fix misspellings, standardize formats, remove duplicates). You analyze it (sort, filter, calculate, find patterns). You visualize it (charts, maps, interactive graphics). You publish it.

Most stories don't need all five stages. A simple bar chart from a clean government dataset can skip cleaning altogether and needs little analysis beyond sorting. But knowing the full pipeline means you can handle anything from a single spreadsheet to a multi-million-row database.

Acquire: getting the data

Google Sheets

Google Sheets is where most data journalism starts. It handles imports from CSV, TSV, and Excel files. It connects directly to Google Forms for collecting data. For datasets under 50,000 rows, it's fast enough for sorting, filtering, and basic analysis without any setup.

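Google Sheets also has IMPORTHTML, IMPORTXML, and IMPORTDATA functions that pull data from the web: IMPORTHTML reads a table or list from a web page, IMPORTXML runs an XPath query against a page, and IMPORTDATA loads a CSV or TSV file from a URL. Together they work as simple scrapers that need no code.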
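Outside of Sheets, the idea behind IMPORTHTML's table mode can be sketched in Python with only the standard library. This is a minimal sketch, not a reimplementation of the Sheets function: it assumes a simple, well-formed HTML table with explicit closing tags and no nested tables or merged cells (rowspan/colspan), it parses an HTML string you have already downloaded rather than fetching a URL, and the names `TableExtractor` and `import_html_table` are hypothetical. The sample page is made up.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as &
        self.tables = []    # all tables found, in document order
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments and collapse internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Ignore text outside cells (e.g. whitespace between tags).
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index=1):
    """Return the index-th table (1-based, like IMPORTHTML) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Made-up sample page for illustration.
page = """
<table>
  <tr><th>County</th><th>Inspections</th></tr>
  <tr><td>Adams</td><td>120</td></tr>
  <tr><td>Brown</td><td>85</td></tr>
</table>
"""
rows = import_html_table(page, 1)
print(rows)  # [['County', 'Inspections'], ['Adams', '120'], ['Brown', '85']]
```

Messier real-world pages (omitted closing tags, merged cells) are where a dedicated parsing library earns its keep.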

Tabula

Tabula extracts tables from PDFs. Government agencies publish budget data, inspection records, and campaign finance reports as PDFs, and Tabula converts those tables into usable CSV files. It's open-source, runs locally, and handles most structured PDF tables reliably, as long as the PDF has a real text layer; scanned pages need OCR before Tabula can read them.
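A common follow-up chore: if the PDF repeats its header on every page, the exported CSV will usually contain that header row once per page. A minimal cleanup sketch in Python, assuming the first row of the CSV is the true header and any later row identical to it is a repeat; the function name `drop_repeated_headers` and the sample data are made up.

```python
import csv
import io


def drop_repeated_headers(csv_text):
    """Keep the first header row; drop later copies of it and fully blank rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header = rows[0]
    body = [
        r for r in rows[1:]
        if r != header and any(cell.strip() for cell in r)
    ]
    return [header] + body


# Made-up export from a two-page table.
exported = (
    "Vendor,Amount\n"
    "Acme Paving,12000\n"
    "\n"
    "Vendor,Amount\n"
    "Baker Supply,4500\n"
)
cleaned = drop_repeated_headers(exported)
print(cleaned)  # [['Vendor', 'Amount'], ['Acme Paving', '12000'], ['Baker Supply', '4500']]
```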

Clean: fixing messy data

OpenRefine

OpenRefine is the best tool for cleaning dirty data. It handles the problems spreadsheets can't: inconsistent naming ("New York" vs "NEW YORK" vs "N.Y."), duplicate records, and messy formatting across thousands of rows. Its cluster-and-edit feature groups similar values and lets you standardize them in bulk.
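OpenRefine's default clustering method is key collision with a "fingerprint" key: trim whitespace, lowercase, strip punctuation, fold accented characters to ASCII, then split into words, de-duplicate, sort, and rejoin. Values with the same key land in the same cluster. The sketch below models that documented method in Python; it is a simplification (only ASCII punctuation is removed, control characters are not handled), and `fingerprint` and `cluster` are hypothetical names, not OpenRefine's code.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Key-collision fingerprint, modeled on OpenRefine's documented method."""
    value = value.strip().lower()
    # Remove punctuation (ASCII punctuation only in this sketch).
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Fold accented characters to their closest ASCII equivalents.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Split on whitespace, de-duplicate, sort, and rejoin.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group distinct spellings whose fingerprints collide."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    # Only keys shared by two or more distinct spellings form a cluster.
    return [sorted(g) for g in groups.values() if len(g) > 1]


names = ["New York", "NEW YORK", "new york ", "York, New", "N.Y.", "Albany"]
print(cluster(names))  # [['NEW YORK', 'New York', 'York, New', 'new york ']]
```

Note that "N.Y." fingerprints to "ny" and does not join the cluster. Abbreviations like that need one of OpenRefine's other clustering methods or a manual edit.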

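OpenRefine runs on your own computer and you work in it through your web browser, so your data never leaves your machine. It can handle large files (into the millions of rows, given enough memory) and keeps a full history of every transformation, so you can undo any step. For journalists working with public records, that audit trail is essential: you can show exactly how the raw data was changed.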

Analyze: finding the story

Google Sheets (for simple analysis)

For datasets under 50,000 rows, Google Sheets handles pivot tables, VLOOKUP, and basic statistical functions. Most newsroom data stories don't need more than this. Sort by column, filter by condition, calculate percentages. The story is often in the ranking.
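The same group, total, percentage, and rank steps can be sketched in plain Python (standard library only) for readers moving from Sheets to code. Summing by category here plays the role of a pivot table with SUM of values. The complaint records are made-up sample data.

```python
from collections import Counter

# Made-up sample records: (agency, number of complaints)
records = [
    ("Housing", 40),
    ("Transit", 25),
    ("Housing", 20),
    ("Parks", 15),
]

# Pivot: total complaints per agency (like a pivot table with SUM).
totals = Counter()
for agency, count in records:
    totals[agency] += count

grand_total = sum(totals.values())

# Rank agencies by total and compute each one's share of all complaints.
ranking = [
    (agency, total, round(100 * total / grand_total, 1))
    for agency, total in totals.most_common()
]
for rank, (agency, total, pct) in enumerate(ranking, start=1):
    print(f"{rank}. {agency}: {total} ({pct}%)")
# 1. Housing: 60 (60.0%)
# 2. Transit: 25 (25.0%)
# 3. Parks: 15 (15.0%)
```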

Google Colab and Jupyter Notebooks

Google Colab and Jupyter Notebooks are for analysis that outgrows spreadsheets. They run Python code in cells, mixing code with narrative explanations. The pandas library handles millions of rows. You can merge datasets, run statistical tests, and build models.
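A minimal pandas sketch of merging two datasets, assuming both share an ID column. The column names and values are made up. With `how="outer"` every row from both tables is kept, and `indicator=True` adds a `_merge` column recording whether each row came from the left table, the right table, or both. Rows that appear in only one table are often where a reporting question starts.

```python
import pandas as pd

# Made-up sample data: restaurant inspections and a license registry.
inspections = pd.DataFrame({
    "license_id": [101, 102, 104],
    "violations": [3, 0, 7],
})
licenses = pd.DataFrame({
    "license_id": [101, 102, 103],
    "name": ["Corner Deli", "Harbor Grill", "Main St Cafe"],
})

# Outer join keeps every row from both tables; _merge records where each came from.
merged = inspections.merge(licenses, on="license_id", how="outer", indicator=True)

# Inspections with no matching license record deserve a closer look.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["license_id", "violations"]])
```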

Google Colab runs in the browser with no installation and offers limited free GPU access for machine learning tasks. Jupyter Notebooks run locally and give you full control over your environment. Both make analysis reproducible, as long as the cells are run in order: every step is recorded in code.

Visualize: making it clear

Datawrapper

Datawrapper is used by the New York Times, Washington Post, and Reuters. It produces clean, responsive charts that work on mobile. Paste data from a spreadsheet, choose a chart type, customize labels and colors. Published charts are accessible and load fast.

The free tier covers most newsroom needs. Datawrapper handles bar charts, line charts, scatter plots, choropleth and symbol maps (the usual formats for election results), and locator maps. No coding required.

Flourish

Flourish specializes in animated and interactive visualizations. Its Flourish Stories feature lets you build scrollable narratives that transition between charts. Bar chart races, animated maps, and interactive survey results are where Flourish outperforms Datawrapper.
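Bar chart race templates generally expect "wide" data: one row per item and one column per time period. If your data arrives in "long" format (one row per item per period), it needs reshaping first. The sketch below assumes that wide layout with a label column named "Name" (a hypothetical header; check the template's own data sheet for the exact columns it expects) and uses made-up numbers.

```python
import csv
import io

# Made-up long-format data: one row per (country, year).
long_rows = [
    ("Norway", 2020, 5),
    ("Norway", 2021, 8),
    ("Chile", 2020, 3),
    ("Chile", 2021, 9),
]

# Collect the time periods (columns) and a value lookup per item (rows).
years = sorted({year for _, year, _ in long_rows})
values = {}
for name, year, value in long_rows:
    values.setdefault(name, {})[year] = value

# Write wide CSV: Name, 2020, 2021, ... (blank where a value is missing).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Name"] + years)
for name, by_year in values.items():
    writer.writerow([name] + [by_year.get(y, "") for y in years])
wide_csv = out.getvalue()
print(wide_csv)
```

The resulting CSV can be pasted straight into a spreadsheet-style data editor.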

Tableau Public

Tableau Public is the free version of Tableau. It handles complex dashboards with multiple linked views — click a bar in one chart and every other chart filters to match. For exploratory analysis and multi-variable stories, Tableau Public is more powerful than Datawrapper or Flourish.

The tradeoff: all visualizations on Tableau Public are public. Don't use it for unpublished investigations. The desktop app runs on Windows and Mac.

RAW Graphs

RAW Graphs fills the gap between spreadsheets and design tools. It produces unusual chart types — alluvial diagrams, bump charts, circle packing — that Datawrapper doesn't offer. Data stays in your browser. Export as SVG for further editing in Illustrator or Figma.

Mapping tools

QGIS

QGIS is a full geographic information system. It handles shapefiles, satellite imagery, spatial joins, and custom projections. Newsrooms use it for environmental investigations, redistricting analysis, and any story where geography is central to the finding.
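A spatial join attaches attributes from one layer to another based on location, for example tagging each school with the district it falls in. In QGIS this is the "Join attributes by location" tool. The core test behind such a join can be sketched with the ray-casting point-in-polygon algorithm. This is a simplified sketch, assuming planar coordinates and simple polygons without holes; real GIS tools also handle projections, holes, and spatial indexes. The function names and the district shapes are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a ray to the right of (x, y) crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, districts):
    """Map each point name to the first district containing it (or None)."""
    result = {}
    for name, (x, y) in points.items():
        result[name] = next(
            (d for d, poly in districts.items() if point_in_polygon(x, y, poly)),
            None,
        )
    return result


# Made-up example: two square districts side by side.
districts = {
    "District 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "District 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"School A": (3, 4), "School B": (15, 2), "School C": (25, 5)}
print(spatial_join(points, districts))
# {'School A': 'District 1', 'School B': 'District 2', 'School C': None}
```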

QGIS has a steep learning curve. But for stories like mapping pollution near schools, overlaying flood zones with income data, or analyzing gerrymandered districts, no other free, open-source tool comes close.
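A "pollution near schools" analysis usually comes down to a distance test: which schools lie within some radius of a pollution source. In QGIS that is typically a buffer followed by a spatial join. The underlying distance calculation can be sketched with the haversine formula. This sketch assumes latitude and longitude in degrees and a spherical Earth with a mean radius of 6,371 km (good to within about 0.5% for distances like these). The coordinates and the 1.5 km threshold are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near(sources, sites, radius_km):
    """List (site, source, distance_km) pairs no farther apart than radius_km."""
    hits = []
    for site, (slat, slon) in sites.items():
        for source, (plat, plon) in sources.items():
            d = haversine_km(slat, slon, plat, plon)
            if d <= radius_km:
                hits.append((site, source, round(d, 2)))
    return hits


# Made-up coordinates for illustration only.
plants = {"Plant X": (40.000, -75.000)}
schools = {"School A": (40.010, -75.000), "School B": (40.100, -75.000)}
nearby = near(plants, schools, radius_km=1.5)
print(nearby)  # [('School A', 'Plant X', 1.11)]
```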

MapShaper

MapShaper simplifies and converts geographic data files. It reduces file sizes for web maps, converts between GeoJSON and Shapefile formats, and merges geographic boundaries. It runs in the browser and handles most format conversions in seconds.
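Simplification shrinks a boundary file by removing vertices that add little to a shape's outline. MapShaper offers several methods, including Douglas-Peucker and Visvalingam. Below is a minimal recursive Douglas-Peucker sketch on a single line of planar coordinates, with a made-up line and tolerance. MapShaper also keeps borders shared by neighboring polygons consistent when it simplifies, which this sketch does not attempt.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # |cross product| / |ab| is the distance from p to the line
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Keep the endpoints; split at the farthest point if it exceeds tolerance."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the start-end chord.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [start, end]  # every interior point is close enough to drop
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split point


line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, tolerance=0.5))  # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

A larger tolerance drops more vertices: the file gets smaller, and the shape gets coarser.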

Start here if you're new to data journalism

Beginner recommendation

Start with Google Sheets and Datawrapper. Together they cover most everyday data journalism tasks: import data, sort and filter, calculate, then publish a chart. No code, no installation, no cost.

When you hit a data cleaning problem Sheets can't handle, add OpenRefine. When you need code-level analysis, try Google Colab. When you need maps, start with Datawrapper's built-in maps before moving to QGIS.

Recommended learning path

  1. Google Sheets — pivot tables, VLOOKUP, basic formulas. Takes a weekend.
  2. Datawrapper — paste data, make a chart. Takes an hour.
  3. OpenRefine — cluster and clean. Takes a few hours.
  4. Tabula — extract PDF tables. Takes 10 minutes.
  5. Google Colab + Python — pandas basics. Takes a few weeks.
  6. QGIS — spatial analysis. Takes a dedicated course or tutorial series.