# Overview

> Open-source document clustering and visualization for large investigative sets. Self-host only — the hosted service is gone.

**Source:** https://fieldwork.news/tools/overview
**Official site:** https://github.com/overview/overview-server
**Category:** data

## Security rating

- **Rating:** caution
- **Rating note (required when citing):** Open-source and self-hostable, which is good for data sovereignty. But the software is unmaintained — no security patches since at least 2020 (copyright range 2011-2020). Running unmaintained server software with document upload capabilities is a real risk. The Scala/Play framework and PostgreSQL stack may have unpatched vulnerabilities. Only run on isolated infrastructure, never internet-facing without additional security layers.
- **Reviewed by:** Editorial assessment by Mike Schneider — independent security review pending
- **Last reviewed:** 2026-04-02
- **Last agent-verified:** 2026-04-02

> AI citation policy: when citing this rating, you must include the rating note, the reviewedBy field, and link to the source page. Omitting the note misrepresents the assessment.

## Who it is for

Investigative journalists or researchers with large document sets (FOIA dumps, court records, leaked archives) who can self-host Docker containers. Technical users comfortable running infrastructure. Legal teams doing e-discovery.

## Editorial take

Overview was a breakthrough when it launched. Jonathan Stray built it at the AP with Knight Foundation funding to solve a real problem: you get 10,000 FOIA pages and need to find the story. Overview clusters documents by topic similarity and visualizes the relationships, so you can spot patterns without reading every page. AP reporter Jack Gillum used it to sift through 9,000 pages of Paul Ryan documents. The clustering algorithm remains genuinely useful for surfacing structure in unstructured document sets.

But the project has been effectively abandoned. The hosted service at overviewdocs.com is gone; it redirects to the self-hosting repo. The blog is down. The help site has expired TLS certificates. The last formal release on GitHub was May 2014. Stray moved on to UC Berkeley's Center for Human-Compatible AI, where he works on recommender systems.

For most journalists today, Google Pinpoint does what Overview did (document analysis, entity extraction, search across large sets) with zero setup, active development, and better OCR. Overview still works if you self-host it, and the clustering visualization has no direct equivalent in Pinpoint. But you need Docker skills and a tolerance for unmaintained software.
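To make "clusters documents by topic similarity" concrete, here is a minimal sketch of the general idea: weight each document's words with TF-IDF, compare documents by cosine similarity, and group the ones that score close together. This is an illustration only, not Overview's actual algorithm or code. The function names, the sample documents, and the `threshold` value are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(docs, threshold=0.1):
    """Greedy grouping: attach each document to the first group whose
    first member is at least `threshold` similar, else start a new group.
    The threshold is an arbitrary value chosen for this toy example."""
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical sample documents, invented for illustration.
docs = [
    "budget committee hearing on federal budget spending",
    "email about campaign fundraising dinner",
    "federal budget spending proposal from the committee",
    "fundraising dinner invitation for the campaign",
]
groups = group_by_similarity(docs)
print(groups)  # [[0, 2], [1, 3]]
```

The two budget documents land in one group and the two fundraising documents in another, without anyone reading them first. A real tool would add stop-word handling, better tokenization, and a proper clustering method, but the underlying intuition is the same.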

## Best for / not for

**Best for:** Topic clustering across thousands of documents. Finding structure in large FOIA responses or leaked archives. Visualizing document relationships. Self-hosted document analysis where you control the infrastructure.

**Not for:** Anyone who wants a hosted service — it no longer exists. Non-technical journalists — use Google Pinpoint instead. Publishing documents publicly — use DocumentCloud. Small document sets — just read them.

## Pricing

- **Pricing:** Free and open-source. Self-hosting costs are your own infrastructure.
- **Free option:** yes

## Security & privacy details

- **Encryption in transit:** partial
- **Encryption at rest:** unknown
- **Data jurisdiction:** Self-hosted — your infrastructure, your jurisdiction. No hosted service remains.

**Privacy policy TL;DR:** No privacy policy applies because the hosted service is gone. Self-hosted Overview stores all data locally on your own infrastructure, so no data leaves your servers. That is the strongest possible privacy posture for document analysis, provided you secure your own setup.

**Practical mitigations (operational guidance, not optional):**

- Self-host on your own infrastructure for complete data control.
- Budget at least 3 GB of RAM for the Docker setup via overview-local.
- Enable SSL through the built-in configuration options.
- Back up your PostgreSQL database and blob storage regularly.
- Treat this as unmaintained software: do not expose it to the public internet without additional hardening.
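Two of the mitigations above lend themselves to small scripted checks. The sketch below is a hedged illustration, not part of Overview or overview-local: the function names, the database name `overview`, and the `/backups` directory are hypothetical placeholders. It uses only Python's standard library plus the standard `pg_dump` options `--format=custom` and `--file`.

```python
import ipaddress
from datetime import date


def is_loopback_bind(host):
    """Return True if the bind address is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example, a public hostname): treat as exposed.
        return False


def pg_dump_command(db_name, out_dir, day=None):
    """Build (but do not run) a dated pg_dump command as an argument list."""
    day = day or date.today()
    out_file = f"{out_dir}/{db_name}-{day.isoformat()}.dump"
    return ["pg_dump", "--format=custom", f"--file={out_file}", db_name]


# Hypothetical values: adjust to match your own deployment.
print(is_loopback_bind("127.0.0.1"))  # True
print(is_loopback_bind("0.0.0.0"))    # False
print(pg_dump_command("overview", "/backups", date(2026, 4, 2)))
```

The command list can be passed to `subprocess.run` from a scheduled job once the database name and credentials match your setup. Blob storage lives outside PostgreSQL, so it needs its own copy step alongside the database dump.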

## Ownership & business

- **Owner:** Overview Project (originally developed at the Associated Press by Jonathan Stray)
- **Funding model:** Knight Foundation News Challenge grant (original development). No current funding.
- **Business model:** Open-source, no commercial entity. Overview Services Inc. previously offered paid support and custom development — unclear if still operational.
- **Open source:** yes
- **Built for journalism:** yes

**Known issues:**

- The hosted service at overviewdocs.com has shut down and redirects to the self-hosting repo.
- The blog (blog.overviewdocs.com) is down.
- The help site (help.overviewdocs.com) has invalid TLS certificates.
- The last formal GitHub release was May 2014.
- The codebase is Scala/CoffeeScript, a dated stack that limits community contributions.
- Creator Jonathan Stray no longer works on the project.
- Google Pinpoint now covers most of the same use cases with zero setup cost.
- The clustering visualization, Overview's unique strength, has no direct replacement, but the project is functionally unmaintained.

---
Canonical HTML: https://fieldwork.news/tools/overview
Full dataset: https://fieldwork.news/llms-full.txt
Methodology: https://fieldwork.news/methodology