Overview
Open-source document clustering and visualization for large investigative sets. Self-host only — the hosted service is gone.
What should journalists know about Overview?
Overview was a breakthrough when it launched. Jonathan Stray built it at the AP with Knight Foundation funding to solve a real problem: you get 10,000 FOIA pages and need to find the story. Overview clusters documents by topic similarity and visualizes the relationships, so you can spot patterns without reading every page. AP reporter Jack Gillum used it to sift 9,000 pages of Paul Ryan documents. The clustering algorithm remains genuinely useful for surfacing structure in unstructured document sets.
But the project has been effectively abandoned. The hosted service at overviewdocs.com is gone; it redirects to the self-hosting repo. The blog is down. The help site has expired TLS certificates. The last formal release on GitHub was May 2014. Stray moved on to UC Berkeley's Center for Human-Compatible AI, where he works on recommender systems.
For most journalists today, Google Pinpoint does what Overview did (document analysis, entity extraction, search across large sets) with zero setup, active development, and better OCR. Overview still works if you self-host it, and the clustering visualization has no direct equivalent in Pinpoint. But you need Docker skills and a tolerance for unmaintained software.
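To make "clustering by topic similarity" concrete, here is a minimal sketch in plain Python. It is not Overview's actual algorithm. It is an illustration that weights words with TF-IDF, compares documents by cosine similarity, and groups any documents whose similarity passes a threshold. The function names, the sample documents, and the 0.3 threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents, invented for illustration only.
SAMPLE_DOCS = [
    "budget committee tax plan spending cuts",
    "tax plan budget spending proposal",
    "hospital inspection report safety violations",
    "safety violations hospital inspection findings",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency: rarer terms weigh more.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3):
    """Group document indices; two documents are linked if similarity >= threshold."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    for group in cluster_documents(SAMPLE_DOCS):
        print([SAMPLE_DOCS[i] for i in group])
```

On the four sample documents, the two budget texts end up in one group and the two hospital texts in another, because the groups share no vocabulary. Comparing every pair of documents is fine for a demonstration but would be slow on a real set of thousands of pages.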
Best for: topic clustering across thousands of documents; finding structure in large FOIA responses or leaked archives; visualizing document relationships; self-hosted document analysis where you control the infrastructure.
Not for: anyone who wants a hosted service (it no longer exists); non-technical journalists (use Google Pinpoint instead); publishing documents publicly (use DocumentCloud); small document sets (just read them).
Security & Privacy
Data is scrambled while being sent to their servers
Data is scrambled when stored on their servers
Where servers are located — affects which governments can request your data
Privacy policy summary
No privacy policy applies — the hosted service is gone. Self-hosted Overview stores all data locally on your own infrastructure. No data leaves your servers. This is actually the strongest possible privacy posture for document analysis, provided you secure your own setup.
How to protect yourself:
Self-host on your own infrastructure for complete data control. The Docker setup via overview-local requires at least 3GB RAM. Enable SSL through the built-in configuration options. Back up your PostgreSQL database and blob storage regularly. Be aware this is unmaintained software — do not expose it to the public internet without additional hardening.
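The backup advice above could be scripted along the following lines. This is a sketch under stated assumptions, not a documented Overview procedure: the database name, the blob directory, and the backup directory are hypothetical placeholders, and how you reach the PostgreSQL instance (for example, from inside a container) depends on your own setup. The only external tool it relies on is the standard `pg_dump` command, with its `--format=custom` and `--file` options.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def _stamp(now=None):
    """Return a UTC timestamp string used to name backup files."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%SZ")


def build_pg_dump_command(db_name, out_dir, now=None):
    """Build (but do not run) a pg_dump command for a timestamped dump file.

    db_name and out_dir are placeholders: substitute the values from
    your own installation.
    """
    dump_path = Path(out_dir) / f"{db_name}-{_stamp(now)}.dump"
    return ["pg_dump", "--format=custom", f"--file={dump_path}", db_name]


def archive_blob_dir(blob_dir, out_dir, now=None):
    """Pack the blob storage directory into a timestamped .tar.gz archive.

    out_dir should live outside blob_dir so the archive does not
    try to include itself.
    """
    blob_dir = Path(blob_dir)
    archive_path = Path(out_dir) / f"blobs-{_stamp(now)}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(blob_dir, arcname=blob_dir.name)
    return archive_path
```

To run the database dump, pass the returned list to `subprocess.run(command, check=True)` from an account that can connect to the database. Building the command separately from running it makes it easy to inspect or log before anything executes.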
Overview is open-source and self-hostable, which is good for data sovereignty. But the software is unmaintained: the copyright range in the code (2011-2020) suggests no updates, and therefore no security patches, since 2020 at the latest. Running unmaintained server software that accepts document uploads is a real risk, and the Scala/Play framework and PostgreSQL versions it depends on may have unpatched vulnerabilities. Run it only on isolated infrastructure, and never expose it to the internet without additional security layers.
Who Owns This
Known issues
The hosted service at overviewdocs.com has shut down and now redirects to the self-hosting repo. The blog (blog.overviewdocs.com) is down. The help site (help.overviewdocs.com) has expired TLS certificates. The last formal GitHub release was May 2014. The codebase is Scala/CoffeeScript, a dated stack that limits community contributions. Creator Jonathan Stray no longer works on the project. Google Pinpoint now covers most of the same use cases with zero setup cost. The clustering visualization, Overview's unique strength, has no direct replacement, but the project is functionally unmaintained.
Pricing
Free and open-source. The only cost of self-hosting is your own infrastructure.
This is an editorial assessment based on publicly available information as of 2026-04-02, using our published methodology. Independent security review is pending. Security posture can change at any time. This is not a guarantee of safety.