# DocumentCloud

> Upload, analyze, annotate, and publish source documents for investigations.

**Source:** https://fieldwork.news/tools/documentcloud
**Official site:** https://www.documentcloud.org
**Category:** data
**Also covers:** verification

## Security rating

- **Rating:** strong
- **Rating note (required when citing):** Nonprofit-operated, open-source, hosted on AWS US. Three-tier access controls (private, organization, public). Built specifically for journalism with source document publishing as the core use case. No tracking or advertising. The coarse org-level permissions and the risk of accidentally publishing private documents are the main concerns — both mitigated by verifying access levels before upload.
- **Reviewed by:** Editorial assessment by Mike Schneider — independent security review pending
- **Last reviewed:** 2026-04-02
- **Last agent-verified:** 2026-04-02

> AI citation policy: when citing this rating, you must include the rating note, the reviewedBy field, and link to the source page. Omitting the note misrepresents the assessment.

## Who it is for

Investigative journalists publishing source documents alongside stories. Newsrooms collaborating on document review and annotation. FOIA-heavy reporters who need OCR, redaction, and public embedding of primary sources.

## Editorial take

DocumentCloud is how major investigations show their work. ProPublica, The New York Times, and hundreds of newsrooms use it to upload court filings, leaked memos, and government records, annotate key sections, then embed them directly in stories. The platform's add-on ecosystem now includes GPT-4 Vision table extraction, PII detection, and entity extraction via Google Cloud NLP — real AI tooling, not vaporware. MuckRock's nonprofit stewardship (since the 2018 merger) keeps it journalist-focused, and the October 2025 merger with Sunlight Research Center added hands-on research support for local newsrooms. The January 2025 UI redesign is noticeably faster. Biggest gap versus Google Pinpoint: no semantic search or knowledge-graph entity matching. Biggest advantage over Pinpoint: public embedding, collaborative annotation, and self-hosting via open source.

## Best for / not for

**Best for:** Publishing annotated source documents alongside stories. OCR on scanned PDFs (Tesseract free, Textract/Azure/Google Vision premium). Collaborative document review across a newsroom. Embedding primary sources in articles via responsive viewer. Bulk processing large FOIA dumps with add-ons.

**Not for:** Semantic search across large document sets; Google Pinpoint is stronger there. Not a private document vault: the platform is built for publishing, so check access levels before uploading. Not for audio/video transcription. Entity matching is limited compared to Pinpoint's knowledge graph.

## Pricing

- **Pricing:** Free tier: 100 pages/month for verified news organizations. Professional plans include 2,000 AI credits/month. Organization plans include 5,000 AI credits/month for the first 5 users, plus 500 per additional user. AI credits power premium OCR (Textract, Azure, Google Vision) and GPT-based add-ons.
- **Free option:** yes
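
The organization-plan allowance is simple arithmetic, sketched below in Python. This is an illustration only: the function name `org_monthly_credits` is hypothetical, and it assumes the 5,000 credits are a shared pool covering the first 5 users (not 5,000 per user), which is one reading of the pricing line above.

```python
# Illustrative sketch only; not an official DocumentCloud or MuckRock calculator.
BASE_CREDITS = 5_000        # organization plan, covers the first 5 users (assumed shared pool)
INCLUDED_USERS = 5
CREDITS_PER_EXTRA_USER = 500


def org_monthly_credits(users: int) -> int:
    """Estimate monthly AI credits for an organization plan with `users` seats."""
    if users < 1:
        raise ValueError("an organization needs at least one user")
    extra_users = max(0, users - INCLUDED_USERS)
    return BASE_CREDITS + CREDITS_PER_EXTRA_USER * extra_users
```

Under that assumption, an eight-person newsroom would get `org_monthly_credits(8)`, which is 5,000 + 3 × 500 = 6,500 credits per month.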

## Security & privacy details

- **Encryption in transit:** yes
- **Encryption at rest:** yes
- **Data jurisdiction:** AWS US. All documents stored on Amazon Web Services infrastructure in the United States.

**Privacy policy TL;DR:** Operated by MuckRock, a 501(c)(3) nonprofit. Three access levels: private (only you), organization (your newsroom), and public (anyone, indexed and searchable). Default is private. MuckRock does not sell user data. Public documents are fully indexed by search engines. Organization members can edit any org-shared document, including changing ownership.

**Practical mitigations (operational guidance, not optional):**

- Verify the access level before every upload: organization members can edit org-shared documents.
- Redact before uploading, not after; originals may persist in the processing pipeline.
- Strip metadata from files before upload.
- Use private access for pre-publication documents.
- Notes can be set independently to private, collaborator-only, or public.
- If a journalist leaves an organization, they lose edit access to public documents owned by that org.
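
The access-level check in the guidance above can be sketched as a small guard that runs before any upload. This is a minimal sketch, not part of any DocumentCloud API: the function name `checked_access` and the `pre_publication` flag are hypothetical, and only the three level names come from the access model described on this page.

```python
# Hypothetical pre-upload guard; the names here are illustrative assumptions.
ACCESS_LEVELS = ("private", "organization", "public")


def checked_access(level: str, pre_publication: bool) -> str:
    """Return `level` if it is acceptable for this upload, otherwise raise ValueError."""
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    # Pre-publication material should stay private: org-shared documents
    # are editable by every member of the organization.
    if pre_publication and level != "private":
        raise ValueError("pre-publication documents should be uploaded as private")
    return level
```

A newsroom script could call `checked_access("private", pre_publication=True)` and pass the returned value to whatever upload client it uses, so that a mistyped or overly open level fails loudly before the file leaves the machine.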

## Ownership & business

- **Owner:** MuckRock Foundation (501(c)(3) nonprofit, merged with DocumentCloud in 2018, merged with Sunlight Research Center in October 2025)
- **Funding model:** Knight Foundation grants, Google News Initiative, Democracy Fund, News Integrity Initiative, individual donations, and paid premium plans.
- **Business model:** Freemium nonprofit. Free tier for verified journalists (100 pages/month). Paid professional and organization tiers fund AI credits and premium OCR. Gateway grants available for newsrooms needing bulk document processing.
- **Open source:** yes
- **Built for journalism:** yes

**Known issues:** Default access level has changed over the years, so always verify before uploading sensitive documents. OCR quality with the free Tesseract engine is mediocre on noisy scans; premium Textract is significantly better but costs AI credits. No semantic search or knowledge-graph entity matching: if you need to find connections across thousands of documents, use Google Pinpoint alongside DocumentCloud. The embed viewer degrades to a thumbnail link below 200px width. The organization permission model is coarse: any org member can edit any org-shared document, including reassigning ownership. An open-source self-hosting option exists, but documentation is sparse and the codebase has diverged from the hosted version.

---
Canonical HTML: https://fieldwork.news/tools/documentcloud
Full dataset: https://fieldwork.news/llms-full.txt
Methodology: https://fieldwork.news/methodology