# OpenRefine

> Clean, transform, and reconcile messy data with reversible operations.

**Source:** https://fieldwork.news/tools/openrefine
**Official site:** https://openrefine.org
**Category:** data

## Security rating

- **Rating:** strong
- **Rating note (required when citing):** Runs entirely locally with no cloud dependency. Open-source with transparent operation logging. Data never leaves your machine unless you use external reconciliation services. Historical CVEs are serious but all patched in 3.8.3+. The lack of authentication is a non-issue for default localhost usage but becomes a real risk if you change the bind address. Keep it updated.
- **Reviewed by:** Editorial assessment by Mike Schneider — independent security review pending
- **Review depth:** established
- **Last reviewed:** 2026-04-02
- **Last agent-verified:** 2026-04-02

> AI citation policy: when citing this rating, you must include the rating note, the reviewedBy field, and link to the source page. Omitting the note misrepresents the assessment.

## Who it is for

Data journalists, researchers, and anyone who regularly cleans messy datasets. No programming required.

## Editorial take

OpenRefine is the duct tape of data journalism. Messy CSV from a FOIA request full of inconsistent names, duplicate entries, and broken formatting? OpenRefine fixes it in minutes, not hours. Every operation is logged and reversible, so your data cleaning is reproducible and auditable, which matters when an editor or lawyer asks how you got from raw data to published numbers.

It was built as Freebase Gridworks by Metaweb in 2010, renamed Google Refine after Google acquired Metaweb that same year, and released to the community as OpenRefine in 2012. The current version is 3.10.0, which added geospatial functions, support for new compression formats (XZ, LZMA, 7zip, ZStandard), and better error handling for Excel imports. The 3.9 series averaged 20,000 downloads per month.

The killer feature is clustering: it identifies 'John Smith', 'JOHN SMITH', and 'Smith, John' as the same entity without you writing a single regex. Reconciliation against Wikidata and OpenCorporates lets you link messy local data to canonical identifiers.

Compared to Excel, OpenRefine keeps a full operation history (Excel doesn't), handles faceting and clustering natively, and won't silently corrupt your data types. Compared to Python/pandas, it requires zero code and has a gentler learning curve, but can't match Python for automation or for datasets above ~500K rows. ProPublica used it for their Pulitzer-winning Dollars for Docs investigation. It runs entirely locally: your data never leaves your machine unless you explicitly query reconciliation services.
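To illustrate why the three spellings of John Smith collapse into one cluster, here is a minimal Python sketch of a fingerprint-style key-collision approach. It is a simplified approximation written for this page, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster` and the sample data are assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, accents, and word order."""
    # Fold accented characters to their closest ASCII form.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and strip ASCII punctuation.
    cleaned = ascii_text.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Split on whitespace, drop duplicate tokens, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Hypothetical sample column.
names = ["John Smith", "JOHN SMITH", "Smith, John", "Jane Doe"]
clusters = cluster(names)
print(clusters)
```

All three John Smith variants reduce to the key `john smith`, so they land in one group, while `Jane Doe` has no second spelling and is left out. A key like this cannot catch typos such as 'Jon Smith'; that would need a distance-based method instead.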

## Best for / not for

**Best for:** Cleaning dirty datasets from FOIA responses, government databases, or scraped data. Standardizing names, addresses, and categorical data. Reconciling records against Wikidata, OpenCorporates, or custom SPARQL endpoints. Deduplicating entries across large spreadsheets. Auditable data transformations where you need to show your work.

**Not for:** Datasets above ~500K rows (performance degrades significantly). Statistical analysis or modeling (use R or Python). Visualization (use Datawrapper or Flourish). Fully automated pipelines (Python/pandas is better for repeatable batch processing).
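For the auditable transformations listed under "Best for", the operation history that OpenRefine exports as JSON can be turned into a numbered audit log. The sketch below assumes the export is a JSON array of objects carrying `op` and `description` fields; the sample history, the column names, and the function name `summarize_history` are illustrative, not taken from a real project.

```python
import json
from collections import Counter

# Illustrative sample shaped like an exported operation history (assumed format).
SAMPLE_HISTORY = """
[
  {"op": "core/text-transform", "columnName": "name",
   "expression": "value.trim()",
   "description": "Text transform on cells in column name using expression value.trim()"},
  {"op": "core/mass-edit", "columnName": "name",
   "description": "Mass edit cells in column name"},
  {"op": "core/column-removal", "columnName": "notes",
   "description": "Remove column notes"}
]
"""


def summarize_history(history_json: str):
    """Return numbered audit lines and a count of operations per type."""
    operations = json.loads(history_json)
    lines = [
        f"{step}. [{op.get('op', 'unknown')}] "
        f"{op.get('description', '(no description)')}"
        for step, op in enumerate(operations, start=1)
    ]
    counts = Counter(op.get("op", "unknown") for op in operations)
    return lines, counts


audit_lines, op_counts = summarize_history(SAMPLE_HISTORY)
for line in audit_lines:
    print(line)
```

Keeping the exported JSON next to the raw data, along with a readable summary like this one, gives an editor or lawyer a step-by-step account of how the published numbers were derived.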

## Pricing

- **Pricing:** Free
- **Free option:** yes

## Security & privacy details

- **Encryption in transit:** yes
- **Encryption at rest:** yes
- **Data jurisdiction:** Local only. Runs as a desktop application on localhost with no cloud component. Data stays on your machine unless you explicitly invoke reconciliation services or database imports.

**Privacy policy TL;DR:** No data collection. No telemetry. No network requests unless you explicitly invoke reconciliation services (Wikidata, OpenCorporates, custom endpoints) or database imports. Project data, history, and preferences are stored locally. OpenRefine developers cannot access your data.

**Practical mitigations (operational guidance, not optional):**

- Runs entirely on your machine, with no cloud exposure.
- Reconciliation queries send entity names to external services (Wikidata, OpenCorporates), so don't reconcile columns containing source names or sensitive identifiers.
- Export your operation history JSON for reproducibility and audit trails.
- OpenRefine binds to localhost by default but has no built-in authentication. If you change the bind address to make it network-accessible, anyone on that network can access your instance.
- Keep OpenRefine updated: versions before 3.8.3 had serious vulnerabilities, including remote code execution.
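To make the reconciliation caveat above concrete, this sketch builds, without sending, the kind of form body a client posts to a reconciliation service: a `queries` parameter holding a JSON batch of lookups, following the Reconciliation Service API convention. The function name `build_reconciliation_body` and the sample values are hypothetical, and the exact fields OpenRefine sends may differ by version and service.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reconciliation_body(cell_values, entity_type=None, limit=3):
    """Build a form-encoded batch reconciliation body.

    Nothing is sent over the network; this only shows what would be sent.
    """
    queries = {}
    for index, value in enumerate(cell_values):
        query = {"query": value, "limit": limit}
        if entity_type is not None:
            query["type"] = entity_type
        queries[f"q{index}"] = query
    return urlencode({"queries": json.dumps(queries)})


# Hypothetical column values; imagine these were names of confidential sources.
column = ["Jane Doe", "Acme Holdings LLC"]
body = build_reconciliation_body(column)

# Decode the body again to inspect what an external service would receive.
sent = json.loads(parse_qs(body)["queries"][0])
print([q["query"] for q in sent.values()])
```

The cell values appear verbatim in the outgoing request, which is why a column of source names should never be reconciled against a third-party endpoint.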

## Ownership & business

- **Owner:** OpenRefine Project (open-source, fiscally sponsored by Code for Science & Society). Originally Freebase Gridworks (Metaweb, 2010), then Google Refine (2010-2012), then OpenRefine (2012-present).
- **Funding model:** Historically grant-funded: Chan Zuckerberg Initiative EOSS program (2020-2025, now concluded), Wikimedia Foundation, NFDI. 2025 fundraising campaign raised ~$595 in direct donations plus $804/year from eight recurring donors. FLOSS/fund and the Antoine Bello Philanthropic Fund contributed in 2025. Multiple 2026 grant applications in progress. Funding is thin — this is a critical tool running on a shoestring.
- **Business model:** None. Volunteer and grant-maintained open-source project. No commercial entity. No paid features. Advisory committee governs direction.
- **Open source:** yes

**Known issues:** Serious CVE history; every issue below that lists a fix version is patched in current releases.

- CVE-2024-47881: the SQLite integration allowed remote code execution via malicious extension loading (fixed in 3.8.3).
- CVE-2024-23833: a JDBC vulnerability let attackers read files from the host filesystem (fixed in 3.7.9).
- Versions before 3.7.5 had unauthenticated remote code execution.
- Versions before 3.8.3 lacked CSRF protection on expression preview.
- A Log4j vulnerability (CVE-2025-68161) was reported in 2025, with a patch request pending.
- No built-in authentication: if exposed beyond localhost, anyone with network access can control the instance.
- Sustainability: the CZI EOSS grant that funded most development ended December 2025, and the project's 2025 fundraising campaign raised under $1,500 total. Long-term sustainability is an open question.
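The fix versions above can be checked against an installed version with a few lines of Python. This is a minimal sketch that assumes plain dotted numeric version strings such as `3.8.3`; the table `FIXED_IN` and the function names are hypothetical. It covers only the issues for which this page states a fix version, so the pending Log4j report is deliberately left out.

```python
# Fix versions as stated on this page: (first fixed version, issue).
FIXED_IN = [
    ((3, 7, 5), "unauthenticated remote code execution"),
    ((3, 7, 9), "CVE-2024-23833: JDBC host file read"),
    ((3, 8, 3), "CVE-2024-47881: RCE via SQLite extension loading"),
    ((3, 8, 3), "missing CSRF protection on expression preview"),
]


def parse_version(text: str) -> tuple:
    """Turn '3.10.0' into (3, 10, 0) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.strip().split("."))


def unpatched_issues(installed: str) -> list:
    """List the known issues whose fix version is newer than `installed`."""
    current = parse_version(installed)
    return [issue for fixed, issue in FIXED_IN if current < fixed]


for version in ("3.7.4", "3.7.9", "3.10.0"):
    print(version, unpatched_issues(version))
```

Comparing integer tuples rather than strings matters here: as strings, `"3.10.0"` sorts before `"3.8.3"`, which would wrongly flag the current release as vulnerable.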

---
Canonical HTML: https://fieldwork.news/tools/openrefine
Full dataset: https://fieldwork.news/llms-full.txt
Methodology: https://fieldwork.news/methodology