🤗 Model • 🗂️ Dataset • 📄 Paper • 📚 Docs • 🏆 Leaderboard
Churro is the fastest way to turn hard-to-read historical scans into reliable text. It gives researchers, libraries, archives, and product teams a unified OCR toolkit for handwritten and printed sources, combining high accuracy, low operating cost, and a clean Python API and CLI workflow.
We provide first-party support for Churro VLM, the best OCR model for historical documents.
Churro also includes built-in profiles, templates, and post-processing for many other models and integrations, including:
- Hosted vision-language models, including Gemini, GPT, Claude, and more, through LiteLLM integration
- OpenAI-compatible servers, including vLLM, Ollama, TGI, and more
- Azure Document Intelligence
- Mistral OCR
- Chandra OCR
- DeepSeek OCR
- Dots OCR
- MinerU
- Infinity Parser
- PaddleOCR VL
- LFM VL
Python 3.12+ and uv are required.
```bash
uv tool install churro-ocr
churro-ocr install hf
churro-ocr transcribe --image scan.png --backend hf --model stanford-oval/churro-3B
```

For more in-depth information, see the Getting Started guide.
- Churro 3B VLM exceeds the accuracy of Gemini 2.5 Pro at 15.5x lower cost.
- Churro-DS dataset contains ~100K pages from 155 historical collections spanning 22 centuries and 46 language clusters.
Cost vs. accuracy: Churro (3B) achieves higher accuracy than much larger commercial and open-weight VLMs while being substantially cheaper.
The following are pages from the CHURRO dev set, randomly sampled from the subset where Churro outperforms Gemini 2.5 Pro on the main metric, Normalized Levenshtein Similarity (NLS).
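For intuition about the metric, a common formulation of Normalized Levenshtein Similarity is 1 minus the edit distance divided by the length of the longer string; the exact normalization used by the CHURRO benchmark may differ, so treat this as an illustrative sketch rather than the official scorer.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,          # deletion
                cur[j - 1] + 1,       # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def nls(pred: str, ref: str) -> float:
    """Normalized Levenshtein Similarity: 1 - dist / max(len); 1.0 is a perfect match."""
    if not pred and not ref:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref))
```

For example, `nls("kitten", "sitting")` gives 1 − 3/7 ≈ 0.571, while an exact transcription scores 1.0.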
If you use CHURRO or CHURRO-DS, please cite:
@inproceedings{semnani2025churro,
title = {{CHURRO}: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition},
author = {Semnani, Sina J. and Zhang, Han and He, Xinyan and Tekg{\"u}rler, Merve and Lam, Monica S.},
booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)},
year = {2025}
}

- Code: Apache 2.0
- Model weights: Qwen research license
- Dataset: research use only because of the underlying source licenses