Skip to content

stanford-oval/Churro

Repository files navigation

CHURRO logo Churro

🤗 Model🗂️ Dataset📄 Paper

📚 Docs🏆 Leaderboard GitHub Stars badge

Churro is the fastest way to turn hard-to-read historical scans into reliable text. It gives researchers, libraries, archives, and product teams a unified OCR toolkit for handwritten and printed sources, combining high accuracy, low operating cost, and a clean Python API and CLI workflow.

Supported OCR Models and Backends

We provide first-party support for Churro VLM, the best OCR model for historical documents.

Churro also includes built-in profiles, templates, and post-processing for many other models and integrations, including:

  • Hosted vision-language models, including Gemini, GPT, Claude, and more, through LiteLLM integration
  • OpenAI-compatible servers, including vLLM, Ollama, TGI, and more
  • Azure Document Intelligence
  • Mistral OCR
  • Chandra OCR
  • DeepSeek OCR
  • Dots OCR
  • MinerU
  • Infinity Parser
  • PaddleOCR VL
  • LFM VL

Quick Start

Python 3.12+ and uv are required.

uv tool install churro-ocr
churro-ocr install hf
churro-ocr transcribe --image scan.png --backend hf --model stanford-oval/churro-3B

For more in-depth information, see the Getting Started guide.

Churro Model and Dataset

  • Churro 3B VLM exceeds the accuracy of Gemini 2.5 Pro at 15.5x lower cost.
  • Churro-DS dataset contains ~100K pages from 155 historical collections spanning 22 centuries and 46 language clusters.

Cost vs Performance comparison showing Churro's accuracy advantage at significantly lower cost
Cost vs. accuracy: Churro (3B) achieves higher accuracy than much larger commercial and open-weight VLMs while being substantially cheaper.

The following are pages from the CHURRO dev set, randomly picked from the subset where Churro outperforms Gemini 2.5 Pro on the main metric, Normalized Levenshtein Similarity (NLS).

Arabic historical page example where Churro beats Gemini 2.5 Pro
Arabic
handwriting
Churro 93.9 vs Gemini 92.3 NLS
Bangla historical page example where Churro beats Gemini 2.5 Pro
Bangla
print
Churro 91.4 vs Gemini 84.3 NLS
Bulgarian historical page example where Churro beats Gemini 2.5 Pro
Bulgarian
print
Churro 99.8 vs Gemini 99.2 NLS
Catalan historical page example where Churro beats Gemini 2.5 Pro
Catalan
handwriting
Churro 95.2 vs Gemini 94.1 NLS
Chinese historical page example where Churro beats Gemini 2.5 Pro
Chinese
handwriting
Churro 100.0 vs Gemini 95.0 NLS
Czech historical page example where Churro beats Gemini 2.5 Pro
Czech
print
Churro 95.9 vs Gemini 95.3 NLS
Dutch historical page example where Churro beats Gemini 2.5 Pro
Dutch
print
Churro 98.7 vs Gemini 98.0 NLS
English historical page example where Churro beats Gemini 2.5 Pro
English
handwriting
Churro 99.8 vs Gemini 99.4 NLS
Finnish historical page example where Churro beats Gemini 2.5 Pro
Finnish
print
Churro 99.6 vs Gemini 99.5 NLS
French historical page example where Churro beats Gemini 2.5 Pro
French
handwriting
Churro 92.6 vs Gemini 90.6 NLS
German historical page example where Churro beats Gemini 2.5 Pro
German
handwriting
Churro 81.4 vs Gemini 45.6 NLS
Greek historical page example where Churro beats Gemini 2.5 Pro
Greek
handwriting
Churro 81.2 vs Gemini 71.2 NLS
Hebrew historical page example where Churro beats Gemini 2.5 Pro
Hebrew
handwriting
Churro 90.0 vs Gemini 21.6 NLS
Hindi historical page example where Churro beats Gemini 2.5 Pro
Hindi
print
Churro 98.1 vs Gemini 88.0 NLS
Italian historical page example where Churro beats Gemini 2.5 Pro
Italian
handwriting
Churro 93.3 vs Gemini 88.3 NLS
Japanese historical page example where Churro beats Gemini 2.5 Pro
Japanese
handwriting
Churro 68.9 vs Gemini 13.8 NLS
Khmer historical page example where Churro beats Gemini 2.5 Pro
Khmer
handwriting
Churro 27.7 vs Gemini 23.3 NLS
Latin historical page example where Churro beats Gemini 2.5 Pro
Latin
handwriting
Churro 75.1 vs Gemini 58.3 NLS
Norwegian historical page example where Churro beats Gemini 2.5 Pro
Norwegian
handwriting
Churro 69.7 vs Gemini 65.3 NLS
Persian historical page example where Churro beats Gemini 2.5 Pro
Persian
handwriting
Churro 77.6 vs Gemini 74.6 NLS
Polish historical page example where Churro beats Gemini 2.5 Pro
Polish
print
Churro 84.4 vs Gemini 0.0 NLS
Portuguese historical page example where Churro beats Gemini 2.5 Pro
Portuguese
handwriting
Churro 52.0 vs Gemini 51.6 NLS
Romanian historical page example where Churro beats Gemini 2.5 Pro
Romanian
print
Churro 90.9 vs Gemini 45.7 NLS
Sanskrit historical page example where Churro beats Gemini 2.5 Pro
Sanskrit
print
Churro 97.5 vs Gemini 97.0 NLS
Slovenian historical page example where Churro beats Gemini 2.5 Pro
Slovenian
print
Churro 98.7 vs Gemini 98.5 NLS
Spanish historical page example where Churro beats Gemini 2.5 Pro
Spanish
print
Churro 97.9 vs Gemini 78.5 NLS
Swedish historical page example where Churro beats Gemini 2.5 Pro
Swedish
handwriting
Churro 87.1 vs Gemini 85.1 NLS
Turkish historical page example where Churro beats Gemini 2.5 Pro
Turkish
handwriting
Churro 74.1 vs Gemini 42.9 NLS
Vietnamese historical page example where Churro beats Gemini 2.5 Pro
Vietnamese
handwriting
Churro 87.6 vs Gemini 86.0 NLS

Citation

If you use CHURRO or CHURRO-DS, please cite:

@inproceedings{semnani2025churro,
	title        = {{CHURRO}: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition},
	author       = {Semnani, Sina J. and Zhang, Han and He, Xinyan and Tekg{"u}rler, Merve and Lam, Monica S.},
	booktitle    = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)},
	year         = {2025}
}

License

  • Code: Apache 2.0
  • Model weights: Qwen research license
  • Dataset: research use only because of the underlying source licenses

About

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Contributors

Languages