
Performance Tuning Guide

Optimize the VLM Inference Server for your workload.

Benchmarks

See the Performance section of the README for current benchmarks.

Tuning

A full performance optimization guide is coming soon.

Topics to be covered:

  • GPU vs CPU tradeoffs
  • Batch size optimization
  • Memory management
  • Caching strategies
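Until the full guide lands, the batch size topic above can be illustrated with a minimal sketch: sweep candidate batch sizes and compare measured throughput. The `run_batch` function below is a hypothetical stand-in (a timed sleep that mimics fixed per-batch overhead plus per-item cost), not part of the VLM Inference Server's actual API; swap in a real inference call for your deployment.

```python
# Hedged sketch: compare throughput across batch sizes.
# `run_batch` is a placeholder workload, NOT the server's real API.
import time


def run_batch(batch_size: int) -> None:
    # Simulate fixed per-batch overhead (5 ms) plus per-item cost (1 ms),
    # which is roughly how batching amortizes setup time in practice.
    time.sleep(0.005 + 0.001 * batch_size)


def measure_throughput(batch_size: int, batches: int = 5) -> float:
    # Return items processed per second for the given batch size.
    start = time.perf_counter()
    for _ in range(batches):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * batches) / elapsed


if __name__ == "__main__":
    results = {bs: measure_throughput(bs) for bs in (1, 4, 16, 64)}
    for bs, tput in results.items():
        print(f"batch={bs:>3}  throughput={tput:,.0f} items/s")
```

Under this cost model larger batches always win; on real hardware throughput eventually plateaus or drops once memory or latency limits are hit, which is where the GPU/CPU and memory-management topics above come in.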