Releases: openvinotoolkit/openvino.genai
2026.1.0.0
What's Changed
- Skip nodejs tests on mac by @as-suvorov in #3227
- Bump json5 from 0.12.1 to 0.13.0 by @dependabot[bot] in #3169
- Bump product version 2026.1.0 by @akladiev in #3234
- [NPU] Support NPUW for text-embedding models by @mengweiguo in #3088
- Implement Adaptive R-KV mode for cache eviction by @vshampor in #2762
- Bump einops from 0.8.1 to 0.8.2 in /tests/python_tests by @dependabot[bot] in #3240
- Bump einops from 0.8.1 to 0.8.2 in /samples by @dependabot[bot] in #3237
- [CI] WWB and LLM Bench for Video Generation CI by @sgonorov in #3228
- [Coverity]: Word level timestamps dtw overflow by @as-suvorov in #3241
- [wwb] Update requirements for miniCPM-o by @sbalandi in #3230
- Text2Video sample fix by @sgonorov in #3243
- Bump xgrammar version by @pavel-esir in #3221
- Add close stale PR workflow by @as-suvorov in #3229
- Enable mark and close stale PRs workflow by @as-suvorov in #3247
- Skip NPU cache dir tests for LLM pipeline by @as-suvorov in #3251
- Bump peft from 0.17.1 to 0.18.1 in /samples by @dependabot[bot] in #3238
- Bump peft from 0.17.1 to 0.18.1 in /tests/python_tests by @dependabot[bot] in #3239
- Add copilot_instructions.md by @as-suvorov in #3246
- [VLM][NPU] Enable F16IC for gemma3 by @AlexanderKalistratov in #3256
- Bump timm from 1.0.22 to 1.0.24 by @dependabot[bot] in #3262
- Fix coverity issues by @sgonorov in #3252
- Bump opencv-python from 4.12.0.88 to 4.13.0.90 by @dependabot[bot] in #3261
- Bump timm from 1.0.22 to 1.0.24 in /tests/python_tests by @dependabot[bot] in #3265
- [JS] Refactor LLMPipeline by @Retribution98 in #3206
- [Docs]: Add whisper word level timestamps by @as-suvorov in #3231
- [WWB] Increased the number of long-prompt samples by @l-bat in #3266
- Update codeowners by @as-suvorov in #3268
- Bump pillow from 12.0.0 to 12.1.0 in /samples by @dependabot[bot] in #3275
- [DOCS] LTX Video Generation Docs by @likholat in #3232
- Extend .github/copilot-instructions.md by @Wovchena in #3281
- Update pull_request_template.md by @MaximProshin in #3271
- Bump diffusers from 0.35.2 to 0.36.0 by @dependabot[bot] in #3140
- [wwb] Avoid install auto-gptq/autoawq by @sbalandi in #3242
- Added GGUF format support for LoRA adapters by @goyaladitya05 in #3204
- Extend copilot-instructions.md by @Wovchena in #3285
- Bump @isaacs/brace-expansion from 5.0.0 to 5.0.1 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3279
- Bump opencv-python from 4.13.0.90 to 4.13.0.92 in /samples by @dependabot[bot] in #3289
- Bump vector-quantize-pytorch from 1.27.15 to 1.27.20 in /tests/python_tests by @dependabot[bot] in #3283
- [wwb] Update reranker/embedder tests by @sbalandi in #2983
- Revert rope shape and If input type changes from PR 3124 (target 2026/1) by @mitruska in #3248
- [Doc] Visual Token Pruning by @peterchen-intel in #2861
- Make Tensor data access const-correct by @sgonorov in #3291
- [NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU by @AlexanderKalistratov in #3220
- Unskip NPU cache_dir tests by @as-suvorov in #3292
- Bump langchain-core from 1.2.5 to 1.2.9 in /tests/python_tests by @dependabot[bot] in #3297
- Lint: whitespaces and newlines rules applied by @as-suvorov in #3295
- CVS-178941: Align VLM resize pre-processing with optimum by @RyanMetcalfeInt8 in #3172
- Switch tests from 8 cores 64 GB RAM Linux runner to 32 GB by @ababushk in #3099
- Fix Text2VideoPipeline crash with guidance_scale by @sgonorov in #3270
- Bump webpack from 5.98.0 to 5.105.0 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3302
- Add metrics to memory monitor by @krzyczar in #3164
- [wwb] Fix num_inference_steps for inpainting and im2im by @sbalandi in #3299
- [Whisper]: Fix word timestamps processing perf metrics by @as-suvorov in #3294
- Deprecate start_chat()/finish_chat() API in LLM pipeline, update samples by @yatarkan in #3217
- [JS] Refactor tests by @Retribution98 in #3245
- Update Tokenizers by @apaniukov in #3311
- [wwb] Update model and add metrics for video generation similarity by @sbalandi in #3286
- [wwb] Fix progress bars by @sbalandi in #3319
- Bump langchain-core from 1.2.9 to 1.2.11 in /tests/python_tests by @dependabot[bot] in #3325
- Make link visible by @Wovchena in #3333
- Extend copilot-instructions.md by @Wovchena in #3332
- Bump pillow from 12.1.0 to 12.1.1 by @dependabot[bot] in #3323
- Bump qs from 6.14.1 to 6.14.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3342
- Fix load_dataset fails by @as-suvorov in #3345
- [llm_bench] Add class for lfm2-moe by @sbalandi in #3357
- [wwb] Fix Qwen3-Reranker-0.6B-seq-cls with GenAI low similarity by @sbalandi in #3341
- Small fixes in memory monitor by @krzyczar in #3356
- Fix OpenVINO version by @as-suvorov in #3358
- Bump minimatch from 10.1.1 to 10.2.1 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3360
- Update LongBench tests by @l-bat in #3351
- [wwb][llm-bench] Fix version of required dependency in requirements.txt by @sbalandi in #3314
- Fix Text2VideoPipeline crash with guidance_scale=1 when using default constructor by @sgonorov in #3316
- Remove flux-1-dev-non-commercial-license by @Wovchena in #3350
- Bump bitsandbytes from 0.49.1 to 0.49.2 in /tools/llm_bench by @dependabot[bot] in #3374
- Update scipy requirement from <=1.17.0,>=1.3.2 to >=1.3.2,<=1.17.1 in /tools/who_what_benchmark by @dependabot[bot] in #3375
- Bump ajv from 6.12.6 to 6.14.0 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3370
- Extend copilot-instructions.md by @Wovchena in #3377
- Use OV latest_available_commit...
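A notable change in this release is the deprecation of the start_chat()/finish_chat() API in the LLM pipeline (#3217) in favor of passing chat history explicitly. A minimal sketch of maintaining such a history follows; the message-dict shape is the common chat convention and the commented pipeline call is illustrative, not the exact GenAI signature:

```python
# Sketch: accumulating an explicit chat history instead of relying on
# pipeline-side start_chat()/finish_chat() state. The dict format below
# ({"role": ..., "content": ...}) is the usual chat-message convention;
# the exact structure accepted by openvino_genai may differ by version.

def append_turn(history, role, content):
    """Append one chat turn to a message list and return the list."""
    history.append({"role": role, "content": content})
    return history

history = []
append_turn(history, "user", "What is OpenVINO GenAI?")
append_turn(history, "assistant", "A C++/Python library for generative AI inference.")
append_turn(history, "user", "Which devices does it support?")

# With a converted model on disk, generation would look roughly like:
#   import openvino_genai
#   pipe = openvino_genai.LLMPipeline("model_dir", "CPU")
#   answer = pipe.generate(history, max_new_tokens=64)  # illustrative call

print(len(history))  # 3 turns accumulated
```

Keeping the history on the caller's side makes multi-session and concurrent use simpler than pipeline-internal chat state.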
2026.0.0.0
What's Changed
- Add CONTRIBUTING.md by @Wovchena in #2903
- [JS] Support ChatHistory in .generate() by @Retribution98 in #2896
- Bump product version 2026.0 by @akladiev in #2970
- Support QWen VL video inputs by @xipingyan in #2514
- Correct usage of ov::Tensor::data for 26.0 release by @praasz in #2934
- [WWB] Apply qwen3 rerank template for GenAI pipeline by @as-suvorov in #2969
- Bump pydantic from 2.12.3 to 2.12.4 in /samples by @dependabot[bot] in #2974
- Bump timm from 1.0.21 to 1.0.22 in /tests/python_tests by @dependabot[bot] in #2965
- Bump timm from 1.0.21 to 1.0.22 in /samples by @dependabot[bot] in #2967
- Bump langchain-core from 1.0.1 to 1.0.3 in /tests/python_tests by @dependabot[bot] in #2960
- Update use tensor data member by @praasz in #2978
- Update condition for enum value conflicting with windows headers by @yatarkan in #2979
- [Docs] Add Structured Output Docs by @apaniukov in #2950
- [JS] Fix structured output test by @Retribution98 in #2951
- [Docs] Remove LoRA column for not supported use cases, remove duplicated documents & assets by @yatarkan in #2981
- [WWB] Add eagle3 pipeline by @sunxiaoxia2022 in #2812
- Correct video_pad tag by @peterchen-intel in #2977
- [JS] Update docs for js package by @Retribution98 in #2968
- Pre-commit test mark removal by @sgonorov in #2921
- Bump pytest from 8.4.2 to 9.0.0 in /tests/python_tests by @dependabot[bot] in #2989
- Bump langchain-core from 1.0.3 to 1.0.4 in /tests/python_tests by @dependabot[bot] in #2990
- Bump actions/dependency-review-action from 4.8.1 to 4.8.2 by @dependabot[bot] in #2995
- Bump optimum-intel from 1.26.0 to 1.26.1 in /tests/python_tests by @dependabot[bot] in #2992
- [tools] README update and fixes by @sbalandi in #2819
- [DOCS] Add TextRerankPipeline to readme.md by @as-suvorov in #2976
- Bump optimum-intel[nncf] from 1.26.0 to 1.26.1 by @dependabot[bot] in #2993
- [wwb] Fix from_onnx for text gen by @sbalandi in #2959
- [Whisper] Rely on timestamps parsing to detect end of chunk by @as-suvorov in #2996
- Bump langchain-community from 0.4 to 0.4.1 in /tests/python_tests by @dependabot[bot] in #2930
- Fix PerfMetrics collection for beam search scenario by @yatarkan in #2943
- Pin OpenVINO runtime commit to avoid circular dependency issues by @CuriousPanCake in #3003
- Fix compile error for Tensor::data() by @praasz in #3009
- Optimize CPU image preprocessing of Phi3 vision by @jade-cho in #2999
- [ImageGeneration] Use device for AutoencoderKL in sample by @as-suvorov in #3020
- Bump js-yaml from 4.1.0 to 4.1.1 in /src/js in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3022
- Text-to-speech use case page by @tsavina in #3017
- Bump langchain-core from 1.0.4 to 1.0.5 in /tests/python_tests by @dependabot[bot] in #3023
- [Docs] Reformat README, remove duplicated sections by @yatarkan in #3018
- Bump js-yaml from 3.14.1 to 3.14.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3027
- Remove usage of deprecated functions by @olpipi in #3011
- Bump actions/checkout from 5.0.0 to 5.0.1 by @dependabot[bot] in #3029
- [GHA] Fix samples dependabot duplicated PRs by @as-suvorov in #3031
- Bump glob from 11.0.1 to 11.1.0 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3033
- Add XAttention documentation by @l-bat in #3010
- Update pybind11 to 3.0.1 version by @as-suvorov in #3044
- Complete GenAI Logger by @yangwang201911 in #2879
- Bump Tokenizers by @as-suvorov in #3048
- Fix for deadlock in python callback by @sgonorov in #3034
- Bump langchain-core from 1.0.5 to 1.0.7 in /tests/python_tests in the pip group across 1 directory by @dependabot[bot] in #3054
- Bump pytest from 9.0.0 to 9.0.1 in /tests/python_tests by @dependabot[bot] in #3006
- Base LLM pipeline testing refactoring and optimization by @sgonorov in #2779
- [NPU] Add batch_size support for embedding model by @mengweiguo in #2986
- Const-related fixes on tensor creation by @sgonorov in #3026
- Fix const in phi3 qwen and minicpm by @sgonorov in #3072
- Bump node-forge from 1.3.1 to 1.3.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3079
- Bump pydantic from 2.12.4 to 2.12.5 by @dependabot[bot] in #3080
- Move SDPAToPA-related functionality to runtime by @CuriousPanCake in #2937
- Support for video in benchmark by @krzyczar in #3019
- [JS] Update tokenizer methods by @Retribution98 in #3012
- Fix for deadlock in python callback by @sgonorov in #3073
- Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3091
- Add MileBench validation for VLMs by @l-bat in #2358
- [JS] Bump js package up to 2026.0.0 and update dependencies by @Retribution98 in #3090
- llm_bench: use built in sum by @Wovchena in #3096
- UTF8 validate and replace invalid in GGUF output decoding by @mzegla in #3062
- Use specialized Windows runners by @ababushk in #3098
- [JS] Remove deprecated generate behavior. by @Retribution98 in #3095
- Support json config from file or string by @peterchen-intel in #2912
- Add forceRunPrecommitScope param to Jenkinsfile by @akladiev in #3107
- [wwb] Fix crash on exit() due to loading def dataset for inpainting in streaming mode by @sbalandi in #3103
- Minimal pre-commit implementation by @sgonorov in #3008
- Use ov::Model for phi3.5-vision image preprocessing by @jade-cho in #3063
- [GPU] Use GPU for OV preprocessing of phi4mm by @jade-cho in #3100
- NPU Whisper switch to use stateful pipeline by default by @eshiryae in #3085
- [Docs] Add optimum-cli export command with quantization for Whisper use case page by @yatarkan in #3105
- [wwb] Add possibility to check video inputs for VLM by @sbalandi in #3074
- Fix pre-commit Darker deps and pyi igno...
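Among the fixes above is UTF-8 validation with replacement of invalid sequences in GGUF output decoding (#3062). The general technique — not the repository's exact implementation — can be sketched as:

```python
# Sketch: decoding possibly-invalid byte output safely. Detokenizers can
# emit byte sequences that split a multi-byte UTF-8 character, so strict
# decoding would raise; errors="replace" substitutes U+FFFD instead.

def safe_decode(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")

ok = safe_decode("héllo".encode("utf-8"))  # valid input passes through
bad = safe_decode(b"h\xc3")                # truncated 2-byte sequence
print(ok)   # héllo
print(bad)  # h followed by the replacement character
```

This keeps a streaming pipeline from crashing on a token boundary that lands mid-character.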
2025.4.1.0
What's Changed
- Bump product version 2025.4.1 by @akladiev in #3093
- I2I Python deadlock tests and fixes by @sgonorov in #3082
Full Changelog: 2025.4.0.0...2025.4.1.0
2025.4.0.0
What's Changed
- Bump product version 2025.4 by @akladiev in #2620
- Fix CLEANUP_CACHE by @Wovchena in #2617
- [wwb] Add tests for mac/win to ci by @sbalandi in #2603
- xfail embed by @Wovchena in #2618
- Bump py-build-cmake from 0.4.3 to 0.5.0 by @dependabot[bot] in #2624
- Bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 by @dependabot[bot] in #2625
- Bump optimum-intel[nncf] from 1.25.1 to 1.25.2 by @dependabot[bot] in #2613
- Bump optimum-intel from 1.25.1 to 1.25.2 in /tests/python_tests by @dependabot[bot] in #2614
- [GGUF] Fix Q4_1 accuracy by @wine99 in #2563
- [llm bench] Add support of arcee model by @sbalandi in #2636
- Warn about older transformers by @Wovchena in #2634
- Limited max GPU KV-cache considering max allocatable GPU memory size by @popovaan in #2633
- Bump actions/dependency-review-action from 4.7.1 to 4.7.2 by @dependabot[bot] in #2639
- [llm_bench] Override max_length to preserve max_new_tokens by @Wovchena in #2641
- Cache images by @Wovchena in #2629
- Test logging by @Wovchena in #2621
- [CMAKE] Fix samples installation by @mryzhov in #2649
- [JS] Add prettier and align eslint by @almilosz in #2631
- [WWB] Remove use_flash_attention_2 argument for phi4mm by @nikita-savelyevv in #2653
- Implement text embedding pipeline shape fix by @as-suvorov in #2449
- [CMAKE] Solve pybind targets conflict by @mryzhov in #2655
- [GHA] Disable Cacheopt tests on mac by @mryzhov in #2663
- Fix Coverity by @Wovchena in #2601
- Bump peft from 0.17.0 to 0.17.1 in /samples by @dependabot[bot] in #2658
- Bump peft from 0.17.0 to 0.17.1 in /tests/python_tests by @dependabot[bot] in #2660
- downgrade xgrammar version in master by @pavel-esir in #2668
- Extend chat template test models by @yatarkan in #2648
- [NPU] Enable chunk prefill for VLM by @intelgaoxiong in #2657
- [JS] Add build NodeJS bindings into Manylinux 2_28 by @Retribution98 in #2537
- WWB empty_adapters mode by @likholat in #2671
- Bump actions/dependency-review-action from 4.7.2 to 4.7.3 by @dependabot[bot] in #2674
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.0 by @dependabot[bot] in #2677
- Bump langchain-core from 0.3.74 to 0.3.75 in /tests/python_tests by @dependabot[bot] in #2673
- Bump actions/download-artifact from 4.3.0 to 5.0.0 by @dependabot[bot] in #2676
- [OV JS] Add perfMetrics grammar getters & update docstrings by @almilosz in #2681
- Bump langchain-community from 0.3.27 to 0.3.29 in /tests/python_tests by @dependabot[bot] in #2680
- Bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #2685
- Fix StructuredOutputConfig pybind11-subgen signatures generation by @pavel-esir in #2669
- [llm_bench] Add start memory info by @sbalandi in #2686
- Updating KVCrush hyperparameters by @gopikrishnajha in #2678
- [Docs] Convert whisper as stateless in the quantization example by @nikita-savelyevv in #2690
- print genai version by @wgzintel in #2684
- Fix initializer for the sparse attention mode by @vshampor in #2689
- Add docs entry about building GenAI with free threaded Python by @p-wysocki in #2679
- Reduce structured output controller mutex locking scope by @mzegla in #2687
- [speculative decoding] Move from ManualTimer to pure metrics by @sbalandi in #2695
- [CI] [GHA] Use custom actions/download-artifact action with the fixed retries logic by @akashchi in #2692
- Enable VLM generation on NPU without image input by @AlexanderKalistratov in #2694
- [llm bench] Add possibility to setup cache eviction config for LLM by @sbalandi in #2693
- Remove not supported rerank models from docs by @as-suvorov in #2702
- Tune automatic memory allocation by @popovaan in #2697
- Bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2705
- Bump pytest from 8.4.1 to 8.4.2 in /tests/python_tests by @dependabot[bot] in #2710
- [WWB] friendly error message for wrong model type by @isanghao in #2672
- [CI] [GHA] Use smaller runners for image generation samples by @akashchi in #2682
- Add image generation pipeline reuse into README by @JohnLeFeng in #2701
- Align benchmark_vlm.py and cpp by @Wovchena in #2711
- [CI] Fix NodeJS tests for manylinux by @Retribution98 in #2715
- fix checking tokenizers version by @pavel-esir in #2667
- Optimize qwen2vl encoder by @WeldonWangwang in #2630
- Check available memory before allocating KV-cache. by @popovaan in #2683
- Fixed clearing of kv-cache for GPU by @popovaan in #2717
- [GHA] w/a to build ov samples by @mryzhov in #2734
- [OV JS] Initial support for SchedulerConfig by @almilosz in #2696
- Test LLM samples with GGUF models by @Retribution98 in #2464
- [llm_bench] LLMPipeline fix negative time by @sbalandi in #2742
- Bump pydantic from 2.11.7 to 2.11.9 in /samples by @dependabot[bot] in #2732
- [llm bench] Add mem info on initial/compilation phase to json/csv by @sbalandi in #2741
- Increase timeouts by @Wovchena in #2743
- Allow additional_params for tokenizer decode in TextStreamer by @dkalinowski in #2729
- Fix attention mask pass for whisper (static) by @eshiryae in #2665
- Support from-onnx parameter by @sstrehlk in #2441
- Use model path property for caching by @praasz in #2720
- Increase GGUF timeouts by @Wovchena in #2756
- [VLM] Fixed measuring of embeddings preparation. by @popovaan in #2752
- Bump timm from 1.0.19 to 1.0.20 by @dependabot[bot] in #2754
- [llm_bench] Fix OpenVINO config not being passed for speech-to-text and Whisper models by @aobolensk in #2763
- OPT & Clean code of openvino_vision_embeddings_merger_model inputs processing by @zhaixuejun1993 in #2726
- Add .github/pull_request_template.md by @Wovchena in #2765
- Update transformers to 4.53.3 by @as-suvorov in https://gi...
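Several entries in this release concern KV-cache memory limits on GPU (#2633, #2683, #2697). The sizing logic behind such checks can be approximated with the standard KV-cache formula — a back-of-the-envelope sketch; real allocators add paging and alignment overhead:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_seq_len,
                   batch_size=1, dtype_bytes=2):
    """Estimate KV-cache size: 2 tensors (K and V) per layer, each of
    shape [batch, kv_heads, seq_len, head_dim], dtype_bytes per element."""
    return (2 * num_layers * batch_size * num_kv_heads
            * max_seq_len * head_dim * dtype_bytes)

# Example: a Llama-2-7B-like config (32 layers, 32 KV heads, head_dim 128)
# at 4096-token context in FP16:
size = kv_cache_bytes(32, 32, 128, 4096)
print(size / 2**30)  # 2.0 GiB
```

Comparing such an estimate against free device memory before allocation is what lets a runtime cap the cache instead of failing mid-generation.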
2025.3.0.0
What's Changed
- Bump product version 2025.3 by @akladiev in #2255
- Implement SnapKV by @vshampor in #2067
- [WWB] Additional processing of native phi4mm by @nikita-savelyevv in #2276
- Update ov genai version in samples by @as-suvorov in #2275
- use chat templates in vlm by @eaidova in #2279
- Fix 'Unsupported property' fails if set prompt_lookup to False by @sbalandi in #2240
- Force the PA implementation in the llm-bench by default by @sbalandi in #2271
- Update Whisper README.md as "--disable-stateful" is no longer required to export models for NPU by @luke-lin-vmc in #2249
- Removed 'slices' from EncodedImage by @popovaan in #2258
- support text embeddings in llm_bench by @eaidova in #2269
- [wwb]: load transformers model first, then only trust_remote_code by @eaidova in #2270
- [GHA] Coverity pipeline fixes by @mryzhov in #2283
- [GHA][DEV] Fixed coverity path creation by @mryzhov in #2285
- [GHA][DEV] Save Coverity tool to cache by @mryzhov in #2286
- [GHA][DEV] Set cache key for coverity tool by @mryzhov in #2288
- Image generation multiconcurrency (#2190) by @dkalinowski in #2284
- [GGUF] Support GGUF format for tokenizers and detokenizers by @rkazants in #2263
- Unskip whisper tests & update optimum-intel by @as-suvorov in #2247
- Update README.md with text-to-speech by @rkazants in #2294
- add new chat template for qwen3 by @eaidova in #2297
- [DOCS] Correct cmd-line for TTS conversion by @rkazants in #2303
- [GHA] Enabled product manifest.yml by @mryzhov in #2281
- Bump the npm_and_yarn group across 3 directories with 2 updates by @dependabot[bot] in #2309
- [GHA] Save artifacts to cloud share by @akladiev in #1943
- [GHA][COVERITY] Added manual trigger by @mryzhov in #2289
- [GHA] Fix missing condition for Extract Artifacts step by @akladiev in #2313
- [llm bench] Turn off PA backend for VLM by @sbalandi in #2312
- [llm_bench] Add setting of max_num_batched_tokens for SchedulerConfig by @sbalandi in #2316
- [GHA] Fix missing condition for LLM & VLM test by @sammysun0711 in #2326
- [Test] Skip gguf test on MacOS due to sporadic failure by @sammysun0711 in #2328
- [GGUF] support Qwen3 architecture by @TianmengChen in #2273
- [llm_bench] Increase max_num_batched_tokens to the largest positive integer by @sbalandi in #2327
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 by @dependabot[bot] in #2310
- Bump actions/download-artifact from 4.1.9 to 4.3.0 by @dependabot[bot] in #2315
- Fix system_message forwarding by @Wovchena in #2325
- Disabled crop of the prompt for minicpmv. by @andreyanufr in #2320
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum by @sbalandi in #2332
- Bump brace-expansion from 2.0.1 to 2.0.2 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2338
- Fix multinomial sampling for PromptLookupDecoding by @sbalandi in #2331
- [llm bench] Avoid using not supported by beam_search parameters for beam_search case by @sbalandi in #2336
- Update Export Requirements by @apaniukov in #2342
- [GGUF] Serialize Generated OV Model for Faster LLMPipeline Init by @sammysun0711 in #2218
- Fixed system message in chat mode. by @popovaan in #2343
- Bump librosa from 0.10.2.post1 to 0.11.0 in /samples by @dependabot[bot] in #2346
- [Test][GGUF] Add DeepSeek-R1-Distill-Qwen GGUF in CI by @sammysun0711 in #2329
- [llm_bench] Remove default scheduler config by @sbalandi in #2341
- master: add Phi-4-multimodal-instruct by @Wovchena in #2264
- Fix paths with unicode for tokenizers by @yatarkan in #2337
- [WWB] Add try-except block for processor loading by @nikita-savelyevv in #2352
- [WWB] Bring back eager attention implementation by default by @nikita-savelyevv in #2353
- fix supported models link in TTS samples by @eaidova in #2300
- StaticLLMPipeline: Add tests on caching by @smirnov-alexey in #1905
- [WWB] Fix loading the tokenizer for VLMs by @l-bat in #2351
- Pass Scheduler Config for VLM Pipeline in WhoWhatBenchmark. by @popovaan in #2318
- Remove misspelled CMAKE_CURRENT_SOUCE_DIR by @Wovchena in #2362
- Increase timeout for LLM & VLM by @Wovchena in #2359
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum for VLM by @sbalandi in #2361
- Support multi images for vlm benchmarking in samples and llm_bench by @wgzintel in #2197
- CB: Hetero pipeline parallel support by @WeldonWangwang in #2227
- Update conversion instructions by @adrianboguszewski in #2287
- Merge stderr from failed samples by @Wovchena in #2156
- Revert cache folder by @Wovchena in #2372
- Update README in Node.js API by @almilosz in #2374
- [Docs] Rework home page by @yatarkan in #2368
- Align PromptLookupDecoding with greedy when dynamic_split_fuse works by @sbalandi in #2360
- Support to collect latency for transformers V4.52.0 by @wgzintel in #2373
- Bump diffusers from 0.33.1 to 0.34.0 in /samples by @dependabot[bot] in #2381
- Bump diffusers from 0.33.1 to 0.34.0 in /tests/python_tests by @dependabot[bot] in #2380
- Structured Output generation with XGrammar by @pavel-esir in #2295
- Disable XGrammar on Android by @apaniukov in #2389
- [wwb] Take prompts from different categories for def dataset for VLM by @sbalandi in #2349
- Fix for cloning NPU Image Generation pipelines (#2376) by @dkalinowski in #2393
- Set add_special_tokens=false for image tags in MiniCPM. by @popovaan in #2404
- Fix missing use cases for inpainting models and defining use case with relative path by @sbalandi in #2387
- temporary skip failing whisper tests by @pavel-esir in #2396
- Fix test_vlm_npu_no_exception by @AlexanderKalistratov in #2388
- Bump timm from 1.0.15 to 1.0.16 by @dependabot[bot] in #2390
- Optimize VisionEncoderQwen2VL::encode by @usst...
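Prompt lookup decoding appears in several fixes above (#2331, #2360). Its core idea: find the most recent occurrence of the trailing n-gram of generated tokens inside the prompt and propose the tokens that followed it as draft candidates. A toy sketch, with token IDs as plain ints:

```python
def propose_draft(prompt_tokens, generated_tokens, ngram=3, max_draft=5):
    """Return up to max_draft candidate tokens by matching the trailing
    n-gram of generated output against the prompt (most recent match wins)."""
    if len(generated_tokens) < ngram:
        return []
    tail = generated_tokens[-ngram:]
    # Search the prompt right-to-left for the most recent match.
    for i in range(len(prompt_tokens) - ngram, -1, -1):
        if prompt_tokens[i:i + ngram] == tail:
            start = i + ngram
            return prompt_tokens[start:start + max_draft]
    return []

prompt = [5, 6, 7, 8, 9, 10, 11]
generated = [1, 2, 6, 7, 8]              # trailing 3-gram [6, 7, 8] occurs in prompt
print(propose_draft(prompt, generated))  # [9, 10, 11]
```

The proposed tokens are then verified by the main model in one batched step, which is where the speed-up over token-by-token decoding comes from.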
2025.2.0.0
What's Changed
- [GHA] Replaced visual_language_chat_sample-ubuntu-minicpm_v2_6 job by @mryzhov in #1909
- [GHA] Replaced cpp-chat_sample-ubuntu pipeline by @mryzhov in #1913
- Add support of Prompt Lookup decoding to llm bench by @sbalandi in #1917
- [GHA] Introduce SDL pipeline by @mryzhov in #1924
- Switch Download OpenVINO step to aks-medium-runner by @ababushk in #1889
- Bump product version 2025.2 by @akladiev in #1920
- [GHA] Replaced cpp-continuous-batching by @mryzhov in #1910
- Update dependencies in samples by @ilya-lavrenov in #1925
- phi3_v: add universal tag by @Wovchena in #1921
- Fix image_id unary error by @rkazants in #1927
- [Docs] Image generation use case by @yatarkan in #1877
- Add perf metrics for CB VLM by @pavel-esir in #1897
- Enhance the flexibility of the c streamer by @apinge in #1941
- add Gemma3 LLM to supported models by @eaidova in #1942
- Added GPTQ/AWQ support with HF Transformers by @AlexKoff88 in #1933
- Add --static_reshape option to llm_bench, to force static reshape + compilation at pipeline creation by @RyanMetcalfeInt8 in #1851
- benchmark_image_gen: Add --reshape option, and ability to specify multiple devices by @RyanMetcalfeInt8 in #1878
- Revert perf regression changes by @dkalinowski in #1949
- Add running greedy_causal_lm for JS to the sample tests by @Retribution98 in #1930
- [Docs] Add VLM use case by @yatarkan in #1907
- Added possibility to generate base text on GPU for text evaluation. by @andreyanufr in #1945
- VLM: change infer to start_async/wait by @dkalinowski in #1948
- [WWB]: Addressed issues with validation on Windows by @AlexKoff88 in #1953
- [GHA] Remove bandit pipeline by @mryzhov in #1956
- Disable MSVC debug assertions, addressing false positives in iterator checking by @apinge in #1952
- [GHA] Replaced genai-tools pipeline by @mryzhov in #1954
- configurable delay by @eaidova in #1963
- Update cast of tensor data pointer for const tensors by @praasz in #1966
- Remove tokens after EOS for draft model for speculative decoding by @sbalandi in #1951
- Add testcase for chat_sample_c by @apinge in #1934
- Skip warm-up iteration during llm_bench results averaging by @nikita-savelyevv in #1972
- Reset pipeline cache usage statistics on each generate call by @vshampor in #1961
- [Docs] Update models, rebuild on push by @yatarkan in #1922
- Updated logic whether PA backend is explicitly required by @ilya-lavrenov in #1976
- [GHA] [MAC] Use latest_available_commit OV artifacts by @mryzhov in #1977
- [GHA] Set HF_TOKEN by @mryzhov in #1986
- [GHA] Setup ov_cache by @mryzhov in #1962
- [GHA] Changed cleanup runner by @mryzhov in #1995
- Added mutex to methods which use blocks map. by @popovaan in #1975
- Add documentation and sample on KV cache eviction by @vshampor in #1960
- StaticLLMPipeline: Simplify compile_model call logic by @smirnov-alexey in #1915
- Fix reshape in heterogeneous SD samples by @helena-intel in #1994
- Update tokenizers by @mryzhov in #2002
- docs: fix max_new_tokens option description by @tpragasa in #1987
- [Docs] Add speech recognition with whisper use case by @yatarkan in #1971
- Revert "VLM: change infer to start_async/wait " by @ilya-lavrenov in #2004
- Revert "Revert perf regression changes" by @ilya-lavrenov in #2003
- Set xfail to failing tests. by @popovaan in #2006
- [GHA] Use cpack bindings in the samples tests by @mryzhov in #1979
- [Docs]: add Phi3.5MoE to supported models by @eaidova in #2012
- add TensorArt SD3.5 models to supported list by @eaidova in #2013
- Move MiniCPM resampler to vision encoder by @popovaan in #1997
- [GHA] Fix ccache on Win/Mac by @mryzhov in #2008
- samples/python/text_generation/lora.py -> samples/python/text_generation/lora_greedy_causal_lm.py by @Wovchena in #2007
- Whisper timestamp fix by @RyanMetcalfeInt8 in #1918
- Unskip Qwen2-VL-2B-Instruct sample test by @as-suvorov in #1970
- [GHA] Use developer openvino packages by @mryzhov in #2000
- Added NNCF to export-requirements.txt by @ilya-lavrenov in #1974
- Bump py-build-cmake from 0.4.2 to 0.4.3 by @dependabot in #2016
- Use OV_CACHE for python tests by @as-suvorov in #2020
- [GHA] Disable HTTP calls to the Hugging Face Hub by @mryzhov in #2021
- Add python bindings to VLMPipeline for encrypted models by @olpipi in #1916
- Bump the npm_and_yarn group across 1 directory with 2 updates by @dependabot in #2017
- CB: auto plugin support by @ilya-lavrenov in #2034
- timeout-minutes: 90 by @Wovchena in #2039
- Bump diffusers from 0.32.2 to 0.33.1 by @dependabot in #2031
- Bump diffusers from 0.32.2 to 0.33.1 in /samples by @dependabot in #2032
- Enable cache and add cache encryption to samples by @olpipi in #1990
- Fix VLM concurrency by @mzegla in #2022
- Move Phi3 vision projection model to vision encoder by @popovaan in #2009
- Fix spelling by @Wovchena in #2025
- [Docs] Enable autogenerated samples docs by @yatarkan in #2029
- Synchronize entire embeddings calculation phase (#1967) by @mzegla in #1993
- Add missing finish reason set when finishing the sequence by @mzegla in #2036
- Bump image-size from 1.2.0 to 1.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot in #1998
- Add README for C Samples by @apinge in #2040
- Use ov_cache for test_vlm_pipeline by @as-suvorov in #2042
- increase timeouts by @Wovchena in #2041
- [GHA] Use azure runners for python tests by @mryzhov in #1991
- [WWB]: move diffusers imports closer to usage by @eaidova in #2046
- [llm bench] Move calculation of memory consumption to memory_monitor tool by @sbalandi in #1...
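Performance measurement comes up repeatedly in this release (perf metrics for CB VLM in #1897, skipping the warm-up iteration in llm_bench averaging in #1972). The averaging concern can be illustrated with a tiny sketch: first iterations include one-off costs such as model compilation, so benchmarks typically drop them before averaging:

```python
def average_latency_ms(latencies_ms, skip_warmup=1):
    """Average per-iteration latencies, discarding warm-up iterations
    that carry one-off costs such as model compilation."""
    kept = latencies_ms[skip_warmup:]
    if not kept:
        raise ValueError("no measurements left after skipping warm-up")
    return sum(kept) / len(kept)

runs = [950.0, 102.0, 98.0, 100.0]  # first run dominated by compile time
print(average_latency_ms(runs))     # 100.0
```

Without the skip, the single compile-heavy run above would inflate the reported mean by more than a factor of three.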
2025.1.0.0
What's Changed
- skip failing Chinese prompt on Win by @pavel-esir in #1573
- Bump product version 2025.1 by @akladiev in #1571
- Bump tokenizers submodule by @akladiev in #1575
- [LLM_BENCH] relax md5 checks and allow pass cb config without use_cb by @eaidova in #1570
- [VLM] Add Qwen2VL by @yatarkan in #1553
- Fix links, remind about ABI by @Wovchena in #1585
- Add nightly to instructions similar to requirements by @Wovchena in #1582
- GHA: use nightly from 2025.1.0 by @ilya-lavrenov in #1577
- NPU LLM Pipeline: Switch to STATEFUL by default by @dmatveev in #1561
- Verify not empty rendered chat template by @yatarkan in #1574
- [RTTI] Fix passes rtti definitions by @t-jankowski in #1588
- Test add_special_tokens properly by @pavel-esir in #1586
- Add indentation for llm_bench json report dumping by @nikita-savelyevv in #1584
- Prioritize config model type under path-based task determination by @eaidova in #1587
- Replace openvino.runtime imports with openvino by @helena-intel in #1579
- Add tests for Whisper static pipeline by @eshiryae in #1250
- CB: removed handle_dropped() misuse by @ilya-lavrenov in #1594
- Bump timm from 1.0.13 to 1.0.14 by @dependabot in #1595
- Update samples readme by @olpipi in #1545
- [ Speculative decoding ][ Prompt lookup ] Enable Perf Metrics for assisting pipelines by @iefode in #1599
- [LLM] [NPU] StaticLLMPipeline: Export blob by @smirnov-alexey in #1601
- [llm_bench] enable prompt permutations for prevent prefix caching and fix vlm image load by @eaidova in #1607
- LLM: use set_output_seq_len instead of WA by @ilya-lavrenov in #1611
- CB: support different number of K and V heads per layer by @ilya-lavrenov in #1610
- LLM: fixed Slice / Gather of last MatMul by @ilya-lavrenov in #1616
- Switch to VS 2022 by @mryzhov in #1598
- Add Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1609
- Whisper pipeline: apply slice matmul by @as-suvorov in #1623
- GHA: use OV master in mac.yml by @ilya-lavrenov in #1622
- [Image Generation] Image2Image for FLUX by @likholat in #1621
- add missed ignore_eos in generation config by @eaidova in #1625
- Master increase priority for rt info to fix Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1626
- Correct model name by @wgzintel in #1624
- Token rotation by @vshampor in #987
- Whisper pipeline: use Sampler by @as-suvorov in #1615
- Fix setting eos_token_id with kwarg by @Wovchena in #1629
- Extract cacheopt E2E tests into separate test matrix field by @vshampor in #1630
- [CB] Split token streaming and generation to different threads for all CB based pipelines by @iefode in #1544
- Don't silence a error if a file can't be opened by @Wovchena in #1620
- [CMAKE]: use different version for macOS arm64 by @ilya-lavrenov in #1632
- Test invalid fields assignment raises in GenerationConfig by @Wovchena in #1633
- do_sample=False for NPU in chat_sample, add NPU to README by @helena-intel in #1637
- [JS] Add GenAI Node.js bindings by @vishniakov-nikolai in #1193
- CB: preparation for relying on KV cache precisions from plugins by @ilya-lavrenov in #1634
- [LLM bench]support providing adapter config mode by @eaidova in #1644
- Automatically apply chat template in non-chat scenarios by @sbalandi in #1533
- beam_search_causal_lm.cpp: delete wrong comment by @Wovchena in #1639
- [WWB]: Fixed chat template usage in VLM GenAI pipeline by @AlexKoff88 in #1643
- [WWB]: Fixed nano-Llava preprocessor selection by @AlexKoff88 in #1646
- [WWB]: Added config to preprocessor call in VLMs by @AlexKoff88 in #1638
- CB: remove DeviceConfig class by @ilya-lavrenov in #1640
- [WWB]: Added initialization of nano-llava in case of Transformers model by @AlexKoff88 in #1649
- WWB: simplify code around start_chat / use_template by @ilya-lavrenov in #1650
- Tokenizers update by @ilya-lavrenov in #1653
- DOCS: reorganized support models for image generation by @ilya-lavrenov in #1655
- Fix using llm_bench/wwb with version w/o apply_chat_template by @sbalandi in #1651
- Fix Qwen2VL generation without images by @yatarkan in #1645
- Parallel sampling with threadpool by @mzegla in #1252
- [Coverity] Enabling coverity scan by @akazakov-github in #1657
- [ CB ] Fix streaming in case of empty outputs by @iefode in #1647
- Allow overriding eos_token_id by @Wovchena in #1654
- CB: remove GenerationHandle:back by @ilya-lavrenov in #1662
- Fix tiny-random-llava-next in VLM Pipeline by @yatarkan in #1660
- [CB] Add KVHeadConfig parameters to PagedAttention's rt_info by @sshlyapn in #1666
- Bump py-build-cmake from 0.3.4 to 0.4.0 by @dependabot in #1668
- pin optimum version by @pavel-esir in #1675
- [LLM] Enabled CB by default by @ilya-lavrenov in #1455
- SAMPLER: fixed hang during destruction of ThreadPool by @ilya-lavrenov in #1681
- CB: use optimized scheduler config for cases when user explicitly asked CB backend by @ilya-lavrenov in #1679
- [CB] Return Block manager asserts to destructors by @iefode in #1569
- phi3_v: allow images, remove unused var by @Wovchena in #1670
- [Image Generation] Inpainting for FLUX by @likholat in #1685
- [WWB]: Added support for SchedulerConfig in LLMPipeline by @AlexKoff88 in #1671
- Add LongBench validation by @l-bat in #1220
- Fix Tokenizer for several added special tokens by @pavel-esir in #1659
- Unpin optimum-intel version by @ilya-lavrenov in #1680
- Image generation: proper error message when encode() is used w/o encoder passed to ctor by @ilya-lavrenov in #1683
- Fix excluding stop str from output for some tokenizer by @sbalandi in #1676
- [VLM] Fix chat template fallback in chat mode with defined system message by @yatarkan in https://github.com/openvinotoolkit/openvino.genai/pull/...
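Speculative-decoding work also features in this release (perf metrics for assisting pipelines in #1599, removing draft tokens after EOS in #1951). The verification step at its heart: accept draft tokens while they match what the target model would produce, then take the target's token at the first mismatch. A toy sketch with the target model's outputs precomputed:

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy speculative-decoding acceptance: keep the matching prefix of
    the draft, then append the target model's token at the first mismatch.
    target_tokens[i] is what the target model would emit at position i."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # correction supplied by the target model
            break
    return accepted

draft = [4, 8, 15, 16]
target = [4, 8, 99, 23]             # target disagrees at position 2
print(verify_draft(draft, target))  # [4, 8, 99]
```

Every accepted draft token is one target-model forward step amortized, which is the entire win of the technique.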
2025.0.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.6.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.5.0.0
Please check out the latest documentation pages related to the new openvino_genai package!