Releases: openvinotoolkit/openvino.genai
2026.1.0.0
What's Changed
- Skip nodejs tests on mac by @as-suvorov in #3227
- Bump json5 from 0.12.1 to 0.13.0 by @dependabot[bot] in #3169
- Bump product version 2026.1.0 by @akladiev in #3234
- [NPU] Support NPUW for text-embedding models by @mengweiguo in #3088
- Implement Adaptive R-KV mode for cache eviction by @vshampor in #2762
- Bump einops from 0.8.1 to 0.8.2 in /tests/python_tests by @dependabot[bot] in #3240
- Bump einops from 0.8.1 to 0.8.2 in /samples by @dependabot[bot] in #3237
- [CI] WWB and LLM Bench for Video Generation CI by @sgonorov in #3228
- [Coverity]: Word level timestamps dtw overflow by @as-suvorov in #3241
- [wwb] Update requirements for miniCPM-o by @sbalandi in #3230
- Text2Video sample fix by @sgonorov in #3243
- Bump xgrammar version by @pavel-esir in #3221
- Add close stale PR workflow by @as-suvorov in #3229
- Enable mark and close stale PRs workflow by @as-suvorov in #3247
- Skip NPU cache dir tests for LLM pipeline by @as-suvorov in #3251
- Bump peft from 0.17.1 to 0.18.1 in /samples by @dependabot[bot] in #3238
- Bump peft from 0.17.1 to 0.18.1 in /tests/python_tests by @dependabot[bot] in #3239
- Add copilot_instructions.md by @as-suvorov in #3246
- [VLM][NPU] Enable F16IC for gemma3 by @AlexanderKalistratov in #3256
- Bump timm from 1.0.22 to 1.0.24 by @dependabot[bot] in #3262
- Fix coverity issues by @sgonorov in #3252
- Bump opencv-python from 4.12.0.88 to 4.13.0.90 by @dependabot[bot] in #3261
- Bump timm from 1.0.22 to 1.0.24 in /tests/python_tests by @dependabot[bot] in #3265
- [JS] Refactor LLMPipeline by @Retribution98 in #3206
- [Docs]: Add whisper word level timestamps by @as-suvorov in #3231
- [WWB] Increased the number of long-prompt samples by @l-bat in #3266
- Update codeowners by @as-suvorov in #3268
- Bump pillow from 12.0.0 to 12.1.0 in /samples by @dependabot[bot] in #3275
- [DOCS] LTX Video Generation Docs by @likholat in #3232
- Extend .github/copilot-instructions.md by @Wovchena in #3281
- Update pull_request_template.md by @MaximProshin in #3271
- Bump diffusers from 0.35.2 to 0.36.0 by @dependabot[bot] in #3140
- [wwb] Avoid install auto-gptq/autoawq by @sbalandi in #3242
- Added GGUF format support for LoRA adapters by @goyaladitya05 in #3204
- Extend copilot-instructions.md by @Wovchena in #3285
- Bump @isaacs/brace-expansion from 5.0.0 to 5.0.1 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3279
- Bump opencv-python from 4.13.0.90 to 4.13.0.92 in /samples by @dependabot[bot] in #3289
- Bump vector-quantize-pytorch from 1.27.15 to 1.27.20 in /tests/python_tests by @dependabot[bot] in #3283
- [wwb] Update reranker/embedder tests by @sbalandi in #2983
- Revert rope shape and If input type changes from PR 3124 (target 2026/1) by @mitruska in #3248
- [Doc] Visual Token Pruning by @peterchen-intel in #2861
- Make Tensor data access const-correct by @sgonorov in #3291
- [NPUW][VLM][AUTO] Using Auto plugin for VLMs embeddings when running on NPU by @AlexanderKalistratov in #3220
- Unskip NPU cache_dir tests by @as-suvorov in #3292
- Bump langchain-core from 1.2.5 to 1.2.9 in /tests/python_tests by @dependabot[bot] in #3297
- Lint: whitespaces and newlines rules applied by @as-suvorov in #3295
- CVS-178941: Align VLM resize pre-processing with optimum by @RyanMetcalfeInt8 in #3172
- Switch tests from 8 cores 64 GB RAM Linux runner to 32 GB by @ababushk in #3099
- Fix Text2VideoPipeline crash with guidance_scale by @sgonorov in #3270
- Bump webpack from 5.98.0 to 5.105.0 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3302
- Add metrics to memory monitor by @krzyczar in #3164
- [wwb] Fix num_inference_steps for inpainting and im2im by @sbalandi in #3299
- [Whisper]: Fix word timestamps processing perf metrics by @as-suvorov in #3294
- Deprecate start_chat()/finish_chat() API in LLM pipeline, update samples by @yatarkan in #3217
- [JS] Refactor tests by @Retribution98 in #3245
- Update Tokenizers by @apaniukov in #3311
- [wwb] Update model and add metrics for video generation similarity by @sbalandi in #3286
- [wwb] Fix progress bars by @sbalandi in #3319
- Bump langchain-core from 1.2.9 to 1.2.11 in /tests/python_tests by @dependabot[bot] in #3325
- Make link visible by @Wovchena in #3333
- Extend copilot-instructions.md by @Wovchena in #3332
- Bump pillow from 12.1.0 to 12.1.1 by @dependabot[bot] in #3323
- Bump qs from 6.14.1 to 6.14.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3342
- Fix load_dataset fails by @as-suvorov in #3345
- [llm_bench] Add class for lfm2-moe by @sbalandi in #3357
- [wwb] Fix Qwen3-Reranker-0.6B-seq-cls with GenAI low similarity by @sbalandi in #3341
- Small fixes in memory monitor by @krzyczar in #3356
- Fix OpenVINO version by @as-suvorov in #3358
- Bump minimatch from 10.1.1 to 10.2.1 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3360
- Update LongBench tests by @l-bat in #3351
- [wwb][llm-bench] Fix version of required dependency in requirements.txt by @sbalandi in #3314
- Fix Text2VideoPipeline crash with guidance_scale=1 when using default constructor by @sgonorov in #3316
- Remove flux-1-dev-non-commercial-license by @Wovchena in #3350
- Bump bitsandbytes from 0.49.1 to 0.49.2 in /tools/llm_bench by @dependabot[bot] in #3374
- Update scipy requirement from <=1.17.0,>=1.3.2 to >=1.3.2,<=1.17.1 in /tools/who_what_benchmark by @dependabot[bot] in #3375
- Bump ajv from 6.12.6 to 6.14.0 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3370
- Extend copilot-instructions.md by @Wovchena in #3377
- Use OV latest_available_commit...
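A notable change in this release is the deprecation of the start_chat()/finish_chat() API in the LLM pipeline (#3217) in favor of passing chat history explicitly. A minimal sketch of maintaining such a history follows; the message-dict shape is the common chat convention and the commented pipeline call is illustrative, not the exact GenAI signature:

```python
# Sketch: accumulating an explicit chat history instead of relying on
# pipeline-side start_chat()/finish_chat() state. The dict format below
# ({"role": ..., "content": ...}) is the usual chat-message convention;
# the exact structure accepted by openvino_genai may differ by version.

def append_turn(history, role, content):
    """Append one chat turn to a message list and return the list."""
    history.append({"role": role, "content": content})
    return history

history = []
append_turn(history, "user", "What is OpenVINO GenAI?")
append_turn(history, "assistant", "A C++/Python library for generative AI inference.")
append_turn(history, "user", "Which devices does it support?")

# With a converted model on disk, generation would look roughly like:
#   import openvino_genai
#   pipe = openvino_genai.LLMPipeline("model_dir", "CPU")
#   answer = pipe.generate(history, max_new_tokens=64)  # illustrative call

print(len(history))  # 3 turns accumulated
```

Keeping the history on the caller's side makes multi-session and concurrent use simpler than pipeline-internal chat state.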
2026.0.0.0
What's Changed
- Add CONTRIBUTING.md by @Wovchena in #2903
- [JS] Support ChatHistory in .generate() by @Retribution98 in #2896
- Bump product version 2026.0 by @akladiev in #2970
- Support QWen VL video inputs by @xipingyan in #2514
- Correct usage of ov::Tensor::data for 26.0 release by @praasz in #2934
- [WWB] Apply qwen3 rerank template for GenAI pipeline by @as-suvorov in #2969
- Bump pydantic from 2.12.3 to 2.12.4 in /samples by @dependabot[bot] in #2974
- Bump timm from 1.0.21 to 1.0.22 in /tests/python_tests by @dependabot[bot] in #2965
- Bump timm from 1.0.21 to 1.0.22 in /samples by @dependabot[bot] in #2967
- Bump langchain-core from 1.0.1 to 1.0.3 in /tests/python_tests by @dependabot[bot] in #2960
- Update use tensor data member by @praasz in #2978
- Update condition for enum value conflicting with windows headers by @yatarkan in #2979
- [Docs] Add Structured Output Docs by @apaniukov in #2950
- [JS] Fix structured output test by @Retribution98 in #2951
- [Docs] Remove LoRA column for not supported use cases, remove duplicated documents & assets by @yatarkan in #2981
- [WWB] Add eagle3 pipeline by @sunxiaoxia2022 in #2812
- Correct video_pad tag by @peterchen-intel in #2977
- [JS] Update docs for js package by @Retribution98 in #2968
- Pre-commit test mark removal by @sgonorov in #2921
- Bump pytest from 8.4.2 to 9.0.0 in /tests/python_tests by @dependabot[bot] in #2989
- Bump langchain-core from 1.0.3 to 1.0.4 in /tests/python_tests by @dependabot[bot] in #2990
- Bump actions/dependency-review-action from 4.8.1 to 4.8.2 by @dependabot[bot] in #2995
- Bump optimum-intel from 1.26.0 to 1.26.1 in /tests/python_tests by @dependabot[bot] in #2992
- [tools] README update and fixes by @sbalandi in #2819
- [DOCS] Add TextRerankPipeline to readme.md by @as-suvorov in #2976
- Bump optimum-intel[nncf] from 1.26.0 to 1.26.1 by @dependabot[bot] in #2993
- [wwb] Fix from_onnx for text gen by @sbalandi in #2959
- [Whisper] Rely on timestamps parsing to detect end of chunk by @as-suvorov in #2996
- Bump langchain-community from 0.4 to 0.4.1 in /tests/python_tests by @dependabot[bot] in #2930
- Fix PerfMetrics collection for beam search scenario by @yatarkan in #2943
- Pin OpenVINO runtime commit to avoid circular dependency issues by @CuriousPanCake in #3003
- Fix compile error for Tensor::data() by @praasz in #3009
- Optimize CPU image preprocessing of Phi3 vision by @jade-cho in #2999
- [ImageGeneration] Use device for AutoencoderKL in sample by @as-suvorov in #3020
- Bump js-yaml from 4.1.0 to 4.1.1 in /src/js in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3022
- Text-to-speech use case page by @tsavina in #3017
- Bump langchain-core from 1.0.4 to 1.0.5 in /tests/python_tests by @dependabot[bot] in #3023
- [Docs] Reformat README, remove duplicated sections by @yatarkan in #3018
- Bump js-yaml from 3.14.1 to 3.14.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3027
- Remove usage of deprecated functions by @olpipi in #3011
- Bump actions/checkout from 5.0.0 to 5.0.1 by @dependabot[bot] in #3029
- [GHA] Fix samples dependabot duplicated PRs by @as-suvorov in #3031
- Bump glob from 11.0.1 to 11.1.0 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3033
- Add XAttention documentation by @l-bat in #3010
- Update pybind11 to 3.0.1 version by @as-suvorov in #3044
- Complete GenAI Logger by @yangwang201911 in #2879
- Bump Tokenizers by @as-suvorov in #3048
- Fix for deadlock in python callback by @sgonorov in #3034
- Bump langchain-core from 1.0.5 to 1.0.7 in /tests/python_tests in the pip group across 1 directory by @dependabot[bot] in #3054
- Bump pytest from 9.0.0 to 9.0.1 in /tests/python_tests by @dependabot[bot] in #3006
- Base LLM pipeline testing refactoring and optimization by @sgonorov in #2779
- [NPU] Add batch_size support for embedding model by @mengweiguo in #2986
- Const-related fixes on tensor creation by @sgonorov in #3026
- Fix const in phi3 qwen and minicpm by @sgonorov in #3072
- Bump node-forge from 1.3.1 to 1.3.2 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3079
- Bump pydantic from 2.12.4 to 2.12.5 by @dependabot[bot] in #3080
- Move SDPAToPA-related functionality to runtime by @CuriousPanCake in #2937
- Support for video in benchmark by @krzyczar in #3019
- [JS] Update tokenizer methods by @Retribution98 in #3012
- Fix for deadlock in python callback by @sgonorov in #3073
- Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot[bot] in #3091
- Add MileBench validation for VLMs by @l-bat in #2358
- [JS] Bump js package up to 2026.0.0 and update dependencies by @Retribution98 in #3090
- llm_bench: use built in sum by @Wovchena in #3096
- UTF8 validate and replace invalid in GGUF output decoding by @mzegla in #3062
- Use specialized Windows runners by @ababushk in #3098
- [JS] Remove deprecated generate behavior. by @Retribution98 in #3095
- Support json config from file or string by @peterchen-intel in #2912
- Add forceRunPrecommitScope param to Jenkinsfile by @akladiev in #3107
- [wwb] Fix crash on exit() due to loading def dataset for inpainting in streaming mode by @sbalandi in #3103
- Minimal pre-commit implementation by @sgonorov in #3008
- Use ov::Model for phi3.5-vision image preprocessing by @jade-cho in #3063
- [GPU] Use GPU for OV preprocessing of phi4mm by @jade-cho in #3100
- NPU Whisper switch to use stateful pipeline by default by @eshiryae in #3085
- [Docs] Add optimum-cli export command with quantization for Whisper use case page by @yatarkan in #3105
- [wwb] Add possibility to check video inputs for VLM by @sbalandi in #3074
- Fix pre-commit Darker deps and pyi igno...
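Among the fixes above is UTF-8 validation with replacement of invalid sequences in GGUF output decoding (#3062). The general technique — not the repository's exact implementation — can be sketched as:

```python
# Sketch: decoding possibly-invalid byte output safely. Detokenizers can
# emit byte sequences that split a multi-byte UTF-8 character, so strict
# decoding would raise; errors="replace" substitutes U+FFFD instead.

def safe_decode(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")

ok = safe_decode("héllo".encode("utf-8"))  # valid input passes through
bad = safe_decode(b"h\xc3")                # truncated 2-byte sequence
print(ok)   # héllo
print(bad)  # h followed by the replacement character
```

This keeps a streaming pipeline from crashing on a token boundary that lands mid-character.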
2025.4.1.0
What's Changed
- Bump product version 2025.4.1 by @akladiev in #3093
- I2I Python deadlock tests and fixes by @sgonorov in #3082
Full Changelog: 2025.4.0.0...2025.4.1.0
2025.4.0.0
What's Changed
- Bump product version 2025.4 by @akladiev in #2620
- Fix CLEANUP_CACHE by @Wovchena in #2617
- [wwb] Add tests for mac/win to ci by @sbalandi in #2603
- xfail embed by @Wovchena in #2618
- Bump py-build-cmake from 0.4.3 to 0.5.0 by @dependabot[bot] in #2624
- Bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 by @dependabot[bot] in #2625
- Bump optimum-intel[nncf] from 1.25.1 to 1.25.2 by @dependabot[bot] in #2613
- Bump optimum-intel from 1.25.1 to 1.25.2 in /tests/python_tests by @dependabot[bot] in #2614
- [GGUF] Fix Q4_1 accuracy by @wine99 in #2563
- [llm bench] Add support of arcee model by @sbalandi in #2636
- Warn about older transformers by @Wovchena in #2634
- Limited max GPU KV-cache considering max allocatable GPU memory size by @popovaan in #2633
- Bump actions/dependency-review-action from 4.7.1 to 4.7.2 by @dependabot[bot] in #2639
- [llm_bench] Override max_length to preserve max_new_tokens by @Wovchena in #2641
- Cache images by @Wovchena in #2629
- Test logging by @Wovchena in #2621
- [CMAKE] Fix samples installation by @mryzhov in #2649
- [JS] Add prettier and align eslint by @almilosz in #2631
- [WWB] Remove use_flash_attention_2 argument for phi4mm by @nikita-savelyevv in #2653
- Implement text embedding pipeline shape fix by @as-suvorov in #2449
- [CMAKE] Solve pybind targets conflict by @mryzhov in #2655
- [GHA] Disable Cacheopt tests on mac by @mryzhov in #2663
- Fix Coverity by @Wovchena in #2601
- Bump peft from 0.17.0 to 0.17.1 in /samples by @dependabot[bot] in #2658
- Bump peft from 0.17.0 to 0.17.1 in /tests/python_tests by @dependabot[bot] in #2660
- downgrade xgrammar version in master by @pavel-esir in #2668
- Extend chat template test models by @yatarkan in #2648
- [NPU] Enable chunk prefill for VLM by @intelgaoxiong in #2657
- [JS] Add build NodeJS bindings into Manylinux 2_28 by @Retribution98 in #2537
- WWB empty_adapters mode by @likholat in #2671
- Bump actions/dependency-review-action from 4.7.2 to 4.7.3 by @dependabot[bot] in #2674
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.0 by @dependabot[bot] in #2677
- Bump langchain-core from 0.3.74 to 0.3.75 in /tests/python_tests by @dependabot[bot] in #2673
- Bump actions/download-artifact from 4.3.0 to 5.0.0 by @dependabot[bot] in #2676
- [OV JS] Add perfMetrics grammar getters & update docstrings by @almilosz in #2681
- Bump langchain-community from 0.3.27 to 0.3.29 in /tests/python_tests by @dependabot[bot] in #2680
- Bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #2685
- Fix StructuredOutputConfig pybind11-subgen signatures generation by @pavel-esir in #2669
- [llm_bench] Add start memory info by @sbalandi in #2686
- Updating KVCrush hyperparameters by @gopikrishnajha in #2678
- [Docs] Convert whisper as stateless in the quantization example by @nikita-savelyevv in #2690
- print genai version by @wgzintel in #2684
- Fix initializer for the sparse attention mode by @vshampor in #2689
- Add docs entry about building GenAI with free threaded Python by @p-wysocki in #2679
- Reduce structured output controller mutex locking scope by @mzegla in #2687
- [speculative decoding] Move from ManualTimer to pure metrics by @sbalandi in #2695
- [CI] [GHA] Use custom actions/download-artifact action with the fixed retries logic by @akashchi in #2692
- Enable VLM generation on NPU without image input by @AlexanderKalistratov in #2694
- [llm bench] Add possibility to setup cache eviction config for LLM by @sbalandi in #2693
- Remove not supported rerank models from docs by @as-suvorov in #2702
- Tune automatic memory allocation by @popovaan in #2697
- Bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2705
- Bump pytest from 8.4.1 to 8.4.2 in /tests/python_tests by @dependabot[bot] in #2710
- [WWB] friendly error message for wrong model type by @isanghao in #2672
- [CI] [GHA] Use smaller runners for image generation samples by @akashchi in #2682
- Add image generation pipeline reuse into README by @JohnLeFeng in #2701
- Align benchmark_vlm.py and cpp by @Wovchena in #2711
- [CI] Fix NodeJS tests for manylinux by @Retribution98 in #2715
- fix checking tokenizers version by @pavel-esir in #2667
- Optimize qwen2vl encoder by @WeldonWangwang in #2630
- Check available memory before allocating KV-cache. by @popovaan in #2683
- Fixed clearing of kv-cache for GPU by @popovaan in #2717
- [GHA] w/a to build ov samples by @mryzhov in #2734
- [OV JS] Initial support for SchedulerConfig by @almilosz in #2696
- Test LLM samples with GGUF models by @Retribution98 in #2464
- [llm_bench] LLMPipeline fix negative time by @sbalandi in #2742
- Bump pydantic from 2.11.7 to 2.11.9 in /samples by @dependabot[bot] in #2732
- [llm bench] Add mem info on initial/compilation phase to json/csv by @sbalandi in #2741
- Increase timeouts by @Wovchena in #2743
- Allow additional_params for tokenizer decode in TextStreamer by @dkalinowski in #2729
- Fix attention mask pass for whisper (static) by @eshiryae in #2665
- Support from-onnx parameter by @sstrehlk in #2441
- Use model path property for caching by @praasz in #2720
- Increase GGUF timeouts by @Wovchena in #2756
- [VLM] Fixed measuring of embeddings preparation. by @popovaan in #2752
- Bump timm from 1.0.19 to 1.0.20 by @dependabot[bot] in #2754
- [llm_bench] Fix OpenVINO config not being passed for speech-to-text and Whisper models by @aobolensk in #2763
- OPT & Clean code of openvino_vision_embeddings_merger_model inputs processing by @zhaixuejun1993 in #2726
- Add .github/pull_request_template.md by @Wovchena in #2765
- Update transformers to 4.53.3 by @as-suvorov in https://gi...
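Several entries in this release concern KV-cache memory limits on GPU (#2633, #2683, #2697). The sizing logic behind such checks can be approximated with the standard KV-cache formula — a back-of-the-envelope sketch; real allocators add paging and alignment overhead:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_seq_len,
                   batch_size=1, dtype_bytes=2):
    """Estimate KV-cache size: 2 tensors (K and V) per layer, each of
    shape [batch, kv_heads, seq_len, head_dim], dtype_bytes per element."""
    return (2 * num_layers * batch_size * num_kv_heads
            * max_seq_len * head_dim * dtype_bytes)

# Example: a Llama-2-7B-like config (32 layers, 32 KV heads, head_dim 128)
# at 4096-token context in FP16:
size = kv_cache_bytes(32, 32, 128, 4096)
print(size / 2**30)  # 2.0 GiB
```

Comparing such an estimate against free device memory before allocation is what lets a runtime cap the cache instead of failing mid-generation.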
2025.3.0.0
What's Changed
- Bump product version 2025.3 by @akladiev in #2255
- Implement SnapKV by @vshampor in #2067
- [WWB] Additional processing of native phi4mm by @nikita-savelyevv in #2276
- Update ov genai version in samples by @as-suvorov in #2275
- use chat templates in vlm by @eaidova in #2279
- Fix 'Unsupported property' fails if set prompt_lookup to False by @sbalandi in #2240
- Force the PA implementation in the llm-bench by default by @sbalandi in #2271
- Update Whisper README.md as "--disable-stateful" is no longer required to export models for NPU by @luke-lin-vmc in #2249
- Removed 'slices' from EncodedImage by @popovaan in #2258
- support text embeddings in llm_bench by @eaidova in #2269
- [wwb]: load transformers model first, then only trust_remote_code by @eaidova in #2270
- [GHA] Coverity pipeline fixes by @mryzhov in #2283
- [GHA][DEV] Fixed coverity path creation by @mryzhov in #2285
- [GHA][DEV] Save Coverity tool to cache by @mryzhov in #2286
- [GHA][DEV] Set cache key for coverity tool by @mryzhov in #2288
- Image generation multiconcurrency (#2190) by @dkalinowski in #2284
- [GGUF] Support GGUF format for tokenizers and detokenizers by @rkazants in #2263
- Unskip whisper tests & update optimum-intel by @as-suvorov in #2247
- Update README.md with text-to-speech by @rkazants in #2294
- add new chat template for qwen3 by @eaidova in #2297
- [DOCS] Correct cmd-line for TTS conversion by @rkazants in #2303
- [GHA] Enabled product manifest.yml by @mryzhov in #2281
- Bump the npm_and_yarn group across 3 directories with 2 updates by @dependabot[bot] in #2309
- [GHA] Save artifacts to cloud share by @akladiev in #1943
- [GHA][COVERITY] Added manual trigger by @mryzhov in #2289
- [GHA] Fix missing condition for Extract Artifacts step by @akladiev in #2313
- [llm bench] Turn off PA backend for VLM by @sbalandi in #2312
- [llm_bench] Add setting of max_num_batched_tokens for SchedulerConfig by @sbalandi in #2316
- [GHA] Fix missing condition for LLM & VLM test by @sammysun0711 in #2326
- [Test] Skip gguf test on MacOS due to sporadic failure by @sammysun0711 in #2328
- [GGUF] support Qwen3 architecture by @TianmengChen in #2273
- [llm_bench] Increase max_num_batched_tokens to the largest positive integer by @sbalandi in #2327
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 by @dependabot[bot] in #2310
- Bump actions/download-artifact from 4.1.9 to 4.3.0 by @dependabot[bot] in #2315
- Fix system_message forwarding by @Wovchena in #2325
- Disabled crop of the prompt for minicpmv. by @andreyanufr in #2320
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum by @sbalandi in #2332
- Bump brace-expansion from 2.0.1 to 2.0.2 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2338
- Fix multinomial sampling for PromptLookupDecoding by @sbalandi in #2331
- [llm bench] Avoid using not supported by beam_search parameters for beam_search case by @sbalandi in #2336
- Update Export Requirements by @apaniukov in #2342
- [GGUF] Serialize Generated OV Model for Faster LLMPipeline Init by @sammysun0711 in #2218
- Fixed system message in chat mode. by @popovaan in #2343
- Bump librosa from 0.10.2.post1 to 0.11.0 in /samples by @dependabot[bot] in #2346
- [Test][GGUF] Add DeepSeek-R1-Distill-Qwen GGUF in CI by @sammysun0711 in #2329
- [llm_bench] Remove default scheduler config by @sbalandi in #2341
- master: add Phi-4-multimodal-instruct by @Wovchena in #2264
- Fix paths with unicode for tokenizers by @yatarkan in #2337
- [WWB] Add try-except block for processor loading by @nikita-savelyevv in #2352
- [WWB] Bring back eager attention implementation by default by @nikita-savelyevv in #2353
- fix supported models link in TTS samples by @eaidova in #2300
- StaticLLMPipeline: Add tests on caching by @smirnov-alexey in #1905
- [WWB] Fix loading the tokenizer for VLMs by @l-bat in #2351
- Pass Scheduler Config for VLM Pipeline in WhoWhatBenchmark. by @popovaan in #2318
- Remove misspelled CMAKE_CURRENT_SOUCE_DIR by @Wovchena in #2362
- Increase timeout for LLM & VLM by @Wovchena in #2359
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum for VLM by @sbalandi in #2361
- Support multi images for vlm benchmarking in samples and llm_bench by @wgzintel in #2197
- CB: Hetero pipeline parallel support by @WeldonWangwang in #2227
- Update conversion instructions by @adrianboguszewski in #2287
- Merge stderr from failed samples by @Wovchena in #2156
- Revert cache folder by @Wovchena in #2372
- Update README in Node.js API by @almilosz in #2374
- [Docs] Rework home page by @yatarkan in #2368
- Align PromptLookupDecoding with greedy when dynamic_split_fuse works by @sbalandi in #2360
- Support to collect latency for transformers V4.52.0 by @wgzintel in #2373
- Bump diffusers from 0.33.1 to 0.34.0 in /samples by @dependabot[bot] in #2381
- Bump diffusers from 0.33.1 to 0.34.0 in /tests/python_tests by @dependabot[bot] in #2380
- Structured Output generation with XGrammar by @pavel-esir in #2295
- Disable XGrammar on Android by @apaniukov in #2389
- [wwb] Take prompts from different categories for def dataset for VLM by @sbalandi in #2349
- Fix for cloning NPU Image Generation pipelines (#2376) by @dkalinowski in #2393
- Set add_special_tokens=false for image tags in MiniCPM. by @popovaan in #2404
- Fix missing use cases for inpainting models and defining use case with relative path by @sbalandi in #2387
- temporary skip failing whisper tests by @pavel-esir in #2396
- Fix test_vlm_npu_no_exception by @AlexanderKalistratov in #2388
- Bump timm from 1.0.15 to 1.0.16 by @dependabot[bot] in #2390
- Optimize VisionEncoderQwen2VL::encode by @usst...
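Prompt lookup decoding appears in several fixes above (#2331, #2360). Its core idea: find the most recent occurrence of the trailing n-gram of generated tokens inside the prompt and propose the tokens that followed it as draft candidates. A toy sketch, with token IDs as plain ints:

```python
def propose_draft(prompt_tokens, generated_tokens, ngram=3, max_draft=5):
    """Return up to max_draft candidate tokens by matching the trailing
    n-gram of generated output against the prompt (most recent match wins)."""
    if len(generated_tokens) < ngram:
        return []
    tail = generated_tokens[-ngram:]
    # Search the prompt right-to-left for the most recent match.
    for i in range(len(prompt_tokens) - ngram, -1, -1):
        if prompt_tokens[i:i + ngram] == tail:
            start = i + ngram
            return prompt_tokens[start:start + max_draft]
    return []

prompt = [5, 6, 7, 8, 9, 10, 11]
generated = [1, 2, 6, 7, 8]              # trailing 3-gram [6, 7, 8] occurs in prompt
print(propose_draft(prompt, generated))  # [9, 10, 11]
```

The proposed tokens are then verified by the main model in one batched step, which is where the speed-up over token-by-token decoding comes from.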
2025.2.0.0
What's Changed
- [GHA] Replaced visual_language_chat_sample-ubuntu-minicpm_v2_6 job by @mryzhov in #1909
- [GHA] Replaced cpp-chat_sample-ubuntu pipeline by @mryzhov in #1913
- Add support of Prompt Lookup decoding to llm bench by @sbalandi in #1917
- [GHA] Introduce SDL pipeline by @mryzhov in #1924
- Switch Download OpenVINO step to aks-medium-runner by @ababushk in #1889
- Bump product version 2025.2 by @akladiev in #1920
- [GHA] Replaced cpp-continuous-batching by @mryzhov in #1910
- Update dependencies in samples by @ilya-lavrenov in #1925
- phi3_v: add universal tag by @Wovchena in #1921
- Fix image_id unary error by @rkazants in #1927
- [Docs] Image generation use case by @yatarkan in #1877
- Add perf metrics for CB VLM by @pavel-esir in #1897
- Enhance the flexibility of the c streamer by @apinge in #1941
- add Gemma3 LLM to supported models by @eaidova in #1942
- Added GPTQ/AWQ support with HF Transformers by @AlexKoff88 in #1933
- Add --static_reshape option to llm_bench, to force static reshape + compilation at pipeline creation by @RyanMetcalfeInt8 in #1851
- benchmark_image_gen: Add --reshape option, and ability to specify multiple devices by @RyanMetcalfeInt8 in #1878
- Revert perf regression changes by @dkalinowski in #1949
- Add running greedy_causal_lm for JS to the sample tests by @Retribution98 in #1930
- [Docs] Add VLM use case by @yatarkan in #1907
- Added possibility to generate base text on GPU for text evaluation. by @andreyanufr in #1945
- VLM: change infer to start_async/wait by @dkalinowski in #1948
- [WWB]: Addressed issues with validation on Windows by @AlexKoff88 in #1953
- [GHA] Remove bandit pipeline by @mryzhov in #1956
- Disable MSVC debug assertions, addressing false positives in iterator checking by @apinge in #1952
- [GHA] Replaced genai-tools pipeline by @mryzhov in #1954
- configurable delay by @eaidova in #1963
- Update cast of tensor data pointer for const tensors by @praasz in #1966
- Remove tokens after EOS for draft model for speculative decoding by @sbalandi in #1951
- Add testcase for chat_sample_c by @apinge in #1934
- Skip warm-up iteration during llm_bench results averaging by @nikita-savelyevv in #1972
- Reset pipeline cache usage statistics on each generate call by @vshampor in #1961
- [Docs] Update models, rebuild on push by @yatarkan in #1922
- Updated logic whether PA backend is explicitly required by @ilya-lavrenov in #1976
- [GHA] [MAC] Use latest_available_commit OV artifacts by @mryzhov in #1977
- [GHA] Set HF_TOKEN by @mryzhov in #1986
- [GHA] Setup ov_cache by @mryzhov in #1962
- [GHA] Changed cleanup runner by @mryzhov in #1995
- Added mutex to methods which use blocks map. by @popovaan in #1975
- Add documentation and sample on KV cache eviction by @vshampor in #1960
- StaticLLMPipeline: Simplify compile_model call logic by @smirnov-alexey in #1915
- Fix reshape in heterogeneous SD samples by @helena-intel in #1994
- Update tokenizers by @mryzhov in #2002
- docs: fix max_new_tokens option description by @tpragasa in #1987
- [Docs] Add speech recognition with whisper use case by @yatarkan in #1971
- Revert "VLM: change infer to start_async/wait " by @ilya-lavrenov in #2004
- Revert "Revert perf regression changes" by @ilya-lavrenov in #2003
- Set xfail to failing tests. by @popovaan in #2006
- [GHA] Use cpack bindings in the samples tests by @mryzhov in #1979
- [Docs]: add Phi3.5MoE to supported models by @eaidova in #2012
- add TensorArt SD3.5 models to supported list by @eaidova in #2013
- Move MiniCPM resampler to vision encoder by @popovaan in #1997
- [GHA] Fix ccache on Win/Mac by @mryzhov in #2008
- samples/python/text_generation/lora.py -> samples/python/text_generation/lora_greedy_causal_lm.py by @Wovchena in #2007
- Whisper timestamp fix by @RyanMetcalfeInt8 in #1918
- Unskip Qwen2-VL-2B-Instruct sample test by @as-suvorov in #1970
- [GHA] Use developer openvino packages by @mryzhov in #2000
- Added NNCF to export-requirements.txt by @ilya-lavrenov in #1974
- Bump py-build-cmake from 0.4.2 to 0.4.3 by @dependabot in #2016
- Use OV_CACHE for python tests by @as-suvorov in #2020
- [GHA] Disable HTTP calls to the Hugging Face Hub by @mryzhov in #2021
- Add python bindings to VLMPipeline for encrypted models by @olpipi in #1916
- Bump the npm_and_yarn group across 1 directory with 2 updates by @dependabot in #2017
- CB: auto plugin support by @ilya-lavrenov in #2034
- timeout-minutes: 90 by @Wovchena in #2039
- Bump diffusers from 0.32.2 to 0.33.1 by @dependabot in #2031
- Bump diffusers from 0.32.2 to 0.33.1 in /samples by @dependabot in #2032
- Enable cache and add cache encryption to samples by @olpipi in #1990
- Fix VLM concurrency by @mzegla in #2022
- Move Phi3 vision projection model to vision encoder by @popovaan in #2009
- Fix spelling by @Wovchena in #2025
- [Docs] Enable autogenerated samples docs by @yatarkan in #2029
- Synchronize entire embeddings calculation phase (#1967) by @mzegla in #1993
- Add missing finish reason set when finishing the sequence by @mzegla in #2036
- Bump image-size from 1.2.0 to 1.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot in #1998
- Add README for C Samples by @apinge in #2040
- Use ov_cache for test_vlm_pipeline by @as-suvorov in #2042
- increase timeouts by @Wovchena in #2041
- [GHA] Use azure runners for python tests by @mryzhov in #1991
- [WWB]: move diffusers imports closer to usage by @eaidova in #2046
- [llm bench] Move calculation of memory consumption to memory_monitor tool by @sbalandi in #1...
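Performance measurement comes up repeatedly in this release (perf metrics for CB VLM in #1897, skipping the warm-up iteration in llm_bench averaging in #1972). The averaging concern can be illustrated with a tiny sketch: first iterations include one-off costs such as model compilation, so benchmarks typically drop them before averaging:

```python
def average_latency_ms(latencies_ms, skip_warmup=1):
    """Average per-iteration latencies, discarding warm-up iterations
    that carry one-off costs such as model compilation."""
    kept = latencies_ms[skip_warmup:]
    if not kept:
        raise ValueError("no measurements left after skipping warm-up")
    return sum(kept) / len(kept)

runs = [950.0, 102.0, 98.0, 100.0]  # first run dominated by compile time
print(average_latency_ms(runs))     # 100.0
```

Without the skip, the single compile-heavy run above would inflate the reported mean by more than a factor of three.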
2025.1.0.0
What's Changed
- skip failing Chinese prompt on Win by @pavel-esir in #1573
- Bump product version 2025.1 by @akladiev in #1571
- Bump tokenizers submodule by @akladiev in #1575
- [LLM_BENCH] relax md5 checks and allow pass cb config without use_cb by @eaidova in #1570
- [VLM] Add Qwen2VL by @yatarkan in #1553
- Fix links, remind about ABI by @Wovchena in #1585
- Add nightly to instructions similar to requirements by @Wovchena in #1582
- GHA: use nightly from 2025.1.0 by @ilya-lavrenov in #1577
- NPU LLM Pipeline: Switch to STATEFUL by default by @dmatveev in #1561
- Verify not empty rendered chat template by @yatarkan in #1574
- [RTTI] Fix passes rtti definitions by @t-jankowski in #1588
- Test add_special_tokens properly by @pavel-esir in #1586
- Add indentation for llm_bench json report dumping by @nikita-savelyevv in #1584
- Prioritize config model type under path-based task determination by @eaidova in #1587
- Replace openvino.runtime imports with openvino by @helena-intel in #1579
- Add tests for Whisper static pipeline by @eshiryae in #1250
- CB: removed handle_dropped() misuse by @ilya-lavrenov in #1594
- Bump timm from 1.0.13 to 1.0.14 by @dependabot in #1595
- Update samples readme by @olpipi in #1545
- [ Speculative decoding ][ Prompt lookup ] Enable Perf Metrics for assisting pipelines by @iefode in #1599
- [LLM] [NPU] StaticLLMPipeline: Export blob by @smirnov-alexey in #1601
- [llm_bench] enable prompt permutations for prevent prefix caching and fix vlm image load by @eaidova in #1607
- LLM: use set_output_seq_len instead of WA by @ilya-lavrenov in #1611
- CB: support different number of K and V heads per layer by @ilya-lavrenov in #1610
- LLM: fixed Slice / Gather of last MatMul by @ilya-lavrenov in #1616
- Switch to VS 2022 by @mryzhov in #1598
- Add Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1609
- Whisper pipeline: apply slice matmul by @as-suvorov in #1623
- GHA: use OV master in mac.yml by @ilya-lavrenov in #1622
- [Image Generation] Image2Image for FLUX by @likholat in #1621
- add missed ignore_eos in generation config by @eaidova in #1625
- Master increase priority for rt info to fix Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1626
- Correct model name by @wgzintel in #1624
- Token rotation by @vshampor in #987
- Whisper pipeline: use Sampler by @as-suvorov in #1615
- Fix setting eos_token_id with kwarg by @Wovchena in #1629
- Extract cacheopt E2E tests into separate test matrix field by @vshampor in #1630
- [CB] Split token streaming and generation to different threads for all CB based pipelines by @iefode in #1544
- Don't silence a error if a file can't be opened by @Wovchena in #1620
- [CMAKE]: use different version for macOS arm64 by @ilya-lavrenov in #1632
- Test invalid fields assignment raises in GenerationConfig by @Wovchena in #1633
- do_sample=False for NPU in chat_sample, add NPU to README by @helena-intel in #1637
- [JS] Add GenAI Node.js bindings by @vishniakov-nikolai in #1193
- CB: preparation for relying on KV cache precisions from plugins by @ilya-lavrenov in #1634
- [LLM bench]support providing adapter config mode by @eaidova in #1644
- Automatically apply chat template in non-chat scenarios by @sbalandi in #1533
- beam_search_causal_lm.cpp: delete wrong comment by @Wovchena in #1639
- [WWB]: Fixed chat template usage in VLM GenAI pipeline by @AlexKoff88 in #1643
- [WWB]: Fixed nano-Llava preprocessor selection by @AlexKoff88 in #1646
- [WWB]: Added config to preprocessor call in VLMs by @AlexKoff88 in #1638
- CB: remove DeviceConfig class by @ilya-lavrenov in #1640
- [WWB]: Added initialization of nano-llava in case of Transformers model by @AlexKoff88 in #1649
- WWB: simplify code around start_chat / use_template by @ilya-lavrenov in #1650
- Tokenizers update by @ilya-lavrenov in #1653
- DOCS: reorganized support models for image generation by @ilya-lavrenov in #1655
- Fix using llm_bench/wwb with version w/o apply_chat_template by @sbalandi in #1651
- Fix Qwen2VL generation without images by @yatarkan in #1645
- Parallel sampling with threadpool by @mzegla in #1252
- [Coverity] Enabling coverity scan by @akazakov-github in #1657
- [ CB ] Fix streaming in case of empty outputs by @iefode in #1647
- Allow overriding eos_token_id by @Wovchena in #1654
- CB: remove GenerationHandle:back by @ilya-lavrenov in #1662
- Fix tiny-random-llava-next in VLM Pipeline by @yatarkan in #1660
- [CB] Add KVHeadConfig parameters to PagedAttention's rt_info by @sshlyapn in #1666
- Bump py-build-cmake from 0.3.4 to 0.4.0 by @dependabot in #1668
- pin optimum version by @pavel-esir in #1675
- [LLM] Enabled CB by default by @ilya-lavrenov in #1455
- SAMPLER: fixed hang during destruction of ThreadPool by @ilya-lavrenov in #1681
- CB: use optimized scheduler config for cases when user explicitly asked CB backend by @ilya-lavrenov in #1679
- [CB] Return Block manager asserts to destructors by @iefode in #1569
- phi3_v: allow images, remove unused var by @Wovchena in #1670
- [Image Generation] Inpainting for FLUX by @likholat in #1685
- [WWB]: Added support for SchedulerConfig in LLMPipeline by @AlexKoff88 in #1671
- Add LongBench validation by @l-bat in #1220
- Fix Tokenizer for several added special tokens by @pavel-esir in #1659
- Unpin optimum-intel version by @ilya-lavrenov in #1680
- Image generation: proper error message when encode() is used w/o encoder passed to ctor by @ilya-lavrenov in #1683
- Fix excluding stop str from output for some tokenizer by @sbalandi in #1676
- [VLM] Fix chat template fallback in chat mode with defined system message by @yatarkan in https://github.com/openvinotoolkit/openvino.genai/pull/...
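Speculative-decoding work also features in this release (perf metrics for assisting pipelines in #1599, removing draft tokens after EOS in #1951). The verification step at its heart: accept draft tokens while they match what the target model would produce, then take the target's token at the first mismatch. A toy sketch with the target model's outputs precomputed:

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy speculative-decoding acceptance: keep the matching prefix of
    the draft, then append the target model's token at the first mismatch.
    target_tokens[i] is what the target model would emit at position i."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # correction supplied by the target model
            break
    return accepted

draft = [4, 8, 15, 16]
target = [4, 8, 99, 23]             # target disagrees at position 2
print(verify_draft(draft, target))  # [4, 8, 99]
```

Every accepted draft token is one target-model forward step amortized, which is the entire win of the technique.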
2025.0.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.6.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.5.0.0
Please check out the latest documentation pages related to the new openvino_genai package!