Releases: microsoft/openaivec

v2.2.1

13 Apr 15:25
0631cff

Bug Fixes

  • fabric: Use import notebookutils instead of getattr(builtins, 'notebookutils') for Fabric environment detection (#157)
    • notebookutils is a regular site-package on the Fabric runtime, not a builtins attribute
    • Previous approach caused is_fabric_environment() to always return False
  • provider: Treat OPENAI_API_KEY=place_holder_for_fabric_internal as absent so the library correctly falls through to Azure / Entra ID authentication

v2.2.0 — Fabric Entra ID Authentication

13 Apr 13:26
625b521

What's New

Microsoft Fabric Key Vault Authentication

Automatic Entra ID (Service Principal) authentication for Microsoft Fabric notebooks.

Two auth paths:

  • Key Vault path: Set AZURE_TENANT_ID, AZURE_CLIENT_ID, KEY_VAULT_URL, KEY_VAULT_SECRET_NAME — the library auto-retrieves the SP client secret from Key Vault via notebookutils
  • Direct path: Set AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET directly

Key design points:

  • No os.environ side effects — credentials flow through DI (ValueObject + Container)
  • BearerTokenProvider builds ClientSecretCredential directly from DI-resolved values
  • Detailed guidance when configuration is incomplete (auth flow, setup steps, per-variable status)
  • setup_entra_id() in spark_ext supports both direct secret and KV retrieval for executors

New ValueObjects

  • KeyVaultURL, KeyVaultSecretName — registered via DI, consistent with existing pattern

Documentation

  • All "Azure AD" references replaced with "Entra ID"
  • Updated auth flow descriptions to reflect ClientSecretCredential via DI

v2.1.1 — Test fix & docs

12 Apr 03:54


🐛 Bug Fixes

  • fix(test): Add missing multimodal parameter to fake_responses_with_cache mock in async pandas_ext test. The multimodal=False default was leaking into **api_kwargs, causing assertion failures in CI.

📄 Docs

  • Add executed outputs to receipt_parsing.ipynb notebook for GitHub Pages rendering.

v2.1.0 — Multimodal File Support

12 Apr 03:35
75a600e

🚀 Multimodal File Support

BatchResponses and all extension modules (pandas_ext, duckdb_ext, spark_ext) now support multimodal inputs — process images, PDFs, source code files, and plain text in a single call with multimodal=True.

✨ Highlights

Smart Input Routing

Inputs are automatically classified for optimal performance:

  • Text files (.py, .js, .md, .csv, .json, .html, …) → read as strings, batched & deduplicated
  • Binary documents (.pdf, .docx, .xlsx, .pptx, …) → uploaded via Files API
  • Images (.png, .jpg, .gif, local or URL) → inlined as base64 data URIs
  • Plain text → batched as before
batch = BatchResponses.of(
    client=client,
    model_name="gpt-4.1-mini",
    system_message="Extract key information.",
    response_format=MySchema,
    multimodal=True,   # ← enables file routing
)

# Mix file paths and text freely
results = batch.parse([
    "plain text query",
    "/path/to/report.pdf",     # → Files API upload
    "/path/to/code.py",        # → read & batched with text
    "/path/to/photo.png",      # → base64 inline
])
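The routing rules can be sketched as a small classifier. The extension sets below are illustrative subsets of the library's `_TEXT_DOCUMENT_EXTENSIONS` / `_BINARY_DOCUMENT_EXTENSIONS`, and `classify_input` is a hypothetical name, not the library's API.

```python
from pathlib import Path

# Illustrative subsets of the extension sets described in the notes.
_TEXT_EXTS = {".py", ".js", ".md", ".csv", ".json", ".html"}
_BINARY_DOC_EXTS = {".pdf", ".docx", ".xlsx", ".pptx"}
_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}

def classify_input(value: str) -> str:
    """Route one input to one of the four handling strategies."""
    suffix = Path(value).suffix.lower()
    if suffix in _IMAGE_EXTS:
        return "image"            # inlined as base64 data URI
    if suffix in _BINARY_DOC_EXTS:
        return "binary_document"  # uploaded via Files API
    if suffix in _TEXT_EXTS:
        return "text_file"        # read as string, batched & deduplicated
    return "plain_text"           # batched as before
```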

DuckDB Receipt Parsing Example

New notebook: docs/examples/receipt_parsing.ipynb

-- Register UDF with Pydantic schema → DuckDB STRUCT
SELECT
    parse_receipt(file).store_name AS store,
    parse_receipt(file).total      AS total
FROM glob('receipts/*.pdf')

📦 New Module

  • _multimodal.py — MultimodalContentBuilder, file type detection, extension sets
    • _BINARY_DOCUMENT_EXTENSIONS (28) — PDF, Office formats → Files API
    • _TEXT_DOCUMENT_EXTENSIONS (45) — source code, markup, data → inline batching
    • Extension lists verified against Responses API context stuffing

⚠️ Breaking Changes

None — multimodal=False by default preserves existing behavior.

🔧 API Changes

  • BatchResponses.of() — added multimodal: bool = False
  • AsyncBatchResponses.of() — added multimodal: bool = False
  • pandas_ext .ai.responses() — added multimodal parameter
  • duckdb_ext.responses_udf() — added multimodal parameter
  • spark_ext.responses_udf() — added multimodal parameter

🚫 Audio

Audio files (.mp3, .wav) raise ValueError — the Responses API does not support audio input. Use the Realtime API or Chat Completions API instead.

🧪 Testing

  • 28 multimodal unit tests
  • E2E verified against real API: PNG, JPG, PDF, TXT, CSV, JSON, MD, HTML, YAML, XML, RTF, TeX, RST, PY, JS, Java, C, Scala, SH, CSS, DIFF, MJS, PL, BAT

v2.0.1

09 Apr 12:45
8af9bdb

openaivec v2.0.1

🔄 Breaking Changes

  • openaivec.spark → openaivec.spark_ext — Module renamed for consistency with pandas_ext / duckdb_ext. Update all from openaivec.spark import ... to from openaivec.spark_ext import ....
  • duckdb_ext.register_*_udf → duckdb_ext.*_udf — Removed register_ prefix for consistency with spark_ext:
    • register_responses_udf → responses_udf
    • register_embeddings_udf → embeddings_udf
    • register_task_udf → task_udf

⚡ Improvements

  • DI-managed DuckDB connection — _df_rows_to_json_series reuses a singleton connection via the DI container instead of creating/closing one per call.
  • Persistent BackgroundLoop — DuckDB UDFs reuse a single background event loop instead of spawning a thread per batch.
  • Shared async utilities — BackgroundLoop, run_async, create_event_loop, close_event_loop, run_partition_async extracted to _util.py, shared by duckdb_ext and spark_ext.
  • Explicit DuckDB table registration — Avoids variable name collisions when _df_rows_to_json_series is called.
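The persistent-loop pattern behind BackgroundLoop can be sketched with the stdlib alone. This is a minimal sketch of the technique, not the library's exact implementation: one daemon thread hosts a long-lived event loop that every batch reuses.

```python
import asyncio
import threading

class BackgroundLoop:
    """A single event loop running on a daemon thread, reused across
    calls instead of spawning a new thread/loop per batch."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, daemon=True
        )
        self._thread.start()

    def run(self, coro):
        """Submit a coroutine to the background loop and block
        until its result is available."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()
```

Because the loop outlives any single call, connection pools and semaphores created inside it remain warm between batches.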

🆕 Type Coverage

  • Decimal → DECIMAL
  • UUID → UUID
  • datetime → TIMESTAMP
  • date → DATE
  • time → TIME
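A minimal sketch of the new mappings, assuming a simple type-to-name lookup; the real conversion lives inside the library's schema code and `duckdb_type` is an illustrative name.

```python
from datetime import date, datetime, time
from decimal import Decimal
from uuid import UUID

# Python → DuckDB type mapping added in this release (sketch only).
_DUCKDB_TYPES: dict[type, str] = {
    Decimal: "DECIMAL",
    UUID: "UUID",
    datetime: "TIMESTAMP",
    date: "DATE",
    time: "TIME",
}

def duckdb_type(py_type: type) -> str:
    """Look up the DuckDB column type for a Python type."""
    return _DUCKDB_TYPES[py_type]
```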

🐛 Fixes

  • fillna.py: Handle numpy types in json.dumps with default=str.
  • Arrow UDF empty batches: Use typed null arrays (pa.string()) instead of untyped.
  • Remove unused _model_to_dict function.

📦 Dependencies

  • pyarrow>=19.0.0 added as core dependency.

v2.0.0

08 Apr 23:55
996589c

openaivec v2.0.0 Release Notes

🎉 Highlights

openaivec 2.0 introduces DuckDB as a first-class integration, bringing persistent caching,
SQL-native AI functions, and Arrow-optimized data pipelines. This is a major release with breaking
changes to class names and configuration APIs.


🆕 DuckDB Integration (openaivec.duckdb_ext)

  • Arrow Vectorized UDFs — register_responses_udf, register_embeddings_udf, and
    register_task_udf register AI-powered functions directly in DuckDB. Rows are processed in batches
    with async concurrency and automatic deduplication — all transparent to SQL.
  • Structured Output as STRUCT — Pydantic BaseModel response formats return native DuckDB
    STRUCT types with direct field access in SQL (SELECT udf(text).sentiment). Supports nested
    models, Enum, and Literal.
  • Persistent Cache — DuckDBCacheBackend stores API results in a DuckDB table with LRU
    eviction. Eliminates redundant API calls across sessions.
  • Vector Similarity — similarity_search() performs top-k cosine similarity queries via
    list_cosine_similarity.
  • Schema → DDL — pydantic_to_duckdb_ddl() converts Pydantic models to CREATE TABLE
    statements.
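The schema-to-DDL idea can be sketched with the stdlib alone. Here a dataclass stands in for a Pydantic model, and `dataclass_to_ddl` is an illustrative analogue of pydantic_to_duckdb_ddl, not the library's function.

```python
from dataclasses import dataclass, fields

# Illustrative Python → DuckDB column type mapping.
_SQL_TYPES = {str: "VARCHAR", int: "BIGINT", float: "DOUBLE", bool: "BOOLEAN"}

def dataclass_to_ddl(cls: type, table: str) -> str:
    """Render a CREATE TABLE statement from a dataclass's fields."""
    cols = ", ".join(f"{f.name} {_SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {table} ({cols})"

@dataclass
class Survey:
    sentiment: str
    score: float
```

The real pydantic_to_duckdb_ddl additionally handles nested models, which map to DuckDB STRUCT columns.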

⚡ Performance Improvements

  • Arrow-backed Embeddings — Embedding results are now stored as pa.FixedSizeListArray<float32>
    in pandas, enabling zero-copy transfer to DuckDB/Parquet and 2-3x memory reduction.
  • Zero-copy Similarity — DataFrame.ai.similarity() extracts numpy matrices directly from Arrow
    buffers.
  • DuckDB JSON Serialization — _df_rows_to_json_series uses DuckDB's C++ to_json() instead of
    Python json.dumps (5-10x faster on large DataFrames).
  • Batched Token Counting — count_tokens uses tiktoken.encode_batch() for a 2-3x speedup.

🔄 Breaking Changes

Before (v1.x) → After (v2.0):

  • BatchingMapProxy → BatchCache
  • AsyncBatchingMapProxy → AsyncBatchCache
  • ProxyBase → BatchCacheBase
  • proxy._cache (private field) → proxy.cache (public field)
  • pandas_ext.set_client() → openaivec.set_client()
  • pandas_ext.set_responses_model() → openaivec.set_responses_model()

Migration: pandas_ext.set_* / get_* still work but emit DeprecationWarning. Use
openaivec.set_* / openaivec.get_* instead.

🏗 Architecture Changes

  • CacheBackend Protocol — Runtime-checkable protocol for pluggable cache backends.
    InMemoryCacheBackend (default) and DuckDBCacheBackend both satisfy it.
  • Unified Configuration — set_client, get_client, set_responses_model, etc. defined in
    _provider.py and exported from openaivec.*. Shared across pandas, DuckDB, and Spark.
  • Notebook-safe Async — _run_async() helper runs coroutines from any context (including
    Jupyter) via a background thread.
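The notebook-safe pattern can be sketched as follows. This is a minimal sketch of the technique; _run_async's actual signature may differ. The trick is to detect an already-running loop (as in Jupyter) and delegate to a throwaway thread with its own loop instead of calling asyncio.run directly, which would raise.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

def run_async(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from any context."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: safe to block on asyncio.run here.
        return asyncio.run(coro)
    # A loop is already running (e.g. Jupyter): run the coroutine
    # on a background thread that owns a fresh loop.
    result: list[T] = []
    def _worker() -> None:
        result.append(asyncio.run(coro))
    t = threading.Thread(target=_worker)
    t.start()
    t.join()
    return result[0]
```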

📦 New Dependencies

  • duckdb>=1.0.0 — core dependency
  • pyarrow>=19.0.0 — core dependency

📖 Documentation

  • New API reference page: duckdb_ext
  • New example notebook: DuckDB customer survey sentiment analysis
  • README: "Using with DuckDB" section with structured output and embedding examples
  • Updated coding conventions: @dataclass for all classes, typed fields, DI via fields, of()
    factories

v1.2.1

08 Apr 06:08
7575dd5

What's Changed

Full Changelog: v1.1.10...v1.2.1

v1.1.10

31 Mar 05:56
7b8ef4a

Merge pull request #146 from microsoft/chore/deps

chore: update deps

v1.1.9

24 Mar 03:14
dfd6ca7

What's Changed

Full Changelog: v1.1.8...v1.1.9

v1.1.8

02 Mar 06:16
c7fc762

What's Changed

Full Changelog: v1.1.7...v1.1.8