fix: implement batch generation for OpenAI models#1846

Open
ianliuy wants to merge 2 commits into dottxt-ai:main from ianliuy:fix/issue-1391

Conversation


@ianliuy ianliuy commented Apr 14, 2026

What's broken?

Calling the generator with a list of prompts (vectorized/batched calls) crashes for OpenAI models. For example:

model = outlines.models.openai("gpt-4o-mini")
generator = outlines.generate.json(model, MyModel)
results = generator(["prompt1", "prompt2"])  # crashes

Who is affected?

Any user trying to batch multiple prompts with OpenAI (or AsyncOpenAI) models. Single-prompt calls work fine. This was originally reported against v0.1.13, and while the codebase has been completely restructured for v1.0, the underlying user need (batched generation) is still unmet — generate_batch() raises NotImplementedError.

When does it trigger?

Every time model.batch([...]) or generator.batch([...]) is called with an OpenAI model. 100% reproducible.

Where is the bug?

  • outlines/models/openai.py lines 295-303: OpenAI.generate_batch() raises NotImplementedError
  • outlines/models/openai.py lines 445-453: AsyncOpenAI.generate_batch() raises NotImplementedError

Why does it happen?

When batch support was added to the model hierarchy (commit c350ddc9), models with native batch APIs (Transformers, vLLM) got real implementations, while API-based models (OpenAI, Anthropic, etc.) got NotImplementedError stubs since their APIs don't support multi-prompt-in-one-call.

However, batch can be trivially implemented for API models by looping over individual generate() calls — each prompt becomes a separate API request. The async variant can use asyncio.gather() for concurrent execution.
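The looping pattern described above can be sketched as follows. This is an illustrative sketch, not the actual outlines code; the helper names and signatures here are assumptions.

```python
import asyncio

# Sketch of the per-prompt loop pattern (names are illustrative, not
# the real outlines method signatures).
def generate_batch_sync(generate, prompts, **kwargs):
    # Sync: each prompt becomes its own API request, issued sequentially.
    return [generate(prompt, **kwargs) for prompt in prompts]

async def generate_batch_async(agenerate, prompts, **kwargs):
    # Async: fire all requests concurrently; gather preserves input order.
    return list(await asyncio.gather(
        *(agenerate(prompt, **kwargs) for prompt in prompts)
    ))
```

The async variant gives real concurrency across HTTP requests even though the provider's API only accepts one prompt per call.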

How did we fix it?

Replaced the NotImplementedError stubs in OpenAI.generate_batch() and AsyncOpenAI.generate_batch() with implementations that:

  • Sync: Loop over inputs, calling self.generate() for each prompt
  • Async: Use asyncio.gather() to fire all prompts concurrently

This is a minimal, surgical change (~10 lines per method) that follows the existing generate() contract — input formatting, output type handling, and error handling are all delegated to the already-tested generate() method.

The same pattern could be applied to other API models (Anthropic, Gemini, Mistral, etc.) as a follow-up.

How do we know it works?

  • Updated test_openai_batch and test_openai_async_batch from asserting NotImplementedError to verifying correct batch results using mock clients
  • Tests cover both sync and async paths, including duplicate-prompt edge case
  • All 10 non-API-call OpenAI tests pass (no regressions)
  • Pre-commit checks pass (mypy, ruff, merge conflicts, debug statements)
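A mock-client batch test in the spirit described above might look like the following. This is a hypothetical sketch: `FakeOpenAIModel` and its methods are stand-ins, not the real classes from the outlines test suite.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for an API-backed model whose batch method
# loops over generate(), as described in the PR.
class FakeOpenAIModel:
    def __init__(self, client):
        self.client = client

    def generate(self, prompt, **kwargs):
        return self.client.complete(prompt)

    def generate_batch(self, prompts, **kwargs):
        # One API request per prompt, results in input order.
        return [self.generate(p, **kwargs) for p in prompts]

def test_batch_returns_one_result_per_prompt():
    client = MagicMock()
    client.complete.side_effect = lambda p: f"echo:{p}"
    model = FakeOpenAIModel(client)
    # Duplicate prompt included to cover the edge case mentioned above.
    results = model.generate_batch(["a", "b", "a"])
    assert results == ["echo:a", "echo:b", "echo:a"]
    assert client.complete.call_count == 3
```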

cc @RobinPicard @cpfiffer

Replace NotImplementedError stubs in OpenAI.generate_batch() and
AsyncOpenAI.generate_batch() with working implementations:
- Sync: loops over inputs calling self.generate() for each prompt
- Async: uses asyncio.gather() for concurrent execution

Update tests from asserting NotImplementedError to verifying correct
batch results using mock clients.

Fixes dottxt-ai#1391

Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
Contributor

@RobinPicard RobinPicard left a comment


Thanks for opening a PR! Just 2 things to change on top of the comment below:

  • add type hints and docstrings for the methods that are now implemented thanks to the PR
  • update the documentation for the OpenAI model, specifying in particular that it's not true batching

Comment thread on outlines/models/openai.py (outdated). The quoted hunk replaces:

    raise NotImplementedError(
        "The `openai` library does not support batch inference."
    )

with:

    return list(await asyncio.gather(*(

I think return_exceptions=True would make sense here as users may want to access results for threads that did not raise an exception.
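The reviewer's suggestion can be sketched as follows: with `return_exceptions=True`, a failing prompt yields an exception object at its position in the results list instead of failing the whole batch. Names here are illustrative, not the real outlines code.

```python
import asyncio

async def generate_batch(agenerate, prompts):
    # return_exceptions=True keeps partial results: exceptions are
    # returned in place rather than propagated, so one failed prompt
    # does not discard the others.
    return list(await asyncio.gather(
        *(agenerate(p) for p in prompts),
        return_exceptions=True,
    ))

async def flaky(prompt):
    # Simulated API call that fails for one particular prompt.
    if prompt == "bad":
        raise ValueError("request failed")
    return prompt.upper()

results = asyncio.run(generate_batch(flaky, ["ok", "bad"]))
# results[0] == "OK"; results[1] is a ValueError instance
```

The caller is then responsible for checking each result with `isinstance(r, Exception)` before using it.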

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
