The fork system call allows an LLM process to create copies of itself to handle multiple tasks in parallel, similar to the Unix fork() system call but with advantages specific to LLM-based applications.
The fork feature enables an LLM process to:
- Create multiple copies of itself, each with the full conversation history
- Process independent tasks in parallel without filling up the context window
- Combine results from multiple forked processes into a single response
⚠️ Access Control: The fork tool is only available to processes with ADMIN access level. Child processes created by fork are given WRITE access level by default, which prevents them from calling fork again.
- Shared Context: Each forked process inherits the full conversation history, ensuring continuity and context preservation.
- Parallel Processing: Multiple tasks can be processed simultaneously, improving efficiency.
- Prompt Caching: Shared conversation history prefix can be cached for performance benefits. Some providers expose explicit caching, while others implicitly reuse the shared context.
- Focus: Each fork can concentrate on a specific subtask without distraction.
To enable the fork system call, add it to the `tools` section of your configuration file:

```yaml
tools:
  builtin:
    - fork
```

You can also combine it with other system tools:

```yaml
tools:
  builtin:
    - fork
    - spawn
```

Once enabled, the fork tool is available to the LLM through the standard tool-calling interface.
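If your program is defined in TOML rather than YAML, the equivalent `[tools]` section would presumably look like this (syntax assumed by analogy; check your schema):

```toml
# Assumed TOML equivalent of the YAML configuration above.
[tools]
builtin = ["fork", "spawn"]
```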
```json
{
  "name": "fork",
  "description": "Create copies of the current process to handle multiple tasks in parallel. Each copy has the full conversation history.",
  "input_schema": {
    "type": "object",
    "properties": {
      "prompts": {
        "type": "array",
        "description": "List of prompts/instructions for each forked process",
        "items": {
          "type": "string",
          "description": "A specific task or query to be handled by a forked process"
        }
      }
    },
    "required": ["prompts"]
  }
}
```

The fork system call is ideal for:
- Breaking complex tasks into parallel subtasks
- Performing multiple independent operations simultaneously
- Processing data from multiple sources in parallel
- Executing operations that would otherwise consume excessive context length
- Research: Fork to read and analyze multiple documents in parallel.
- Code Analysis: Fork to examine different parts of a codebase simultaneously.
- Data Processing: Fork to process different data segments independently.
- Content Generation: Fork to generate multiple variations of content in parallel.
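A call to the fork tool that conforms to the schema above might look like the following sketch (the prompt strings are made up for illustration):

```python
# Hypothetical input for the fork tool: a list of task strings,
# one per forked child, matching the "prompts" array in the schema.
fork_input = {
    "prompts": [
        "Summarize chapter1.md",
        "Summarize chapter2.md",
        "Summarize chapter3.md",
    ]
}

# Minimal structural checks mirroring the schema's constraints.
assert isinstance(fork_input["prompts"], list)
assert all(isinstance(p, str) for p in fork_input["prompts"])
```

Each string becomes the query for one child process, so the example above would fork three children.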
The fork feature is implemented through:
- A `fork_tool` function in `tools/builtin/fork.py` that handles forking requests
- The `_fork_process` method in `LLMProcess` that manages the forking process
- The standard `ToolManager` system for registering and calling the fork tool
- Access level controls to manage which processes can use the fork tool
The fork tool is a first-class, provider-agnostic tool executed via `ToolManager` like any other tool, following the Unix-inspired model in which tools are accessed through a consistent interface.
When a process is forked:
- A new process is created via `program.start()` to ensure proper initialization
- The entire conversation state is deep-copied to the new process using the `ProcessSnapshot` mechanism
- All preloaded content and system prompts are preserved
- File descriptors are properly cloned to maintain independence between processes
- The access level is set to WRITE for child processes, enforcing security boundaries
- The forked process runs independently with its own query using the provider's appropriate executor
- Results from all forked processes are collected and returned to the parent process as a list of dictionaries with `id` and `message` fields
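The steps above can be sketched in simplified form (this is an illustrative model of the behavior, not the actual LLMProc code; `run_child` stands in for the real executor):

```python
import asyncio
import copy

async def run_child(child_id: int, state: list, prompt: str) -> dict:
    # Each child gets an isolated deep copy of the parent's state.
    state = copy.deepcopy(state)
    state.append({"role": "user", "content": prompt})
    reply = f"handled: {prompt}"  # stand-in for a real model call
    return {"id": child_id, "message": reply}

async def fork(parent_state: list, prompts: list) -> list:
    # Children run concurrently; the parent waits for all of them
    # and collects results in prompt order.
    return await asyncio.gather(
        *(run_child(i, parent_state, p) for i, p in enumerate(prompts))
    )

results = asyncio.run(
    fork([{"role": "system", "content": "You are helpful."}],
         ["task A", "task B"])
)
```

Because the parent's state is copied rather than shared, nothing a child appends is visible to its siblings or to the parent.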
The implementation stores per-iteration buffers (`msg_prefix` and `tool_results_prefix`) on `process.iteration_state`. These maintain proper causal ordering of messages and tool results, so tools executed later in a turn can see the results of earlier tools.
Each iteration also exposes the currently executing tool via `process.iteration_state.current_tool`.
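The buffer idea can be sketched as follows (the attribute names come from the text; the container class itself is hypothetical):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IterationState:
    # Per-iteration buffers named in the text; this class is illustrative.
    msg_prefix: list = field(default_factory=list)
    tool_results_prefix: list = field(default_factory=list)
    current_tool: Optional[str] = None

state = IterationState()
state.current_tool = "fork"
state.msg_prefix.append({"role": "assistant", "content": "calling tools"})
state.tool_results_prefix.append({"tool": "read_file", "result": "..."})

# A tool running later in the same turn can inspect earlier results:
earlier_tools = [r["tool"] for r in state.tool_results_prefix]
```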
While inspired by the Unix fork() system call, the LLMProc fork implementation has some key differences:
- It creates multiple forks at once rather than a single child process.
- Each fork is given a specific prompt/task rather than continuing execution from the fork point.
- The parent immediately waits for all child processes and collects their results.
See `examples/fork.yaml` for a complete example program configuration that demonstrates the fork system call.
- The implementation executes all forked processes in parallel with `asyncio.gather()`, with the parent waiting for all children to complete.
- The access level system (`AccessLevel.ADMIN` for parents, `AccessLevel.WRITE` for children) enforces security boundaries, preventing unauthorized fork operations.
- Each process has complete state isolation through deep copying and proper file descriptor cloning.
- The streaming implementation ensures proper causal ordering of messages and tool results.
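The access-level check described above can be sketched like this (the enum names follow the text; the numeric ordering and helper function are assumptions):

```python
from enum import IntEnum

class AccessLevel(IntEnum):
    # Ordering is assumed: higher values grant more capability.
    READ = 1
    WRITE = 2
    ADMIN = 3

def can_fork(level: AccessLevel) -> bool:
    # Only ADMIN processes may call fork; children are demoted to
    # WRITE at fork time, so they cannot fork again.
    return level >= AccessLevel.ADMIN
```

With this rule, a child created by fork (WRITE) fails the check, which is what prevents unbounded fan-out of processes.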
Future enhancements may include:
- More sophisticated process management and job control features
- Implementation of the Unix fork-exec pattern by combining fork with system prompt or tool modifications
- Performance optimizations for large state handling