A multi-agent collaborative screenplay storyboarding system that splits scripts of any format into units sized to the clip durations that AI text-to-video models can generate. It outputs high-quality storyboard fragment descriptions while preserving narrative continuity. Built on LangChain and LangGraph, the system uses LLMs to parse scripts into text-to-video prompt fragments compatible with mainstream AI video models. It supports task pool priority queuing, multi-level memory management, and Chroma vector retrieval.
One-Click Conversion: Any screenplay format → Shot-level descriptions → Sora/Veo/Runway/Kling-ready prompts
Continuity Guaranteed: Multi-level memory + vector retrieval ensures character/scene/plot consistency across shots
Get Started in 5 Minutes: `pip install penshot` + 3 lines of code
From Story to Shot - Transform your scripts into AI-powered storyboards.
Named "penshot" on PyPI - because every story starts with a pen.
| Pain Point | PenShot Solution |
|---|---|
| Scripts too long for AI video models | Smart chunking + precise duration planning for model-friendly fragments |
| Character outfit changes / scene jumps break continuity | Multi-level memory + Chroma vector retrieval auto-maintains context |
| Manual prompt engineering is time-consuming | Auto-generates bilingual visual descriptions + negative prompts + audio cues |
| Complex multi-model adaptation | One codebase, supports OpenAI/Qwen/DeepSeek/Ollama & more |
| Feature | Description |
|---|---|
| Intelligent Script Parsing | Automatically identifies scenes, dialogue, and action cues; understands narrative structure; supports long-text chunking. |
| Precise Temporal Planning | Intelligently segments content at the shot level, allocating optimal durations that strictly comply with AI video model constraints. |
| Continuity Guard | Leverages task pool priority queuing, multi-level memory (short/mid/long-term), and Chroma vector retrieval to ensure high consistency in character states, scenes, and plot across adjacent shots. |
| High-Quality Prompt Output | Generates detailed bilingual (Chinese/English) visual descriptions, negative prompts, and audio prompts, ready for immediate use. |
| Multi-Model Compatibility | Supports OpenAI, Qwen, DeepSeek, Ollama, and other major LLM providers with plug-and-play switching. |
| Multi-Protocol Integration | Provides Python SDK, REST API, LangGraph nodes, A2A collaboration protocol, and standard MCP interfaces. |
| Robustness & Traceability | Built-in auto-retry and error fallback mechanisms. Every storyboard fragment is bidirectionally traceable to its original script location. |
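As a rough illustration of the "Precise Temporal Planning" row, the sketch below packs per-shot duration estimates into fragments that stay within a model's maximum clip length. The `Shot` type, the `plan_fragments` function, and the 5-second limit are assumptions made for this example; they are not PenShot's actual planner.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: str
    duration: float  # estimated length in seconds


def plan_fragments(shots, max_duration=5.0):
    """Greedily group consecutive shots so each fragment fits max_duration."""
    fragments, current, total = [], [], 0.0
    for shot in shots:
        # Split an over-long shot into the fewest equal parts that fit.
        parts = max(1, math.ceil(shot.duration / max_duration))
        piece = shot.duration / parts
        for _ in range(parts):
            # Close the current fragment when the next piece would overflow it.
            if current and total + piece > max_duration:
                fragments.append(current)
                current, total = [], 0.0
            current.append((shot.shot_id, round(piece, 2)))
            total += piece
    if current:
        fragments.append(current)
    return fragments


shots = [Shot("s1", 2.0), Shot("s2", 2.5), Shot("s3", 12.0), Shot("s4", 1.0)]
fragments = plan_fragments(shots, max_duration=5.0)
```

Greedy packing keeps shots in script order, which matters for continuity; a real planner would likely also weigh dialogue boundaries and camera moves before cutting a long shot.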
```mermaid
flowchart TD
    subgraph Input [Input Layer]
        A1[Client / Upstream Agent] --> A2[REST API / MCP / A2A]
        A2 --> A3[Task Manager]
    end
    subgraph Core [LangGraph Multi-Agent Core Workflow]
        direction TB
        P1[Script Parser Agent] --> P2[Storyboard Generator Agent]
        P2 --> P3[Video Splitter Agent]
        P3 --> P4[Prompt Converter Agent]
        P4 --> P5[Quality Auditor Agent]
        P5 --> P6[Continuity Guardian Agent]
        P6 --> P7[Auxiliary Generator Agent<br/>Three-view/Background/Keyframe]
        subgraph Control [Control Nodes]
            C1[Loop Check] --> C2[Error Handling]
            C2 --> C3[Human Intervention]
            C3 --> C4[Result Generation]
        end
        P1 -.->|Retry/Fix| Control
        P2 -.->|Retry/Fix| Control
        P3 -.->|Retry/Fix| Control
        P4 -.->|Retry/Fix| Control
        P5 -.->|Retry/Fix| Control
        P6 -.->|Retry/Fix| Control
        Control -.->|Routing Decision| P1
    end
    subgraph Memory [Memory Layer]
        M1[(Short-term Memory)]
        M2[(Medium-term Memory)]
        M3[(Long-term Memory)]
        M4[(Vector Database<br/>Chroma)]
        M1 <--> Core
        M2 <--> Core
        M3 <--> Core
        M4 <--> Core
    end
    subgraph Output [Output Layer]
        O1[Workflow Output Fixer<br/>Segment Sequence Repair] --> O2[Result Formatting]
        O2 --> O3[JSON / SDK / MCP / A2A]
    end
    subgraph Downstream [Downstream Rendering]
        D1[Multi-model Adapter] --> D2[Sora/Veo/Runway/Kling/SVD]
        D2 --> D3[FFmpeg Synthesis]
        D3 --> D4[Final Video]
    end
    A3 --> P1
    P7 --> O1
    O3 --> D1
```
This system is a typical Natural Language Processing (NLP) application that achieves end-to-end storyboard transcoding through multi-agent collaboration and memory mechanisms. For detailed architectural design, memory pool implementation, and continuity assurance, please refer to: Architecture Design & Implementation
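To make the memory layer more concrete, here is a minimal sketch of three tiers working together: a bounded short-term window, a medium-term state map, and a searchable long-term store. The `TieredMemory` class is hypothetical, and the bag-of-words cosine similarity is only a dependency-free stand-in for the embedding search that Chroma would provide; the real memory pool is described in the architecture document.

```python
import math
from collections import Counter, deque


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TieredMemory:
    def __init__(self, short_size=3):
        self.short = deque(maxlen=short_size)  # most recent fragments, verbatim
        self.mid = {}                          # latest known character/scene state
        self.long = []                         # every fragment, kept for retrieval

    def remember(self, fragment_id, text, state=None):
        self.short.append((fragment_id, text))
        self.mid.update(state or {})
        self.long.append((fragment_id, text, Counter(text.lower().split())))

    def recall(self, query, k=2):
        """Return the ids of the k stored fragments most similar to the query."""
        q = Counter(query.lower().split())
        ranked = sorted(self.long, key=lambda item: _cosine(q, item[2]), reverse=True)
        return [fragment_id for fragment_id, _, _ in ranked[:k]]


memory = TieredMemory(short_size=2)
memory.remember("frag_001", "girl in red coat reads in cafe", {"girl.outfit": "red coat"})
memory.remember("frag_002", "rain falls on the street outside", {"weather": "rain"})
memory.remember("frag_003", "girl closes the book and leaves the cafe", {"girl.location": "street"})
related = memory.recall("girl cafe", k=2)
```

The short-term window forgets `frag_001`, but the medium-term map still knows the girl wears a red coat, and the long-term store can retrieve the earlier cafe shot when a later fragment returns to that scene.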
```bash
# Install via PyPI
pip install penshot
```

Note: `penshot` is the PyPI package name, while `story-shot-agent` is the GitHub repository name. Both refer to the same project.

```bash
cp .env.example .env
```

Edit the `.env` file to configure the required LLM and Embedding parameters:
```ini
########################## LLM Configuration #########################
PENSHOT_LLM__DEFAULT__BASE_URL=https://api.openai.com/v1
PENSHOT_LLM__DEFAULT__API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PENSHOT_LLM__DEFAULT__MODEL_NAME=gpt-4o
PENSHOT_LLM__DEFAULT__TIMEOUT=30
########################## Embedding Model Configuration #########################
PENSHOT_EMBED__DEFAULT__BASE_URL=https://api.openai.com/v1
PENSHOT_EMBED__DEFAULT__API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PENSHOT_EMBED__DEFAULT__MODEL_NAME=text-embedding-v4
########################## Redis Configuration ##########################
PENSHOT_REDIS_URL=redis://:123456@localhost:6379/0
```

```python
import asyncio
from penshot.api import create_penshot_agent

async def main():
    agent = create_penshot_agent(max_concurrent=5)
    script = "Morning, a girl reading in a cafe, sunlight streaming through the window..."
    # Submit the script; the callback fires when the task completes.
    task_id = agent.breakdown_script_async(
        script,
        callback=lambda r: print(f"Task {r.task_id} completed")
    )
    # Check progress at any time, then wait for the final result.
    status = agent.get_task_status(task_id)
    result = await agent.wait_for_result_async(task_id)

# `await` is only valid inside a coroutine, so run the example through asyncio.
asyncio.run(main())
```

Full example: direct_usage.py
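The double underscores in the `.env` variable names suggest a nested layout (section, profile, field). The sketch below shows one way such names could be folded into a nested dictionary. The `parse_nested_env` helper and the resulting shape are assumptions for illustration only and say nothing about how PenShot actually loads its settings.

```python
def parse_nested_env(env, prefix="PENSHOT_", delimiter="__"):
    """Fold PREFIX_A__B__C=value entries into {"a": {"b": {"c": value}}}."""
    config = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        keys = name[len(prefix):].lower().split(delimiter)
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


sample = {
    "PENSHOT_LLM__DEFAULT__MODEL_NAME": "gpt-4o",
    "PENSHOT_LLM__DEFAULT__TIMEOUT": "30",
    "PENSHOT_REDIS_URL": "redis://localhost:6379/0",
    "PATH": "/usr/bin",
}
config = parse_nested_env(sample)
```

Under this reading, the `DEFAULT` segment names a profile, which would allow additional profiles to sit alongside it without changing the variable scheme. Values stay as strings here; type conversion is left out of the sketch.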
Integrate into existing systems via standard HTTP endpoints:
```python
from fastapi import FastAPI, HTTPException
from penshot.api import create_penshot_agent

app = FastAPI(title="Penshot API", version="0.1.0")
agent = create_penshot_agent(max_concurrent=5)

@app.post("/api/generate")
async def generate(script_text: str):
    task_id = agent.breakdown_script_async(script_text)
    return {"task_id": task_id, "status": "PENDING"}
```

Full example: web_app.py
Can be embedded as an independent node in LangChain/LangGraph workflows for end-to-end automation. Full example: langgraph_integration.py
Supports context passing and task orchestration with upstream scriptwriting agents and downstream text-to-video/editing agents. Full example: a2a_integration.py
Start the MCP Server:
```bash
python -m penshot.mcp_server --max-concurrent 5 --queue-size 500
```

Clients can call the `breakdown_script` and `get_task_result` tools to integrate with MCP-compatible IDEs or agent frameworks. Full example: mcp_client.py
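The `--max-concurrent` and `--queue-size` flags hint at a bounded task pool. As a sketch of that idea, and of the priority queuing mentioned in the feature list, the hypothetical `TaskPool` below caps both the number of queued tasks and the number of workers. It is built on the standard-library `asyncio.PriorityQueue`; how the server actually interprets these flags is not documented here, so treat the mapping as an assumption.

```python
import asyncio
import itertools


class TaskPool:
    def __init__(self, max_concurrent=5, queue_size=500):
        self._queue = asyncio.PriorityQueue(maxsize=queue_size)
        self._max_concurrent = max_concurrent
        # The counter breaks ties so equal priorities run in submission order.
        self._counter = itertools.count()

    def submit(self, priority, name, job):
        # Lower numbers run first; put_nowait raises asyncio.QueueFull when full.
        self._queue.put_nowait((priority, next(self._counter), name, job))

    async def _worker(self, started):
        while True:
            try:
                _, _, name, job = self._queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            started.append(name)
            await job()

    async def run(self):
        """Drain the queue with at most max_concurrent workers."""
        started = []
        workers = (self._worker(started) for _ in range(self._max_concurrent))
        await asyncio.gather(*workers)
        return started


async def demo():
    async def job():
        await asyncio.sleep(0)

    pool = TaskPool(max_concurrent=1, queue_size=3)
    pool.submit(5, "low", job)
    pool.submit(1, "urgent", job)
    pool.submit(5, "low-2", job)
    try:
        pool.submit(0, "overflow", job)
        rejected = False
    except asyncio.QueueFull:
        rejected = True
    return await pool.run(), rejected


order, rejected = asyncio.run(demo())
```

Rejecting work when the queue is full, rather than blocking the caller, lets an API layer return an immediate "busy" response instead of holding connections open.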
The system returns standardized JSON containing video prompts, negative prompts, duration estimates, style parameters, and accompanying audio prompts:
```json
{
  "fragments": [
    {
      "fragment_id": "frag_001",
      "prompt": "Cinematic wide shot: midnight 11 PM in a compact urban apartment living room...",
      "negative_prompt": "cartoon, anime, 3D render, bright lighting, text, watermark...",
      "duration": 4.2,
      "model": "runway_gen2",
      "style": "cinematic 35mm film, moody realism, shallow depth of field...",
      "audio_prompt": {
        "audio_id": "audio_001",
        "prompt": "Low-frequency rain ambience (intensity 0.95), distant muffled TV static...",
        "model_type": "AudioLDM_3",
        "audio_style": "cinematic"
      }
    }
  ]
}
```

| Category | Description |
|---|---|
| Network Dependency | Requires stable access to external LLM APIs. A proxy or a regional mirror is recommended where direct access is unreliable. |
| Long Text Processing | For extremely long scripts, segmented input is advised. The system includes built-in context memory and RAG mechanisms. |
| Generation Duration | AI video models may output clips with ±10% duration variance, which is industry-standard. |
| Multilingual Support | Currently optimized for Chinese scripts. Support for other languages is under active iteration. |
| Audio Synchronization | Audio prompts are provided. Lip-sync and environmental sound fusion require downstream tooling. |
| Error Handling | Auto-retry and fallback mechanisms are built-in. Extreme edge cases may require manual intervention. |
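Downstream code can guard against two of the points above: malformed output and duration drift. The sketch below loads the JSON result, checks that each fragment carries the fields a renderer needs, and tests whether a rendered clip falls within the ±10% duration variance. The choice of required fields and the helper names `load_fragments` and `within_tolerance` are assumptions for illustration.

```python
import json

# Assumed minimum set of fields a downstream renderer needs per fragment.
REQUIRED_FIELDS = ("fragment_id", "prompt", "negative_prompt", "duration")


def load_fragments(payload):
    """Parse the result JSON and fail early if a fragment is incomplete."""
    fragments = json.loads(payload)["fragments"]
    for fragment in fragments:
        missing = [field for field in REQUIRED_FIELDS if field not in fragment]
        if missing:
            raise ValueError(f"{fragment.get('fragment_id', '?')} is missing {missing}")
    return fragments


def within_tolerance(planned, actual, tolerance=0.10):
    """True if the rendered clip length is within +/- tolerance of the plan."""
    return abs(actual - planned) <= planned * tolerance


payload = json.dumps({
    "fragments": [{
        "fragment_id": "frag_001",
        "prompt": "Cinematic wide shot...",
        "negative_prompt": "cartoon, anime...",
        "duration": 4.2,
    }]
})
fragments = load_fragments(payload)
planned = fragments[0]["duration"]
```

A clip that fails the tolerance check can be re-rendered or trimmed during the FFmpeg synthesis step rather than silently shifting the timing of every fragment after it.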
- Optimize long-shot segmentation logic for action continuity
- Implement consistency validators for character clothing, positioning, and props
- Specialized prompt format adaptation for Sora, Pika, and other models
- Hybrid architecture combining rule-based engines and LLMs
- Full English script support and intelligent node failure fallback
- Fragment confidence scoring and debug mode (intermediate result persistence)
- Advanced camera language support (pan, tilt, zoom, tracking, follow)
- Emotion-driven automatic visual style adjustment
- Ultra-long script chunking + vector DB context memory
- Multi-script batch queue processing & Web visualization interface
- Character/scene reference image integration & multi-format export (XML/EDL/JSON)
- Multimodal input (image + audio + text hybrid)
- Real-time low-resolution preview & automatic continuity repair
- Professional editing software plugins (Premiere/FCP/DaVinci)
- Multi-user collaboration, version control, & autonomous learning from feedback
- Bidirectional script-fragment traceability, semantic alignment detection, & multi-round correction mechanisms
Achieve zero-information-loss visualization for scripts of any length, language, or genre, delivering a standardized workflow that meets professional director-level storyboarding standards. The system will feature customizable styles, full traceability, automatic optimization loops, and cross-modal high consistency.
We welcome contributions via Issues or Pull Requests:
- Bug Reports: Please provide reproduction steps, environment details, and error logs.
- Feature Requests: Use the `enhancement` label.
- Code Optimization: Performance tuning, architectural refactoring, or adding test cases.
- Documentation: Translations, example additions, or technical corrections.
Quick dev environment setup:
```bash
git clone https://github.com/neopen/story-shot-agent.git
cd story-shot-agent
pip install -e ".[dev]"
pytest tests/
```

This project is licensed under the MIT License. See the LICENSE file for details. Copyright (c) 2025 HiPeng
- Project Homepage: https://github.com/neopen/story-shot-agent
- Documentation: https://pengline.cn/2026/02/7e6cd67dd5ee45248f2276ac145555f5/
Special thanks to LangChain, LangGraph, Chroma, Ollama, and the open-source community for their technical support. If this project has been helpful to your work, please consider starring the repository and sharing your feedback.