AI Media Studio CLI: Multi-Modal AI Media Gen with Google

A professional, multi-modal AI media generation CLI. Generate videos, images, and music with Google AI models. It features an interactive UI, batch processing, and an extensible architecture for all AI media types. This tool is built for developers who want fast, reliable AI media creation from scripts, automation pipelines, or hands-on sessions.

Preview banner: a showcase of the CLI’s capabilities using popular AI media tasks.

Table of contents

  • Overview
  • Why this project exists
  • Core concepts
  • Features
  • Getting started
  • Installation
  • Quick start guides
  • How to use the CLI
  • Interactive UI and batch processing
  • Model support and extensibility
  • Architecture and design
  • Configuration and workflows
  • Project structure
  • Development and testing
  • Accessibility and internationalization
  • Security and privacy
  • Performance and optimization
  • Deployment and distribution
  • API and plugin system
  • Diagnostics and troubleshooting
  • Community, contributions, and governance
  • Licensing

Overview

The AI Media Studio CLI is designed to bring sophisticated AI media generation into your terminal and scripts. It leverages Google AI models to craft visuals, audio, and motion content with precise prompts, controlling quality, duration, style, and output format. The interactive UI makes it easy to prototype ideas, while batch processing powers large-scale production pipelines.

Why this project exists

  • Speed: Automate repetitive tasks with reliable, repeatable results.
  • Scale: Generate dozens or hundreds of assets in a single run.
  • Flexibility: Support for video, image, and audio within a single tool.
  • Extensibility: A pluggable architecture that invites new AI media types and providers.
  • Reproducibility: Configurable seeds, prompts, and model versions for consistent results.

Core concepts

  • Media types: video, image, audio
  • Models: Google Imagen, Lyra-inspired audio models, and other compatible providers
  • Execution units: prompts, parameters, and model adapters
  • Workflows: single-task generation, batch pipelines, and interactive prompts
  • Output formats: common multimedia formats suitable for web, media pipelines, and content delivery

Features

  • Multi-modal generation: Create video, image, and music from prompts
  • Google model integration: Imagen and related Google AI capabilities as primary models
  • Interactive user interface: A guided, hands-on UI for quick experiments
  • Batch processing: Process multiple prompts and assets in one run
  • Extensible architecture: Plug in new models, adapters, and media types
  • Cross-platform CLI: Works on Linux, macOS, and Windows
  • Workflow automation: Integrate into CI/CD, bots, and production pipelines
  • Output customization: Resolution, frame rate, duration, audio sampling, and more
  • Asset management: Organize, store, and reference generated media
  • Logging and diagnostics: Detailed run logs with severity levels
  • Versioning: Pin model versions for reproducibility
  • Open-source licensing: Accessible for modification and contribution

Getting started

This project is built for developers who want a fast route from idea to media asset. The CLI focuses on clarity, reliability, and extensibility. It aims to reduce friction when performing complex multi-modal tasks and to provide a strong foundation for automation and experimentation.

To get the latest release and try the binary, visit the Releases page, which hosts binaries and installers for Linux, macOS, and Windows. Download the asset for your platform and run it directly; see the download section below for details.

Downloads and releases

Note: If you cannot access the link from your network or if the page is down, check the repository's Releases section for archived assets or alternative download links. The Releases page is the primary source of official builds and changelogs.

Prerequisites

  • Python 3.11 or newer (or a suitable runtime for your platform)
  • Pip or a Python package manager
  • FFmpeg installed and accessible from the system path
  • A network connection to reach Google AI model endpoints (for model prompts and fetches)
  • Optional: Docker or a container runtime for isolation and reproducible environments

Installation

There are multiple ways to install and run the CLI, depending on your preferences and environment.

From PyPI (recommended for quick start)

  • Install with pip:
    • pip install ai-media-studio-cli
    • ai-media-studio-cli --help
  • Upgrade when needed:
    • pip install --upgrade ai-media-studio-cli

Using pipx (isolated environments)

  • Install pipx if you don’t have it:
    • python -m pip install --user pipx
    • python -m pipx ensurepath
  • Install the CLI with pipx:
    • pipx install ai-media-studio-cli
    • ai-media-studio-cli --help

From source (for developers)

  • Clone the repository and install in editable mode:
    • git clone https://github.com/khanhhuy1304/ai-media-studio-cli.git
    • cd ai-media-studio-cli
    • pip install -e .

Docker (for containerized environments)

  • Build a local image:
    • docker build -t ai-media-studio-cli .
  • Run a container:
    • docker run --rm -it ai-media-studio-cli --help

Quick start guides

  • Generating a simple image
    • ai-media-studio-cli generate --image --prompt "A serene sunset over a quiet lake, stylized in watercolor" --width 1024 --height 768
    • The CLI will connect to Google Imagen or compatible models, render the image, and save it to your output directory.
  • Generating a short video
    • ai-media-studio-cli generate --video --prompt "A fast-paced sci-fi city at night" --duration 15 --fps 30 --resolution 1920x1080
    • You can adjust the duration, frame rate, and resolution to fit your project needs.
  • Creating a music track
    • ai-media-studio-cli generate --music --prompt "Ambient electronic pad with evolving textures" --duration 60 --tempo 120
    • The system will synthesize audio that follows the prompt and your tempo setting.
  • Batch rendering
    • Use the batch command with a YAML or JSON manifest describing each task (see Batch mode below)
  • Interactive UI usage
    • ai-media-studio-cli ui
    • The built-in UI helps you craft prompts, choose models, and monitor progress as assets are generated.

How to use the CLI

  • Basic structure
    • ai-media-studio-cli [options]
    • Commands include: generate, batch, ui, init, config, status, clean, export
  • Generation options
    • --type: image, video, or music
    • --prompt: textual prompt for the model
    • --models: a comma-separated list of models to use, e.g. imagen, veo3
    • --width, --height: image dimensions
    • --duration: media length for video or audio
    • --fps: frames per second for video
    • --output: output file or directory
    • --seed: seed for deterministic results
    • --style: style hint for image generation (e.g., photorealistic, painterly)
  • Batch mode
    • batch mode reads from a YAML or JSON manifest
    • You can define multiple tasks with their own prompts, types, and outputs
    • Example manifest structure:
      prompts:
        - type: image
          prompt: "A sunny meadow with wildflowers"
          width: 1024
          height: 768
          models: [imagen]
        - type: video
          prompt: "A bustling market scene"
          duration: 20
          fps: 24
          models: [imagen, veo3]
  • Interactive UI
    • ui command opens an interactive session
    • You can edit prompts, adjust parameters, toggle models, and preview assets
  • Model selection and adapters
    • The CLI supports multiple adapters for different models
    • You can specify which adapter to use per task
    • Example: --models imagen,lyria
  • Output handling
    • The CLI saves assets to a structured output path
    • Each run includes a metadata file with prompts, model versions, seeds, and timing
    • You can export assets to various formats if supported by the model backend
  • Logging and debugging
    • Use --log-level or -l to set verbosity (debug, info, warning, error)
    • Logs include timestamps, task IDs, and model responses
    • Logs help diagnose failures and performance issues
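
The manifest-driven batch flow can be sketched in plain Python. This is an illustrative sketch, not the CLI's actual loader: `load_manifest` is a hypothetical helper, the field names (`type`, `prompt`, `models`) follow the example manifest structure above, and JSON is used instead of YAML to stay dependency-free.

```python
import json

REQUIRED_FIELDS = {"type", "prompt"}
VALID_TYPES = {"image", "video", "music"}

def load_manifest(text: str) -> list[dict]:
    """Parse a JSON manifest and validate each task entry (sketch)."""
    data = json.loads(text)
    tasks = data.get("prompts", [])
    for i, task in enumerate(tasks):
        missing = REQUIRED_FIELDS - task.keys()
        if missing:
            raise ValueError(f"task {i}: missing fields {sorted(missing)}")
        if task["type"] not in VALID_TYPES:
            raise ValueError(f"task {i}: unknown type {task['type']!r}")
    return tasks

manifest = """
{
  "prompts": [
    {"type": "image", "prompt": "A sunny meadow with wildflowers",
     "width": 1024, "height": 768, "models": ["imagen"]},
    {"type": "video", "prompt": "A bustling market scene",
     "duration": 20, "fps": 24, "models": ["imagen", "veo3"]}
  ]
}
"""
tasks = load_manifest(manifest)
print(len(tasks))          # 2
print(tasks[1]["models"])  # ['imagen', 'veo3']
```

Validating every entry up front, before any model call, is what makes a dry run cheap: a malformed task fails immediately instead of mid-batch.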

Interactive UI and batch processing

  • The interactive UI provides a guided workflow
    • Step 1: choose media type
    • Step 2: enter prompts and constraints
    • Step 3: select models and adapters
    • Step 4: set output preferences
    • Step 5: run and monitor progress
  • Batch processing capabilities
    • Process multiple prompts in a single run
    • Support parallel execution with controlled concurrency
    • Respect rate limits and quotas for model providers
    • Produce a consolidated report with success/failure statuses
  • Best practices for batch runs
    • Group prompts by media type to optimize model usage
    • Use seeds for reproducibility when needed
    • Validate prompts with a dry run before committing to a full batch
    • Archive outputs with a consistent naming convention
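
The controlled-concurrency behavior described above can be sketched with a thread pool plus a semaphore standing in for a provider rate limit. `fake_generate` is a stand-in for a real model call, and `MAX_CONCURRENT_CALLS` is an illustrative limit, not a real quota.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore

MAX_CONCURRENT_CALLS = 2                # illustrative provider-side limit
rate_limit = Semaphore(MAX_CONCURRENT_CALLS)

def fake_generate(prompt: str) -> dict:
    """Stand-in for a model call; returns a per-task status record."""
    with rate_limit:                    # never exceed the provider limit
        try:
            asset = f"asset for: {prompt}"
            return {"prompt": prompt, "status": "ok", "asset": asset}
        except Exception as exc:
            return {"prompt": prompt, "status": "failed", "error": str(exc)}

prompts = ["sunset", "city at night", "ambient pad", "meadow"]
with ThreadPoolExecutor(max_workers=4) as pool:
    report = list(pool.map(fake_generate, prompts))

print(sum(r["status"] == "ok" for r in report))  # 4
```

The pool sets how many tasks run at once locally; the semaphore caps how many touch the provider, which is what produces the consolidated success/failure report without tripping quotas.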

Model support and extensibility

  • Primary models
    • Imagen family for image generation
    • Veo2 and Veo3 for video-related tasks
    • Lyra-inspired models for music and audio synthesis
  • Model adapters
    • Each model has a dedicated adapter that translates CLI prompts into model calls
    • Adapters handle authentication, rate limiting, and result normalization
  • Adding new models
    • Create a new adapter module within the adapters folder
    • Implement a common interface: generate_media(prompt, options) and parse_result(raw_output)
    • Add unit tests to verify prompt translation, output handling, and error cases
    • Update the configuration to expose the new model to users
  • Extending media types
    • The architecture is designed to accept new media types (e.g., 3D assets, animations)
    • Follow the existing media type contract to keep behavior consistent
    • Add UI support and batch definitions for the new type
  • Dependency management
    • The system uses a clean separation between the CLI core and model providers
    • External libraries are loaded lazily to speed up startup and reduce memory usage
    • Pin model versions to ensure deterministic results when needed
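
The adapter contract named above (generate_media(prompt, options) and parse_result(raw_output)) can be sketched as an abstract base class. `EchoAdapter` is a toy implementation used only to exercise the interface; the real adapters would also handle authentication and rate limiting.

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """Common adapter contract (method names from the docs above)."""

    @abstractmethod
    def generate_media(self, prompt: str, options: dict) -> bytes:
        """Translate a CLI prompt into a model call; return raw output."""

    @abstractmethod
    def parse_result(self, raw_output: bytes) -> dict:
        """Normalize raw model output into the CLI's result shape."""

class EchoAdapter(ModelAdapter):
    """Toy adapter: echoes the prompt back as 'raw' bytes."""

    def generate_media(self, prompt: str, options: dict) -> bytes:
        width = options.get("width", 512)
        return f"{width}px:{prompt}".encode()

    def parse_result(self, raw_output: bytes) -> dict:
        size, _, prompt = raw_output.decode().partition(":")
        return {"size": size, "prompt": prompt}

adapter = EchoAdapter()
raw = adapter.generate_media("a quiet lake", {"width": 1024})
result = adapter.parse_result(raw)
print(result["size"])   # 1024px
```

Because the core only ever talks to `ModelAdapter`, a new provider is added by writing one subclass, with no change to the CLI core.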

Architecture and design

  • Core components
    • CLI core: handles argument parsing, command routing, and execution flow
    • UI layer: provides an interactive, user-friendly entry point
    • Model adapters: separate modules for each provider or model family
    • Media pipeline: orchestrates prompts, rendering, and post-processing
    • Output manager: handles saving, metadata, and exports
  • Data flow
    • Prompt -> Model adapter -> Raw media -> Post-processing -> Output
    • Metadata tracks prompts, seeds, model versions, and timings
    • Logs capture events at each stage for traceability
  • Extensibility pattern
    • Plug-in friendly: new adapters can be added without changing the core
    • Config-driven: users can enable/disable adapters via configuration files
    • Layered: separation of concerns reduces risk when swapping models
  • Performance considerations
    • Parallel task execution with safe concurrency limits
    • Caching of repeated prompts to speed up identical runs
    • Efficient streaming for long videos or large audio assets
    • Resource-aware scheduling to avoid overloading local machines
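
The data flow above (Prompt -> Model adapter -> Raw media -> Post-processing -> Output, with metadata at each stage) can be sketched as a single function. The adapter and post-processor here are stand-ins, and the metadata keys mirror the fields listed above; "fake-model-1.0" is a placeholder version string.

```python
import time

def run_pipeline(prompt: str, adapter, post_process, seed: int = 0) -> dict:
    """Prompt -> adapter -> raw media -> post-processing -> output,
    recording metadata for traceability (sketch only)."""
    start = time.perf_counter()
    raw = adapter(prompt, seed)          # model adapter stage
    media = post_process(raw)            # post-processing stage
    return {
        "output": media,
        "metadata": {                    # prompts, seeds, versions, timings
            "prompt": prompt,
            "seed": seed,
            "model_version": "fake-model-1.0",
            "elapsed_s": time.perf_counter() - start,
        },
    }

# Stand-ins for a real adapter and post-processor:
fake_adapter = lambda prompt, seed: f"raw({prompt},{seed})"
fake_post = lambda raw: raw.upper()

run = run_pipeline("a quiet lake", fake_adapter, fake_post, seed=42)
print(run["output"])   # RAW(A QUIET LAKE,42)
```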

Configuration and workflows

  • Global configuration
    • Stores defaults for output directories, model preferences, and seed control
    • Supports per-project overrides to keep workflows portable
  • Per-task configuration
    • Define media type, prompt, models, and output format
    • Include constraints such as max duration or max file size
  • YAML/JSON workflow definitions
    • Workflows describe a sequence of tasks with dependencies
    • You can specify pre- and post-processing steps
  • Example configuration snippet
  • Environment management
    • Use virtual environments to isolate dependencies
    • Separate environments for AI model dependencies and general Python tools
  • Reproducibility
    • Pin model versions
    • Use seeds where possible
    • Store prompt inputs and outputs with timestamps
  • Security and privacy
    • Manage credentials securely
    • Do not embed private keys in prompts
    • Respect rate limits and terms of service for model providers
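
A hypothetical global configuration might look like the fragment below. All key names are illustrative assumptions, not the CLI's actual schema; they simply exercise the ideas above (default output directory, model preferences, seed control, and per-task constraints).

```yaml
# Illustrative global configuration; key names are assumptions,
# not the CLI's actual schema.
output_dir: ./renders
default_seed: 42
models:
  image: imagen
  video: veo3
  music: lyria
limits:
  max_duration: 60        # seconds, per video/audio task
  max_file_size_mb: 200
```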

Project structure

  • cli/ Core CLI code and entry points
  • adapters/ Model adapters for each provider
  • models/ Abstract representations of AI media models
  • pipelines/ Media generation and post-processing pipelines
  • ui/ Interactive user interface modules
  • assets/ Sample prompts, example assets, and metadata
  • tests/ Unit and integration tests
  • docs/ Additional documentation and reference guides
  • config/ Default configuration files and templates
  • examples/ End-to-end example scripts and notebooks

Development and testing

  • Local development
    • Use virtual environments
    • Run unit tests with pytest
    • Lint with flake8 and format with black for code quality
  • Testing strategy
    • Unit tests cover adapters and core logic
    • Integration tests verify end-to-end generation for each media type
    • Mocked model calls to keep tests fast and deterministic
  • CI/CD
    • GitHub Actions run on push and pull requests
    • Lint, test, and build steps ensure code quality
    • Release automation updates the version and publishes builds
  • Debugging tips
    • Increase log verbosity for failures
    • Use dry-run options to validate prompts without producing media
    • Inspect metadata to trace prompt-to-output mappings
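
The mocked-model testing strategy can be sketched with `unittest.mock`. `render_task` is a hypothetical slice of core logic, and `generate_media` follows the adapter interface named elsewhere in this README; the point is that no real model call happens, so the test is fast and deterministic.

```python
from unittest.mock import MagicMock

def render_task(adapter, prompt: str) -> dict:
    """Tiny piece of 'core logic' under test: calls the adapter once."""
    raw = adapter.generate_media(prompt, {})
    return {"prompt": prompt, "raw": raw}

# Mock replaces the real adapter: no network, fixed return value.
adapter = MagicMock()
adapter.generate_media.return_value = b"fake-bytes"

result = render_task(adapter, "a quiet lake")

# Verify the core passed the prompt through exactly once.
adapter.generate_media.assert_called_once_with("a quiet lake", {})
print(result["raw"])   # b'fake-bytes'
```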

Accessibility and internationalization

  • Keyboard navigable UI
  • Screen reader compatible narratives for prompts and statuses
  • Localization support for prompts, messages, and UI labels
  • Documentation translated or scaffolded to support multiple languages

Security and privacy

  • Avoid leaking credentials in logs or output
  • Secrets are loaded from secured environments (not embedded in source)
  • Validate and sanitize user inputs to prevent code injection in prompts
  • Use secure storage for tokens and API keys

Performance and optimization

  • Parallelization strategies balanced to avoid resource contention
  • Caching of model responses for identical prompts
  • Streaming post-processing to handle large media efficiently
  • Configurable timeouts to recover gracefully from slow model prompts
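
Caching identical prompts, as described above, amounts to memoizing on everything that determines the output. A minimal sketch, keyed on (prompt, model, seed), with a stand-in for the real model call:

```python
calls = {"n": 0}
_cache: dict[tuple, str] = {}

def generate_cached(prompt: str, model: str, seed: int) -> str:
    """Return a cached result for identical (prompt, model, seed) runs."""
    key = (prompt, model, seed)
    if key not in _cache:
        calls["n"] += 1                          # only misses hit the model
        _cache[key] = f"{model}:{seed}:{prompt}"  # stand-in for a model call
    return _cache[key]

a = generate_cached("sunset", "imagen", 7)
b = generate_cached("sunset", "imagen", 7)   # cache hit: no second call
print(a == b, calls["n"])   # True 1
```

The key must include the seed and model version, or two runs that should differ would wrongly share a cached result.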

Deployment and distribution

  • Ready-to-run binaries on the Releases page
  • Docker images for containerized deployments
  • Lightweight builds for edge devices via minimal Python environments
  • Documentation for deployment in cloud pipelines and on-prem environments

API and plugin system

  • Clear public API for adapters and media pipelines
  • Plugin hooks for new models and media types
  • Versioned interfaces to maintain compatibility across releases
  • Example plugins and adapters included in the repository for guidance

Diagnostics and troubleshooting

  • Quick checks for common issues
    • Ensure FFmpeg is installed and in PATH
    • Verify Python version compatibility
    • Confirm that model adapters are reachable and authenticated
  • Common error patterns
    • Network timeouts when contacting model providers
    • Output directory permission errors
    • Model incompatibilities or deprecated endpoints
  • Logging and telemetry
    • Structured logs to help diagnose failures
    • Optional telemetry data for usage insights (configurable)

Community, contributions, and governance

  • Contributor guidelines
    • Start with issues labeled “good first issue”
    • Submit pull requests with clear summaries and tests
    • Maintain coding standards and tests
  • Community standards
    • Be respectful and constructive
    • Follow the project’s code of conduct
  • How to propose changes
    • Open an issue to discuss the feature
    • Prepare a well-scoped PR with tests and docs
  • Documentation collaboration
    • Help improve tutorials, examples, and API docs
    • Add new usage patterns and edge-case guides

Licensing

  • The project is released under an open-source license. See LICENSE for details.
  • Contributions retain license compatibility with project terms.

Footnotes and references

  • Google Imagen integration notes
  • Lyra-inspired audio synthesis notes
  • Veo2 and Veo3 references for video generation
  • FFmpeg usage and best practices
  • Model provider terms of service and usage limits

Get involved

  • If you want to contribute to the core, adapters, or UI, start by forking the repository and exploring the adapter interfaces.
  • Share your use cases and feedback in issues to help shape future features.
  • Create demonstrations and sample prompts to help users understand how to craft prompts for video, image, and music generation.

Screenshots and visuals (illustrative)

  • Banner and section previews help readers grasp the workflow at a glance.
  • Simple diagrams to illustrate the data flow from prompts to outputs.
  • Inline visuals for UI components and batch editor layout.

Documentation references

  • Model adapters and their interfaces
  • Command reference and options
  • Configuration file format and examples
  • Tutorial notebooks showing end-to-end workflows

Appendix: Practical tips

  • Planning prompts for images
    • Define scene, mood, lighting, and composition
    • Use style hints to guide the rendering process
    • Keep prompts concise but expressive to reduce ambiguity
  • Crafting prompts for videos
    • Outline the scene progression, camera movements, and pacing
    • Determine the length and frame rate beforehand
    • Include transitions or audio cues to synchronize visuals
  • Composing music with prompts
    • Specify tempo, mood, and texture
    • Consider loops and variations for longer tracks
    • Use seeds to reproduce identical sections when needed
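
Seeded reproducibility, used above to repeat identical sections, can be illustrated with Python's random module standing in for a model's sampler: the same seed always yields the same output, and a different seed yields a different one.

```python
import random

def sample_section(seed: int, length: int = 4) -> list[int]:
    """Stand-in 'sampler': output depends only on the seed."""
    rng = random.Random(seed)   # local RNG; no global state touched
    return [rng.randint(0, 127) for _ in range(length)]

first = sample_section(seed=42)
again = sample_section(seed=42)   # identical: same seed, same output
other = sample_section(seed=43)   # different seed, different output

print(first == again)   # True
print(first == other)   # False
```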

End-user guidance

  • For beginners, start with simple prompts and small outputs
  • For advanced users, combine prompts with batch definitions to explore variations
  • Use the UI to understand model responses before automating with scripts
  • Save configurations to maintain a stable baseline across runs

Releases and versioning

  • The Releases page hosts binaries, installers, and changelogs
  • Each release contains a version tag, a summary of changes, and assets for download
  • Use the binary corresponding to your operating system and architecture
  • After downloading, grant execute permissions if required and run the binary or installer

Final note

  • The Releases page hosts the official builds, installers, and changelogs and should be your first stop for distribution-ready assets. Check it periodically for the latest stable releases and follow the project for announcements.
