Model Developers
OpenAI
Creator of the GPT language model series and TTS voice synthesis
Anthropic
Creator of Claude language models, focused on safety & alignment
ByteDance
Creator of Seedream, Seedance & Dreamina visual AI models
Alibaba
Creator of the Wan and Qwen series of AI models
Google
Creator of the Gemini language model series
xAI
Creator of the Grok language model series
Black Forest Labs
Creator of the FLUX image generation and editing model series
Kuaishou
Creator of the Kling video generation model
Mistral
Europe's leading open-source language model developer
Cheng et al.
Research team behind the MMAudio video-to-audio model (UIUC / Sony Research)
Image Generation Models
Generate high-quality images from text descriptions with various styles and resolutions.
| Model | Developer | Description | Tier |
|---|---|---|---|
| Seedream 4.5 | ByteDance | Latest flagship · Native bilingual · 4K ultra-HD | Standard |
| Seedream 4 | ByteDance | High-quality image generation · Bilingual | Fast |
| Dreamina 3.1 | ByteDance | High-fidelity aesthetics · Artistic style | Premium |
| Qwen Image | Alibaba | 20B parameters · Excellent Chinese text rendering | Standard |
| Wan 2.6 Image | Alibaba | Wan series image model · High resolution | Fast |
Image Editing Models
Upload existing images for editing, enhancement, or style transformation.
| Model | Developer | Description | Tier |
|---|---|---|---|
| FLUX Kontext Pro | Black Forest Labs | Context-aware editing · Best for image & text editing | Premium |
| FLUX Kontext Pro Multi | Black Forest Labs | Multi-image context editing · Style consistency | Premium |
| UNO | ByteDance | Universal image editing · Image + text | Standard |
| Real-ESRGAN | Xintao Wang et al. | Image super-resolution · Quality enhancement | Fast |
Video Generation Models (Text-to-Video)
Auto-generate short videos from text descriptions. Some models support synchronized audio generation.
| Model | Developer | Description | Tier |
|---|---|---|---|
| Wan 2.2 — 480p Ultra Fast | Alibaba | Ultra-fast generation · ~5s per video | Fast |
| Wan 2.2 — 720p | Alibaba | High-definition resolution | Standard |
| Wan 2.6 (Audio) | Alibaba | Latest Wan series · Audio support · Best quality | Standard |
| Seedance 1.5 Pro (Audio) | ByteDance | Cinematic quality · Audio support | Premium |
| Kling Video O3 | Kuaishou | Best motion quality · Premium dynamics | Premium |
Video Generation Models (Image-to-Video)
Transform static images into dynamic videos, bringing images to life.
| Model | Developer | Description | Tier |
|---|---|---|---|
| Wan 2.2 i2v — 480p Fast | Alibaba | Image-to-video · Fast generation | Fast |
| Wan 2.2 i2v — 720p | Alibaba | Image-to-video · HD resolution | Standard |
| Seedance 1.5 Pro i2v (Audio) | ByteDance | Image-to-video · Cinematic · Audio support | Premium |
Text Generation Models
Multiple leading AI language models for social content creation, rewriting, and optimization.
| Model | Developer | Description | Tier |
|---|---|---|---|
| GPT-4o | OpenAI | Flagship · Most capable overall | Premium |
| GPT-4o Mini | OpenAI | Lightweight · Cost-effective | Fast |
| GPT-5 | OpenAI | Latest flagship model | Premium |
| Claude Sonnet 4 | Anthropic | Excellent writing quality | Premium |
| Claude 3.5 Haiku | Anthropic | Fast · Cost-efficient | Fast |
| Gemini 2.5 Flash | Google | Ultra-fast · Low cost | Fast |
| Gemini 2.5 Pro | Google | High-performance reasoning | Premium |
| Grok 3 | xAI | Real-time aware | Premium |
| Grok 3 Mini | xAI | Lightweight and fast | Fast |
| Mistral Small | Mistral | Efficient European model | Fast |
| Mistral Medium | Mistral | Balanced performance | Standard |
Voice Synthesis Models
Convert text to natural speech with multiple voice options and speed control.
| Model | Developer | Description | Tier |
|---|---|---|---|
| TTS-1 | OpenAI | High-quality text-to-speech · 6 voice options | Standard |
Background Music Generation Models
Auto-generate synchronized background music from video content and text descriptions, with no extra assets needed.
| Model | Developer | Description | Tier |
|---|---|---|---|
| MMAudio V2 | Cheng et al. | Video-to-audio · Multimodal sync · High-quality BGM generation | Standard |
Video Narration Models
AI automatically analyzes video content and generates voiced narration. This feature uses two models in tandem: Gemini 2.5 Flash analyzes the video frames, then TTS-1 converts the generated script to speech.
| Model | Developer | Description | Tier |
|---|---|---|---|
| Gemini 2.5 Flash (Analysis) | Google | Video content analysis · Auto-generate narration scripts | Fast |
| TTS-1 (Synthesis) | OpenAI | Narration voice synthesis · 6 voice options | Standard |
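The two-stage narration flow described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: `narrate`, `AnalyzeFn`, `SynthesizeFn`, and the `"default"` voice name are hypothetical, and the two stand-in stages only mimic the analysis and synthesis models so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stage signatures (assumptions, not real SDK calls).
AnalyzeFn = Callable[[Sequence[bytes]], str]   # video frames -> narration script
SynthesizeFn = Callable[[str, str], bytes]     # (script, voice) -> audio bytes


@dataclass
class Narration:
    script: str
    audio: bytes


def narrate(frames: Sequence[bytes], analyze: AnalyzeFn,
            synthesize: SynthesizeFn, voice: str = "default") -> Narration:
    """Run the two stages in order: frame analysis, then speech synthesis."""
    if not frames:
        raise ValueError("at least one video frame is required")
    script = analyze(frames).strip()
    if not script:
        raise ValueError("analysis stage returned an empty script")
    return Narration(script=script, audio=synthesize(script, voice))


# Stand-in stages so the sketch runs without any network access.
def fake_analyze(frames: Sequence[bytes]) -> str:
    return f"A clip of {len(frames)} frames."


def fake_synthesize(script: str, voice: str) -> bytes:
    return f"[{voice}] {script}".encode("utf-8")


result = narrate([b"frame-1", b"frame-2"], fake_analyze, fake_synthesize)
print(result.script)
```

Passing the stages in as callables keeps the ordering logic separate from whichever model clients are actually used; in a real integration the stand-ins would be replaced by calls to the analysis and speech models.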
Model Tier Guide
Fast
Fastest generation, lowest cost. Ideal for quick iteration and daily use.
Standard
Best balance of speed and quality. Recommended for most use cases.
Premium
Highest quality output. Best for professional work and important content.
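One way to use the tiers programmatically is to treat them as an ordered scale and filter a model list by a tier ceiling. The sketch below is only an illustration: the `Model` class, `TIER_ORDER`, and `models_up_to` are hypothetical names, the ordering Fast < Standard < Premium is an assumption drawn from the guide's speed and cost wording, and the rows are a small sample copied from the text generation table.

```python
from dataclasses import dataclass

# Assumed ordering: fastest and cheapest tier first.
TIER_ORDER = ("Fast", "Standard", "Premium")


@dataclass(frozen=True)
class Model:
    name: str
    developer: str
    tier: str


# Sample rows copied from the text generation table.
TEXT_MODELS = [
    Model("GPT-4o", "OpenAI", "Premium"),
    Model("GPT-4o Mini", "OpenAI", "Fast"),
    Model("Claude 3.5 Haiku", "Anthropic", "Fast"),
    Model("Mistral Medium", "Mistral", "Standard"),
]


def models_up_to(models, max_tier):
    """Return models at or below max_tier, lowest tier first."""
    limit = TIER_ORDER.index(max_tier)  # raises ValueError for an unknown tier
    eligible = [m for m in models if TIER_ORDER.index(m.tier) <= limit]
    # sorted() is stable, so models within a tier keep their table order.
    return sorted(eligible, key=lambda m: TIER_ORDER.index(m.tier))


drafts = models_up_to(TEXT_MODELS, "Standard")
print([m.name for m in drafts])
# prints ['GPT-4o Mini', 'Claude 3.5 Haiku', 'Mistral Medium']
```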