AI Model Documentation

Learn about the AI models available on our platform and the developers behind them.

Model Developers

🖼️

Image Generation Models

Generate high-quality images from text descriptions with various styles and resolutions.

Model | Developer | Description | Tier
Seedream 4.5 | ByteDance | Latest flagship · Native bilingual · 4K ultra-HD | Standard
Seedream 4 | ByteDance | High-quality image generation · Bilingual | Fast
Dreamina 3.1 | ByteDance | High-fidelity aesthetics · Artistic style | Premium
Qwen Image | Alibaba | 20B parameters · Excellent Chinese text rendering | Standard
Wan 2.6 Image | Alibaba | Wan series image model · High resolution | Fast
✏️

Image Editing Models

Upload existing images for editing, enhancement, or style transformation.

Model | Developer | Description | Tier
FLUX Kontext Pro | Black Forest Labs | Context-aware editing · Best for image & text editing | Premium
FLUX Kontext Pro Multi | Black Forest Labs | Multi-image context editing · Style consistency | Premium
UNO | ByteDance | Universal image editing · Image + text | Standard
Real-ESRGAN | Xintao Wang et al. | Image super-resolution · Quality enhancement | Fast
🎬

Video Generation Models (Text-to-Video)

Auto-generate short videos from text descriptions. Some models support synchronized audio generation.

Model | Developer | Description | Tier
Wan 2.2 — 480p Ultra Fast | Alibaba | Ultra-fast generation · ~5s per video | Fast
Wan 2.2 — 720p | Alibaba | High-definition resolution | Standard
Wan 2.6 (Audio) | Alibaba | Latest Wan series · Audio support · Best quality | Standard
Seedance 1.5 Pro (Audio) | ByteDance | Cinematic quality · Audio support | Premium
Kling Video O3 | Kuaishou | Best motion quality · Premium dynamics | Premium
🎞️

Video Generation Models (Image-to-Video)

Transform static images into dynamic videos, bringing images to life.

Model | Developer | Description | Tier
Wan 2.2 i2v — 480p Fast | Alibaba | Image-to-video · Fast generation | Fast
Wan 2.2 i2v — 720p | Alibaba | Image-to-video · HD resolution | Standard
Seedance 1.5 Pro i2v (Audio) | ByteDance | Image-to-video · Cinematic · Audio support | Premium
📝

Text Generation Models

Multiple leading AI language models for social content creation, rewriting, and optimization.

Model | Developer | Description | Tier
GPT-4o | OpenAI | Flagship · Most capable overall | Premium
GPT-4o Mini | OpenAI | Lightweight · Cost-effective | Fast
GPT-5 | OpenAI | Latest flagship model | Premium
Claude Sonnet 4 | Anthropic | Excellent writing quality | Premium
Claude 3.5 Haiku | Anthropic | Fast · Cost-efficient | Fast
Gemini 2.5 Flash | Google | Ultra-fast · Low cost | Fast
Gemini 2.5 Pro | Google | High performance reasoning | Premium
Grok 3 | xAI | Real-time aware | Premium
Grok 3 Mini | xAI | Lightweight and fast | Fast
Mistral Small | Mistral | Efficient European model | Fast
Mistral Medium | Mistral | Balanced performance | Standard
🎙️

Voice Synthesis Models

Convert text to natural speech with multiple voice options and speed control.

Model | Developer | Description | Tier
TTS-1 | OpenAI | High-quality text-to-speech · 6 voice options | Standard
Available voices: Alloy · Echo · Fable · Onyx · Nova · Shimmer
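
To illustrate the voice and speed options, here is a minimal sketch of a request object that validates its settings before anything is sent to the model. The `SpeechRequest` class, its field names, and the speed bounds are assumptions made for illustration and are not part of the platform's API; only the six voice names and the model name come from the list above.

```python
from dataclasses import dataclass

# The six voices listed above, lowercased for use as identifiers.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

# Assumed speed bounds (hypothetical); the platform's real limits may differ.
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical container for one text-to-speech job."""
    text: str
    voice: str = "alloy"
    speed: float = 1.0
    model: str = "TTS-1"

    def __post_init__(self):
        # Reject bad settings early instead of failing at synthesis time.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.voice not in VOICES:
            raise ValueError(
                f"unknown voice {self.voice!r}; choose one of {', '.join(VOICES)}"
            )
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}"
            )


request = SpeechRequest("Welcome to the channel!", voice="nova", speed=1.25)
```

Validating at construction time means an unsupported voice name is caught before a synthesis job is queued.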
🎵

Background Music Generation Models

Automatically generate background music synchronized to your video content and text descriptions. No extra assets are needed.

Model | Developer | Description | Tier
MMAudio V2 | Cheng et al. | Video-to-audio · Multimodal sync · High-quality BGM generation | Standard
🗣️

Video Narration Models

AI automatically analyzes video content and generates voiced narration. This feature uses two models in tandem: Gemini 2.5 Flash analyzes the video frames, then TTS-1 converts the generated script to speech.

Model | Developer | Description | Tier
Gemini 2.5 Flash (Analysis) | Google | Video content analysis · Auto-generate narration scripts | Fast
TTS-1 (Synthesis) | OpenAI | Narration voice synthesis · 6 voice options | Standard
Narration styles: Professional · Casual · Dramatic · Documentary · Enthusiastic
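
The two-stage narration flow described above can be sketched as a small function that chains an analysis step and a synthesis step. This is an illustrative outline only: `narrate_video` and its parameters are hypothetical names, and the two stages are passed in as plain callables rather than real calls to Gemini 2.5 Flash or TTS-1. The stand-in functions at the bottom exist only so the sketch runs without network access.

```python
from typing import Callable

# Narration styles listed above, lowercased for use as identifiers.
NARRATION_STYLES = (
    "professional", "casual", "dramatic", "documentary", "enthusiastic",
)


def narrate_video(
    video_path: str,
    style: str,
    analyze: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run analysis, then synthesis, and return the narration audio."""
    if style not in NARRATION_STYLES:
        raise ValueError(f"unknown narration style: {style!r}")

    # Stage 1: the analysis model turns the video into a narration script.
    script = analyze(video_path, style)
    if not script.strip():
        raise RuntimeError("analysis produced an empty script")

    # Stage 2: the speech model turns that script into audio bytes.
    return synthesize(script)


# Stand-ins for the two models (assumptions, for demonstration only).
def fake_analyze(video_path: str, style: str) -> str:
    return f"A {style} narration for {video_path}."


def fake_synthesize(script: str) -> bytes:
    return script.encode("utf-8")


audio = narrate_video("clip.mp4", "documentary", fake_analyze, fake_synthesize)
```

Passing the stages in as arguments keeps the ordering explicit (script first, speech second) and lets either model be swapped or stubbed in tests.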

Model Tier Guide

Fast

Fastest generation, lowest cost. Ideal for quick iteration and daily use.

Standard

Best balance of speed and quality. Recommended for most use cases.

Premium

Highest quality output. Best for professional work and important content.
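
As a rough illustration of how the tiers can guide model choice, the sketch below filters the text generation models by a maximum tier. The `Model` type, the `models_up_to` helper, and the Fast < Standard < Premium ordering are assumptions made for this example. The model names, developers, and tiers are copied from the text generation table on this page.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    developer: str
    tier: str


# Text generation models and tiers as listed on this page.
TEXT_MODELS = [
    Model("GPT-4o", "OpenAI", "Premium"),
    Model("GPT-4o Mini", "OpenAI", "Fast"),
    Model("GPT-5", "OpenAI", "Premium"),
    Model("Claude Sonnet 4", "Anthropic", "Premium"),
    Model("Claude 3.5 Haiku", "Anthropic", "Fast"),
    Model("Gemini 2.5 Flash", "Google", "Fast"),
    Model("Gemini 2.5 Pro", "Google", "Premium"),
    Model("Grok 3", "xAI", "Premium"),
    Model("Grok 3 Mini", "xAI", "Fast"),
    Model("Mistral Small", "Mistral", "Fast"),
    Model("Mistral Medium", "Mistral", "Standard"),
]

# Assumed ordering from lowest to highest cost and quality.
TIER_ORDER = {"Fast": 0, "Standard": 1, "Premium": 2}


def models_up_to(models: List[Model], max_tier: str) -> List[Model]:
    """Return the models whose tier does not exceed max_tier."""
    limit = TIER_ORDER[max_tier]
    return [m for m in models if TIER_ORDER[m.tier] <= limit]


fast_only = [m.name for m in models_up_to(TEXT_MODELS, "Fast")]
```

For quick iteration, filtering to "Fast" leaves only the lightweight options. Raising the limit to "Standard" or "Premium" widens the list.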
