35+ LLMs at Your Fingertips
From complex logic to creative writing, find the perfect model for every task. Backed by a generous token allowance.
Here's a demo video showcasing VividLLM's interface and features in action, including the seamless experience of watching a model's reasoning stream alongside multimodal inputs.
VividLLM Features & Description
35+ Elite Models
Access everything from Gemini 3 Pro to GPT-5 Nano in one place. Every model is labeled with its Output Weight and Speed so you can optimize your token usage.
Weight: 0.5x
Speed: super fast
Advanced Multimodal AI Inputs
Upload images, audio, or documents directly into your chats. Our platform supports up to 4 files per prompt (4MB limit), allowing for deep analysis of your data across both Casual and Pro models.
Real-time AI Reasoning
Watch the AI think. While most platforms hide the chain of thought, VividLLM streams the internal logic of models such as Grok-4.1, Gemini 3, DeepSeek, GPT-5.2, and Claude Opus 4.6 in real time. Perfect for complex debugging or deep research where the thought process matters.
Web Search
Perform a web search at the press of a button, regardless of the selected model.
AI Context Window
Each model has a context window ranging from 16k to 128k tokens, depending on the model.
Token Pool Separation
Tokens are separated into Casual and Pro pools. Casual models draw from Casual tokens; Pro and Web Search models draw from Pro tokens. This lets you optimize token usage by model type. Each pool is further divided into Input and Output tokens.
Token Transfer System
You can transfer tokens between Input and Output within the same pool, after a conversion rate is applied — i.e., between Casual Input and Casual Output, or between Pro Input and Pro Output.
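As a sketch of how the pool and transfer rules above fit together — note that the actual conversion rate is not stated here, so the 2:1 rate below is a hypothetical placeholder, and the function name is illustrative, not VividLLM's API:

```python
# Hypothetical sketch of VividLLM's token transfer rule: transfers are only
# allowed between Input and Output within the SAME pool (Casual or Pro),
# and a conversion rate is applied. The 2:1 rate is an assumed placeholder.

CONVERSION_RATE = 2  # assumption: 2 source tokens -> 1 destination token

def transfer(pool, amount, direction="input_to_output", rate=CONVERSION_RATE):
    """Move `amount` tokens within one pool, applying the conversion rate.

    `pool` is a dict like {"input": 5_000_000, "output": 1_500_000}.
    """
    if direction == "input_to_output":
        if pool["input"] < amount:
            raise ValueError("not enough Input tokens")
        pool["input"] -= amount
        pool["output"] += amount // rate
    else:  # "output_to_input"
        if pool["output"] < amount:
            raise ValueError("not enough Output tokens")
        pool["output"] -= amount
        pool["input"] += amount // rate
    return pool

# Example with the Pro Access Casual allowance (5M Input / 1.5M Output):
casual = {"input": 5_000_000, "output": 1_500_000}
transfer(casual, 100_000)  # 100k Input becomes 50k Output at the assumed rate
```

Under these assumptions, transferring 100k Casual Input tokens leaves 4.9M Input and yields 1.55M Output; Pro tokens are untouched because transfers never cross pools.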
Supported AI LLM models
GEMINI-2.5-FLASH-LITE
Casual: An efficiency-focused model from the Gemini family, engineered for high-speed processing and low-latency responses.
GEMINI-3-FLASH-PREVIEW
Casual: A versatile preview model designed for balanced performance across speed and complex reasoning tasks.
GEMINI-2.5-FLASH
Casual: A high-performance workhorse model suitable for large-scale multimodal processing and agentic workflows.
GEMMA-3-27B-IT
Casual: A high-performance open-weight model from Google, capable of efficient text and multimodal understanding.
GEMINI-2.5-PRO
Pro: A sophisticated reasoning model designed for complex analysis in coding, mathematics, and long-form document processing.
GEMINI-3-PRO-PREVIEW
Pro: Google's advanced preview model for deep multimodal understanding and complex instruction following.
GEMINI-3.1-PRO-PREVIEW
Pro: Google's refinement of Gemini 3 Pro, optimized for usability and smoother workflows.
GPT-OSS-120B
Casual: GPT-oss-120b is OpenAI's most powerful open-weight model, with the fastest response speed of any model on this site.
GPT-5-NANO
Casual: GPT-5 Nano is the speed demon of the GPT-5 lineup. Best for quick summaries, labels, and low-cost automation.
GPT-5-MINI
Casual: GPT-5 Mini is a lean GPT-5 model that thrives on clarity. Perfect for structured tasks and tight, intentional prompting.
GPT-5.3-CODEX
Pro: OpenAI's model optimized for coding tasks.
GPT-5.1-CODEX
Pro: The Codex variant of GPT-5.1.
GPT-4O-SEARCH-PREVIEW
Web Search: A specialized model for web search.
GPT-5.1
Pro: GPT-5.1 is a flexible OpenAI model that lets you tune how much reasoning it uses. Great for balancing speed and depth depending on the task.
GPT-5.2
Pro: GPT-5.2 is OpenAI's top-tier model built for complex coding and agent workflows. Best suited for advanced automation, tools, and multi-step tasks.
CLAUDE-SONNET-4.6
Pro: Claude Sonnet 4.6 is Anthropic's model for daily use, complex coding tasks, and professional workflows.
CLAUDE-SONNET-4.5
Pro: Claude Sonnet 4.5 is a high-intelligence model built for complex agents and serious coding work. Strong at multi-step reasoning, planning, and tool-driven tasks.
CLAUDE-SONNET-4
Pro: Claude Sonnet 4 offers improved reasoning and coding performance over earlier Sonnet models. Designed for precise, controllable outputs across structured tasks.
CLAUDE-HAIKU-4.5
Casual: Claude Haiku 4.5 is Anthropic's fastest model with surprisingly strong intelligence. Ideal for low-latency responses and high-throughput workloads.
CLAUDE-3.5-HAIKU
Casual: Claude 3.5 Haiku focuses on speed with solid coding and instruction accuracy. A good fit for lightweight tasks that need quick, reliable output.
CLAUDE-OPUS-4.6
Pro: Claude Opus 4.6 is a top-tier model with improved intelligence, enhanced capabilities, and stronger coding performance, built for complex coding tasks.
DEEPSEEK-CHAT-V3.1
Casual: DeepSeek Chat V3.1 uses hybrid inference for faster responses and smarter reasoning. Well-suited for agent-style workflows and interactive tasks.
DEEPSEEK-V3.1-TERMINUS
Casual: DeepSeek V3.1 Terminus builds on V3.1 with improved consistency and refinements. Delivers more reliable language output and stronger code-focused agent behavior.
DEEPSEEK-V3.2
Casual: A reasoning-first model with a strong focus on agentic capabilities, designed for agent-based tasks.
MISTRAL-SMALL-3.2-24B-INSTRUCT
Casual: Mistral Small 3.2 (24B) is tuned for strong instruction following and cleaner responses. Reduces repetition while improving tool and function-calling reliability.
CODESTRAL-2508
Casual: Codestral 2508 is Mistral's high-performance model for code generation and completion. Optimized for low-latency workflows like fill-in-the-middle and rapid edits.
DEVSTRAL-2512
Casual: Devstral 2512 is a powerful code agent model built by Mistral AI for complex software engineering tasks.
DEVSTRAL-SMALL
Casual: Devstral Small is a 24B open-weight model built for software engineering agents. Designed for structured coding workflows and tool-driven development tasks.
MISTRAL-SMALL-CREATIVE
Casual: Mistral Small Creative is a small model for creative writing, role play, and more.
MISTRAL-LARGE-2512
Casual: Mistral Large 3 is a high-end, open-weight multimodal model with a Mixture-of-Experts design. Built for advanced reasoning, generation, and general-purpose use.
MISTRAL-MEDIUM-3.1
Casual: Mistral Medium 3.1 is a frontier-class multimodal model with improved tone and performance. Balances capability and efficiency across a wide range of tasks.
GROK-4.1-FAST
Casual: Fast and capable version of Grok 4. Great balance of speed and intelligence for everyday tasks, coding help, and real-time reasoning. Streams its thought process when reasoning is enabled.
GROK-4-FAST
Casual: Optimized for speed while retaining strong reasoning and broad knowledge. Perfect when you want Grok-level performance without the wait. Supports real-time reasoning streaming.
GROK-CODE-FAST-1
Casual: Specialized fast variant tuned for coding and technical tasks. Excellent at writing, debugging, and explaining code across many languages, with quick response times.
GROK-4
Pro: The full flagship Grok 4 model for maximum intelligence, deepest reasoning, and strongest performance across complex tasks, creative writing, math, and research. Highest quality output available.
NOVA-2-LITE-V1
Casual: A fast reasoning model by Amazon.
LLAMA-4-SCOUT
Casual: Llama 4 Scout is a multimodal model by Meta.
KIMI-K2.5
Casual: Kimi K2.5 is a versatile model by Moonshot AI, supporting multimodal input and reasoning.
TRINITY-LARGE-PREVIEW:FREE
Casual: Trinity Large is an open-weight model by Arcee AI with strong performance in coding and math. It is built on 400B total parameters, with 13B active per token.
TRINITY-MINI:FREE
Casual: Trinity Mini is a fast model by Arcee AI, suitable for everyday tasks. It is built on 26B total parameters, with 3B active per token.
SONAR
Web Search: Perplexity's fast, lightweight search model delivers quick answers with built-in web citations for reliable, sourced results.
VividLLM Pricing, Plans & Access
Pro Access
8M tokens per month, split into:
Tokens for Casual Models
✅ 5M Input / 1.5M Output
Tokens for Pro Models
✅ 1M Input / 500k Output
✅ 100 Web Searches (tokens are deducted from the Pro pool)
✅ Large context window, from 16k to 128k tokens depending on the model in use.
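As a quick sanity check on the allowance above, the four buckets sum exactly to the advertised 8M monthly tokens:

```python
# Pro Access monthly allowance, as listed on this page.
casual_input, casual_output = 5_000_000, 1_500_000  # Casual pool
pro_input, pro_output = 1_000_000, 500_000          # Pro pool

# 5M + 1.5M + 1M + 0.5M = 8M total tokens per month
total = casual_input + casual_output + pro_input + pro_output
assert total == 8_000_000
```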

