VividLLM

35+ LLMs at Your Fingertips

From high-speed logic to creative writing, find the perfect model for every task. Backed by a generous monthly token allowance.

Features

35+ Elite Models

Access everything from Gemini 3 Pro to GPT-5 Nano in one place. Every model is labeled with its Output Weight and Speed so you can optimize your token usage.

GEMINI-2.5-FLASH-LITE

Casual

Weight: 0.5x

Speed: super fast

Real-time Reasoning

Watch the AI think. Our platform streams the internal logic of models like Grok or DeepSeek in real-time. Perfect for complex debugging or deep research where the thought process matters.

▼ Reasoning

Analyzing logic...

Searching for optimal solution...

Multimodal Magic

Upload images, audio, or documents directly into your chats. Our platform supports up to 4 files per prompt (4MB limit), allowing for deep analysis of your data across both Casual and Pro models.

▼ File Upload
🖼️
🎵
📄
✅ Files ready!

Web Search

Perform a web search at the press of a button, regardless of the model selected.

Context Window

Each model has a context window, ranging from 32k to 128k tokens depending on the model.

Token Pool Separation

Tokens are separated into Casual and Pro pools. Casual models draw from Casual tokens; Pro and Web Search models draw from Pro tokens. This lets you optimize your token usage by model type. Each pool is further divided into Input and Output tokens.
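As a sketch, the pool routing described above can be modeled like this. The numbers are the Pro plan's monthly allowances; the structure and function name are illustrative assumptions, not VividLLM's actual implementation:

```python
# Illustrative model of the two token pools. The allowances are the
# Pro plan's monthly numbers; the dict/function shape is an assumption.
POOLS = {
    "casual": {"input": 5_000_000, "output": 1_500_000},
    "pro": {"input": 1_000_000, "output": 500_000},
}

def pool_for(model_type: str) -> str:
    """Casual models draw from the Casual pool; Pro and Web Search
    models draw from the Pro pool."""
    return "casual" if model_type == "casual" else "pro"

# pool_for("casual")     -> "casual"
# pool_for("pro")        -> "pro"
# pool_for("web_search") -> "pro"
```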

Token Transfer System

You can transfer tokens between Input and Output within the same pool after a conversion rate is applied, i.e., between Casual Input and Casual Output, or between Pro Input and Pro Output.
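For illustration, a transfer might look like the sketch below. The 4:1 conversion rate is a made-up placeholder, since the actual rate is not stated here:

```python
# Hypothetical sketch of an Input -> Output transfer within one pool.
# The default 4.0 rate is a placeholder assumption; VividLLM's real
# conversion rate is not specified in this document.
def transfer_input_to_output(input_tokens: int, rate: float = 4.0) -> int:
    """Spend `input_tokens` from the Input balance and credit the
    Output balance at `rate` input tokens per output token."""
    return int(input_tokens // rate)

# e.g. transferring 8,000 Casual Input tokens at the assumed 4:1 rate
# would credit 2,000 Casual Output tokens.
```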

Models

GEMINI-2.5-FLASH-LITE

Casual
Weight:0.5x
Speed:super fast

An efficiency-focused model from the Gemini family, engineered for high-speed processing and low-latency responses.

GEMINI-3-FLASH-PREVIEW

Casual
Weight:1.25x
Speed:fast

A versatile preview model designed for balanced performance across speed and complex reasoning tasks.

GEMINI-2.5-FLASH

Casual
Weight:1x
Speed:medium

A high-performance workhorse model suitable for large-scale multimodal processing and agentic workflows.

GEMMA-3-27B-IT

Casual
Weight:0.4x
Speed:fast

A high-performance open-weight model from Google, capable of efficient text and multimodal understanding.

GEMINI-2.0-FLASH-001

Casual
Weight:0.5x
Speed:medium

A second-generation Gemini model optimized for consistent multimodal performance and reliability.

GEMINI-2.5-PRO

Pro
Weight:1x (Pro)
Speed:slow

A sophisticated reasoning model designed for complex analysis in coding, mathematics, and long-form document processing.

GEMINI-3-PRO-PREVIEW

Pro
Weight:1x (Pro)
Speed:medium

Google's advanced preview model for deep multimodal understanding and complex instruction following.

GPT-OSS-120B

Casual
Weight:0.57x
Speed:hyper fast

GPT-oss-120b is OpenAI's most powerful open-weight model, with the fastest response speed of any model available on this site.

GPT-5-NANO

Casual
Weight:0.5x
Speed:fast

GPT-5 Nano is the speed demon of the GPT-5 lineup. Best for quick summaries, labels, and low-cost automation.

GPT-5-MINI

Casual
Weight:1x
Speed:medium

GPT-5 Mini is a lean GPT-5 model that thrives on clarity. Perfect for structured tasks and tight, intentional prompting.

GPT-5.1-CODEX

Pro
Weight:1x (Pro)
Speed:medium

The Codex variant of GPT-5.1, specialized for coding and software-engineering tasks.

GPT-4O-SEARCH-PREVIEW

Web Search
Weight: 4x (Pro)
Speed:slow

A specialized model dedicated to web search.

GPT-5.1

Pro
Weight:1x (Pro)
Speed:slow

GPT-5.1 is a flexible OpenAI model that lets you tune how much reasoning it uses. Great for balancing speed and depth depending on the task.

GPT-5.2

Pro
Weight:1.5x (Pro)
Speed:slow

GPT-5.2 is OpenAI's top-tier model built for complex coding and agent workflows. Best suited for advanced automation, tools, and multi-step tasks.

CLAUDE-SONNET-4.5

Pro
Weight:1.5x (Pro)
Speed:medium

Claude Sonnet 4.5 is a high-intelligence model built for complex agents and serious coding work. Strong at multi-step reasoning, planning, and tool-driven tasks.

CLAUDE-SONNET-4

Pro
Weight:1.5x (Pro)
Speed:medium

Claude Sonnet 4 offers improved reasoning and coding performance over earlier Sonnet models. Designed for precise, controllable outputs across structured tasks.

CLAUDE-HAIKU-4.5

Casual
Weight:2x
Speed:super fast

Claude Haiku 4.5 is Anthropic’s fastest model with surprisingly strong intelligence. Ideal for low-latency responses and high-throughput workloads.

CLAUDE-3.5-HAIKU

Casual
Weight:2x
Speed:medium

Claude 3.5 Haiku focuses on speed with solid coding and instruction accuracy. A good fit for lightweight tasks that need quick, reliable output.

DEEPSEEK-CHAT-V3.1

Casual
Weight:0.67x
Speed:fast

DeepSeek Chat V3.1 uses hybrid inference for faster responses and smarter reasoning. Well-suited for agent-style workflows and interactive tasks.

DEEPSEEK-V3.1-TERMINUS

Casual
Weight:0.67x
Speed:fast

DeepSeek V3.1 Terminus builds on V3.1 with improved consistency and refinements. Delivers more reliable language output and stronger code-focused agent behavior.

DEEPSEEK-PROVER-V2

Casual
Weight:1x
Speed:slow

A DeepSeek model specialized in formal mathematical theorem proving.

MISTRAL-SMALL-3.2-24B-INSTRUCT

Casual
Weight:0.5x
Speed:fast

Mistral Small 3.2 (24B) is tuned for strong instruction following and cleaner responses. Reduces repetition while improving tool and function-calling reliability.

CODESTRAL-2508

Casual
Weight:0.67x
Speed:slow

Codestral 2508 is Mistral’s high-performance model for code generation and completion. Optimized for low-latency workflows like fill-in-the-middle and rapid edits.

DEVSTRAL-SMALL

Casual
Weight:0.5x
Speed:super fast

Devstral Small is a 24B open-weight model built for software engineering agents. Designed for structured coding workflows and tool-driven development tasks.

DEVSTRAL-2512:FREE

Casual
Weight:0.25x
Speed:fast

Devstral 2512 (free) is an open-source model focused on agentic coding behavior. A solid choice for automation, code planning, and multi-step dev tasks.

MISTRAL-LARGE-2512

Casual
Weight:1x
Speed:medium

Mistral Large 3 is a high-end, open-weight multimodal model with a Mixture-of-Experts design. Built for advanced reasoning, generation, and general-purpose use.

MISTRAL-MEDIUM-3.1

Casual
Weight:1x
Speed:fast

Mistral Medium 3.1 is a frontier-class multimodal model with improved tone and performance. Balances capability and efficiency across a wide range of tasks.

GROK-4.1-FAST

Casual
Weight:0.57x
Speed:medium

Fast and capable version of Grok 4. Great balance of speed and intelligence for everyday tasks, coding help, and real-time reasoning. Streams thought process when reasoning is enabled.

GROK-4-FAST

Casual
Weight:0.57x
Speed:fast

Optimized for speed while retaining strong reasoning and broad knowledge. Perfect when you want Grok-level performance without the wait. Supports real-time reasoning streaming.

GROK-CODE-FAST-1

Casual
Weight:1x
Speed:medium

Specialized fast variant tuned for coding and technical tasks. Excellent at writing, debugging, and explaining code across many languages, with quick response times.

GROK-4

Pro
Weight:1.5x (Pro)
Speed:slow

The full flagship Grok 4 model for maximum intelligence, deepest reasoning, and strongest performance across complex tasks, creative writing, math, and research. Highest quality output available.

NOVA-2-LITE-V1

Casual
Weight:1x
Speed:super fast

A fast reasoning model from Amazon's Nova family.

LLAMA-4-SCOUT

Casual
Weight:0.5x
Speed:super fast

Llama 4 Scout is a multimodal model from Meta.

SONAR

Web Search
Weight: 2x (Pro)
Speed:medium

Perplexity's fast, lightweight search model delivers quick answers with built-in web citations for reliable, sourced results.

SONAR-REASONING

Web Search
Weight: 2x (Pro)
Speed:medium

Perplexity's advanced reasoning model excels at complex problem-solving, step-by-step analysis, and deeper insights.

Pricing

Pro Access

$15/mo

8M tokens per month, split into:

Tokens for Casual Models

✅ 5M Input / 1.5M Output

Tokens for Pro Models

✅ 1M Input / 500k Output

✅ 100 Web Searches (tokens deducted from the Pro pool)

✅ Large context window, ranging from 16k to 128k tokens depending on the model in use.

FAQ

Frequently Asked Questions

You get 15,000 Casual tokens to explore our standard models. Features like Web Search, Reasoning Capabilities, and file uploads (Images/Audio/Docs) are reserved for our Pro subscribers.
Each month, Pro users receive:
• 5M Casual Input Tokens
• 1.5M Casual Output Tokens
• 1M Pro Input Tokens
• 500k Pro Output Tokens
• 100 Web Searches
No. We have a strict hard-delete policy. Once you delete a chat, all prompts, responses, and files related to it are permanently deleted.
Yes. However, streaming is disabled for the first response of a new chat for a better overall experience, so the first request always takes longer to respond. Please keep first requests short, or use an existing chat to see streaming immediately.
Yes, we will be adding new models in the future.
Weights determine how many tokens are deducted from your balance per token used. A 0.5x weight means each token used costs only 0.5 from your balance, effectively doubling your usage on lighter models. Lower weight = more efficiency.
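As a quick sketch of the arithmetic (the function is a made-up example, not VividLLM's actual billing code):

```python
# Illustrative weighted-deduction arithmetic. The function name is a
# made-up example for this sketch only.
def tokens_deducted(tokens_used: int, weight: float) -> float:
    """Each token used costs `weight` tokens from your balance."""
    return tokens_used * weight

# 1,000 tokens on a 0.5x model deducts 500 from your balance;
# the same usage on a 1.5x Pro model deducts 1,500.
```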
Web Searches are more expensive than regular prompts. Whether a search runs on a Casual model, a Pro model, or a dedicated Web Search model, the tokens are deducted from the Pro pool. Each search consumes a large number of tokens and the search models are expensive, so please use Web Search sparingly.
Reasoning allows you to see the AI's thought process in real-time before it provides a final answer. This is perfect for coding, math, or research tasks.
Yes. Reasoning costs are included in the output tokens and are deducted from the same pool as the selected model.
No, not all models have reasoning capabilities. Some models can reason but do not return their reasoning process, so it stays hidden from the output, while others return the reasoning process by default even when the Reasoning option is not selected.
We use paid OpenRouter credits to provide access to multiple models. We are an independent platform and are not directly affiliated with the parent companies of the models. All product names and brands (such as Gemini, GPT, DeepSeek, Grok, Mistral, Perplexity, and Claude) are the property of their respective owners.
You can upload images, audio files, and documents. As of now, the maximum file size is 4MB and the maximum is 4 files per prompt.
Documents and audio files are processed by Gemini-2.5-flash-lite by default, regardless of the model selected, and the file-processing tokens are deducted from the Casual pool based on Gemini-2.5-flash-lite's weight, regardless of which model generates the response. Images are currently supported only on specific models and are handled by those models themselves.
You can email us at contact@vividllm.chat. Our developer personally responds to every email. Feel free to send suggestions or requests for features you would like to see, and we will consider anything we find reasonable.
No. However, we send a basic system prompt with each prompt, which consumes around 100 tokens.
Yes, you can cancel anytime from your profile. Your unused tokens at the point of cancellation remain available until the end of the billing cycle, and you can restart your subscription whenever you like.
Yes, paid tokens reset each month. If you are not satisfied and cancel your subscription, you retain the tokens you already paid for until the end of the billing cycle.
You can delete your account at any time in your Profile Settings. To ensure a clean wipe of your data and prevent accidental future charges, our system requires you to:
• Clear your Chat History: manually delete your chats to confirm you no longer need the data.
• Cancel Active Subscription: ensure your Pro plan is cancelled and the billing cycle has ended.
This protects your paid access and ensures our system can safely offboard you without leaving active billing records in our payment gateway.
VividLLM | Powerful AI Chat Platform for an affordable price