Major AI models. One subscription. One Platform.

1. The Power User Control

Switch instantly between the best available models, from Claude Opus 4.7 to GPT-5.5, all in one place.

See exactly how the AI thinks, in real time. The Chain of Thought is no longer hidden, giving you direct insight into the model's decision-making process.

Explore different ideas simultaneously without losing your place. You can branch a chat at any AI response to explore a new direction.

2. The Wallet Friendly Edge

Get 8M tokens for just $15/month. No hidden scaling fees. Tokens are split into a 6.5 Million Casual pool and a 1.5 Million Pro pool.

Your unused credits don't just vanish; they roll over. Carry forward 20% of your unused tokens to the next billing cycle, up to a maximum of 20% of your base plan tokens per billing cycle.

One Subscription, Maximum Value

Stop paying for different AI subscriptions. VividLLM gives you access to the best models and features in one place, so you can optimize your token usage and get the most out of your AI experience without juggling multiple accounts or surprise costs.

3. The Simple and Secure Foundation

Privacy First

We use AES-256 encryption for all your prompts, AI responses, and AI reasoning text.

Hard Delete Policy

We have a strict hard-delete policy. Once you click on delete chat, all the prompts, responses and related chat files are permanently deleted.

Multimodal Ready

Upload images, audio, or documents directly into your chats. Our platform supports up to 4 files per prompt (4MB limit), allowing for deep analysis of your data across both Casual and Pro models.

AI Context Window

Each model has a context window ranging from 16k to 128k tokens, depending on the model.

Web Search

Perform a web search with a single button press, regardless of the model selected.

Token Pool Separation

The 8 Million monthly tokens are separated into a 6.5 Million Casual pool and a 1.5 Million Pro pool. Casual models use Casual tokens; Pro and Web Search models use Pro tokens.

Token Transfer System

Dynamically rebalance your tokens between the Input and Output of the same pool to match your specific workflow.

Get Started for Free Today!

Here's a demo video showcasing VividLLM's interface and features in action, and displaying the seamless experience of showing the reasoning logic of LLM models along with multimodal inputs.

Model Comparison

Model Name            | Known For                                | Speed      | Input / Output Weight | Model Class
Claude Opus 4.7       | Advanced Coding                          | Slow       | 4x / 4x               | Pro
Gemini 3.1 Flash Lite | Multimodal capabilities, Cost efficiency | Super Fast | 1x / 1x               | Casual
Gemini 3.5 Flash      | Speed and Intelligence                   | Super Fast | 1x / 1x               | Pro
Grok Build 0.1        | Coding                                   | Super Fast | 3x / 1x               | Casual
Codestral             | Code Correction                          | Super Fast | 0.8x / 0.67x          | Casual
GPT-oss-120b          | Fast and Detailed response               | Hyper Fast | 0.5x / 0.57x          | Casual
Deepseek V4 Flash     | Strong Reasoning                         | Medium     | 0.5x / 0.5x           | Casual
    Model Speed is calculated based on the following criteria:
  • Dead Slow -> 0 to 25 tokens per second
  • Slow -> 26 to 50 tokens per second
  • Medium -> 51 to 100 tokens per second
  • Fast -> 101 to 200 tokens per second
  • Super Fast -> 201 to 1000 tokens per second
  • Hyper Fast -> 1001 or more tokens per second
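The speed labels above amount to a simple range lookup. The sketch below is illustrative only: the function name `speed_tier` is hypothetical, and treating each upper bound as inclusive (so a fractional value such as 25.5 falls into the next tier up) is an assumption, since the published ranges only list whole numbers.

```python
def speed_tier(tokens_per_second: float) -> str:
    """Map a throughput figure to one of the published speed labels."""
    if tokens_per_second < 0:
        raise ValueError("throughput cannot be negative")
    # (inclusive upper bound, label), in ascending order.
    tiers = [
        (25, "Dead Slow"),
        (50, "Slow"),
        (100, "Medium"),
        (200, "Fast"),
        (1000, "Super Fast"),
    ]
    for upper_bound, label in tiers:
        if tokens_per_second <= upper_bound:
            return label
    # Anything above 1000 tokens per second.
    return "Hyper Fast"


if __name__ == "__main__":
    for tps in (20, 75, 150, 1200):
        print(tps, "->", speed_tier(tps))
```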
    Model weights are calculated based on the following criteria:
  • Based on the actual cost per 1M tokens
  • The context window we provide for each model
  • The throughput and average latency of response
  • A model weight of 0.5x means each token the AI consumes deducts only half a token from your token pool
  • A model weight of 2x means each token the AI consumes deducts two tokens from your token pool
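To make the weight arithmetic concrete, here is a minimal sketch of how a request's pool deduction could be computed from the Input / Output weights. The function names are hypothetical, and the sketch assumes a plain multiplication with no rounding; how the platform actually rounds fractional deductions is not stated.

```python
def pool_cost(input_tokens: int, output_tokens: int,
              input_weight: float, output_weight: float) -> tuple[float, float]:
    """Return (input deduction, output deduction) in pool tokens."""
    return input_tokens * input_weight, output_tokens * output_weight


def effective_tokens(pool_balance: float, weight: float) -> float:
    """How many model tokens a pool balance buys at a given weight."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return pool_balance / weight


if __name__ == "__main__":
    # A 4x / 4x model: 1,000 input and 500 output tokens.
    print(pool_cost(1000, 500, 4, 4))
    # A 0.5x / 0.5x model: the same request costs half as much.
    print(pool_cost(1000, 500, 0.5, 0.5))
    # 5M Casual input tokens stretch to 10M at a 0.5x weight.
    print(effective_tokens(5_000_000, 0.5))
```

Under these assumptions, the same request costs eight times more pool tokens on a 4x model than on a 0.5x model, which is why lighter models stretch a pool further.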
Explore Model Specs

Supported AI LLM models

Weight(I/O):0.5x / 0.5x
Speed:super fast
Weight(I/O):1x / 1x
Speed:super fast
Weight(I/O):1x (Pro) / 1x (Pro)
Speed:super fast
Weight(I/O):1.25x / 1.25x
Speed:fast
Weight(I/O):1x / 1x
Speed:medium
Weight(I/O):0.5x / 0.5x
Speed:dead slow
Weight(I/O):0.4x / 0.4x
Speed:fast
Weight(I/O):1x (Pro) / 1x (Pro)
Speed:slow
Weight(I/O):1.5x (Pro) / 1x (Pro)
Speed:medium
Weight(I/O):0.5x / 0.57x
Speed:hyper fast
Weight(I/O):0.5x / 0.5x
Speed:fast
Weight(I/O):1x / 1x
Speed:medium
Weight(I/O):0.67x / 1x
Speed:super fast
Weight(I/O):2x / 2x
Speed:fast
Weight(I/O):1.33x (Pro) / 1.5x (Pro)
Speed:medium
Weight(I/O):4x (Pro) / 4x (Pro)
Speed:slow
Weight(I/O):1.5x (Pro) / 1.5x (Pro)
Speed:slow
Weight(I/O):4x (Pro) / 4x (Pro)
Speed:slow
Weight(I/O):4x (Pro) / 4x (Pro)
Speed:medium
Weight(I/O):2x (Pro) / 1.5x (Pro)
Speed:slow
Weight(I/O):2x (Pro) / 1.5x (Pro)
Speed:medium
Weight(I/O):2x (Pro) / 1.5x (Pro)
Speed:medium
Weight(I/O):3x / 2x
Speed:super fast
Weight(I/O):3x / 2x
Speed:medium
Weight(I/O):4x (Pro) / 4x (Pro)
Speed:slow
Weight(I/O):4x (Pro) / 4x (Pro)
Speed:slow
Weight(I/O):0.5x / 0.5x
Speed:medium
Weight(I/O):0.67x / 0.67x
Speed:fast
Weight(I/O):0.67x / 0.67x
Speed:fast
Weight(I/O):1x / 0.5x
Speed:slow
Weight(I/O):0.5x / 0.67x
Speed:fast
Weight(I/O):0.5x / 0.5x
Speed:fast
Weight(I/O):0.8x / 0.67x
Speed:super fast
Weight(I/O):0.5x / 0.5x
Speed:slow
Weight(I/O):0.5x / 0.5x
Speed:super fast
Weight(I/O):1x / 1x
Speed:medium
Weight(I/O):1x / 1x
Speed:fast

GROK-4.3

Casual
Weight(I/O):2x / 1.5x
Speed:medium
Weight(I/O):3x / 1x
Speed:super fast
Weight(I/O):0.5x / 0.57x
Speed:super fast
Weight(I/O):1x / 1x
Speed:super fast
Weight(I/O):0.5x / 0.5x
Speed:super fast

KIMI-K2.5

Casual
Weight(I/O):1.33x / 1.25x
Speed:slow

SONAR

Web Search
Weight(I/O):2x (Pro) / 2x (Pro)
Speed:medium

VividLLM Pricing, Plans & Access

Pro Access

$15/mo

8M tokens per month, split into:

Tokens for Casual Models

✅ 5M Input / 1.5M Output

Tokens for Pro Models

✅ 1M Input / 500k Output

✅ 100 Web Searches (tokens are deducted from the Pro pool)

✅ Large Context Window, ranging from 16k to 128k depending on the model in use.

FAQ

Frequently Asked Questions

You get 15,000 Casual tokens to explore our standard models. Features like Web Search, Reasoning Capabilities, and file uploads (Images/Audio/Docs) are reserved for our Pro subscribers.

Each month, Pro users receive:
  • 5M Casual Input Tokens
  • 1.5M Casual Output Tokens
  • 1M Pro Input Tokens
  • 500k Pro Output Tokens
  • 100 Web Searches

Yes! You can branch a chat at any AI response to explore a new direction. Each branch is independent, meaning you can even switch to a different AI model without affecting your original conversation.

No. We have a strict hard-delete policy. Once you click on delete chat, all the prompts, responses and files related to the chat are permanently deleted.

Yes. We prioritize your privacy. All text (user prompts, AI responses, and AI reasoning) is encrypted using AES-256 before it is stored in the database. This means that your actual conversations are unreadable in case of a data breach. We are currently working on encryption for files as well, but cannot guarantee that just yet.

Yes. However, streaming is disabled for the first response of a new chat to give a better overall experience, so the first request always takes longer to respond. Keep first requests short, or use an existing chat to see streaming immediately.

Yes, we will be adding new models in the future.

Weights decide how many pool tokens each token of usage consumes. A 0.5x weight means 1 token used costs only 0.5 from your balance, effectively doubling your usage on lighter models. Lower weight = more efficiency: a 0.5x weight gives you 2x the tokens.

Web searches are more expensive than regular prompts. Whether a web search runs on a Casual model, a Pro model, or a dedicated Web Search model, its tokens are deducted from the Pro token pool. Each web search consumes a lot of tokens and the models are expensive, so please use web search carefully.

Reasoning allows you to see the AI's thought process in real time before it provides a final answer. This is perfect for coding, math, or research tasks.

Yes. Reasoning costs are included in the output tokens and are deducted from the same pool as the selected model.

No, not all models have reasoning capabilities. Some models reason but do not return their reasoning process, so it stays hidden from the output. Others return their reasoning by default even when the reasoning option is not selected. When reasoning is turned off, some models still think, but the process is not shown on screen.
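The pool rules for web search can be summarized as a small routing function. This is only a sketch of the rule as described here: `pool_for_request` and the class strings `"casual"`, `"pro"`, and `"web_search"` are hypothetical names, and document or audio pre-processing (which is billed separately to the Casual pool) is deliberately left out.

```python
def pool_for_request(model_class: str, web_search: bool = False) -> str:
    """Return which token pool ("casual" or "pro") a request draws from."""
    if model_class not in ("casual", "pro", "web_search"):
        raise ValueError(f"unknown model class: {model_class!r}")
    # Web search always bills the Pro pool, whatever model runs it.
    if web_search or model_class == "web_search":
        return "pro"
    # Otherwise the pool follows the model's own class.
    return model_class


if __name__ == "__main__":
    print(pool_for_request("casual"))                   # casual
    print(pool_for_request("casual", web_search=True))  # pro
    print(pool_for_request("pro"))                      # pro
```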

We use paid OpenRouter credits to provide access to multiple models. We are an independent platform and are not directly affiliated with the parent companies of the models. All product names and brands (such as Gemini, GPT, DeepSeek, Grok, Mistral, Perplexity and Claude) are property of their respective owners.

You can upload images, audio files, and documents. For now, the maximum file size is 4MB and the maximum number of files is 4 per prompt.

Document files are processed by Gemini-3.1-flash-lite by default, regardless of the model selected. The file processing tokens are deducted from the Casual pool at the weight of Gemini-3.1-flash-lite, no matter which model generates the response. Audio files are processed by Voxtral Mini Transcribe, and tokens are deducted from the Casual pool by converting the usage cost (Voxtral costs $0.003 per minute of audio). Images are currently allowed only for specific models and are handled by those models themselves.
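As an illustration of these upload limits, the sketch below checks a prompt's attachments before sending. The helper name `validate_uploads` is hypothetical, and reading "4MB" as 4 × 1024 × 1024 bytes per file is an assumption; the platform may count megabytes differently.

```python
MAX_FILES_PER_PROMPT = 4
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 4MB means 4 MiB per file


def validate_uploads(file_sizes: list[int]) -> list[str]:
    """Return a list of problems; an empty list means the upload is OK."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_PROMPT:
        problems.append(
            f"too many files: {len(file_sizes)} (max {MAX_FILES_PER_PROMPT})"
        )
    for index, size in enumerate(file_sizes):
        if size > MAX_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes (max {MAX_FILE_BYTES})")
    return problems


if __name__ == "__main__":
    print(validate_uploads([1_000_000, 2_000_000]))  # []
    print(validate_uploads([5_000_000]))             # one size problem
```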

Unlike other platforms where your unused messages expire at the end of the month, VividLLM allows you to bank your efficiency. If you don't use your full allotment, 20% of your remaining tokens are added as a bonus to your next billing cycle’s limits.

It’s simple Base + Bonus math. We take your remaining balance at the end of the billing cycle, calculate 20% of that leftover amount, and add it to your Standard Base Plan (e.g., 5M Casual Input) for the new month.

Yes. To ensure platform stability, your total token limit can grow up to 120% of your base plan. Example: If your base Casual Input is 5M, your maximum limit with bonuses can reach 6M tokens. Once you hit this full tank, the bonus stops accumulating until you use some of that reserve.
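The Base + Bonus rule and its 120% cap can be written out as a short calculation. This is a sketch under stated assumptions: the function name `next_cycle_limit` is hypothetical, and rounding the bonus down to a whole token is an assumption the text does not confirm.

```python
def next_cycle_limit(base: int, remaining: int,
                     bonus_percent: int = 20, cap_percent: int = 120) -> int:
    """Limit for the next billing cycle: base plus 20% of leftovers, capped."""
    if base < 0 or remaining < 0:
        raise ValueError("base and remaining must be non-negative")
    # Integer arithmetic avoids floating-point drift on large token counts.
    bonus = remaining * bonus_percent // 100
    cap = base * cap_percent // 100
    return min(base + bonus, cap)


if __name__ == "__main__":
    base = 5_000_000  # Casual Input base plan
    print(next_cycle_limit(base, 0))          # 5000000: nothing left over
    print(next_cycle_limit(base, 1_000_000))  # 5200000: 20% of 1M added
    print(next_cycle_limit(base, 6_000_000))  # 6000000: capped at 120%
```

Applied to the 100 monthly Web Searches, the same rule tops out at 120 searches.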

If you have a massive coding sprint and use 100% of your tokens, you simply start the next month with your Standard Base Plan (100% allotment). There is never a negative penalty; you just reset to your guaranteed foundation.

Yes! Carry-forward applies independently to all five of your silos:
  • Casual Input
  • Casual Output
  • Pro Input
  • Pro Output
  • Web Searches (up to a max of 120 searches/mo)

Carry-forward is a benefit for active subscribers. If a subscription is cancelled or expires, the banked bonuses are reset to zero. When you resubscribe, you start fresh with the standard base plan.

As a developer-led platform, we want to be the most generous aggregator on the market while remaining sustainable. The 20% rollover ensures that power users get rewarded for their efficiency without creating token inflation that would force us to raise our $15/mo price.

You can email us at contact@vividllm.chat. Our developer personally responds to every email. Feel free to send suggestions or requests for features you would like to see in the future, and we will consider them if they seem reasonable.

No. But we send a basic system prompt with each prompt, which consumes around 100 tokens.

Yes, you can cancel anytime from your profile. You keep the tokens that are unused at the point of cancellation until the end of the billing cycle, and you can restart your subscription whenever you like.

Yes, paid tokens reset each month, and 20% of unused tokens are carried forward to the next month if you have an active billing cycle. If you are not satisfied and want to cancel your subscription, you keep the tokens you already paid for until the end of the billing cycle.

You can delete your account at any time in your Profile Settings. To ensure a clean wipe of your data and prevent accidental future charges, our system requires you to:
  • Clear your Chat History: Manually delete your chats to confirm you no longer need the data.
  • Cancel Active Subscription: Ensure your Pro plan is cancelled and the billing cycle has ended.
This protects your paid access and ensures our system can safely offboard you without leaving active billing records in our payment gateway.

Last Updated: June 2026 - Added Claude Opus 4.8

© 2026 VividLLM. All rights reserved.
