The Base + Bonus Model: Introducing Token Carry Forward

2026-04-17

Token Carry Forward lets you carry 20% of your unused tokens into the next billing cycle, where they are added on top of your monthly plan's base tokens, giving you a little more freedom. It is a small feature, yet an honest one.

Table of Contents

  1. What is Token Carry Forward?
  2. How I implemented Token Carry Forward
  3. The Strategy: Gaming the Model Weights
  4. The Comparison: Transparency vs The Black Box

What is Token Carry Forward?

Most AI subscriptions work on a use-it-or-lose-it model. You pay for the month, and at midnight on your billing date, any unused capacity simply vanishes. If you had a busy month of planning but didn't get around to the heavy-duty execution, your remaining tokens are gone. I’ve always felt that this creates Usage Anxiety: the feeling that you're being punished for having a light week.

At VividLLM, I wanted to treat your subscription like an asset, not a perishable good.

Token Carry Forward is my solution for the nonlinear nature of professional work. For all paid users, when your next billing cycle begins, I don't just reset your account to zero. Instead, I perform an audit of your previous month’s efficiency:

The 20% Rollover:

  • I take 20% of your remaining unused tokens from every pool: Casual Input, Casual Output, Pro Input, Pro Output, and even Web Searches.

The Base + Bonus Model:

  • This rollover is added as a bonus on top of your standard monthly base plan.

The 120% Reservoir:

  • To keep the platform sustainable, your total tokens (Base + Carry Forward) are capped at 120% of your plan's base allotment in each pool. Example: If your base for casual input is 5M tokens, the maximum casual input tokens you can have at the start of a new billing cycle will be 6M (5M base + 1M max carry forward).

The Metaphor:

  • Think of it like a water reservoir. If you don't need the full supply this month, I keep some behind the dam for your next month's usage. You aren't just paying for access; you're building a reserve.

By capping the growth at 120%, I can ensure that VividLLM remains the most generous aggregator on the market without the "token inflation" that would force price hikes. It’s a win for your budget and a win for the platform’s longevity.

How I implemented Token Carry Forward

To a user, a billing reset feels like a single moment in time. Behind the scenes, it’s a surgical operation on your account’s Token Bank.

When I architected this, I had to ensure two things: Mathematical Accuracy and Platform Stability. I didn't want a Token Inflation bug, where limits spiraled out of control, nor did I want a Hard Reset that felt unfair.

The Snap-Back Logic:

  • I implemented a Dynamic Elastic Limit using a Base + Bonus architecture. Instead of just adding tokens to your current limit, the system performs a fresh calculation every time a subscription renews via Razorpay.

  • The logic follows a strict three-step process:

    • The Audit: I calculate your exact leftover (i.e., Max Allowed - Used).
    • The Floor Check: If your leftover is zero (you had a massive month), the system snaps back to the standard 100% base plan.
    • The Ceiling Check: If you have a surplus, I calculate the 20% bonus and apply a Math.min cap. This ensures your total reservoir never exceeds 120% of your plan.
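The three steps above can be sketched as a single pure function. This is a minimal illustration, not VividLLM's actual code: the function name, the integer rounding, and the assumption that "Max Allowed" includes any bonus from the previous cycle are all mine.

```typescript
// Sketch only: names and rounding are assumptions, not the production code.
const CARRY_PERCENT = 20; // share of the leftover that rolls over
const CAP_PERCENT = 20;   // the bonus may add at most 20% of base (120% total)

function computeRenewedLimit(base: number, previousLimit: number, used: number): number {
  // 1. The Audit: exact leftover, never negative.
  const leftover = Math.max(0, previousLimit - used);

  // 2. The Floor Check: nothing left over means a plain reset to the base plan.
  if (leftover === 0) return base;

  // 3. The Ceiling Check: 20% bonus, capped so the total stays within 120% of base.
  // Integer percent arithmetic avoids floating-point drift on large token counts.
  const bonus = Math.floor((leftover * CARRY_PERCENT) / 100);
  const cap = Math.floor((base * CAP_PERCENT) / 100);
  return base + Math.min(bonus, cap);
}

// A 5M base with nothing used renews at the 6M ceiling.
console.log(computeRenewedLimit(5_000_000, 5_000_000, 0));
```

Under these assumptions the cap only bites when the previous limit already included a bonus: a leftover of 6M would yield a 1.2M bonus, which Math.min trims back to 1M.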

Decoupling Plan from Bonus:

  • A major technical choice I made was to keep the Base Allotment as a hard constant in the database.

  • By calculating the new limit as Base + Math.min(Bonus, Cap), I ensured that your bonus is always a Reward on Top rather than a shifting goalpost. This means that even if I change the models in the Pro or Casual pools, your carry-forward logic remains stable and predictable.
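To illustrate why a constant base keeps the calculation predictable, here is a hedged sketch that applies Base + Math.min(Bonus, Cap) to every pool. The pool keys and all base figures except the 5M casual input example are placeholders I made up for illustration, and the leftover is assumed to be counted against the previous limit.

```typescript
type Pool = "casualInput" | "casualOutput" | "proInput" | "proOutput" | "webSearches";

// Placeholder allotments: only the 5M casual input figure appears in the article.
const BASE_ALLOTMENT: Readonly<Record<Pool, number>> = Object.freeze({
  casualInput: 5_000_000,
  casualOutput: 1_000_000,
  proInput: 1_000_000,
  proOutput: 250_000,
  webSearches: 500,
});

// Derives next cycle's limits from the constant base; the base is never mutated.
function renewAllPools(leftover: Record<Pool, number>): Record<Pool, number> {
  const next = {} as Record<Pool, number>;
  for (const pool of Object.keys(BASE_ALLOTMENT) as Pool[]) {
    const base = BASE_ALLOTMENT[pool];
    const bonus = Math.floor((Math.max(0, leftover[pool]) * 20) / 100);
    const cap = Math.floor((base * 20) / 100);
    next[pool] = base + Math.min(bonus, cap);
  }
  return next;
}

const renewed = renewAllPools({
  casualInput: 4_000_000,
  casualOutput: 0,
  proInput: 1_200_000,
  proOutput: 100_000,
  webSearches: 95,
});
console.log(renewed);
```

Because each new limit is rebuilt from BASE_ALLOTMENT rather than from the previous limit, a bonus can never compound into the base from one cycle to the next.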

High-Concurrency Safety:

  • Since VividLLM handles 35+ models and real-time streaming, the billing reset has to be atomic. I wrapped the entire rollover logic in a withRetries Prisma transaction. Whether you are in the middle of a high-speed chat with Gemma 4B or a deep-reasoning session with Claude 4.6, the transition between billing cycles is seamless, ensuring not a single token of your earned bonus is lost in the shuffle.
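The article names a withRetries wrapper but does not show it, so the following is only a guess at its general shape: a helper that re-runs an async operation a few times before giving up. The signature, attempt count, and backoff are assumptions. In a Prisma codebase the callback would typically wrap a prisma.$transaction(...) call; a simulated flaky operation stands in for it here so the sketch runs on its own.

```typescript
// Hypothetical helper: retries an async operation with a small linear backoff.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 50,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // A real implementation would likely retry only transient errors
      // (write conflicts, deadlocks) and rethrow everything else immediately.
      lastError = error;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError;
}

// Stand-in for the transactional rollover: fails twice, then succeeds.
let calls = 0;
const renewal = withRetries(async () => {
  calls += 1;
  if (calls < 3) throw new Error("simulated write conflict");
  return { casualInputLimit: 5_800_000 };
}, 5, 1);

renewal.then((result) => console.log(result, `after ${calls} attempts`));
```

The important property is that the whole rollover runs inside the retried callback, so a failed attempt leaves nothing half-applied and the next attempt starts from the same committed state.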

The Strategy: Gaming the Model Weights

VividLLM is built on a Weighted System rather than a flat message count. This means that not all intelligence costs the same. If you understand how to navigate these weights, you can essentially manufacture your own Token Carry-Forward bonus every single month.

The Lite Leverage:

  • The secret to a massive token reserve is knowing when not to use a Frontier model. For daily drafting, brainstorming, or smaller tasks, using a Pro model is like driving a semi-truck to the grocery store: it's overkill.

  • Using Lite-weight models such as Gemini-2.5-Flash-Lite or Gemma 4B (with weights of 0.5x or lower) lets you accomplish many of your daily tasks while barely consuming your Base Allotment. At a 0.5x weight, for every 1,000 tokens the AI processes, only 500 are deducted from your balance. By using these high-speed, high-efficiency models for your busy work, you keep your token pools nearly untouched, setting the stage for a maximum rollover.
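As a rough sketch of how a weighted deduction might work, the snippet below multiplies native tokens by a model weight. The function names, the UsageEvent shape, and rounding up to whole tokens are my assumptions; the article only states that Lite models carry weights of 0.5x or lower.

```typescript
// One request's usage: tokens the model actually processed, and that model's weight.
interface UsageEvent {
  nativeTokens: number;
  weight: number; // e.g. 0.5 for a Lite model
}

// Tokens removed from the balance for a single request (rounding up is an assumption).
function tokensDeducted(nativeTokens: number, weight: number): number {
  return Math.ceil(nativeTokens * weight);
}

// Total deduction across a list of requests.
function totalDeducted(events: UsageEvent[]): number {
  return events.reduce((sum, e) => sum + tokensDeducted(e.nativeTokens, e.weight), 0);
}

console.log(tokensDeducted(1_000, 0.5)); // 500
console.log(totalDeducted([
  { nativeTokens: 1_000, weight: 0.5 },
  { nativeTokens: 2_000_000, weight: 0.5 },
])); // 1000500
```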

The Math of Savings:

  • Let’s look at the numbers. If you have a standard month where you mainly do research and light drafting:

    • Base Plan: 5,000,000 Casual Input tokens.
    • Usage: You use 2,000,000 native tokens on Lite models (0.5x weight).
    • Deduction: Only 1,000,000 tokens are removed from your balance.
    • Leftover: 4,000,000 tokens.
  • When the new billing cycle hits, our 20% Carry-Forward logic kicks in: 4,000,000 * 0.20 = 800,000 bonus tokens.

  • You start the next month with 5,800,000 casual input tokens. By being efficient, you’ve effectively earned an extra 800k of capacity for free.

Strategic Sprints:

  • Professional workflows are rarely constant; they consist of long research phases followed by intense execution sprints.

  • When it’s finally time to ship your project, you’ll have a Super-Powered account with 1.2M Pro Input tokens ready to handle the most demanding models like GPT-5.4 or Claude 4.6 Opus. Your Carry-Forward acts as a buffer, ensuring you never run out of Pro Power when the deadline is looming.

The Comparison: Transparency vs The Black Box

In the rapidly evolving AI landscape of 2026, the way platforms handle your usage has become a defining factor in user trust.

The Industry Standard: Use it or Lose it

  • The current industry standard for AI subscriptions is a strict expiration model. Whether you have Compute Points or Message Limits, those assets have a shelf life.

  • Most platforms treat tokens like perishable goods. If you don't eat them today, they are thrown away at midnight.

The Accountability Factor:

  • At VividLLM, I believe in Usage Accountability. If you’ve paid for a subscription, you’ve essentially leased a portion of our compute power. If you don't use all of it, it shouldn't just vanish into the void.

  • By implementing Token Carry Forward, I am introducing a level of transparency that is rare in this space.

  • No Hidden Resets: You can see exactly how much was banked for the new month.

  • Fair Scaling: I don’t promise Unlimited tokens, which is often a marketing lie used to hide aggressive throttling. Instead, I give you a clear 120% Reservoir that grows based on your actual efficiency.

  • VividLLM isn't just a place to access 35+ elite models; it's a workspace that treats you like a partner. When you save tokens, I reward you. When you have a massive month, I support you.


I’m excited to see how you use your new "120% Reservoir" to tackle your biggest projects yet.
