A YouTube video on Excel modeling during COVID. That's where this started. Five years and seven sports later, the models speak for themselves. Here's the full story.
"A YouTube video. Free time. A rabbit hole with no bottom."
Spring 2020. Like a lot of people, I had more time than I knew what to do with. A YouTube video on building a sports betting model in Excel came across my feed. I had no formal background in data science or statistics. Just curiosity and free time. I watched it, built the model, got it wrong, and built it again.
That was it. I was hooked. What started as an afternoon project turned into months of reading about probability theory, historical data, and predictive modeling. I learned what standard error actually means. I found out how to convert odds to implied probabilities and what an overround is. I discovered Closing Line Value and figured out why win rate alone tells you almost nothing about whether a bettor actually has edge.
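The odds-conversion idea is simple enough to show in a few lines. This is a generic sketch, not code from my models: it converts American odds to break-even probabilities and measures the overround on a standard -110/-110 market.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the implied break-even probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

# A typical -110 / -110 spread market:
p_home = implied_prob(-110)        # ~0.524 per side
p_away = implied_prob(-110)
overround = p_home + p_away - 1.0  # ~0.048: the book's built-in margin
```

The two sides sum to more than 1.0. That excess is the vig, and it is why a 50% win rate loses money at -110.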
The more I dug in, the more I realized there is a significant gap between how professional bettors think about this problem and how most people approach it. The frameworks that govern long-run profitability are not complicated once you understand them. They are just not widely known. That asymmetry was exactly where the opportunity lived.
The Excel model was the starting point. R was the first real programming language, and it forced me to think more rigorously about data structures and statistical computation. Then Python, which became the backbone of everything that followed. I picked up enough C++ and SQL along the way to understand the tooling. Now it is mainly Python and R, depending on what I am building. The language matters less than the thinking behind it.
I spent months on Bayesian inference. Understanding how to update a probability estimate as new information arrives changes how you think about every game, every line, every injury report. It is not about having a hot take. It is about having a prior belief grounded in data and updating it correctly when new evidence comes in.
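The textbook version of that update fits in a few lines. A minimal Beta-binomial sketch with made-up numbers, purely to illustrate the prior-plus-evidence mechanic:

```python
# Beta-binomial update: a prior belief about a win probability,
# revised as new results arrive. All numbers are illustrative.
alpha, beta = 20.0, 20.0   # prior: centered at 0.50, moderately confident
wins, losses = 7, 3        # new evidence: a 7-3 stretch

alpha_post = alpha + wins
beta_post = beta + losses
posterior_mean = alpha_post / (alpha_post + beta_post)  # 27/50 = 0.54
```

A 7-3 run moves the estimate from 0.50 to 0.54, not to 0.70. The prior keeps ten games of evidence from being treated like a thousand.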
Shrinkage estimators, Monte Carlo simulation, variance decomposition. Each framework I picked up changed how I thought about the problem. The question stopped being who wins and became something more useful: what is the true probability distribution of outcomes, and where does the market price disagree with that distribution?
Early on, I built a backtest that showed a win rate too good to be real. Spent days trying to understand it before finding the problem: I had accidentally used closing line data to generate features that were supposed to represent pre-game information. The model had essentially been trained on the answer. In backtesting it looked like genius. In live use it would have been worthless. That is data leakage, one of the most common and most destructive mistakes in predictive modeling, and it is completely invisible unless you understand where every piece of data comes from and when it actually becomes available.
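The mechanics of that mistake are easy to reproduce. A toy example with pandas, using a hypothetical points series, showing how a rolling-average feature leaks if you forget to shift it:

```python
import pandas as pd

scores = pd.Series([24, 31, 17, 28, 21], name="points")

# Leaky: the rolling mean at game i includes game i's own score,
# so the "pre-game" feature already contains the outcome.
leaky = scores.rolling(3, min_periods=1).mean()

# Clean: shift first, so game i only sees games that finished before it.
clean = scores.shift(1).rolling(3, min_periods=1).mean()
```

One `shift(1)` is the entire difference between a feature that exists at bet time and one that quietly encodes the result. Nothing errors out either way, which is exactly why it is so dangerous.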
There is no shortcut to learning that. You learn it by building something that breaks and having to figure out why. That kind of failure teaches you in a way that reading about it never does, and it is the foundation everything else was built on.
"Seven sports. A dedicated system for each. Rebuilt every season."
The Excel model became a Python model. After that, one sport at a time, each needing its own architecture built for that market's specific dynamics. There is no single model running across everything. Seven sports means seven separate systems, each with different inputs, different data sources, and different market structures.
Every sport required building from scratch, and every season means evaluating what worked and rebuilding what didn't. The NCAAB totals model was rebuilt when tempo-based features stopped producing CLV. The NHL model was a 2-way moneyline system for four seasons before switching to 3-way regulation pricing when the draw kept eating into returns. The golf model went through a complete architectural overhaul when the original approach was found to have calibration bias. Building a model once and leaving it static is not how this works.
Each model uses exponential decay weighting so recent performance matters more than data from two years ago, without pretending a single hot week tells you anything meaningful. Bayesian shrinkage keeps sample-size uncertainty from inflating predictions on teams or players with thin historical records.
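Both ideas reduce to short functions. This is a generic sketch with illustrative parameters, not the production weighting:

```python
import numpy as np

def decay_weights(n_games: int, half_life: float = 20.0) -> np.ndarray:
    """Exponential decay: a game's weight halves every `half_life` games back.
    The last element is the most recent game and gets weight 1.0."""
    ages = np.arange(n_games)[::-1]  # oldest game has the largest age
    return 0.5 ** (ages / half_life)

def shrink(sample_mean: float, n_eff: float,
           prior_mean: float, prior_strength: float = 25.0) -> float:
    """Pull a thin-sample estimate toward the prior; deep samples barely move."""
    w = n_eff / (n_eff + prior_strength)
    return w * sample_mean + (1 - w) * prior_mean
```

With `prior_strength=25`, a team at 0.60 over 25 effective games gets shrunk to 0.55: halfway back to a 0.50 prior. The same 0.60 over 250 games barely moves. That is the sample-size discipline doing its job.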
Monte Carlo simulation handles the variance side. For any given game, I'm not producing a point estimate. I'm producing a probability distribution across 100,000 simulated outcomes, then comparing that distribution against the market price to find where the implied probability is wrong.
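In skeletal form, that comparison looks like this. The distribution shape and every number here are hypothetical; the point is the structure: simulate, count, compare against the break-even price.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model output for one game's total: mean and spread.
mu, sigma, market_line = 47.2, 9.5, 44.5

sims = rng.normal(mu, sigma, size=100_000)  # 100k simulated totals
p_over = (sims > market_line).mean()        # model's probability of the over

break_even = 110 / 210                      # implied probability at -110
edge = p_over - break_even                  # positive means the over is mispriced
```

If `p_over` comes out near 0.61 against a 0.524 break-even, the market's implied probability is wrong by roughly nine points. If it comes out at 0.53, there is no bet. Most games are the second case.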
Every prediction has a traceable path back to its inputs. I can explain any output: here is what drove the number, here is how each component was weighted, here is why this model produced this result and not another. That transparency is not a marketing feature. It is a requirement for knowing when a model is wrong and why.
All of this was built before AI made it trivially easy to generate code. When I started, if I didn't understand a concept well enough to implement it myself, the model couldn't use it. There was no tool you could describe a framework to and get a working pipeline back in seconds. Every piece had to reflect a real understanding of what it was doing and why, because there was no other way to get it running. That constraint turned out to be the whole point.
That era is over. Today anyone can describe a model to an AI and have functional-looking code in minutes. Which sounds like progress until you understand what "functional-looking" actually means. A model that runs is not a model that works. The two have nothing to do with each other.
No AI tool will flag data leakage. It will write clean, structured code using whatever features you describe, produce a beautiful backtest, and show you performance numbers that look legitimate. What it cannot do is know whether the features you fed it contain information that would not actually be available at the time of the bet. That one mistake, data that bleeds future information into the training set, makes a model look like a money printer in backtesting and a guaranteed loss in live deployment. The backtest is not lying to you. The model is lying to the backtest, and it does not know any better because nobody told it what the mistake looks like from the inside.
There are more failure modes layered on top of that. Overfitting: a model that memorized historical patterns rather than learning predictive ones, which looks excellent on in-sample data and falls apart immediately on anything new. Walk-forward test contamination: a validation set that saw information from the training set, so the out-of-sample performance is not actually out-of-sample. Feature selection that introduces look-ahead bias. Calibration that made the backtest numbers neat but degraded live probability estimates. None of these show up as errors. They show up as live results that do not match what the model promised, after the money is already in the market.
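The walk-forward discipline mentioned above is simple to state in code. A minimal generic splitter, not any library's API: train strictly on the past, test on the next slice, roll forward, never shuffle.

```python
# Walk-forward split over chronologically ordered rows: train on the past,
# test on the next slice, then roll the window forward.
def walk_forward(n_rows: int, train_size: int, test_size: int):
    start = 0
    while start + train_size + test_size <= n_rows:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

for train, test in walk_forward(n_rows=10, train_size=4, test_size=2):
    assert max(train) < min(test)  # test rows are always strictly later
```

The assertion in that loop is the whole contract: no test row may predate a training row. Break it anywhere in the pipeline, including inside feature construction, and the out-of-sample numbers stop meaning anything.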
Understanding these failure modes is not a bonus feature. It is the entire job. I know them because I built models that had them, found them, and rebuilt. That process is what protects the subscriber, not the code. Not the framework. Not the tool that generated it. The years of understanding what can go wrong and making sure it doesn't.
"Five years. Every season on the board. Good ones and bad ones."
The models have been tracked since 2021. Every pick logged with the line I received, every closing line recorded for CLV measurement. The results are documented year by year across seven sports, and that includes the losing seasons.
The 2022-23 NCAAB spreads model was down 27.6 units. The 2025-26 NHL model is currently down 12.1 units. Those numbers are on the results page the same way the good ones are. That is the only way to evaluate a betting process honestly. Cherry-picked results are not results. They are a narrative.
If lines consistently move in your direction after a pick is released, you are finding value before the market corrects to it. That is the mathematical definition of edge. A 58% CLV rate on NFL means that on 58 out of 100 NFL picks, the line moved toward my side after the pick went public. Not won. Moved. The result of any individual bet is noise. The direction lines move after you bet is signal.
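Measuring it is mechanical. A toy example with an invented three-pick log, showing the beat-the-close logic for totals:

```python
# Hypothetical pick log for totals: the line taken vs. where it closed.
picks = [
    {"side": "over",  "line_taken": 44.5, "closing_line": 46.0},
    {"side": "over",  "line_taken": 51.0, "closing_line": 50.5},
    {"side": "under", "line_taken": 47.5, "closing_line": 46.5},
]

def beat_close(pick: dict) -> bool:
    """An over has CLV when the total closes higher than the number taken;
    an under has CLV when it closes lower."""
    if pick["side"] == "over":
        return pick["closing_line"] > pick["line_taken"]
    return pick["closing_line"] < pick["line_taken"]

clv_rate = sum(beat_close(p) for p in picks) / len(picks)  # 2 of 3 here
```

Note that nothing in that calculation asks whether the bets won. That is the point: it measures whether you were ahead of the market, which is the thing that persists.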
Closing Line Value is the most honest metric available for evaluating a betting process. Win rate fluctuates with variance. CLV, tracked over hundreds of bets, converges toward your actual edge. Our CBB Totals model has the strongest CLV we track: 67% of lines have moved in our direction after release across an 806-unit sample.
A model with data leakage in its backtest produces spectacular historical numbers. It also converges toward random in live deployment, usually within a few hundred bets, because the edge was never real. The live record being in the same range as what walk-forward testing predicted across five years is the most meaningful validation this process has. It means the backtest was built honestly. No leaked features, no contaminated test sets, no optimistic calibration. What the model said it could do in testing is roughly what it has done live, including the losing seasons. That match does not happen with a model nobody stress-tested.
"The honest version. No dancing around it."
Most services give a vague answer to this question. Here is mine.
What you will not get is a guarantee. Sports betting is probabilistic by design. You will have losing weeks. You will have losing months. Anyone who tells you otherwise is either lying or doesn't understand variance. The edge accumulates in the long run, not over the next weekend.
If you are looking for a lock-of-the-week service that promises guaranteed profit, this is the wrong place. If you understand probability, track your bets, and want picks backed by five years of documented statistical work, you are in exactly the right place.
The barrier to generating a model that looks credible has never been lower. Anyone can prompt their way to a backtest, post the results, and start selling picks. What you cannot prompt your way to is understanding whether that backtest is valid, whether the features are clean, whether the live performance will match what the historical numbers showed, or what to actually fix when it doesn't.
Most of what is entering the market right now is built by people who have never watched a model fail in live deployment and had to diagnose why. They haven't seen data leakage destroy a backtest that looked real. They haven't rebuilt a calibration from scratch after realizing their variance estimates were wrong. They haven't watched three months of live results confirm or contradict five years of historical testing. The model runs. That is all they know.
The protection is not the technology. It is the understanding behind it, built the hard way, before the shortcuts existed, by someone who has already made the expensive mistakes in their own time so you don't make them with your money.
This started with a YouTube video and free time during COVID. It became something built on real statistical depth, documented results, and a process that holds up to scrutiny. If that is what you are looking for, the door is open.