A two-layer XGBoost machine learning system that decomposes college basketball games into pace and efficiency, predicts scoring totals, and identifies soft opening lines before the market corrects. Filtered to only the strongest 8+ point edges.
Unlike models that predict a single "total points" number, Whizard decomposes the game into its two fundamental components: how many possessions and how efficiently each team scores. This decomposition captures game dynamics that monolithic models miss.
Play-by-play data, team box scores, player statistics, and full schedules are aggregated from proprietary sources. Opening lines tracked across every major US sportsbook.
113+ features built: rolling windows (L5/L10), tempo control, shot selection profiles, rim attack rates, exploitation matchups, pace clashes, and conference effects.
Three gradient-boosted tree models trained independently: Pace (possessions), Home PPP (home points per possession), Away PPP.
A calibration layer corrects for systematic bias in the Pace × PPP product using a non-linear fit. Outputs final predicted total.
Predicted total vs. opening line = edge. Games with 8+ point edge fire automated alerts to Discord with the line, side, and CLV tracking.
poss = predict(xgb_pace, features)
// training target: possessions = FGA + 0.44 × FTA − OREB + TOV
home_ppp = predict(xgb_home, features)
away_ppp = predict(xgb_away, features)
raw = poss × (home_ppp + away_ppp)
total = calibrate(raw)
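The composition step can be sketched in Python. This is a minimal illustration, not the production code: the function names, the identity default for `calibrate`, and the sample numbers are all assumptions made for the example, and the three model predictions are passed in as plain numbers rather than produced by trained XGBoost models.

```python
from typing import Callable


def possessions(fga: float, fta: float, oreb: float, tov: float) -> float:
    """Box-score possession estimate: FGA + 0.44 * FTA - OREB + TOV."""
    return fga + 0.44 * fta - oreb + tov


def predict_total(
    poss: float,
    home_ppp: float,
    away_ppp: float,
    calibrate: Callable[[float], float] = lambda raw: raw,  # placeholder, not the real fit
) -> float:
    """Combine pace and efficiency into a game total, then calibrate it."""
    raw = poss * (home_ppp + away_ppp)
    return calibrate(raw)


# Hypothetical box score and model outputs, for illustration only.
print(round(possessions(fga=58, fta=20, oreb=10, tov=12), 1))             # 68.8
print(round(predict_total(poss=70.0, home_ppp=1.10, away_ppp=1.02), 1))   # 148.4
```

Passing the calibration step in as a function keeps the sketch honest about what is unknown here: the real non-linear fit would replace the identity default.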
The pace model understands how fast teams play. The PPP models understand how well they score. Together, they reconstruct the entire game.
Predicts possessions per team, which sets the tempo of the game. Dozens of features capture pace identity, rest, fatigue, tempo control, and style clashes.
Predicts how efficiently the home team converts possessions into points. Features capture shooting, shot selection, rim attack, and matchup exploitation.
Mirror of the home model, tuned for the away team's offensive efficiency against the home team's defense. Same feature architecture, separate trained weights.
CLV measures a simple thing: did you get a better number than where the line closed? Across every sport and every market, it's the single strongest predictor of whether a bettor makes money long-term. Here's exactly how it works.
CLV is straightforward: it's the difference between the price you got and the closing line. If you bet Over 150.0 and the game closes at Over 154.0, you captured 4 points of closing line value. The market moved toward your position, confirming you had a better number.
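As a rough sketch, the CLV calculation for a totals bet looks like this. The function name `clv_points` is hypothetical, and the Under case is simply the mirror image of the Over example above: an Under bettor wants the line to close lower than the number they took.

```python
def clv_points(side: str, bet_line: float, closing_line: float) -> float:
    """Closing line value in points for a totals bet.

    Positive means the line closed in the bettor's favor.
    """
    if side == "over":
        return closing_line - bet_line
    if side == "under":
        return bet_line - closing_line
    raise ValueError("side must be 'over' or 'under'")


print(clv_points("over", 150.0, 154.0))   # 4.0
print(clv_points("under", 154.0, 150.0))  # 4.0
print(clv_points("over", 150.0, 148.5))   # -1.5
```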
A sportsbook's closing line isn't random. It's the product of millions of dollars in market activity. Between when a line opens and when it closes, sharp bettors, professional syndicates, and competing algorithms all push the line toward its most accurate value. Think of it like a stock price: the more traders participating, the more efficient the price becomes.
The closing line is the market's final consensus on a game's true total, incorporating injury news, weather, lineup changes, and every available data point. Academic research on millions of bets at sharp books like Pinnacle confirms that closing lines are unbiased predictors of actual outcomes.
The logic is simple: if the closing line is the most accurate price, and you consistently get a better number, then you are consistently betting at prices that are too generous. Over enough bets, too-generous prices translate directly into a higher win rate and, in turn, profit.
This isn't a theory you have to take on faith. We've measured it directly from our data:
1 point of CLV ≈ 2% win rate increase
At our avg CLV of +1.35 pts:
Baseline 50% + (1.35 × 2%) = 52.7% implied
Actual measured: 56.5% → model adds value beyond CLV alone
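The arithmetic above can be reproduced in a few lines. This is only a sketch of the linear rule of thumb stated here (about 2 percentage points of win rate per point of CLV, from a 50% baseline); the function name and parameter names are assumptions for illustration.

```python
def implied_win_rate(clv_pts: float, pct_per_point: float = 2.0,
                     baseline_pct: float = 50.0) -> float:
    """Win rate (in percent) implied by CLV alone under the linear rule."""
    return baseline_pct + clv_pts * pct_per_point


implied = implied_win_rate(1.35)
measured = 56.5  # win rate quoted above

print(round(implied, 1))             # 52.7
print(round(measured - implied, 1))  # 3.8 points not explained by CLV alone
```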
CLV predicting profit isn't magic. It follows directly from market efficiency, probability theory, and sample size. Here's the logical chain.
A sportsbook total isn't one person's guess. It's a price shaped by millions of dollars. Between open and close, sharp bettors (who have their own models) hammer the line in the direction they believe is correct. By tip-off, the closing line reflects the aggregate wisdom of every informed participant in the market. Academic studies on Pinnacle's closing lines confirm they are statistically unbiased, meaning that, on average, the closing line matches the actual outcome.
This is the part people miss: CLV isn't just a "nice to have"; it directly translates to winning more bets. If the true Over/Under is 154, and you bet Over 150 instead of Over 154, you win every game that lands between 150 and 154 that you would have lost at the closing number. From our data across 2,500+ CBB totals bets, each point of CLV adds roughly 2% to your win rate. This is a measured, empirical relationship, not a theoretical estimate.
Any bettor can go 60% over 50 bets on pure variance. But at 2,500+ bets, the math doesn't allow luck to explain the results. The standard error of the win rate shrinks to about 1 percentage point, meaning our 56.5% win rate is 6+ standard errors above a coin flip. If bets are treated as independent, the probability of this being random noise is less than 1 in a billion. CLV is the mechanism; the large sample is the proof.
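The sample-size argument can be checked with a short calculation. This sketch assumes independent bets and a 50% coin-flip null hypothesis, and uses the normal approximation to the binomial; the helper names are illustrative.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed win rate over n independent bets."""
    return math.sqrt(p * (1.0 - p) / n)


def z_score(observed: float, null_p: float, n: int) -> float:
    """How many standard errors the observed win rate sits above the null."""
    return (observed - null_p) / standard_error(null_p, n)


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n = 2500
z = z_score(0.565, 0.5, n)
print(round(standard_error(0.5, n), 4))  # 0.01
print(round(z, 1))                       # 6.5
print(one_sided_p_value(z) < 1e-9)       # True
```

Running the same functions with `n = 50` shows why a hot streak over 50 bets proves little: the standard error there is about 7 percentage points.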
This isn't a theoretical number; we measured it directly. By bucketing our 2,500+ historical bets by their CLV and plotting the actual win rate of each bucket, a clear linear relationship emerges:
Bets with 0 CLV win at ~50% (coin flip, as expected). Bets with +2 CLV win at ~54%. Bets with +4 CLV win at ~58%. The relationship is remarkably stable.
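A bucketing analysis of this kind might look like the sketch below. The record format (a CLV value paired with a win flag), the 1-point bucket width, and the handful of toy records are assumptions made up for the example; they are not our bet history, and pushes are ignored.

```python
import math
from collections import defaultdict


def win_rate_by_clv_bucket(bets, bucket_width=1.0):
    """Group (clv_points, won) records into CLV buckets and return win rates.

    Each bucket is keyed by its lower edge, e.g. 2.0 covers [2.0, 3.0).
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [wins, bets]
    for clv, won in bets:
        bucket = math.floor(clv / bucket_width) * bucket_width
        tallies[bucket][0] += 1 if won else 0
        tallies[bucket][1] += 1
    return {b: wins / count for b, (wins, count) in sorted(tallies.items())}


# Toy records for illustration only.
toy_bets = [(0.2, True), (0.7, False), (2.1, True), (2.5, True), (2.9, False)]
for bucket, rate in win_rate_by_clv_bucket(toy_bets).items():
    print(bucket, round(rate, 3))
```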
This is the core insight: CLV is not a vanity metric. It is a direct, measurable predictor of how often your bets win. Get better numbers, win more bets. It's that simple.
The model generates hundreds of projections per day. But we only alert on bets where the model's predicted total diverges from the opening line by 8 or more points. This is the signal-to-noise threshold that separates real edges from market noise.
For every game on the slate, the model produces a predicted total. The edge is simply the distance between our prediction and the sportsbook's opening line:
// If betting OVER:
edge = pred_total − opening_line
// If betting UNDER:
edge = opening_line − pred_total
// Filter:
alert only if edge ≥ 8.0 points
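In Python, the same filter could be sketched as follows. The function name `find_edge` and its return shape are assumptions for illustration; the side is taken from the sign of the difference between the prediction and the opening line.

```python
from typing import Optional, Tuple


def find_edge(pred_total: float, opening_line: float,
              threshold: float = 8.0) -> Optional[Tuple[str, float]]:
    """Return (side, edge) when the edge clears the threshold, else None."""
    diff = pred_total - opening_line
    side = "over" if diff > 0 else "under"
    edge = abs(diff)
    if edge >= threshold:
        return side, edge
    return None


print(find_edge(158.5, 149.5))  # ('over', 9.0)
print(find_edge(140.0, 149.0))  # ('under', 9.0)
print(find_edge(152.0, 149.5))  # None
```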
At 8+ points, the model has strong conviction. These aren't coin-flip calls; they're games where the opening line is dramatically mispriced relative to the model's decomposed Pace × Efficiency projection. The market almost always corrects toward our number, generating positive CLV.
Walk-forward validated. The model never sees future data. Every backtest metric reflects real, out-of-sample predictive power.
These are the features the XGBoost models rely on most heavily: the signals that actually predict game totals.
Every bet timestamped, tracked, and verified. Wins and losses published transparently in the Discord.
The system polls live odds feeds every 2 minutes. Opening lines tracked, current lines updated, new games detected automatically.
Three XGBoost models predict possessions, home PPP, away PPP. Meta-model calibrates. Edge calculated vs. opening line.
Only games with 8+ point edge + positive CLV fire to Discord. Timestamped, with edge size, direction, and best available line.
Final scores ingested. W/L logged. CLV calculated (your line vs. close). Running P/L, hit rate, and distributions updated live.
Instead of predicting "total points," the model predicts possessions and efficiency separately, capturing the distinct mechanics of pace and scoring that monolithic models blend together.
Features like rim_advantage and two_pt_exploit measure how well a team's offensive strengths map against the opponent's specific defensive weaknesses.
Weekly retraining with strict chronological ordering. The model is tested exactly as it performs live, training only on past data. No look-ahead bias. No data leakage.
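One way to picture the walk-forward scheme is the sketch below. It assumes each game is a `(date, record)` pair and that folds advance in 7-day steps; the function name and data layout are hypothetical, and the actual model fitting is left out. The key property is that every training set ends strictly before its test week begins.

```python
from datetime import date, timedelta


def walk_forward_folds(games, first_test_day, last_day, step_days=7):
    """Yield (train, test) splits where train only contains earlier games."""
    start = first_test_day
    while start <= last_day:
        end = start + timedelta(days=step_days)
        train = [g for g in games if g[0] < start]
        test = [g for g in games if start <= g[0] < end]
        if train and test:
            yield train, test
        start = end


# Hypothetical schedule for illustration.
games = [
    (date(2024, 11, 4), "game_a"),
    (date(2024, 11, 6), "game_b"),
    (date(2024, 11, 12), "game_c"),
    (date(2024, 11, 20), "game_d"),
]
for train, test in walk_forward_folds(games, date(2024, 11, 11), date(2024, 11, 24)):
    print(len(train), "train games ->", [name for _, name in test])
```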
The system polls live odds every 2 minutes, tracking opening lines, detecting movement, calculating CLV in real-time, and firing alerts the moment a line passes the edge threshold.
Odds pulled from every major US book. Alerts include the best available line: the lowest total for overs, highest for unders, maximizing your CLV at execution.
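Picking the best available line reduces to a min or max over the books. In this sketch the book names and the dictionary layout are placeholders, not real feed data.

```python
from typing import Dict, Tuple


def best_line(side: str, lines_by_book: Dict[str, float]) -> Tuple[str, float]:
    """Return (book, total): lowest total for overs, highest for unders."""
    if side not in ("over", "under"):
        raise ValueError("side must be 'over' or 'under'")
    pick = min if side == "over" else max
    book = pick(lines_by_book, key=lines_by_book.get)
    return book, lines_by_book[book]


# Placeholder books and totals for illustration.
lines = {"book_a": 150.5, "book_b": 149.5, "book_c": 151.0}
print(best_line("over", lines))   # ('book_b', 149.5)
print(best_line("under", lines))  # ('book_c', 151.0)
```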
Line tracking freezes 1 minute before tip-off, ensuring CLV is calculated against the true pre-game close, not contaminated by live in-game movement.
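The freeze rule can be sketched as selecting the last odds snapshot taken at or before the cutoff. The snapshot format (a timestamp paired with a total) and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def closing_line(snapshots, tipoff, freeze=timedelta(minutes=1)):
    """Return the last total recorded at or before (tipoff - freeze), or None."""
    cutoff = tipoff - freeze
    eligible = [(ts, total) for ts, total in snapshots if ts <= cutoff]
    if not eligible:
        return None
    # Latest eligible snapshot wins; anything after the cutoff is ignored.
    return max(eligible, key=lambda snap: snap[0])[1]


# Hypothetical snapshots for illustration.
tipoff = datetime(2025, 1, 15, 19, 0)
snapshots = [
    (datetime(2025, 1, 15, 18, 30), 152.5),
    (datetime(2025, 1, 15, 18, 58), 153.0),
    (datetime(2025, 1, 15, 18, 59, 30), 153.5),  # inside the freeze window
    (datetime(2025, 1, 15, 19, 5), 160.0),       # live, in-game movement
]
print(closing_line(snapshots, tipoff))  # 153.0
```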
Join the Discord. Get real-time 8+ edge alerts. Start building your edge with the model.