In our previous article, we introduced baseline leakage: the systematic error that occurs when marketing mix models incorrectly attribute revenue from seasonal demand, trends, or external events to marketing campaigns. We showed how this is a fundamental flaw that inverts your understanding of what’s working, leading you to overspend during expensive periods while cutting budgets during your actual best opportunities.
But understanding that baseline leakage exists raises a more urgent question: why can’t traditional MMMs avoid it? After all, the industry has been building marketing mix models for decades. Surely someone would have solved this problem by now?
The uncomfortable answer is that baseline leakage isn’t a bug that can be patched with better tuning or more sophisticated statistics. It’s a structural limitation arising from how most MMMs are architected. When your model assumes that baseline demand and marketing effects are separable and independent—and when marketing spend is strategically timed to coincide with high-demand periods—you create a mathematical problem that has no unique solution. The model will converge to an answer, but not necessarily the correct answer.
This article explores what causes baseline leakage at a structural level, why the most common solutions don’t actually solve it, and how you can detect whether your current MMM suffers from this problem.
What causes baseline leakage?
Baseline leakage doesn’t happen randomly. It emerges predictably from the interaction between how MMMs are built and how marketing is actually practiced. Four specific factors create the conditions where baseline leakage becomes inevitable.
The correlation problem: Marketing is strategically timed
The root cause of baseline leakage is straightforward: you don’t spend marketing budget randomly throughout the year. You increase spend during periods when you expect high conversion rates and pull back when demand is naturally lower.
Every retailer increases advertising during Q4 holidays. Every tax software company concentrates spend in March and April. Every fitness brand leans into January. This is smart strategy. You’re allocating resources to periods when customer intent is highest and conversion probability is strongest.
But this strategic timing creates a near-perfect correlation between your marketing spend and baseline demand. When both variables move together, mathematical decomposition (separating out baseline from your spend) becomes ambiguous. The model sees December revenue spike at the same time December spend spikes, and it has to decide how to split credit between them.
This ambiguity is mathematically provable. When two variables are perfectly correlated, infinitely many linear combinations of them explain the observed outcome equally well; when they are merely highly correlated, many combinations fit almost identically, and noise decides between them. Your data alone cannot determine which decomposition is correct. This is what statisticians call a non-identifiable problem. Most MMMs don’t have external structure to resolve this ambiguity, so they’re solving an impossible problem.
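To see how stark the ambiguity is, here’s a minimal Python sketch with made-up numbers. Spend is timed to track a seasonal demand index exactly, and two very different baseline/marketing splits then produce the identical revenue series, so no amount of fitting can choose between them:

```python
# A minimal sketch (hypothetical numbers): when spend tracks baseline demand,
# different baseline/marketing splits are indistinguishable from the data.
import numpy as np

weeks = np.arange(52)
seasonality = 1 + 0.5 * np.sin(2 * np.pi * weeks / 52)   # baseline demand index
spend = 10_000 * seasonality                              # spend timed to follow demand

# Two very different decompositions of the same revenue curve:
revenue_a = 50_000 * seasonality + 2.0 * spend   # strong baseline, modest marketing effect
revenue_b = 30_000 * seasonality + 4.0 * spend   # weak baseline, marketing credited heavily

print(np.allclose(revenue_a, revenue_b))   # True: the data cannot tell them apart
```

Any model fitted to this series has to pick a split, and nothing in the data tells it which one is causally right.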
The model architecture problem: Separability assumptions
Most marketing mix models start with an equation that looks like this: Revenue = Baseline + Channel 1 Effect + Channel 2 Effect + … + Noise
This formulation assumes separability: that you can cleanly decompose total revenue into independent contributions from baseline factors and from each marketing channel. The baseline captures what would have happened without marketing. The channel effects capture incremental lift.
This seems intuitive, but it imposes a strong structural assumption: that baseline and marketing operate independently. In reality, they don’t. Marketing effectiveness varies based on baseline context: a YouTube awareness campaign might be highly effective in September when people are beginning holiday shopping, but less effective in late December when purchase decisions have already been made.
Similarly, baseline demand is influenced by accumulated marketing. Your brand equity—a component of baseline—is the result of years of marketing investment. The mental availability that drives organic traffic was created by past upper-funnel campaigns.
These dynamics mean that baseline and marketing are coupled, not separable. But the standard MMM equation assumes they can be pulled apart and analyzed independently. When you force a non-separable system into a separable framework, you create attribution errors.
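To make the coupling concrete, here’s a small sketch with hypothetical numbers in which the true lift per dollar scales with the seasonal context (spend is deliberately left untimed so the issue isn’t correlation this time). The separable additive fit can only report one flat effect for the whole year:

```python
# A sketch (hypothetical numbers) of a coupled system forced into a separable fit:
# the true lift per dollar depends on the seasonal context, but the additive
# model reports a single year-round effect.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
weeks = 156
season = 1 + 0.5 * np.sin(2 * np.pi * np.arange(weeks) / 52)  # 0.5 in the trough, 1.5 at the peak
spend = rng.uniform(5, 15, weeks)                             # $k per week, deliberately un-timed
revenue = 60 * season + 1.5 * season * spend + rng.normal(0, 3, weeks)  # lift scales with context

additive = LinearRegression().fit(np.column_stack([season, spend]), revenue)
flat_effect = additive.coef_[1]

print(f"separable model's lift per $ (all year): {flat_effect:.2f}")
print(f"true lift per $ in the slowest week:     {1.5 * season.min():.2f}")
print(f"true lift per $ in the peak week:        {1.5 * season.max():.2f}")
# The additive fit averages the coupled effect into one number, overstating
# off-season impact and understating peak-season impact.
```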
The forced saturation problem: Universal diminishing returns
Many widely used MMM frameworks rely on bounded response functions like Hill curves or Weibull saturation functions. These mathematical forms are chosen because they capture the idea that marketing has diminishing returns—the first dollar you spend is more effective than the ten-thousandth dollar.
But these functions force saturation to occur regardless of whether it actually exists in your data. By construction, they require that response eventually flattens as spend increases. They cannot represent a campaign that maintains linear or increasing returns across the range where you actually operate (and, yes, those absolutely exist).
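As a quick illustration, here’s what a Hill-type response curve looks like in code (the parameter names and numbers are purely illustrative). Whatever data you feed a model built on this shape, the output is capped by the ceiling parameter:

```python
# A minimal sketch of a Hill-type saturation curve (illustrative parameters):
# the response approaches `ceiling` as spend grows, by construction.
import numpy as np

def hill_response(spend, ceiling, half_saturation, shape):
    """Bounded response: flattens toward `ceiling` no matter how spend scales."""
    return ceiling * spend**shape / (half_saturation**shape + spend**shape)

for s in (5_000, 20_000, 80_000, 320_000):
    r = hill_response(s, ceiling=100_000, half_saturation=15_000, shape=1.2)
    print(f"spend ${s:>7,} -> response {r:>9,.0f}")
# Early dollars buy a lot; later dollars are forced toward the ceiling,
# even if the campaign being modeled never actually saturates.
```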
This creates a systematic bias. When a campaign is genuinely operating below its saturation point—meaning each dollar you spend is still producing roughly the same amount of lift—the model must still make the response curve bend and flatten because that’s what the mathematical function is designed to do. Where does that artificial flattening come from? Often, it leaks from baseline variation—the natural ups and downs in demand throughout the year.
Here’s how it works: Imagine you have a Facebook campaign that’s genuinely linear over your spend range. You spend $25K during December and see great results. You spend $8K during February and see proportionally lower results.
A model using forced saturation curves looks at this data and sees “evidence” of diminishing returns. Because part of each month’s baseline revenue leaks into the campaign’s credit, and baseline demand doesn’t triple just because spend does, the revenue the model attributes to the campaign grows less than proportionally as spend rises from $8K to $25K. But the actual explanation is that December has higher baseline demand than February. The campaign itself didn’t saturate. The context changed.
The forced saturation function can’t represent this reality, so it fits a saturation curve where the apparent diminishing returns arise from baseline variation getting attributed to the campaign at different spend levels. This is baseline leakage expressed through response function constraints.
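Here’s a sketch of that mechanism with hypothetical numbers: the campaign’s true lift is exactly 2.0 per dollar at every spend level, only the baseline is seasonal, and spend is ramped hardest in peak months. Fitting a flat baseline plus a Hill curve (a simplified stand-in for the structure many MMMs use) produces a media curve that is both inflated and bent:

```python
# A sketch (hypothetical numbers) of false saturation from baseline leakage.
# True campaign lift: exactly 2.0 x spend, i.e. perfectly linear.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
demand_index = np.linspace(1.0, 2.0, 24)          # months ordered from slowest to peak demand
spend = 5_000 * demand_index**2                   # spend ramped hardest in peak months
baseline = 80_000 * demand_index                  # revenue you'd earn with zero spend
revenue = baseline + 2.0 * spend + rng.normal(0, 2_000, 24)

def flat_baseline_plus_hill(s, const, top, half_sat):
    """Constant baseline plus a Hill-type saturating media response."""
    return const + top * s / (half_sat + s)

(const, top, half_sat), _ = curve_fit(
    flat_baseline_plus_hill, spend, revenue, p0=[50_000, 150_000, 10_000]
)

def fitted_marginal_return(s):
    return top * half_sat / (half_sat + s) ** 2   # slope of the fitted media curve

print("true marginal return: 2.0 at every spend level")
print(f"fitted marginal return at ${spend.min():,.0f}: {fitted_marginal_return(spend.min()):.1f}")
print(f"fitted marginal return at ${spend.max():,.0f}: {fitted_marginal_return(spend.max()):.1f}")
# The fitted media curve overstates the effect and bends within the observed
# spend range: seasonal baseline has leaked into the media term and reads as saturation.
```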
The regularization illusion: Stabilizing the wrong answer
When practitioners encounter attribution instability—channel contributions that vary dramatically across model runs or specification changes—the standard solution is to add regularization. L1/L2 penalties, or their Bayesian counterparts (Laplace and Gaussian priors), stabilize estimates by penalizing extreme solutions.
This appears to help. Regularized models produce more consistent attribution across runs. Everything feels more reliable.
But regularization doesn’t solve baseline leakage. It obscures it. Regularization reduces variance by introducing bias. It constrains the model to prefer certain types of solutions over others. In a non-identifiable problem where many decompositions fit the data equally well, regularization picks one based on its implicit preferences, not based on causal correctness.
When you apply these methods to a problem with baseline leakage, you get stable estimates, but they’re stably wrong. The model converges confidently to a decomposition that satisfies the regularization criteria while remaining causally incorrect. You’ve traded noisy answers that might alert you to a problem for precise answers that hide the problem entirely.
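A small simulation makes the trade visible. In this sketch (all numbers hypothetical, in thousands of dollars), spend is timed to follow a demand proxy, the generating spend effect is 1.5, and we bootstrap each fit to see how much the estimate moves between runs:

```python
# A sketch (hypothetical numbers, in $k) of "stably wrong": ridge makes the
# spend coefficient repeatable across resamples, but it settles on a biased
# split of the seasonal variation that spend shares with the demand proxy.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
weeks = 156
season = 1 + 0.5 * np.sin(2 * np.pi * np.arange(weeks) / 52)
demand_proxy = 10 * season + rng.normal(0, 0.3, weeks)   # e.g. a search-interest index
spend = 10 * season + rng.normal(0, 0.3, weeks)          # media spend timed to demand
revenue = 6.0 * demand_proxy + 1.5 * spend + rng.normal(0, 4, weeks)

X = np.column_stack([demand_proxy, spend])

def bootstrap_spend_coef(model, n_draws=300):
    coefs = []
    for _ in range(n_draws):                             # refit on resampled weeks
        idx = rng.integers(0, weeks, weeks)
        coefs.append(model.fit(X[idx], revenue[idx]).coef_[1])
    return np.mean(coefs), np.std(coefs)

for name, model in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=50.0))]:
    mean, sd = bootstrap_spend_coef(model)
    print(f"{name:>5}: spend coefficient {mean:5.2f} +/- {sd:.2f}   (generating value: 1.50)")
# Typical result: the unregularized estimate is noisy, while ridge barely moves
# between resamples, yet it settles at roughly twice the generating value or more.
```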
Traditional approaches that don’t solve baseline leakage
Understanding what causes baseline leakage naturally leads to the question: can’t we just fix it? Practitioners have tried multiple approaches to improve MMM reliability. Unfortunately, most of these solutions address symptoms rather than causes.
Why L1/L2 penalties and regularization aren’t enough
Many open-source MMM frameworks use L2 penalties (or their Bayesian equivalent, Gaussian priors) specifically because media attribution problems involve severe multicollinearity (different channels’ spend patterns are correlated with each other and with baseline components). L2 regularization stabilizes coefficient estimates under these conditions by shrinking them toward zero. Some frameworks use L1 penalties (Laplace priors in Bayesian terms) to create sparse solutions where only the most important effects remain non-zero.
Both approaches succeed in creating stability. Run the model ten times and you’ll get similar results each time. The problem is that you’re getting consistently wrong results, not consistently correct ones.
Regularization operates within the hypothesis class defined by your model structure. If that structure assumes separability between baseline and marketing, regularization will find the most stable separable decomposition, but it cannot make a non-separable system separable. The fundamental identifiability problem remains untouched.
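The L1 flavor has its own failure mode worth seeing. In this sketch (the same hypothetical setup as above: a demand proxy and spend that move together, with generating effects of 6.0 and 1.5), the sparsity penalty resolves the ambiguity by simply zeroing one of the two, and which one survives has nothing to do with causality:

```python
# A sketch (hypothetical numbers, in $k) of L1 sparsity under collinearity:
# Lasso breaks the tie by zeroing one regressor and handing the shared
# seasonal variation to the other.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
season = 1 + 0.5 * np.sin(2 * np.pi * np.arange(156) / 52)
demand_proxy = 10 * season + rng.normal(0, 0.3, 156)
spend = 10 * season + rng.normal(0, 0.3, 156)
revenue = 6.0 * demand_proxy + 1.5 * spend + rng.normal(0, 4, 156)  # generating split: 6.0 / 1.5

lasso = Lasso(alpha=1.0).fit(np.column_stack([demand_proxy, spend]), revenue)
print(dict(zip(["demand_proxy", "spend"], lasso.coef_.round(2))))
# One coefficient is driven to (or near) zero; the survivor absorbs the shared
# seasonal variation. The split is stable, sparse, and causally arbitrary.
```

With a slightly different penalty strength or noise draw, the roles can flip; either way, the sparse answer is chosen by the penalty, not by what actually drove revenue.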
Why longer time windows don’t solve it
Another common suggestion is that baseline leakage arises from insufficient data. If you had more observations across more time periods and different demand contexts, surely the model could tease apart baseline from marketing effects?
This intuition is wrong for a fundamental mathematical reason: more data doesn’t resolve non-identifiability. As you collect more observations, your estimates become more precise—the model becomes more confident in its answer—but you’re just getting a more confident version of whatever split your model’s built-in assumptions prefer, not necessarily the split that reflects what actually caused your sales.
This is what’s called asymptotic failure. With infinite data, a mis-specified model doesn’t eventually discover the truth. It converges with perfect confidence to the wrong answer.
Brands with five years of daily data still experience baseline leakage if their MMM uses separable decomposition. The problem is never the data volume; it’s the model structure.
Why Bayesian priors have limited impact
We’ve established that regularization—whether implemented as L1/L2 penalties or their Bayesian equivalents (Laplace/Gaussian priors)—can’t resolve the fundamental identifiability problem. But there’s an additional concern specific to Bayesian approaches: different prior choices yield different attributions, all consistent with the data but potentially inconsistent with causality (read our guide to why causality in marketing matters for a deeper dive).
Tighter priors on baseline smoothness will push more variation into marketing effects. Tighter priors on media saturation will push more variation into baseline. You’re choosing between mathematically equivalent but causally distinct explanations based on subjective prior specifications rather than causal correctness.
This is why MMM outputs are often so sensitive to prior choices and why different Bayesian MMM implementations can yield substantially different channel attributions for the same data. The priors are selecting between non-identifiable alternatives.
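Here’s a stripped-down illustration using a conjugate Gaussian model, with hypothetical data and deliberately opinionated priors (the precision values are illustrative). Two prior choices fit revenue almost equally well yet hand the seasonal variation to different components:

```python
# A sketch (hypothetical numbers, in $k) of prior sensitivity in a conjugate
# Bayesian linear model: the priors, not the data, decide who gets the credit.
import numpy as np

rng = np.random.default_rng(21)
season = 1 + 0.5 * np.sin(2 * np.pi * np.arange(156) / 52)
demand_proxy = 10 * season + rng.normal(0, 0.15, 156)
spend = 10 * season + rng.normal(0, 0.15, 156)
revenue = 6.0 * demand_proxy + 1.5 * spend + rng.normal(0, 4, 156)   # generating split: 6.0 / 1.5

X = np.column_stack([np.ones(156), demand_proxy, spend])
sigma2 = 16.0                                     # observation noise variance, assumed known

def posterior_mean(prior_precisions):
    """Posterior mean under zero-mean Gaussian priors with the given precisions."""
    lam = np.diag(prior_precisions)               # prior precision per coefficient
    return np.linalg.solve(X.T @ X / sigma2 + lam, X.T @ revenue / sigma2)

# Prior A is skeptical about media effects; Prior B is skeptical about the baseline proxy.
for name, precisions in [("Prior A", [1e-6, 1e-6, 5.0]),
                         ("Prior B", [1e-6, 5.0, 1e-6])]:
    beta = posterior_mean(precisions)
    rmse = np.sqrt(np.mean((revenue - X @ beta) ** 2))
    print(f"{name}: demand_proxy {beta[1]:5.2f}, spend {beta[2]:5.2f}, fit RMSE {rmse:.2f}")
# Both priors reproduce revenue with nearly the same error, yet they disagree
# sharply about whether the seasonal swing belongs to baseline or to media.
```

Swap the priors and the attribution swaps with them; the likelihood barely notices.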
Why incrementality test calibration can’t fix a broken model
“Use incrementality tests to calibrate your MMM.” This is perhaps the most common proposed solution. Run experiments to measure ground truth lift for key channels, then adjust your MMM to match those results.
But calibrating to experimental results cannot fix structural model mis-specification. If your MMM cannot represent the true causal mechanisms—because it assumes baseline and marketing are separable, that marketing effectiveness stays constant over time, or that channels don’t interact with each other—then experimental anchoring becomes a patch on a broken foundation.
What happens in practice is that the model absorbs the experimental constraint locally but remains globally inconsistent. You might force the model to match an incrementality test result for Facebook during the test period, but because the underlying structure hasn’t changed, the model compensates by reallocating attribution across other channels or time periods in ways that may be equally wrong.
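Here’s a compact sketch of that compensation effect, with hypothetical numbers. The model below has no seasonal baseline term; pinning channel A’s coefficient to its experimentally measured value just pushes the un-modeled baseline into channel B:

```python
# A sketch (hypothetical numbers, in $k) of why calibration can't fix structure:
# with no baseline term in the model, matching channel A to its geo-test result
# reallocates the seasonal baseline to channel B instead.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(17)
season = 1 + 0.5 * np.sin(2 * np.pi * np.arange(156) / 52)
spend_a = 10 * season + rng.normal(0, 0.3, 156)       # e.g. paid social
spend_b = 8 * season + rng.normal(0, 0.3, 156)        # e.g. paid search
revenue = 50 * season + 1.5 * spend_a + 2.0 * spend_b + rng.normal(0, 4, 156)

# "Calibration": a geo test measured channel A's true effect as 1.5 per unit of
# spend, so we fix that value and refit the rest of the (still baseline-free) model.
residual = revenue - 1.5 * spend_a
calibrated_b = LinearRegression().fit(spend_b.reshape(-1, 1), residual).coef_[0]
print(f"channel B effect after calibrating A: {calibrated_b:.2f}   (generating value: 2.00)")
# Channel A now matches its experiment, but the seasonal baseline the model
# cannot represent has simply been shifted wholesale onto channel B.
```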
Additionally, incrementality tests themselves have limitations. They provide point-in-time estimates under specific conditions, but marketing effects vary across contexts. A geo test in October tells you about October performance, not December performance. If baseline leakage means your MMM is confusing seasonal efficiency changes with saturation, calibrating to one time period won’t fix the attribution in other periods.
How to detect baseline leakage in your MMM
Understanding what causes baseline leakage and why traditional fixes don’t work raises an urgent practical question: how do you know if your current MMM suffers from this problem? Detecting baseline leakage requires looking for its characteristic signatures.
The hold-out period test: Attribution across demand contexts
The most telling diagnostic is to examine how your MMM’s attribution changes across periods with dramatically different baseline demand. If the same campaign shows wildly different effectiveness depending on the season, that’s often a sign that the model is confusing baseline variation with marketing impact.
Look at your MMM’s estimated ROAS or incremental revenue for a specific channel across different months or quarters. Focus on channels where spend is relatively consistent but outcomes vary seasonally.
For example, if you run Facebook prospecting campaigns year-round at similar daily budgets, how does the MMM’s attributed ROAS compare between your peak season (November-December) and your slowest period (February-March)?
A well-functioning MMM should show that campaign effectiveness varies based on context (maybe your targeting is more efficient when competition is lower). But the variation should be explainable and proportionate.
If your MMM shows that the same campaign delivers 5x ROAS in December and 1.5x ROAS in February—despite roughly similar spend, creative, and targeting—you should be suspicious. The most likely explanation is that the model is attributing some of December’s naturally high baseline demand to the campaign.
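If you can export attributed revenue and spend by channel and month from your MMM, the check takes a few lines. In this sketch, the file and column names are hypothetical placeholders; adapt them to however your tool exposes its decomposition:

```python
# A sketch of the cross-context diagnostic. File and column names are
# hypothetical; the point is the comparison, not the exact schema.
import pandas as pd

mmm = pd.read_csv("mmm_channel_monthly.csv")     # columns: channel, month, spend, attributed_revenue
mmm["roas"] = mmm["attributed_revenue"] / mmm["spend"]

peak = mmm["month"].isin(["Nov", "Dec"])         # peak-season months
slow = mmm["month"].isin(["Feb", "Mar"])         # slow-season months

summary = pd.DataFrame({
    "peak_roas": mmm[peak].groupby("channel")["roas"].mean(),
    "slow_roas": mmm[slow].groupby("channel")["roas"].mean(),
    "spend_ratio": mmm[peak].groupby("channel")["spend"].mean()
                   / mmm[slow].groupby("channel")["spend"].mean(),
})
summary["roas_ratio"] = summary["peak_roas"] / summary["slow_roas"]

# Channels with roughly flat spend (spend_ratio near 1) but a large seasonal
# swing in attributed ROAS are the first places to suspect baseline leakage.
print(summary.sort_values("roas_ratio", ascending=False).round(2))
```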
Limitations of baseline leakage detection
It’s important to acknowledge what this diagnostic test cannot do: it cannot tell you the exact magnitude of baseline leakage, it cannot correct it, and it cannot distinguish baseline leakage from other forms of model mis-specification.
What it can do is raise red flags that your MMM’s attribution may not be causally correct. If you see warning signs—attribution that varies too dramatically across demand contexts in ways that don’t match your operational reality—you should be skeptical of using that model’s output to guide major budget decisions.
This diagnostic is most useful for challenging your current measurement approach and motivating the search for more robust alternatives.
What comes next
We’ve now covered the technical causes of baseline leakage and the reasons why common solutions don’t actually resolve it. We’ve also shown you how to detect whether your MMM suffers from this structural problem.
The central insight is that baseline leakage isn’t a bug that can be patched. When your MMM assumes separability between baseline and marketing, forces universal saturation, and lacks the capacity to represent funnel dynamics or cross-channel interaction, it will converge to internally consistent but causally incorrect attributions. More data, better regularization, and experimental calibration cannot fix a model that’s structurally incapable of representing how marketing actually works.
So, what kind of model architecture can avoid baseline leakage? What would a structurally correct MMM look like, and what evidence exists that such approaches actually work?
In our final article, we’ll explore how mechanistic modeling addresses baseline leakage through fundamentally different design principles. Rather than trying to split revenue into independent baseline and marketing pieces, mechanistic models represent marketing as a system where demand accumulates over time in ways you can’t directly see, and where baseline conditions and marketing continuously influence each other rather than operating separately.