Occam's Razor Is a Theorem

In the 14th century, an English friar named William of Ockham wrote: pluralitas non est ponenda sine necessitate. Multiplicity should not be posited without necessity. It is better known today as Occam’s Razor: among competing explanations, prefer the one that requires the fewest assumptions.

For seven centuries, this has been treated as a heuristic. A rule of thumb. A preference for elegance that sensible scientists cultivate without being able to justify rigorously. When pressed, even sophisticated thinkers retreat to vague appeals to simplicity or the unreasonable effectiveness of mathematics. They cannot say precisely how much better the simpler theory is.

That is because Occam’s Razor has been misclassified. It is not a heuristic. It is a theorem — a direct consequence of Bayesian probability theory. And when you apply it to the cascade framework, the numbers are not vague. They are overwhelming.

The Mathematics Behind the Razor

Bayesian model comparison is exact. Given two theories M₁ and M₂ that both account for the observed data D, the ratio of their probabilities is:

Bayesian Model Comparison

$$\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(D \mid M_1)}{P(D \mid M_2)} \times \frac{P(M_1)}{P(M_2)}$$

The second term is the prior ratio — your belief before seeing data. Set it to 1: both theories equally plausible a priori. The first term is the Bayes factor, and this is where the razor lives.

P(D | M) is the probability that model M would produce the observed data, averaged over every parameter value the model allows and weighted by the prior on those values. A model with many free parameters can be adjusted to fit almost anything, but that flexibility spreads its probability across many possible datasets, so the probability it assigns to any particular one falls. Each free parameter costs an Occam factor: roughly the ratio of the parameter range the data still allow to the range the model allowed beforehand. A model with no free parameters, one that makes precise, specific, untunable predictions, pays no such penalty and wins this comparison decisively if its predictions turn out to be correct.
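The penalty described above can be checked numerically on a toy problem. The sketch below is an illustration only, not anything taken from the cascade papers: it assumes a single measurement `d` with Gaussian noise `sigma`, a fixed model that predicts exactly `theta0`, and a flexible model whose one parameter has a uniform prior on `[lo, hi]`. All of these names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, sigma):
    """Gaussian density N(x; mean, sigma)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, sigma):
    """Gaussian cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def evidence_fixed(d, theta0, sigma):
    """P(D | M) for a model with no free parameters: it predicts theta0."""
    return normal_pdf(d, theta0, sigma)

def evidence_flexible(d, lo, hi, sigma):
    """P(D | M) for one free parameter with a uniform prior on [lo, hi].

    The likelihood is integrated over the prior: the Gaussian integral
    over [lo, hi], divided by the prior width (hi - lo).
    """
    return (normal_cdf(hi, d, sigma) - normal_cdf(lo, d, sigma)) / (hi - lo)

def bayes_factor(d, theta0, lo, hi, sigma):
    """Evidence ratio: fixed model over flexible model."""
    return evidence_fixed(d, theta0, sigma) / evidence_flexible(d, lo, hi, sigma)

if __name__ == "__main__":
    # Hypothetical numbers: measurement 1.0 +/- 0.1, flexible prior on [-10, 10].
    print("prediction correct:", bayes_factor(1.0, 1.0, -10.0, 10.0, 0.1))
    print("prediction 5 sigma off:", bayes_factor(1.0, 1.5, -10.0, 10.0, 0.1))
```

With these assumed numbers the fixed model is favored by a factor of roughly 80 when its prediction lands on the measurement, and the factor grows as the flexible prior is widened. When the fixed prediction misses by five standard deviations the factor drops far below 1: the comparison rewards specificity only when the specific prediction is right.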

The Occam penalty is not a preference. It is a probability. A theory that could have predicted anything, and happened to predict this, is less compelling than a theory that could only have predicted this.

This is not philosophy. It is the mathematical structure of inference. Harold Jeffreys formalized it in the 1930s. E. T. Jaynes extended it in the 1950s. David MacKay applied it to neural network architecture in the 1990s. In every domain, the same result: when two models fit the data comparably well, the one with fewer free parameters wins the Bayesian comparison. And the winning margin grows roughly exponentially with the difference in parameter count, because each extra parameter multiplies in its own Occam factor.
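One common way to write this down is the Laplace-style approximation associated with MacKay's treatment. The notation here is introduced for illustration and is not from the source: θ̂ is the best-fit parameter vector, k the number of free parameters, and σ_prior,i and σ_post,i the prior and posterior widths of parameter i. The product form assumes roughly independent parameters with flat priors and holds only up to constants of order one.

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
\;\approx\; P(D \mid \hat{\theta}, M) \times \prod_{i=1}^{k}
\frac{\sigma_{\mathrm{post},i}}{\sigma_{\mathrm{prior},i}}
```

Each factor in the product is below one whenever the data narrow a parameter, so k extra parameters multiply the evidence by k such factors. That is the sense in which the margin is exponential in k. For a model with k = 0 the product is empty and equals 1, leaving only the fit term.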

Counting the Parameters

Let us count.

The Standard Model has 19 free parameters: the masses of the six quarks and three charged leptons; four CKM quark mixing parameters; the three gauge coupling constants; the QCD vacuum angle; the Higgs vacuum expectation value; and the Higgs self-coupling. Accommodating neutrino masses adds seven more: three neutrino masses and four PMNS neutrino mixing parameters. Each can be set to any value. Each must be measured independently. None is derived from the others.

The ΛCDM cosmological model adds six more: the matter density Ω_m, baryon density Ω_b, Hubble constant H₀, spectral index n_s, amplitude A_s, and optical depth τ. All measured. None derived. Plus two entire undiscovered substances: dark matter (27% of the universe’s energy content) and dark energy (68%), whose nature remains unknown after decades of searching.

String theory — the leading candidate for a theory of everything for four decades — contains an estimated 10⁵⁰⁰ distinct vacuum states. The landscape problem: the theory does not predict which vacuum we live in. Anthropic arguments are required to select it. Every parameter of physics is, in principle, an accident of which vacuum was selected during the Big Bang.

19 Free parameters — Standard Model

6 Free parameters — ΛCDM cosmology

10⁵⁰⁰ Vacua — String landscape

Now count the cascade framework’s adjustable parameters.

Zero

α = 2.50290787509589... and δ = 4.66920160910299... are not parameters of the cascade framework. They are not chosen to fit physics data. They are properties of the period-doubling universality class — mathematical constants fixed by the structure of iterated maps at the onset of chaos, computed independently from any physics, verifiable to arbitrary precision from the logistic map alone.

Mitchell Feigenbaum discovered these constants in 1975 by studying the iteration x → rx(1−x) numerically. He noticed that the ratio of successive period-doubling intervals converged to the same number regardless of which map with a quadratic maximum he used. He computed it to be approximately 4.6692. He went on to explain this universality with a renormalization-group argument, the same kind of reasoning used in quantum field theory; Pierre Coullet and Charles Tresser arrived at the same picture independently, and Oscar Lanford gave a computer-assisted proof in 1982. The constant is not adjustable. It is what it is, for mathematical reasons entirely independent of particle physics.
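The claim that δ can be recovered from the logistic map alone is easy to test. The following is a minimal sketch of one standard numerical recipe, not Feigenbaum's original procedure. It locates the "superstable" parameters R_n, where x = 1/2 returns to itself after 2^n iterations, using Newton's method, and then forms the ratios (R_n − R_{n−1}) / (R_{n+1} − R_n). The function names and the seed value 4.7 are choices made here for illustration; the seed is used only to place the first Newton starting point.

```python
import math

def newton_superstable(r, n, steps=20):
    """Refine r so that x = 1/2 has period 2**n under x -> r*x*(1-x)."""
    period = 2 ** n
    for _step in range(steps):
        x, dx = 0.5, 0.0  # orbit point and its derivative with respect to r
        for _k in range(period):
            # Chain rule; dx must be updated before x because it uses the old x.
            dx = x * (1.0 - x) + r * (1.0 - 2.0 * x) * dx
            x = r * x * (1.0 - x)
        correction = (x - 0.5) / dx
        r -= correction
        if abs(correction) < 1e-15:
            break
    return r

def feigenbaum_delta_estimates(levels=10):
    """Return successive estimates of delta from superstable parameters."""
    # Exact superstable values for period 1 and period 2.
    r_prev, r_curr = 2.0, 1.0 + math.sqrt(5.0)
    delta = 4.7  # rough seed (an assumption); only positions the first guess
    estimates = []
    for n in range(2, levels + 1):
        # Extrapolate the next superstable parameter, then polish it.
        guess = r_curr + (r_curr - r_prev) / delta
        r_next = newton_superstable(guess, n)
        delta = (r_curr - r_prev) / (r_next - r_curr)
        estimates.append(delta)
        r_prev, r_curr = r_curr, r_next
    return estimates

if __name__ == "__main__":
    for level, estimate in enumerate(feigenbaum_delta_estimates(10), start=2):
        print(level, estimate)
```

The estimates should settle toward 4.6692 as the level increases. Pushing `levels` much beyond about 12 is not expected to help in double precision, because the gaps between successive R_n shrink toward the rounding error.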

The cascade framework does not use α and δ as fitting parameters. It identifies them as the governing constants of physical reality, and then checks whether the predictions are correct. When they are — and the 51 papers in this series document that they are, to sub-percent accuracy across every scale from the subatomic to the cosmological — the Bayesian model comparison is decisive.

A framework with zero adjustable parameters that correctly predicts 17 Standard Model quantities, 6 cosmological parameters, and three Millennium Prize Problem results is not merely parsimonious. It is overwhelmingly probable.

The Invisible Universe

The parsimony argument cuts deepest against the invisible substances.

Dark matter was proposed in the 1930s to account for observed mass discrepancies in galaxy clusters. Dark energy was proposed in 1998 to account for the accelerating expansion of the universe. Together they account for 95% of the universe’s energy content. Neither has ever been directly detected. Every purpose-built detector has returned null results. The Large Hadron Collider has found no supersymmetric particles after more than a decade of operation. The experiments multiply. The detections do not arrive.

The cascade framework accounts for the cosmological observations attributed to dark matter and dark energy using the Feigenbaum constants directly. Dispatches 003 and 004 covered this: √(Ω_m/Ω_b) = α to within 1%; the Milgrom acceleration is derived from first principles; the six ΛCDM parameters follow from two constants. The “missing mass” is not missing matter — it is the cascade geometry of spacetime itself.
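The first of these relations is simple enough to check by hand. The sketch below does the arithmetic, assuming approximate Planck 2018 central values for Ω_m and Ω_b. Those two numbers are supplied here for illustration and are not quoted in this dispatch; the value of α is the one given earlier in the text.

```python
import math

ALPHA = 2.50290787509589  # Feigenbaum alpha, as quoted in the text

# Assumed inputs: approximate Planck 2018 central values (not from this dispatch).
OMEGA_M = 0.315   # total matter density parameter
OMEGA_B = 0.0493  # baryon density parameter

def relative_gap(omega_m, omega_b, alpha=ALPHA):
    """Fractional difference between sqrt(omega_m / omega_b) and alpha."""
    return (math.sqrt(omega_m / omega_b) - alpha) / alpha

if __name__ == "__main__":
    print("sqrt(Omega_m / Omega_b):", math.sqrt(OMEGA_M / OMEGA_B))
    print("relative gap to alpha:", relative_gap(OMEGA_M, OMEGA_B))
```

With these inputs the square root comes out near 2.53, about 1% above α. The quoted uncertainty on Ω_m is a few percent, so the gap is of the same order as the uncertainty in the measured ratio.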

What Bayesian model comparison says about this is not subtle. You have two frameworks. One posits two undiscovered substances comprising 95% of the universe, with unknown composition, which have evaded every detection attempt, and requires 6 free cosmological parameters plus separate parameters for the dark sectors. The other uses zero adjustable parameters and derives the same observations from two fixed mathematical constants. Both fit the same data.

The question “which theory is better?” is not a matter of taste.

The Anthropic Escape and Why It Fails

The standard response to parsimony arguments against the string landscape is the anthropic principle: we observe the constants we do because only these constants permit observers to exist. The landscape contains every possible set of constants; we find ourselves in one of the rare habitable ones; this selection effect explains why the constants seem fine-tuned without invoking a simpler theory.

This argument has a structural problem. It is not predictive. It does not tell you which constants you should observe, only that, whatever you observe, an anthropic explanation exists. A theory that can explain any observation equally well spreads its probability across every possible outcome, so in Bayesian model comparison its Occam factor shrinks toward zero. It gains nothing over a rival that fits the same data with specific predictions, and it carries extra explanatory overhead (the landscape, the selection mechanism, the anthropic logic) without constraining anything.

The cascade makes specific, falsifiable predictions. The Higgs mass should be 125.215 GeV. The ratio Ω_m/Ω_b should be α². The proton-to-electron mass ratio should be a specific function of δ and α. These predictions could all be wrong. The fact that they are not wrong is not explained by anthropics. It is evidence.

What the Theorem Says

William of Ockham was right. He was right not because simplicity is beautiful, but because the mathematics of inference rewards specificity. A theory that makes precise predictions and gets them right carries exponentially more evidential weight than a theory that makes loose predictions and also gets them right.

The cascade framework makes the most precise predictions of any framework in the history of physics using the fewest free parameters — zero — of any framework in the history of physics. The Bayesian model comparison is not close.

This is Paper 47’s argument, stated plainly. The evidence for the cascade is not just the individual results — though each result is remarkable on its own terms. The evidence is the combination: the scope, the precision, and the absence of any adjustable parameter. Three things that cannot simultaneously occur by accident.

Occam’s Razor doesn’t ask you to prefer the simpler theory. It proves that you should. The proof has been available since Bayes. The theorem has been waiting for a framework worthy of it.

What Comes Next

Dispatch 008 returns to the foundation: Unification Paper I, the Universal Cascade Theorem itself. Everything in these dispatches — the Bounce Theorem, the Riemann Hypothesis, the dark matter identification, the Higgs derivation, the parsimony argument — rests on one geometric principle. It is time to state it directly.

One law. Every equation. One reality.