# Conditional probability as the basic notion of probability theory

Overview
A number of conceptual debates in both the applications and the philosophy of statistics and probability depend, implicitly or explicitly, on which concept of conditional probability is used. There are two main conceptions: the ‘ratio’-based view, which takes unconditional probability as basic and conditional probability as derived (this corresponds to Kolmogorov’s approach), and the reverse view, which takes conditional probability as basic. I will call the latter the ‘conditionalist’ view. I point to a few arguments in its favor, how it relates to ‘model closure’ and a hierarchical/structural view of theories, and why it is popular among certain Bayesians as a resolution of the ‘catchall’ problem.

Disclaimer
Obviously I am only one of many to make this point. I still find it useful to record my agreement with the ‘conditionalists’. Very rough for now. Many more examples to come. Version 0.2

[Edit: I have a new appreciation for Kolmogorov’s approach after teaching it recently. If we consider a Kolmogorov ‘probability model’ to be a full probability space/probability triple, rather than just the measure, then we effectively get the same thing as advocated here. We have to imagine that we have, in principle, a sufficiently large (and generally ‘inaccessible’) background probability space to work ‘within’. Each concrete probability space, i.e. each particular model, is then a restriction of this ‘global universe’. This makes explicit the fact that we are always using at least some restriction conditions, and forces us to state the form these restriction conditions take (e.g. orthogonality conditions?).

A rough idea occurs to me: we trade off the fact that our ‘background universe’ grows exponentially as we relax closure assumptions (i.e. make fewer details irrelevant, hence more details relevant, hence more unique possibilities) against the expectation that, as we include more details, our models will become more deterministic. Hence our distributions ‘shrink’ relative to the new domains even if they are ‘bigger’ than their restriction to the old domains. So each ‘point’ in a higher dimension contains a whole universe of lower dimension. Think power sets. An interesting starting point for thinking more about the mathematical ‘universe(s)’ we work in (and how we get a hierarchy of sub-universes) is https://ncatlab.org/nlab/show/universe. See also https://en.wikipedia.org/wiki/Universe_mathematics]. See also the more recent blog post on the meaning of the terms linear/nonlinear and infinite/finite.

What conditional probability could not be
The above heading is the title of a paper by Alan Hájek (Synthese, 2003). From its abstract:

…the ratio analysis of conditional probability…has become so entrenched that it is often referred to as the definition of conditional probability. I argue that it is not even an adequate analysis of that concept…I marshal many examples from scientific and philosophical practice against the ratio analysis. I conclude more positively: we should reverse the traditional direction of analysis. Conditional probability should be taken as the primitive notion, and unconditional probability should be analyzed in terms of it.

The article is a good read and is in agreement with the positions of a number of Bayesians, as well as my own sort-of/occasional-Bayesian-but-there-are-probably-deeper-issues-to-worry-about view.

I find the article fairly convincing in and of itself; the point has been reinforced for me, however, by reading a number of similar arguments for and against, as well as by my own thinking about the nature of mathematical modelling.

I will collect some of these below, and then present my own main motivations for adopting a conditionalist view, which are somewhat independent of the Bayesian/Frequentist divide.

A collection of examples
[To fill in]
Bayesian classics
– de Finetti
– Jaynes
– ?

Internet arguments
– Gelman, Mayo, Wasserman and other characters
– Pearl, causality and conditioning

My motivation – hierarchies, closure, contradiction, expansion and invariant structure
Catchall vs conditional closure
My first post used conditional probability statements to formulate the basic idea of ‘model closure’ and argue against needing a ‘catchall’ (see the post for details). You’ll notice, however, that this argument only makes sense if you accept (as I did somewhat implicitly) that conditional probability is a basic notion and can be defined even in the absence of a joint distribution.

So, for the record, I take conditional probability as basic and definable even in the absence of unconditional distributions. Thus a ‘catchall’ unconditional distribution is not required for closure.
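To make this concrete, here is a minimal sketch in Python, assuming a toy discrete model. The names `kernel_a_given_c`, `prob` and `joint`, and all the numbers, are hypothetical illustrations rather than anything from the earlier post. The model is specified purely as a family of distributions indexed by the conditioning variable c. No distribution over c is ever supplied, yet every query p(a|c) is well defined.

```python
# Hypothetical toy model: c is a controlled setting, a is an outcome.
# The model is a family of distributions over a, one per value of c.
kernel_a_given_c = {
    "low":  {"fail": 0.1, "ok": 0.9},
    "high": {"fail": 0.4, "ok": 0.6},
}

def prob(a, c):
    """Return p(a | c) directly from the kernel; no p(c) is needed."""
    return kernel_a_given_c[c][a]

def joint(a, c, p_c):
    """Only if someone later supplies p(c) do we obtain a joint p(a, c)."""
    return prob(a, c) * p_c[c]

# Each conditional distribution is normalised on its own.
for c, dist in kernel_a_given_c.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-12

print(prob("fail", "high"))  # 0.4
```

The ratio view would run in the opposite direction: start from a joint over (a, c), which requires a distribution over c, and only then derive p(a|c). In this sketch the joint is an optional extra, built by `joint` if and when a distribution over c becomes available.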

[Another disclaimer re: the following. These views of mine have been motivated by a number of authors, from Jaynes to Gelman, to my own lecturers in mathematical modelling and physics. So while I present this as my own perspective, it is inevitably strongly derivative of a number of others’ views. Perhaps the most original part is relating this view to the ideas of structural invariance, but this concept has itself been advocated by many.]

Conditional contradiction, hierarchies, regularisation, model expansion and invariant structure
Models always use temporary, approximate closures.

We generally need to begin work by ‘fixing’ (conditioning on) ‘external’ variables and working ‘within’ a system. As illustrated in the previous post, however, we often (inevitably?) reach contradictions or inconsistencies within our models as we approach the ‘boundary’ of our model closures. This leads to the idea (for one example) of ‘singular limits’.

Again as illustrated in the previous post, one way to resolve this inconsistency is to ‘expand’ our model by embedding it in a larger model which relaxes a constraint implicit in the smaller one. This naturally leads to greater indeterminacy because of the additional degrees of freedom. The larger model is, however, often structurally isomorphic to the original (at least in some respects), and thus gives us a ‘hierarchical’ and ‘structural’, if not absolutely fixed, foundation to reason from. [Shades of Gödel.]

So my perspective is thus ‘conditionalist’, ‘hierarchicalist’ and ‘structuralist’.

A (slightly) more concrete example
Consider a model of the form

p(a|c) = ∫ p(a|b)p(b|c) db

Here we have started from the general identity p(a|c) = ∫ p(a|b,c)p(b|c) db and used the closure condition p(a|b,c) = p(a|b) to make p(a|b) ‘internal’ (invariant) relative to the ‘external’ variable (the last conditioning variable) c.

We reason as follows – we want a model (directly) independent of our controlled variables c, with only boundary values b depending in a known manner on c.
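A minimal discrete sketch of this closed model, with sums standing in for the integral. The tables, the labels (`b0`, `c0`, ...) and the helper `compose` are hypothetical and chosen only for illustration. The ‘internal’ model p(a|b) never mentions c, and c enters only through the boundary values via p(b|c).

```python
# p(b | c): how the boundary values b depend on the controlled variable c.
p_b_given_c = {
    "c0": {"b0": 0.7, "b1": 0.3},
    "c1": {"b0": 0.2, "b1": 0.8},
}

# p(a | b): the 'internal' model; by the closure assumption it ignores c.
p_a_given_b = {
    "b0": {"a0": 0.9, "a1": 0.1},
    "b1": {"a0": 0.5, "a1": 0.5},
}

def compose(p_a_given_b, p_b_given_c):
    """p(a|c) = sum_b p(a|b) p(b|c), the discrete analogue of the integral."""
    out = {}
    for c, dist_b in p_b_given_c.items():
        out[c] = {}
        for b, pb in dist_b.items():
            for a, pa in p_a_given_b[b].items():
                out[c][a] = out[c].get(a, 0.0) + pa * pb
    return out

p_a_given_c = compose(p_a_given_b, p_b_given_c)
for c, dist in p_a_given_c.items():
    print(c, dist)
```

Note that nothing here requires a distribution over c itself: the result is again a family of conditional distributions, one for each value of c.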

If we reach an internal contradiction (identified, for example, by p(a|b,c) ≠ p(a|b)), we can (hopefully) expand our model to resolve it by moving previously controlled or ignored variables into the set of explanatory variables (i.e. expanding the state space), and then rewriting things so as to recover a model of the same schematic/structural ‘causal’ form via the redefinitions

p(a|c'') = ∫ p(a|b,c') p(b,c'|c'') db dc'

Equivalently,

p(a|c'') = ∫ p(a|b') p(b'|c'') db'

where we have split c into (c',c'') and defined b' as (b,c').
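The expansion step can be sketched in the same discrete style. Everything below (the tables, the labels `x0`/`x1` for values of c' and `y0`/`y1` for values of c'', and the helpers `compose` and `closure_holds`) is a hypothetical illustration under the assumption of finite variables. The point is only that, once b' = (b, c') is treated as the internal variable, exactly the same composition rule applies.

```python
def compose(p_a_given_b, p_b_given_c):
    """p(a|c) = sum_b p(a|b) p(b|c); works for any hashable labels b."""
    out = {}
    for c, dist_b in p_b_given_c.items():
        out[c] = {}
        for b, pb in dist_b.items():
            for a, pa in p_a_given_b[b].items():
                out[c][a] = out[c].get(a, 0.0) + pa * pb
    return out

# p(a | b, c'): for b0 the answer depends on c', so p(a|b,c) = p(a|b) fails.
p_a_given_b_c1 = {
    ("b0", "x0"): {"a0": 0.9, "a1": 0.1},
    ("b0", "x1"): {"a0": 0.6, "a1": 0.4},
    ("b1", "x0"): {"a0": 0.5, "a1": 0.5},
    ("b1", "x1"): {"a0": 0.5, "a1": 0.5},
}

def closure_holds(table, tol=1e-12):
    """True if p(a | b, c') is identical across c' for every fixed b."""
    reference = {}
    for (b, _), dist in table.items():
        ref = reference.setdefault(b, dist)
        if any(abs(dist[a] - ref[a]) > tol for a in dist):
            return False
    return True

# p(b' | c'') with b' = (b, c'): c' is now internal and probabilistic.
p_bprime_given_c2 = {
    "y0": {("b0", "x0"): 0.4, ("b0", "x1"): 0.1,
           ("b1", "x0"): 0.3, ("b1", "x1"): 0.2},
    "y1": {("b0", "x0"): 0.1, ("b0", "x1"): 0.5,
           ("b1", "x0"): 0.1, ("b1", "x1"): 0.3},
}

print(closure_holds(p_a_given_b_c1))  # False: the old closure is violated

# Same structural form as before, with b' in place of b and c'' in place of c.
p_a_given_c2 = compose(p_a_given_b_c1, p_bprime_given_c2)
print(p_a_given_c2)
```

The function `compose` is unchanged from the closed case; only the partition of variables into internal and external has moved.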

We now have an expanded theory having a different partition of variable classes. This leads to greater indeterminacy in the (internal/explanatory) variables, but gives a corresponding theory which possesses the same (invariant) structure as before. By prioritising the theory form I am taking a structuralist view of the essence of mathematical and scientific theories. Variable indeterminacy is the price we pay for removing inconsistency and maintaining structure at a higher level, but it is very often worth it (and exciting) – it corresponds in many cases to ‘new’ or ‘novel’ phenomena appearing. [Bifurcations].

Again, see the previous post for a simple example of expanding a model to remove a singularity and hence introducing indeterminacy.

Observations
Note that we make crucial use of a ‘conditionalist’ and hierarchical view of model structure. Yet another reason to take conditional probability (and conditional thinking) as basic, instead of unconditional probability.

Note also that what was previously a non-probabilistic variable can always become probabilistic as we ‘shift’ where we are in the hierarchy. The position of a variable in the structure is more important than the nature of the variable itself. Another reason to not dismiss Bayesian modelling for allowing us to treat variables as probabilistic (internal to the theory) if and when we choose to – or are forced to.

A possible point of agreement with the frequentist view, however, is that we always maintain some ‘conditioned on’ but non-probabilistic variables (controlled or ‘external’ variables) as temporary scaffolding.