Substitutional vs objectual quantification

Overview/motivation
The interpretation of logical quantifiers such as ‘there exists’ and ‘for all’ (and the associated ontological implications of these interpretations) is (apparently – or at least it was) an important topic in philosophy. I encountered these interpretations a few years ago when reading Haack’s ‘Philosophy of Logics’, but didn’t pay much attention. Quine is a central figure here – e.g. his famous (within philosophy, anyway) saying ‘to be is to be the value of a variable’ concerns this issue.

I realised recently that I’ve been thinking about somewhat similar issues, albeit in a more ‘applied’ context, e.g. when talking about the interpretation of ‘for all’ in formulating ‘schematic’ model closure assumptions (see here). So, here are a few notes on the topic. The obvious disclaimer applies – I am not a philosopher. I’m just hoping to get a few basic conceptual ideas straightened out in my head, so that I may better formalise some arguments useful in science and statistics. I am not aiming to ‘solve’ the general philosophical problems! Corrections or comments welcome.

The problem
Here is a brief sketch of the issue as it arises for the existential quantifier. The question is: how should we interpret statements of quantified logic of the form

\exists x P(x)

We have (or there exists??), in fact, two options.

Objectual: There exists an object x such that it has property P.

Substitutional: Some instance of the general statement form P(x), obtained by substituting a name, term, expression etc. for x, is true.

In the former, the emphasis is placed on objects and their possession of properties; in the latter, the emphasis is placed on statement forms and the truth of particular statement instances.

In particular, in the latter, substitutional, case truth is a property of statements ‘as a whole’ and need not relate to ‘actual objects’ occurring in the sentence.

The classic example is that, on the objectual reading,

(S): Pegasus is a flying horse

can be taken to mean

(S via Obj.): “There exists an object (e.g. Pegasus) which is both a horse and can fly”

We would normally take this as false, since no such object ‘really exists’. On the other hand, on the substitutional reading we may take this to mean

(S via Subs.): There is a true statement of the form ‘x is a flying horse’ (e.g. Pegasus is a flying horse)

The justification for taking this as true is that, given our knowledge of mythology (certainly a real subject itself), we may take this to express a true statement without further commitment to (or even ‘attention to’) the existence of the ‘objects’ or ‘properties’ involved.

Thus the substitutional interpretation refers to the truth or falsity of resultant sentences/statements ‘as a whole’ (and the forms of such sentences/statements), while the objectual interpretation refers to the existence of objects with properties, and hence in a sense gives a more ‘granular’ interpretation.
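At the risk of trivialising the philosophy, the difference can be caricatured as two different ‘evaluation procedures’. Here is a minimal Python sketch (the domain, the names and the truth-valuation below are entirely made up for illustration):

# Objectual reading: quantify over a domain of existing objects and check a property of each.
existing_objects = ["Secretariat", "Phar Lap"]   # a made-up domain containing no flying horses

def is_flying_horse(obj):
    return False                                 # no existing object has the property

objectual = any(is_flying_horse(obj) for obj in existing_objects)
print(objectual)        # False: no object is a flying horse

# Substitutional reading: quantify over names/terms and check the truth of whole sentences.
names = ["Pegasus", "Secretariat"]
true_sentences = {"Pegasus is a flying horse"}   # treated as 'true' given the context of mythology

substitutional = any((name + " is a flying horse") in true_sentences for name in names)
print(substitutional)   # True: some substitution instance is a true sentence

In the second procedure truth is assigned to whole sentences, with no further commitment to the objects themselves.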

Both seem to me to involve subtle issues of context, however – e.g. we can presumably only interpret the above statement instance as ‘true’ in the substitutional interpretation given the context of mythology.

Marcus and Kripke offered defenses of the substitutional interpretation while Quine advocated the objectual interpretation (hence ‘to be is to be the value of a variable’).

There is obviously much more to this topic – see e.g. Haack’s book, the SEP. For now, I note that I find myself reasonably sympathetic to the substitutional interpretation (or perhaps both interpretations, depending on the circumstances). This appears to be roughly consistent with what I was attempting to express here.

There also seems to be something here that depends on whether, given the ‘function’ P(x), we focus on the ‘domain of naming’ or on the ‘codomain of statements’. These issues hence also seem to connect with the issue of how to interpret (proper) names e.g. as ‘mere tags’ (Marcus), ‘rigid designators’ (Kripke), ‘definite descriptions’ (Russell) or as ‘predicates’ (Quine). The substitutional interpretation is generally allied with the view of proper names as ‘mere tags’ or as ‘rigid designators’, and I have become quite fond of (what I understand by) this idea. It would be too much to go into this in any detail at the moment, however.

The tacking ‘paradox’ revisited – notes on the dimension and ordering of ‘propositional space’

Another short (and simple) note on the so-called tacking paradox from the philosophy of science literature. Continuing on from here and related to a recent blog comments exchange here. See those links for the proper background.

[Disclaimer: written quickly and using wordpress latex haphazardly with little regard for aesthetics…]

Consider a scientific theory with two ‘free’ or ‘unknown’ parameters, a and b say. This theory is a function f(a,b) which outputs predictions y. I will assume this is a deterministic function for simplicity.

Suppose further that each of the parameters is discrete-valued and can take values in \{0,1\}. Assuming that there is no other known constraint (i.e. they are ‘variation independent’ parameters) then the set of possible values is the set of all pairs of the form

(a,b) \in \{(0,0), (0,1), (1,0), (1,1)\}

That is, (a,b) \in \{0,1\}\times \{0,1\}. Just to be simple-minded let’s arrange these possibilities in a matrix giving

\begin{pmatrix} (0,0), & (0,1)\\(1,0), & (1,1) \end{pmatrix}

This leads to a set of predictions for each possibility, again arranged in a matrix

\begin{pmatrix} f(0,0), & f(0,1)\\ f(1,0), & f(1,1) \end{pmatrix}

Now our goal is to determine which of these cases are consistent with, supported by and/or confirmed by some given data (measured output) y_0.

Suppose we define another function of these two parameters to represent this and call it C(a,b;y_0) for ‘consistency of’ or, if you are more ambitious, ‘confirmation of’ any particular pair of values (a,b) with respect to the observed data y_0.

For simplicity we will suppose that f(a,b) outputs a definite y value which can be definitively compared to the given y_0. We will then require C(a,b;y_0) = 1 iff f(a,b) = y_0, and C(a,b;y_0) = 0 otherwise. That is, it outputs 1 if the prediction given the a and b values matches the observed data, and 0 if it does not. Since y_0 is fixed here I will drop it, i.e. I will write C(a,b) without reference to y_0.

Now suppose that we find the following results for our particular case

\begin{pmatrix} C(0,0) = 1, & C(0,1) = 1\\ C(1,0) = 0, & C(1,1) = 0 \end{pmatrix}

How could we interpret this? We could say e.g. (0,0) and (0,1) are ‘confirmed/consistent’ (i.e. C(0,0) = C(0,1) = 1), or we could shorten this to say (0,\cdot) is confirmed for any replacement of the second argument. Clearly this corresponds to a case where the first argument is ‘doing all the work’ in determining whether or not the theory matches observations.
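To make this concrete, here is a minimal Python sketch of the setup (the particular f below, in which only the first argument matters, is my own made-up instance chosen to reproduce the matrix above):

import itertools

def f(a, b):
    return a                        # made-up deterministic theory: predictions depend only on a

y0 = 0                              # the 'observed' output

def C(a, b):
    # consistency/'confirmation' function: 1 iff the prediction matches the data y0
    return 1 if f(a, b) == y0 else 0

for a, b in itertools.product([0, 1], repeat=2):
    print((a, b), C(a, b))          # reproduces the matrix: C(0,0)=C(0,1)=1, C(1,0)=C(1,1)=0

# 'C(0, .) = 1' is a quantified statement over the second argument:
print(all(C(0, b) == 1 for b in [0, 1]))   # True

# ...whereas 'C(b)' is simply undefined: C takes two arguments.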

Now the ‘tacking paradox’ argument is essentially:

C(0,0) = 1

so

(0,0)

is confirmed, i.e. ‘a=0 & b=0’ is confirmed. But ‘a=0 & b=0’ logically implies ‘b=0’ so we should want to say ‘b=0’ is confirmed. But we saw

C(0,1) =1

and so

(0,1)

is also confirmed, which under the same reasoning gives that ‘b=1’ is confirmed!

Contradiction!

There are a number of problems with this argument, which I would argue are particularly obscured by the slip into simplistic propositional-logic reasoning.

In particular, we started with a clearly defined function of two variables, C(a,b). We then found that in our particular case we could reduce some statements involving C(a,b) to an ‘essentially’ one-argument expression of the form ‘C(0,\cdot) = 1’ or ‘(0,\cdot) is confirmed’, i.e. we have confirmation for a=0 and b ‘arbitrary’. This is just ‘quantifying’ over the second argument – we can’t leave any free (as opposed to bound) variables. But then we are led to ask

What does it mean to say ‘b is confirmed’ in terms of our original givens?

Is this supposed to refer to C(b)? But this is undefined – C is of course a function of two variables. Also, b is a free (unbound) variable in this expression. Our previous expression had one fixed and one quantified variable, which is different to having a function of one variable.

OK – what about trying something similar to the previous case then? That is, what about saying C(\cdot,0) = 1? But this is shorthand for the claim that both C(0,0) = 1 and C(1,0) = 1 hold (or that their conjunction is confirmed, if you must). This is clearly not true. Similarly for C(\cdot,1).

So we can clearly see that when our theory and hence our ‘confirmation’ function is a function of two variables we can only ‘localise’ when we spot a pattern in the overall configuration, such as our observation that C(0,\cdot) = 1 holds.

So, while the values of the C function (i.e. the outputs 0 and 1) are ordered (or can be assumed to be), this ordering is not guaranteed to survive being ‘pulled back’ to the parameter space. That is, taking the preimage C^{-1}\{1\} does not induce an ordering on a parameter space that doesn’t already admit one! It also doesn’t allow us to magically reduce a function of two variables to a function of one without explicit further assumptions. Without these we are left with ‘free’ (unbound) variables.

This is essentially a type error – I take a scientific theory (here) to be a function of the form

f: A \times B \rightarrow Y,

i.e. a function from a two-dimensional ‘parameter’ (or ‘proposition’) space to a (here) one-dimensional ‘data’ (or ‘prediction’, ‘output’ etc) space. The error (or ‘paradox’) occurs when taking a scientific theory to be simply the parameter space

A \times B,

rather than a function defined on that space.

That is, the paradox arises from a failure to explicitly specify how the parameters of the theory are to be evaluated against data, i.e. a failure to give a ‘measurement model’.

(Note: Bayesian statistics does of course allow us to reduce a function of two variables to a function of one via marginalisation, given assumptions about the joint distribution, but this process again illustrates that there is no paradox; see previous posts).
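For concreteness, here is a small sketch of what such a marginalisation looks like in this toy setting (plain Python; the uniform prior and the deterministic ‘likelihood’ values are my own illustrative choices):

# Joint posterior over (a, b) given y_0, with a uniform prior and p(y_0|a,b) = C(a,b) as above.
prior = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}   # uniform prior over the four pairs
like  = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}     # the C(a,b) values from the example

unnorm = {ab: prior[ab] * like[ab] for ab in prior}
Z = sum(unnorm.values())
post = {ab: p / Z for ab, p in unnorm.items()}           # p(a, b | y_0)

# Marginalise to obtain legitimate one-argument statements:
p_a = {a: sum(post[(a, b)] for b in (0, 1)) for a in (0, 1)}
p_b = {b: sum(post[(a, b)] for a in (0, 1)) for b in (0, 1)}
print(p_a)   # {0: 1.0, 1: 0.0} -> the data are informative about a
print(p_b)   # {0: 0.5, 1: 0.5} -> b stays at its prior: no contradiction, no paradox

The reduction from two arguments to one is carried out by an explicit, well-defined operation (marginalisation over a joint distribution), not by a notational slide.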

One objection is to say – “well this clearly shows a ‘logic’ of confirmation is impossible”. Staying agnostic with respect to this response, I would instead argue that what it shows is that:

The ‘logic’ of scientific theories cannot be a logic only of ‘one-dimensional’ simple propositions. A scientific theory is described, at the very, very minimum, by a ‘vector’ of such propositions (i.e. by a vector of parameters), which in turn leads to ‘testable’ predictions (outputs from the theory). That is, scientific theories are specified by multivariable functions. To reduce such functions of collections of propositions, e.g. a function f(a,b) of a pair (a,b) of propositions, to functions of fewer propositions, e.g. ‘f(a)’, requires – again at the very, very minimum – the use of quantifiers over the ‘removed’ variables, e.g. ‘f(a,b) = f(a,-) for all choices of b’.

Normal probability theory (e.g. the use of Bayesian statistics) is still a potential candidate in the sense that it extends to the multivariable case and allows function reduction via marginalisation. Similarly, pure likelihood theory involves concepts like profile likelihood to reduce dimension (localise inferences). While these are standard topics of discussion in the statistical literature (e.g. ‘nuisance parameter elimination’), they appear to be somewhat overlooked in the philosophical discussions I’ve seen.

So this particular argument is not, to me, a good one against Bayes/Likelihood approaches.

(I am, however, generally sympathetic to the idea that C functions like that above are better considered as consistency functions rather than as confirmation functions – in this case the, still fundamentally ill-posed, paradox ‘argument’ is blocked right from the start, since it is ‘reasonable’ for both ‘b=0’ and ‘b=1’ to be consistent with observations. On the other hand it is still not clear how you are supposed to get from a function of two variables to a function of one. Logicians may notice that there are also interesting similarities with intuitionistic/constructive logic (see here or here) and/or modal logics (see here) – I might get around to discussing this in more detail someday…)

To conclude: the slip into the language of simple propositional logic, after starting from a mathematically well-posed problem, allows one to ‘sneak in’ a ‘reduction’ of the parameter space, but leaves us trying to evaluate a mathematically undefined function like C(b=0).

The tacking ‘paradox’ is thus a ‘non-problem’ caused by unclear language/notation.

Addendum – recently, while searching to see if people have made similar points before, I came across this nice post ‘Probability theory does not extend logic‘. 

The basic point is that while probability theory uncontroversially ‘extends’ what I have called simple ‘one-dimensional’ propositional logic here, it does not uncontroversially extend predicate logic (i.e. the basic logical language required for mathematics, which uses quantifiers) nor logic involving relationships between quantities requiring considerations ‘along different dimensions’. 

While probability theory can typically be made to ‘play nice’ with predicate logic and other systems of interest, it is important to note that it is usually the predicate logic or the functional relationships – basically, the rest of the mathematical language – doing the work, not the fact that we replace atomic T/F values with real-number judgements. Furthermore, the formal justifications of probability theory as an extension of logic used in the propositional case do not translate in any straightforward way to these more complicated logical or mathematical systems.

Interestingly for the Cox-Jaynesians, (R.T.) Cox appears to have been aware of this, and even considered extensions involving ‘vectors of propositions’ – leading to systems which no longer satisfy all the Boolean logic rules (see e.g. the second chapter of his book) – whereas Jaynes appears to have missed the point (see e.g. Section 1.8.2 of his book). As hinted at above, some of the ambiguities encountered are potentially traceable, or at least translatable, into differences between classical and constructive logic; Jaynes appears to have misunderstood the key issues here too, but again that’s a topic for another day.

Now, all of this is not to say that Bayesian statistics as practiced is either right or wrong, but rather that the focus on simple propositional logic is the source of numerous confusions on both sides.

Real science and real applications of probability theory involve much more than ‘one-dimensional’ propositional logic. Addressing these more complex cases involves numerous unsolved problems.

Hierarchical Bayes

This is ‘Not a Research Blog’, but nevertheless some thoughts on, and application of, hierarchical Bayes that are related to what I’ve been posting about here can be found in my recent preprint:

A hierarchical Bayesian framework for understanding the spatiotemporal dynamics of the intestinal epithelium

A few comments. I actually wrote essentially all of this about a year ago. The quirks of interdisciplinary research mean, however, that I have only just recently been able to post even a preprint of this work online (data/other manuscript availability issues etc). Some of my views may have changed slightly since then – but probably not overly much (and most of the differences would relate to alternative frameworks rather than modifications of the present approach). Of course the usual delays of publication mean this happens fairly often – yet another reason for using preprints. This was also my first bioRxiv submission (bioRxiv is essentially arXiv targeted specifically at biology and biological applications) – it was extremely easy to use and went through screening in less than a day.

This manuscript was also a first attempt to pull together a lot of ideas I’d been playing around with relating to hierarchical models, statistical inference, prediction, evidence, causality, discrete vs continuum mechanistic models, model checking etc, and apply them to a real problem with real data. As such it’s reasonably long, but I think readable enough. In some ways it probably reads more like a textbook, but some might find that useful so I’ve tried to frame that as a positive.

Linear or nonlinear with respect to what?

Overview
I’m teaching a partial differential equations (PDEs) course in the mathematics department at the moment. A typical ‘gimme’ question for assignments and tests is to get the students to classify a given equation as linear or nonlinear (most of the theory we develop in the course is for linear equations so we need to know what this means). Since we aim to introduce the students to a bit of operator theory we often switch back and forward between talking about linear/nonlinear PDEs and linear/nonlinear operators.

One of the students noticed that this introduced some ambiguity into our classification problem and asked a great question. I think it illustrates a useful general point about terminology like linear vs nonlinear and how these terms can be misleading or ambiguous. So here’s the question and my attempt at clarifying the ambiguity.

The question
It’s my understanding that a PDE is linear if we can write it in the form Lu = f(x,t), where L is a linear differential operator.

If we are given a PDE that looks like Au = 0 for some differential operator A and asked to show that the PDE is nonlinear, I can (probably) show that A is not a linear differential operator. However this doesn’t necessarily imply that you cannot rearrange the equation in such a way as to make it linear.

For example the operator A defined by

Au = (u^2+1)u_t+(u^2+1)u_{xx}

is not a linear differential operator. However the equation Au = 0 is the same as u_t + u_{xx} = 0, and the differential operator B defined by Bu = u_t + u_{xx} is linear.

So (I believe I’m correct in saying this), the original PDE is linear, because it can be rewritten in the form Lu = f(x,t) for some linear differential operator L and function f(x,t).

My question is what sort of working are we expected to show, if we aim to prove the PDE Au=0 is not linear? For the purposes of the assignment does it suffice to prove that A is not linear?

My response
Here was my response and attempt to clarify (corrections/comments welcome!).

Great question!

As you’ve noticed, there is some ambiguity when we move back and forward between talking about equations and operators. This is to be expected, since a function (e.g. an operator) is a different type of mathematical object to an equation.

For example the function f:x \mapsto x^2 is a different ‘object’ to the equation x^2 = 0.

You’ve correctly noticed that if we can write a differential equation as Lu = f, where L is some linear operator, then the differential equation is also called linear. Unfortunately, again as you’ve noticed, this definition makes it hard to decide when an equation is nonlinear, since you may be able to write a linear equation in terms of a nonlinear operator with the right choice of f. This is because the negation of ‘there exists a linear operator representation’ is ‘there does not exist any linear operator representation’ – exhibiting one nonlinear operator is not enough.

So proving that an equation is linear is easy using the operator definition – we just find any linear operator that works.

On the other hand, proving that an equation is nonlinear is harder using this definition – it would require showing that every operator A (and choice of f) for which the equation can be written as Au = f is nonlinear.

This seems too hard to do directly, so let’s reformulate it in an equivalent but easier-to-use way.

We want to keep our definitions of linear and nonlinear as close as possible for the two cases of operators and equations.

So, how about:

Improved definitions

An operator L acting on functions u is linear iff L(au+bv) = aL(u) + bL(v) for any u and v in the operator’s domain and any constants a, b.

and

Given an equation written in the form Au = f for some operator A and forcing function f, the equation is linear iff A(au+bv) = aA(u) + bA(v) for any two solutions u, v of the equation Au = f and any constants a, b.

I think this definition should cover your example (try it! Note that it is slightly subtle how this makes a difference! But, basically, we get to use the f = 0 in the equation case now).
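For what it’s worth, here is a small sympy sketch of the two checks for the example above (purely illustrative: ‘operator linearity’ is tested as an identity in arbitrary u and v, ‘equation linearity’ only on solutions of Au = 0):

import sympy as sp

x, t, a, b = sp.symbols('x t a b')
u = sp.Function('u')(x, t)
v = sp.Function('v')(x, t)

def A(w):
    # the operator from the question: Aw = (w^2+1)w_t + (w^2+1)w_xx
    return (w**2 + 1)*sp.diff(w, t) + (w**2 + 1)*sp.diff(w, x, 2)

# Operator linearity: A(au+bv) - (aA(u) + bA(v)) should vanish identically in u and v.
gap = A(a*u + b*v) - (a*A(u) + b*A(v))
print(sp.simplify(gap))                      # not identically zero -> A is not a linear operator

# Equation linearity: restrict to solutions of Au = 0, i.e. substitute u_t = -u_xx and v_t = -v_xx.
on_solutions = {sp.diff(u, t): -sp.diff(u, x, 2), sp.diff(v, t): -sp.diff(v, x, 2)}
print(sp.simplify(gap.subs(on_solutions)))   # 0 -> the equation Au = 0 is linear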

Also note that:

The operator definition now explicitly talks about linearity with respect to how the operator acts on objects in its domain, while the equation definition talks explicitly about behaviour with respect to solutions of that equation. This seems natural given the different ‘nature’ of ‘functions’ and ‘equations’.

Does that make sense?

Morally speaking
I think the broader lesson is that terms like linear/nonlinear are relative to the specific mathematical representation chosen and how we interact with that representation. A ‘system’ is not really intrinsically linear or nonlinear, rather an ‘action’ (or function or operator or process) is linear or nonlinear with respect to a specific set of ‘objects’ or ‘measurements’ or ‘perturbations’ or whatever. This needs to be made explicit for an unambiguous classification to be carried out.

Generalisation
Perhaps generalising too far, something like this came up in some recent ‘philosophical’ discussions I’ve been having over at Mayo’s blog (and was also at the heart of another scientific disagreement I once had with an experimentalist about interpreting aquaporin knockout experiments…).

For example, it has been pointed out that while ‘chaos’ is typically associated with (usually finite-dimensional) nonlinear systems, there are examples of infinite-dimensional linear systems that exhibit all the hallmarks of chaos – see e.g. ‘Linear vs nonlinear and infinite vs finite: An interpretation of chaos’ by Protopopescu for just one example. So, changing the underlying ‘objects’ used in the representation changes the classification as ‘linear’ or ‘nonlinear’. Or, as Protopopescu states:

Linear and nonlinear are somewhat interchangeable features, depending on scale and representation…chaotic behavior occurs… when we have to deal with infinite amounts of information at a finite level of operability. In this sense, even the most deterministic system will behave stochastically due to unavoidable and unknown truncations of information.

This theme appears again and again at various levels of abstraction – e.g. we saw it in a high-school math problem where a singularity (a type of ‘lack of regularity’) arose, which we interpreted as being due to an incompatibility between a regular higher-dimensional system and a constraint restricting that system to a lower-dimensional space. (Compare the abstract operator itself with the operator + equating it to zero to get an equation.) We were faced with the choice of a regular but underdetermined system that required additional information for a unique solution or a ‘unique’ but singular (effectively overdetermined) system. Similarly, other ‘irregular’ behaviour like ‘irreversibility’ can often be thought of as arising due to a combination of ‘reversible’ (symmetric/regular etc) microscopic laws + asymmetric boundary conditions/incomplete measurement constraints. Similar connections between ‘low/high dimensional’ systems and ‘stable/unstable’ systems are discussed by Kuehn in ‘The curse of instability’.

To me this presents a helpful heuristic decomposition of models of the world into two-level decompositions like ‘irregular nature’ -> ‘regular, high-dimensional nature’ + ‘limited accessibility to nature’ (h/t Plato) or ‘internal dynamics’ + ‘boundary conditions’, ‘reversible laws’ + ‘irreversible reductions/coarse-graining’ etc. Note also that, on this view, ‘infinite’ and ‘finite’ are effectively ‘relative’, ‘structural’ concepts – if our ‘access’ to the ‘real world’ is always and intrinsically limited, it leads us to perceive the world as effectively infinite (in some sense) regardless of whether the world is ‘actually’ infinite. You still can’t really avoid ‘structural infinities’ – e.g. continuous transformations – though.

It seems clear that this also inevitably introduces ‘measurement problems’ that aren’t that dissimilar to those considered to be intrinsic to quantum mechanics into even ‘classical’ systems, and leads to ideas like conceiving of ‘stochastic’ models as ‘chaotic deterministic’ systems and vice-versa.

Recent reading: a miscellany of slightly obscure things

Sometimes I forget which things I’m currently reading (i.e. dipping in and out of). So, here are a few notes, mainly to myself and mainly about books and more obscure sources than the usual current research papers.

A couple of things on category theory: Category Theory for the Sciences by Spivak and Sets for Mathematics by Lawvere and Rosebrugh. (Also Mathematical Physics by Geroch, but that is more of a broad coverage of essential mathematics using category theory than a book introducing/studying category theory itself.) Really enjoying both. Would like to code up some of the content of Spivak to illustrate the main ideas.

A few things on mathematical biology/physiology etc (mainly for work/background I should know but have either forgotten or not learned). Mathematical Physiology by Keener and Sneyd (the latter being my old PhD supervisor). Free Energy Transduction and Biochemical Cycle Kinetics by Hill (as well as the older, longer version). An underrated book – I need to summarise the best bits at some point. Basic Principles of Membrane Transport by Schultz. Another great classic, helped me a lot during my PhD. Both a bit old but the main thing that seems to have changed is that we have actually identified a lot of the proteins behind the mechanisms originally predicted based on coarse information and largely theoretical modelling!

Stochastic Modelling for Systems Biology by Wilkinson, Chemical Biophysics by Qian and Beard, and Stochastic Processes in Physics and Chemistry by van Kampen. Good complements to the above books, generally more focused on stochastic aspects, but still similar concepts. See also the papers Entropy Production in Mesoscopic Stochastic Thermodynamics: Nonequilibrium Kinetic Cycles Driven by Chemical Potentials, Temperatures, and Mechanical Forces by Qian et al. as well as Contact Geometry of Mesoscopic Thermodynamics and Dynamics by Grmela. Also, the book Statistical Thermodynamics of Nonequilibrium Processes by Keizer. Should summarise the various key concepts and how to think about ‘mesoscopic’ processes in biology.

A few references on mechanics: some point particle stuff (want to use in some applications), also differential geometry, symmetry etc. Introduction to Physical Modelling by Wellstead (mainly interested in the ‘mobility analogy’). The Variational Principles of Mechanics by Lanczos (a classic!). Analytical Dynamics by Udwadia and Kalaba. Nonholonomic Mechanics and Control by Bloch et al. First Steps in Differential Geometry: Riemannian, Contact, Symplectic by McInerney. Discrete Differential Geometry: An Applied Introduction by Grinspun et al. Foundations of Mechanics by Abraham and Marsden. Introduction to Mechanics and Symmetry by Ratiu and Marsden. Mathematical Foundations of Elasticity by Marsden and Hughes. Also the paper: ‘On the Nature of Constraints for Continua Undergoing Dissipative Processes’ by Rajagopal and Srinivasa.

Dynamical systems (research and teaching – solution and analysis methods): Numerical Continuation Methods for Dynamical Systems by Krauskopf, Osinga and Galan-Vioque. Recipes for Continuation by Dankowicz and Schilder. Stability, Instability and Chaos by Glendinning. Nonlinear Systems by Drazin. Elements of Applied Bifurcation Theory by Kuznetsov. Applications of Lie Groups to Differential Equations by Olver. Scaling by Barenblatt. Renormalization Methods: A Guide For Beginners by McComb. Multiple Time Scale Dynamics by Kuehn.

Measure, Integral and Probability by Capinski and Kopp, Integral, Measure and Derivative by Shilov and Gurevich, and Hilbert Space Methods in Probability and Statistical Inference by Small and McLeish (see also Functional Analysis by Muscat). Probability via Expectation by Whittle. Functional Analysis for Probability and Stochastic Processes: An Introduction by Bobrowski. Trying to decide on my preferred abstract framework for thinking about these topics. Each presents a slightly different perspective, each has its strengths and weaknesses. Will have to write a ‘compare and contrast’ to help me decide. I’ve pretty well decided on the functional analysis point of view. Update: see also Differential-Geometrical Methods in Statistics by Amari and Differential Geometry and Statistics by Murray and Rice. So basically: functional analysis + differential geometry seems to be the way to go. Same as for mechanics.

Related to the above, a few books (and a paper or two) on inverse problems, parameter estimation, Bayesian inference and numerical approximation. Data Assimilation: A Mathematical Introduction by Law, Stuart and Zygalakis. Inverse Problems: A Bayesian Perspective by Stuart. Mapping of Probabilities by Tarantola (as well as his classic book Inverse Problem Theory). Statistical and Computational Inverse Problems by Kaipio and Somersalo. PTLoS by Jaynes (Ch. 18; I keep reinventing something similar to this but don’t quite understand it. I think it might correspond to reinventing the functional analysis approach?). Data Analysis and Approximate Models by Davies. Moore, Kearfott and Cloud Introduction to Interval Analysis. Measuring Statistical Evidence Using Relative Belief by Evans. Theoretical Numerical Analysis: A Functional Analysis Framework by Atkinson and Han. Moore and Cloud Computational Functional Analysis. Discrete and Continuous Boundary Problems by Atkinson. Fletcher Computational Galerkin Methods. Functional Data Analysis by Ramsay and Silverman.

Teaching PDEs: Partial Differential Equations for Scientists and Engineers by Farlow. Applied Mathematics by Logan. Partial Differential Equations by Evans. Advanced Engineering Mathematics by Greenberg. Green’s Functions and Boundary Value Problems by Stakgold. Principles and Techniques of Applied Mathematics by Friedman. Partial Differential Equations of Applied Mathematics by Zauderer. A First Course in Continuum Mechanics by Gonzalez and Stuart. Physical Foundations of Continuum Mechanics by Murdoch. Nonlinear Partial Differential Equations by Debnath. Mathematical Methods for Engineers and Scientists 3: Fourier Analysis, Partial Differential Equations and Variational Methods by Tang. Methods of Mathematical Physics II by Courant and Hilbert. Ames Nonlinear PDEs in Engineering. Ern and Guermond Theory and Practice of Finite Elements. Still need to find a book I really like that balances mathematical, numerical and physical concepts at the right level. The short article Generalized Solutions by Tao is nice.

Conditional probability as the basic notion of probability theory

Overview
A number of conceptual debates in both applications and philosophy of statistics and probability implicitly or explicitly depend on which concept of conditional probability is used. In particular there are two main conceptions floating about – ‘ratio’ based, which takes unconditional probability as basic and conditional probability as derived (and corresponds to Kolmogorov’s approach), and the reverse case which takes conditional probability as basic. I will call this the ‘conditionalist’ view. I point to a few arguments in favor of this latter view, how it relates to ‘model closure’ and a hierarchical/structural view of theories, and why it is popular among certain Bayesians as a resolution of the ‘catchall’ problem.

Disclaimer
Obviously I am only one of many to make this point. I still find it useful to record my agreement with the ‘conditionalists’. Very rough for now. Many more examples to come. Version 0.2

[Edit: I have a new appreciation for Kolmogorov’s approach after teaching it recently. If we consider a Kolmogorov ‘probability model’ to be a full probability space/probability triple, rather than just the measure, then we effectively get the same thing as advocated here. We have to imagine that we have – in principle – a sufficiently large (and generally ‘inaccessible’) background probability space to work ‘within’. Each concrete probability space, i.e. particular model, is then a restriction of this ‘global universe’. This makes the fact explicit that we are always using at least some restriction conditions, and forces us to give the form these restriction conditions take (e.g. orthogonality conditions?). 

A rough idea occurs to me – we trade off the fact that our ‘background universe’ grows exponentially as we relax closure assumptions (i.e. make fewer details irrelevant, hence more details relevant and hence more distinct possibilities) with the expectation that as we include more details our models will become more deterministic. Hence our distributions ‘shrink’ relative to the new domains even if they are ‘bigger’ than their restrictions to the old domains. So each ‘point’ in a higher dimension contains a whole universe of lower dimension. Think power sets. An interesting starting point for thinking more about the mathematical ‘universe(s)’ we work in (and how we get a hierarchy of sub-universes) is https://ncatlab.org/nlab/show/universe. See also https://en.wikipedia.org/wiki/Universe_(mathematics).] See also the more recent blog post on the meaning of the terms linear/nonlinear and infinite/finite.

What conditional probability could not be
The above heading is the title of a paper by Alan Hájek (2003, Synthese); see here.

The basic point made is

…the ratio analysis of conditional probability…has become so entrenched that it is often referred to as the definition of conditional probability. I argue that it is not even an adequate analysis of that concept…I marshal many examples from scientific and philosophical practice against the ratio analysis. I conclude more positively: we should reverse the traditional direction of analysis. Conditional probability should be taken as the primitive notion, and unconditional probability should be analyzed in terms of it.

The article is a good read and is in agreement with the positions of a number of Bayesians, as well as my own sort-of/occasional-Bayesian-but-there-are-probably-deeper-issues-to-worry-about view.

In and of itself I find the above article fairly convincing; the point has, however, been reinforced for me by reading a number of similar arguments for and against, as well as by my own thinking about the nature of mathematical modelling.

I will collect some of these below, and then present my own main motivations for adopting a conditionalist view, which are somewhat independent of the Bayesian/Frequentist divide.

A collection of examples
[To fill in]
Bayesian classics
– de Finetti
– Jaynes
– ?

Internet arguments
– Gelman, Mayo, Wasserman and other characters
– Pearl, causality and conditioning

My motivation – hierarchies, closure, contradiction, expansion and invariant structure
Catchall vs conditional closure
My first post used conditional probability statements to formulate the basic idea of ‘model closure’ and argue against needing a ‘catchall’ (see the post for details). You’ll notice, however, that this argument only makes sense if you accept (as I did somewhat implicitly) that conditional probability is a basic notion and can be defined even in the absence of a joint distribution.

So, for the record, I take conditional probability as basic and definable even in the absence of unconditional distributions. Thus a ‘catchall’ unconditional distribution is not required for closure.

[Another disclaimer re: the following – these views of mine have been motivated by a number of authors, from Jaynes to Gelman, to my own lecturers in mathematical modeling and physics. So while I present it as my own perspective, it is inevitably strongly derivative of a number of others’ views. Perhaps the most original part is relating this view to the ideas of structural invariance, but this concept has itself been advocated by many.]

Conditional contradiction, hierarchies, regularisation, model expansion and invariant structure
Models always use temporary, approximate closures.

We generally need to begin work by ‘fixing’ (conditioning on) ‘external’ variables and working ‘within’ a system. As illustrated in the previous post, however, we often (inevitably?) reach contradictions or inconsistencies within our models as we approach the ‘boundary’ of our model closures. This leads to the idea (for one example) of ‘singular limits’.

Again as illustrated in the previous post, the way (or one way) to resolve this inconsistency is to ‘expand’ our model by embedding it in a larger model which relaxes a constraint implicit in the smaller model. This naturally leads to greater underdetermination due to the additional degrees of freedom. This larger model is also often structurally isomorphic to the original model (at least in some respects), however, and thus gives us a ‘hierarchical’ and ‘structural’ – if not absolutely fixed – foundation to reason from. [Shades of Gödel.]

So my perspective is thus ‘conditionalist’, ‘hierarchicalist’ and ‘structuralist’.

A (slightly) more concrete example
Consider a model of the form

p(a|c) = ∫ p(a|b)p(b|c) db

where we have used the closure condition p(a|b,c) = p(a|b) to make p(a|b) ‘internal’ (invariant) relative to the ‘external’ variable (the last conditioning variable) c.

We reason as follows – we want a model (directly) independent of our controlled variables c, with only boundary values b depending in a known manner on c.
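As a quick sanity check of the identity at the start of this example, here is a minimal numerical sketch (plain Python/numpy with discrete variables and arbitrary made-up distributions) showing that if p(a|b,c) = p(a|b) then p(a|c) = ∫ p(a|b)p(b|c) db (a sum in the discrete case):

import numpy as np

rng = np.random.default_rng(0)
nA, nB, nC = 2, 3, 4                                  # sizes of the discrete a, b, c spaces

p_a_given_b = rng.dirichlet(np.ones(nA), size=nB)     # p(a|b), shape (nB, nA), rows sum to 1
p_b_given_c = rng.dirichlet(np.ones(nB), size=nC)     # p(b|c), shape (nC, nB), rows sum to 1

# Closure condition: p(a|b,c) = p(a|b), i.e. c acts on a only 'through' the boundary values b.
p_a_given_bc = np.broadcast_to(p_a_given_b[:, None, :], (nB, nC, nA))

# p(a|c) = sum_b p(a|b,c) p(b|c), computed with the full conditional...
p_a_given_c = np.einsum('bca,cb->ca', p_a_given_bc, p_b_given_c)

# ...and with the 'internal', c-invariant kernel p(a|b) directly:
p_a_given_c_closed = p_b_given_c @ p_a_given_b        # (nC, nB) @ (nB, nA)

print(np.allclose(p_a_given_c, p_a_given_c_closed))   # True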

If we reach an internal contradiction – identified for example by finding p(a|b,c) != p(a|b) – we can (hopefully) expand our model to resolve this by moving previously controlled or ignored variables into the set of explanatory variables (i.e. expanding the state space) and then rewriting things so as to recover a model of the same schematic/structural ‘causal’ form via the redefinitions

p(a|c”) = ∫ p(a|b,c’)p(b,c’|c”) dbdc’

Equiv.

p(a|c”) = ∫ p(a|b’)p(b’|c”) db’

where we have split c into (c’, c”) and defined b’ as (b, c’).

We now have an expanded theory having a different partition of variable classes. This leads to greater indeterminacy in the (internal/explanatory) variables, but gives a corresponding theory which possesses the same (invariant) structure as before. By prioritising the theory form I am taking a structuralist view of the essence of mathematical and scientific theories. Variable indeterminacy is the price we pay for removing inconsistency and maintaining structure at a higher level, but it is very often worth it (and exciting) – it corresponds in many cases to ‘new’ or ‘novel’ phenomena appearing. [Bifurcations].

Again, see the previous post for a simple example of expanding a model to remove a singularity and hence introducing indeterminacy.

Observations
Note that we make crucial use of a ‘conditionalist’ and hierarchical view of model structure. Yet another reason to take conditional probability (and conditional thinking) as basic, instead of unconditional probability.

Note also that what was previously a non-probabilistic variable can always become probabilistic as we ‘shift’ where we are in the hierarchy. The position of a variable in the structure is more important than the nature of the variable itself. Another reason to not dismiss Bayesian modelling for allowing us to treat variables as probabilistic (internal to the theory) if and when we choose to – or are forced to.

A possible point of agreement with the frequentist view, however, is that we always maintain some ‘conditioned on’ but non-probabilistic variables (controlled or ‘external’ variables) as temporary scaffolding.

Is this high-school mathematics problem well-posed?

Overview and background
A brief discussion of well-posedness, singular problems and invariance, in the context of a high-school mathematics problem. Prompted by my return to NZ for a bit and catching up with family – my Dad is doing a PhD in mathematics education (more on that one day) and asked me to have a go at a problem he is using in a demonstration. I present my first naive solution and subsequent refinement. My Dad and I argue and then possibly agree. I was hospitalized shortly after but our discussion (probably) had nothing to do with this. Version 0.5.

The problem
No, not this one.

Instead consider the following ‘ladder problem’ as posed in an NCEA Level 3 mathematics exam (final year of high school in NZ):

[Figure: the ladder problem as posed in the exam]

A naive solution
Under ‘exam conditions’ – drinking my obligatory daily flat white and having a maths problem suddenly handed to me by my Dad – this was (roughly) my approach. In sketchy, narrative form.

1. Read problem definition. Derivatives. Constraint.
2. Chain rule, implicit differentiation, or something.

So
x’ given. y’ desired. c(x, y)=0 given.

(1) x^2 + y^2 = 25

Differentiate. Drop constants.

(2) xx’ + yy’ = 0

ie
(2′) y’ = -xx’/y

Need x. Use (1) again for x:

(1): x = sqrt(25-y^2)

Into (2′):

(3) y’ = -sqrt(25-y^2)x’/y

All RHS quantities known. Plug in.

Ans: y’ = 0.8 m/s

Assuming no outrageous errors, I think this is what they were after.

A ‘paradox’
My Dad then asked for the solution for y=0.3m.

What he was getting at was this – looking at (3) clearly the problem is ill-defined, or singular, as y approaches zero. This can’t really be saved by any sensible, obvious or consistent dominant balance involving x or x’ going to zero at the same time.
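A quick numerical sketch of equation (3) (plain Python; the x’ value and the y values below are placeholders of my own, not the numbers from the exam) makes the singular behaviour obvious:

import numpy as np

def y_rate(y, x_rate, L=5.0):
    # equation (3): y' = -sqrt(L^2 - y^2) x' / y, for a ladder of length L
    return -np.sqrt(L**2 - y**2) * x_rate / y

x_rate = 0.5                       # hypothetical horizontal rate for the bottom of the ladder
for y in [3.0, 1.0, 0.3, 0.01]:    # height of the top of the ladder
    print(y, y_rate(y, x_rate))    # |y'| grows without bound as y -> 0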

This presents a nice toy model for thinking about regularisation (see also here, though the examples there are less directly relevant to the current problem) – I often find it a good principle to think about exactly how singularities arise and think of ways to remove them and hence ‘regularise’ a problem. This often points to a better conceptual understanding of a given problem.

As I have said again and again elsewhere on this blog, this sort of process concerns finding, testing and modifying different ‘model closures’.

A ‘resolution’
Let’s look at one resolution, which is not in itself incorrect but which I don’t find especially illuminating. This was what my Dad pointed me to at some point. We argued a bit about whether this captured the essence of the ‘paradox’ and its resolution. My preferred – but ultimately complementary – solution is given in the following section.

The solution my Dad preferred is presented in the link here and is described as follows:

Using results from related rate problems, some calculus books suggest that a ladder leaning against a wall and sliding under the influence of gravity will reach speeds that approach infinity. This Demonstration is built from the actual equations that govern the motion of the ladder as determined by the theory of rigid body mechanics. It shows that a sliding ladder never reaches very high speeds. The motion can be followed in two contrasting situations, with the top of the ladder either free to move away from the wall or constrained to be in contact with the wall. The forces are calculated for the falling ladder just before the top hits the floor.

The problem I have with this resolution is that, while likely correct (I haven’t checked all the details), it seems to obscure the key issues. It jumps straight to forces and gravity and Newton. But how exactly does the purely ‘geometric’ problem break down? Does it? When do we, if ever, need to move from kinematics to dynamics? What are the key/minimal conservation relations required for a well-posed problem?

(In other words, due to my undergrad education and for better or worse, I’ve been somewhat influenced by the spirit of Rational Mechanics [a la Truesdell, Noll], and would quite like a more axiomatic breakdown.)

An alternative perspective
Note: I don’t think the modification here contradicts the sort of solution proposed in the previous section. It is simply another perspective aimed at conceptual clarification.

Again, I’ll adopt a sketchy, narrative description.

Singular problems often result from an incorrect reduction of dimension and hence can be regularised by reintroducing additional scales, dimensions, quantities or cutoffs.

The ‘physical’ resolution noted that the ladder can detach from the wall. A tension between the wall constraint and the motion constraints appears to produce the singularity.

Consider a perfectly horizontal ladder lying on the ground. If it stays attached and the other end continues to move according to the given kinematic condition then the only possibility is that the ladder is being stretched. This violates the (presumably valid) assumption that the ladder is a rigid object (but see later for more on this!).

In fact, this shows up in the Wolfram example. The simulation allows you to (requires you to?) solve two different problems: either the ladder is able to detach and the kinematic constraint (the given horizontal rate of motion for the bottom of the ladder) is satisfied (I think), OR the ladder is not able to detach and the horizontal (kinematic motion) constraint is dropped in favour of a rate determined by angular and linear momentum conservation for a rigid rod falling under gravity.

Let’s consider the first case – i.e. a detachable ladder with the constraint of a fixed horizontal rate of motion for the bottom of the ladder satisfied. (This is presumably just as physically realisable in an experimental setup as a freely-falling ladder, e.g. by connecting it to a controlled pulling mechanism, and closer to the original problem specification.)

In this case we can remove the contradiction between the model and the constraints (which generates the singularity) by simply introducing a moving coordinate system; the coordinate system is implicitly held fixed in the original solution. The key invariant is still the ladder length. See the figure below.

[Figure: ladder problem sketch]

Now, for convenience, let’s continue to fix the y coordinate origin at 0, but allow the x coordinate origin to be variable. Call this x0, but note this is not in general constant.

Redo the calculations. Keep the same numbering.

(1) (x-x0)^2 + y^2 = 25

Differentiate. Note x0 varies in time! Drop constants.

(2) (x-x0)(x’-x0′) + yy’ = 0

This expresses the key problem invariant – the ladder length. As expected, the price of an enlarged, non-singular problem is greater underdetermination. The original problem has x0 = 0 and x0′ = 0, but if the ladder detaches then these need not hold in general.

Note that y=0 now implies (x-x0)(x’-x0′) = 0, i.e. x = x0 and/or x0′ = x’. This latter case, with x0 unknown, allows a rigid sliding of the ladder along the ground. In general, we can maintain sensible dominant balances so as to define the behaviour for small y and in the limit as y goes to zero.

In general, preservation of the key invariant (ladder length) plus special boundary constraints (touching the wall and/or floor) now allows the solution of particular cases. So we now have two well-posed (or better-posed) problems – touching the wall and touching the floor, respectively – with an underdetermined but non-singular problem in-between. We can’t, for example, say exactly when the ladder might be expected to detach from the wall, on the basis of the given info. The detachment point is unknown. For the sliding problem the initial x0 is also unknown in general. (Relevant exercise for the reader: Google ‘matched asymptotic expansion’).

So no, the problem is not fully well-posed, though it is soluble by making special assumptions. It is also (to me) clearer now where the additional information should come from – for example (a bound on) the rotation rate required to keep the bar in contact with the wall, given the kinematic condition (staying as close as possible to the problem as posed). This is of course determined by angular and linear momentum conservation, as in the Wolfram simulation.

It also raises other, equally realistic, possibilities though – violation of the rigid body assumption leading to deformation (stretching/strain, where x0=0 say but x’>x0′) or fracture (similar to the detachment case).

So, at some point one may need to introduce additional information – eg conservation of linear/angular momentum but also maybe material properties – to solve the expanded problem, but this shouldn’t obscure the key invariants and assumptions used, why they are required and at what point they are introduced.

This leads to a more general lesson.

Morally speaking
The key lesson to me is this:

The price of removing a singularity by embedding a problem in a higher-dimensional problem is typically greater underdetermination, requiring additional information to solve in full generality. Regardless, it is helpful to view the original problem as a particular limit of an expanded problem.

Asymptotics, renormalization and scientific theories

Overview
In lieu of a post with original material and/or updates on the other posts, here is a nice quote relating to some of the key themes that I’ve started exploring on this blog. Specifically a quote about asymptotics and renormalization (and, by implication, model closure, approximation and invariance), and how these can illuminate some aspects of the nature of scientific theories.

On renormalization
From ‘Intermediate Asymptotics and Renormalization Group Theory’  by Goldenfeld, Martin, Oono (1989).

[a] macroscopic phenomenological description…consists of two parts: the universal structure, i.e., the structure of the equation itself, and phenomenological parameters sensitive to the specific microscopic physics of the system. Any good phenomenological description of a system always has this structure: a universal part and a few detail-sensitive parameters…In this sense, it is [also] possible that there is no good macroscopic phenomenology [for a given system of interest].

Thus if we consider a set of transformations that alters only the microscopic parameters of a model…the macroscopic universal features should remain unchanged. Therefore, if we can absorb the changes caused by modification of microscopic parameters into a few phenomenological parameters, we can obtain universal relations between phenomenological parameters.

If this is possible by introducing a finite number of phenomenological parameters, we say that the model (or the system) is renormalizable. This is the standard method of formulating the problem of extracting macroscopic phenomenology with RG. RG seeks the microscopic detail sensitive parts in the theory and tries to absorb them into macroscopic phenomenological parameters.

…Suppose that the macroscopic phenomenology of a system can be described successfully with a renormalizable microscopic model. The phenomenological parameters must be provided from either experiment or from a description valid at a smaller length scale. Is this a fundamental limitation of the renormalizable theory? If one is a reductionist, the answer is probably yes. However, another point of view is that microscopic models are not more fundamental than macroscopic phenomenology.

In fact, it is inevitable that in constructing models of physical systems, phenomena beyond some energy scale (or on length scales below a threshold) are neglected. In this sense, all present-day theoretical physics is macroscopic phenomenology.

Renormalization group theory has taught us how to extract definite macroscopic conclusions from this vague description. Of course, this is not always possible…However, we clearly recognize general macroscopic features of the world in our daily lives as macroscopic creatures! Thus, we may believe that for many important aspects of the macroscopic world there must be renormalizability. We may say that renormalizability makes physics possible.

Closure: objective and subjective, truth and approximation

Overview
A sketch of a few thoughts on ‘objective’ vs ‘subjective’ and ‘truth’ vs ‘approximation’ in the context of what I’ve been calling ‘model closure‘. Taking a roughly/informally category-theoretic perspective. Includes more discussion of how the data space is idealised/closed, in addition to the parameter/theory space, as well as issues of invariance, multiple scales, intermediate asymptotics and renormalization.

Disclaimer
Still very rough. I have included some handwritten notes for now – will convert to typeset later. [Version: 0.3]

Orientation: objective and subjective, truth and approximation
First, I want to set the basic conceptual picture. I’ve mentioned this perspective a few times but I think it’s good to re-emphasise using some visualisation. Consider the following conceptual pictures, all making similar points:

[Figure 1 image]
Figure 1: ‘Thinking’ as a process of ‘mirroring’ ‘reality’ (L) and
the ‘objective/subjective thinking’ distinction as a further mirroring (essentially via a ‘functor’) of this ‘thinking-reality’ relationship within the ‘thinking’ concept itself (R; both from ‘Conceptual Mathematics’ by Lawvere and Schanuel).

[Figure 2 image]
Figure 2: Testing ‘within’ and ‘without’ relative to a model (L; from ‘Probability theory and statistical inference’ by Spanos 1999) and a geometric picture of model closure relative to the ‘truth’ (R; my own drawing).

Each of these figures makes the point that:

even in ‘model world’ (c.f. the ‘real’ world) we need to distinguish between the ‘objective, external’ world and the ‘subjective, internal’ world. In particular, this distinction is drawn relative to the boundary defining the model closure, and applies to both ‘data’ and ‘parameters’.

As I have discussed in other posts, closure is what delineates the boundary between estimating parameters within a model structure and testing the model adequacy with respect to external reality. We have essentially already considered the parameter closure, i.e. discarding ‘irrelevant’ parameters (theoretical constructs). The same idea applies, however, to the data space closure. Some do not distinguish ‘within’ and ‘without’ in the way done here, for various reasons – from ‘all models are wrong and therefore subjective’ to leaving ‘lumps of probability‘ to keep the ‘options open’ somewhat. There is some truth in these general ideas; after all, all closures are provisional. I still prefer to explicitly introduce and distinguish ‘inside’ and ‘outside’ a model and ‘objective’ and ‘subjective’ constructs, however – even when both are (and really, can only be) imagined.

‘Intermediate’ structure and multiple scales
On the other hand, a subtle issue emerges here, in a similar way to the ‘tacking paradox’ post – the distinction between predictive irrelevance and more ‘complete’ irrelevance, i.e. the presence or absence (and nature) of further internal degrees of freedom. We need to find a way to follow the advice to

Rule out the accidental features
And you will see: the world is marvellous

– Alexander Blok (translated by Sir James Lighthill)

This ‘intermediate’ perspective is described in Barenblatt’s ‘Scaling‘, which quotes the above and also gives the following painting as a conceptual example:

[Figure 3: ‘Lincoln in Dalivision’ by Salvador Dalí.] One (relatively) small scale depicts ‘Gala’ gazing at the sea, which in turn ‘merges into’, at an ‘intermediate’ scale, a portrait of Abraham Lincoln. The ‘frame’ of the full painting ends our ‘boundary of interest’. If we stand much much further back, we no longer recognise any interesting features – our ‘largest’ observation scale determines the largest scale features we wish to perceive.

Related to the (applied mathematics) concepts of intermediate asymptotics and renormalization scaling is another set of concepts that I will (loosely) draw on below – the (thermodynamic) concepts of ‘external variables’, ‘internal variables’ and ‘internal coordinates’. Roughly speaking, the external variables determine the overall ‘shape’ of the closure as determined by ‘background’ conditions and connect our invariant theories (see next) to external measurements, the internal variables are intermediate variables that form (approximately, at least) an invariant and predictively complete set for a (scale-free) phenomenon of interest, while the internal coordinates index a finer set of internal degrees of freedom. In general the internal variables are determined from integrals over internal degrees of freedom/internal coordinates. So we have (at least) three scales – ‘external’, ‘intermediate’ and ‘small’.

This enables us [or will eventually] to compare theories that are a priori distinct, e.g. have different parameter domains and definitions, but seem similar when looked at in the right way. That is, it may be possible to find a common, scale-free predictive theory with a (relatively) invariant set of internal variables that serve as a common target mapping for the variables of distinct theories to enable consistent comparison. To connect back to reality requires ‘boundary closures’ on ‘either side’ of the intermediate, invariant theory – i.e. data space closure via a notion of measurement and parameter space closure via a notion of stability under manipulation/variation in other degrees of freedom (and relates to the formulation of priors).

A basic theme emerges:

‘causality’ and ‘mechanistic’ understanding are about invariant structures under the scales and controls of interest; probability enters into consideration in a somewhat secondary manner: to capture uncertainty within and between structural relationships, and in determining the resolution of control and measurement accuracy.

Additional notes
For now, here are some (very quickly sketched) handwritten notes.

0.0 A first attempt at a ‘closure functor’

[Image: cat-stat]

0.1 A first/another attempt at relating model closure to ideas of invariance, intermediate asymptotics etc

[Image: Invariance_Intermediate_Asymptotics_Causality_Categories_20151015_Combined]

Further notes
Besides properly tidying these ideas up, I also want to connect them to Laurie Davies’ ‘Approximate models‘ approach.

Causal recipes

From Cakes, Custards and Category Theory by Eugenia Cheng:

The idea of maths is to look for similarities between things so that you only need one ‘recipe’ for many different situations. The key is that when you ignore some details, the situations become easier to understand, and you can fill in the variables later…

…once you’ve made the abstract ‘recipe’ you will find that you won’t be able to apply it to everything. But you are at least in a position to try, and sometimes surprising things turn out to work in the same recipe.

This connects with my earlier post on what the domain of the ‘for all’ is in the closure conditions – we are taking a rather structuralist view of causal theories (or model closure schema). That is, we are saying what the structure, expressed in terms of relationships between a collection of objects, of an idealised causal theory looks like without worrying too much (for now) about the nature of objects to be ‘filled in’.

Obviously more needs to be said on the crucial ideas of idealisation and approximation (though I’ve touched on these somewhat) and hence the process of slotting objects in. This is what I’d like to focus on next, hopefully, before further linking to some of the other causal literature.

Postscript
This idea of focusing on the essence of the recipe rather than the details of the objects is of course quite generally applicable (get it!) and, I feel, has a lot of pedagogical value. For example, I recently read a nice article on improving the teaching of simple significance testing here. The author takes a quite similar ‘structuralist’ (in my view) and ‘abstract recipe’ perspective. Which is somewhat ironic since, without meaning to nitpick a nice article, the author claims:

When statistics is taught by mathematicians, I can see the temptation. In mathematical terms, the differences between tests are the interesting part. This is where mathematicians show their chops, and it’s where they do the difficult and important job of inventing new recipes to cook reliable results from new ingredients in new situations. Users of statistics, though, would be happy to stipulate that mathematicians have been clever, and that we’re all grateful to them, so we can get onto the job of doing the statistics we need to do

Ironically, as argued above, a mathematician (or at least one who likes the ‘abstract nonsense’ of category theory) would probably prefer the view expressed earlier in the same article:

Every significance test works exactly the same way. We should teach this first, teach it often, and teach it loudly; but we don’t. Instead, we make a huge mistake: we whiz by it and begin teaching test after test, bombarding students with derivations of test statistics and distributions and paying more attention to differences among tests than to their crucial, underlying identity. No wonder students resent statistics.