Is this high-school mathematics problem well-posed?

Overview and background
A brief discussion of well-posedness, singular problems and invariance, in the context of a high-school mathematics problem. Prompted by my return to NZ for a bit and catching up with family – my Dad is doing a PhD in mathematics education (more on that one day) and asked me to have a go at a problem he is using in a demonstration. I present my first naive solution and subsequent refinement. My Dad and I argue and then possibly agree. I was hospitalized shortly after but our discussion (probably) had nothing to do with this. Version 0.5.

The problem
No, not this one.

Instead consider the following ‘ladder problem’ as posed in an NCEA Level 3 mathematics exam (final year of high school in NZ):

ladder-problem

A naive solution
Under ‘exam conditions’ – drinking my obligatory daily flat white and having a maths problem suddenly handed to me by my Dad – this was (roughly) my approach. In sketchy, narrative form.

1. Read problem definition. Derivatives. Constraint.
2. Chain rule, implicit differentiation, or something.

So
x’ given. y’ desired. c(x, y)=0 given.

(1) x^2 + y^2 = 25

Differentiate. Drop constants.

(2) xx’ + yy’ = 0

ie
(2′) y’ = -xx’/y

Need x. Use (1) again for x:

(1): x = sqrt(25-y^2)

Into (2′):

(3) y’ = -sqrt(25-y^2)x’/y

All RHS quantities known. Plug in.

Ans: y’ = 0.8 m/s

Assuming no outrageous errors, I think this is what they were after.
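(For concreteness, here’s a quick numerical check of (3) in Python. I haven’t reproduced the exam’s actual numbers above, so the rate and height below are hypothetical values that happen to give the same 0.8 m/s magnitude.)

```python
import numpy as np

# Equation (3): y' = -sqrt(25 - y^2) * x' / y, for a ladder of length 5 m.
def ydot(y, xdot, L=5.0):
    return -np.sqrt(L**2 - y**2) * xdot / y

# Hypothetical exam-style numbers (not necessarily the actual exam values):
# bottom sliding out at x' = 0.6 m/s when the top is at y = 3 m.
print(ydot(y=3.0, xdot=0.6))   # -0.8, i.e. the top dropping at 0.8 m/s
```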

A ‘paradox’
My Dad then asked for the solution for y=0.3m.

What he was getting at was this – looking at (3), the problem is clearly ill-defined, or singular, as y approaches zero. This can’t really be saved by any sensible, obvious or consistent dominant balance involving x or x’ going to zero at the same time.
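(You can see the blow-up directly by evaluating (3) for smaller and smaller y, using the same hypothetical rate as above.)

```python
import numpy as np

# Equation (3) again: y' = -sqrt(25 - y^2) * x' / y. As y -> 0 the numerator
# tends to 5*x' while the denominator vanishes, so |y'| -> infinity.
for y in (3.0, 1.0, 0.3, 0.03, 0.003):
    print(y, -np.sqrt(25 - y**2) * 0.6 / y)
```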

This presents a nice toy model for thinking about regularisation (see also here, though the examples there are less directly relevant to the current problem) – I often find it a good principle to think about exactly how singularities arise and think of ways to remove them and hence ‘regularise’ a problem. This often points to a better conceptual understanding of a given problem.

As I have said again and again elsewhere on this blog, this sort of process concerns finding, testing and modifying different ‘model closures’.

A ‘resolution’
Let’s look at one resolution that is not in itself incorrect but that I don’t find especially illuminating. This was what my Dad pointed me to at some point. We argued a bit about whether this captured the essence of the ‘paradox’ and its resolution. My preferred – but ultimately complementary – solution is given in the following section.

The solution my Dad preferred is presented in the link here and is described as follows:

Using results from related rate problems, some calculus books suggest that a ladder leaning against a wall and sliding under the influence of gravity will reach speeds that approach infinity. This Demonstration is built from the actual equations that govern the motion of the ladder as determined by the theory of rigid body mechanics. It shows that a sliding ladder never reaches very high speeds. The motion can be followed in two contrasting situations, with the top of the ladder either free to move away from the wall or constrained to be in contact with the wall. The forces are calculated for the falling ladder just before the top hits the floor.

The problem I have with this resolution is that, while likely correct (I haven’t checked all the details), it seems to obscure the key issues. It jumps straight to forces and gravity and Newton. But how exactly does the purely ‘geometric’ problem break down? Does it? When, if ever, do we need to move from kinematics to dynamics? What are the key/minimal conservation relations required for a well-posed problem?

(In other words, due to my undergrad education and for better or worse, I’ve been somewhat influenced by the spirit of Rational Mechanics [a la Truesdell, Noll], and would quite like a more axiomatic breakdown.)

An alternative perspective
Note: I don’t think the modification here contradicts the sort of solution proposed in the previous section. It is simply another perspective aimed at conceptual clarification.

Again, I’ll adopt a sketchy, narrative description.

Singular problems often result from an incorrect reduction of dimension and hence can be regularised by reintroducing additional scales, dimensions, quantities or cutoffs.

The ‘physical’ resolution noted that the ladder can detach from the wall. A tension between the wall constraint and the motion constraints appears to produce the singularity.

Consider a perfectly horizontal ladder lying on the ground. If it stays attached and the other end continues to move according to the given kinematic condition then the only possibility is that the ladder is being stretched. This violates the (presumably valid) assumption that the ladder is a rigid object (but see later for more on this!).

In fact, this shows up in the Wolfram example. The simulation allows you to (requires you to?) solve two different problems: either the ladder is able to detach and the kinematic constraint (a given horizontal rate of motion for the bottom of the ladder) is satisfied (I think), or the ladder is not able to detach and the horizontal (kinematic motion) constraint is dropped in favour of a rate determined by angular and linear momentum conservation for a rigid rod falling under gravity.
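(For the second, gravity-driven case, here is a minimal sketch of the standard frictionless textbook analysis – not the Demonstration’s actual code, and the release angle below is a hypothetical choice. Energy conservation while both ends stay in contact gives a finite rate, and a ladder free to detach leaves the wall at a finite angle and speed.)

```python
import numpy as np

# Gravity-driven, frictionless textbook case: a uniform rigid rod of length L,
# both ends in contact with wall and floor, released from rest at angle theta0
# to the floor. While in contact, energy conservation gives
#   theta_dot^2 = (3*g/L) * (sin(theta0) - sin(theta)),
# and (if free to detach) the top leaves the wall when sin(theta) = (2/3)*sin(theta0).
g, L = 9.81, 5.0           # m/s^2, m (length chosen to match x^2 + y^2 = 25)
theta0 = np.radians(60)    # hypothetical release angle

sin_det = (2.0 / 3.0) * np.sin(theta0)
theta_det = np.arcsin(sin_det)

# Angular rate and vertical speed of the top end (y_top = L*sin(theta))
# just before detachment - both finite, no blow-up.
theta_dot = np.sqrt(3 * g / L * (np.sin(theta0) - sin_det))
top_speed = L * np.cos(theta_det) * theta_dot

print(f"detachment angle: {np.degrees(theta_det):.1f} degrees above the floor")
print(f"top-end speed at detachment: {top_speed:.2f} m/s")
```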

Let’s consider the first case – i.e. a detachable ladder with the constraint of a fixed horizontal rate of motion for the bottom of the ladder satisfied. (This is presumably just as physically realisable in an experimental setup as a freely-falling ladder, e.g. by connecting it to a controlled pulling mechanism, and closer to the original problem specification.)

In this case we can remove the contradiction between the model and constraints (which generates the singularity) by simply introducing a moving coordinate system. This is implicitly fixed in the original solution. The key invariant is still the ladder length. See the figure below.

Ladder Problem Sketch

Now, for convenience, let’s continue to fix the y coordinate origin at 0, but allow the x coordinate origin to be variable. Call this x0, but note this is not in general constant.

Redo the calculations. Keep the same numbering.

(1) (x-x0)^2 + y^2 = 25

Differentiate. Note x0 varies in time! Drop constants.

(2) (x-x0)(x’-x0′) + yy’ = 0

This expresses the key problem invariant – the ladder length. As expected, the price of an enlarged, non-singular problem is greater underdetermination. The original problem has x0, x0′ = 0, but if the ladder detaches then these are not true in general.

Note that, from (2), y=0 (with y’ finite) now implies (x-x0)(x’-x0′) = 0, i.e. x = x0 and/or x0′ = x’ – and the first option is ruled out by (1) for a ladder of non-zero length. This latter case, with x0 unknown, allows a rigid sliding of the ladder along the ground. In general, we can maintain sensible dominant balances so as to define the behaviour for small y and in the limit as y goes to zero.
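(A small numerical sketch of the enlarged problem, with hypothetical rates: the formula from (2) is regular at y = 0 in the rigid-sliding case, and only recovers the original singular behaviour when the origin is held fixed.)

```python
import numpy as np

# The enlarged constraint (1): (x - x0)^2 + y^2 = 25. Differentiating gives
# (2): (x - x0)(x' - x0') + y*y' = 0, i.e. y' = -(x - x0)(x' - x0')/y for y != 0,
# with x - x0 = sqrt(25 - y^2) from (1). x0' is the new unknown.
def ydot(y, xdot, x0dot, L=5.0):
    return -np.sqrt(L**2 - y**2) * (xdot - x0dot) / y

# Fixed origin (x0' = 0): recovers the original, singular-as-y->0 solution.
print(ydot(y=0.01, xdot=0.6, x0dot=0.0))    # very large

# Rigid sliding along the ground (x0' = x'): y' = 0 for any y, and the
# y -> 0 limit is perfectly regular.
print(ydot(y=0.01, xdot=0.6, x0dot=0.6))    # 0.0
```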

In general, preservation of the key invariant (ladder length) plus special boundary constraints (touching the wall and/or floor) now allows the solution of particular cases. So we now have two well-posed (or better-posed) problems – touching the wall and touching the floor, respectively – with an underdetermined but non-singular problem in-between. We can’t, for example, say exactly when the ladder might be expected to detach from the wall, on the basis of the given info. The detachment point is unknown. For the sliding problem the initial x0 is also unknown in general. (Relevant exercise for the reader: Google ‘matched asymptotic expansion’).

So no, the problem is not fully well-posed, though it is soluble by making special assumptions. It is also (to me) clearer now where the additional information should come from – for example (a bound on) the rotation rate required to keep the bar in contact with the wall, given the kinematic condition (staying as close as possible to the problem as posed). This is of course determined by angular and linear momentum conservation, as in the Wolfram simulation.

It also raises other, equally realistic, possibilities though – violation of the rigid body assumption leading to deformation (stretching/strain, where x0=0 say but x’>x0′) or fracture (similar to the detachment case).

So, at some point one may need to introduce additional information – eg conservation of linear/angular momentum but also maybe material properties – to solve the expanded problem, but this shouldn’t obscure the key invariants and assumptions used, why they are required and at what point they are introduced.

This leads to a more general lesson.

Morally speaking
The key lesson to me is this:

The price of removing a singularity by embedding a problem in a higher-dimensional one is typically greater underdetermination, requiring additional information to solve in full generality. Regardless, it is helpful to view the original problem as a particular limit of an expanded problem.

Asymptotics, renormalization and scientific theories

Overview
In lieu of a post with original material and/or updates on the other posts, here is a nice quote relating to some of the key themes that I’ve started exploring on this blog. Specifically a quote about asymptotics and renormalization (and, by implication, model closure, approximation and invariance), and how these can illuminate some aspects of the nature of scientific theories.

On renormalization
From ‘Intermediate Asymptotics and Renormalization Group Theory’  by Goldenfeld, Martin, Oono (1989).

[a] macroscopic phenomenological description…consists of two parts: the universal structure, i.e., the structure of the equation itself, and phenomenological parameters sensitive to the specific microscopic physics of the system. Any good phenomenological description of a system always has this structure: a universal part and a few detail-sensitive parameters…In this sense, it is [also] possible that there is no good macroscopic phenomenology [for a given system of interest].

Thus if we consider a set of transformations that alters only the microscopic parameters of a model…the macroscopic universal features should remain unchanged. Therefore, if we can absorb the changes caused by modification of microscopic parameters into a few phenomenological parameters, we can obtain universal relations between phenomenological parameters.

If this is possible by introducing a finite number of phenomenological parameters, we say that the model (or the system) is renormalizable. This is the standard method of formulating the problem of extracting macroscopic phenomenology with RG. RG seeks the microscopic detail sensitive parts in the theory and tries to absorb them into macroscopic phenomenological parameters.

…Suppose that the macroscopic phenomenology of a system can be described successfully with a renormalizable microscopic model. The phenomenological parameters must be provided from either experiment or from a description valid at a smaller length scale. Is this a fundamental limitation of the renormalizable theory? If one is a reductionist, the answer is probably yes. However, another point of view is that microscopic models are not more fundamental than macroscopic phenomenology.

In fact, it is inevitable that in constructing models of physical systems, phenomena beyond some energy scale (or on length scales below a threshold) are neglected. In this sense, all present-day theoretical physics is macroscopic phenomenology.

Renormalization group theory has taught us how to extract definite macroscopic conclusions from this vague description. Of course, this is not always possible…However, we clearly recognize general macroscopic features of the world in our daily lives as macroscopic creatures! Thus, we may believe that for many important aspects of the macroscopic world there must be renormalizability. We may say that renormalizability makes physics possible.

Closure: objective and subjective, truth and approximation

Overview
A sketch of a few thoughts on ‘objective’ vs ‘subjective’ and ‘truth’ vs ‘approximation’ in the context of what I’ve been calling ‘model closure‘. Taking a rough/informal category theory perspective. Includes more discussion of how the data space, as well as the parameter/theory space, is idealised/closed, along with issues of invariance, multiple scales, intermediate asymptotics and renormalization.

Disclaimer
Still very rough. I have included some handwritten notes for now – will convert to typeset later. [Version: 0.3]

Orientation: objective and subjective, truth and approximation
First, I want to set the basic conceptual picture. I’ve mentioned this perspective a few times but I think it’s good to re-emphasise using some visualisation. Consider the following conceptual pictures, all making similar points:

combined_objective
Figure 1: ‘Thinking’ as a process of ‘mirroring’ ‘reality’ (L) and
the ‘objective/subjective thinking’ distinction as a further mirroring (essentially via a ‘functor’) of this ‘thinking-reality’ relationship within the ‘thinking’ concept itself (R; both from ‘Conceptual Mathematics’ by Lawvere and Schanuel).

combined_inside_outside
Figure 2: Testing ‘within’ and ‘without’ relative to a model (L; from ‘Probability theory and statistical inference’ by Spanos 1999) and a geometric picture of model closure relative to the ‘truth’ (R; my own drawing).

Each of these figures makes the point that:

even in ‘model world’ (c.f. the ‘real’ world) we need to distinguish between the ‘objective, external’ world and the ‘subjective, internal’ world. In particular, this distinction is drawn relative to the boundary defining the model closure, and applies to both ‘data’ and ‘parameters’.

As I have discussed in other posts, closure is what delineates the boundary between estimating parameters within a model structure and testing the model adequacy with respect to external reality. We have essentially already considered the parameter closure, i.e. discarding ‘irrelevant’ parameters (theoretical constructs). The same idea applies, however, to the data space closure. Some do not distinguish ‘within’ and ‘without’ in the way done here for various reasons – from ‘all models are wrong and therefore subjective’ to leaving ‘lumps of probability‘ to keep the ‘options open’ somewhat. There is some truth in these general ideas; after all, all closures are provisional. I still prefer to explicitly introduce and distinguish ‘inside’ and ‘outside’ a model and ‘objective’ and ‘subjective’ constructs, however – even when both are (and really, can only be) imagined.

‘Intermediate’ structure and multiple scales
On the other hand, a subtle issue emerges, in a similar way to the ‘tacking paradox’ post – the distinction between predictive irrelevance and more ‘complete’ irrelevance, i.e. the presence or absence (and nature) of further internal degrees of freedom. We need to find a way to follow the advice to

Rule out the accidental features
And you will see: the world is marvellous

– Alexander Block (translated by Sir James Lighthill)

This ‘intermediate’ perspective is described in Barenblatt’s ‘Scaling‘, which quotes the above and also gives the following painting as a conceptual example:

Lincoln in Dalivision (Salvador Dalí)
Figure 3: ‘Lincoln in Dalivision’ (Salvador Dalí). One (relatively) small scale depicts ‘Gala’ gazing at the sea, which in turn ‘merges into’, at an ‘intermediate’ scale, a portrait of Abraham Lincoln. The ‘frame’ of the full painting sets our ‘boundary of interest’. If we stand much, much further back, we no longer recognise any interesting features – our ‘largest’ observation scale determines the largest scale features we wish to perceive.

Related to the (applied mathematics) concepts of intermediate asymptotics and renormalization scaling is another set of concepts that I will (loosely) draw on below – the (thermodynamic) concepts of ‘external variables’, ‘internal variables’ and ‘internal coordinates’. Roughly speaking, the external variables determine the overall ‘shape’ of the closure as determined by ‘background’ conditions and connect our invariant theories (see next) to external measurements, the internal variables are intermediate variables that form (approximately, at least) an invariant and predictively complete set for a (scale-free) phenomenon of interest, while the internal coordinates index a finer set of internal degrees of freedom. In general the internal variables are determined from integrals over internal degrees of freedom/internal coordinates. So we have (at least) three scales – ‘external’, ‘intermediate’ and ‘small’.

This enables us [or will eventually] to compare theories that are a priori distinct, e.g. have different parameter domains and definitions, but seem similar when looked at in the right way. That is, it may be possible to find a common, scale-free predictive theory with a (relatively) invariant set of internal variables that serve as a common target mapping for the variables of distinct theories to enable consistent comparison. To connect back to reality requires ‘boundary closures’ on ‘either side’ of the intermediate, invariant theory – i.e. data space closure via a notion of measurement and parameter space closure via a notion of stability under manipulation/variation in other degrees of freedom (which relates to the formulation of priors).
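(As a loose, concrete illustration of the external variable / internal variable / internal coordinate distinction – my own toy example with hypothetical choices, not something taken from the thermodynamics literature.)

```python
import numpy as np

# Toy illustration: an 'external' variable (a bath temperature we control),
# 'internal coordinates' (individual particle velocities) and an 'internal'
# variable (mean kinetic energy) obtained by averaging over the coordinates.
rng = np.random.default_rng(0)

def internal_variable(T_external, n_particles=100_000):
    # Internal coordinates: 1D velocities drawn from a Maxwell-like (Gaussian)
    # distribution whose width is set by the external variable.
    v = rng.normal(scale=np.sqrt(T_external), size=n_particles)
    # Internal variable: a single coarse-grained summary (mean kinetic energy
    # per particle, taking m = 1), i.e. an 'integral over internal coordinates'.
    return np.mean(0.5 * v**2)

# The macroscopic relation (internal variable ~ T/2) is insensitive to the
# microscopic details - a 'detail-insensitive' phenomenological parameter.
for T in (1.0, 2.0, 4.0):
    print(T, internal_variable(T))
```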

A basic theme emerges:

‘causality’ and ‘mechanistic’ understanding are about invariant structures under the scales and controls of interest; probability enters into consideration in a somewhat secondary manner: to capture uncertainty within and between structural relationships, and in determining the resolution of control and measurement accuracy.

Additional notes
For now, here are some (very quickly sketched) handwritten notes.

0.0 A first attempt at a ‘closure functor’

cat-stat

0.1 A first/another attempt at relating model closure to ideas of invariance, intermediate asymptotics etc

Invariance_Intermediate_Asymptotics_Causality_Categories_20151015_Combined

Further notes
Besides properly tidying these ideas up, I also want to connect them to Laurie Davies’ ‘Approximate models‘ approach.

Causal recipes

From Cakes, Custard and Category Theory by Eugenia Cheng:

The idea of maths is to look for similarities between things so that you only need one ‘recipe’ for many different situations. The key is that when you ignore some details, the situations become easier to understand, and you can fill in the variables later…

…once you’ve made the abstract ‘recipe’ you will find that you won’t be able to apply it to everything. But you are at least in a position to try, and sometimes surprising things turn out to work in the same recipe.

This connects with my earlier post on what the domain of the ‘for all’ is in the closure conditions – we are taking a rather structuralist view of causal theories (or model closure schema). That is, we are saying what the structure of an idealised causal theory – expressed in terms of relationships between a collection of objects – looks like, without worrying too much (for now) about the nature of the objects to be ‘filled in’.

Obviously more needs to be said on the crucial ideas of idealisation and approximation (though I’ve touched on these somewhat) and hence the process of slotting objects in. This is what I’d like to focus on next, hopefully, before further linking to some of the other causal literature.

Postscript
This idea of focusing on the essence of the recipe rather than the details of the objects is of course quite generally applicable (get it!) and, I feel, has a lot of pedagogical value. For example I recently read a nice article on improving the teaching of simple significance testing here. The author takes a quite similar ‘structuralist’ (in my view) and ‘abstract recipe’ perspective. Which is somewhat ironic since, without meaning to nitpick a nice article, the author claims

When statistics is taught by mathematicians, I can see the temptation. In mathematical terms, the differences between tests are the interesting part. This is where mathematicians show their chops, and it’s where they do the difficult and important job of inventing new recipes to cook reliable results from new ingredients in new situations. Users of statistics, though, would be happy to stipulate that mathematicians have been clever, and that we’re all grateful to them, so we can get onto the job of doing the statistics we need to do

Ironically, as argued above, a mathematician (or at least one who likes the ‘abstract nonsense’ of category theory) would probably prefer the view expressed earlier in the same article:

Every significance test works exactly the same way. We should teach this first, teach it often, and teach it loudly; but we don’t. Instead, we make a huge mistake: we whiz by it and begin teaching test after test, bombarding students with derivations of test statistics and distributions and paying more attention to differences among tests than to their crucial, underlying identity. No wonder students resent statistics.
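To illustrate the ‘single recipe’ point, here is my own sketch (not the article’s code) of one generic simulation-based significance test into which two superficially different tests are slotted; all the numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# The one recipe: a test statistic, a way of simulating data under the null,
# and the observed data. Everything else is detail to be slotted in later.
def significance_test(statistic, simulate_null, observed, n_sim=10_000):
    obs = statistic(observed)
    null = np.array([statistic(simulate_null()) for _ in range(n_sim)])
    return np.mean(null >= obs)   # one-sided p-value, for simplicity

# 'Different' tests are just different ingredients in the same recipe.
sample = rng.normal(loc=0.4, size=30)                  # hypothetical data
p_mean = significance_test(np.mean,
                           lambda: rng.normal(size=30),
                           sample)

x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)                      # hypothetical pairs
p_corr = significance_test(lambda d: np.corrcoef(d[0], d[1])[0, 1],
                           lambda: (rng.normal(size=50), rng.normal(size=50)),
                           (x, y))

print(p_mean, p_corr)
```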

The ‘tacking paradox’: model closure and irrelevant hypotheses

Overview
I. The tacking paradox in philosophy of science
Interlude – Bayesian or not?
II. A resolution
Interlude – severe tests and tracking truth?
III. Implications for mathematical/computational models in practice (sort of)

Disclaimer
This is the one of (what should be) a few posts which aim to connect some basic puzzles in the philosophy and methodology of science to the practice of mathematical and computational modelling. They are not intended to be particularly deep philosophically or to be (directly) practical scientifically. Nor are they fully complete expositions. Still, I find thinking about these puzzles in this context to be an interesting exercise which might provide a conceptual guide for better understanding (and perhaps improving?) the practice of mathematical and computational modelling. These are written by a mathematical modeller grappling with philosophical questions, rather than by a philosopher, so bear that in mind! Comments, criticisms and feedback of course welcome! [Current version: 3.0.]

I. The tacking paradox in philosophy of science (or, the problem of irrelevant hypotheses)
The so-called tacking paradox (or at least one instance) can be described in minimal terms as follows. More detail is given on philosopher Deborah Mayo’s blog here (along with some responses in the comments section that don’t seem too far from the resolution given here, though they are a little unclear to me in places). As I noted in my other posts, I will prefer to think of ‘hypotheses’ h as parameters within mathematical model structures predicting data y (search this blog for more). The basic perspective from which I will try to resolve this problem is that of schematic ‘model closure’ assumptions.

Firstly, we need to define what it means to ‘Bayesian confirm’ (really, ‘Likelihood confirm’) a hypothesis h given data y. Let’s take the following statement to capture this idea, in terms of ‘predictive confirmation’:

(1) p(y|h,b) > p(y|b)

That is, if the hypothesis h makes the data more likely (under a given model p) then this is taken to mean ‘y confirms h’. Note that we have included a controlled/given background context b.

Also note that ‘confirmation’ as defined here is thus a change in probability rather than a probability itself. There are a number of different positions on this topic (see e.g. Mayo’s discussion) but I prefer to think of ‘confirmation’/’evidence’ given new data as a change in belief/probability induced by that data (to do – further references) to a new state of belief/probability. This is the difference between ‘state variables’ and ‘fluxes’ in physics/dynamical systems – or ‘stocks’ and ‘flows’ to use an equivalent terminology (which I really dislike!).

So we ‘confirm’ a hypothesis when it makes a ‘successful prediction’ of newly observed data. This seems a fairly non-controversial assumption in the sense that a minimal measure of the ‘quality’ of a theory ought to mean that one can predict observations better (to at least some degree) than not having that theory. Note that this is relative to and requires the existence of a prior predictive distribution p(y|b), and this should exist in a standard Bayesian account (more on this one day). I will think of this as ‘predictive relevance’.

Now, suppose that the scheme (1) is true for a hypothesis h1 and data y0, i.e. p(y0|h1,b) > p(y0|b). Say y0 represents some planetary observations and h1 some aspect of (parameter in) Newton’s theory.

Next, ‘irrelevance’ of a hypothesis h” is usually defined in this context as:

(2) p(y|h’,h”,b) = p(y|h’,b)

which is clearly relative to y, h’, p and b. Note that in my terminology we have ‘predictive irrelevance’ of h” here.

This leads to the following argument. Let h2 be a typical ‘irrelevant theory’ e.g. a theory about the colour of my hat, that is (for example) a parameter representing possible ‘colour values’ my hat could take. Then we have

p(y0|h1,b) > p(y0|b) {given that y0 Bayesian/Likelihood confirms h1}

p(y0|h1,h2,b) = p(y0|h1,b) {assuming irrelevance of h2}

So

p(y0|h1,h2,b) > p(y0|b)

Therefore y0 Bayesian/Likelihood confirms (h1&h2) (with respect to model p and background b).

So what’s the ‘paradox’? The (allegedly) troubling thing is that h2 is supposed to be ‘irrelevant’ and yet it seems to be confirmed along with h1. So planetary observations seem to be able to confirm something like ‘Newton’s theory is true and my hat is red’.

More concretely, one might try to argue as follows: since (h1&h2) is confirmed and since the joint proposition/logical conjunction (h1&h2) logically entails h2, then h2 is confirmed {confirmation/epistemic closure principle}.

So, according to the above argument, ‘my hat is red’ could be confirmed by planetary observations. This argument scheme captures a notion of ‘knowledge is closed under deductive entailment’ or ‘epistemic closure’ in the epistemological literature. Note, however, that this does not follow from any of the main model closure axioms that we have put forward thus far – the ‘model closure’ we refer to is not ‘epistemic closure’. In fact, the approach we follow has more in common with those taken to deny epistemic closure, such as Nozick and/or Dretske (see here).
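(As a toy numerical check of the confirmation arithmetic above – a discrete stand-in with made-up numbers for the ‘Newton parameter’ h1 and the ‘hat colour’ h2.)

```python
# Discrete toy model with made-up numbers: h1 in {0, 1} (a 'Newton parameter'),
# h2 in {0, 1} (a 'hat colour'), y0 an observation.
p_h1 = {0: 0.5, 1: 0.5}              # p(h1 | b)
p_y0_given_h1 = {0: 0.2, 1: 0.9}     # p(y0 | h1, b)

# Predictive irrelevance (2): p(y0 | h1, h2, b) = p(y0 | h1, b) for all h2.
p_y0_given_h1h2 = lambda h1, h2: p_y0_given_h1[h1]

# Prior predictive: p(y0 | b) = sum_h1 p(y0 | h1, b) p(h1 | b) = 0.55
p_y0 = sum(p_y0_given_h1[h1] * p_h1[h1] for h1 in p_h1)

h1, h2 = 1, 0   # h1 = 1 the 'confirmed' Newton value, h2 = 0 'my hat is red'
print(p_y0_given_h1[h1] > p_y0)          # True: y0 confirms h1
print(p_y0_given_h1h2(h1, h2) > p_y0)    # True: y0 also 'confirms' (h1 & h2)
```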

Interlude – Bayesian or not?
Before I give my preferred resolution of the paradox, there are a couple of points to distinguish here – first, is the Bayesian/Likelihoodist language appropriate to express a resolution of this problem? Second, are the concepts involved in the resolution inherently part of or extrinsic to the Bayesian/Likelihoodist approach? This second point is, I take it, what led Clark Glymour to write ‘Why I am not a Bayesian’ (1981) (see also Pearl’s ‘Why I am only a half-Bayesian’) – my interpretation of his point being that the resolution to ‘paradoxes’ such as these may or may not be expressible within the Bayesian language but the underlying concepts driving what we translate into Bayesian language are additional to and not a part of the basic Bayesian account.

I basically agree with Glymour on this general point, but use the Bayesian language to express the concepts required to resolve the ‘paradox’. My view, as expressed elsewhere on this blog, is that these are additional ‘closure’ assumptions. As pointed out above, these are not ‘epistemic closure’ assumptions but rather schematic model structure closure assumptions (see here and here). The need for assumptions such as these, whether considered ‘pure’ Bayesian or not, is, however, explicitly and/or implicitly acknowledged by many Bayesians (e.g. Jaynes, Gelman etc).

II. A resolution
Firstly, consider whether we have really captured the notion of ‘irrelevance’. We seem to have predictive irrelevance but what about ‘boundary/background irrelevance’ – i.e. if the variable is ‘truly’ irrelevant then we could imagine moving it into the ‘boundary/controlled’ or ‘background’ variables and varying it without affecting the variables that matter. Thus I argue that we have more information available (knowledge of possible relationships) in the problem specification than we have used.

In particular, based on the closure conditions I gave in the first post on this blog, I would argue that applying the model closure assumptions (1-3) in that post to p(y0|h1,h2,b) requires us to specify both p(y0|h1,h2,b) = p(y0|h1,b) {predictive irrelevance} and an expression for p(h1|h2,b). That is

We are obligated, according to our model closure principles, to say how varying h2 affects h1 in order to have a well-posed problem. It either has a relevant effect – varying h2 by experimental control affects h1 – or it is a ‘fully irrelevant’ background variable.

In light of the above discussion, we will take ‘h2 is an irrelevant hypothesis’ to further mean

(3) p(h1|h2,b) = p(h1|b) for all h1, h2, b

i.e. h2 falls into the ‘truly irrelevant background variables’, rather than the ‘controlled and controlling boundary values’ b. This means that varying h2 cannot control h1: the parameters of Newton’s theory are not manipulable by changing the colour of my hat. It is a truly ‘passive cog’ capable of no explanatory work.

Note also that we are actually speaking at the schematic/structural level here – i.e. for any value h1, h2 take – and hence counterfactually about particular instances conceived as members of a set of possible values.

So in this context I can vary (or imagine varying) my hat colour and how this affects other variables. Though perhaps unfamiliar to many, this is actually a common way of framing theories in the physical sciences, even in classical mechanics, e.g. D’Alembert’s principle and related ideas, which require ‘virtual’ (counterfactual) displacements.

This leads to [to do – proper latex in wordpress]

p(y0|h2,b) = ∫ p(y0|h1,h2,b)p(h1|h2,b) dh1

=  ∫ p(y0|h1,b)p(h1|h2,b) dh1 {by ‘predictive irrelevance’ of h2}

=  ∫ p(y0|h1,b)p(h1|b) dh1 {by ‘h1 is not manipulable by h2’}

= p(y0|b)

Therefore

p(y0|h2,b) = p(y0|b)

and so h2 is not confirmed by y0 at all! Note again that, in defining our original closure conditions, we required some assumption to be made on p(h1|h2,b) – the one chosen in the particular context here seems to best represent the concept of ‘irrelevance’ intended. Thus when we include both predictive irrelevance and boundary irrelevance/non-manipulability closure assumptions then there is no paradox.

Until some p(h1|h2) is given we have an ill-posed problem – or, at best, we can find a class of solutions and require boundary conditions to further pick out solutions capturing our particular circumstances.

For example, one could also imagine an h2 which is simply a ‘duplicate’ of h1 – this satisfies predictive irrelevance in that it adds no predictive ability to know the same thing twice, but may be considered the opposite (singular/delta) limit of p(h1|h2).
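(Continuing the toy model from above, again with made-up numbers: adding the non-manipulability condition (3) and marginalising over h1 gives h2 no confirmation at all, while the ‘duplicate’ h2 just mentioned inherits exactly h1’s confirmation.)

```python
# Same toy numbers as before (made up). Adding closure condition (3),
# p(h1 | h2, b) = p(h1 | b), and marginalising over h1:
p_h1 = {0: 0.5, 1: 0.5}              # p(h1 | b)
p_y0_given_h1 = {0: 0.2, 1: 0.9}     # p(y0 | h1, b); h2 predictively irrelevant
p_y0 = sum(p_y0_given_h1[h1] * p_h1[h1] for h1 in p_h1)   # p(y0 | b) = 0.55

# Case A: 'irrelevant background' h2, using (3): p(h1 | h2, b) = p(h1 | b),
# so the marginal over h1 is the same whatever value h2 takes.
p_y0_given_h2 = sum(p_y0_given_h1[h1] * p_h1[h1] for h1 in p_h1)
print(p_y0_given_h2, p_y0)           # equal: h2 is not confirmed by y0 at all

# Case B: a 'duplicate' h2, i.e. p(h1 | h2, b) a delta at h1 = h2. Still
# predictively irrelevant given h1, but now p(y0 | h2, b) = p(y0 | h1 = h2, b),
# so h2 inherits exactly h1's confirmation.
p_y0_given_h2_dup = lambda h2: p_y0_given_h1[h2]
print(p_y0_given_h2_dup(1), p_y0)    # 0.9 > 0.55
```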

Interlude – severe tests and tracking truth?
Since I motivated this problem with reference to Mayo’s blog, how might the ‘severe testing’ concept relate? For now, a quick thought: if by ‘test’ we mean ‘behaviour under specified experimental manipulations‘ then we see some similarity. In particular, one might imagine that the ‘testing’ aspect refers to defining boundary conditions and related behaviour under possible (‘counterfactual’ or ‘virtual’) boundary manipulations, which is a crucial part of the ‘model closure’ account here.

Similarly, Nozick’s ‘truth tracking’ account in epistemology is relativised to methods and, if we equate ‘methods’ to ‘model structures’ – which seems appropriate since a ‘model structure’ is really a functional recipe – then it also has much in common with the ‘model closure’ (again, as opposed to epistemic closure) account given here. Furthermore, I think Kripke’s supposed ‘red barn’ counterexample (see here) to Nozick’s theory seems to fail for similar reasons of being an ill-posed problem: the solution depends on how the ‘boundary of the problem’ is closed.

I will (hopefully) have more to say on these topics at some point.

III. Implications for the everyday ‘mathematical/computational modeller’
What does this mean for people building mathematical and/or computational models of complex phenomena such as those of biology? As all of us ‘mathematical modellers’ who have tried to do something even resembling ‘real science’  know, we almost always face the ‘simple model/complex model’ and ‘modelling for understanding/modelling for prediction’ trade-offs.

Consider this common experience: you present a slightly too complicated model (all of them, and none of them, basically) and show it ‘predicting’ some experimental result ‘correctly’. The first question is, of course, so what? Why should I trust your model? Followed by ‘I could fit an elephant (with a wiggly trunk) with that model’ and/or ‘most of those parameters appear completely irrelevant – what are the most important parameters? Have you done a sensitivity analysis?’.

You see the parallel with the tacking paradox – with all those (presumably) extraneous, irrelevant parameters (hypotheses) ‘tacked onto’ your model, how can you possibly say that it is ‘confirmed’ by the fact that it predicts some experiment? Which of your parameters really capture the ‘true mechanism’ and which are ‘irrelevant’?

The resolution is of course that
a) the model as a whole can be ‘confirmed’ (that is, made more probable to some degree by fitting the data/avoiding being falsified etc)
but
b) we don’t know which parts are confirmed and by how much, unless we know how the parameters (hypotheses) within the model relate to each other.

In order to further reduce the model to ‘minimal’ or ‘mechanistic’ form, we need to define behaviour under (possible) manipulation (boundary conditions). Predictively irrelevant variables either have ‘boundary condition’ effects or no effects, but we need to say which is the case.

One problem in practice, then, is that without going further and investigating relations between parameters (via direct manipulation and/or varying boundary/contextual assumptions, say), we are restricted in our ability to generalise to new situations. Without being able to identify ‘modular’ or ‘invariant’ model components (more on this one day, hopefully), and the context within which this invariance applies, we don’t know which components can be used to build models of similar but differing situations.

From a ‘machine learning’ point of view this could be considered a form of bias-variance trade-off – without stable (invariant) sub-components that apply to other contexts we are at risk of ‘overfitting’. So ‘bias’ is really (a form of) ‘knowledge external to this particular dataset‘.
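(A rough, hypothetical sketch of that point – not taken from any of the work linked above: a flexible fit with no stable sub-structure can match one ‘context’ and fail badly in a slightly shifted one.)

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

# 'Context A': data from a simple invariant relationship (y = 2x) plus noise.
x_a = np.linspace(0, 1, 20)
y_a = 2.0 * x_a + rng.normal(scale=0.1, size=x_a.size)

# Two models fit to context A: a low-order ('invariant-ish') model and a very
# flexible one carrying many predictively near-irrelevant parameters.
simple = Polynomial.fit(x_a, y_a, deg=1)
flexible = Polynomial.fit(x_a, y_a, deg=12)

# 'Context B': the same underlying relationship over a shifted range.
x_b = np.linspace(1.0, 1.5, 20)
y_b = 2.0 * x_b

def rmse(model, x, y):
    return np.sqrt(np.mean((model(x) - y) ** 2))

print("context A:", rmse(simple, x_a, y_a), rmse(flexible, x_a, y_a))
print("context B:", rmse(simple, x_b, y_b), rmse(flexible, x_b, y_b))
# The flexible fit usually does slightly better in A but much worse in B.
```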

To put it another way, Newton’s law of universal gravitation is a whole lot more useful as a force model than Maclaren’s law of forces between these two particular objects in this particular context, precisely because it is an invariant feature of nature valid for a wide range (e.g. inertial) of frames of reference. Thus mere prediction on one dataset is not enough to be scientifically interesting. Which we all know of course but – let’s be honest! – can often forget in the day-to-day grind.

To me, these ‘extra-statistical’ closure assumptions are often guided by balancing the competing goals of prediction and understanding. I have some thoughts on how this balance can be clarified, and how some related areas of research bear on this, but this post is getting long and the margins of this blog are too small to..

Model schema and the ‘structuralist’ interpretation of ‘for all’: the uniformity of what?

Disclaimer
A short note mentioning an update of a previous post, as well as an additional comment.

Overview
I have re-written a number of parts of my first post ‘For all’ is not ‘catch all’: closure, model schema and how a Bayesian can be a Falsificationist. I’ve added some brief references to Jaynes, for one, but also have tried to clarify the nature of the ‘for all’ statement and its domain of application. This came up in the comment section. I’ve copied this section below (in blue), as well as added an additional comment after.

What is the domain of the ‘for all’?
A further clarification is needed [see the comment section for the origins of this]: the closure conditions are schematic/structural and only implicitly determine the domain of validity B for a given theory. That is, in the general scheme, b and B are placeholders; for a particular proposed theory we need to find particular b and B such that the closure conditions are satisfied. This has an affinity with the ideas of mathematical structuralism (without necessarily committing to endorsing the entire position, at least for now). For example, Awodey (2004, An Answer to Hellman’s Question), describes:

the idea of specifying, for a given…theory only the required or relevant degree of information or structure, the essential features of a given situation, for the purpose at hand, without assuming some ultimate knowledge, specification, or determination of the ‘objects’ involved…The statement of the inferential machinery involved thus becomes a…part of the mathematics…the methods of reasoning involved in different parts of mathematics are not ‘global’ and uniform across fields…but are themselves ‘local’ or relative…[we make] schematic statement[s] about a structure…which can have various instances

This lack of specificity or determination is not an accidental feature of mathematics, to be described as universal quantification over all particular instances in a specific foundational system as the foundationalist would have it…rather it is characteristic of mathematical statements that the particular nature of the entities involved plays no role, but rather their relations, operations, etc. – the ‘structures’ that they bear – are related, connected, and described in the statements and proofs of the theorems.

This can be seen as following in the (in this case, algebraic) ‘structuralist’ tradition of Hilbert (1899, in a letter to Frege):

it is surely obvious that every theory is only a scaffolding or schema of concepts together with their necessary relations to one another, and that the basic elements can be thought of in any way one likes…

…the application of a theory to the world of appearances always requires a certain measure of good will and tactfulness: e.g., that we substitute the smallest possible bodies for points and the longest possible ones, e.g., light-rays, for lines. At the same time, the further a theory has been developed and the more finely articulated its structure, the more obvious the kind of application it has to the world of appearances

So, here we are defining a model schema capturing the idea of the ‘closure of a model’ or, alternatively, a ‘closed model structure’, meant to express some notion of induction ‘within’ a model structure and falsification ‘outside’ it. Hilbert’s last paragraph captures this second point.

Suppose we have a background of interest for which we want to create a theory. It may be/almost certainly is the case that there are (many) possible contexts/backgrounds for which we cannot find ‘good’ theories satisfying the closure conditions – e.g. the theories are either much too general or much too specific. This is why psychology is in some ways ‘harder’ than physics – it is very difficult to partition the large number of possibly relevant variables for predicting ‘target’ variables y into a small number of invariant theoretical constructs x, a small set of ‘controllable’ variables b’ and a large set of ‘irrelevant’ variables b”. If we wish to retain an ability to ‘realistically represent’ the phenomenon of interest captured by y, then most things will be ‘explanatory variables’ needing to be placed in x and/or controlled in b’. That is, we will have a very descriptive theory, as opposed to a very ‘causal’ theory. Note that the division (3) into ‘controlled’ and ‘irrelevant’ variables b’ and b”, respectively, tries to help with this, to some extent, but means that controlled lab experiments can be quite reproducible within a lab yet fail to generalise outside it.

The closure conditions mean that we still know what a theory should look like, if one exists, and this helps with the search.

A further comment – the uniformity of what?
It’s also worth noting (as I did in the comment section on the original post) that when confronted with complex phenomena we have a choice of

– developing a (probably ad hoc) theory for unusual events by moving some b” variables into b’ or x and having a more complicated theory structure (in terms of number of theoretically-relevant variables)

– having no theory for unusual events (for the moment) and focusing on those which satisfy the closure. These are the ‘simple but general’ theories like the ideal gas.

Thus different theories have different divisions of x, b’ and b”.

We are hence attempting to avoid the problem of requiring an assumption about the uniformity of nature by only making assumptions about the uniformity of our models and accepting that they may not ‘cover’ the entire real world (see also here). This explains why we are ‘inductive’ ‘inside’ the model – uniformity applies here – but ‘falsificationist’ ‘outside’ our model, i.e. in assessing whether its assumed uniformity holds when held up against the real world.

Whether the closure conditions are satisfied depends on your willingness to accept them in a given situation, which in turn depends on how you define your observable y (for example). I have more to say on this measurement issue at some point in the future, but suffice to say a ‘coarse’ y makes it easier to accept that the conditions are satisfied, as Hilbert implied.

Model closure and formalism in economics, and a topological metaphor for model closure

Disclaimer
This is the one of (what should be) a few posts which aim to connect some basic puzzles in the philosophy and methodology of science to the practice of mathematical and computational modelling. They are not intended to be particularly deep philosophically or to be (directly) practical scientifically. Nor are they fully complete expositions. Still, I find thinking about these puzzles in this context to be an interesting exercise which might provide a conceptual guide for better understanding (and perhaps improving?) the practice of mathematical and computational modelling. These are written by a mathematical modeller grappling with philosophical questions, rather than by a philosopher, so bear that in mind! Comments, criticisms and feedback of course welcome! [Current version: 2.0]

Overview
I. Model closure and formalism in economics
II. A topological metaphor for model closure – manifolds, charts and atlases

I. Model closure and formalism in economics
Lars P. Syll gives a nice quote here from Sheila Dow, an expert in economic methodology (who I haven’t encountered before), on what is required for obtaining model closure in the context of economics:

…structures with fixed (or at least predictably random) interrelations between separable parts (e.g., economic agents) and predictable (or at least predictably random) outside influences…

…Any formal model is a closed system. Variables are specified and identified as endogenous or exogenous, and relations are specified between them. This is a mechanism for separating off some aspect of an open-system reality for analysis. But, for consistency with the subject matter, any analytical closure needs to be justified on the grounds that, for the purposes of the analysis, it is not unreasonable to treat the variables as having a stable identity, for them to have stable interrelations and not to be subject to unanticipated influences from outside … But in applying such an analysis it is important then to consider what has been assumed away…

I take the quote to make quite similar points to my previous post – not just that all formal models need a closure, but also that it matters where these closure assumptions come in. That post tried to connect the issue of model closure to debates about ‘catchall’ hypotheses in Bayesian inference.

The point I argued was that the appropriate ‘model closure’ for Bayesian inference (and of course all formal models have a closure, as Sheila argues) occurs (or should occur) at the level of model structure and ‘boundary conditions’ (priors) and does not require a probability distribution over the ‘background’ or ‘external’ variables.

Rather, closure occurs via a collection of ‘structural’ conditional probability statements with some variables only appearing on the right-hand side of the conditioning (and hence not possessing/requiring a probability distribution). These provide an assumed separation into ‘inside the system’, a ‘closed boundary’ based on experimentally-controlled variables and the external/irrelevant ‘outside the system’ variables. Once this closure is established, Bayesian inference can be carried out within this boundary, where probability distributions can be normalised, but not outside. This closure is always temporary and falsifiable, however, and requires qualitatively different inference methods for assessing its validity, such as ‘pure significance’ tests, unless again embedded in a further higher-level model. This is why there can be ‘falsificationist Bayesians’ (e.g. Andrew Gelman).

Note the subtle point that we are constructing a sort of ‘meta-model’ of the process of inference itself, into which we embed particular models of interest.

As I stated in the comments on Lars’ blog, in some ways I’m more optimistic about closure – I think the search for model closure is the search for interesting theories and is part of the ‘stupendous beauty of closure’ referenced in my post. I agree, however, that we may not always be able to find it (especially for ‘messy’ subjects like biology, psychology, economics etc). This came up in my exchanges with the philosopher Greg Gandenberger and is something I need to elaborate on at some point.

This latter issue, as I see it, has its clarification through the roles of idealisation and approximation and the separation of the ‘formal/mathematical’ and ‘actual/possible’ worlds. I’ll return to the topic of formalising (to some extent) the processes of idealisation, measurement and ‘seeing’ vs ‘doing’ at some point in the future. For now just keep in mind that every closure establishes a formal model by separating it off from the real world at some point and hence, as the cliche goes, ‘all models are wrong’. They are ‘wrong’ because of the closure; however, the ‘but some are useful‘ part comes in when we find useful closures.

Useful/beautiful closures may or may not exist – that’s the fun and challenge of doing science!

II. A topological metaphor for model closure – manifolds, charts and atlases
A metaphor for the role of limited (closed) theories can be found in topology and geometry: we might imagine a collection of possibilities of the ‘real world’ as a sort of Platonic abstract manifold to which we have finite access (to do – Plato’s cave…, possible worlds). Our (closed) models form a patchwork of charts, each of which only covers a small part of the possible world manifold. As the Wikipedia page states

It is not generally possible to describe a manifold with just one chart, because the global structure of the manifold is different from the simple structure of the charts. For example, no single flat map can represent the entire Earth without separation of adjacent features across the map’s boundaries or duplication of coverage. When a manifold is constructed from multiple overlapping charts, the regions where they overlap carry information essential to understanding the global structure.

As mathematical modellers using idealisations (closures) we inevitably use limited models (charts) only covering some part of the target world. Furthermore we may/will never be able to fully cover the entire ‘possible world manifold’. Our best hope is a collection of charts – a so-called atlas – covering as much as we can and with certain mutual consistency properties. In the Bayesian account, each chart would be analogous to a probability distribution over possible parameter values/’true’ states of the world. Note that in this interpretation of the level of model closure, the Bayesian account appears to require some sort of ‘possible world‘ interpretation.

Returning to the topological metaphor, it is again helpful for understanding consistency properties – to compare two models of our state of knowledge (charts) we require the analogy of a transition map (see the atlas page) between the charts (models). Without such a map we cannot compare models of our state of knowledge/information (charts).

This corresponds to the intuitive Bayesian and Likelihoodist constraint that many advocate: one should not compare parameter values between different models unless embedded into a larger model or a mapping between models is provided.

An interesting connection to explore further in the epistemology literature is whether Susan Haack’s ‘crossword puzzle’ metaphor relates to the topological metaphor given here. Her crossword puzzle metaphor for epistemology involves the search for a collection of words (think models/knowledge) satisfying both external empirical ‘clues’ (data) and mutual consistency of ‘intersecting words’ (coherence/invariance properties).

‘For all’ is not ‘catch all’: closure, model schema and how a Bayesian can be a Falsificationist

Disclaimer
This is the first of (what should be) a few posts which aim to connect some basic puzzles in the philosophy and methodology of science to the practice of mathematical and computational modelling. They are not intended to be particularly deep philosophically or to be (directly) practical scientifically. Nor are they fully complete expositions. Still, I find thinking about these puzzles in this context to be an interesting exercise which might provide a conceptual guide for better understanding (and perhaps improving?) the practice of mathematical and computational modelling. These are written by a mathematical modeller grappling with philosophical questions, rather than by a philosopher, so bear that in mind! Comments, criticisms and feedback of course welcome! [Current version: 8.0]

Introduction
This particular post is a quick draft based on some exchanges with the philosopher Deborah Mayo here on statistical inference frameworks. The post is only rough for now; I’ll try to tidy it up a little later, including making it more self-contained and with less waffling. For now I’ll assume you’ve read that post. You probably don’t need to bother with my train-of-thought (and at times frustrated!) comments there as I’ve tried to make them clearer here. The notation is still a bit sloppy in what follows.

The basic problem is to do with closure of mathematical and/or theoretical frameworks. Though here the debate is over closure of statistical inference frameworks, the same issues arise everywhere in mathematical models. For example, in statistical mechanics it’s possible to begin from Liouville’s equation involving the full phase-space distribution over all particles. To derive anything more immediately tractable from this – such as the equations of kinetic theory or the equations of continuum mechanics – one needs to reduce the information contained in the full set of equations. This can be done by ‘coarse graining’ – throwing information away – and/or by identifying a ‘lossless reduction’ using implicit constraints or symmetries. Note that one may also simply postulate and test a reduced model without deriving it from a more ‘basic’ model. Regardless of how the derivation proceeds, the ultimate result of this is identifying a reduced set of variables giving a self-contained set of equations. Here is a nice little article by the materials scientist Hans Christian Öttinger with the great title ‘On the stupendous beauty of closure’. The following passage reiterates the above ideas:

In its widest sense, closure is associated with the search for self-contained levels of description on which time-evolution equations can be formulated in a closed, or autonomous, form. Proper closure requires the identification of the relevant structural variables participating in the dominant processes in a system of interest, and closure hence is synonymous with focusing on the essence of a problem and consequently with deep understanding.

Overview of the ‘problem(s)’ and my ‘solutions’
The questions raised by Mayo are ‘how do Bayesians deal with the problem of normalising probabilities to one when there are always background, unspecified alternatives which should come with some amount of probability attached?’ and, relatedly, ‘can a Bayesian be a Falsificationist?’ My answers are: closure via ‘for all’ conditional probability statements at the level of model schema/model structure and yes, via ‘for all’ conditional probability statements at the level of model schema/model structure. 

I give a (sketchy) elaboration of my answers in the next section but first, consider the prototypical example of a falsifiable theory given by Popper: ‘all swans are white’. As he pointed out, the quantifier ‘for all’ plays a key role in this [need some Popper refs here]. That is, a single counterexample – a ‘there exists’ (a non-white swan) statement – can falsify a ‘for all’ statement. He further noted that part of the appeal of the ‘for all’ theory is that it is sharp and bold, as compared to the trivially true but less-useful ‘some swans are white’.

Popper’s example is relevant for the closure problem of Bayesian inference as follows. In her post, Mayo quotes a classic exchange in which some famous Bayesians (Savage, Lindley) propose “tacking on a catch-all ‘something else’” hypothesis (the negation of the main set of hypotheses considered), which is given a ‘small lump of prior probability’. This is to avoid having to explicitly ‘close off’ the model. That is, since the ‘catchall’ is of the form ‘or something else happens’, it evades (or tries to evade) the seeming need for Bayesians to have all possibilities explicitly known in advance. Knowing ‘all possible hypotheses’, whether explicitly or implicitly via a ‘catchall’, is (it is argued) required by Bayesians to normalise their probability distributions.

I think this is misguided, and prefer a set of explicit, falsifiable closure assumptions.

The ‘for all’ closure assumptions
The closure statements I argue for instead are ‘for all’ statements, which give ‘sharp closure’ a la Popper, but are at the level of model structure. These describe what a self-contained – closed – theory should look like, if it exists; they do not guarantee that we can always find one, however. Hence I say ‘for all is not catch all’. These statements can be explained as follows (originally based on my comments on Mayo’s blog, but modified quite a bit). First I need to emphasise the ambiguity over ‘parameter values’ vs ‘models’ vs ‘hypotheses’:

Since here we are constructing a mathematical model capturing the process of inference itself, each parameter value within a model structure (following a model schema) corresponds to a particular instance of a ‘mechanistic’ model. It is in the higher-level model of inference that we formulate closure conditions applying to model structures.

For related reasons, I prefer to use point parameter values within a model to refer to ‘simple hypotheses’, rather than ‘compound hypotheses’ which may include multiple parameter values. Philosophers often refer to the latter which can cause much miscommunication. Michael Lew’s comments on Mayo’s post make the same point from a Likelihoodist point of view. This is an interesting topic to return to.

Consider a model structure where we predict a quantity y as a function of x in background context b. As mentioned above, each value of x should be considered a possible parameter value within a mathematical model. Grant the existence of p(y|x,b) and p(x|b). Note b is only ever on the right hand side so need not be considered probabilistic (notation can/will be formalised further).

My first two closure assumptions are
1) p(y|x,b) = p(y|x) for all x,y,b
2) p(x|b) is given/known for all b

These establish a boundary between the explanatory variables x and their effect on y (for a class of models) and the external/environmental variables b and their effect on x. If these model schemata are satisfied by the model structure of interest then it’s fine to apply the usual methods of Bayesian parameter inference within this model structure. Each possible parameter value corresponds to one hypothetical model possibility. Note that these conditional probabilities only involve b on the right-hand side of the conditioning and integrate to one over the possible values on the left-hand side of the conditioning. This includes both integrating over y for statement (1) and integrating over x for statement (2), so is Bayesian to the extent that the parameter(s) x come with a probability distribution. No ‘or something else’ hypothesis is required for x, at least not one with any probability attached.

It helps to further assume a separation of environmental variables into ‘strictly irrelevant’ (not in x or experimentally-controlled) variables b” and ‘experimentally-controlled’/’experimental boundary’ variables b’. These are defined via p(x|b’,b”) = p(x|b’), where b” are the ‘irrelevant’ variables in (the vector) b and b’ are the experimentally controlled variables in b. This sharp division is useful to maintain unless/until we reach a contradiction. It is an idealisation, a mathematical assumption, and a crucial part of model building. It is likely not true but we will try to get away with it until we get caught. We are being bold by claiming ‘x are my theory variables, b’ are my controlled variables, and all other variables b” are explicitly irrelevant’. The experimentally-controlled bs – b’ – are ‘weakly relevant’ or ‘boundary’ variables in that they affect x but not y and are controlled. They allow us to say what p(x|b’) is.

We can make this another explicit closure condition by stating

(3) p(x|b) = p(x|b’,b”) = p(x|b’) for all b”, the ‘irrelevant’ or ‘fully external’ variables of the background vector b

The difference with the Bayesian catchall is that we don’t have a probability distribution over the background variables b’ and b” making up b; we only condition on them. Thus we don’t violate any laws of probability by not leaving a ‘lump of probability’ behind. If we put forward a new model in which a previously ‘irrelevant’ variable is considered ‘relevant’, the new model is not related to the old model by any probability statements unless a mapping between the models is given. Functions with different domains are different functions and should not be (directly) compared.
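To spell this out in the same informal notation (granting (1), (2) and (3) and nothing more), within-structure inference is just

p(x|y,b) = p(y|x) p(x|b’) / ∫ p(y|x) p(x|b’) dx

The background variables appear only to the right of the conditioning bar, and the integral in the denominator runs over x alone, so the posterior over parameter values is normalised without ever assigning probability to an ‘or something else’ hypothesis.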

An analogy for parameter estimation within a model structure is a conservation-of-mass differential equation (where mass plays the role of probability; one could also directly consider a Master equation expressing conservation of probability) posed on a given domain, with boundary conditions at the edge of the domain and with all variables that aren’t ‘inside’ or ‘on the boundary’ assumed irrelevant. If the closure conditions are not satisfied then the model structure is misspecified, i.e. the problem is not well-posed, just as with a differential equation model lacking boundary conditions. The inference problem is then to see how probability ‘redistributes’ itself within the domain (over the parameter values/model instances of interest) given new observations – again, imagine a ‘probability fluid’ – subject to appropriate boundary and initial conditions and to independence from the external environment. A good model structure has a large domain of applicability – the domain of b/set of values satisfying the model schema (1) & (2), as well as (3) if necessary – and we can only investigate this by varying b and seeing if the conditions still hold. This is Bayesian within the model since the model parameters x have probability distributions.
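For the record, the standard Master-equation form of ‘conservation of probability within the domain’ (a textbook statement, included only to pin down the analogy) is

dP_i/dt = sum_j [ W_ij P_j - W_ji P_i ]

where P_i is the probability of state i (here, a parameter value/model instance) and W_ij is the rate of probability flow from state j to state i. Summing over all states i inside the closed domain gives d/dt sum_i P_i = 0: probability moves around within the structure, but none leaks out to unmodelled possibilities.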

What is the domain of the ‘for all’?
A further clarification is needed [see the comment section for the origins of this]: the closure conditions are schematic/structural and only implicitly determine the domain of validity B for a given theory. That is, in the general scheme, b and B are placeholders; for a particular proposed theory we need to find particular b and B such that the closure conditions are satisfied. This has an affinity with the ideas of mathematical structuralism (without necessarily committing to endorsing the entire position, at least for now). For example, Awodey (2004, ‘An Answer to Hellman’s Question’) describes:

the idea of specifying, for a given…theory only the required or relevant degree of information or structure, the essential features of a given situation, for the purpose at hand, without assuming some ultimate knowledge, specification, or determination of the ‘objects’ involved…The statement of the inferential machinery involved thus becomes a…part of the mathematics…the methods of reasoning involved in different parts of mathematics are not ‘global’ and uniform across fields…but are themselves ‘local’ or relative…[we make] schematic statement[s] about a structure…which can have various instances

This lack of specificity or determination is not an accidental feature of mathematics, to be described as universal quantification over all particular instances in a specific foundational system as the foundationalist would have it…rather it is characteristic of mathematical statements that the particular nature of the entities involved plays no role, but rather their relations, operations, etc. – the ‘structures’ that they bear – are related, connected, and described in the statements and proofs of the theorems.

This can be seen as following in the (in this case, algebraic) ‘structuralist’ tradition of Hilbert (1899, in a letter to Frege):

it is surely obvious that every theory is only a scaffolding or schema of concepts together with their necessary relations to one another, and that the basic elements can be thought of in any way one likes…

…the application of a theory to the world of appearances always requires a certain measure of good will and tactfulness: e.g., that we substitute the smallest possible bodies for points and the longest possible ones, e.g., light-rays, for lines. At the same time, the further a theory has been developed and the more finely articulated its structure, the more obvious the kind of application it has to the world of appearances

So, here we are defining a model schema capturing the idea of the ‘closure of a model’ or, alternatively, a ‘closed model structure’, meant to capture some notion of induction ‘within’ a model structure and falsification ‘outside’ it. Hilbert’s last paragraph captures this second point.

Suppose we have a background of interest for which we want to create a theory. It may be/almost certainly is the case that there are (many) possible contexts/backgrounds for which we cannot find ‘good’ theories satisfying the closure conditions – e.g. the theories are either much too general or much too specific. This is why psychology is in some ways ‘harder’ than physics – it is very difficult to partition the large number of possibly relevant variables for predicting ‘target’ variables y into a small number of invariant theoretical constructs x, a small set of ‘controllable’ variables b’ and a large set of ‘irrelevant’ variables b”. If we wish to retain an ability to ‘realistically represent’ the phenomenon of interest captured by y, then most things will be ‘explanatory variables’ needing to be placed in x and/or controlled in b’. That is, we will have a very descriptive theory, as opposed to a very ‘causal’ theory. Note that the division (3) into ‘controlled’ and ‘irrelevant’ variables b’ and b”, respectively, helps with this to some extent, but it also means that controlled lab experiments can be quite reproducible within a lab yet fail to generalise outside it.

The closure conditions mean that we still know what a theory should look like, if it exists, and this helps with the search.

Further interpretation, testing and relation to ‘stopping rules’
We see that

(1) is an assumption on mechanism ‘inside’ a domain – i.e. ‘x determines y regardless of context b’
(2) is an assumption on experimental manipulation – i.e. boundary conditions of a sort
(3) is a further division into ‘controlled’ and ‘irrelevant’ background/boundary variables, meaning all background effects pass through and are summarised by knowledge of the boundary manipulations

As emphasised, these sorts of assumptions are ‘meta-statistical’ closure assumptions, but they are testable to the extent that we can explore/consider different contexts (values of b). Another ‘structural’ analogy used here is how, in formal logic, axiom schemata are used to express higher-order logic (e.g. second-order logic) formulae as a collection of axioms within a lower-order logic (e.g. first-order logic). In fact this is one way of deductively formalising, within first-order logic, the other form of inductive inference – mathematical induction. Here, though, we likely have to work much harder to find good instances of the closure assumptions for particular domains of interest.

The analogy to physics problems with divisions of ‘inside the system’, the ‘boundary of the system’ and the ‘external environment of the system’ is clear. Closed systems are defined similarly in that context.

Statistically, these conditions can be checked to some extent by the analogue of so-called ‘pure significance testing’, that is without alternatives lying ‘outside of’ b’s domain. These essentially ask – ‘can I predict y given x (to acceptable approximation) without knowing the values of other variables?’ and ‘do I know how and which of my interventions/context/experimental set up affect my predictor x?’.
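A minimal sketch of how such a check might look numerically (my own illustration, with an invented model, invented data and an arbitrary test statistic; this is essentially the posterior predictive checking of Gelman et al., which comes up again below):

```python
# A minimal sketch of a 'pure significance'-style check of a closed structure:
# no explicit alternative model, just 'can this structure reproduce features of the data?'
# The model, data and test statistic are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

y_obs = rng.normal(0.0, 1.0, size=50)   # stand-in observations
n = len(y_obs)

# Posterior draws for x under a unit-variance normal model with a flat prior: x | y ~ N(mean(y), 1/n)
x_draws = rng.normal(y_obs.mean(), 1.0 / np.sqrt(n), size=2000)

# Replicate data under the fitted structure and compare a chosen statistic
T_obs = y_obs.std()
T_rep = np.array([rng.normal(x, 1.0, size=n).std() for x in x_draws])
print("posterior predictive p-value:", np.mean(T_rep >= T_obs))
```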

Things such as ‘stopping rules’ may be included as part of the variable b, so could affect the validity of assumption (1) and/or assumption (2). For example, a particular stopping rule may be construed as preserving (1) while requiring modification of (2) i.e. a different prior. Here the stopping rule is part of b’, the experimentally-controlled variables having an effect on x. Other stopping rules may be irrelevant and hence lie in b”. This point has been made by numerous Bayesians – I first came across it in Gelman et al.’s book and/or Bernardo and Smith’s book (hardly unknown Bayesians). Similar points to this (and others made in this post) can be found on the (slightly more polemical) blog here by the mysterious internet character ‘Laplace’.
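A standard illustration of the ‘irrelevant’ case (a sketch only, not original to this post): with the same prior, two stopping rules that happen to yield the same counts give the same within-model posterior, so that aspect of the set-up can sit in b”.

```python
# Sketch: with a Beta prior on the success probability, 'stop after n = 10 trials' and
# 'stop after 7 successes' both give likelihoods proportional to p^7 (1-p)^3 once
# 7 successes and 3 failures are observed, and hence the same posterior.
from scipy import stats

a_prior, b_prior = 1, 1          # Beta(1,1) prior
successes, failures = 7, 3       # same observed counts under either stopping rule

posterior = stats.beta(a_prior + successes, b_prior + failures)
print("posterior mean:", posterior.mean())   # identical whichever stopping rule was used
```

A stopping rule that changed what prior was reasonable would instead sit in b’ and require modifying (2), as described above.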

A slightly subtle, but interesting, point is that if the model structure is misspecified then it may be corrected using data from that context, but this may invalidate its application in other contexts (a more formal explication can be given). Invariance of the relationship between y and x for all contexts b is crucial here. So, again, it’s really the closure assumptions doing most of the ‘philosophical work’ – this is elaborated on more below.

Recap so far
I think this is a fairly defensible sketch (note the word sketch!) of how a Bayesian may be able to be a Falsificationist. They provisionally accept two/three conditional probability statements which involve conditioning on (dividing with) a ‘boundary’ background domain of validity. The ‘background’ variables do not need a probability distribution over their domain as they are only ever conditioned on. To emphasise: probabilities (which are all conditional) integrate to one within (conditional on) a model structure/schema but the background variables do not need a probability distribution and the closure assumptions can be falsified.

As I see it, then, the goal of a scientist is a ‘search’ problem [a la Glymour?]: to find (e.g. by guessing, whatever) theories whose form satisfies these closure conditions for a desirable range of background contexts/divisions, along with more specific estimates of the quantities within these theories under more specific conditions of immediate interest. When the closure conditions are not satisfied for a given background then the theory is false(-ified) for that domain and any quantities estimated within that theory are meaningless.

Haven’t I seen this idea before?
If you’re a philosopher of science then this sounds very ‘Conjectures and Refutations’, no? Shades of the Kuhnian normal science/paradigm shift structure, too (as Gelman has noted on many occasions). If you’re a ‘causal modeller’ then you might think about Pearl and the concept of ‘surgery’ describing (possibly hypothetical) experimental interventions, as well as some related causal inference work by Glymour et al. (though I need to read more of this literature). If you’ve read any Jaynes/Cox you might recognise some kinship with Cox’s theorem and the derivation of probability theory from given axioms expressed as functional equations; see e.g. p. 19 of Jaynes’ PT:LoS where he mentions ‘interface conditions’ required to relate the behaviour of an ideal ‘reasoning robot’ – i.e. model of the inference process in the terms used here – to the ‘real world’. (Also, given my affinity for functional equations and ‘model schema’ I should really go back over this in more detail.) In fact, Jaynes explicitly states essentially the central point made here, e.g. p. 326 of PT:LoS –

The function of induction is to tell us not which predictions are right, but which predictions are indicated by our present knowledge. If the predictions succeed, then we are pleased and become more confident of our present knowledge; but we have not learned much…it is only when our inductive inferences are wrong that we learn new things about the real world.

It is clear that Jaynes is saying the same thing as expressed here – use inductive reasoning (e.g. Bayesian parameter inference) inside a ‘closed’ model structure (see ‘interface conditions’ from PT:LoS cited above) until a contradiction is reached. At this point the closure conditions – the model structure conditions – are ‘inadequate’ and must be ‘respecified’ before the ‘within-model’ inference can be considered sound. Finally, as is apparent from the opening examples, if you come from a physical science background then it’s clear that many analogous ideas are present in the statistical mechanics/thermodynamics literature (Jaynes shows up here again, along with many others; I’d like to write more on this at some point as well).

Interestingly, many of these ideas also seem quite similar to ‘best practice’ ‘Frequentist’ methods. For example Spanos’ version of Mayo’s ‘Error Statistical’ perspective [in my understanding – see comment section] requires an adequate model structure, established with the help of general (Fisherian-style) tests, before (Neyman-Pearson/severe test) parameter estimation can be soundly carried out. We seem to differ mostly on specific formalisation and on the parameter estimation methods used within a structure. I know Glymour has written something on relating ‘Error Statistical’ ideas to the causal inference literature, though I haven’t looked at it in detail.

Finally, of note from an epistemological perspective, these are not ‘knowledge is closed under entailment’ assumptions. I’m generally against this a la Nozick. The closure here is different to that in the epistemological literature dealing with knowledge closure, though it is perhaps related; it would also be interesting to look into this [update: see here for a start]. Note that Nozick’s proposed solution to that problem was effectively to go to a ‘higher level’ by relativising knowledge to methods, in a manner very similar to Mayo’s, and similar to the present approach in that I use higher-level model structures.

A brief example and why one might be only a ‘half-Bayesian’ – closure does the work!
As an example, Newton’s law f - ma = 0 is a general scheme characterising an invariant relationship parameterised by ‘context’. Feynman’s lectures give a great discussion of this [insert link]. When the context involves knowing ‘gravity is present and the relevant masses are known’ and I want to predict acceleration, then the expression for f is determined by background knowledge and is used to predict the acceleration. When the acceleration and mass are known relative to a background reference frame then the net force can be predicted. The rest of the background is assumed irrelevant. This relationship is nice because we can satisfy the two conditions I gave under a wide range of conditions.
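For concreteness, a small numerical sketch (invented numbers) of using this relationship predictively in the ‘closure’ terms above, with the force fixed by the background context and a prior expressing what is known about the mass:

```python
# A small sketch (invented numbers): the invariant relationship f - m*a = 0 used predictively.
import numpy as np

rng = np.random.default_rng(2)

f = 10.0                                              # net force, assumed known from the context b'
m = rng.lognormal(mean=0.0, sigma=0.1, size=10_000)   # prior draws for the mass (illustrative)
a = f / m                                             # acceleration implied by f - m*a = 0

print("predictive mean/sd of acceleration:", a.mean(), a.std())
```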

A Bayesian would typically express what is known (given a model structure) – e.g. a range of reasonable mass values – in terms of a prior and then report predictions – e.g. the acceleration – in terms of predictive distributions. This is not really the central issue, however:

These closure assumptions don’t really have anything to do with being Bayesian or not – I believe Glymour and Pearl have said things along these lines (see ‘Why I am not a Bayesian’ and ‘Why I am only a half Bayesian’, respectively) – but are still perfectly compatible with a Bayesian approach.

If you don’t want to use Bayesian parameter estimation, fine, but the argument that it cannot be compatible with a Falsificationist approach to doing science is clearly wrong (to me anyway). Bayesian and Likelihoodist methods also happen to have particularly intuitive interpretations for parameter estimation within a model structure defined conditionally w.r.t. a background context. Furthermore, there are ‘Bayesian analogues’ of Fisherian tests (see BDA for examples) which are particularly useful for graphical exploration, so this does not present too much difficulty in principle.

Another recap
As I have said above, it is the scientist’s job to find particular theories with a structure satisfying the closure conditions, determine the range of backgrounds over which these conditions are satisfied and then estimate quantities within these conditional model structures. They may also seek to relate different theories by allowing background variables of one model to be primary variables of another and carrying out some sort of reduction and/or marginalisation/coarse-graining process.

There is no ‘catchall’ however! There are, instead, schematic ‘for all’ statements for which we need to determine (find) the truth sets – the range of values for which the quantifications hold – and hence determine the explanatory variables and domain(s) for which our theory/model structure is applicable. This defines the ‘closure’ of the model structure (paradigm) and allows us to proceed to the ‘normal science’ of parameter estimation. At any point we can work with a ‘temporary closure’ of B, i.e. a subset of B, that captures the range of conditions we are currently interested in or able to explore. The background variables b are usually further (assumed to be) divided into manipulable/boundary and irrelevant/fully external, and can be taken to parameterise various subsets of B.

And here seems a good place to close this post, for now.

Postscript
Mayo replies on her blog:
“The bottom line is that you don’t have inference by way of posteriors without a catchall. The issue of falsification is a bit different. You don’t have falsification without a falsification rule. It will not be deductive, that’s clear. So what’s your probabilistic falsification rule? I indicated some possible avenues.”

A short reply (for now):
Some of us are happy to use conditional probability as basic and posteriors where useful. Here posteriors do come in – over the parameters within the model structure. I haven’t shown this explicitly as it follows from the usual Bayesian parameter estimation procedures – as long as the closure conditions are assumed.

These closure conditions also allow you to pass from the prior predictive to posterior predictive distributions (see Gelman et al. or Bernardo and Smith for definitions) so do also allow (predictive) inference using a posterior. This was actually my original motivation for making these conditions explicit, as they were (to me) implicit in a number of Bayesian arguments. That this requires accepting two conditional probability statements is neither here nor there to me as far as ‘being Bayesian’ or not is concerned. As I mentioned in the original post I am not a ‘complete Bayesian’ for similar reasons – additional, but entirely compatible, assumptions are needed to complete the usual Bayesian account. I am hardly the first person to point this out. I also note that in principle it may give a motivated person some (philosophical) wiggle room to replace using Bayesian parameter estimation with another parameter estimation method, for a number of reasons I won’t go into here. All ‘power’ to them.
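For concreteness, in the same informal notation (and granting, as usual, that new and old observations are conditionally independent given x): the prior predictive is

p(y|b) = ∫ p(y|x) p(x|b’) dx

and the posterior predictive is

p(y_new|y,b) = ∫ p(y_new|x) p(x|y,b) dx

with p(x|y,b) the within-structure posterior given earlier. Both are normalised over the left-hand side alone, so again no ‘catchall’ term is needed.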

In terms of the need for a ‘falsification rule’: in my account these are needed for saying when the closure conditions fail to hold for a particular model/context. I briefly indicated which of the avenues suggested by Mayo I follow: essentially ‘pure significance’ tests (Fisherian style, without an alternative). I prefer to do these graphically, as done in Gelman et al.’s BDA. Fisher also recommended that these tests be ‘informal’ rather than governed by formal criteria like p < 0.05 (so, note to reproducibility people: you can’t [really] blame him for the widespread abuse of p-values!).

Rather than aiming to reject a particular model – as in the notorious ‘NHST’ procedure – the goal is usually to find a structure, characterising an ensemble of models, that satisfies closure. This is entirely analogous to the requirement of Spanos (who is also an ‘Error Statistician’, along with Mayo) that model structure adequacy be established before parameter estimation is carried out. After these Fisherian-style tests of model structure, he uses Neyman-Pearson-style estimation; I use Fisherian-style tests of model structure in combination with Bayesian/Likelihood estimation.

There are a number of details left implicit or even completely absent here and I don’t have the time/motivation/ability to fill them all in right now. My advice for the moment to anyone interested would be to read more applied Bayesian work and look for where the closure conditions come in and how they are checked.

Fin.