# The ‘tacking paradox’: model closure and irrelevant hypotheses

Overview
I. The tacking paradox in philosophy of science
Interlude – Bayesian or not?
II. A resolution
Interlude – severe tests and tracking truth?
III. Implications for mathematical/computational models in practice (sort of)

Disclaimer
This is one of (what should be) a few posts which aim to connect some basic puzzles in the philosophy and methodology of science to the practice of mathematical and computational modelling. They are not intended to be particularly deep philosophically or to be (directly) practical scientifically. Nor are they fully complete expositions. Still, I find thinking about these puzzles in this context to be an interesting exercise which might provide a conceptual guide for better understanding (and perhaps improving?) the practice of mathematical and computational modelling. These are written by a mathematical modeller grappling with philosophical questions, rather than by a philosopher, so bear that in mind! Comments, criticisms and feedback of course welcome! [Current version: 3.0.]

I. The tacking paradox in philosophy of science (or, the problem of irrelevant hypotheses)
The so-called tacking paradox (or at least one instance of it) can be described in minimal terms as follows. More detail is given on philosopher Deborah Mayo’s blog here (along with some responses in the comments section that don’t seem too far from the resolution given here, though they are a little unclear to me in places). As noted in my other posts, I prefer to think of ‘hypotheses’ h as parameters within mathematical model structures predicting data y (search this blog for more). The basic perspective from which I will try to resolve this problem is that of schematic ‘model closure’ assumptions.

Firstly, we need to define what it means to ‘Bayesian confirm’ (really, ‘Likelihood confirm’) a hypothesis h given data y. Let’s take the following statement to capture this idea, in terms of ‘predictive confirmation’:

(1) p(y|h,b) > p(y|b)

That is, if the hypothesis h makes the data more likely (under a given model p) then this is taken to mean ‘y confirms h’. Note that we have included a controlled/given background context b.

Also note that ‘confirmation’ as defined here is a change in probability rather than a probability itself. There are a number of different positions on this topic (see e.g. Mayo’s discussion), but I prefer to think of ‘confirmation’/’evidence’ given new data as the change in belief/probability induced by that data (to do – further references), taking us from one state of belief/probability to a new one. This is analogous to the difference between ‘state variables’ and ‘fluxes’ in physics/dynamical systems, or ‘stocks’ and ‘flows’ to use an equivalent terminology (which I really dislike!).

So we ‘confirm’ a hypothesis when it makes a ‘successful prediction’ of newly observed data. This seems a fairly non-controversial assumption, in the sense that a minimal measure of the ‘quality’ of a theory ought to mean that one can predict observations better (to at least some degree) with that theory than without it. Note that this is relative to, and requires the existence of, a prior predictive distribution p(y|b), and this should exist in a standard Bayesian account (more on this one day). I will think of this as ‘predictive relevance’.
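To make criterion (1) concrete, here is a minimal Python sketch on a hypothetical discrete toy model that is not part of the original discussion: h is a coin’s bias, taking one of two values with equal prior probability, and y is the number of heads in 10 tosses. The prior predictive p(y|b) is computed by summing p(y|h,b)p(h|b) over the hypothesis space, and a hypothesis counts as ‘confirmed’ when its likelihood exceeds that average.

```python
from math import comb

# Hypothetical hypothesis space (coin biases) and prior p(h|b).
prior = {0.3: 0.5, 0.7: 0.5}


def likelihood(y, h, n=10):
    """p(y|h,b): binomial probability of y heads in n tosses with bias h."""
    return comb(n, y) * h**y * (1 - h) ** (n - y)


def prior_predictive(y, n=10):
    """p(y|b) = sum over h of p(y|h,b) p(h|b)."""
    return sum(likelihood(y, h, n) * p for h, p in prior.items())


def confirms(y, h, n=10):
    """Criterion (1): y confirms h iff p(y|h,b) > p(y|b)."""
    return likelihood(y, h, n) > prior_predictive(y, n)


y0 = 8  # hypothetical observation: 8 heads out of 10
print(confirms(y0, 0.7))  # True
print(confirms(y0, 0.3))  # False
```

Because p(y|b) is a prior-weighted average of the likelihoods, at least one hypothesis with non-zero prior weight must sit at or above it, which is why confirmation here is inherently comparative.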

Now, suppose that the scheme (1) is true for a hypothesis h1 and data y0, i.e. p(y0|h1,b) > p(y0|b). Say y0 represents some planetary observations and h1 some aspect of (parameter in) Newton’s theory.

Next, ‘irrelevance’ of a hypothesis h″ is usually defined in this context as:

(2) p(y|h′,h″,b) = p(y|h′,b)

which is clearly relative to y, h′, p and b. Note that in my terminology we have ‘predictive irrelevance’ of h″ here.

This leads to the following argument. Let h2 be a typical ‘irrelevant theory’, e.g. a theory about the colour of my hat; that is, (for example) a parameter representing possible ‘colour values’ my hat could take. Then we have

p(y0|h1,b) > p(y0|b) {given that y0 Bayesian/Likelihood confirms h1}

p(y0|h1,h2,b) = p(y0|h1,b) {assuming irrelevance of h2}

So

p(y0|h1,h2,b) > p(y0|b)

Therefore y0 Bayesian/Likelihood confirms (h1&h2) (with respect to model p and background b).
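The tacking argument can also be reproduced numerically. In this minimal sketch (all values hypothetical), h1 is a coin bias standing in for ‘Newton’s theory’, h2 is a hat colour, the joint prior is uniform over the four combinations, and the likelihood simply ignores h2, so (2) holds by construction. The conjunction (h1 = 0.7 & h2 = red) then satisfies (1).

```python
from itertools import product
from math import comb

# Hypothetical parameter values: h1 is a coin bias, h2 a hat colour.
h1_values = [0.3, 0.7]
h2_values = ["red", "blue"]

# Hypothetical joint prior p(h1, h2|b): uniform over all four combinations.
joint_prior = {(h1, h2): 0.5 * 0.5 for h1, h2 in product(h1_values, h2_values)}


def likelihood(y, h1, h2, n=10):
    """p(y|h1,h2,b): h2 never enters, so p(y|h1,h2,b) = p(y|h1,b), i.e. (2)."""
    return comb(n, y) * h1**y * (1 - h1) ** (n - y)


def prior_predictive(y, n=10):
    """p(y|b) = sum over (h1, h2) of p(y|h1,h2,b) p(h1,h2|b)."""
    return sum(likelihood(y, h1, h2, n) * p for (h1, h2), p in joint_prior.items())


y0 = 8
# The conjunction 'h1 = 0.7 and my hat is red' satisfies criterion (1).
print(likelihood(y0, 0.7, "red") > prior_predictive(y0))  # True
```

Nothing in this calculation says anything about h2 on its own; the ‘confirmation’ of the conjunction is carried entirely by h1.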

So what’s the ‘paradox’? The (allegedly) troubling thing is that h2 is supposed to be ‘irrelevant’ and yet it seems to be confirmed along with h1. So planetary observations seem to be able to confirm something like ‘Newton’s theory is true and my hat is red’.

More concretely, one might try to argue as follows: since (h1&h2) is confirmed and since the joint proposition/logical conjunction (h1&h2) logically entails h2, then h2 is confirmed {confirmation/epistemic closure principle}.

So, according to the above argument, ‘my hat is red’ could be confirmed by planetary observations. This argument scheme captures a notion of ‘knowledge is closed under deductive entailment’ or ‘epistemic closure’ in the epistemological literature. Note, however, that this does not follow from any of the main model closure axioms that we have put forward thus far – the ‘model closure’ we refer to is not ‘epistemic closure’. In fact, the approach we follow has more in common with those taken to deny epistemic closure, such as Nozick and/or Dretske (see here).

Interlude – Bayesian or not?
Before I give my preferred resolution of the paradox, there are a couple of points to distinguish here. First, is the Bayesian/Likelihoodist language appropriate to express a resolution of this problem? Second, are the concepts involved in the resolution inherently part of, or extrinsic to, the Bayesian/Likelihoodist approach? This second point is, I take it, what led Clark Glymour to write ‘Why I am not a Bayesian’ (1981) (see also Pearl’s ‘Why I am only a half-Bayesian’). My interpretation of his point is that the resolution to ‘paradoxes’ such as these may or may not be expressible within the Bayesian language, but the underlying concepts driving what we translate into Bayesian language are additional to, and not a part of, the basic Bayesian account.

I basically agree with Glymour on this general point, but use the Bayesian language to express the concepts required to resolve the ‘paradox’. My view, as expressed elsewhere on this blog, is that these are additional ‘closure’ assumptions. As pointed out above, these are not ‘epistemic closure’ assumptions but rather schematic model structure closure assumptions (see here and here). The need for assumptions such as these, whether considered ‘pure’ Bayesian or not, is, however, explicitly and/or implicitly acknowledged by many Bayesians (e.g. Jaynes, Gelman etc).

II. A resolution
Firstly, consider whether we have really captured the notion of ‘irrelevance’. We seem to have predictive irrelevance, but what about ‘boundary/background irrelevance’? That is, if the variable is ‘truly’ irrelevant, then we could imagine moving it into the ‘boundary/controlled’ or ‘background’ variables and varying it without affecting the variables that matter. Thus I argue that we have more information available (knowledge of possible relationships) in the problem specification than we have used.

In particular, based on the closure conditions I gave in the first post on this blog, I would argue that applying the model closure assumptions (1-3) in that post to p(y0|h1,h2,b) requires us to specify both p(y0|h1,h2,b) = p(y0|h1,b) {predictive irrelevance} and an expression for p(h1|h2,b). That is:

We are obligated, according to our model closure principles, to say how varying h2 affects h1 in order to have a well-posed problem. Either it has a relevant effect (varying h2 by experimental control affects h1) or it is a ‘fully irrelevant’ background variable.

In light of the above discussion, we will take ‘h2 is an irrelevant hypothesis’ to further mean

(3) p(h1|h2,b) = p(h1|b) for all h1, h2, b

i.e. h2 falls into the ‘truly irrelevant background variables’, rather than the ‘controlled and controlling boundary values’ b. This means that varying h2 cannot control h1: the parameters of Newton’s theory are not manipulable by changing the colour of my hat. It is a truly ‘passive cog’ capable of no explanatory work.

Note also that we are actually speaking at the schematic/structural level here – i.e. for any values h1 and h2 might take – and hence counterfactually about particular instances conceived as members of a set of possible values.

So in this context I can vary (or imagine varying) my hat colour and how this affects other variables. Though perhaps unfamiliar to many, this is actually a common way of framing theories in the physical sciences, even in classical mechanics, e.g. D’Alembert’s principle and related ideas, which require ‘virtual’ (counterfactual) displacements.

This leads to [to do – proper latex in wordpress]

p(y0|h2,b) = ∫ p(y0|h1,h2,b)p(h1|h2,b) dh1 {by the law of total probability}

= ∫ p(y0|h1,b)p(h1|h2,b) dh1 {by ‘predictive irrelevance’ of h2}

= ∫ p(y0|h1,b)p(h1|b) dh1 {by ‘h1 is not manipulable by h2’, i.e. (3)}

= p(y0|b)

Therefore

p(y0|h2,b) = p(y0|b)

and so h2 is not confirmed by y0 at all! Note again that, in defining our original closure conditions, we required some assumption to be made on p(h1|h2,b); the one chosen here, (3), seems best to represent the intended concept of ‘irrelevance’. Thus, when we include both the predictive irrelevance and the boundary irrelevance/non-manipulability closure assumptions, there is no paradox.
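A quick numerical check of the derivation, as a sketch under hypothetical assumptions: a two-valued coin bias h1, a two-valued hat colour h2 with an arbitrary non-uniform prior, a likelihood that ignores h2 (assumption (2)), and a joint prior that factorises, so that p(h1|h2,b) = p(h1|b) (assumption (3)). Sums replace the integrals because the toy hypothesis space is discrete.

```python
from math import comb, isclose

# Hypothetical toy model: h1 is a coin bias, h2 a hat colour.
h1_values = [0.3, 0.7]
h2_values = ["red", "blue"]


def likelihood(y, h1, n=10):
    """p(y|h1,h2,b) = p(y|h1,b): predictive irrelevance (2) of h2."""
    return comb(n, y) * h1**y * (1 - h1) ** (n - y)


# A joint prior p(h1,h2|b) that factorises, so p(h1|h2,b) = p(h1|b): assumption (3).
p_h1 = {0.3: 0.5, 0.7: 0.5}
p_h2 = {"red": 0.2, "blue": 0.8}
joint = {(h1, h2): p_h1[h1] * p_h2[h2] for h1 in h1_values for h2 in h2_values}


def p_h1_given_h2(h1, h2):
    """p(h1|h2,b) = p(h1,h2|b) / p(h2|b)."""
    p2 = sum(joint[(k, h2)] for k in h1_values)
    return joint[(h1, h2)] / p2


def p_y_given_h2(y, h2):
    """p(y|h2,b) = sum over h1 of p(y|h1,b) p(h1|h2,b)."""
    return sum(likelihood(y, h1) * p_h1_given_h2(h1, h2) for h1 in h1_values)


def p_y(y):
    """p(y|b) = sum over h1 of p(y|h1,b) p(h1|b)."""
    return sum(likelihood(y, h1) * p_h1[h1] for h1 in h1_values)


y0 = 8
for h2 in h2_values:
    print(h2, isclose(p_y_given_h2(y0, h2), p_y(y0)))  # prints "red True", then "blue True"
```

Replacing the factorised joint prior with one in which h1 and h2 are correlated breaks (3), and the equality then generally fails; that is where the choice of p(h1|h2,b) does real work.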

Until some p(h1|h2,b) is given, we have an ill-posed problem or, at best, a class of solutions, and we require boundary conditions to pick out the solutions capturing our particular circumstances.

For example, one could also imagine an h2 which is simply a ‘duplicate’ of h1. This satisfies predictive irrelevance, since knowing the same thing twice adds no predictive ability, but it corresponds to the opposite (singular/delta) limit of p(h1|h2,b), in which h2 is confirmed to exactly the same degree as h1.
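As a minimal sketch with hypothetical values, take h1 to be a coin bias and let h2 be an exact copy of it, so that p(h1|h2,b) is a point mass at h1 = h2. The likelihood still ignores h2, so predictive irrelevance (2) holds, yet p(y0|h2,b) now equals p(y0|h1 = h2,b) and h2 inherits h1’s confirmation.

```python
from math import comb

# Hypothetical coin biases and prior p(h1|b).
h_values = [0.3, 0.7]
p_h1 = {0.3: 0.5, 0.7: 0.5}


def likelihood(y, h1, n=10):
    """p(y|h1,h2,b) = p(y|h1,b): the duplicate h2 adds no predictive ability."""
    return comb(n, y) * h1**y * (1 - h1) ** (n - y)


def p_h1_given_h2(h1, h2):
    """'Duplicate' h2: p(h1|h2,b) is a point mass (delta) at h1 = h2."""
    return 1.0 if h1 == h2 else 0.0


def p_y_given_h2(y, h2):
    """p(y|h2,b) = sum over h1 of p(y|h1,b) p(h1|h2,b)."""
    return sum(likelihood(y, h1) * p_h1_given_h2(h1, h2) for h1 in h_values)


def p_y(y):
    """p(y|b) = sum over h1 of p(y|h1,b) p(h1|b)."""
    return sum(likelihood(y, h1) * p_h1[h1] for h1 in h_values)


y0 = 8
print(p_y_given_h2(y0, 0.7) > p_y(y0))  # True: the duplicate is confirmed along with h1
```

Both this case and the factorised case satisfy (2); only the choice of p(h1|h2,b) separates them.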

Interlude – severe tests and tracking truth?
Since I motivated this problem with reference to Mayo’s blog, how might the ‘severe testing’ concept relate? For now, a quick thought: if by ‘test’ we mean ‘behaviour under specified experimental manipulations’, then we see some similarity. In particular, one might imagine that the ‘testing’ aspect refers to defining boundary conditions and related behaviour under possible (‘counterfactual’ or ‘virtual’) boundary manipulations, which is a crucial part of the ‘model closure’ account here.

Similarly, Nozick’s ‘truth tracking’ account in epistemology is relativised to methods and, if we equate ‘methods’ to ‘model structures’ (which seems appropriate, since a ‘model structure’ is really a functional recipe), then it also has much in common with the ‘model closure’ (again, as opposed to epistemic closure) account given here. Furthermore, I think Kripke’s supposed ‘red barn’ counterexample (see here) to Nozick’s theory fails for similar reasons: it is an ill-posed problem whose solution depends on how the ‘boundary of the problem’ is closed.

I will (hopefully) have more to say on these topics at some point.

III. Implications for the everyday ‘mathematical/computational modeller’
What does this mean for people building mathematical and/or computational models of complex phenomena such as those of biology? As all of us ‘mathematical modellers’ who have tried to do something even resembling ‘real science’ know, we almost always face the ‘simple model/complex model’ and ‘modelling for understanding/modelling for prediction’ trade-offs.

Consider this common experience: you present a slightly too complicated model (all of them, and none of them, basically) and show it ‘predicting’ some experimental result ‘correctly’. The first question is, of course, so what? Why should I trust your model? Followed by ‘I could fit an elephant (with a wiggly trunk) with that model’ and/or ‘most of those parameters appear completely irrelevant – what are the most important parameters? Have you done a sensitivity analysis?’.

You see the parallel with the tacking paradox – with all those (presumably) extraneous, irrelevant parameters (hypotheses) ‘tacked onto’ your model, how can you possibly say that it is ‘confirmed’ by the fact that it predicts some experiment? Which of your parameters really capture the ‘true mechanism’ and which are ‘irrelevant’?

The resolution is of course that
a) the model as a whole can be ‘confirmed’ (that is, made more probable to some degree by fitting the data/avoiding being falsified etc)
but
b) we don’t know which parts are confirmed and by how much, unless we know how the parameters (hypotheses) within the model relate to each other.

In order to further reduce the model to ‘minimal’ or ‘mechanistic’ form, we need to define behaviour under (possible) manipulation (boundary conditions). Predictively irrelevant variables either have ‘boundary condition’ effects or no effects, but we need to say which is the case.
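In practice, the predictive-irrelevance half of this can be probed with a crude one-at-a-time perturbation check, a very basic form of sensitivity analysis. The sketch below assumes a hypothetical deterministic logistic-growth model with a tacked-on parameter c that never enters the output; the function names and values are illustrative only. Note what it cannot do: it says nothing about p(h1|h2,b), i.e. whether manipulating c would change r or K, and that still has to be specified separately.

```python
import math


def model(t, params):
    """Hypothetical logistic growth x(t) with x(0) = 1.

    Only r and K enter the output; params["c"] is a tacked-on parameter
    that is deliberately ignored.
    """
    r, K = params["r"], params["K"]
    x0 = 1.0
    return K / (1 + (K / x0 - 1) * math.exp(-r * t))


def predictively_irrelevant(name, params, times, rel_step=0.1, tol=1e-12):
    """Crude one-at-a-time check of (2): scale one parameter by (1 + rel_step),
    hold the others fixed, and test whether any prediction changes."""
    perturbed = dict(params)
    perturbed[name] = params[name] * (1 + rel_step)
    return all(abs(model(t, perturbed) - model(t, params)) <= tol for t in times)


params = {"r": 0.5, "K": 100.0, "c": 3.0}
times = [0.0, 1.0, 5.0, 10.0]
print({k: predictively_irrelevant(k, params, times) for k in params})
# {'r': False, 'K': False, 'c': True}
```

Several time points are used because at t = 0 the output equals x0 whatever r and K are, so a single-point check could wrongly flag them as irrelevant.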

One problem in practice, then, is that without going further and investigating relations between parameters (via direct manipulation and/or varying boundary/contextual assumptions, say), we are restricted in our ability to generalise to new situations. Without being able to identify ‘modular’ or ‘invariant’ model components (more on this one day, hopefully) and the context within which this invariance applies, we don’t know which components can be used to build models of similar but differing situations.

From a ‘machine learning’ point of view this could be considered a form of bias-variance trade-off: without stable (invariant) sub-components that apply to other contexts, we are at risk of ‘overfitting’. So ‘bias’ is really (a form of) ‘knowledge external to this particular dataset’.

To put it another way, Newton’s law of universal gravitation is a whole lot more useful as a force model than Maclaren’s law of forces between these two particular objects in this particular context, precisely because it is an invariant feature of nature, valid across a wide range of (e.g. inertial) frames of reference. Thus mere prediction on one dataset is not enough to be scientifically interesting. We all know this, of course, but – let’s be honest! – we can often forget it in the day-to-day grind.

To me, these ‘extra-statistical’ closure assumptions are often guided by balancing the competing goals of prediction and understanding. I have some thoughts on how this balance can be clarified, and how some related areas of research bear on this, but this post is getting long and the margins of this blog are too small to..