Hypothesis testing – What’s Wrong with ACH?
December 31, 2007
The “Analysis of Competing Hypotheses” method, or ACH, is one of the most important tools on the intelligence analyst’s bench. It is a procedure for determining which of a range of hypotheses is most likely to be true, given the available evidence. At its heart is a matrix, wherein hypotheses are listed across the top, and items of evidence are listed down the left side. There is then a square or cell in the matrix corresponding to every hypothesis/evidence pair, and in that square one indicates whether the item of evidence is (in)consistent or neutral with the hypothesis.
For more information on ACH, see
- the classic chapter by Richards Heuer
- This brief overview (pdf file) of ACH compared with argument mapping (AM). Some of the points made below are presaged in the overview.
ACH is based on some fundamental insights:
- The network of relationships between items of evidence and hypothesis is “many-many”. That is, one piece of evidence can bear, one way or another, on many hypotheses, and an hypothesis is generally considered in the light of a number of pieces of evidence. The ACH matrix is an obvious and natural way to accommodate this web of relationships.
- At least in many situations, structured thinking techniques yield better results than informal or intuitive “pondering”. The ACH imposes a strong structure on hypothesis testing.
- Structured techniques are even more effective when making use of suitable external (i.e., outside the head or “on paper”) representations. Thus long division in the head is hard; on paper, following a standard procedure, it is easy. The ACH matrix is an external representation aiding hypothesis testing.
- When making informal or qualitative judgements, it is usually better to use coarse schemes such as consistent/neutral/inconsistent rather than more elaborate and precise schemes such as numerical consistency ratings.
Nevertheless, in my experience using ACH, difficulties of various sorts rapidly arise; and as the expenditures of mental effort involved in struggling with those difficulties mount up, alternatives, such as “muddling through” without the use of a tool such as ACH, or using some other tool such as argument mapping, look increasingly attractive.
(Admittedly, I’ve never tried using ACH on a real, “industrial strength” problem of the kind that, presumably, intelligence analysts are engaging with on a daily or at least weekly basis. Perhaps the difficulties don’t arise so much in real cases; or perhaps they do arise, but are more than compensated for by the various benefits of using the ACH method, given the complexity of real cases. Perhaps; but I doubt it.)
Further, I’ve heard that while some intelligence analysts use the ACH technique regularly and perhaps even enthusiastically, the majority tend not to use it unless they really have to. It seems that their perception is that ACH is not worth the effort. Presumably this can be explained in part in terms of the various difficulties discussed here.
(1) Too many judgements to make
ACH, at least in a strong form, requires that you enter a judgement of consistency for every evidence item/hypothesis pair; i.e., you have to fill in every cell in the matrix. This is both a great strength of ACH, and a serious problem. It is a strength because it makes the process of comparing hypotheses against evidence exhaustive, thereby helping ensure that the evidential weight of all the items of evidence is properly accounted for.
The trouble is that the number of separate judgements becomes very large; for example, with 20 items of evidence and 5 hypotheses, you’d have to make 100 distinct judgements, each taking some modicum of conscious mental effort. Ugh! To make matters worse, many of these judgements return a “nil” verdict. In other words, in many cases, after careful consideration you conclude that item of evidence e is neutral (“neither here nor there”) with respect to hypothesis h.
So for example, suppose you are investigating the death of Princess Diana, and you are considering hypotheses including drunk-driving accident and assassination by MI5; and that one piece of evidence is that the driver had been drinking prior to the crash. This is clearly consistent with the drunk-driving hypothesis. ACH requires you to also consider whether this item of evidence is consistent, neutral or inconsistent with the assassination hypothesis. So you consider it, and you conclude that it is neutral; it really has nothing to do with that hypothesis.
In such a case, the mental effort of making the judgement seems to have yielded no immediate progress towards the goal of assessing the relative merits of the hypotheses. Arguably, that effort has in fact yielded some value in the context of the overall process, value which becomes apparent when you look across a row (to assess diagnosticity of evidence) or down a column (to assess the plausibility of an hypothesis). But it takes serious commitment to crank through dozens of such boring judgements in pursuit of some result at the end of the process. When in the midst of the ACH procedure, being forced to consider every e in relation to every h, only to conclude that it is (in and of itself) irrelevant, is a dispiriting activity; it feels like “makework” demanded arbitrarily by a tedious and laborious process.
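To make the bookkeeping concrete, here is a minimal Python sketch of an ACH-style matrix. The evidence items and ratings are invented for illustration (loosely following the Diana example), and the helper functions are only one simple reading of "looking across a row" and "looking down a column", not Heuer's official procedure.

```python
# Illustrative ACH-style matrix; every item and rating here is hypothetical.
# Ratings: "C" = consistent, "N" = neutral, "I" = inconsistent.
matrix = {
    "driver had been drinking": {
        "drunk-driving accident": "C",
        "assassination by MI5": "N",
    },
    "crash happened in Paris": {
        "drunk-driving accident": "N",
        "assassination by MI5": "N",
    },
}


def judgements_required(n_evidence, n_hypotheses):
    """Every cell must be filled in, so the count is the product."""
    return n_evidence * n_hypotheses


def is_diagnostic(row):
    """Across a row: an item discriminates only if its ratings differ."""
    return len(set(row.values())) > 1


def inconsistency_count(matrix, hypothesis):
    """Down a column: how many items are rated inconsistent with it."""
    return sum(1 for row in matrix.values() if row[hypothesis] == "I")


def neutral_share(matrix):
    """Fraction of all cells that ended up with a 'nil' (neutral) verdict."""
    cells = [rating for row in matrix.values() for rating in row.values()]
    return cells.count("N") / len(cells)
```

With 20 items and 5 hypotheses, `judgements_required(20, 5)` gives the 100 cells mentioned above. In this toy matrix three of the four cells are neutral, and only the first row is diagnostic.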
(2) No e is an island
Superficially, ACH treats an item of evidence as consistent or inconsistent on its own with each of the hypotheses. Thus it seems to make sense to ask whether [the driver's drinking before the crash] is consistent with the hypothesis that [the death of Diana was a drunk-driving accident]. However this is an illusion. In fact, and always, the evidential relationship between one proposition and another is mediated by other propositions. Put another way, an item of evidence is only consistent or otherwise with an hypothesis in the context of other relevant pieces of information or assertions. Thus the driver’s drinking before the crash is only consistent with the drunk-driving accident hypothesis given the general background knowledge that driving under the influence of alcohol increases the chances of an accident. If this were false – if drinking improved driving – then the driver’s drinking would be inconsistent with the drunk-driving hypothesis.
In argument mapping terms, we would say that every reason or objection is actually a multi-premise structure. In the philosophy of science, we would say that observations only confirm or disconfirm hypotheses in the context of auxiliary hypotheses. Sometimes we call these additional propositions assumptions. However we cast the point, the fundamental problem is that ACH’s way of structuring the evidence, hypotheses and their relationships leaves something important out of the picture. The kernel of the problem is the matrix representation at the heart of ACH; it naturally pairs individual items of evidence with individual hypotheses, and so is ill-suited to handling the actual structure of evidential relations even in the simplest case.
Does this matter? By necessity, every graphical or structural display of the web of evidential relations must select and simplify. The question is whether a particular display is, on balance, useful. Does the display enable us to think through the issues more effectively than using our default, informal and “in the head” methods? ACH enthusiasts of course think that the tradeoff is a good one. However, I think that while choosing a way to organise evidence and hypotheses which treats items of evidence discretely and independently of other information offers short-term gains, it does so at the cost of problems further down the track.
One such situation is where an additional item of information comes in, which has the effect of undermining a co-premise/auxiliary hypothesis/assumption. To illustrate: consider the question of what caused the Permian Extinction. One hypothesis is
h: it was a massive meteor collision.
A relevant piece of evidence is that
e1: there is no known meteor impact crater of the right age.
This appears to be inconsistent with the meteor hypothesis. Later, you find out that
e2: it is possible for lava to flow back up through the hole created in a large meteor impact, erasing the impact crater.
Now, the question is, how to accommodate this new piece of information in an ACH matrix? It seems to make little sense to treat it as a new, independent piece of evidence, against which each hypothesis can be tested. So the only option left is to leave it out of the matrix, but to change the “inconsistent” rating of e1 wrt h to neutral or consistent. However without e2, such a rating is mysterious. It seems e2 has to be recorded somewhere, but the ACH matrix offers no space for it.
A better treatment of this situation is to recognise that e1 is inconsistent with h only given the natural assumption that
a: A large meteor impact would leave a crater.
e1 alone is not inconsistent with h; rather, it is the bundle [e1, a] which is inconsistent with h; or alternatively, e1 is inconsistent with [h given a]. e2 is then a challenge to a.
However we express this verbally, the fundamental problem is that you can’t adequately represent, and make sense of, what is going on here in a basic ACH format. (You can handle this sort of situation quite easily in an argument mapping format, but that is another topic.)
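As a rough illustration of what a richer record might look like, here is a small Python sketch in which a rating is stored together with the assumptions it depends on. The class names and the rule "fall back to neutral when an assumption is challenged" are my own simplifications for the example, not part of ACH or of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    # Items of information that undermine this assumption (hypothetical field).
    challenges: list = field(default_factory=list)

    def stands(self):
        """An assumption stands as long as nothing challenges it."""
        return not self.challenges


@dataclass
class Judgement:
    evidence: str
    hypothesis: str
    rating: str  # the rating that holds while every assumption stands
    assumptions: list = field(default_factory=list)

    def effective_rating(self):
        """Simplifying rule: a challenged assumption downgrades to neutral."""
        if all(a.stands() for a in self.assumptions):
            return self.rating
        return "neutral"


a = Assumption("A large meteor impact would leave a crater")
j = Judgement(
    evidence="e1: no known impact crater of the right age",
    hypothesis="h: massive meteor collision",
    rating="inconsistent",
    assumptions=[a],
)
before = j.effective_rating()

# e2 arrives: it bears on the assumption a, not directly on h.
a.challenges.append("e2: lava can flow back up and erase the crater")
after = j.effective_rating()
```

Here e2 has a place to live (as a challenge to a), so the change in the rating of e1 against h is no longer mysterious.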
(3) Flat structure of hypotheses
Another major problem with ACH is that it cannot handle the hierarchical structure of hypotheses (or it can do so at best only in an ungainly and unilluminating manner).
Hypotheses can be more or less general or abstract, and a general hypothesis can have sub-hypotheses. So in the Princess Diana case, one general hypothesis is assassination and another general one is accident. The general assassination hypothesis can have sub-hypotheses such as assassination by MI5, assassination by mafia, etc.
This is important because distinct items of evidence can count for or against hypotheses at various levels. Thus a bullet hole in the limousine would count in favour of any assassination hypothesis (or at least many such hypotheses), while an internal MI5 document might count for or against only the MI5 sub-hypothesis.
The classic ACH matrix asks for all hypotheses to be entered individually across the top row, and then to be compared against all pieces of evidence. But in the case of an hierarchical structure of hypotheses, this will result in an absurd duplication of effort, in which for example a piece of evidence bearing on all assassination hypotheses is compared not only against the general assassination hypothesis but also against all its sub-cases.
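The duplication can be sketched in a few lines of Python. The hierarchy below is a hypothetical one built from the Diana example (placing drunk-driving under "accident" is my own illustrative choice), and `applies_to` assumes, for simplicity, that a judgement made at a general level carries over to all of its sub-hypotheses.

```python
# Hypothetical two-level hierarchy: general hypothesis -> sub-hypotheses.
tree = {
    "assassination": ["assassination by MI5", "assassination by mafia"],
    "accident": ["drunk-driving accident"],
}


def all_hypotheses(tree):
    """Flat list: each general hypothesis followed by its sub-hypotheses."""
    flat = []
    for general, subs in tree.items():
        flat.append(general)
        flat.extend(subs)
    return flat


def flat_cells(tree, n_evidence):
    """Cells in a classic matrix where every hypothesis gets its own column."""
    return n_evidence * len(all_hypotheses(tree))


def applies_to(tree, level):
    """Hypotheses covered by a single judgement attached at `level`."""
    return [level] + tree.get(level, [])
```

A bullet hole in the limousine could be judged once at the "assassination" level, and `applies_to(tree, "assassination")` lists the three hypotheses that one judgement covers. An internal MI5 document would be attached only to the MI5 leaf. A flat matrix instead demands a separate cell for each of the five columns, for every item.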
(4) Subordinate deliberation
By its very nature, being based on a matrix structure, the ACH approach does not consider what is “behind” or “underneath” any given piece of evidence. From a piece of evidence, it looks “forwards” or “upwards” to its bearing on the hypotheses under consideration. However the weight of a piece of evidence wrt an hypothesis depends on information bearing upon that piece of evidence. e may be quite (in)consistent with h, but how seriously we take this (in)consistency depends on how seriously we take e itself (its plausibility or credibility). This can only be evaluated in the light of further information subordinate to e. If you like, think of e as itself an hypothesis, in relation to which there is supporting or opposing evidence. In the standard ACH framework there is no way to represent or display this layered structure. (Again, the ability to handle such structure is a strength of argument mapping.)
(5) Decontextualisation and discombobulation
We’ve seen in points 2 and 4 above that the ACH matrix does not accommodate co-premises/assumptions, or subordinate deliberation. An ACH matrix is like a sieve on the web of evidence, letting through some items and relationships but keeping out many others. Unfortunately what is left out is the context which helps make sense of the relationship of any given item of evidence to an hypothesis. Absent that context, the judgement becomes difficult to make. For a not-very-exaggerated example, consider: Is
e: David Hicks was captured in Afghanistan
consistent, inconsistent, or neutral, with respect to:
h: David Hicks was a terrorist
The proper answer is: uh….dunno…it depends. Absent any other information, you’d probably choose neutral, but this is not because e is neutral wrt h. It is only because without surrounding information it is hard to tell what the evidential value of e is.
ACH, in demanding that we make so many judgements even as it strips the context of those judgements away, is constantly asking us to engage in these sorts of mentally taxing, even discombobulating exercises. After an extended bout of ACH, I tend to feel a bit dazed and confused, and have to stave off that feeling with redoubled mental effort to see the sense of the judgements I’m making.
Summary
We might reduce my complaints about ACH to two:
- ACH asks us to make too many distinct judgements; and
- Those judgements are emaciated due to the stripping away of relevant context, of both hypotheses and evidence
These problems are deeply related to choice of structure of external representation, i.e., to the choice of a matrix as the way to organise evidence, hypotheses, and judgements. I’m inclined to think that the use of the matrix is the fatal mistake of ACH; it is a commitment which seems obvious and natural initially, but as things unfold the limitations and problems inherent in the matrix structure come to the fore.
Much of the ACH procedure, as outlined for example by Heuer, could be retained even if the matrix structure was dispensed with in favour of some richer, more flexible format. But if you throw out the matrix, you also must throw out all those further aspects of the classic ACH which only applied if a matrix was being used. It is doubtful that what you’d have left would be worth calling ACH.
If you wanted to replace the ACH matrix, what would you use? One candidate is the argument map (of the kind you can create in, for example, the Rationale software). However while these have some strengths, for hypothesis testing they have a complementary set of weaknesses. At Austhink we are working on a new structure, one which will take and blend the best elements of both ACH and argument mapping, thus superseding them both. This new structure enables users to rapidly and intuitively assemble a (hierarchical) set of hypotheses in relation to some issue, items of evidence bearing on multiple hypotheses, assumptions, subordinate considerations, etc.
Rationale documentary on YouTube
November 19, 2007
Dinosaurs and inference rebuttals
November 18, 2007
Spotted at the Creation Museum:
Q: Are human bones found with dinosaur fossils?
A: None have been discovered yet. However, if human bones aren’t found with dinosaur bones, it simply means they weren’t buried together. Humans have come in contact with lots of animals, like crocodiles and coelecanths, but they aren’t buried with humans.
The obvious thing to say about this is that it is flagrant “confirmation bias” – seeking or treating evidence in such a way as to confirm one’s cherished beliefs rather than to evaluate or test them.
From an argument analysis perspective, though, it is a nice example of what, technically, we’d call an “inference rebuttal” – an objection to a primary objection which targets not any of the stated premises of the primary objection but rather the inference from the primary objection to the falsity of the main contention.
That’s quite a mouthful, but the basic idea is simple enough, and can be easily illustrated.
Doing so will help explain one of the most distinctive – but subtle – features of the Rationale software.
On the face of it, the fact that human bones have not been discovered with dinosaur fossils is an objection to the standard Creationist story, which includes the idea that humans and dinosaurs once both roamed the earth at the same time.

The premise of the objection is a blunt fact, and so the Creationist has to accept it:

However the Creationist still wants to defuse the objection, and can do it by arguing that the premise, though true, doesn’t show that the contention is false.
To represent this kind of move, Rationale allows a lower-level objection to be connected to the primary objection itself rather than to any of its premises. Graphically, the lower-level objection points to the word “opposes”:

Evaluating this argument as a Creationist presumably would, the objection has been defused:

There is however another way to read the Creationist’s argument. This way of framing things probably better reflects the Creationist’s underlying mindset. From this perspective, creationist “science” combined with the basic facts imply an interesting “discovery”: those humans who did (supposedly) coexist with dinosaurs never buried themselves with said dinosaurs:

Got Code got prize
October 18, 2007
Tonight Andy Bulka (our software architect) and I went to the “ICT Panorama” event at the University of Melbourne Computer Science and Software Engineering Department.
Each year, 4th year students in the department are divided into teams who work on innovative projects for “real world” clients. Austhink Software was assigned a team, code-named “Got Code.” Over the past 6 months or so the team has been working on a “Web 2.0” version of Rationale. This consisted of a simple Flash version of the product (“Rationale Lite”) and an associated Flickr-type website for sharing Rationale maps, called Bickr. A nice feature is that in Bickr you can edit maps online using Bickr (imagine if, in Flickr, you could edit an image using a stripped-down Photoshop).
Other projects included a 3D Tetris, a neural-networks based system for predicting foreign exchange rates, and a system for playing a kind of ping-pong (using a real table) with a remote opponent.
At the ICT Panorama event, all the teams display their projects. They are judged not only on the quality of their work but also on how professionally they present it. Three judges observe all projects, without giving away to the teams that they are judges.
A prize is awarded to the best project. Got Code won… Congratulations to the team, but also to Andy who managed them pretty closely.
We’ll be making Rationale Lite and/or Bickr available just as soon as we feasibly can.
Powered by ScribeFire.
Rationale for Rationale now officially available
October 15, 2007
Oxford Journals has published my The Rationale for Rationale in Law, Probability and Risk.
They’ve sent me a link to an online pdf version. Judging by the page numbering and the citation (see below) it seems this is a digital- or online-only issue.
A good feature of the new system is that the papers are freely available to all.
The full citation is:
The rationale for Rationale™, Tim van Gelder, Law, Probability and Risk 2007; doi: 10.1093/lpr/mgm032
“doi” is Digital Object Identifier – a kind of unique “name tag” for a digital object. In theory, as long as you have such a tag, and the International DOI Foundation is still going, you’ll always be able to locate the corresponding object.
On Buying Cheese
October 9, 2007
The current issue of Choice Magazine (the Australian “Consumer Reports”) has a report on cheddar cheese.
They had five experts blindly rate 28 cheddar cheeses, ranging from your cloth- or wax-wrapped special deli cheddar at $50+ per kilo down to the supermarket brands, sometimes less than $10 per kilo.
Eyeballing the results table, it seemed that price wasn’t a reliable guide to quality – some good cheeses were quite cheap and vice versa.
In the results table, they listed overall quality (score out of 20) and price per kg. They didn’t offer a “value for money” rating, so I copied the table into Excel and had it compute “value for money” as quality divided by price.
Now that the data was in Excel, we could probe a little further.
Turns out the correlation between quality and price was -.05. In other words, the quality of the cheese you buy, on average, has virtually nothing to do with price. If anything, as you go up in price, it gets worse.
Consequently, the correlation between price and value for money was abysmal: -.8. In other words, on average, the more you pay, the more you’re getting ripped off.
Some cheeses had long names with lots of fancy-sounding words, such as “Devondale Special Reserve Premium Aged Vintage.” That must be a good cheese, right?
I used Excel to count the characters in a cheese’s name. Running the correlations showed that length of name bears little if any relation to price, quality, or value for money.
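For anyone who would rather not use Excel, the same calculations can be sketched in Python. The rows below are made up purely to show the mechanics; they are NOT the Choice data, and the `pearson` helper is a plain textbook implementation written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented rows: (name, quality out of 20, price per kg in dollars).
cheeses = [
    ("Cheese A", 15.0, 12.0),
    ("Cheese B", 14.0, 50.0),
    ("Fancy Long Name Reserve C", 16.0, 25.0),
    ("Cheese D", 13.0, 9.0),
]

quality = [q for _, q, _ in cheeses]
price = [p for _, _, p in cheeses]

# "Value for money" as quality divided by price.
value = [q / p for q, p in zip(quality, price)]

# Number of characters in each cheese's name.
name_len = [len(name) for name, _, _ in cheeses]

r_quality_price = pearson(quality, price)
r_price_value = pearson(price, value)
r_len_quality = pearson(name_len, quality)
```

Swapping the real results table in for `cheeses` would reproduce the figures discussed here; the numbers that come out of the invented rows mean nothing in themselves.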
Conclusions: buying cheddar cheese is a lottery. If you haven’t tasted the cheeses, and are just trying to guess which ones are good, ignore price and fancy names; these have nothing to do with quality. If you want value for money, go for the cheaper cheese.
In short: when buying cheddar cheese in Australia, it just isn’t true that “you get what you pay for.”
PS – the cheese I’ll buy: South Cape Vintage Black Label. Nearly the top in quality, but only $15 a kilo.
A little more on wisdom
August 30, 2007
A post I wrote a while back on “smart vs wise” turns out to have been one of the most popular on this blog. It seems that people are frequently asking themselves this question.
This simple reflection from an “in memoriam” piece in today’s Age seems to get at the notion of wisdom as I understand it:
A bridge can just be a bridge but this bridge is also a reminder
that my father meant more to the people of Moggs Creek than I could
ever know. He didn’t seem to have done anything out of the
ordinary. He was not a firefighter or a lifesaver. He was not a
councillor or a campaigner. But he was a friendly, helpful person,
which is as much as we can ask of anyone.
“Your father was prepared to share his time and his expertise
freely,” says Margaret McDonald. “Among a few of us, he was
affectionately known as the mayor of Moggs Creek. He knew everyone
and what was going on but was not concerned with other people’s
business.
“There was an astuteness about him. Just through his questioning
he led you to make decisions of your own, whether they were about
building or finance or gardening.
“He seemed to know just what was the right thing to do in a wide
range of things.”
Mega-Litigation
August 29, 2007
Maybe this post should’ve been called “Why judges should be paid more.”
Simon Lewis alerted me to the written judgment of Justice Ronald Sackville in the case Seven Network Limited v News Limited, otherwise known as the C7 case, or “Kerry Stokes against the world.”
This is a monster document (1200 pages, 76 MB in RTF format), itself only the tip of the iceberg that is a far more monstrous legal case. The first chapter is a commentary on the case itself and its challenges. Some highlights:
“This case is an example of what is best described as ‘mega-litigation’…Mega-litigation, if it proceeds to finality, often generates very long judgments. Regrettably, this is a prime example.”
“The hearing occupied 120 sitting days…The burden on the Court was not limited to the 120 hearing days… ” Nevertheless “The hearing in the present case was considerably shorter than it might have been.”
One factor which reduced the length of the hearing was the extensive use of an “electronic courtroom.” “It would have been virtually impossible to conduct the trial without the use of modern technology.”
“the volume of closing written submissions filed by the parties was truly astonishing” – but “The written submissions are only a minor component of the ‘paper’ burden in a case like this.”
“What is surprising is the sheer amount of money that has been devoted to a single case…the litigation has cost the parties collectively a staggering sum, amounting to nearly $200 million…In my view, the expenditure of $200 million (and counting) on a single piece of litigation is not only extraordinarily wasteful but borders on the scandalous.”
“I directed the parties to prepare an agreed chronology and encouraged them to agree on a template for written submissions. However, the responses illustrate that parties to mega-litigation are often able effectively to ignore (albeit politely) directions made by the court, if they consider that their forensic interests will be advanced by doing so.”
“The fundamental difficulty facing a court hearing mega-litigation, however, is that the parties may decide, for whatever reason, to engage in a full-blown forensic battle in which almost every barely arguable issue is examined in depth. In these circumstances, the best efforts of the court to limit the scope of the dispute may amount to very little.”
“No doubt courts must endeavour to control mega-litigation more efficiently.”
“the boards and shareholders of public companies embroiled in litigation of this kind need to take a more critical and sustained interest in the proceedings…If there is one lesson to emerge from this case, it is that even the largest and best-resourced corporations owe it to their shareholders, if not to the general public, to think very carefully before committing themselves irrevocably to mega-litigation.”
“the length of written submissions may not be a true reflection of their worth. Very detailed submissions, despite their length, can of course be most helpful in clarifying the issues in dispute and in analysing the complex factual and legal questions requiring resolution. But this is not necessarily so.”
“the parties had not structured their Closing Submissions by reference to an agreed list of topics that had been handed up in court towards the conclusion of the evidence…by and large, they had decided to ignore the ‘agreed’ list of topics. They had taken this course notwithstanding my understanding, derived from discussions in court, that the list would provide a template for the written submissions and, in all probability, for the judgment.”
From a letter to the parties: “Quite apart from their length, I must confess to being surprised about some aspects of the submissions. At the risk of stating the obvious, part of the art of advocacy is to make it easy for the decision-maker to understand what issues need to be resolved and to explain clearly, cogently and concisely how and why the crucial issues should be resolved in favour of a particular party. To leave the Judge, if not completely at large, then without a reliable working compass in a vast sea of factual material, is not a technique calculated to advance a party’s case. This.. is because the cogency and persuasiveness of submissions depends on the ability of the Judge to follow them and to isolate the critical legal and factual issues upon which a case is likely to turn’.”
“Writing a judgment in a case such as this is an extremely onerous task. In part, this is due to the sheer volume of material that must be read, absorbed and analysed. The onerous nature of the task increases in proportion to the complexity of the legal and factual issues requiring resolution. In my view, only those who have undertaken a task of this character and magnitude can appreciate how relentless and indeed stressful it can be.”
“mega-litigation requires the judge to be given every assistance that modern information technology can provide…in future, the setting up and co-ordination of electronic databases in mega-litigation must be carried out under the direct supervision of the Court, not the parties. Moreover, the process must be directed from the outset to meeting the judgment writing needs of the judge.”
“The conclusion I have reached is that Seven has not succeeded in any of the many causes of action in which it has relied.” Note: this is amazing. They spent 100 million dollars fighting a legal case and didn’t succeed on a single point.
“There is a particular risk associated with mega-litigation that (happily for all concerned, but particularly for me) has not (yet) eventuated in these proceedings. The completion of the trial and the timely preparation of a judgment are contingent upon the trial judge surviving in reasonable health for the entirety of the proceedings…I asked at a pre-trial directions hearing whether the parties in the present case had considered insuring against the risk of judicial death or infirmity.”
Two of these issues – the failure of the lawyers to present their arguments in a manner easily comprehended by the judge, and the need for “every assistance that modern information technology can provide” – are the ones of most interest to me, and I will address them in a subsequent post.
Pre-structured maps of legal arguments
August 8, 2007
Peter Tillers discusses why DNA can never be regarded, on its own, as conclusive evidence of guilt or innocence.
This post makes me wonder about the possibility of a kind of schematic argument map showing how the argument from say a DNA match to guilt would have to go in some more-or-less general version. This map would display the numerous inferential steps, assumptions etc. – i.e., the numerous points at which the inference might fail.
John Burns, who at the time was quite senior in the Hong Kong police and had experience in training detectives, proposed this kind of idea in his master’s dissertation. He called them “pre-structured argument maps”. You would have such a map for each typical situation in which a detective might be trying to make the case for guilt, e.g., one for shoplifting. The pre-structured map would embody (a) a good understanding of the overall structure of the case that would have to be made out, and (b) the accumulated wisdom of experienced detectives as to all the bases that need to be covered – e.g., the detective would have to have evidence to rebut the defendant’s claim that he already owned the item.
Then, rather than piecing together a case (whether in argument map form, or more traditional format) from scratch, the detective would check off the various aspects of the case on the pre-structured map, removing parts which are inapplicable to the particular situation, etc. Along the way the detective would be learning what a good case looks like, being exposed to the myriad ways in which the case might be defeated by a clever lawyer, etc.
Age & SMH appearance
July 31, 2007
Brief mention of Austhink Software in The States or Bust in The Age and the Sydney Morning Herald today. (Don’t be scared off by the ugly visages.)
