Tuesday, April 22, 2014

Objective probability or automatic nonsense?

A follow-up to the previous probability post.

Perhaps this will provide a clearer demonstration of the limitations of Nic's method. In his post, he conveniently provided a simple worked example, which in his view demonstrates how well his method works. A big advantage of using his example is that hopefully no-one can argue I've misapplied his method :-) This is Figure 2 from his post:



This example is based on carbon-14 dating, about which I know very little, but hopefully enough to explain what is going on. The x-axis in the above is real age with 0 corresponding to the "present day", which I think is generally defined as 1950 (so papers don't need to be continually reparsed as time passes). The y-axis is "carbon age" which is basically a measure of the C14 content of something under investigation, typically something organic (plant or animal). The basic idea is that the plant or animal took up C14 as it grew, but this C14 slowly decays so the proportion in the sample declines after death according to the C14 half-life. So in principle you would think that the age (at death) can be determined directly from measurement of the proportion of carbon that is C14. However, the proportion of C14 in the original organism depends on the ambient concentration of C14 which has varied significantly in the past (it's created by cosmic rays and the like), so there's quite a complicated calibration curve. The black line in the above is a simplified and stylised version of what a curve could look like (Nic's post also has a real calibration curve, but this example is clearer to work with).

So in the example above, the red Gaussian represents a measurement of radiocarbon which represents a "carbon age" of about 1000y, with some uncertainty. This is mapped via the calibration curve into a real age distribution on the x-axis, and Nic has provided two worked examples using a uniform prior and his favoured Jeffreys prior.

As some of you may recall, I lived in Japan until recently. Quite by chance, my home town of Kamakura was the capital of Japan for a brief period roughly 7-800y ago. Lots of temples date from that time, and there are numerous wooden artefacts which are well-dated to the Kamakura Era (let's assume, carved out of contemporaneous wood, though of course wood is generally a bit older than the date of the tree felling). Let's see what happens when we try to carbon-date some of these artefacts using Nic's method.

Well, one thing that Nic's method will say with certainty is "this is not a Kamakura-era artefact"! The example above is a plausible outcome, with the carbon age of 1000y covering the entire Kamakura era. Nic's posterior (green solid curve) is flatlining along the axis over the range 650-900y, meaning zero probability for this whole range. The obvious reason for this is that his prior (dashed line) is also flatlining here, making it essentially impossible for any evidence, no matter how strong, to overturn the prior presumption that the age is not in this range.

It is important to recognise that the problem here is not with the actual measurement itself. In fact the measurement shown in the figure indicates very high likelihood (in the Bayesian sense) of the Kamakura era. The problem is entirely in Nic's prior, which ruled out this time interval even before the measurement was made - just because he knew that a measurement of carbon age was going to be made!
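To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the piecewise-linear curve (nodes `CAL_AGE` and `CARBON_AGE`) is a made-up stand-in for the stylised calibration curve, not Nic's actual one, and the measurement is taken as Gaussian with a carbon age of 1000y and an assumed standard deviation of 60y. For a Gaussian measurement with fixed error, the Jeffreys prior on calendar age is proportional to the absolute slope of the calibration curve, so it vanishes wherever the curve is flat.

```python
import numpy as np

# Hypothetical stylised calibration curve (made-up nodes, not Nic's):
# steep, flat plateau, short step near 1000y, second plateau, steep again.
CAL_AGE = np.array([0.0, 500.0, 950.0, 1050.0, 1700.0, 2000.0])
CARBON_AGE = np.array([0.0, 950.0, 950.0, 1050.0, 1050.0, 2000.0])


def posterior_mass(lo, hi, measured=1000.0, sigma=60.0):
    """Posterior probability that the calendar age lies in [lo, hi],
    under a uniform prior and under the slope-based (Jeffreys) prior."""
    t = np.arange(0.0, 2001.0)                  # calendar age grid, 1y steps
    c = np.interp(t, CAL_AGE, CARBON_AGE)       # carbon age at each calendar age
    likelihood = np.exp(-0.5 * ((c - measured) / sigma) ** 2)
    priors = {
        "uniform": np.ones_like(t),
        "jeffreys": np.abs(np.gradient(c, t)),  # proportional to |dc/dt|
    }
    in_range = (t >= lo) & (t <= hi)
    return {
        name: float((prior * likelihood)[in_range].sum() / (prior * likelihood).sum())
        for name, prior in priors.items()
    }


masses = posterior_mass(650, 900)
print(masses)
```

With these made-up numbers the uniform prior leaves a sizeable share of the posterior in 650-900y, while the slope-based prior leaves none at all, whatever the measurement says.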

Nic uses the emotionally appealing terminology of "objective probability" for this method. I don't blame him for this (he didn't invent it) but I do wonder whether many people have been seduced by the language without understanding what it actually does. You can see Richard Tol insisting that the Jeffreys prior is "truly uninformative" in a comment on my previous post, for example. Well, that might be true, but only if you define "uninformative" in a technical sense not equivalent to common English usage. If you then use it in public, including among scientists who are not well versed in this stuff, then people are going to get badly misled. Frame and Allen went down this rabbit hole a few years ago, and I'm not sure if they ever came out. It seems to work for many as an anchoring point: when you discuss it in detail, they acknowledge that yes, it's not really "uninformative" or "ignorant", but then they quickly revert to this usage, and the caveats somehow get lost.

I propose that it would be better to use the term "automatic" rather than "objective". What Nic is presenting is an automatic way of generating probabilities, though it remains questionable (to put it mildly) whether they are of any value. Nic's method insists that no trace remains of the Kamakura era, and I don't see any point in a probabilistic method that generates such obvious nonsense.

35 comments:

Pekka Pirilä said...

As far as I understood his comments correctly, Nic agrees that the pdf obtained using Jeffreys' prior is false, but he argues that Jeffreys' prior does still give uniformative confidence intervals.

I think he could maintain that view by looking only at cases where the whole flat part is included in the confidence interval. In such cases the resulting confidence interval appears reasonable, but in cases where the empirical distribution of C14 date is moved up or down so that the flat parts fall near the edge of the distribution, the confidence intervals behave just as strangely as the pdf.

Pekka Pirilä said...

Adding on the uninformativeness of Jeffreys' prior.

Jeffreys' priors are based on simple rules that can be applied to a wide class of problems. They are uninformative on the level of that whole class, i.e., they can be considered uninformative when we know nothing about the problem being studied.

When we move from the class of all problems to a single problem, the situation changes. There's prior knowledge about the problem even in absence of prior knowledge on the result being searched for. In the case of radiocarbon dating the prior knowledge tells that certain present C14 values have been produced in samples of a wide range of past ages, while other radiocarbon dates are an outcome of far fewer real dates.

This is knowledge about the method. From this knowledge we can conclude that Jeffreys' prior is highly informative for this problem.

In natural sciences situations of this kind are the rule, not the exception. It's very common to be able to conclude that Jeffreys' prior is highly informative, either with great certainty or at least as likely.

More generally there's no inherent reason to favor Jeffreys' prior as more uninformative than other priors in any question of natural sciences. It's just one among many, sometimes reasonable, sometimes not.

Richard S J Tol said...

James: There is nothing intuitive about statistics and probability. Indeed, the hard part of teaching statistics is to erase sloppy intuition and replace it with rigorous mathematics. There is a large literature in psychology and statistics that shows that most stat teachers are not particularly successful in this task.

So, indeed "uninformative" does not refer to the common English usage of the word "information". Instead, "uninformative" means "proportional to the square root of the determinant of the Fisher information", where Fisher Info is the accepted, rigorous definition of information.

James: While you may think you are going after Nic Lewis and me, you really are going after Harold Jeffreys, Ronald Fisher, Claude Shannon, Sol Kullback and Richard Leibler.

Science is never settled, of course, but these are big fellas to take on -- and showing them wrong would be worthy of a Fields Medal (although I guess you're too old for that).

Pekka Pirilä said...

The Fisher information is not unique when the measure of the space is not given externally. Giving it externally in different ways and determining the Jeffreys' prior based on that leads to different Jeffreys' priors.

There's no fundamentally uniquely defined maximally uninformative prior.

Richard S J Tol said...

Pekka: Granted (but a bit of meta would get around that).

I don't think that this is quite James' point though.

EliRabett said...

FWIW C14 curves end in 1950 because atmospheric H-bomb testing hockey sticks the curve.

James Annan said...

Richard, Pekka's point was indeed not my specific point here, but nevertheless it is an important one that I've made before: using "uninformative" in this way just kicks the can down the road, since we then have to choose the measure in an arbitrary and (dare I say it) subjective manner.

Anyway, rather than trying to parse what we all think Nic's main point might really have been (I've seen your twittering) can we at least all agree that in this example, which I remind you was Nic's own demonstration of his preferred method, the method generates nonsensical and worse-than-worthless answers, because it repeatedly and reliably assigns zero probability to the truth?

I also think that if people said their prior was proportional to the square root of the determinant of the Fisher information for a specified measure, that would be a lot less misleading than calling it objective or uninformative!

James Annan said...

I hope someone will correct me if I'm wrong, but is it not the case that any prior can be described as the square root of the determinant of the Fisher information, given a suitably defined measure?

Richard S J Tol said...

James: Doug McNeall may have convinced me that I was too hasty.

I'd prefer to use the term Jeffreys prior rather than uninformative prior, because only informed people will challenge the use of a Jeffreys prior, while fools rush in to condemn uninformative ones.

Jeffreys priors follow from the data model. If you're uncertain about the data model, you can define a PDF about that and continue as before.

Pekka Pirilä said...

I don't dare to give an opinion on whether any prior can be described as the square root of the determinant of the Fisher information, but many and very different ones certainly can.

The theory of Haar measures might give the answer for that question, but I know very little about Haar measures.

To me it's more significant that Jeffreys' prior may lead to nonsensical results. Being maximally noninformative with respect to some measure is not a useful property, if the measure is pathological, and I would call a measure that corresponds to the C14 values pathological, when the issue being studied is the determination of the true age of a sample.

Richard S J Tol said...

While the discussion here and elsewhere focuses on choice of prior, the problem at hand is a degenerate likelihood.

Using a prior to fix a likelihood is a bad idea.

Pekka Pirilä said...

Isn't the problem rather a degenerate transfer function than a degenerate likelihood?

The likelihood derived from the empirical results is well defined. It's constant over wide ranges, i.e. it cannot differentiate at all between values in these ranges, but that doesn't lead to any technical difficulties or to any problems in interpreting the likelihoods directly.

All problems are related to the prior and they are serious specifically, when an attempt is made to use the Jeffreys' prior. That leads to an obviously nonsensical PDF.

Due to the flat likelihood function, any choice of prior has a strong influence. That may be considered a problem, but it's a problem of priors.

Another way of describing the situation is that radiocarbon dating is useless as a tool for telling what the true value is within the range 500-950 years or in the range 1050-1700 years, and of limited value also in distinguishing between these two ranges. This idealized radiocarbon dating is useful only for dates more recent than 400 years or in the range 1750-2000 years.

This is a weakness of the method. It's not uncommon that various methods have good resolving power only in a limited range of the quantity to be determined.

Richard S J Tol said...

Pekka: The likelihood on the carbon date is fine. The likelihood on the calendar date is degenerate. The latter matters.

A degenerate likelihood implies a degenerate prior -- as it should -- and together they make a degenerate posterior -- as it should.

Fix the transfer function and all is fine.

Leave the prior out of it.

Pekka Pirilä said...

That part of my comment was purely semantic, i.e. about understanding the meaning of the expression.

I don't think we have any disagreement on the content (on this point at least), I was just proposing a different way of using words. It may well be that my proposal was not a good one.

EliRabett said...

Pekka, the stylized curve that Lewis uses hides a lot. It forces a strong peak in the prior where the wiggle is in the center. It would do so where there is any wiggle, and indeed, because there are lots of wiggles in the real calibration data, you have to use metadata to get sensible answers and build sensible priors.

http://www.geo.arizona.edu/palynology/geos462/10radiometric.html

Pekka Pirilä said...

Eli,

A lot of information is used to create the calibration curve, i.e., to determine the relationship between measured C14 and the true age. When that relationship has been determined, the likelihoods of different true ages that correspond to a particular measured C14 value can be determined. Up to that point of the analysis no priors are used.

Priors enter if and when the likelihood values are used to determine posterior probabilities (a pdf or confidence intervals). Thus it's possible to perform the whole empirical work and present its results in full without any reference to any prior. The limitation of that is that the results are not probabilities but (relative) likelihoods. The likelihood curve need not integrate to one. Actually it's not appropriate to calculate any integrals of it, because that would lead to a confusion with probabilities.

aaronsheldon said...

Those contrived statistical machinations are of a minor concern compared to the (unreported) measurement uncertainty in the calibration curve.

That's the largest source of error! And it's not even reported. Sloppy.

Gavin Cawley said...

Richard S. Tol wrote: "There is nothing intuitive about statistics and probability."

I disagree, the Bayesian conception of probability is straightforward and intuitive (but the analysis is often mathematically taxing), the problem with "statistics" is that it is generally performed in a frequentist setting, where the analysis is easy, but the definition of a probability is deeply unintuitive. Ideally both sets of tools should be in our toolbox though.

"Indeed, the hard part of teaching statistics is to erase sloppy intuition and replace it with rigorous mathematics."

No, if you do that you only mask the sloppy intuition with a veneer of mathematics, but the sloppy thinking is still there. To apply probability and statistics correctly you need your intuition to be correct as well. Much better to fix the sloppy intuition and then reinforce it with mathematical rigour.

"So, indeed "uninformative" does not refer to the common English usage of the word "information". Instead, "uninformative" means "proportional to the square root of the determinant of the Fisher information", where Fisher Info is the accepted, rigorous definition of information."

This is the sort of thing that results from being overly concerned with mathematical rigour rather than common sense (i.e. intuition). Why describe Jeffreys' prior as "uninformative" (which it isn't) rather than "invariant" (which it is in a fairly general sense)? What is gained by this?

If you want to make probability and statistics intuitive then needlessly using terminology in a way that is not in accordance with everyday usage seems like a bit of an own goal.

Don't misunderstand me, I like mathematical rigour, but I also like engineering common sense, and the two are not mutually exclusive.

Gavin Cawley said...

This example clarifies very well just what is wrong with unthinking application of the Jeffreys prior. In what way does the true age of the objects being analyzed depend on the calibration curve? It doesn't: there is no good reason to suppose that objects made 2000-1700, about 1000 and 500-0 years ago are more likely to be subjected to analysis than objects produced during the intervening periods, so why should the prior favour those intervals? It shouldn't. The uniform prior seems much more in accord with what we actually don't know.

While a Jeffreys prior may have mathematical rigour, that doesn't mean we should ignore common sense, or neglect to check the consequences of the mathematically rigorous prior.

I suspect part of the problem would go away if the uncertainty in estimating the calibration curve were taken into account.

Pekka Pirilä said...

As far as I have understood, the uncertainty in the calibration curve has been analyzed and taken into account in the software used in interpreting C14 values. That uncertainty seems, however, to be very small in comparison with the uncertainty in the determination of the C14 value, and therefore a very minor factor in these considerations.

The main original source of the uncertainty is in the determination of the C14 value, but that uncertainty may be amplified greatly by the unfortunate form of the calibration curve (the real one, not only the artificial one shown in this post).

EliRabett said...

Pekka, what do you mean by uncertainty in the C14 value? There are several steps from specimen to C14 content and then more in going from C14 content to calendar age.

Pekka Pirilä said...

Eli,

What I mean can be explained by this figure from Nic's post and further from Keenan (2012) according to Nic.

In that case the red distribution tells about the uncertainty in the determination of the C14 value, and the width of the narrow bluish band about the uncertainty in going from a given C14 value to real date. (The wiggles of the band add another aspect of uncertainty.)

EliRabett said...

Pekka, Eli's question is: what is the source of the width of the pink "Gaussian"?

Pekka Pirilä said...

Eli,

(You are often too concise and cryptic for me. That may be, in part, due to cultural differences.)

That's a part of the problem that I have just accepted as a correct enough presentation of all uncertainties related to the sample and the actual data collection.

By "correct enough" in the above I mean that the correct distribution is roughly that wide in comparison to the width of the bluish band and its wiggles.

Little of what I have written is dependent even on that, but there might be something that is.

EliRabett said...

Pekka,

The width of the pink Gaussian on the side of the figures seems, at least to Eli, to be a by gosh and golly seat of the pants estimate handed down by a guru of all of the different sources of variation in the counting experiment, but not just for a particular sample.

Finding a real error budget has proven difficult, and it appears that most people just use what pops out of a program that all the C14 folk use. There is no doubt that that curve is totally frequentist, and as such, (you were the first to mention this?) the place where Bayesians could help is finding a better way to estimate that width.

Pekka Pirilä said...

Eli,

What you write sounds reasonable, but for me to say something more specific would require digging deep in the procedures.

lotta blissett said...

Rabbit, the source of the width is explained in Keenan (2012), referring to Stuiver and Polach (1977).

(A Poisson radiation process leading to logGaussian, approximated to Gaussian. The measurement protocol can probably further be fine tuned for age of the samples etc. Also the width can be further reduced to practically zero if you have the money to pay for that)

Pekka Pirilä said...

Coming back to the question of Eli, I had some more ideas about an answer.

In the Bayesian approach, what is determined from the experiment are likelihoods. That's what the experiment tells us; pdf's also require a prior.

What the experiment really tells is one single number, an integer that's the count recorded by the detector and the counter. As such this is a precise number with no uncertainty. All uncertainties can be handled elsewhere.

Now we wish to calculate the likelihoods that exactly this number is observed for all the values of the variables we are interested in. In this case we have only a single variable, the calendar age.

Calculating the likelihoods consists of the following steps:

1) Determine the amount of carbon in the sample, both of the age being studied and possible contamination from other times and the efficiency of the detector in observing C14 decays taking into account geometry and all other factors. Present all this information as pdf's (assuming Gaussian distributions is probably justified).

2) Determine from the calibration data (the band with data on its shape) the probabilities of each C14 age given the precise calendar age. Do that also for the ages of possible contamination.

3) For each C14 age calculate the pdf of counts taking into account the uncertainties of step one. If the uncertainties are small the distribution is a Poisson distribution, with corrections it could be a convolution of Gaussian and Poisson. Again do that also for the C14 ages of potential contamination. Take into account also effects like the time between the growth of a tree and manufacturing of the sample being studied and other comparable factors.

4) Combine the results of steps (2) and (3) to get the probability of the actually observed count. These probabilities form the relative likelihoods of each calendar date.

(The probabilities of all counts add up to one for each calendar age. Picking from the same set of numbers the probabilities of a single value of count for every calendar age results in relative likelihoods that do not add up to one, and should not be summed at all without the addition of a prior.)
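As a rough illustration of steps (2) to (4), here is a minimal Python sketch under strong simplifying assumptions: no contamination, none of the step (1) uncertainties, an exactly known calibration curve, and pure Poisson counting. The curve nodes, `modern_count` and the observed count are all hypothetical. Only the Libby mean life of 8033 years, which converts a carbon age into a C14 fraction, is the standard convention.

```python
import math

import numpy as np

LIBBY_MEAN_LIFE = 8033.0  # years; the standard constant behind "carbon age"

# Hypothetical stylised calibration curve (made-up nodes).
CAL_AGE = np.array([0.0, 500.0, 950.0, 1050.0, 1700.0, 2000.0])
CARBON_AGE = np.array([0.0, 950.0, 950.0, 1050.0, 1050.0, 2000.0])


def expected_count(calendar_age, modern_count=10_000.0):
    """Expected detector count for a sample of the given calendar age.
    modern_count (the count for a zero-age sample) is an assumed value."""
    carbon_age = np.interp(calendar_age, CAL_AGE, CARBON_AGE)
    return modern_count * np.exp(-carbon_age / LIBBY_MEAN_LIFE)


def poisson_pmf(count, mean):
    """Poisson probability of `count` given `mean`, computed via logs."""
    log_factorial = np.vectorize(math.lgamma)(np.asarray(count, dtype=float) + 1.0)
    return np.exp(count * np.log(mean) - mean - log_factorial)


ages = np.arange(0.0, 2001.0)   # candidate calendar ages, 1y steps
observed = 8830                 # hypothetical recorded count

# Relative likelihood of each calendar age, given the one observed count.
likelihood = poisson_pmf(observed, expected_count(ages))

# For one fixed calendar age, the probabilities of all possible counts sum to one.
all_counts = np.arange(0, 20_001)
total_probability = poisson_pmf(all_counts, expected_count(750.0)).sum()

print(total_probability, likelihood.sum())
```

The first printed number is one to within rounding; the second is not, which is the point of the parenthesis above: the likelihood values across calendar ages are only relative weights.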

Up to this point there should not be much disagreement. We have converted all relevant data to a set of likelihoods. Doing that we have extracted all the information the measurement can tell about the calendar date.

As a result we have an unnormalized likelihood function that tells in relative terms, how strongly the measurement favors some calendar ages over others. To give confidence intervals or full pdf's we must add a prior. It makes absolutely no sense to determine the prior based on the empirical setup. How we perform the measurement has no bearing on the probability of various ages of the sample. The prior must be fixed by some other arguments. It could be uniform in calendar time or it could be inversely proportional to the age, or we might use some additional information pertinent to that specific sample. That's up to the person who interprets the results. The measurement can tell only the relative likelihoods.

In steps (1), (2), and (3) pdf's of contributing factors are used. They are real probabilities that describe some effectively random contributions to the expected count for a given calendar age.

EliRabett said...

Pekka, take a look at this presentation esp down to the bottom. The preparation of samples looks as if it requires a great deal of care and expertise, and in Eli's experience that varies all over the place.

Captcha CERVECERIA. Eli wins the week

Pekka Pirilä said...

Eli,
Thanks for the link.

In my latest comment I tried to explain very briefly, how the Bayesian approach can be extended more explicitly to the handling of all the types of uncertainties those slides tell about.

You are surely right that all users of the method are not equally competent. Errors are made in doing the preparatory steps, and in interpreting the results.

There's hardly any field of science where statistical methods are always used correctly. Well prepared standard software is of great help in that. The proposals of Keenan and Lewis would represent a really serious step in a wrong direction, but there might also be much potential for improvement. That's the part I cannot say anything more about than to consider it likely, because errors are so common all around.

crandles said...

Perhaps I am reading it wrong, but to me it seems that the calibration curve is saying that carbon-14 dating is a hopeless method if you want to determine whether an item is 700-800 years old or just a bit younger or older as these give identical results.

So, if you are already confident your item is 600-900 years old and you do use such a method then prior and posterior should be very similar.

It seems to me there are more outcomes than just this item is now more likely (less likely) to be 700-800 years old.

This example brings out the conclusion that this method is not very good at telling.

It seems using more than one prior remains sensible to help get at other possibilities rather than aiming for one invariant prior.

Pekka Pirilä said...

Crandles,

Your conclusion is correct. The consequences of that problem are probably the reason that the confidence limits given for calendar date of samples listed on page 10 of this presentation have very often upper or lower limits in the ranges 350-420 BC and 750-830 BC, but in no case in the range 510-750 BC and in only two cases in the range 420-510 BC.

The method is very weak in resolving dates 400-750 BC, therefore those dates rarely occur as end points of confidence intervals, but the confidence interval often includes this whole range.

EliRabett said...

Horrors, all agree. Time for another blogger ethics panel

Tom Fiddaman said...

Regarding "one thing that Nic's method will say with certainty is "this is not a Kamakura-era artefact"! ... Nic's posterior (green solid curve) is flatlining along the axis over the range 650-900y, meaning zero probability for this whole range. The obvious reason for this is that his prior (dashed line) is also flatlining here, making it essentially impossible for any evidence, no matter how strong, to overturn the prior presumption that the age is not in this range."

I think this is actually an incorrect statement of what the figure shows.

Statements like "x is in this range" are about the CDF, not the pdf. Nic's posterior pdf here says that there's a low probability that an artifact lies in any _individual_ year between 650 and 900. That's a natural consequence of the fact that the chunk of C14 density that maps to that era is spread over a long calendar age, due to the degenerate transfer function. But it's not the same as saying the era is excluded; if you look at the CDF, its 90% or 95% range would certainly include the era.

Nic's posterior matches what you'd get if you generate a random sample from the Normal C14 age distribution and pass it through the calibration curve, which seems like a pretty intuitive procedure to me. Of course, this begs the question of why you'd bother constructing an arcane prior in calendar space when it's much easier in C14 space, which he actually discusses in the text.
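A minimal Python sketch of that sampling procedure, with everything in it assumed for illustration: a made-up piecewise-linear curve whose plateaus are given a slight slope so that it can be inverted, a Gaussian carbon age of 1000y and an assumed standard deviation of 60y. It compares the sampled fraction in 650-900y with the posterior mass from a prior proportional to the slope of the curve, which is what the Jeffreys prior reduces to for a Gaussian measurement with fixed error.

```python
import numpy as np

# Hypothetical, strictly increasing calibration curve (made-up nodes);
# the "plateaus" have a small positive slope so the curve has an inverse.
CAL_AGE = np.array([0.0, 500.0, 950.0, 1050.0, 1700.0, 2000.0])
CARBON_AGE = np.array([0.0, 940.0, 960.0, 1040.0, 1060.0, 2000.0])


def sampled_fraction(lo, hi, measured=1000.0, sigma=60.0, n=200_000, seed=0):
    """Draw carbon ages from the Normal measurement distribution, map them
    back through the inverse curve, and count the share landing in [lo, hi]."""
    rng = np.random.default_rng(seed)
    carbon = rng.normal(measured, sigma, size=n)
    calendar = np.interp(carbon, CARBON_AGE, CAL_AGE)  # inverse calibration
    return float(np.mean((calendar >= lo) & (calendar <= hi)))


def slope_prior_mass(lo, hi, measured=1000.0, sigma=60.0):
    """Posterior mass in [lo, hi] with a prior proportional to |dc/dt|."""
    t = np.arange(0.0, 2001.0)
    c = np.interp(t, CAL_AGE, CARBON_AGE)
    posterior = np.abs(np.gradient(c, t)) * np.exp(-0.5 * ((c - measured) / sigma) ** 2)
    in_range = (t >= lo) & (t <= hi)
    return float(posterior[in_range].sum() / posterior.sum())


frac = sampled_fraction(650, 900)
mass = slope_prior_mass(650, 900)
print(frac, mass)
```

The two numbers should agree up to sampling and grid error, since a uniform prior in C14 space is the same thing as a slope-proportional prior in calendar space. How much mass the era receives then depends entirely on how flat the plateau is assumed to be.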

Even though Nic's argument strikes me as technically correct as far as it goes, the trimodal prior on calendar age is intuitively repugnant. So, I went back to the Bronk Ramsey methods paper.

I think Nic has actually missed an absolutely crucial point. In his zeal for an uninformative p(calendar age), he skipped over BR's explanation, "This prior reflects that fact that some 14C measurements (notably those on the plateaus) are more likely than others." In other words, BR is making the auxiliary assumption that there is a constant rate artifact-generating process.

The distribution of c14 age measurements arising from a constant artifact-generating process is such that the likelihood of observing a given measurement is highest exactly where the Fisher information from the transform (i.e. the slope) is lowest. The two effects cancel, rendering the uniform prior for calendar age sensible. Score one for intuition.
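The cancellation can be written out as a change of variables. This is only a sketch under assumptions not stated above: the calibration curve c = f(t) is taken to be monotone, and the measurement error is Gaussian with fixed width, so that the Jeffreys prior on calendar age is proportional to |f'(t)|.

```latex
% Assumptions: c = f(t) is the calibration curve, taken as monotone for
% simplicity; the measurement error is Gaussian with fixed width.
\begin{align*}
\text{constant-rate artefacts:}\quad
  & p(t) \propto 1
  \;\Longrightarrow\;
  p(c) = p(t)\left|\frac{dt}{dc}\right| \propto \frac{1}{|f'(t)|} \\
\text{Jeffreys prior (uniform in } c\text{):}\quad
  & \pi_J(t) \propto |f'(t)| \\
\text{mapping } p(c) \text{ back to calendar age:}\quad
  & \pi(t) = p(c)\,|f'(t)| \propto \frac{1}{|f'(t)|}\,|f'(t)| = 1
\end{align*}
```

So carbon ages on the plateaus, where |f'(t)| is small, are exactly the ones a constant-rate process produces most often, and weighting them accordingly brings the calendar-age prior back to uniform.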

This means that, if you present Nic's method with a random selection of artifacts from the real world, it'll perform poorly.

You could still argue about the true or least informative distribution for artifacts - I would guess that there's actually a survival bias that renders it nonuniform, but rather gently.