Bayesianism is the philosophical tenet that the mathematical theory of probability applies to the degree of plausibility of statements, or to the degree of belief of rational agents in the truth of statements; combined with Bayes' theorem, it yields Bayesian inference. This is in contrast to frequentism, which rejects degree-of-belief interpretations of mathematical probability and assigns probabilities only to random events, according to their relative frequencies of occurrence. The Bayesian interpretation allows probabilities to be assigned to random events, but also to any other kind of statement.
Whereas a frequentist and a Bayesian might both assign probability 1/2 to the event of getting a head when a coin is tossed, only a Bayesian might assign probability 1/1000 to the proposition that there was life on Mars a billion years ago, as a statement of personal belief and without intending to assert anything about any relative frequency.
History of Bayesian probability
"Bayesian" probability or "Bayesian" theory is named after Thomas Bayes, who proved a special case of what is called Bayes' theorem. (However, the term "Bayesian" came into use only around 1950, and in fact it is not clear that Bayes would have endorsed the very broad interpretation of probability now called "Bayesian".) Laplace independently proved a more general version of Bayes' theorem and put it to good use in solving problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence.
For instance, Laplace estimated the mass of Saturn, given orbital data that were available to him from various astronomical observations. He presented the result together with an indication of its uncertainty, stating it like this: 'It is a bet of 11000 to 1 that the error in this result is not 1/100th of its value'. He would have won the bet, as another 150 years' accumulation of data has changed the estimate by only 0.63%.
The general outlook of Bayesian probability, promoted by Laplace and several later authors, has been that the laws of probability apply equally to propositions of all kinds. Several attempts have been made to ground this intuitive notion in formal demonstrations. One line of argument is based on betting, as expressed by Bruno de Finetti and others. Another line of argument is based on probability as an extension of ordinary logic to degrees of belief other than 0 and 1. This argument has been expounded by Harold Jeffreys, Richard T. Cox, and Edwin Jaynes. Other well-known proponents of Bayesian probability have included L. J. Savage, Frank P. Ramsey, John Maynard Keynes, and B. O. Koopman.
The frequentist interpretation of probability was preferred by some of the most influential figures in statistics during the first half of the twentieth century, including R. A. Fisher, Egon Pearson, and Jerzy Neyman. The mathematical foundation of probability in measure theory via the Lebesgue integral was elucidated by A. N. Kolmogorov in the book Foundations of the Theory of Probability in 1933. Thus for some decades the Bayesian interpretation fell out of favor. Beginning about 1950 and continuing into the present day, the work of Savage, Koopman, Abraham Wald, and others has led to broader acceptance. Nevertheless, the rift between "frequentists" and "Bayesians" continues to this day, with mathematicians working on probability theory and empirical statisticians for the most part not talking to each other or attending each other's conferences.
Varieties of Bayesian probability
The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis.
Subjective probability is supposed to measure the degree of belief an individual has in an uncertain proposition.
Some Bayesians do not accept the subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.
Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Richard Threlkeld Cox, and Edwin Jaynes, hope to codify techniques that would enable any two persons having the same information relevant to the truth of an uncertain proposition to independently calculate the same probability. Except for simple cases, the methods proposed are controversial. Critics challenge the suggestion that it is possible or necessary, in the absence of information, to start with an objective prior belief that would be acceptable to any two persons who have identical information.
Bayesian and frequentist probability
The Bayesian approach is in contrast to the concept of frequency probability where probability is held to be derived from observed or imagined frequency distributions or proportions of populations. The difference has many implications for the methods by which statistics is practiced when following one model or the other, and also for the way in which conclusions are expressed. When comparing two hypotheses and using some information, frequency methods would typically result in the rejection or non-rejection of the original hypothesis with a particular degree of confidence, while Bayesian methods would suggest that one hypothesis was more probable than the other or that the expected loss associated with one was less than the expected loss of the other.
Bayes' theorem is often used to update the plausibility of a given statement in light of new evidence. Laplace's estimate of the mass of Saturn (described above) was obtained in this way. According to the frequency definition of probability, however, the laws of probability are not applicable to this problem: the mass of Saturn is a constant and not a random variable, so it has no frequency distribution and the laws of probability cannot be used.
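As an illustration of how a probability distribution can be attached to a fixed but unknown constant, the following Python sketch computes a posterior over candidate values from noisy measurements. The measurements, the noise level sigma, the flat prior and the grid of candidate values are all invented for this example; they are not Laplace's data, and the grid approximation is not his method.

```python
import math

def grid_posterior(measurements, sigma, grid):
    """Posterior over candidate values of a fixed constant.

    Assumes independent Gaussian measurement noise with known standard
    deviation sigma and a flat prior over the grid (illustrative choices).
    """
    log_post = []
    for theta in grid:
        # Log-likelihood of all measurements if the true value were theta.
        ll = sum(-0.5 * ((x - theta) / sigma) ** 2 for x in measurements)
        log_post.append(ll)
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical measurements of one constant, in arbitrary units.
measurements = [9.8, 10.1, 10.3, 9.9, 10.0]
sigma = 0.2
grid = [9.0 + 0.01 * i for i in range(201)]  # candidates from 9.00 to 11.00

posterior = grid_posterior(measurements, sigma, grid)
posterior_mean = sum(t * p for t, p in zip(grid, posterior))

# Degree of belief that the constant lies within 1% of the estimate,
# in the spirit of a statement about the error of the result.
prob_within_1pct = sum(
    p for t, p in zip(grid, posterior)
    if abs(t - posterior_mean) <= 0.01 * posterior_mean
)
print(round(posterior_mean, 3), round(prob_within_1pct, 3))
```

The constant itself never varies in this calculation; only the state of knowledge about it is described by a distribution, which is the step the frequency definition does not permit.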
Applications of Bayesian probability
Today, a variety of applications of personal probability have gained wide acceptance. Some schools of thought emphasise Cox's theorem and Jaynes' principle of maximum entropy as cornerstones of the theory, while others claim that Bayesian methods are more general and give better results in practice than frequency probability. See Bayesian inference for applications and Bayes' theorem for the mathematics.
Bayesian inference has been proposed as a model of the scientific method, because updating probabilities via Bayes' theorem resembles scientific practice: one starts with an initial set of beliefs about the relative plausibility of various hypotheses, collects new information (for example by conducting an experiment), and adjusts the original beliefs in the light of that information to produce a more refined set of beliefs about the plausibility of the different hypotheses. Similarly, the use of Bayes factors has been put forward as a justification for Occam's Razor.
Bayesian techniques have recently been applied with good success to filtering e-mail spam. Once a selection of known spam has been submitted to the filter, it uses the word occurrences in those messages to help discriminate between spam and legitimate e-mail.
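The cycle of initial beliefs, new information and revised beliefs can be sketched in a few lines of Python. The three hypotheses about a coin, their assumed chances of heads, the equal starting beliefs and the sequence of observed tosses are all hypothetical, chosen only to make the updating step visible.

```python
def update(prior, likelihoods):
    """Apply Bayes' theorem once over a finite set of hypotheses."""
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalised.values())  # probability of the observation
    return {h: v / evidence for h, v in unnormalised.items()}

# Hypothetical hypotheses: each names a chance of heads for the same coin.
hypotheses = {"fair": 0.5, "biased_heads": 0.8, "biased_tails": 0.2}

# Initial beliefs: equal plausibility for every hypothesis (an assumption).
beliefs = {h: 1.0 / len(hypotheses) for h in hypotheses}

observations = "HHTHHHHT"  # made-up experimental record
for outcome in observations:
    likelihoods = {
        h: (p if outcome == "H" else 1.0 - p) for h, p in hypotheses.items()
    }
    beliefs = update(beliefs, likelihoods)

for h, p in beliefs.items():
    print(h, round(p, 4))
```

Each pass through the loop uses the previous posterior as the new prior, so the final beliefs do not depend on the order in which the tosses are processed.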
See Bayesian inference and Bayesian filtering for more information in this regard.
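A minimal sketch of how word occurrences can drive such a filter is given below, assuming a simple multinomial naive Bayes model with add-one smoothing. This is one common textbook formulation and not necessarily the scheme used by any particular filter; the tiny training messages, the function names and the prior of 0.5 are invented for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences over a list of tokenised messages."""
    counts = Counter()
    for words in messages:
        counts.update(words)
    return counts

def spam_probability(words, spam_counts, ham_counts, prior_spam=0.5):
    """Posterior probability of spam, treating words as independent."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for w in words:
        if w not in vocab:
            continue  # words never seen in training carry no evidence here
        # Add-one smoothing keeps unseen word/class pairs from giving zero.
        log_spam += math.log((spam_counts[w] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_counts[w] + 1) / (ham_total + len(vocab)))
    # Turn the two log scores back into a normalised probability.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical training data: known spam and known legitimate mail.
spam_counts = train([["win", "money", "now"], ["cheap", "money", "offer"]])
ham_counts = train([["meeting", "agenda", "now"], ["lunch", "offer", "tomorrow"]])

print(round(spam_probability(["win", "money"], spam_counts, ham_counts), 3))
print(round(spam_probability(["meeting", "agenda"], spam_counts, ham_counts), 3))
```

Working in log space avoids numerical underflow when a message contains many words, which is why the per-word probabilities are summed as logarithms rather than multiplied directly.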
Problem of evidence in Bayesian probability
One criticism which has been levelled at the Bayesian probability interpretation is that the probability itself cannot convey how much evidence one has. Consider the following three situations:
- You have a box and you know that there are some white and some black balls in it.
- You have a box and you tried to draw some balls from it, and half the time you drew a black one and the other half a white one.
- You have a box and you know that there is exactly the same number of white and black balls in it.
If you had to assign a Bayesian probability to "drawing a black ball", you would assign probability 1/2 in all three cases. But the situations are not all the same, because you have a different amount of evidence in each. Several suggestions have been made by frequentist statisticians. Cedric Smith and Arthur Dempster each independently developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory. Bayesians do not consider this a problem at all, and argue that it is rational, on decision-theoretic grounds, to assign a probability of 1/2 in each case. For example, Howard Raiffa discusses similar issues in detail in his book Decision Analysis: Introductory Lectures on Choices under Uncertainty. The decision theorist Daniel Ellsberg pointed out that in experiments where participants were offered a choice between two gambles, one with precisely determined odds and one with vaguely known odds, most people chose the gamble with precise odds. This irrationality (from the point of view of decision theory) is known as Ellsberg's paradox.
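The three box situations can be made concrete with a short Python sketch. It assumes, purely for illustration, that uncertainty about the fraction of black balls is described by a Beta distribution: a uniform Beta(1, 1) for the first box, and Beta(6, 6) for the second after a hypothetical 10 draws with replacement of which 5 were black. The third box, where the fraction is known to be exactly 1/2, corresponds to a point mass with zero spread. These modelling choices are assumptions and are not part of the criticism or of the replies described above.

```python
import math

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the probability of drawing black."""
    return a / (a + b)

def beta_sd(a, b):
    """Standard deviation of Beta(a, b): the remaining uncertainty about
    the fraction of black balls in the box."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Hypothetical Beta parameters for the first two situations.
situations = {
    "box 1: unknown mix": (1, 1),            # uniform prior, no draws yet
    "box 2: 5 black in 10 draws": (6, 6),    # uniform prior updated by draws
}

for name, (a, b) in situations.items():
    print(name, beta_mean(a, b), round(beta_sd(a, b), 4))

# Box 3: the fraction is known to be exactly 1/2, so the probability of
# black is 0.5 and the uncertainty about the fraction is zero.
print("box 3: known half and half", 0.5, 0.0)
```

Under these assumptions the probability of drawing a black ball is 1/2 for every box, while the spread of the distribution over the fraction shrinks as evidence accumulates, so the amount of evidence is carried by the distribution rather than by the single number.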
External links and references
On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; compelling arguments in favour of Bayesian methods (in the style of Edwin Jaynes); state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
- Jaynes, E.T. (1998) Probability Theory: The Logic of Science.
- Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation, in Lecture Notes in Statistics, 48, Springer-Verlag, New York.
- David Howie: Interpreting Probability, Controversies and Developments in the Early Twentieth Century, Cambridge University Press, 2002, ISBN 0521812518
- Colin Howson and Peter Urbach: Scientific Reasoning: The Bayesian Approach, Open Court Publishing, 2nd edition, 1993, ISBN 0812692357, focuses on the philosophical underpinnings of Bayesian and frequentist statistics. Argues for the subjective interpretation of probability.
- Jeff Miller "Earliest Known Uses of Some of the Words of Mathematics (B)"
- Paul Graham "Bayesian spam filtering"
- novomind AG "Outlook categorizing tool based on Bayesian filtering"
- Howard Raiffa: Decision Analysis: Introductory Lectures on Choices under Uncertainty, McGraw Hill, College Custom Series, 1997, ISBN 007052579X
Last updated: 05-10-2005 06:47:51