7  Axioms for Expected Utility Theory

For decision-making under risk, we require two additional axioms of desirable behaviour to develop a predictive or descriptive model. These additional axioms are continuity and the independence of irrelevant alternatives.

That gives us four axioms:

Under these axioms, a decision maker behaves as if choosing between risky prospects by selecting the one with the highest expected utility.

The four axioms are often called the von Neumann–Morgenstern axioms for a rational agent. This gives us another benchmark of “rationality”. An agent is rational if they conform with these four axioms.

One important feature of preferences under these assumptions is that utility is cardinal. The magnitude, not just rank, of the numbers matters.

If you look at other resources on the axioms underlying expected utility theory, you may come across an axiom called the Archimedean property. The Archimedean property is an alternative assumption to continuity. Only one of continuity or the Archimedean property need be assumed. I will not cover the Archimedean property in these notes.

Beyond the axioms of completeness, transitivity, continuity and independence of irrelevant alternatives, some additional axioms are often adopted for practical purposes. These include using a reference point of zero wealth, non-satiation, monotonicity, convexity and diminishing marginal utility.

7.1 Continuity

The idea behind continuity is that people have similar preferences for similar bundles. If x is preferred to y, bundles close to x are preferred to bundles close to y. There are no “jumps” in utility.

Continuity guarantees that every preference relation can be represented by a continuous utility function (and vice versa).

Here are two formal definitions.

7.1.1 Definition 1

A preference relation is continuous if for any x \succ y there exists a number \epsilon > 0 such that every bundle a that is less distant from x than \epsilon and every bundle b that is less distant from y than \epsilon results in a \succ b.

To put this another way, a preference relation is continuous if for any x \succ y there are some neighbourhoods N_\epsilon x and N_\epsilon y around x and y such that for every a\in N_\epsilon x and b\in N_\epsilon x we have a \succ b.

One way to picture this is to imagine a circle around bundle x of radius \epsilon. This circle represents the neighbourhood. There will always exist some circle - even if very small - within which every bundle a is preferred to bundle y.

The intuition behind this definition is that a very small change in your bundle should not result in a sudden switch of your preferences. If you prefer 5 bananas to 2 oranges, you will likely prefer 4.9 bananas to 2 oranges. (And if not, there will be some amount of bananas between 4.9 and 5 that you prefer over 2 oranges.)

Here’s another intuitive example: if you prefer a Mercedes to a Toyota, there will be some level of defect in the Mercedes that you would be willing to accept while still preferring the Mercedes to the Toyota.

7.1.2 Definition 2

If x, y and z are lotteries with x\succcurlyeq y\succcurlyeq z, the continuity axiom requires that there exists a probability p such that y is equally as good as a mix of x and z. That is, there exists p such that:

px+(1-p)z \sim y

The below diagram illustrates continuity under this definition.

On the diagram are three bundles: x, y and z, and each sits on a different indifference curve. The indifference curve that x is on is higher than that of y which is higher than that of z. That is, x\succcurlyeq y\succcurlyeq z.

Now consider a gamble that pays x with probability p and z with probability 1-p. Each value of p would result in a gamble with utility falling between that of x and z. If we were to draw a line between x and z, you could think of the utility of the gamble for each value of p as having the same utility as a bundle on that line. Under the continuity axiom, there would be no holes in that line. For some value of p, that gamble will be on the same indifference curve for y. At that point, px+(1-p)z \sim y.

Code
library(ggplot2)

# Create a data frame
df <- data.frame(
  n = seq(0.05,20,0.05),
  x=NA,
  y=NA,
  z=NA
)

# Fill columns of the data frame
df$x <- (7-df$n^0.5)^2
df$y <- (6-df$n^0.5)^2
df$z <- (5-df$n^0.5)^2

# Plot the line and points
ggplot()+
  geom_line(data = df, mapping = aes(n, x))+ 
  geom_line(data = df, mapping = aes(n, y))+ 
  geom_line(data = df, mapping = aes(n, z))+
  geom_point(aes(x=15, y=(7-15^0.5)^2)) +
  geom_text(aes(x=15, y=10.5, label="x"))+
  geom_point(aes(x=10, y=(6-10^0.5)^2)) +
  geom_text(aes(x=10, y=9, label="y"))+
  geom_point(aes(x=12, y=(5-12^0.5)^2)) +
  geom_text(aes(x=12, y=1.8, label="z"))+

  # Add vertical and horizontal axis lines
  geom_vline(xintercept = 0, linewidth=0.25)+ 
  geom_hline(yintercept = 0, linewidth=0.25)+

  # Add a dashed line segment between two points
  geom_segment(aes(x = 15, y = (7-15^0.5)^2), xend = 12, yend = (5-12^0.5)^2, linetype=2)+

  # Remove the axes labels
  labs(x = "", y = "") +

  # Set the limits of the plot
  coord_fixed(xlim = c(0,20), ylim = c(0,20))+

  # Set the theme
  theme_minimal()

7.1.3 Example of discontinuous preferences

Lexicographic preferences occur where an agent prefers any amount of a good x to any amount of another good y. If choosing between bundles of goods, the agent will choose the bundle with the most x, regardless of the amount of y. They will only consider the amount of y if the amount of x in two bundles is identical.

Consider an agent with lexicographic preferences who is offered the following combinations of x and y.

A. (1, 1)

B. (1, 2)

C. (1.1, 1)

Their preference ranking will be C\succ B \succ A. They prefer C as it has more x than the other two options. As A and B have the same amount of x, the agent distinguishes them based on the quantity of Y, preferring B.

This function is not continuous as there is a “jump” whenever there is an increase in x, even if y is large. Add an infinitesimal amount \epsilon of x to bundle A and the preference relation between bundle A and bundle B flips.

These three bundles A, B and C are represented graphically below.

First, let’s consider these preferences in terms of the first definition, being that there are some neighbourhoods around A and B such that we will always prefer another bundle of goods within the neighbourhood of B to any bundles within the neighbourhood of A. Around A I have drawn a circle of radius \epsilon, which we can consider to be the neighbourhood). No matter how small I draw this circle - that is, no matter how small \epsilon - any bundle within the circle that lies to the right of A (that is, contains x>1) is preferred to bundle B. There is a jump in preferences to the right of A.

Code
library(ggplot2)

# Plot the points
ggplot()+
  geom_point(aes(x=1, y=1)) +
  geom_text(aes(x=1, y=1.1, label="A")) +
  geom_point(aes(x=1, y=2)) +
  geom_text(aes(x=1, y=2.1, label="B")) +
  geom_point(aes(x=1.1, y=1)) +
  geom_text(aes(x=1.1, y=1.1, label="C")) +
  geom_point(aes(x = 1, y = 1), pch=21, size=7) +

  # Add vertical and horizontal axis lines
  geom_vline(aes(xintercept = 0), linewidth=0.25) + 
  geom_hline(aes(yintercept = 0), linewidth=0.25) +

  # Remove x and y axis labels
  labs(x = "x", y = "y") +

  # Set the limits of the plot
  coord_fixed(xlim = c(0,2.2), ylim = c(0,2.2))+

  # Set the theme
  theme_minimal()

One interesting feature of lexicographic preferences is that you cannot draw indifference curves on this figure. If a bundle differs from another, it must be strictly preferred to the other as no amount of y can make up for any amount of x.

We can also consider lexicographic preferences in terms of the second definition of continuity. There is no p for which:

pA+(1-p)C \sim B

When p=1, B \succ A. For any p<1, pA+(1-p)C \succ B as any non-zero share of C makes the combination of A and C preferred.

7.2 Independence of irrelevant alternatives

Consider the following scenarios.

In the first, a person has a choice between an orange and an apple. They state that they strictly prefer the orange.

In the second, they are offered a choice between two gambles. The first gamble is a 50% chance of an orange and a 50% chance of nothing. The second is a 50% chance of an apple and a 50% chance of nothing. They state that they strictly prefer the gamble with a 50% chance of an orange.

Compare the two scenarios. The choice between an apple or an orange from the first scenario is mixed with a 50% probability of getting nothing in the second.

Under the axiom of irrelevant alternatives, a person who prefers the orange in the first will prefer the gamble with the orange in the second. The addition of the irrelevant third option - in this case, a 50% probability of nothing - will not change their order of preference.

More generally, under the axiom of independence of irrelevant alternatives, a person who mixes two gambles with an irrelevant third gamble will maintain the same order of preference when the gambles are mixed as they had for the two original gambles when they were presented independently of the third.

A formal definition states that if:

  • x and y are lotteries with x\succcurlyeq y and
  • p is the probability that a third option z is present, then:

pz+(1-p)x\succcurlyeq pz+(1-p)y

The third choice, z, is irrelevant. The order of preference for x before y holds. It is independent of the presence of z.

7.2.1 Example of independence of irrelevant alternatives

Let us put our earlier example into this formal definition.

Suppose x is an orange and y is an apple. I strictly prefer an orange to an apple.

Suppose there is not a third possibility z of receiving no fruit. Under the independence of irrelevant alternatives, if I prefer oranges to apples, I will prefer a gamble with a 50% chance of getting an orange and 50% chance of receiving nothing to a gamble with a 50% chance of getting an apple and a 50% chance of receiving nothing.

That is:

\text{orange}\succ \text{apple} \Longrightarrow 50\% \text{ chance of orange}\succ 50\% \text{ chance of apple}

7.3 Auxiliary axioms

Beyond completeness, transitivity, continuity and independence, economists often adopt other axioms. These are not required for expected utility theory, but make analysis more practicable.

These include:

  • Reference point of zero wealth
  • Non-satiation
  • Monotonicity
  • Convexity
  • Diminishing marginal utility

I provide further detail on these.

7.3.1 Reference point of zero wealth

When people are considering whether to accept or gamble or compare options, they do not decide from a blank slate. They come with an existing set of resources (wealth), and that wealth may affect their decision. A gamble may be more attractive if someone has more or less wealth.

This necessitates the setting of a “reference point” from which utility is calculated. In Expected Utility Theory, that reference point is typically considered to be zero wealth.

The way this is implemented is we typically calculate utility over total wealth. For example, if offered a gamble where they could win or lose $10, we do not calculate the utility of each option as U(\$10) and U(-\$10). Rather, the utility of each option is calculated as U(W+\$10) and U(W-\$10).

The practical impact of this implementation is that people’s choices may differ depending on their wealth. The same gamble may be accepted or rejected at different levels of wealth.

7.3.2 Non-satiation

The idea behind non-satiation is that no matter what you have, there is always another (nearby) bundle that you would rather have. There is no “maximum” level of utility that you can achieve. Whatever your utility now, you can always increase it.

In this diagram I have plotted an indifference curve. Point x is on the curve. For non-satiation, there will always be a point, such as y, that is strictly preferred to x.

Code
## Load ggplot2
library(ggplot2)

## Create a data frame with 2 columns and 4 rows
df <- data.frame(
  x = c(1, 4, 8, 15, 20),
  y = c(20, 10, 5, 10, 8)
)

# Plot a smooth line through the points
  ggplot()+
    geom_smooth(data = df, mapping = aes(x, y), color = "black", na.rm = TRUE)+

    # Add points and labels
    geom_point(aes(x=8, y=5)) +
    geom_text(aes(x=7.5, y=4.5, label="x"))+
    geom_point(aes(x=8, y=8)) +
    geom_text(aes(x=8, y=7.5, label="y"))+

    # Add vertical and horizontal axis lines
    geom_vline(xintercept = 0, linewidth=0.25)+ 
    geom_hline(yintercept = 0, linewidth=0.25)+

    # Remove x and y axis labels 
    labs(x = "", y = "")+

    # Set the limits of the plot
    coord_fixed(xlim = c(0,20), ylim = c(0,20))+

    # Set the theme
    theme_minimal()

7.3.3 Monotonicity

Preferences are monotone if more of any good in the bundle makes the agent strictly better off. Non-satiation is implied by monotonicity, but not the other way around.

Monotonicity implies downward-sloping indifference curves. This is because any increase of a good in your bundle would take you to a higher indifference curve. A horizontal indifference curve is not feasible as moving along that indifference curve implies more of the good, but that is not possible as monotonicity implies you are better off and hence on a higher indifference curve.

This can be seen in the following diagram. Point x lies on the indifference curve. Increasing the amount of either good will take you to a higher indifference curve. That is true for all points on that indifference curve.

Code
## Load ggplot2
library(ggplot2)

## Create a data frame with 2 columns and 4 rows
df <- data.frame(
  x = c(1, 4, 8, 15, 20),
  y = c(20, 10, 8, 6, 3)
)

# Plot a smooth line through the points
  ggplot()+
    geom_smooth(data = df, mapping = aes(x, y), color = "black", na.rm = TRUE)+

    # Add points and labels
    geom_point(aes(x=8, y=8)) +
    geom_text(aes(x=8, y=7.5, label="x"))+

    # Add vertical and horizontal axis lines
    geom_vline(xintercept = 0, linewidth=0.25)+ 
    geom_hline(yintercept = 0, linewidth=0.25)+

    # Remove x and y axis labels 
    labs(x = "", y = "")+

    # Set the limits of the plot
    coord_fixed(xlim = c(0,20), ylim = c(0,20))+

    # Set the theme
    theme_minimal()

7.3.4 Convexity

Convexity means that people have a preference for variety or combination (indifference curves bulge toward the origin). Averages are better than extremes.

In many contexts this makes sense. For example, suppose you are indifferent between two beers and two meat pies. Under this assumption, any mix of the two, such as a beer and a pie) will be at least as preferred as the two beers or two pies.

Formally, a preference relation is convex if, for any x\succcurlyeq y and for every \theta\in[0,1]:

\theta x+(1-\theta)y\succcurlyeq y

This definition is illustrated in the following diagram. The curve represents an indifference curve for different combinations of two goods. There are two bundles, x and y. In this case, x\succcurlyeq y (as x\sim y). Any weighted combination of x and y, which would be on the line between the two, can be seen to be strictly preferred to either x or y as it would be on a higher indifference curve (a curve further from the origin).

Code
library(ggplot2)

# Create a data frame
df <- data.frame(
  n = seq(0.05,20,0.05),
  x=NA
)

# Fill column x of the data frame
df$x <- (5-df$n^0.5)^2

# Plot the line and points
ggplot()+
  geom_line(data = df, mapping = aes(n, x))+ 
  geom_point(aes(x=5, y=(5-5^0.5)^2)) +
  geom_text(aes(x=5, y=7, label="x"))+
  geom_point(aes(x=15, y=(5-15^0.5)^2)) +
  geom_text(aes(x=15, y=1, label="y"))+

  # Add vertical and horizontal axis lines
  geom_vline(xintercept = 0, linewidth=0.25)+ 
  geom_hline(yintercept = 0, linewidth=0.25)+
  
  # Add a dashed line segment between the two points
  geom_segment(aes(x = 5, y = (5-5^0.5)^2), xend = 15, yend = (5-15^0.5)^2, linetype=2)+

  # Remove x and y axis labels
  labs(x = "", y = "") +

  # Set the limits of the plot
  coord_fixed(xlim = c(0,20), ylim = c(0,20))+

  # Set the theme
  theme_minimal()

One point to note from this diagram is that if the indifference curve were a straight line, any point on a line between x and y would also be on that line, and weakly preferred to x and y. This would still be a convex curve.

This contrasts with strict convexity. Strict convexity is where, for any x\sim y, x\neq y and for every \theta\in[0,1]:

\theta x+(1-\theta)y\succcurlyeq y

\theta x+(1-\theta)y\succcurlyeq x

For two equivalent goods or bundles, a weighted average of the two bundles is better than each of those bundles.

7.3.5 Diminishing marginal utility

Suppose you want some chocolate. You eat a piece. You then eat another. And another. How much utility do you imagine getting from the first piece compared to the 100th piece? The first piece will likely be much more satisfying than the 100th. This is the idea of diminishing marginal utility.

Marginal utility is how much utility you get or lose from an incremental decrease or increase in consumption. Under diminishing marginal utility, each successive additional unit of consumption delivers a smaller (diminishing) amount of utility than the last.

Diminishing marginal utility leads to risk-averse preferences. Someone is risk averse if they strictly prefer the expected value of a gamble to the gamble itself.

Diminishing marginal utility is related to the axiom of convexity. Diminishing marginal utility will lead to convex indifference curves. However, the reverse relationship does not always hold.