Overview: Probability distributions
How to use
This overview of probability distributions, will (where available) contain the following content on each distribution:
Where to use
Notation
Parameter(s) for use
Mean/expected value. In all cases, the mean equals the expected value, and the expected value of \(X\) will be represented as \(\mathbb{E}(X)\). See Guide: Expected values, variance, standard deviation for more.
Variance, which is represented as \(\mathbb{V}(X)\). See Guide: Expected values, variance, standard deviation for more.
Probability mass function (PMF) for discrete random variables, or the probability density function (PDF) for continuous random variables. See Guide: PMFs, PDFs, CDFs for more.
Cumulative distribution function (CDF), where possible. See Guide: PMFs, PDFs, CDFs for more.
Interactive figure, where possible.
Example of usage.
Although this overview is not an exhaustive list of all possible probability distributions, this overview can be treated as an introduction to commonly used types of distributions.
Discrete random variables
Uniform distribution (discrete)
Where to use: The discrete uniform distribution is used when all integer outcomes \(x\) in the interval \(a\) to \(b\) are equally likely. \(X\) is a random variable for integer outcomes \(x\) where for \(a \leq x \leq b\), and the probability of each outcome \(1/n\), where \(n = b - a + 1\).
Notation: \(X \sim \textrm{Uniform}(a,b)\) or \(X \sim U(a,b)\)
Parameters: The numbers \(a,b\) are integers where
- \(a\) is the minimum value of an outcome
- \(b\) is the maximum value of an outcome
There are \(n\) outcomes in total, with \(n = b - a + 1\).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{a+b}{2}.\) | |
Variance | \(\mathbb{V}(X) = \dfrac{n^2-1}{12}.\) | |
PMF | \(\mathbb{P}(X=x)=\frac{1}{n}\) | |
CDF | \(\mathbb{P}(X\leq x)= \begin{cases} 0 & \textsf{if } x \leq a \\\dfrac{\lfloor x \rfloor - a + 1}{n} & \textsf{if } a< x<b \\1 & \textsf{if } x \geq b \end{cases}\) | \(\lfloor x \rfloor\) is the floor function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 750
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Discrete Uniform distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("a", "Minimum value (a):", value = 1, step = 1),
numericInput("b", "Maximum value (b):", value = 10, step = 1),
hr(),
radioButtons("prob_type", "Probability to Calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less' || input.prob_type == 'greater'",
sliderInput("x_value", "x value:", min = 1, max = 10, value = 5, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 1, max = 10, value = 3, step = 1),
sliderInput("x_upper", "Upper bound (y):", min = 1, max = 10, value = 7, step = 1)
)
)
),
# Right column - Plot
card(
card_header("Discrete Uniform distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Ensure b is always greater than or equal to a
observe({
if (input$b < input$a) {
updateNumericInput(session, "b", value = input$a)
}
})
# Update the range of the sliders when a or b changes
observe({
updateSliderInput(session, "x_value", min = input$a, max = input$b, value = min(max(input$a, 5), input$b))
updateSliderInput(session, "x_lower", min = input$a, max = input$b, value = min(max(input$a, 3), input$b))
updateSliderInput(session, "x_upper", min = input$a, max = input$b, value = min(max(input$a, 7), input$b))
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("DUnif(a = %d, b = %d)", input$a, input$b)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Discrete uniform probability mass function
ddunif <- function(x, min, max) {
ifelse(x >= min & x <= max & x == round(x), 1/(max - min + 1), 0)
}
# Discrete uniform cumulative distribution function
pdunif <- function(q, min, max) {
ifelse(q < min, 0,
ifelse(q >= max, 1,
(floor(q) - min + 1) / (max - min + 1)))
}
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pdunif(input$x_value, input$a, input$b)
explanation <- sprintf("P(X ≤ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_value))
} else if (input$prob_type == "greater") {
# For P(X ≥ x), we need 1 - P(X < x) = 1 - P(X ≤ x-1)
if (input$x_value <= input$a) {
prob <- 1 # P(X ≥ a) is always 1 for discrete uniform
} else {
prob <- 1 - pdunif(input$x_value - 1, input$a, input$b)
}
explanation <- sprintf("P(X ≥ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_value))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# Exact probability for a single value
prob <- ddunif(input$x_lower, input$a, input$b)
} else {
# P(x_lower ≤ X ≤ x_upper) = P(X ≤ x_upper) - P(X < x_lower) = P(X ≤ x_upper) - P(X ≤ x_lower-1)
upper_prob <- pdunif(input$x_upper, input$a, input$b)
if (input$x_lower <= input$a) {
lower_prob <- 0
} else {
lower_prob <- pdunif(input$x_lower - 1, input$a, input$b)
}
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%d ≤ X ≤ %d) = %.4f or %.2f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the discrete uniform distribution plot
output$distPlot <- renderPlot({
# Create data frame for plotting
x_values <- input$a:input$b
prob_mass <- rep(1/(input$b - input$a + 1), length(x_values))
df <- data.frame(x = x_values, probability = prob_mass)
# Create base plot
p <- ggplot(df, aes(x = x, y = probability)) +
geom_col(fill = "lightgray", color = "darkgray", alpha = 0.7) +
labs(x = "X", y = "probability mass function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = x_values) +
ylim(0, max(prob_mass) * 1.1)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
highlight_x <- input$a:res$x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "greater") {
highlight_x <- res$x:input$b
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "between") {
highlight_x <- res$lower:res$upper
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You roll a fair six-sided die, where all outcomes (\(1, 2, 3, 4, 5,\) and \(6\)) are equally likely. This can be expressed as \(X \sim U(1,6)\). It means \(1\) is the minimum value and \(6\) is the maximum value, where all discrete values of \(X\) for \(1 \leq x \leq 6\) are equally likely.
Bernoulli distribution
Where to use: The Bernoulli distribution is used for binary data, where one trial is conducted with only two possible outcomes. Examples include success/failure, yes/no, and heads/tails. \(X\) indicates whether the trial is a success (when \(X=1\)) or failure (when \(X=0\)).
Notation: \(X \sim \textrm{Bernoulli}(p)\)
Parameter: The real number \(p\) is the probability of success in a trial (where \(0 \le p \le 1\)).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = p\) | |
Variance | \(\mathbb{V}(X) = p(1-p)\) | |
PMF | \(\mathbb{P}(X=x)=\begin{cases} 1-p & \textsf{if }x=0 \\p & \textsf{if }x=1\end{cases}\) | |
CDF | \(\mathbb{P}(X\leq x)= \begin{cases} 0 & \textsf{if } x< 0 \\1-p & \textsf{if } 0\leq x<1 \\p & \textsf{if } x\geq1 \end{cases}\) |
Figure
PUT FIGURE HERE TOM
Example: You flip a coin, and the probability of getting ‘heads’ is \(0.5\). Taking ‘heads’ as a success, this can be expressed as \(X \sim \textrm{Bernoulli}(0.5)\), meaning the probability of success in each trial is \(0.5\).
Binomial distribution
Where to use: The binomial distribution is used when there are a fixed number of trials (\(n\)) and only two possible outcomes for each trial, representing \(n\) many Bernoulli trials. Here, the random variable \(X\) represents the number of successes.
Notation: \(X \sim \textrm{Binomial}(n,p)\) or \(X \sim B(n,p)\).
Parameters: Two numbers \(n,p\) where: - \(n\) is an integer representing the number of trials, - \(p\) is a real number representing the probability of success of a trial (where \(0 \le p \le 1\)).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = np\) | |
Variance | \(\mathbb{V}(X) = np(1-p)\) | |
PMF | \(\mathbb{P}(X=x)=\dfrac{n!}{(n-x)!x!}p^xq^{(n-x)}\) | |
CDF | \(\mathbb{P}(X\leq x)=I_{q}(n-\lfloor x \rfloor,1+\lfloor x \rfloor)\) | \(I_x(a,b)\) regularized incomplete beta function, \(\lfloor x \rfloor\) the floor function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 750
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Binomial distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("n", "Number of trials (n):", value = 10, min = 1, step = 1),
sliderInput("p", "Probability of success (p):", min = 0, max = 1, value = 0.5, step = 0.01),
hr(),
radioButtons("prob_type", "Probability to Calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less' || input.prob_type == 'greater'",
sliderInput("x_value", "x value:", min = 0, max = 10, value = 5, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 10, value = 3, step = 1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 10, value = 7, step = 1)
)
)
),
# Right column - Plot
card(
card_header("Binomial distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Update the range of the sliders when n changes
observe({
updateSliderInput(session, "x_value", max = input$n)
updateSliderInput(session, "x_lower", max = input$n)
updateSliderInput(session, "x_upper", max = input$n)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Bin(n = %d, p = %.2f)", input$n, input$p)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pbinom(input$x_value, size = input$n, prob = input$p)
explanation <- sprintf("P(X ≤ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_value))
} else if (input$prob_type == "greater") {
# For P(X ≥ x), we need 1 - P(X < x) = 1 - P(X ≤ x-1)
if (input$x_value == 0) {
prob <- 1 # P(X ≥ 0) is always 1
} else {
prob <- 1 - pbinom(input$x_value - 1, size = input$n, prob = input$p)
}
explanation <- sprintf("P(X ≥ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_value))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# Exact probability for a single value
prob <- dbinom(input$x_lower, size = input$n, prob = input$p)
} else {
# P(x_lower ≤ X ≤ x_upper) = P(X ≤ x_upper) - P(X < x_lower) = P(X ≤ x_upper) - P(X ≤ x_lower-1)
upper_prob <- pbinom(input$x_upper, size = input$n, prob = input$p)
if (input$x_lower == 0) {
lower_prob <- 0
} else {
lower_prob <- pbinom(input$x_lower - 1, size = input$n, prob = input$p)
}
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%d ≤ X ≤ %d) = %.4f or %.2f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the binomial distribution plot
output$distPlot <- renderPlot({
# Create data frame for plotting
x_values <- 0:input$n
prob_mass <- dbinom(x_values, size = input$n, prob = input$p)
df <- data.frame(x = x_values, probability = prob_mass)
# Create base plot
p <- ggplot(df, aes(x = x, y = probability)) +
geom_col(fill = "lightgray", color = "darkgray", alpha = 0.7) +
labs(x = "number of successes (X)", y = "probability mass function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = x_values)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
highlight_x <- 0:res$x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "greater") {
highlight_x <- res$x:input$n
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "between") {
highlight_x <- res$lower:res$upper
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You flip a coin \(10\) times, and the probability of getting ‘heads’ is \(0.5\). Taking ‘heads’ as a success, this can be expressed as \(X \sim B(10, 0.5)\), meaning \(10\) trials are conducted, where the probability of success in each trial is \(0.5\).
Multinomial distribution
Where to use: The multinomial distribution is used when there are a fixed number of trials (\(n\)) and more than two possible outcomes for each trial. \(X_{i}\) represents the number of times a specific outcome occurs. Therefore, the mean, variance, and expected value of multinomial distributions are calculated for each \(X_{i}\), not \(X\).
Notation: \(X \sim \textrm{Multinomial}(n,\mathbf{p})\) or \(X \sim M(n,\mathbf{p})\)
Parameters: Three parameters \(n,k,\mathbf{p}\), where
- \(n\) is an integer representing the number of trials
- \(k\) is an integer representing the number of mutually exclusive outcomes
- \(\mathbf{p} = [p_{1},\ldots,p_{k}]\) is a vector with real numbered probabilities \(0\leq p_i\leq 1\) for each outcome (\(1\leq i \leq k\)).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X_{i}) = np_{i}\) | |
Variance | \(\mathbb{V}(X_{i}) = np_{i}(1-p_{i})\) | |
PMF | \(\mathbb{P}(X_{1}=x_{1},\ldots,X_{k}=x_{k})=\dfrac{n!}{x_{1}!\ldots x_{k}!}p^{x_{1}}\ldots p^{x_{k}}\) |
Example: There is a candy jar consisting of 5 red candies, 3 blue candies, and 7 yellow candies.
The probability of drawing a red candy is \(\dfrac{1}{3}\).
The probability of drawing a blue candy is \(\dfrac{1}{5}\).
The probability of drawing a yellow candy is \(\dfrac{7}{15}\)
You draw 3 candies from the jar, replacing as you go along. This can be expressed as \(X \sim M\left(3,\frac{1}{3},\frac{1}{5},\frac{7}{15}\right)\). It means 3 trials are conducted, where \(p_{1}=\frac{1}{3}\), \(p_{2}=\frac{1}{5}\), and \(p_{3}=\frac{7}{15}\) (and \(k=3\)).
Poisson distribution
Where to use: The Poisson distribution is used when a specific event occurs at some rate \(\lambda\), and you are counting \(X\), the number of times this event occurs in some interval.
Notation: \(X \sim \textrm{Poisson}(\lambda)\) or \(X \sim \textrm{Pois}(\lambda)\).
Parameter: \(\lambda\) is the integer number of times an event occurs within a specific period of time.
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \lambda\) | |
Variance | \(\mathbb{V}(X) = \lambda\) | |
PMF | \(\mathbb{P}(X=x)=\dfrac{\lambda^xe^{-\lambda}}{x!}\) | |
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\sum^{\lfloor x \rfloor}_{i=1}\frac{\lambda^xe^{-\lambda}}{x!}\) | \(\lfloor x \rfloor\) the floor function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 670
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Poisson distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("lambda", "Rate parameter (λ):", value = 5, min = 0.1, step = 0.1),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X = x)" = "exact",
"P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "exact"),
conditionalPanel(
condition = "input.prob_type == 'exact'",
sliderInput("x_exact", "x value:", min = 0, max = 20, value = 5, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 20, value = 5, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 20, value = 5, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 20, value = 3, step = 1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 20, value = 7, step = 1)
)
)
),
# Right column - Plot
card(
card_header("Poisson distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When lambda changes, adjust the range of sliders
observe({
# Set a reasonable max value as 3*lambda or at least 10
max_x <- max(round(input$lambda * 3), 10)
updateSliderInput(session, "x_exact", max = max_x)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Poisson(λ = %.1f)", input$lambda)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "exact") {
prob <- dpois(input$x_exact, lambda = input$lambda)
explanation <- sprintf("P(X = %d) = %.6f or %.4f%%",
input$x_exact, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "exact", x = input$x_exact))
} else if (input$prob_type == "less") {
prob <- ppois(input$x_less, lambda = input$lambda)
explanation <- sprintf("P(X ≤ %d) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
# For P(X ≥ x), we need 1 - P(X < x) = 1 - P(X ≤ x-1)
if (input$x_greater == 0) {
prob <- 1 # P(X ≥ 0) is always 1
} else {
prob <- 1 - ppois(input$x_greater - 1, lambda = input$lambda)
}
explanation <- sprintf("P(X ≥ %d) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# Exact probability for a single value
prob <- dpois(input$x_lower, lambda = input$lambda)
} else {
# P(x_lower ≤ X ≤ x_upper) = P(X ≤ x_upper) - P(X < x_lower) = P(X ≤ x_upper) - P(X ≤ x_lower-1)
upper_prob <- ppois(input$x_upper, lambda = input$lambda)
if (input$x_lower == 0) {
lower_prob <- 0
} else {
lower_prob <- ppois(input$x_lower - 1, lambda = input$lambda)
}
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%d ≤ X ≤ %d) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the Poisson distribution plot
output$distPlot <- renderPlot({
# Determine the range for the x-axis
lambda <- input$lambda
max_x <- max(round(lambda * 3), 10)
# Create data frame for plotting
x_values <- 0:max_x
prob_mass <- dpois(x_values, lambda = lambda)
df <- data.frame(x = x_values, probability = prob_mass)
# Create base plot
p <- ggplot(df, aes(x = x, y = probability)) +
geom_col(fill = "lightgray", color = "darkgray", alpha = 0.7) +
labs(x = "number of events (X)", y = "probability mass function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = seq(0, max_x, by = ifelse(max_x > 20, 2, 1)))
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "exact") {
highlight_x <- res$x
highlight_df <- df[df$x == highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "less") {
highlight_x <- 0:res$x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "greater") {
highlight_x <- res$x:max_x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "between") {
highlight_x <- res$lower:res$upper
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: Customers enter Cantor’s Confectionery at an average rate of 20 people per hour, and you want to see the likelihood that \(X\) number of customers walks in. This can be expressed as \(X \sim \textrm{Pois}(20)\).
Negative binomial distribution
Where to use: The negative binomial distribution is often used to handle over-dispersed data, which means the variance exceeds the mean. It can serve as an alternative to the Poisson distribution, as the Poisson distribution assumes that the mean is equal to the variance. \(X\) represents the number of trials required to reach the targeted number of successes \(r\).
Notation: \(X \sim \textrm{NB}(r,p)\)
Parameters: Two numbers \(r,p\) where:
- \(r\) is an integer representing the targeted number of successes,
- \(p\) is a real number representing the probability of success in a single trial (where \(0 \le p \le 1\)).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{r(1-p)}{p}\) | |
Variance | \(\mathbb{V}(X) = \dfrac{r(1-p)}{p^2}\) | |
PMF | \(\mathbb{P}(X=x)=\dfrac{(x+r-1)!}{(r-1)!x!}(1-p)^xp^r\) | |
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\sum^{x}_{i=1}\frac{(x+r-1)!}{(r-1)!x!}(1-p)^xp^r\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 850
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Negative binomial distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("r", "Number of successes (r):", value = 5, min = 1, step = 1),
sliderInput("p", "Probability of success (p):", min = 0.01, max = 0.99, value = 0.3, step = 0.01),
numericInput("max_x", "Maximum x to display:", value = 20, min = 10, step = 5),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less' || input.prob_type == 'greater'",
sliderInput("x_value", "x value:", min = 0, max = 20, value = 8, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 20, value = 5, step = 1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 20, value = 12, step = 1)
)
)
),
# Right column - Plot
card(
card_header("Negative binomial distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Update the range of the sliders when max_x changes
observe({
updateSliderInput(session, "x_value", max = input$max_x)
updateSliderInput(session, "x_lower", max = input$max_x)
updateSliderInput(session, "x_upper", max = input$max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("NegBin(r = %d, p = %.2f)", input$r, input$p)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pnbinom(input$x_value, size = input$r, prob = input$p)
explanation <- sprintf("P(X ≤ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_value))
} else if (input$prob_type == "greater") {
# For P(X ≥ x), we need 1 - P(X < x) = 1 - P(X ≤ x-1)
if (input$x_value == 0) {
prob <- 1 # P(X ≥ 0) is always 1
} else {
prob <- 1 - pnbinom(input$x_value - 1, size = input$r, prob = input$p)
}
explanation <- sprintf("P(X ≥ %d) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_value))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# Exact probability for a single value
prob <- dnbinom(input$x_lower, size = input$r, prob = input$p)
} else {
# P(x_lower ≤ X ≤ x_upper) = P(X ≤ x_upper) - P(X < x_lower) = P(X ≤ x_upper) - P(X ≤ x_lower-1)
upper_prob <- pnbinom(input$x_upper, size = input$r, prob = input$p)
if (input$x_lower == 0) {
lower_prob <- 0
} else {
lower_prob <- pnbinom(input$x_lower - 1, size = input$r, prob = input$p)
}
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%d ≤ X ≤ %d) = %.4f or %.2f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the negative binomial distribution plot
output$distPlot <- renderPlot({
# Create data frame for plotting
x_values <- 0:input$max_x
prob_mass <- dnbinom(x_values, size = input$r, prob = input$p)
df <- data.frame(x = x_values, probability = prob_mass)
# Create base plot
p <- ggplot(df, aes(x = x, y = probability)) +
geom_col(fill = "lightgray", color = "darkgray", alpha = 0.7) +
labs(x = "number of failures (X)", y = "probability mass function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = pretty(x_values, n = 10))
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
highlight_x <- 0:min(res$x, input$max_x)
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "greater") {
highlight_x <- res$x:input$max_x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "between") {
highlight_x <- res$lower:min(res$upper, input$max_x)
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You flip a coin multiple times, and the probability of getting ‘heads’ is \(0.5\). You decide to stop flipping the coin once you get \(3\) ‘heads’; these do not have to be consecutive. Taking ‘heads’ as a success, this can be expressed as \(X \sim \textrm{NB}(3,0.5)\). It means the probability of success is \(0.5\), and you will stop conducting trials after you reach \(3\) successes.
Geometric distribution
Where to use: The geometric distribution is used to count \(X\), the number of Bernoulli trials until a successful outcome is reached.
Notation: \(X \sim \textrm{Geometric}(p)\)
Parameter: \(p\) is the real number representing the probability of success in a single trial (where \(0 \le p \le 1\)).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{1}{p}\) | |
Variance | \(\mathbb{V}(X) = \dfrac{1-p}{p^2}\) | |
PMF | \(\mathbb{P}(X=x)=(1-p)^{k-1}p\) | |
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\begin{cases} 1-(1-p)^x & \textsf{if } x\geq0 \\0 & \textsf{if } x<0\end{cases}\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 680
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Geometric distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
sliderInput("p", "Probability of success (p):", min = 0.01, max = 1, value = 0.3, step = 0.01),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X = x)" = "exact",
"P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "exact"),
conditionalPanel(
condition = "input.prob_type == 'exact'",
sliderInput("x_exact", "x value:", min = 0, max = 20, value = 3, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 20, value = 3, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 20, value = 3, step = 1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 20, value = 2, step = 1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 20, value = 5, step = 1)
)
)
),
# Right column - Plot
card(
card_header("Geometric distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results (removed the hr and helpText)
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When p changes, adjust the range of sliders
observe({
# For geometric distribution with parameter p, the mean is (1-p)/p
# Set a reasonable max value based on this
max_x <- max(round(3 * (1 - input$p) / input$p), 20)
updateSliderInput(session, "x_exact", max = max_x)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Geometric(p = %.2f)", input$p)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "exact") {
prob <- dgeom(input$x_exact, prob = input$p)
explanation <- sprintf("P(X = %d) = %.6f or %.4f%%",
input$x_exact, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "exact", x = input$x_exact))
} else if (input$prob_type == "less") {
prob <- pgeom(input$x_less, prob = input$p)
explanation <- sprintf("P(X ≤ %d) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
# For P(X ≥ x), we need 1 - P(X < x) = 1 - P(X ≤ x-1)
if (input$x_greater == 0) {
prob <- 1 # P(X ≥ 0) is always 1
} else {
prob <- 1 - pgeom(input$x_greater - 1, prob = input$p)
}
explanation <- sprintf("P(X ≥ %d) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# Exact probability for a single value
prob <- dgeom(input$x_lower, prob = input$p)
} else {
# P(x_lower ≤ X ≤ x_upper) = P(X ≤ x_upper) - P(X < x_lower) = P(X ≤ x_upper) - P(X ≤ x_lower-1)
upper_prob <- pgeom(input$x_upper, prob = input$p)
if (input$x_lower == 0) {
lower_prob <- 0
} else {
lower_prob <- pgeom(input$x_lower - 1, prob = input$p)
}
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%d ≤ X ≤ %d) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the Geometric distribution plot
output$distPlot <- renderPlot({
# Determine the range for the x-axis
p <- input$p
max_x <- max(round(3 * (1 - p) / p), 20)
# Create data frame for plotting
x_values <- 0:max_x
prob_mass <- dgeom(x_values, prob = p)
df <- data.frame(x = x_values, probability = prob_mass)
# Create base plot
p <- ggplot(df, aes(x = x, y = probability)) +
geom_col(fill = "lightgray", color = "darkgray", alpha = 0.7) +
labs(x = "number of failures before first success (X)", y = "probability mass function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = seq(0, max_x, by = ifelse(max_x > 20, 2, 1)))
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "exact") {
highlight_x <- res$x
highlight_df <- df[df$x == highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "less") {
highlight_x <- 0:res$x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "greater") {
highlight_x <- res$x:max_x
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
} else if (res$type == "between") {
highlight_x <- res$lower:res$upper
highlight_df <- df[df$x %in% highlight_x, ]
p <- p + geom_col(data = highlight_df, aes(x = x, y = probability),
fill = "#3F6BB6", color = "darkgray", alpha = 0.8)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You flip a coin multiple times, and the probability of getting ‘heads’ is \(0.5\). You decide to stop flipping the coin once you get a ‘heads’. Taking ‘heads’ as a success, this can be expressed as \(X \sim \textrm{Geometric}(0.5)\). It means the probability of success is \(0.5\), and you will stop conducting trials after you reach a success.
Continuous random variables
Uniform distribution (continuous)
Where to use: The continuous uniform distribution is used when all continuous values \(x\) in the interval \(a\) to \(b\) are equally likely. The random variable \(X\) represents the outcome.
Notation: \(X \sim \textrm{Uniform}(a,b)\) or \(X \sim U(a,b)\).
Parameters: Two real numbers \(a,b\), where
\(a\) is the minimum value of an outcome,
\(b\) is the maximum value of an outcome.
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{a+b}{2}\) | |
Variance | \(\mathbb{V}(X) = \dfrac{(b-a)^2}{12}\) | |
\(\mathbb{P}(X=x)=\begin{cases} \dfrac{1}{b-a} & \textsf{if } a \leq x \leq b \\0 & \textsf{otherwise}\end{cases}\) | ||
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\begin{cases} 0 & \textsf{if } x< a \\\dfrac{x-a}{b-a} & \textsf{if } a\leq x\leq b \\1 & \textsf{if } x>b \end{cases}\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 750
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Continuous uniform distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("a", "Minimum value (a):", value = 0, step = 0.1),
numericInput("b", "Maximum value (b):", value = 10, step = 0.1),
hr(),
radioButtons("prob_type", "Probability to Calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less' || input.prob_type == 'greater'",
sliderInput("x_value", "x value:", min = 0, max = 10, value = 5, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 10, value = 3, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 10, value = 7, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("Continuous uniform distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Ensure b is always greater than a
observe({
if (input$b <= input$a) {
updateNumericInput(session, "b", value = input$a + 1)
}
})
# Update the range of the sliders when a or b changes
observe({
updateSliderInput(session, "x_value", min = input$a, max = input$b, value = min(max(input$a, (input$a + input$b)/2), input$b))
updateSliderInput(session, "x_lower", min = input$a, max = input$b, value = min(max(input$a, input$a + (input$b - input$a)/3), input$b))
updateSliderInput(session, "x_upper", min = input$a, max = input$b, value = min(max(input$a, input$a + 2*(input$b - input$a)/3), input$b))
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Unif(a = %.1f, b = %.1f)", input$a, input$b)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- punif(input$x_value, min = input$a, max = input$b)
explanation <- sprintf("P(X ≤ %.2f) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_value))
} else if (input$prob_type == "greater") {
prob <- 1 - punif(input$x_value, min = input$a, max = input$b)
explanation <- sprintf("P(X ≥ %.2f) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_value))
} else if (input$prob_type == "between") {
upper_prob <- punif(input$x_upper, min = input$a, max = input$b)
lower_prob <- punif(input$x_lower, min = input$a, max = input$b)
prob <- upper_prob - lower_prob
explanation <- sprintf("P(%.2f ≤ X ≤ %.2f) = %.4f or %.2f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the continuous uniform distribution plot
output$distPlot <- renderPlot({
# Create data frame for plotting the PDF
x_range <- seq(input$a - 0.5 * (input$b - input$a),
input$b + 0.5 * (input$b - input$a),
length.out = 1000)
pdf_values <- dunif(x_range, min = input$a, max = input$b)
df <- data.frame(x = x_range, density = pdf_values)
# Create base plot
p <- ggplot(df, aes(x = x, y = density)) +
geom_line(color = "darkgray", size = 1.2) +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
ylim(0, max(pdf_values) * 1.1)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
shade_x <- seq(input$a, res$x, length.out = 100)
shade_y <- dunif(shade_x, min = input$a, max = input$b)
shade_df <- data.frame(x = c(input$a, shade_x, res$x),
y = c(0, shade_y, 0))
p <- p + geom_polygon(data = shade_df, aes(x = x, y = y),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
shade_x <- seq(res$x, input$b, length.out = 100)
shade_y <- dunif(shade_x, min = input$a, max = input$b)
shade_df <- data.frame(x = c(res$x, shade_x, input$b),
y = c(0, shade_y, 0))
p <- p + geom_polygon(data = shade_df, aes(x = x, y = y),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
shade_x <- seq(res$lower, res$upper, length.out = 100)
shade_y <- dunif(shade_x, min = input$a, max = input$b)
shade_df <- data.frame(x = c(res$lower, shade_x, res$upper),
y = c(0, shade_y, 0))
p <- p + geom_polygon(data = shade_df, aes(x = x, y = y),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example:
A machine from Cantor’s Confectionery is programmed to chop long candy bars into pieces, each with a length between 30 millimetres to 50 millimetres. Due to variations in the machine, each continuous value between this interval is equally likely. This can be expressed as \(X \sim U(30,50)\). It means 30 is the minimum value and 50 is the maximum value, where all continuous values of \(X\) for \(30 \leq x \leq 50\) are equally likely.
Normal distribution
Where to use: The normal distribution is used to model continuous random variables, which can include any positive or negative real values. The use of this distribution is often justified by the Central Limit Theorem: as the sample size increases, the distribution of sample means will resemble a normal distribution more and more closely.
Notation: \(X \sim \textrm{Normal}(\mu,\sigma^2)\) or \(X \sim N(\mu,\sigma^2)\)
Parameters: Two real numbers \(\mu\) and \(\sigma^2\).
- \(\mu\) is the centre of the distribution (the mean/expected value).
- \(\sigma^2\) is the measure of how the distribution is spread (the variance).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \mu\) | |
Variance | \(\mathbb{V}(X) = \sigma^2\) | |
\(\mathbb{P}(X=x)=\dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left({-\dfrac{(x-\mu)^2}{2\sigma^2}}\right)\) | \(\exp(y) = e^y\) | |
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\dfrac{1}{2}\left[1+\textrm{erf}\left(\dfrac{x-\mu}{\sigma\sqrt{2}}\right)\right]\) | \(\textrm{erf}(x)\) is the error function of \(x\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 700
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Normal distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("mean", "Mean (μ):", value = 0),
numericInput("sd", "Standard deviation (σ):", value = 1, min = 0.01),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less' || input.prob_type == 'greater'",
numericInput("x_value", "x value:", value = 0)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
numericInput("x_lower", "Lower bound (x):", value = -1),
numericInput("x_upper", "Upper bound (y):", value = 1)
)
)
),
# Right column - Plot
card(
card_header("Normal distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
# Removed the LaTeX formula display
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("N(μ = %.2f, σ = %.2f)", input$mean, input$sd)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pnorm(input$x_value, mean = input$mean, sd = input$sd)
explanation <- sprintf("P(X ≤ %.2f) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_value))
} else if (input$prob_type == "greater") {
prob <- 1 - pnorm(input$x_value, mean = input$mean, sd = input$sd)
explanation <- sprintf("P(X ≥ %.2f) = %.4f or %.2f%%",
input$x_value, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_value))
} else if (input$prob_type == "between") {
lower_prob <- pnorm(input$x_lower, mean = input$mean, sd = input$sd)
upper_prob <- pnorm(input$x_upper, mean = input$mean, sd = input$sd)
prob <- upper_prob - lower_prob
explanation <- sprintf("P(%.2f ≤ X ≤ %.2f) = %.4f or %.2f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the normal distribution plot
output$distPlot <- renderPlot({
# Calculate range for x-axis (covering 99.7% of the distribution)
x_min <- input$mean - 3.5 * input$sd
x_max <- input$mean + 3.5 * input$sd
# Create data frame for plotting
x <- seq(x_min, x_max, length.out = 500)
y <- dnorm(x, mean = input$mean, sd = input$sd)
df <- data.frame(x = x, y = y)
# Create base plot
p <- ggplot(df, aes(x = x, y = y)) +
geom_line() +
labs(x = "X", y = "Density") +
theme_minimal() +
theme(panel.grid.minor = element_blank())
# Add bold line at X = 0
p <- p + geom_vline(xintercept = 0, linetype = "solid", color = "black", linewidth = 0.8)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
shade_x <- seq(x_min, res$x, length.out = 200)
shade_y <- dnorm(shade_x, mean = input$mean, sd = input$sd)
shade_df <- data.frame(x = shade_x, y = shade_y)
p <- p + geom_area(data = shade_df, aes(x = x, y = y), fill = "#3F6BB6", alpha = 0.6) +
geom_vline(xintercept = res$x, linetype = "dashed", color = "#db4315")
} else if (res$type == "greater") {
shade_x <- seq(res$x, x_max, length.out = 200)
shade_y <- dnorm(shade_x, mean = input$mean, sd = input$sd)
shade_df <- data.frame(x = shade_x, y = shade_y)
p <- p + geom_area(data = shade_df, aes(x = x, y = y), fill = "#3F6BB6", alpha = 0.6) +
geom_vline(xintercept = res$x, linetype = "dashed", color = "#db4315")
} else if (res$type == "between") {
shade_x <- seq(res$lower, res$upper, length.out = 200)
shade_y <- dnorm(shade_x, mean = input$mean, sd = input$sd)
shade_df <- data.frame(x = shade_x, y = shade_y)
p <- p + geom_area(data = shade_df, aes(x = x, y = y), fill = "#3F6BB6", alpha = 0.6) +
geom_vline(xintercept = res$lower, linetype = "dashed", color = "#db4315") +
geom_vline(xintercept = res$upper, linetype = "dashed", color = "#db4315")
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: The lengths of chocolate bars produced by Cantor’s Confectionery follow a normal distribution with a mean of \(5.6\) inches and a variance of \(1.44\). This can be expressed as \(X \sim N(5.6, 1.44)\), meaning the data is normally distributed, centered at \(5.6\) with standard deviation \(\sqrt{1.44} = 1.2\).
Lognormal distribution
Where to use: The lognormal distribution is used to model continuous random variables with values that are both real and non-negative, wherein the logarithms of these variables follow a normal distribution. That is to say, if the random variable \(X\) is lognormally distributed, then the random variable \(Y = \ln(X)\) is normally distributed (where \(\ln\) is the natural logarithm).
Notation: \(X \sim \textrm{Lognormal}(\mu,\sigma^2)\)
Parameters: As with the normal distribution, two numbers \(\mu\) and \(\sigma^2\) where:
- \(\mu\) is the expected value of the normally distributed random variable \(Y = \ln(X)\),
- \(\sigma^2\) is the variance of the normally distributed random variable \(Y = \ln(X)\).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \exp(\mu+\frac{\sigma^2}{2})\) | \(\exp(y) = e^y\) |
Variance | \(\mathbb{V}(X) = [\exp(\sigma^2)-1]\exp(2\mu+\sigma^2)\) | \(\exp(y) = e^y\) |
\(\mathbb{P}(X=x)=\dfrac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(\ln(x)-\mu)^2}{2\sigma^2}\right)\) | \(\exp(y) = e^y\) | |
CDF | \(\displaystyle\mathbb{P}(X\leq x)=\dfrac{1}{2}\left[1+\textrm{erf}\left(\dfrac{\ln(x)-\mu}{\sigma\sqrt{2}}\right)\right]\) | \(\textrm{erf}(x)\) is the error function of \(x\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 740
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Lognormal distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("meanlog", "Log mean (μ):", value = 0, step = 0.1),
numericInput("sdlog", "Log standard deviation (σ):", value = 1, min = 0.01, step = 0.1),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 10, value = 1, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 10, value = 1, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 10, value = 0.5, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 10, value = 2, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("Lognormal distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When parameters change, adjust the range of sliders
observe({
# For lognormal distribution, adjust slider based on parameters
meanlog <- input$meanlog
sdlog <- input$sdlog
# Use a heuristic to determine a reasonable upper bound
# This captures most of the meaningful density
max_x <- min(qlnorm(0.995, meanlog, sdlog), 100)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Lognormal(μ = %.2f, σ = %.2f)", input$meanlog, input$sdlog)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- plnorm(input$x_less, meanlog = input$meanlog, sdlog = input$sdlog)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - plnorm(input$x_greater, meanlog = input$meanlog, sdlog = input$sdlog)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- plnorm(input$x_upper, meanlog = input$meanlog, sdlog = input$sdlog)
lower_prob <- plnorm(input$x_lower, meanlog = input$meanlog, sdlog = input$sdlog)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the lognormal distribution plot
output$distPlot <- renderPlot({
# Get parameters
meanlog <- input$meanlog
sdlog <- input$sdlog
# Determine a reasonable max for x-axis based on parameters
max_x <- min(qlnorm(0.995, meanlog, sdlog), 100)
# Create data frame for plotting
x_values <- seq(0.01, max_x, length.out = 500) # Avoid x=0
density_values <- dlnorm(x_values, meanlog = meanlog, sdlog = sdlog)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
xlim(0, max_x)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(0.01, res$x, length.out = 200)
fill_y <- dlnorm(fill_x, meanlog = meanlog, sdlog = sdlog)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, max_x, length.out = 200)
fill_y <- dlnorm(fill_x, meanlog = meanlog, sdlog = sdlog)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dlnorm(fill_x, meanlog = meanlog, sdlog = sdlog)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: The logarithms of Cantor’s Confectionery’s stock prices follow a normal distribution. The mean of the stock prices’ natural logarithms is \(8.01\), whereas the variance of the stock prices’ natural logarithms is \(3\). This can be expressed as \(X \sim \textrm{Lognormal}(8.01, 3)\), meaning the logarithm of the location parameter is \(8.01\) and the logarithm of scale parameter is \(3\).
Exponential distribution
Where to use: The exponential distribution is used when \(X\) is the waiting time before a certain event occurs. It is similar to the geometric distribution, but the exponential distribution uses a continuous waiting time instead of the integer number of trials.
Notation: \(X \sim \textrm{Exponential}(\lambda)\) or \(X \sim \textrm{Exp}(\lambda)\)
Parameter: An integer \(\lambda\), representing number of times an event occurs within a specific period of time.
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \frac{1}{\lambda}\) | |
Variance | \(\mathbb{V}(X) = \frac{1}{\lambda^2}\) | |
\(\mathbb{P}(X=x)=\lambda e^{-\lambda x}\) | ||
CDF | \(\mathbb{P}(X \leq x)=1-e^{-\lambda x}\) |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 640
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Exponential distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("rate", "Rate parameter (λ):", value = 0.5, min = 0.01, step = 0.01),
# Removed the helpText about mean and variance
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 10, value = 2, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 10, value = 2, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 10, value = 1, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 10, value = 3, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("Exponential distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When rate changes, adjust the range of sliders
observe({
# For exponential distribution with parameter rate, the mean is 1/rate
# Set a reasonable max value based on this
max_x <- max(round(5 / input$rate, 1), 10)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Exponential(λ = %.2f)", input$rate)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pexp(input$x_less, rate = input$rate)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pexp(input$x_greater, rate = input$rate)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pexp(input$x_upper, rate = input$rate)
lower_prob <- pexp(input$x_lower, rate = input$rate)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the Exponential distribution plot
output$distPlot <- renderPlot({
# Determine the range for the x-axis
rate <- input$rate
max_x <- max(round(5 / rate, 1), 10)
# Create data frame for plotting
x_values <- seq(0, max_x, length.out = 500)
density_values <- dexp(x_values, rate = rate)
df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "Time (X)", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank())
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(0, res$x, length.out = 200)
fill_y <- dexp(fill_x, rate = rate)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, max_x, length.out = 200)
fill_y <- dexp(fill_x, rate = rate)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dexp(fill_x, rate = rate)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: Customers enter Cantor’s Confectionery at an average rate of 20 people per hour, and the time distance between each visit can be modelled by an exponential distribution. This can be expressed as \(X \sim \textrm{Exp}(20)\).
Gamma distribution
Where to use: The gamma distribution generalizes the exponential distribution, allowing for greater or lesser variance. It is used to model positive continuous random variables that have skewed distributions.
Notation: \(X \sim \textrm{Gamma}(\alpha,\theta)\) or \(X \sim \textrm{Gam}(\alpha,\theta)\)
Parameters: Two real numbers \(\alpha\) and \(\theta\), which are related to the mean \(\mu\) and variance \(\sigma^2\):
- \(\alpha = \frac{\mu^2}{\sigma^2}\) (shape parameter)
- \(\theta=\frac{\sigma^2}{\mu}\) (scale parameter)
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \alpha\theta\) | |
Variance | \(\mathbb{V}(X) = \alpha\theta^2\) | |
\(\mathbb{P}(X=x)=\dfrac{x^{\alpha-1}\exp\left(-\frac{x}{\theta}\right)}{\Gamma(\alpha)\theta^{\alpha}}\) | \(\Gamma(x)\) the gamma function of \(x\) | |
CDF | \(\mathbb{P}(X \leq x)=\dfrac{\textrm{Gam}\left(\alpha,\frac{x}{\theta}\right)}{\Gamma(\alpha)}\) | \(\textrm{Gam}(\alpha,\theta)\) is the PDF of the gamma distribution |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 720
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Gamma distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("shape", "Shape parameter (α):", value = 2, min = 0.01, step = 0.1),
numericInput("scale", "Scale parameter (θ):", value = 1, min = 0.01, step = 0.1),
hr(),
radioButtons("prob_type", "Probability to Calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 15, value = 2, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 15, value = 2, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 15, value = 1, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 15, value = 5, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("Gamma distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When parameters change, adjust the range of sliders
observe({
# For gamma distribution, adjust slider based on parameters
shape <- input$shape
scale <- input$scale
# Use a heuristic to determine a reasonable upper bound
# This captures most of the meaningful density
max_x <- min(qgamma(0.995, shape = shape, scale = scale), 100)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Gamma(α = %.2f, θ = %.2f)", input$shape, input$scale)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pgamma(input$x_less, shape = input$shape, scale = input$scale)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pgamma(input$x_greater, shape = input$shape, scale = input$scale)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pgamma(input$x_upper, shape = input$shape, scale = input$scale)
lower_prob <- pgamma(input$x_lower, shape = input$shape, scale = input$scale)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the gamma distribution plot
output$distPlot <- renderPlot({
# Get parameters
shape <- input$shape
scale <- input$scale
# Determine a reasonable max for x-axis based on parameters
max_x <- min(qgamma(0.995, shape = shape, scale = scale), 100)
# Create data frame for plotting
x_values <- seq(0.01, max_x, length.out = 500) # Avoid x=0 for some shape values
density_values <- dgamma(x_values, shape = shape, scale = scale)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "Probability Density Function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
xlim(0, max_x)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(0.01, res$x, length.out = 200)
fill_y <- dgamma(fill_x, shape = shape, scale = scale)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, max_x, length.out = 200)
fill_y <- dgamma(fill_x, shape = shape, scale = scale)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dgamma(fill_x, shape = shape, scale = scale)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You collect historical data on the time to failure of a machine from Cantor’s Confectionery. The mean is 83 days and the variance is 50.3. You can then use this to estimate the shape and scale parameters of the gamma distribution:
\(\alpha = \frac{83^2}{50.3} = 136.958250497 \approx 137\)
\(\theta = \frac{50.3}{83} = 0.60602409638 \approx 0.61\)
The distribution can be expressed as \(X \sim \textrm{Gam}(137,0.61)\), where the shape parameter is 137 and the scale parameter is 0.61.
Beta distribution
Where to use: The beta distribution is used to model the distribution of probabilities or proportions. Hence, the random variable \(0 \leq X \leq 1\).
Notation: \(X \sim \textrm{Beta}(\alpha,\beta)\)
Parameters: Two positive real numbers \(\alpha,\beta\), which are shape parameters. These can be specified as follows in terms of \(n\) and \(k\) where \(n\) is the number of Bernoulli trials and \(k\) is the number of successes:
- \(\alpha = k + 1\)
- \(\beta = n - k + 1\)
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{\alpha}{\alpha+\beta}\) | |
Variance | \(\mathbb{V}(X) = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\) | |
\(\mathbb{P}(X=x)=\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{\textrm{B}(\alpha,\beta)}\) | \(\textrm{B}(x,y)\) is the beta function | |
CDF | \(\mathbb{P}(X \leq x)=I_{x}(\alpha,\beta)\) | \(I_{x}(a,b)\) is the regularized incomplete beta function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 730
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Beta distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("shape1", "Shape parameter α:", value = 2, min = 0.01, step = 0.1),
numericInput("shape2", "Shape parameter β:", value = 2, min = 0.01, step = 0.1),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 1, value = 0.5, step = 0.01)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 1, value = 0.5, step = 0.01)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 1, value = 0.25, step = 0.01),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 1, value = 0.75, step = 0.01)
)
)
),
# Right column - Plot
card(
card_header("Beta distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Beta(α = %.2f, β = %.2f)", input$shape1, input$shape2)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pbeta(input$x_less, shape1 = input$shape1, shape2 = input$shape2)
explanation <- sprintf("P(X ≤ %.2f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pbeta(input$x_greater, shape1 = input$shape1, shape2 = input$shape2)
explanation <- sprintf("P(X ≥ %.2f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pbeta(input$x_upper, shape1 = input$shape1, shape2 = input$shape2)
lower_prob <- pbeta(input$x_lower, shape1 = input$shape1, shape2 = input$shape2)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.2f ≤ X ≤ %.2f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the beta distribution plot
output$distPlot <- renderPlot({
# Get parameters
shape1 <- input$shape1
shape2 <- input$shape2
# Create data frame for plotting
# Beta distribution is defined on the interval [0, 1]
x_values <- seq(0, 1, length.out = 500)
density_values <- dbeta(x_values, shape1 = shape1, shape2 = shape2)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
xlim(0, 1) +
# Adjust y-limit based on maximum density to handle tall peaks
ylim(0, max(density_values) * 1.05)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(0, res$x, length.out = 200)
fill_y <- dbeta(fill_x, shape1 = shape1, shape2 = shape2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, 1, length.out = 200)
fill_y <- dbeta(fill_x, shape1 = shape1, shape2 = shape2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dbeta(fill_x, shape1 = shape1, shape2 = shape2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: Cantor’s Confectionery is visited by 10 customers, and 6 of them purchase something from the store. Taking the buying customers as successes and the total visiting customers as number of trials, there would be 6 successes, allowing you to find the following parameters:
\(\alpha = 6 + 1 = 7\)
\(\beta = 10 - 6 + 1 = 5\)
Then the distribution of the probabilities of a customer purchasing from Cantor’s Confectionery can be expressed as \(X \sim \textrm{Beta}(7,5)\), meaning the first shape parameter is 7 and the second shape parameter is 5.
χ^2 distribution
Where to use: The \(\chi^2\) distribution is used for hypothesis testing, such as for goodness of fit tests and tests for independence. (See Guide: Introduction to hypothesis testing for more.) It is a special case of the gamma distribution, as \(\chi^2(k)=\textrm{Gam}(\frac{k}{2},2)\).
Notation: \(X \sim \chi^2(k)\)
Parameter: The integer \(k\) is the number of degrees of freedom in the sample.
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = k\) | |
Variance | \(\mathbb{V}(X) = 2k\) | |
\(\mathbb{P}(X=x)=\dfrac{x^{\frac{k}{2}-1}\exp\left({-\frac{x}{2}}\right)}{2^\frac{k}{2}\Gamma\left(\frac{k}{2}\right)}\) | \(\Gamma(x)\) is the gamma function | |
CDF | \(\mathbb{P}(X \leq x)=\dfrac{1}{\Gamma\left(\frac{k}{2}\right)}\textrm{Gam}\left(\frac{k}{2},\frac{x}{2}\right)\) | \(\Gamma(x)\) is the gamma function, \(\textrm{Gam}(\alpha,\theta)\) is the PDF of the gamma distribution |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 640
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "Chi-squared distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("df", "Degrees of freedom (k):", value = 3, min = 1, step = 1),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 20, value = 5, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 20, value = 5, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 20, value = 2, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 20, value = 7, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("Chi-squared distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When degrees of freedom change, adjust the range of sliders
observe({
# For chi-squared distribution, a reasonable upper limit for the x-axis depends on df
# Higher df means larger values make more sense
df <- input$df
# Use a heuristic to determine a reasonable upper bound
# This captures critical values at the 0.995 quantile
max_x <- min(qchisq(0.995, df = df), 50)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("Chi-Squared(k = %d)", input$df)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pchisq(input$x_less, df = input$df)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pchisq(input$x_greater, df = input$df)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pchisq(input$x_upper, df = input$df)
lower_prob <- pchisq(input$x_lower, df = input$df)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the chi-squared distribution plot
output$distPlot <- renderPlot({
# Get parameters
df_val <- input$df
# Determine a reasonable max for x-axis based on df
max_x <- min(qchisq(0.995, df = df_val), 50)
# Create data frame for plotting - avoid x=0 when df=1 as density is infinite there
x_min <- if(df_val == 1) 0.01 else 0
x_values <- seq(x_min, max_x, length.out = 500)
density_values <- dchisq(x_values, df = df_val)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
xlim(0, max_x)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(x_min, res$x, length.out = 200)
fill_y <- dchisq(fill_x, df = df_val)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, max_x, length.out = 200)
fill_y <- dchisq(fill_x, df = df_val)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dchisq(fill_x, df = df_val)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Examples:
Goodness of fit example: You have a six-sided die with six possible outcomes: \(1, 2, 3, 4, 5,\) and \(6\). You calculate the expected frequencies of each outcome. Then you roll the die many times and record the observed frequencies of each outcome. Since there are 6 categories, \[\textsf{degrees of freedom = number of categories} - 1 = 6 - 1 = 5\] This can be expressed as \(X \sim \chi^2(5)\), meaning the degrees of freedom is \(5\).
Test for independence example: You are investigating whether there is a correlation between two variables: candy colour and flavour. You have \(5\) categories of colours and 3 categories of flavours. Calculating the degrees of freedom can be done with the formula: \[(\textsf{categories of colours} - 1)(\textsf{categories of flavours} - 1) = (5-1)(3-1)=(4)(2)=8.\] You can model \(X \sim \chi^2(8)\), meaning that there are \(8\) degrees of freedom.
\(F\)-distribution
Where to use: The \(F\)-distribution is used for the ratio \((X/d_1)/(Y/d_2)\) of two independent random \(\chi^2\) variables \(X\sim \chi^2(d_1)\) and \(Y\sim \chi^2(d_2)\). It is commonly used as a reference distribution in hypothesis testing to compare two variances or more than two means, such as Analysis of Variance (ANOVA) tests.
Notation: \(X \sim F(d_{1},d_{2})\)
Parameters: Two integers \(d_1\) and \(d_2\), where - \(d_{1}\) degrees of freedom for the random variable \(X\sim \chi^2(d_{1})\). - \(d_{2}\) degrees of freedom for the random variable \(Y\sim \chi^2(d_{2})\).
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = \dfrac{d_{2}}{d_{2}-2}\) | \(d_2>2\) |
Variance | \(\mathbb{V}(X) = \dfrac{2d_{2}(d_{1}+d_{2}-2)}{d_{1}(d_{2}-2)^2(d_{2}-4)}\) | |
\(\mathbb{P}(X=x)=\dfrac{\sqrt{\frac{(d_{1}x)^{d_{1}}d_{2}^{d_{2}}}{(d_{1}x+d_{2})^{d_{1}+d_{2}}}}}{x\textrm{B}\left(\frac{d_{1}}{2},\frac{d_{2}}{2}\right)}\) | \(\textrm{B}(x,y)\) is the beta function | |
CDF | \(\mathbb{P}(X \leq x)=I_{\frac{d_{1}x}{d_{1}x+d_{2}}}(\frac{d_{1}}{2},\frac{d_{2}}{2})\) | \(I_{x}(a,b)\) is the regularized incomplete beta function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 760
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "F-distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("df1", "Numerator degrees of freedom (d₁):", value = 5, min = 1, step = 1),
numericInput("df2", "Denominator degrees of freedom (d₂):", value = 10, min = 1, step = 1),
hr(),
radioButtons("prob_type", "Probability to calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = 0, max = 10, value = 1, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = 0, max = 10, value = 1, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = 0, max = 10, value = 0.5, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = 0, max = 10, value = 2, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("F-distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# When degrees of freedom change, adjust the range of sliders
observe({
# For F distribution, a reasonable upper limit for the x-axis depends on degrees of freedom
# Higher df means smaller values make more sense
df1 <- input$df1
df2 <- input$df2
# Use a heuristic to determine a reasonable upper bound
# This captures critical values at the 0.999 quantile
max_x <- min(qf(0.999, df1, df2), 10)
updateSliderInput(session, "x_less", max = max_x)
updateSliderInput(session, "x_greater", max = max_x)
updateSliderInput(session, "x_lower", max = max_x)
updateSliderInput(session, "x_upper", max = max_x)
})
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("F-distribution(d₁ = %d, d₂ = %d)", input$df1, input$df2)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pf(input$x_less, df1 = input$df1, df2 = input$df2)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pf(input$x_greater, df1 = input$df1, df2 = input$df2)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pf(input$x_upper, df1 = input$df1, df2 = input$df2)
lower_prob <- pf(input$x_lower, df1 = input$df1, df2 = input$df2)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the F-distribution plot
output$distPlot <- renderPlot({
# Get parameters
df1 <- input$df1
df2 <- input$df2
# Determine a reasonable max for x-axis based on df values
max_x <- min(qf(0.999, df1, df2), 10)
# Create data frame for plotting
x_values <- seq(0.01, max_x, length.out = 500) # Avoid x=0 since df(0) is undefined
density_values <- df(x_values, df1 = df1, df2 = df2)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
xlim(0, max_x)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(0.01, res$x, length.out = 200)
fill_y <- df(fill_x, df1 = df1, df2 = df2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, max_x, length.out = 200)
fill_y <- df(fill_x, df1 = df1, df2 = df2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- df(fill_x, df1 = df1, df2 = df2)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You have three independent groups of data containing Cantor’s Confectionery chocolate bar lengths, and the total sample size is 90. From this, you would like to conduct an ANOVA test investigating if there is a statistically significant difference between the means of each group. You can find the degrees of freedom using the following methods:
\(\textsf{numerator degrees of freedom = number of groups} - 1 = 3 - 1 = 2\)
\(\textsf{denominator degrees of freedom = sample size - number of groups} = 90 - 3 = 87\)
The \(F\) distribution, which will be used as a reference distribution for the ANOVA test, can be expressed as \(X \sim F(2,87)\), meaning the numerator degrees of freedom is \(2\) and the denominator degrees of freedom is \(87\).
\(t\)-distribution
Where to use: The \(t\)-distribution is a special case of the \(F\)-distribution, as \((t(\nu))^2 = F(1,\nu)\). This distribution is used for continuous random variables with heavier tails than the normal distribution, and it is often employed in hypothesis testing where the population standard deviation is unknown. (See Guide: Introduction to hypothesis testing for more.)
Notation: \(X \sim t(\nu)\)
Parameter: The integer \(\nu\) representing the degrees of freedom.
Quantity | Value | Notes |
---|---|---|
Mean | \(\mathbb{E}(X) = 0\) | |
Variance | \(\mathbb{V}(X) = \dfrac{\nu}{\nu -2}\) | \(\nu > 2\) |
\(\mathbb{P}(X=x)=\dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi \nu}}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}\) | \(\Gamma(x)\) is the gamma function | |
CDF | \(\mathbb{P}(X \leq x)=\dfrac{1}{2}+x\Gamma\left(\frac{\nu+1}{2}\right)\left(\frac{_{2}F_{1}\left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi \nu}}\right)\) | \(\Gamma(x)\) is the gamma function, \(_{2}F_{1}(a,b;c;z)\) is the hypergeometric function |
Figure
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 640
library(shiny)
library(bslib)
library(ggplot2)
ui <- page_fluid(
title = "t-distribution calculator",
layout_columns(
col_widths = c(4, 8),
# Left column - Inputs
card(
card_header("Parameters"),
card_body(
numericInput("df", "Degrees of freedom (v):", value = 5, min = 1, step = 1),
hr(),
radioButtons("prob_type", "Probability to Calculate:",
choices = list("P(X ≤ x)" = "less",
"P(X ≥ x)" = "greater",
"P(x ≤ X ≤ y)" = "between"),
selected = "less"),
conditionalPanel(
condition = "input.prob_type == 'less'",
sliderInput("x_less", "x value:", min = -6, max = 6, value = 0, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'greater'",
sliderInput("x_greater", "x value:", min = -6, max = 6, value = 0, step = 0.1)
),
conditionalPanel(
condition = "input.prob_type == 'between'",
sliderInput("x_lower", "Lower bound (x):", min = -6, max = 6, value = -2, step = 0.1),
sliderInput("x_upper", "Upper bound (y):", min = -6, max = 6, value = 2, step = 0.1)
)
)
),
# Right column - Plot
card(
card_header("t-distribution plot"),
card_body(
uiOutput("plot_title"),
plotOutput("distPlot", height = "300px")
)
)
),
# Bottom row - Results
card(
card_header("Results"),
card_body(
textOutput("explanation")
)
)
)
server <- function(input, output, session) {
# Ensure that x_upper is always greater than or equal to x_lower
observe({
if (input$x_upper < input$x_lower) {
updateSliderInput(session, "x_upper", value = input$x_lower)
}
})
# Display the plot title with distribution parameters
output$plot_title <- renderUI({
title <- sprintf("t-distribution(df = %d, μ = 0)", input$df)
tags$h4(title, style = "text-align: center; margin-bottom: 15px;")
})
# Calculate the probability based on user selection
probability <- reactive({
if (input$prob_type == "less") {
prob <- pt(input$x_less, df = input$df)
explanation <- sprintf("P(X ≤ %.1f) = %.6f or %.4f%%",
input$x_less, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "less", x = input$x_less))
} else if (input$prob_type == "greater") {
prob <- 1 - pt(input$x_greater, df = input$df)
explanation <- sprintf("P(X ≥ %.1f) = %.6f or %.4f%%",
input$x_greater, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "greater", x = input$x_greater))
} else if (input$prob_type == "between") {
if (input$x_lower == input$x_upper) {
# For continuous distributions, P(X = a) = 0
prob <- 0
} else {
upper_prob <- pt(input$x_upper, df = input$df)
lower_prob <- pt(input$x_lower, df = input$df)
prob <- upper_prob - lower_prob
}
explanation <- sprintf("P(%.1f ≤ X ≤ %.1f) = %.6f or %.4f%%",
input$x_lower, input$x_upper, prob, prob * 100)
return(list(prob = prob, explanation = explanation, type = "between",
lower = input$x_lower, upper = input$x_upper))
}
})
# Display an explanation of the calculation
output$explanation <- renderText({
res <- probability()
return(res$explanation)
})
# Generate the t-distribution plot
output$distPlot <- renderPlot({
# Create data frame for plotting
df <- input$df
x_values <- seq(-6, 6, length.out = 500)
density_values <- dt(x_values, df = df)
plot_df <- data.frame(x = x_values, density = density_values)
# Create base plot
p <- ggplot(plot_df, aes(x = x, y = density)) +
geom_line(size = 1, color = "darkgray") +
labs(x = "X", y = "probability density function") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray", alpha = 0.7)
# Add shaded area based on selected probability type
res <- probability()
if (res$type == "less") {
# Create data for the filled area
fill_x <- seq(-6, res$x, length.out = 200)
fill_y <- dt(fill_x, df = df)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "greater") {
# Create data for the filled area
fill_x <- seq(res$x, 6, length.out = 200)
fill_y <- dt(fill_x, df = df)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
} else if (res$type == "between") {
# Create data for the filled area
fill_x <- seq(res$lower, res$upper, length.out = 200)
fill_y <- dt(fill_x, df = df)
fill_df <- data.frame(x = fill_x, density = fill_y)
p <- p + geom_area(data = fill_df, aes(x = x, y = density),
fill = "#3F6BB6", alpha = 0.6)
}
# Add normal distribution comparison if degrees of freedom is high enough
if (df >= 30) {
norm_x <- seq(-6, 6, length.out = 500)
norm_y <- dnorm(norm_x)
norm_df <- data.frame(x = norm_x, density = norm_y)
p <- p + geom_line(data = norm_df, aes(x = x, y = density),
linetype = "dotted", color = "#db4315", size = 0.8, alpha = 0.7)
}
return(p)
})
}
shinyApp(ui = ui, server = server)
Example: You have a sample of 40 measurements of Cantor’s Confectionery chocolate bar lengths. From this, you would like to conduct a one sample \(t\)-test comparing the sample to a hypothesized mean. You find the degrees of freedom:
\[ \textsf{degrees of freedom = sample size} - 1 = 40 - 1 = 39 \]
The \(t\) distribution, which will be used as a reference distribution for the \(t\)-test, can be expressed as \(X \sim t(39)\), meaning the degrees of freedom is 39. See Guide: Introduction to hypothesis testing for more.
Further reading
For more information on hypothesis testing, please see Guide: Introduction to hypothesis testing.
For more information on probability, please see Guide: Introduction to probability.
For more information on mean, expected value, variance, and standard deviation, please see Guide: Expected value, variance, standard deviation.
For more information on PMFs, PDFs, and CDFs, please see Guide: PMFs, PDFs, and CDFs.
Version history
v1.0: initial version created 04/25 by Michelle Arnetta as part of a University of St Andrews VIP project.