Continuous Probability
If \(E\) is continuous (typically \(E = \mathbb{R}\)), then we can’t meaningfully talk about the probabilities of elementary events. The probability that an observation is exactly any particular value \(x \in \mathbb{R}\) is (typically) zero.
Instead, we define a sigma field where events are intervals:
- \(E = \mathbb{R}\)
- \(\mathcal{F}\) is the sigma field generated by the intervals: it contains the intervals, their complements, and their countable unions (and hence countable intersections). It contains arbitrarily small intervals; singletons like \(\{x\}\) can also be built as countable intersections of shrinking intervals, but they carry zero probability.
This is not the only way to define probabilities over continuous event spaces, but it is the standard way of defining probabilities over real values. This particular sigma field is called the Borel sigma algebra \(\mathcal{B}\), and the resulting measurable space is denoted \((\mathbb{R}, \mathcal{B})\).
Distributions
We often talk about continuous distributions as the distribution of a random variable \(X\). A random variable is a quantity whose value is determined by the outcome of a random process. We can (often) observe or sample a random variable.
We define continuous probabilities in terms of a distribution function \(F_X\):
\[F_X(x) = \mathrm{P}[X < x]\]
This is also called the cumulative distribution function (CDF). (For continuous \(X\) it makes no difference whether we write \(<\) or \(\le\), since \(\mathrm{P}[X = x] = 0\).)
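As a concrete illustration (the standard normal distribution via `scipy.stats` is an illustrative choice, not something the notes fix), the CDF rises monotonically from 0 to 1:

```python
# A concrete CDF: the standard normal, via scipy.stats
# (an illustrative choice, not part of the notes).
from scipy.stats import norm

X = norm(loc=0, scale=1)  # standard normal random variable

# F_X(x) = P[X < x] rises monotonically from 0 to 1.
print(X.cdf(-10))  # essentially 0
print(X.cdf(0))    # 0.5, by symmetry
print(X.cdf(10))   # essentially 1
```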
We can use it to compute the probability for any interval:
\[\mathrm{P}[x_1 \le X < x_2] = F_X(x_2) - F_X(x_1)\]
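For example, the probability that a standard normal observation falls in \([-1, 1)\) is just a difference of two CDF evaluations (again using `scipy.stats.norm` as an illustrative choice):

```python
# P[x1 <= X < x2] = F_X(x2) - F_X(x1), for a standard normal.
from scipy.stats import norm

x1, x2 = -1.0, 1.0
prob = norm.cdf(x2) - norm.cdf(x1)
print(prob)  # ~0.6827: about 68% of the mass lies within one standard deviation
```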
This probability is called the probability mass on a particular interval.
Distributions are often defined by a probability density function \(p\) such that
\[F_X(x) = \int_{-\infty}^x p(x_*) \, dx_*\]
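We can check this relationship numerically: integrating the density up to \(x\) recovers the CDF at \(x\). The standard normal is again just an illustrative choice.

```python
# Verify numerically that F_X(x) equals the integral of the density p up to x,
# for the standard normal (an illustrative choice).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x = 0.7
integral, _ = quad(norm.pdf, -np.inf, x)  # integrate the density up to x
print(integral, norm.cdf(x))              # the two agree to numerical precision
```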
Unlike probabilities or probability mass, densities can exceed 1. When you use sns.distplot (replaced by sns.histplot and sns.kdeplot in newer versions of seaborn) and it shows the kernel density estimate (KDE), it is showing you an estimate of the density. That is why the \(y\)-axis values look odd: they are densities, not probabilities.
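A simple case where the density exceeds 1: a uniform distribution on \([0, 0.5]\) has density \(2\) everywhere on its support, yet its total probability mass is still 1.

```python
# Densities can exceed 1: a uniform distribution on [0, 0.5] has density 2
# on its support, but its total probability mass is still 1.
from scipy.stats import uniform

U = uniform(loc=0, scale=0.5)   # uniform on [0, 0.5]
print(U.pdf(0.25))              # 2.0, a density, not a probability
print(U.cdf(0.5) - U.cdf(0.0))  # 1.0, the total mass on the support
```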
We can also talk about joint and conditional continuous probabilities and densities. When marginalizing a continuous probability density, we replace the sum with an integral:
\[p(x) = \int p(x,y) \, dy\]
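We can sketch this marginalization numerically. The joint density below is an independent bivariate standard normal (an illustrative choice), so integrating out \(y\) should recover the one-dimensional normal density:

```python
# Marginalize a joint density numerically: p(x) = integral of p(x, y) dy.
# The joint is an independent bivariate standard normal (an illustrative
# choice), so the marginal should match the 1-D normal density.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def joint(x, y):
    return norm.pdf(x) * norm.pdf(y)  # p(x, y) for independent normals

x = 0.3
marginal, _ = quad(lambda y: joint(x, y), -np.inf, np.inf)
print(marginal, norm.pdf(x))  # the two agree to numerical precision
```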