In probability, a probability space $(\Omega, \Sigma, \Prb)$ dictates the likelihood of certain events occurring. The events we may measure are the elements of $\Sigma$, and the likelihood of an event $A \in \Sigma$ is measured by $\Prb A \in [0,1]$. Our events of interest are usually those in which some undetermined quantity $\X$ lands in some target set. $$ \begin{gathered} \Prb\big(160{\rm cm} \leq \operatorname{height} \leq 187{\rm cm}\big) = 0.95 \\ \Prb\big(60{\rm kg} \leq \operatorname{mass} \leq 80{\rm kg}\big) = 0.8 \\ \Prb\big(22 \leq \operatorname{age} \leq 28\big) = 0.01 \\ \Prb\big( \X \in A\big) = 0.0001 \\ \end{gathered} $$ Explicitly, these quantities $\X$ are measurable functions $\X: \Omega \rightarrow \stateSpace$ from $(\Omega, \Sigma)$ to their respective measurable codomains $(\stateSpace, \stateAlgebra)$, and the event sets are preimages. $$ \big\{ \X \in A \big\} = \X^{-1}A \in \Sigma $$ This perspective of $\Sigma/\stateAlgebra$-measurable functions $\X$ is nice, in that we may mentally replace $\X: \Omega \rightarrow \stateSpace$ with a point $\stateVar \in \stateSpace$, in the sense that any operation $f(\stateVar)$ we perform on $\stateVar$ may be replaced with the function composition $f(\X) = f \circ \X: \Omega \rightarrow \bbF$.
Example. Just as we can calculate body mass index (BMI) from quantities $\operatorname{height}, \operatorname{mass} \in \bbR$ $$ \operatorname{BMI} = \frac{\operatorname{mass}}{\operatorname{height}^2} \in \bbR, $$ we can construct an undetermined quantity $\operatorname{BMI}: \Omega \rightarrow \bbR$ from those $\operatorname{height}, \operatorname{mass}: \Omega \rightarrow \bbR$ via the same operation $$ \operatorname{BMI}(\omega) = \frac{\operatorname{mass}(\omega)}{\operatorname{height}(\omega)^2} $$ and simply write $\operatorname{BMI} = \operatorname{mass}/\operatorname{height}^2$.
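For instance, here is a minimal numpy sketch of this replacement; the sampling distributions and parameters below are illustrative assumptions, not part of the example.

import numpy
# Illustrative draws of the undetermined quantities height and mass;
# the normal parameters here are assumptions chosen for demonstration.
height = numpy.random.normal(1.7, 0.1, size=10000)   # meters
mass = numpy.random.normal(70.0, 10.0, size=10000)   # kilograms
# BMI is the same deterministic operation applied draw-by-draw,
# mirroring the composition BMI = mass / height**2.
bmi = mass / height ** 2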
Caveat. The output set $\bbF$ of our operation $f$ must carry some associated $\sigma$-algebra $\scrF$ for which the operation $f: \stateSpace \rightarrow \bbF$ is $\stateAlgebra/\scrF$-measurable.
The benefit of mentally replacing quantities $\stateVar \in \stateSpace$ with undetermined versions $\X: \Omega \rightarrow \stateSpace$ is our ability to apply operations like those in the example above. The downside of this approach is that it abstracts all the information out of $(\Omega, \Sigma, \Prb)$, making it hard for newcomers to probability to understand how probability spaces are constructed to satisfy the properties we study. Take, for instance, a Python program which generates samples from distributions like so.
import numpy
X = numpy.random.exponential(scale=1/69.0)  # X ~ Exponential(rate 69)
Y = numpy.random.normal(0.0, 1.0)           # Y ~ Normal(0, 1)
Z = numpy.random.normal(Y, X)               # Z drawn using the values of X and Y
We can quickly model this code mentally in terms of undetermined quantities, declared in the same order as in the code. $$\begin{gathered} X \sim \operatorname{Exponential}(69) \\ Y \sim \operatorname{Normal}(0, 1) \\ Z \sim \operatorname{Normal}(Y, X) \\ \end{gathered}$$ However, the notation above hides the structure of the probability spaces $(\Omega, \Sigma, \Prb)$ which equip such undetermined quantities $X, Y, Z$. This abstraction may leave people unfulfilled: without explicit spaces to tinker with, learners may struggle to build familiarity with the theory.
This post seeks to address how we can think of probability generatively; that is, we seek to demonstrate how one may construct probability spaces $(\Omega, \Sigma, \Prb)$ which correspond to generative sampling algorithms like the Python code above.
If I want some probability space $(\Omega, \Sigma, \Prb)$ with a single undetermined quantity $\X: \Omega \rightarrow \stateSpace$ with some distribution $\mu$, $$ \Prb(\X \in A) = \mu A, \quad A \in \stateAlgebra, $$ the construction is very easy. $$ (\Omega, \Sigma, \Prb, \X) = (\stateSpace, \stateAlgebra, \mu, \operatorname{identity}) $$ This clearly yields the desired distribution. $$ \Prb(\X \in A) = \Prb(\X^{-1}A) = \Prb(A) = \mu(A) $$
Example. Letting $\mu$ be an $\operatorname{Exponential}(69)$ probability measure on the Borel sets $\scrR$ of $\bbR$, $$ \mu(A) = \int_{A \cap [0,\infty)} 69 e^{-69x} \rmd x, $$ we can construct our space like so. $$ (\Omega, \Sigma, \Prb, \X) = (\bbR, \scrR, \mu, \operatorname{identity}) $$ Note that any $f: \Omega \rightarrow \bbF$ that is $\Sigma/\scrF$-measurable necessarily satisfies $$ f(\stateVar) = f(\X(\stateVar)) $$ so all undetermined quantities $f$ on the space $(\Omega, \Sigma, \Prb)$ are effectively deterministic operations of $\X$.
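In sampling terms, this identity construction says that a draw of $\omega$ from $\mu$ already is a draw of $\X$; a minimal sketch, reusing the program from before.

import numpy
# Sampling omega from mu and applying X = identity is the same as
# sampling X directly (scale=1/69 is numpy's parameterization of rate 69).
omega = numpy.random.exponential(scale=1/69.0)
X = omega  # the identity map on Omega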
This is a silly edge case, as we typically deal with undetermined quantities that relate to each other in some way. In the next section, we will discuss how we build our probability space through successive enlargements when consecutively introducing new undetermined quantities.
In our Python program, we were able to consecutively initialize the variables X, Y, Z by sequentially calling functions in the numpy.random module. Each function call allocated more memory for the program, like in the diagram below.
M1 = [X]
M2 = [X, Y]
M3 = [X, Y, Z]
Just as the allocated memory enlarges, M1 ⊆ M2 ⊆ ⋯ ⊆ Mn, as we consecutively initialize variables X1, X2, $\ldots$, Xn in a program, we can enlarge our probability spaces with the declaration of undetermined quantities $X_1, X_2, \ldots, X_n$. I see two ways to perform such enlargements, which I will call independent sampling and conditional sampling. Each type of enlargement leverages some probability space $(\Omega, \Sigma, \Prb)$ to create a new space $(\tilde\Omega,\tilde\Sigma,\tilde\Prb)$ with the following properties: every undetermined quantity on the original space lifts to a counterpart on the new space with the same distribution, and the new space carries one additional undetermined quantity with a prescribed distribution.
With these properties, we may behave as though the enlargement never occurred, treating our original abstract $(\Omega, \Sigma, \Prb)$ as if it were the enlarged $(\tilde\Omega,\tilde\Sigma,\tilde\Prb)$, now carrying one new undetermined quantity in addition to our previous ones.
Provided two probability spaces $(\Omega, \Sigma, \Prb)$ and $(\bbY, \scrY, \mu)$, we may define a new space $(\tilde\Omega, \tilde\Sigma,\tilde\Prb)$ via the product operations. $$\begin{aligned} \tilde\Omega &= \Omega \times \bbY = \big\{ (\omega, y) : \omega \in \Omega, y \in \bbY \big\} \\ \tilde\Sigma &= \Sigma \otimes \scrY = \sigma\big(A \times B : A \in \Sigma, B \in \scrY \big) \\ \tilde\Prb &= \Prb \otimes \mu \\ &\quad (\Prb\otimes\mu)(A \times B) = \Prb(A)\cdot \mu(B) \end{aligned}$$ Note that we only defined $\Prb\otimes\mu$ on the rectangles $\big\{ A \times B : A \in \Sigma, B \in \scrY \big\}$, but this partial definition on the rectangles extends uniquely to a measure on $\tilde\Sigma$. Now, we may define $Y: \tilde\Omega \rightarrow \bbY$ via $$ Y(\omega, y) = y. $$ We immediately have that $Y$ is $\tilde\Sigma/\scrY$-measurable; for all $A \in \scrY$, we get $$ Y^{-1}A = \Omega \times A \in \tilde\Sigma. $$ Its distribution is also $\mu$. $$\begin{aligned} \tilde\Prb(Y \in A) &= \tilde\Prb(Y^{-1}A) \\ &= \tilde\Prb(\Omega \times A) \\ &= (\Prb\otimes\mu)(\Omega\times A) \\ &= \Prb\Omega\cdot\mu A \\ &=\mu A \end{aligned}$$ Further, any undetermined quantity $\X: \Omega \rightarrow \stateSpace$ we had on our previous space $(\Omega, \Sigma, \Prb)$ can be extended to the new space via the following definition of $\tilde\X: \tilde\Omega \rightarrow \stateSpace$. $$ \tilde\X(\omega, y) = \X(\omega) $$ The $\Sigma/\stateAlgebra$-measurability that held for $\X$ lifts to $\tilde\Sigma/\stateAlgebra$-measurability of $\tilde\X$; indeed, each $A \in \stateAlgebra$ is such that $\X^{-1}A \in \Sigma$, so $$ \tilde\X^{-1}A = \X^{-1}A \times \bbY \in \tilde\Sigma. $$ The $\tilde\Prb$-distribution of $\tilde\X$ is also exactly the same as the $\Prb$-distribution of $\X$. $$\begin{aligned} \tilde\Prb(\tilde \X \in A) &= \tilde\Prb(\tilde \X^{-1}A) \\ &= \tilde\Prb(\X^{-1}A \times \bbY) \\ &= (\Prb\otimes\mu)(\X^{-1}A \times \bbY) \\ &= \Prb(\X^{-1}A) \cdot \mu\bbY \\ &= \Prb(\X \in A) \end{aligned}$$
Lastly, the phrase independent sampling comes from the fact that counterparts $\tilde\X$ of undetermined quantities $\X$ in our original space are all independent of our new quantity $Y$. $$\begin{aligned} \tilde\Prb\big(\tilde\X \in A, Y \in B\big) &= \tilde\Prb\big(\tilde\X^{-1}A \cap Y^{-1}B\big) \\ &= \tilde\Prb\big((\X^{-1}A \times \bbY) \cap (\Omega \times B) \big) \\ &= \tilde\Prb\big(\X^{-1}A \times B \big) \\ &= (\Prb\otimes\mu)\big(\X^{-1}A \times B \big) \\ &= \Prb(\X^{-1}A) \cdot \mu B \\ &= \Prb(\X \in A) \cdot \tilde\Prb(Y \in B) \\ &= \tilde\Prb(\tilde\X \in A) \cdot \tilde\Prb(Y \in B) \end{aligned}$$ Example. Consider $\mu$ as in our previous example. $$\mu(A) = \int_{A \cap [0,\infty)} 69 e^{-69\stateVar} \rmd\stateVar$$ By introducing a $\operatorname{Normal}(0, 1)$ measure $\nu$ on $(\bbR, \scrR)$ $$\nu(A) = \int_A (2\pi)^{-1/2} \exp\Big(-\frac{y^2}{2} \Big)\rmd y, $$ we may construct a probability space as follows. $$ (\Omega, \Sigma, \Prb) = (\bbR\times\bbR, \scrR\otimes\scrR, \mu\otimes\nu) $$ Define $\X, Y: \Omega \rightarrow \bbR$ by $\X(\stateVar, y) = \stateVar$ and $Y(\stateVar, y) = y$; we see the following joint distribution. $$\begin{aligned} \Prb(\X \in A, Y \in B) &=\Prb(\X^{-1}A \cap Y^{-1}B) \\ &=\Prb\big((A \times \bbR) \cap (\bbR \times B)\big) \\ &=\Prb(A \times B) \\ &=(\mu \otimes \nu)(A \times B) \\ &= \mu(A) \cdot \nu(B) \end{aligned}$$ In other words, $\X$ is $\operatorname{Exponential}(69)$ distributed, while $Y$ is independently $\operatorname{Normal}(0,1)$ distributed. Note that any $\Sigma/\scrF$-measurable function $f: \Omega \rightarrow \bbF$ is immediately of the form $$ f(\stateVar, y) = f\big(\X(\stateVar, y), Y(\stateVar, y) \big) $$ so all undetermined quantities $f$ on the space $(\Omega, \Sigma, \Prb)$ are effectively deterministic operations of the quantities $\X$ and $Y$.
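As a sanity check against the program, the product construction matches its first two sampling lines: a point of the enlarged space is a pair, each coordinate drawn from its own measure. A minimal sketch:

import numpy
# A sample point of Omega = R x R is a pair (x, y); the coordinates are
# drawn from mu and nu separately, which is exactly what makes the
# component maps X(x, y) = x and Y(x, y) = y independent.
omega = (numpy.random.exponential(scale=1/69.0),
         numpy.random.normal(0.0, 1.0))
X, Y = omega  # the component maps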
Fix a probability space $(\Omega, \Sigma, \Prb)$ and a measurable space $(\bbY, \scrY)$. A transition kernel $\kappa: \Omega \times \scrY \rightarrow \bbR$ is an object satisfying the following: for each $\omega \in \Omega$, the map $B \mapsto \kappa(\omega, B)$ is a probability measure on $(\bbY, \scrY)$, and for each $B \in \scrY$, the map $\omega \mapsto \kappa(\omega, B)$ is $\Sigma/\scrR$-measurable.
With these objects, we may construct a probability space $(\tilde\Omega, \tilde\Sigma, \tilde\Prb)$, defined as follows.
$$\begin{aligned} \tilde\Omega &= \Omega \times \bbY \\ \tilde\Sigma &= \Sigma \otimes \scrY \\ \tilde\Prb &= \Prb \ast \kappa \\ &\quad (\Prb\ast\kappa)(A \times B) = \int_A \kappa(\omega, B) \Prb(\rmd\omega) \end{aligned}$$ Again, we may define a $\tilde\Sigma/\scrY$-measurable function $Y: \tilde\Omega \rightarrow \bbY$ via $$ Y(\omega, y) = y. $$ Although the measurable space $(\tilde\Omega, \tilde\Sigma)$ is the same as before, the new measure $\tilde\Prb$ this time makes the distribution of $Y$ rather complicated. $$\begin{aligned} \tilde\Prb(Y \in A) &= \tilde\Prb(Y^{-1}A) \\ &= \tilde\Prb(\Omega \times A) \\ &= \int_\Omega \kappa(\omega, A) \Prb(\rmd\omega) \end{aligned}$$ Intuitively, we may think of the last expression as a $\Prb$-weighted average over a family of likelihoods $\{\kappa(\omega, A)\}_{\omega\in\Omega}$. Each $\kappa(\omega, A)$ is the likelihood of the event $\{ Y \in A \}$ for a specific instance $\omega \in \Omega$. To this end, $\kappa(\omega, \cdot)$ effectively serves as a conditional distribution for $Y$, given a specific $\omega \in \Omega$.
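This $\omega$-then-$y$ mechanism is exactly two consecutive lines of a sampling program. Below is a minimal sketch in which, purely as illustrative assumptions, $\Prb$ is a standard normal measure and $\kappa(\omega, \cdot)$ is a $\operatorname{Normal}(\omega, 1)$ measure; the $\Prb$-weighted average of the kernel then makes $Y$ marginally $\operatorname{Normal}(0, \sqrt2)$.

import numpy
# Sampling under P * kappa: draw omega from P, then y from kappa(omega, .).
omega = numpy.random.normal(0.0, 1.0, size=100000)  # omega ~ P
y = numpy.random.normal(omega, 1.0)                 # y ~ kappa(omega, .)
print(y.std())  # approximately sqrt(2) =~ 1.414, the mixture's spread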
Note that if we curry the first argument of a transition kernel $\kappa$ so that it is a function of the form $\omega \mapsto (A \mapsto \kappa(\omega, A))$, we may recognize $\kappa: \Omega \rightarrow \bbM_1(\scrY)$, where $\bbM_1(\scrY)$ is the set of probability measures on $(\bbY, \scrY)$. We may equip $\bbM_1(\scrY)$ with a $\sigma$-algebra $\scrM_1(\scrY)$, weakly determined by integration maps. That is, letting $\bbR_+ = [0,\infty)$, $\scrR_+$ be the relative algebra of $\scrR$ on $\bbR_+$, $$ \scrR_+ = \Big\{ A \cap \bbR_+ : A \in \scrR \Big\}, $$ and associating each bounded $\scrY/\scrR_+$-measurable function $f: \bbY \rightarrow \bbR_+$ with an integration map $I_f: \bbM_1(\scrY) \rightarrow \bbR$ by $$ I_f(\mu) = \int_\bbY f(y) \mu(\rmd y), $$ the following is a $\sigma$-algebra on $\bbM_1(\scrY)$. $$ \scrM_1(\scrY) = \sigma\Big( I_f \;|\; f \text{ is bounded and } \scrY/\scrR_+\text{-measurable}\Big) $$
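Computationally, one can caricature this picture by representing a probability measure as a sampler and approximating an integration map $I_f$ by a Monte Carlo average. A hedged sketch, with all choices illustrative:

import numpy
# I_f(mu) = integral of f against mu, approximated by averaging f over
# draws from a sampler representing mu.
def I(f, sampler, n=100000):
    return f(sampler(n)).mean()
I(numpy.cos, lambda n: numpy.random.normal(0.0, 1.0, size=n))
# approximately exp(-1/2) =~ 0.607 when mu = Normal(0, 1)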
Result 0. Given a transition kernel $\kappa$, the map $\omega \mapsto (A \mapsto \kappa(\omega, A))$ is $\Sigma/\scrM_1(\scrY)$-measurable.
Result 1. Given a $\Sigma/\scrM_1(\scrY)$-measurable map $\kappa: \Omega \rightarrow \bbM_1(\scrY)$, the map $(\omega, A) \mapsto \kappa(\omega)(A)$ is a transition kernel.
With this equivalence, $\kappa$ is an undetermined measure that distributes according to $\Prb$, and through the algebraic operation $(\Prb, \kappa) \mapsto \Prb\ast\kappa$, we are able to introduce a new source of indeterminism on top of that from $(\Omega, \Sigma, \Prb)$.
The integral structure of the distribution of the new undetermined quantity $Y$ depends on each of these sources $\kappa, \Prb$.
However, this property is unique to the new undetermined quantity $Y$.
That is, for each $\X: \Omega \rightarrow \stateSpace$, the lifted $\tilde\X: \tilde\Omega \rightarrow \stateSpace$ preserves distribution, acting independently of $\kappa$.
$$\begin{aligned}
\tilde\Prb(\tilde\X \in A)
&=\tilde\Prb(\tilde\X^{-1}A) \\
&=\tilde\Prb(\X^{-1}A \times \bbY) \\
&=\int_{\X^{-1}A} \kappa(\omega, \bbY) \Prb(\rmd\omega) \\
&=\int_{\X^{-1}A} \Prb(\rmd\omega) \\
&=\Prb(\X^{-1}A) \\
&=\Prb(\X \in A)
\end{aligned}$$
We may again compare this to the Python program: the variables X and Y are declared at lines 2 and 3, before Z is declared at line 4, so their values should not depend on the later allocation. Conversely, Z is declared in a fashion that depends on X and Y, so its value should depend on the earlier allocations.
Example. Consider the probability space $(\bbR \times \bbR, \scrR \otimes \scrR, \mu \otimes \nu)$ as in our previous example. $$\begin{gathered} \mu(A) = \int_{A \cap [0,\infty)} 69 e^{-69\stateVar} \rmd\stateVar \\ \nu(B) = \int_B (2\pi)^{-1/2} \exp\Big(-\frac{y^2}{2}\Big) \rmd y \end{gathered}$$ Define a kernel $\kappa: (\bbR\times\bbR) \times \scrR \rightarrow \bbR$ as follows. $$ \kappa((\stateVar,y), A) = \int_A (2\pi\stateVar)^{-1/2} \exp\Big(-\frac{(z-y)^2}{2\stateVar} \Big) \rmd z $$ This allows us to declare our probability space. $$\begin{aligned} \Omega &= \bbR^3, \\ \Sigma &= \scrR \otimes \scrR \otimes \scrR,\\ \Prb &= (\mu \otimes \nu) \ast\kappa \end{aligned}$$ Now let $\X, Y, Z: \Omega \rightarrow \bbR$ be the component maps. The joint distribution of $(\X, Y)$ is $\mu\otimes\nu$. $$\begin{aligned} \Prb(\X \in A, Y \in B) &= \big((\mu\otimes\nu)\ast\kappa\big)(A \times B \times \bbR) \\ &= \int_{A\times B} \kappa\big((\stateVar, y), \bbR\big) (\mu\otimes\nu)(\rmd\stateVar,\rmd y) \\ &= \int_{A\times B} (\mu\otimes\nu)(\rmd\stateVar,\rmd y) \\ &= (\mu\otimes\nu)(A \times B) \end{aligned}$$ While conditional distributions have numerous characterizations, we can think of taking a conditional probability $$\begin{aligned} \Prb(Z \in C \mid \X \in A, Y \in B) &= \frac{\Prb(\X \in A, Y \in B, Z \in C)}{\Prb(\X \in A, Y \in B)} \\ &= \frac{\big((\mu\otimes\nu)\ast\kappa\big)(A \times B \times C)}{(\mu\otimes\nu)(A\times B)} \\ &= \frac{1}{(\mu\otimes\nu)(A\times B)} \int_{A \times B} \kappa\big((\stateVar, y), C\big) (\mu\otimes\nu)(\rmd\stateVar,\rmd y) \end{aligned}$$ and shrinking $A \times B \rightarrow \{(\stateVar, y)\}$ to claim $Z$ is conditionally $\operatorname{Normal}(Y, \X)$. $$\begin{aligned} \Prb(Z \in C \mid \X =\stateVar, Y = y) &= \lim_{A \times B \rightarrow \{(\stateVar, y)\}} \Prb(Z \in C \mid \X \in A, Y \in B) \\ &= \kappa\big((\stateVar, y), C\big) \\ &= \int_C (2\pi\stateVar)^{-1/2} \exp\Big(-\frac{(z-y)^2}{2\stateVar} \Big) \rmd z \end{aligned}$$
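Here is a hedged simulation of this space: draw $(\stateVar, y)$ from $\mu\otimes\nu$, then $z$ from $\kappa((\stateVar, y), \cdot)$. One caveat: the kernel above uses $\stateVar$ as a variance, while numpy's scale argument is a standard deviation, hence the square root below.

import numpy
n = 100000
x = numpy.random.exponential(scale=1/69.0, size=n)  # x ~ mu
y = numpy.random.normal(0.0, 1.0, size=n)           # y ~ nu
# z ~ kappa((x, y), .): normal with mean y and variance x, so the
# standard deviation handed to numpy is sqrt(x).
z = numpy.random.normal(y, numpy.sqrt(x))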
If we have a measure $\mu$ and a transition kernel $\kappa$ that is constant in its first coordinate, i.e., there is a probability measure $\nu$ such that $$ \kappa(\omega, A) = \nu(A), $$ it is easy to see that $\mu \ast\kappa= \mu\otimes\nu$. $$\begin{aligned} (\mu\ast\kappa)(A \times B) &= \int_A \kappa(\omega, B) \mu(\rmd\omega) \\ &= \int_A \nu(B) \mu(\rmd\omega) \\ &= \mu(A)\nu(B) \\ &= (\mu\otimes\nu)(A\times B) \end{aligned}$$
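In sampling terms, a constant kernel means the second draw never reads the first, which is precisely independent sampling; a two-line sketch with illustrative measures.

import numpy
omega = numpy.random.exponential(scale=1/69.0)  # omega ~ mu
y = numpy.random.normal(0.0, 1.0)               # y ~ kappa(omega, .) = nu, ignoring omega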
A trick that is sometimes exploited is what I would like to call latent sampling, in which we construct a probability space $(\Omega, \Sigma, \Prb)$ with undetermined quantities $\X: \Omega \rightarrow \stateSpace$ and $Y: \Omega \rightarrow \bbY$ which are correlated through a deterministic operation of a latent, independent $W: \Omega \rightarrow \bbW$. $$ Y = f(\X, W) $$ We can easily construct this in two ways. The first makes $(\Omega, \Sigma, \Prb)$ the independent sampling of $\X, W$. $$\begin{aligned} \Omega &= \stateSpace \times \bbW \\ \Sigma &= \stateAlgebra \otimes \scrW \\ \Prb &= \mu \otimes \nu \\ \X(\stateVar, w) &= \stateVar \\ W(\stateVar, w) &= w \\ Y(\stateVar, w) &= f(\stateVar, w) \end{aligned}$$ The second directly makes it a conditional sampling of $\X, Y$. $$\begin{aligned} \Omega &= \stateSpace \times \bbY \\ \Sigma &= \stateAlgebra \otimes \scrY \\ \Prb &= \mu \ast \kappa \\ \kappa(\stateVar, B) &= \int_\bbW 1_B\big(f(\stateVar, w)\big) \nu(\rmd w) \\ \X(\stateVar, y) &= \stateVar \\ Y(\stateVar, y) &= y \end{aligned}$$ Note that the second approach is more economical, in that the quantities $\X$ and $Y$ of interest are exactly the component maps. This comes at the cost of being less generative, in that the latent variable $W$ does not exist on the space. In fact, the second approach can be seen as the most efficient construction of a model bearing $\X$ and $Y$, in that it embeds in any such model, as seen in the distributions. $$\begin{aligned} \Prb(\X \in A, Y \in B) &= \Prb(\X \in A, f(\X, W) \in B) \\ &= \int_{\stateSpace \times \bbW} 1_A(\stateVar) 1_B\big(f(\stateVar, w)\big) (\mu\otimes\nu)(\rmd\stateVar,\rmd w) \\ &= \int_A \int_\bbW 1_B\big(f(\stateVar, w)\big) \nu(\rmd w) \mu(\rmd\stateVar) \\ &= \int_A \kappa(\stateVar, B) \mu(\rmd\stateVar) \\ &= (\mu\ast\kappa)(A \times B) \end{aligned}$$
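A hedged sketch of both constructions, with the illustrative choices $f(\stateVar, w) = \sqrt\stateVar \cdot w$, $\mu$ an $\operatorname{Exponential}(69)$ measure, and $\nu$ a standard normal measure; either sampler induces the same joint distribution of $(\X, Y)$.

import numpy
n = 100000
x = numpy.random.exponential(scale=1/69.0, size=n)   # x ~ mu
# First construction: carry the latent W explicitly and set Y = f(X, W).
w = numpy.random.normal(0.0, 1.0, size=n)            # w ~ nu
y_latent = numpy.sqrt(x) * w                         # y = f(x, w)
# Second construction: draw y from kappa(x, .) directly; W never exists.
y_kernel = numpy.random.normal(0.0, numpy.sqrt(x))   # same law as f(x, W)
# Compare, e.g., the marginal spreads of the two versions of Y.
print(y_latent.std(), y_kernel.std())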
We now see how to generate probability spaces $(\Omega, \Sigma, \Prb)$ with undetermined quantities $X_1, \ldots, X_n$ exhibiting interesting relations. We start by picking a distribution for $X_1$, set $(\Omega_1, \Sigma_1, \Prb_1)$ as the target space with said distribution, and then inductively perform enlargements $(\Omega_i, \Sigma_i, \Prb_i) \rightarrow (\Omega_{i+1}, \Sigma_{i+1}, \Prb_{i+1})$ like above, each time adding a new undetermined quantity $X_{i+1}$, until we end at $(\Omega, \Sigma, \Prb) = (\Omega_n, \Sigma_n, \Prb_n)$ equipped with $X_1, \ldots, X_n$ (up to identification).
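In program form, this inductive construction is a finite loop of declarations, each new variable drawn from a kernel that may read the variables declared so far. A minimal sketch in which, as an illustrative choice, each kernel reads only the latest variable:

import numpy
# X_1 from its own distribution, then nine conditional enlargements,
# each X_{i+1} ~ kappa(X_i, .) = Normal(X_i, 1) (illustrative kernel).
xs = [numpy.random.normal(0.0, 1.0)]
for _ in range(9):
    xs.append(numpy.random.normal(xs[-1], 1.0))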
As we saw in the examples, our space $(\Omega, \Sigma)$ will be a product space with $n$ factors, each $\X_i$ is the $i$-th component map, and each undetermined quantity will take the form $f(\X_1, \ldots, \X_n)$. To this end, our spaces are constructed so that $\X_1, \ldots, \X_n$ determine the entire space. This matches our programming analogy, in which the memory of the program consists of the variables X1, $\ldots$, Xn.
Note that when we say extend inductively, we mean to suggest a finite number of enlargements. As it stands, we have only proven enlargements that add a single undetermined quantity at a time. This means we still don't have probability spaces $(\Omega, \Sigma, \Prb)$ which equip stochastic processes $(X_i)_{i\in I}$ for infinite $I$. This still aligns with our computer-program analogy: for countably infinite $I$, we would need a never-ending for-loop to get all $(X_i)_{i\in I}$, and for uncountably infinite $I$, the analogy would not even make sense.
To remedy this, we think of stochastic processes slightly differently than as a collection of undetermined quantities $(\X_i)_{i\in I}$. We instead think of the entire map $\X=(\X_i)_{i\in I}$ at once, and show how to construct measures associated with processes $\X: \Omega \rightarrow \bbX^I$. These measures will again be defined on product algebras $\scrX^I$ over $\bbX^I$, and we will be able to describe them through finite-dimensional projections (even when $I$ is uncountable). To read more on this, consider the following post, which discusses how to construct stochastic processes.