Convergence of random variables
In probability theory, there are several different notions of convergence of random variables. The convergence (in one of the senses presented below) of a sequence of random variables to some limiting random variable is an important concept in probability theory and in its applications to statistics and stochastic processes. For example, if the average of n uncorrelated random variables Yi, i = 1, ..., n, all having the same finite mean and variance, is given by
- <math>X_n = \frac{1}{n}\sum_{i=1}^n Y_i\,,</math>
then as n goes to infinity, Xn converges in probability (see below) to the common mean, μ, of the random variables Yi. This result is known as the weak law of large numbers. Other forms of convergence are important in other useful theorems, including the central limit theorem.
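As a rough numerical illustration of this statement, the following sketch estimates Pr(|Xn − μ| ≥ ε) by simulation for increasing n. The choice of Uniform(0, 1) variables (so μ = 0.5), the tolerance ε = 0.05, the number of trials and the helper name `deviation_probability` are all assumptions made for the example.

```python
import random

def deviation_probability(n, eps=0.05, trials=500, seed=0):
    """Estimate Pr(|X_n - mu| >= eps) for the mean X_n of n Uniform(0, 1) draws."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    mu = 0.5                    # common mean of Uniform(0, 1) variables
    misses = 0
    for _ in range(trials):
        x_n = sum(rng.random() for _ in range(n)) / n
        if abs(x_n - mu) >= eps:
            misses += 1
    return misses / trials

# The estimated probability of a deviation of at least eps shrinks as n grows.
estimates = {n: deviation_probability(n) for n in (10, 100, 1000)}
for n, p in estimates.items():
    print(f"n = {n:5d}   estimated Pr(|X_n - mu| >= 0.05) = {p:.3f}")
```

The estimates are Monte Carlo approximations, so they only suggest the limit; they do not prove it.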
Throughout the following, we assume that (Xn) is a sequence of random variables, and X is a random variable, and all of them are defined on the same probability space (Ω, F, P).
Convergence in distribution
Suppose that F1, F2, ... is a sequence of cumulative distribution functions corresponding to random variables X1, X2, ..., and that F is a distribution function corresponding to a random variable X. We say that the sequence Xn converges towards X in distribution, if
- <math>\lim_{n\rightarrow\infty} F_n(a) = F(a),</math>
for every real number a at which F is continuous. Since F(a) = Pr(X ≤ a), this means that the probability that the value of X is in a given range is very similar to the probability that the value of Xn is in that range, provided n is sufficiently large. Convergence in distribution is often denoted by adding the letter <math>\mathcal D</math> over an arrow indicating convergence:
- <math>X_n \, \xrightarrow{\mathcal D} \, X</math>
A lowercase <math>d</math> is also used, although less commonly.
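The definition can be checked directly when the distribution functions are known in closed form. The sketch below uses two examples that are assumptions of this illustration, not taken from the sources: Xn = n(1 − max(U1, ..., Un)) for independent Uniform(0, 1) variables Ui, whose distribution function 1 − (1 − a/n)^n tends to that of an exponential variable with rate 1, and the constant sequence Xn = 1/n, which shows why points of discontinuity of F are excluded.

```python
import math

def cdf_scaled_max(a, n):
    """CDF of X_n = n * (1 - max(U_1, ..., U_n)) for i.i.d. Uniform(0, 1) U_i."""
    if a <= 0:
        return 0.0
    if a >= n:
        return 1.0
    return 1.0 - (1.0 - a / n) ** n

def cdf_exponential(a):
    """CDF of the limit, an exponential variable with rate 1."""
    return 1.0 - math.exp(-a) if a > 0 else 0.0

def cdf_point_mass(a, location):
    """CDF of a constant random variable equal to `location`."""
    return 1.0 if a >= location else 0.0

# F_n(a) approaches F(a) at the continuity point a = 1.
for n in (10, 1000, 100000):
    gap = abs(cdf_scaled_max(1.0, n) - cdf_exponential(1.0))
    print(f"n = {n:6d}   |F_n(1) - F(1)| = {gap:.2e}")

# X_n = 1/n converges in distribution to X = 0, yet F_n(0) stays at 0
# while F(0) = 1: a = 0 is a discontinuity of F, so it is not tested.
values_at_zero = [cdf_point_mass(0.0, 1.0 / n) for n in (1, 10, 1000)]
print(values_at_zero, cdf_point_mass(0.0, 0.0))
```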
Convergence in distribution is the weakest form of convergence discussed here, and is sometimes called weak convergence (main article: weak convergence of measures). It does not, in general, imply any other mode of convergence. However, convergence in distribution is implied by all other modes of convergence mentioned in this article, and hence it is the most common and often the most useful form of convergence of random variables. It is the notion of convergence used in the central limit theorem; the weak law of large numbers can also be stated in this form, because its limit is a constant.
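The central limit theorem can be illustrated without simulation in one simple case. The sketch below, which assumes a sum Sn of n fair coin flips (a Binomial(n, 1/2) variable) and the evaluation point a = 1, compares the exact distribution function of the standardized sum with the standard normal distribution function. The function names are hypothetical.

```python
import math

def standardized_binomial_cdf(a, n):
    """Exact Pr((S_n - n/2) / sqrt(n/4) <= a) for S_n ~ Binomial(n, 1/2)."""
    k_max = math.floor(n / 2 + a * math.sqrt(n) / 2)
    if k_max < 0:
        return 0.0
    k_max = min(k_max, n)
    favourable = sum(math.comb(n, k) for k in range(k_max + 1))
    return favourable / 2 ** n

def standard_normal_cdf(a):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

gaps = {}
for n in (10, 100, 1000):
    gaps[n] = abs(standardized_binomial_cdf(1.0, n) - standard_normal_cdf(1.0))
    print(f"n = {n:4d}   |F_n(1) - Phi(1)| = {gaps[n]:.4f}")
```

Because the binomial distribution lives on a lattice, the gap need not decrease monotonically in n, but it becomes small for large n.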
A useful result, which may be employed in conjunction with the law of large numbers and the central limit theorem, is that if a function g: R → R is continuous and Xn converges in distribution to X, then g(Xn) converges in distribution to g(X). (This may be proved using Skorokhod's representation theorem.) A related characterization can be taken as a definition of convergence in distribution: Xn converges in distribution to X if and only if E[g(Xn)] converges to E[g(X)] for every bounded continuous function g.
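A small closed-form check of the continuous-function result, under assumptions chosen for the example: Xn is normal with mean 0 and variance 1 + 1/n, which converges in distribution to a standard normal X, and g(x) = x². Then Pr(g(Xn) ≤ a) = erf(√(a / (2(1 + 1/n)))), and the sketch compares it with the corresponding value for g(X).

```python
import math

def cdf_of_square(a, variance):
    """Pr(X**2 <= a) for X ~ Normal(0, variance)."""
    if a <= 0:
        return 0.0
    return math.erf(math.sqrt(a / (2.0 * variance)))

a = 1.0
limit_value = cdf_of_square(a, 1.0)   # CDF of g(X) = X**2 at a, X standard normal
gaps = []
for n in (1, 10, 100, 1000):
    gap = abs(cdf_of_square(a, 1.0 + 1.0 / n) - limit_value)
    gaps.append(gap)
    print(f"n = {n:4d}   |Pr(X_n^2 <= 1) - Pr(X^2 <= 1)| = {gap:.2e}")
```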
Convergence in distribution is also called convergence in law, since the word "law" is sometimes used as a synonym of "probability distribution."
Convergence in probability
To say that the sequence Xn converges towards X in probability means
- <math>\lim_{n\rightarrow\infty}\Pr\left(\left|X_n-X\right|\geq\varepsilon\right)=0</math>
for every ε > 0. Spelled out: pick any ε > 0 and any δ > 0, and let Pn be the probability that Xn is outside a tolerance ε of X. If Xn converges in probability to X, then there exists a value N such that Pn is less than δ for all n ≥ N.
Convergence in probability is often denoted by adding the letter 'P' over an arrow indicating convergence:
- <math>X_n \, \xrightarrow{P} \, X</math>
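The roles of ε, δ and N can be made concrete in a case where Pn has a closed form. The sketch below assumes, purely for illustration, that Xn − X is normal with mean 0 and variance 1/n, so that Pn = Pr(|Xn − X| ≥ ε) = erfc(ε√(n/2)), and searches for the smallest admissible N. The function names are hypothetical.

```python
import math

def tail_probability(n, eps):
    """P_n = Pr(|X_n - X| >= eps) when X_n - X ~ Normal(0, 1/n)."""
    return math.erfc(eps * math.sqrt(n / 2.0))

def smallest_index(eps, delta):
    """Smallest N with P_n < delta for all n >= N.

    P_n is decreasing in n here, so the first n with P_n < delta suffices.
    """
    n = 1
    while tail_probability(n, eps) >= delta:
        n += 1
    return n

for eps, delta in ((0.1, 0.05), (0.1, 0.01), (0.05, 0.05)):
    print(f"eps = {eps}, delta = {delta}: N = {smallest_index(eps, delta)}")
```

A smaller tolerance ε or a smaller δ forces a larger N, but some finite N always exists, which is exactly what convergence in probability asserts.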
Convergence in probability is the notion of convergence used in the weak law of large numbers. Convergence in probability implies convergence in distribution. To prove this, it is convenient to first prove the following simple lemma:
Lemma
Let X, Y be random variables, c a real number and ε > 0; then
- <math>\Pr(Y\leq c)\leq \Pr(X\leq c+\varepsilon)+\Pr(\left|Y - X\right|>\varepsilon).</math>
Proof of lemma
- <math>\Pr(Y\leq c)=\Pr(Y\leq c,X\leq c+\varepsilon)+\Pr(Y\leq c,X>c+\varepsilon)</math>
- <math>\leq \Pr(X\leq c+\varepsilon)+\Pr(Y\leq c,c<X - \varepsilon)</math>
- <math>\leq \Pr(X\leq c+\varepsilon)+\Pr(Y - X<- \varepsilon)\leq \Pr(X\leq c+\varepsilon)+\Pr(\left|Y - X\right|>\varepsilon)</math>
since
- <math>\Pr(\left|Y - X\right|>\varepsilon)=\Pr(Y - X>\varepsilon)+\Pr(Y - X<-\varepsilon)\geq \Pr(Y - X<-\varepsilon).</math>
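The lemma rests on an inclusion of events: whenever Y ≤ c, either X ≤ c + ε or |Y − X| > ε. The same inclusion therefore holds for empirical frequencies computed from any sample of pairs (X, Y). The sketch below checks this on simulated data; the joint distribution of X and Y, the sample size and the helper name `lemma_sides` are assumptions of the example.

```python
import random

def lemma_sides(pairs, c, eps):
    """Empirical left and right sides of the lemma for a list of (x, y) samples."""
    total = len(pairs)
    lhs = sum(1 for x, y in pairs if y <= c) / total
    rhs = (sum(1 for x, y in pairs if x <= c + eps)
           + sum(1 for x, y in pairs if abs(y - x) > eps)) / total
    return lhs, rhs

rng = random.Random(1)
# Assumed example: X standard normal, Y equal to X plus independent noise.
pairs = []
for _ in range(10000):
    x = rng.gauss(0.0, 1.0)
    pairs.append((x, x + rng.gauss(0.0, 0.5)))

checks = []
for c in (-1.0, 0.0, 0.5):
    for eps in (0.1, 0.5, 1.0):
        lhs, rhs = lemma_sides(pairs, c, eps)
        checks.append(lhs <= rhs)
        print(f"c = {c:+.1f}, eps = {eps:.1f}:  {lhs:.4f} <= {rhs:.4f}")
```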
Proof
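Let a be a point at which the cumulative distribution function of X is continuous. For every <math>\varepsilon > 0</math>, the preceding lemma, applied once with <math>Y = X_n</math> and <math>c = a</math>, and once with the roles of <math>X</math> and <math>X_n</math> exchanged and <math>c = a - \varepsilon</math>, gives: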
- <math>\Pr(X_n\leq a)\leq \Pr(X\leq a+\varepsilon)+ \Pr(\left|X_n - X\right|>\varepsilon)</math>
- <math>\Pr(X\leq a-\varepsilon)\leq \Pr(X_n \leq a)+\Pr(\left|X_n - X\right|>\varepsilon)</math>
So, we have
- <math>\Pr(X\leq a-\varepsilon)-\Pr(\left|X_n - X\right|>\varepsilon)\leq \Pr(X_n \leq a)\leq \Pr(X\leq a+\varepsilon)+\Pr(\left|X_n - X\right|>\varepsilon).</math>
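Taking the limit for <math>n\to\infty</math>, and using that <math>\Pr(\left|X_n - X\right|>\varepsilon)\to 0</math> by convergence in probability, we obtain:
- <math>\Pr(X\leq a-\varepsilon)\leq \liminf_{n\rightarrow\infty} \Pr(X_n \leq a)\leq \limsup_{n\rightarrow\infty} \Pr(X_n \leq a)\leq \Pr(X\leq a+\varepsilon).</math>
But <math>\Pr(X\leq a)</math> is the cumulative distribution function <math>F_X(a)</math>, which is continuous at a by hypothesis, that is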
- <math>\lim_{\varepsilon \to {0+}} F_X(a-\varepsilon)=\lim_{\varepsilon \to {0+}} F_X(a+\varepsilon)=F_X(a),</math>
and so, taking the limit for <math>\varepsilon \to {0+}</math>, we obtain
- <math>\lim_{n\to\infty} \Pr(X_n \leq a)=\Pr(X \leq a).</math>
Almost sure convergence
To say that the sequence Xn converges almost surely (or almost everywhere, or with probability 1, or strongly) towards X means
- <math>\Pr\left(\lim_{n\rightarrow\infty}X_n=X\right)=1.</math>
This means that the values of Xn approach the value of X, in the sense (see almost surely) that the set of outcomes for which Xn does not converge to X has probability 0. Using the probability space (Ω, F, P) and the concept of the random variable as a function from Ω to R, this is equivalent to the statement
- <math>\Pr\left(\big\{\omega \in \Omega \, | \, \lim_{n \to \infty}X_n(\omega) = X(\omega) \big\}\right) = 1.</math>
Almost sure convergence is often denoted by adding the letters a.s. over an arrow indicating convergence:
- <math>X_n \, \xrightarrow{\mathrm{a.s.}} \, X</math>
Almost sure convergence implies convergence in probability, and hence implies convergence in distribution. It is the notion of convergence used in the strong law of large numbers.
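The converse of the first implication fails in general. A standard counterexample, sketched below under the assumption that Ω = [0, 1) carries the uniform probability measure, is the "sliding interval" sequence: writing n = 2^k + j with 0 ≤ j < 2^k, let Xn be the indicator of the interval [j/2^k, (j + 1)/2^k). Then Pr(Xn ≠ 0) = 2^(−k) tends to 0, so Xn converges to 0 in probability, but every ω falls in one interval of each generation k, so Xn(ω) = 1 infinitely often and Xn(ω) does not converge. The function names are hypothetical.

```python
def typewriter(n, omega):
    """X_n(omega): indicator of the n-th sliding dyadic interval, n >= 1.

    With n = 2**k + j and 0 <= j < 2**k, the interval is
    [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1
    j = n - 2 ** k
    return 1 if j <= omega * 2 ** k < j + 1 else 0

def probability_nonzero(n):
    """Pr(X_n = 1) under the uniform measure: the interval length 2**-k."""
    return 2.0 ** -(n.bit_length() - 1)

omega = 0.3
hits = [n for n in range(1, 1024) if typewriter(n, omega) == 1]
print("indices n < 1024 with X_n(0.3) = 1:", hits)
print("Pr(X_n = 1) at n = 1023:", probability_nonzero(1023))
```

For the fixed outcome ω = 0.3 the sequence returns to 1 once in every generation, even though the probability of seeing a 1 at any given index shrinks to zero.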
Sure convergence
To say that the sequence of random variables (Xn) defined over the same probability space (i.e., a random process) converges surely or everywhere or pointwise towards X means
- <math>\lim_{n\rightarrow\infty}X_n(\omega)=X(\omega), \, \, \forall \omega \in \Omega.</math>
where <math>\Omega</math> is the sample space of the underlying probability space over which the random variables are defined. Equivalently,
- <math>\big\{\omega \in \Omega \, | \, \lim_{n \to \infty}X_n(\omega) = X(\omega) \big\} = \Omega.</math>
This is the notion of pointwise convergence of a sequence of functions, applied to a sequence of random variables. (Note that random variables themselves are functions.)
Sure convergence of a sequence of random variables implies all the other kinds of convergence stated above, but there is no payoff in probability theory in using sure convergence rather than almost sure convergence. The two differ only on sets of probability zero. This is why the concept of sure convergence of random variables is very rarely used.
Convergence in mean
We say that the sequence Xn converges in the r-th mean or in the L^r norm towards X, if r ≥ 1, E(|Xn|^r) < ∞ for all n, and
- <math>\lim_{n\rightarrow\infty}\mathrm{E}\left(\left|X_n-X\right|^r\right)=0</math>
where the operator E denotes the expected value. Convergence in r-th mean tells us that the expectation of the r-th power of the absolute difference between Xn and X converges to zero.
This type of convergence is often denoted by adding L^r over an arrow indicating convergence:
- <math>X_n \, \xrightarrow{L^r} \, X.</math>
The most important cases of convergence in r-th mean are:
- When Xn converges in r-th mean to X for r = 1, we say that Xn converges in mean to X.
- When Xn converges in r-th mean to X for r = 2, we say that Xn converges in mean square to X.
Convergence in the r-th mean, for r > 0, implies convergence in probability (by Markov's inequality applied to |Xn − X|^r), while if r > s ≥ 1, convergence in r-th mean implies convergence in s-th mean. Hence, convergence in mean square implies convergence in mean.
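These relations can be seen in a two-point example, chosen here only for illustration: let X = 0 and let Xn take the value n^a with probability 1/n and the value 0 otherwise, with a = 0.4. Then E|Xn − X|^r = n^(ar − 1), which tends to zero for r = 1 and r = 2 but grows for r = 3, while Pr(|Xn − X| ≥ ε) = 1/n for 0 < ε ≤ n^a. The sketch also checks the bound Pr(|Xn − X| ≥ ε) ≤ E|Xn − X|^r / ε^r that lies behind the implication to convergence in probability. All function names are hypothetical.

```python
def rth_moment(n, r, a=0.4):
    """E|X_n - 0|**r when X_n = n**a with probability 1/n and 0 otherwise."""
    return (n ** a) ** r / n

def moment_bound(n, r, eps, a=0.4):
    """Upper bound E|X_n|**r / eps**r on Pr(|X_n| >= eps)."""
    return rth_moment(n, r, a) / eps ** r

def exact_tail(n, eps, a=0.4):
    """Pr(|X_n| >= eps) for eps > 0: only the value n**a can reach eps."""
    return 1.0 / n if eps <= n ** a else 0.0

for n in (10, 1000, 10**6):
    moments = ", ".join(f"r={r}: {rth_moment(n, r):.4g}" for r in (1, 2, 3))
    print(f"n = {n:7d}   E|X_n|^r -> {moments}   Pr(|X_n| >= 0.5) = {exact_tail(n, 0.5):.2e}")
```

In this example the sequence converges in mean and in mean square, and hence in probability, but not in third mean, so convergence in a lower mean does not imply convergence in a higher one.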
Implications
The chain of implications between the various notions of convergence is noted in their respective sections. Using the arrow notation, they are:
- <math> \xrightarrow{\textrm{a.s.}} \quad \Rightarrow \quad \xrightarrow{P} \quad \Rightarrow \quad \xrightarrow{\mathcal D} </math>
- <math> \forall r>0:\quad\xrightarrow{L^r} \quad \Rightarrow \quad \xrightarrow{P} </math>
- <math>\forall r>s\geq1:\quad\xrightarrow{L^r} \quad \Rightarrow \quad \xrightarrow{L^s}</math>
No implications other than these hold in general, but a number of special cases do permit the converse implications:
- If Xn converges in distribution to a constant c, then Xn converges in probability to c.
- If Xn converges in probability to X, and if Pr(|Xn| ≤ b) = 1 for all n and some b, then Xn converges in r-th mean to X for all r ≥ 1. In other words, if Xn converges in probability to X and all random variables Xn are almost surely bounded by a common constant, then Xn also converges to X in any r-th mean.
- If for all ε > 0,
- <math>\sum_n \Pr\left(|X_n - X| > \varepsilon\right) < \infty,</math>
- then we say that Xn converges almost completely, or fast in probability, towards X. When Xn converges almost completely towards X, it also converges almost surely to X. In other words, if Xn converges in probability to X sufficiently quickly (i.e. the above sequence of tail probabilities is summable for all ε > 0), then Xn also converges almost surely to X. This is a direct consequence of the Borel–Cantelli lemma.
- If Sn is a sum of n real independent random variables:
- <math>S_n = X_1+\cdots+X_n</math>
- then Sn converges almost surely if and only if Sn converges in probability.
- The dominated convergence theorem gives sufficient conditions for almost sure convergence to imply L1-convergence:
- <math>
\left. \begin{array}{ccc} X_n\xrightarrow{a.s.} X \\ \\ |X_n| < Y \\ \\ \mathrm{E}(Y) < \infty \end{array}\right\} \quad\Rightarrow \quad X_n\xrightarrow{L^1} X </math>
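The summability condition for almost complete convergence in the list above can be explored numerically. The sketch below compares two assumed sequences of tail probabilities, Pr(|Xn − X| > ε) = 1/n² and Pr(|Xn − X| > ε) = 1/n. Both tend to zero, but only the first is summable; its tail sums, which by the union bound dominate the probability that |Xn − X| > ε for some n beyond a given index, become small. The function names are hypothetical.

```python
import math

def tail_sum(prob, start, stop):
    """Sum of prob(n) for start <= n <= stop.

    By the union bound this dominates
    Pr(|X_n - X| > eps for some n with start <= n <= stop).
    """
    return math.fsum(prob(n) for n in range(start, stop + 1))

def fast(n):
    """Assumed summable tail probabilities: 1 / n**2."""
    return 1.0 / n ** 2

def slow(n):
    """Assumed non-summable tail probabilities: 1 / n."""
    return 1.0 / n

for stop in (10**3, 10**4, 10**5):
    print(f"n from 101 to {stop:6d}:  "
          f"sum 1/n^2 = {tail_sum(fast, 101, stop):.5f}   "
          f"sum 1/n = {tail_sum(slow, 101, stop):.3f}")
```

For the sequence 1/n the criterion simply does not apply; without further assumptions, such as independence, this alone does not decide whether almost sure convergence holds.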
See also
- Continuous stochastic process: the question of continuity of a stochastic process is essentially a question of convergence, and many of the same concepts and relationships used above apply to the continuity question.