Probability Theory — Chapters 21–29
Chapter 21 — Large Deviations
21.1 Exponential scales
Large deviation theory studies probabilities that decay exponentially in a scale parameter. Instead of asking whether $X_n \Rightarrow X$, it asks for asymptotics of the form
$$P(X_n \in A) \approx e^{-n \inf_{x \in A} I(x)}.$$
The scale $n$ is not always the sample size; it may be $t$, $1/\varepsilon$, $n^2$, volume, inverse noise, or another problem-specific growth parameter. The object $I$ is the rate function. Events with small $I$ are cheap; events with large $I$ are exponentially suppressed.
The exponential scale is invisible to the central limit theorem. The CLT sees fluctuations of a sum of size $\sqrt{n}$, while large deviations concern deviations of size $n$. For iid $X_i$ with mean $\mu$, the event
$$\frac{1}{n}\sum_{i=1}^{n} X_i \approx a \neq \mu$$
usually has probability of order $e^{-nI(a)}$, not a Gaussian-scale probability. The correct question is not “does the probability go to zero?” but “what exponential cost does the deviation pay?”
21.2 Chernoff bounds
Chernoff bounds use exponential moments to estimate tail probabilities. For any t>0,
$$P(X \ge a) = P(e^{tX} \ge e^{ta}) \le e^{-ta}\, E[e^{tX}].$$
Optimizing over $t$ gives
$$P(X \ge a) \le \inf_{t > 0} e^{-ta}\, E[e^{tX}].$$
For sums of independent variables, moment generating functions factor:
$$E\, e^{t \sum_i X_i} = \prod_i E\, e^{tX_i}.$$
Chernoff bounds are the finite-$n$ ancestor of large deviation theory. They convert an event constraint into an exponential tilting problem. The bound is often sharp at exponential scale because the optimizing $t$ defines a tilted probability measure under which the rare event becomes typical. The hidden debt is the existence of the moment generating function in the needed range.
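The optimization over $t$ can be made concrete for a binomial sum. The sketch below is an illustration, not part of the original notes: it assumes $S \sim \mathrm{Binomial}(n, p)$, uses the closed-form optimal tilt for that case, and compares the bound with the exact tail. The function names are hypothetical.

```python
import math

def chernoff_bound(n, p, a):
    """Optimized Chernoff bound on P(S >= a), S ~ Binomial(n, p), for n*p < a < n."""
    x = a / n
    # The optimal tilt solves Lambda'(t) = x, where Lambda(t) = log(1 - p + p e^t)
    # is the log-MGF of a single Bernoulli(p) variable.
    t_star = math.log(x * (1 - p) / (p * (1 - x)))
    log_mgf_sum = n * math.log(1 - p + p * math.exp(t_star))  # MGFs factor over the sum
    return math.exp(-t_star * a + log_mgf_sum)

def exact_tail(n, p, a):
    """Exact P(S >= a) by summing the binomial probability mass function."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

n, p, a = 100, 0.5, 70
bound = chernoff_bound(n, p, a)
exact = exact_tail(n, p, a)
print(f"exact tail = {exact:.3e}, Chernoff bound = {bound:.3e}")
```

The bound is larger than the exact tail by a polynomial factor only; both decay at the same exponential rate in $n$.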
21.3 Cramér’s theorem
Cramér’s theorem is the fundamental large deviation theorem for iid sample means. Let $X_1, X_2, \dots$ be iid real variables and define
$$S_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Let
$$\Lambda(t) = \log E[e^{tX_1}]$$
be the log-moment generating function. Under standard hypotheses, $S_n$ satisfies a large deviation principle with rate function
$$I(x) = \sup_{t \in \mathbb{R}} \{tx - \Lambda(t)\}.$$
Thus for suitable sets $A$,
$$P(S_n \in A) \approx e^{-n \inf_{x \in A} I(x)}.$$
Cramér’s theorem says empirical means deviate from their expectation with exponential cost determined by the Legendre transform of $\Lambda$. The mean $\mu = E X_1$ is the zero-cost point: $I(\mu) = 0$. Away from $\mu$, $I(x) > 0$ when the law is nondegenerate and $\Lambda$ is finite in a neighborhood of $0$. The theorem gives the correct rare-event regime for averages, not merely the CLT regime.
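As a worked instance (an illustration added here, assuming $X_1 \sim N(\mu, \sigma^2)$, which is not a case singled out in the text), the supremum can be computed by hand:

```latex
\Lambda(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2,
\qquad
\frac{d}{dt}\bigl\{tx - \Lambda(t)\bigr\} = 0 \iff t^* = \frac{x - \mu}{\sigma^2},
\qquad
I(x) = t^* x - \Lambda(t^*) = \frac{(x - \mu)^2}{2\sigma^2}.
```

In this Gaussian case the sample mean is exactly $N(\mu, \sigma^2/n)$, so the rate $I(a) = (a - \mu)^2 / (2\sigma^2)$ can be checked directly against the Gaussian tail.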
21.4 Rate functions
A rate function $I : S \to [0, \infty]$ assigns exponential cost to states. It is usually required to be lower semicontinuous. A good rate function has compact level sets:
$$\{x : I(x) \le c\} \text{ is compact for every } c < \infty.$$
Goodness is a compactness condition; it prevents minimizing sequences from escaping.
A large deviation principle has two inequalities. For closed $F$,
$$\limsup_{n \to \infty} \frac{1}{n} \log P(X_n \in F) \le -\inf_{x \in F} I(x),$$
and for open $G$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P(X_n \in G) \ge -\inf_{x \in G} I(x).$$
The closed-set inequality is the upper bound; the open-set inequality is the lower bound, guaranteeing enough mass near feasible points. Together they define exponential asymptotics at the topological level.
21.5 Legendre transform
The Legendre transform converts a convex log-moment generating function into a rate function:
$$I(x) = \Lambda^*(x) = \sup_{t} \{tx - \Lambda(t)\}.$$
This operation is convex duality. The parameter $t$ is the exponential tilt; the value $x$ is the forced mean. At differentiability points, the optimizer satisfies
$$\Lambda'(t) = x.$$
In large deviations, the Legendre transform is not formal decoration. It is the certificate that the cheapest way to force an atypical average is to tilt the underlying distribution. The original law $P$ is replaced by
$$\frac{dP_t}{dP}(x) = e^{tx - \Lambda(t)}.$$
Under $P_t$, the rare mean $x$ becomes typical. The exponential cost of this tilt is the rate $I(x)$.
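The duality can be checked numerically. The following sketch is an added illustration under stated assumptions: the law is Bernoulli($p$), the supremum is found by a plain ternary search (valid because $t \mapsto tx - \Lambda(t)$ is concave), and the result is compared with the known closed form, the Bernoulli relative entropy. All function names are hypothetical.

```python
import math

def log_mgf_bernoulli(t, p):
    """Lambda(t) = log E[e^{tX}] for X ~ Bernoulli(p)."""
    return math.log(1 - p + p * math.exp(t))

def legendre_transform(x, log_mgf, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_t {t*x - log_mgf(t)} by ternary search on a concave objective."""
    f = lambda t: t * x - log_mgf(t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    t = (lo + hi) / 2
    return f(t), t

def kl_bernoulli(x, p):
    """Closed-form rate function: relative entropy of Bernoulli(x) w.r.t. Bernoulli(p)."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.3, 0.6
rate, t_star = legendre_transform(x, lambda t: log_mgf_bernoulli(t, p))
# Mean of the tilted law P_t at the optimizer, i.e. Lambda'(t_star).
tilted_mean = p * math.exp(t_star) / (1 - p + p * math.exp(t_star))
print(rate, kl_bernoulli(x, p), t_star, tilted_mean)
```

The tilted mean at the optimizer should agree with the forced mean $x$, which is the numerical form of $\Lambda'(t) = x$.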
21.6 Sanov theorem
Sanov’s theorem is the large deviation principle for empirical measures. If $X_1, \dots, X_n$ are iid with law $\mu$, the empirical measure
$$L_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$$
satisfies an LDP with rate function
$$H(\nu \mid \mu) = \begin{cases} \int \log\!\left(\dfrac{d\nu}{d\mu}\right) d\nu, & \nu \ll \mu, \\ +\infty, & \text{otherwise.} \end{cases}$$
This is relative entropy, or Kullback–Leibler divergence.
Sanov says the probability that the empirical distribution resembles $\nu$ decays like
$$e^{-nH(\nu \mid \mu)}.$$
The theorem lifts large deviations from sample means to empirical laws. Cramér’s theorem follows by contraction: the empirical mean is the image of the empirical measure under $\nu \mapsto \int x \, d\nu$.
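On a finite alphabet the decay rate can be observed exactly. The sketch below is an added illustration: it assumes a three-letter alphabet with hypothetical laws `mu` and `nu`, computes the exact multinomial probability that the empirical counts equal $n\nu$, and compares $-\frac{1}{n}\log P$ with $H(\nu \mid \mu)$.

```python
import math

def relative_entropy(nu, mu):
    """H(nu | mu) for finite distributions given as lists; +inf unless nu << mu."""
    total = 0.0
    for q, p in zip(nu, mu):
        if q == 0:
            continue
        if p == 0:
            return math.inf
        total += q * math.log(q / p)
    return total

def log_prob_of_counts(counts, mu):
    """log P(empirical counts == counts) for n = sum(counts) iid draws from mu."""
    n = sum(counts)
    # Log multinomial coefficient via log-gamma, to avoid huge factorials.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coef + sum(k * math.log(p) for k, p in zip(counts, mu) if k > 0)

mu = [0.5, 0.3, 0.2]   # true law (illustrative)
nu = [0.2, 0.3, 0.5]   # atypical empirical law (illustrative)
H = relative_entropy(nu, mu)
rates = []
for n in (10, 100, 1000, 10000):
    counts = [round(n * q) for q in nu]
    rates.append(-log_prob_of_counts(counts, mu) / n)
print(H, rates)
```

The empirical rates stay above $H(\nu \mid \mu)$ and approach it as $n$ grows; the gap is the polynomial prefactor that the exponential scale ignores.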
21.7 Varadhan lemma
Varadhan’s lemma evaluates exponential integrals under an LDP. If $X_n$ satisfies an LDP with rate $I$, then under suitable continuity and integrability hypotheses,
$$\lim_{n \to \infty} \frac{1}{n} \log E[e^{nF(X_n)}] = \sup_{x} \{F(x) - I(x)\}.$$
This is Laplace’s principle.
The lemma says exponential integrals are dominated by the best compromise between reward F(x) and cost I(x). It is the probabilistic analogue of saddle-point asymptotics. The supremum is not necessarily attained at the typical point; exponential reweighting selects a new dominant state.
21.8 Gärtner–Ellis theorem
The Gärtner–Ellis theorem derives an LDP from convergence of scaled log-moment generating functions. If
$$\Lambda_n(t) = \frac{1}{n} \log E[e^{ntX_n}]$$
converges to $\Lambda(t)$, and $\Lambda$ satisfies suitable regularity conditions, then $X_n$ satisfies an LDP with rate
$$I(x) = \Lambda^*(x) = \sup_{t} \{tx - \Lambda(t)\}.$$
This theorem generalizes Cramér’s theorem beyond iid sums. It is useful for dependent systems, statistical mechanics, Markov processes, and random matrices. The differentiability or exposed-point conditions are not cosmetic; without them, only partial lower bounds may hold. The moment-generating carrier may see convexified costs rather than all pathwise obstructions.
21.9 Moderate deviations
Moderate deviations sit between the CLT and large deviations. For iid centered variables with variance $\sigma^2$, consider scales $a_n$ satisfying
$$a_n \to \infty, \qquad \frac{a_n}{\sqrt{n}} \to 0.$$
Then, for $x > 0$,
$$P\!\left(\frac{\sum_{i=1}^{n} X_i}{\sigma \sqrt{n}} \ge a_n x\right)$$
often behaves like a Gaussian tail at exponential scale $a_n^2$:
$$\approx e^{-a_n^2 x^2 / 2}.$$
The regime is larger than CLT fluctuations but smaller than order-$n$ deviations. Moderate deviations show where Gaussian behavior remains valid into the tail. They require more tail control than the CLT but less than full Cramér-type large deviations.
21.10 Concentration versus large deviations
Concentration inequalities give nonasymptotic probability bounds such as
$$P(|X - EX| \ge t) \le 2e^{-ct^2}.$$
Large deviation theory gives asymptotic exponential rates:
$$P(X_n \in A) \approx e^{-n \inf_A I}.$$
Both study rare events, but their outputs differ. Concentration prioritizes finite-sample bounds, often with explicit constants. Large deviations prioritize sharp exponential asymptotics and variational structure.
The two methods overlap but should not be conflated. A concentration bound may be nonsharp at exponential scale. An LDP may not give usable finite-n estimates. Concentration is a certificate of suppression; large deviations are a certificate of exponential cost geometry.
21.11 Rare-event regimes outside CLT
The CLT cannot control probabilities far from the central scale. If
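$$S_n = \sum_{i=1}^{n} X_i,$$
then (with $S_n$ now the partial sum rather than the sample mean of 21.3) the CLT describes
$$S_n - n\mu = O(\sqrt{n}).$$
Events like
$$S_n - n\mu \ge cn$$
belong to large deviations. Events like
$$S_n - n\mu \ge a_n \sqrt{n}, \qquad a_n \to \infty, \quad a_n = o(\sqrt{n}),$$
belong to moderate deviations. Events driven by one huge summand in heavy-tailed laws may follow subexponential rather than Cramér-type asymptotics.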
The audit rule is scale first, theorem second. A Gaussian approximation at the wrong scale is a carrier error. Rare-event analysis must declare whether the deviation is collective tilting, one-big-jump behavior, boundary crossing, local lattice behavior, or path-level deviation.
Chapter 22 — Filtrations and Adapted Processes
22.1 Information over time
Probability over time requires an information structure. At time $t$, the model should know which events are observable by then. This is encoded by a σ-algebra $\mathcal{F}_t$. If $s \le t$, then information cannot decrease:
$$\mathcal{F}_s \subseteq \mathcal{F}_t.$$
The family $(\mathcal{F}_t)$ is a filtration.
Without a filtration, temporal statements like “known now,” “future,” “stopping time,” “martingale,” or “strategy” are undefined. A process alone gives values; a filtration tells which values and events are available at each time. Dynamic probability is probability plus information flow.
22.2 Filtrations
A filtration on $(\Omega, \mathcal{F}, P)$ is an increasing family of sub-σ-algebras:
$$(\mathcal{F}_t)_{t \in T}, \qquad \mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F} \quad \text{for } s \le t.$$
The index set $T$ may be discrete, such as $\mathbb{N}$, or continuous, such as $[0, \infty)$.
The filtration is the carrier for conditional expectation over time. For an integrable random variable $X$, $E[X \mid \mathcal{F}_t]$ is the best estimate of $X$ using information available at time $t$. Increasing $t$ refines the information. The tower property expresses temporal consistency.
22.3 Adapted processes
A process $X_t$ is adapted to $(\mathcal{F}_t)$ if
$$X_t \text{ is } \mathcal{F}_t\text{-measurable}$$
for every $t$. This means the value of $X_t$ is known at time $t$. Adaptedness is the minimum legality condition for a process to be observable in real time.
A process can be adapted to one filtration and not another. For example, if $\mathcal{F}_t = \sigma(X_s : s \le t)$, then $X$ is adapted to its natural filtration. If the filtration also contains future values, adaptedness still holds but the model permits anticipatory information. That matters for stochastic integration and trading strategies.
22.4 Predictable processes
In discrete time, a process $H_n$ is predictable if $H_n$ is $\mathcal{F}_{n-1}$-measurable. The value used at time $n$ must be chosen using information from before time $n$. In continuous time, predictability is defined through the predictable σ-algebra, generated by left-continuous adapted processes.
Predictability is stronger than adaptedness and is essential for stochastic integration. An integrand in an Itô integral must not see the future increment it is integrating against. Predictability is the no-anticipation certificate. Without it, one can manufacture false gains or invalid integrals by using future noise.
22.5 Stopping times
A random time τ is a stopping time if
$$\{\tau \le t\} \in \mathcal{F}_t$$
for every $t$. This means that by time $t$, one can determine whether $\tau$ has occurred. In discrete time, equivalently $\{\tau = n\} \in \mathcal{F}_n$ for every $n$.
Hitting times are standard examples:
$$\tau = \inf\{t : X_t \in A\}.$$
Under suitable adaptedness and path regularity, such times are stopping times. Last exit times or times defined using future behavior are usually not stopping times. The stopping-time condition is the temporal measurability gate.
22.6 Optional times
Many probability texts use “optional time” as a synonym for stopping time, though in finer continuous-time theory the term connects to optional σ-algebras and optional processes. Optional processes are measurable with respect to the optional σ-algebra, generated by adapted càdlàg processes.
The distinction matters in stochastic calculus. Predictable objects are known just before time t; optional objects are observable at time t. Jump processes expose the difference. At a jump time, the jump itself may be optional but not predictable. This split is central to compensators, stochastic integration, and semimartingale theory.
22.7 Natural filtration
The natural filtration of a process $X_t$ is
$$\mathcal{F}_t^X = \sigma(X_s : s \le t).$$
It is the smallest filtration making $X$ adapted. It represents exactly the information generated by observing the process up to time $t$.
Natural filtrations are minimal carriers. In applications, one often enlarges them to include external information, initial randomness, completed null sets, or independent noise. Enlargement changes what counts as a stopping time or predictable strategy. The process is not the whole model; the filtration determines the information regime.
22.8 Completed and right-continuous filtrations
A filtration is complete if $\mathcal{F}_0$ contains all $P$-null sets of $\mathcal{F}$. It is right-continuous if
$$\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s.$$
The usual conditions in continuous-time probability require completeness and right-continuity.
These conditions are technical but load-bearing. They improve behavior of stopping times, optional projections, martingale modifications, and debut theorems. Completing adds null information; right-continuity prevents infinitesimal information gaps immediately after t. Many stochastic calculus theorems assume these conditions silently.
22.9 Information leakage errors
Information leakage occurs when a process, strategy, stopping rule, or integrand uses future data while being treated as nonanticipative. For example, defining $H_n = \operatorname{sign}(X_{n+1} - X_n)$ and using it as a trading strategy at time $n$ is illegal because it depends on the future movement it is betting on.
The audit is always measurability against the correct σ-algebra. Adaptedness uses $\mathcal{F}_t$; predictability uses $\mathcal{F}_{n-1}$ in discrete time and often $\mathcal{F}_{t-}$ in continuous time; stopping uses $\{\tau \le t\} \in \mathcal{F}_t$. If the information gate fails, martingale fairness, optional stopping, and stochastic integration conclusions can collapse.
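The cost of a failed information gate is easy to simulate. The sketch below is an added illustration with hypothetical names: it bets on fair $\pm 1$ increments using one strategy that peeks at the increment it multiplies and one that uses only earlier increments. The indexing is an assumption of the sketch: `strategy(k, increments)` returns the multiplier applied to `increments[k]`.

```python
import random

def gains(increments, strategy):
    """Total gain sum_k H_k * dX_k, where strategy(k, increments) returns H_k."""
    return sum(strategy(k, increments) * dx for k, dx in enumerate(increments))

def leaky(k, increments):
    # Illegal: looks at the very increment it is about to bet on.
    return 1 if increments[k] > 0 else -1

def predictable(k, increments):
    # Legal: uses only increments strictly before step k (bets on the last direction).
    if k == 0:
        return 1
    return 1 if increments[k - 1] > 0 else -1

rng = random.Random(0)          # fixed seed so the run is reproducible
n_steps, n_paths = 200, 2000
leaky_avg = pred_avg = 0.0
for _ in range(n_paths):
    dX = [rng.choice((-1, 1)) for _ in range(n_steps)]
    leaky_avg += gains(dX, leaky) / n_paths
    pred_avg += gains(dX, predictable) / n_paths
print(leaky_avg, pred_avg)
```

The leaky strategy wins every single bet, so its gain equals the number of steps on every path; the predictable strategy averages out near zero, as a martingale transform should.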
Chapter 23 — Martingales
23.1 Martingales, submartingales, supermartingales
A process $(M_t, \mathcal{F}_t)$ is a martingale if $M_t$ is adapted, integrable, and
$$E[M_t \mid \mathcal{F}_s] = M_s, \qquad s \le t.$$
It is a submartingale if
$$E[M_t \mid \mathcal{F}_s] \ge M_s,$$
and a supermartingale if the inequality is reversed.
A martingale is a fair process relative to the information filtration. It does not mean the path is constant or predictable. It means conditional future expectation equals present value. Submartingales have nonnegative conditional drift; supermartingales have nonpositive conditional drift.
23.2 Examples from sums, games, likelihood ratios
If $X_i$ are integrable martingale differences with
$$E[X_i \mid \mathcal{F}_{i-1}] = 0,$$
then
$$M_n = \sum_{i=1}^{n} X_i$$
is a martingale. Fair gambling capital under fair bets is the canonical example. If the bets have nonpositive expected gain, the capital is a supermartingale.
Likelihood ratios also produce martingales. If $P$ and $Q$ are probability measures and $\mathcal{F}_t$ is increasing, the density process
$$L_t = \frac{dQ|_{\mathcal{F}_t}}{dP|_{\mathcal{F}_t}}$$
is a $P$-martingale under suitable absolute continuity ($Q \ll P$ on each $\mathcal{F}_t$). This connects martingales to change of measure, statistics, and filtering.
23.3 Doob decomposition
In discrete time, any integrable adapted submartingale $X_n$ admits a decomposition
$$X_n = M_n + A_n,$$
where $M_n$ is a martingale and $A_n$ is a predictable increasing process with $A_0 = 0$. For a general adapted integrable process, the same construction gives a martingale plus a predictable process that need not be increasing; its increments are $A_n - A_{n-1} = E[X_n - X_{n-1} \mid \mathcal{F}_{n-1}]$.
The decomposition separates fair fluctuation from systematic drift. For a submartingale, the drift part is increasing. This is the prototype for compensators in continuous time, where counting processes are decomposed into martingale plus predictable intensity.
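A small exact computation illustrates the split into drift and fair fluctuation. The sketch below is an added example, not from the original: it takes the submartingale $X_n = S_n^2$ for a fair $\pm 1$ walk $S_n$, computes the conditional drift of each step by enumerating every history, and checks that the remaining part has mean zero. The function name is hypothetical.

```python
from itertools import product

def compensator_increment(prefix):
    """E[S_k^2 - S_{k-1}^2 | first k-1 steps = prefix] for a fair +/-1 walk."""
    s = sum(prefix)
    # Average the change in S^2 over the two equally likely next steps.
    return sum((s + step) ** 2 - s ** 2 for step in (-1, 1)) / 2

n = 6
# Conditional drift of X = S^2 after every possible history of length 0..n-1.
drifts = {prefix: compensator_increment(prefix)
          for k in range(n) for prefix in product((-1, 1), repeat=k)}

# Every drift equals 1, so A_n = n and M_n = S_n^2 - n; check E[M_n] = 0 exactly.
mean_M_n = sum(sum(path) ** 2 - n for path in product((-1, 1), repeat=n)) / 2 ** n
print(set(drifts.values()), mean_M_n)
```

Here the predictable part is the deterministic clock $A_n = n$, the discrete counterpart of the Brownian identity that $B_t^2 - t$ is a martingale.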
23.4 Martingale transforms
In discrete time, if $M_n$ is a martingale with increments $\Delta M_n = M_n - M_{n-1}$, and $H_n$ is predictable, then
$$(H \cdot M)_n = \sum_{k=1}^{n} H_k \, \Delta M_k$$
is a martingale under integrability conditions. This is the martingale transform.
The predictability of $H_k$ is essential. It means the multiplier is chosen before observing $\Delta M_k$. Martingale transforms model fair betting strategies and form the discrete ancestor of stochastic integrals. If $H_k$ can depend on $\Delta M_k$ or on later increments, the martingale property can fail.
23.5 Optional stopping theorem
The optional stopping theorem gives conditions under which
$$E[M_\tau] = E[M_0]$$
for a martingale $M$ and stopping time $\tau$. Sufficient conditions include a bounded $\tau$, a uniformly integrable stopped martingale, or $E[\tau] < \infty$ together with bounded increments.
The theorem is not the false slogan “you cannot beat a martingale by stopping.” Without hypotheses, optional stopping can fail. Stopping times with infinite expectation, unbounded gains, or non-uniform integrability can create invalid expectations. The theorem is a gated result, not a universal gambling principle.
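The bounded-stopping-time case can be verified in exact arithmetic. The sketch below is an added illustration with hypothetical names: a fair $\pm 1$ walk from $0$ is stopped at the first visit to $-a$ or $b$, or at a fixed horizon, whichever comes first, and the law of the stopped value is propagated with rational numbers.

```python
from fractions import Fraction

def stopped_walk_law(a, b, horizon):
    """Exact law of S at time min(tau, horizon), tau = first hit of -a or b."""
    law = {0: Fraction(1)}
    half = Fraction(1, 2)
    for _ in range(horizon):
        new = {}
        for pos, prob in law.items():
            if pos in (-a, b):
                # Already stopped: the mass stays where it is.
                new[pos] = new.get(pos, 0) + prob
            else:
                for step in (-1, 1):
                    new[pos + step] = new.get(pos + step, 0) + prob * half
        law = new
    return law

a, b = 3, 5
law = stopped_walk_law(a, b, horizon=400)
mean_stopped = sum(pos * prob for pos, prob in law.items())
prob_hit_b = law.get(b, Fraction(0))
print(mean_stopped, float(prob_hit_b), a / (a + b))
```

Because the stopping time is bounded by the horizon, the stopped mean is exactly zero. Letting the horizon grow recovers the classical hitting probability $a/(a+b)$, which is the optional stopping identity $0 = b\,p - a\,(1 - p)$ solved for $p$.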
23.6 Optional sampling theorem
Optional sampling generalizes optional stopping to two stopping times $\sigma \le \tau$:
$$E[M_\tau \mid \mathcal{F}_\sigma] = M_\sigma$$
under suitable hypotheses. For submartingales,
$$E[X_\tau \mid \mathcal{F}_\sigma] \ge X_\sigma.$$
This theorem says martingale fairness persists between random observation times, provided those times are legitimate stopping times and integrability conditions hold. It is central for hitting probabilities, stopped processes, stochastic calculus, and Markov process arguments.
23.7 Maximal inequalities
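Doob’s maximal inequality controls the running maximum of a submartingale. For a nonnegative submartingale $X_k$ and $\lambda > 0$,
$$\lambda \, P\!\left(\max_{k \le n} X_k \ge \lambda\right) \le E[X_n].$$
For martingales, applying this to $|M_k|$ or $|M_k|^p$ with $p \ge 1$ gives bounds on the running maximum of the path.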
Maximal inequalities convert terminal integrability into pathwise control over all earlier times. This is the martingale analogue of Kolmogorov’s inequality. It is indispensable for convergence theorems, stopping arguments, and tightness of stochastic processes.
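The inequality can be checked by brute force on a small example. The sketch below is an added illustration: it assumes a fair $\pm 1$ walk $S_k$, uses the nonnegative submartingale $|S_k|$, and enumerates all $2^n$ paths. The function name is hypothetical.

```python
from itertools import accumulate, product

def maximal_inequality_sides(n, lam):
    """Return (lam * P(max_{k<=n} |S_k| >= lam), E|S_n|) for a fair +/-1 walk."""
    hits = 0
    abs_end_total = 0
    for steps in product((-1, 1), repeat=n):
        partial_sums = list(accumulate(steps))
        if max(abs(s) for s in partial_sums) >= lam:
            hits += 1
        abs_end_total += abs(partial_sums[-1])
    total = 2 ** n
    return lam * hits / total, abs_end_total / total

# Left and right sides of Doob's inequality for several thresholds.
results = {lam: maximal_inequality_sides(12, lam) for lam in (2, 4, 6)}
for lam, (lhs, rhs) in results.items():
    print(lam, lhs, rhs)
```

The right side depends only on the endpoint, yet it dominates a statement about the whole path, which is exactly the point of the inequality.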
23.8 Doob’s Lᵖ inequalities
For $p > 1$ and a martingale $M_k$,
$$\left\| \max_{k \le n} |M_k| \right\|_p \le \frac{p}{p-1} \, \|M_n\|_p.$$
Equivalently,
$$E\!\left[\max_{k \le n} |M_k|^p\right] \le \left(\frac{p}{p-1}\right)^{p} E|M_n|^p.$$
This inequality gives $L^p$ control of the entire path from $L^p$ control of the endpoint. It is one of the main reasons martingales are analytically tractable. The condition $p > 1$ matters; $p = 1$ has weaker forms and requires different handling.
23.9 Upcrossing inequality
For a submartingale $X_n$, the upcrossing inequality bounds the number of completed upcrossings of an interval $[a, b]$. If $U_n[a, b]$ is the number of upcrossings by time $n$, then
$$(b - a)\, E[U_n[a, b]] \le E[(X_n - a)^+] - E[(X_0 - a)^+].$$
Upcrossings measure oscillation. If infinitely many upcrossings occur between some pair of rational levels $a < b$, the sequence cannot converge. The inequality is therefore a convergence engine: boundedness in $L^1$ controls oscillation, forcing almost sure convergence of submartingales under suitable assumptions.
23.10 Martingale convergence theorem
If $M_n$ is a martingale or supermartingale bounded in $L^1$, for instance a nonnegative supermartingale or an $L^p$-bounded martingale for $p > 1$, then $M_n$ converges almost surely to an integrable limit $M_\infty$. If the martingale is uniformly integrable,
$$M_n = E[M_\infty \mid \mathcal{F}_n]$$
and convergence also holds in $L^1$.
The theorem is a central pathwise compactness result. Martingales do not have deterministic monotonicity, but their conditional fairness plus integrability prevents endless profitable oscillation. Upcrossing control is the core mechanism.
23.11 Uniform integrability of martingales
A martingale $(M_n)$ is uniformly integrable if
$$\lim_{K \to \infty} \sup_{n} E\big[|M_n| \, \mathbf{1}_{\{|M_n| > K\}}\big] = 0.$$
Uniform integrability is exactly the condition that allows $L^1$ terminal representation:
$$M_n = E[M_\infty \mid \mathcal{F}_n].$$
Boundedness in $L^p$ for $p > 1$ implies uniform integrability. Without uniform integrability, a martingale can converge almost surely but lose mass in expectation. This is the martingale version of the general expectation convergence problem.
23.12 Reverse martingales
A reverse martingale is adapted to a decreasing sequence of σ-algebras $\mathcal{G}_n$, with
$$E[X_n \mid \mathcal{G}_{n+1}] = X_{n+1}$$
when $\mathcal{G}_{n+1} \subseteq \mathcal{G}_n$. A standard example is
$$X_n = E[X \mid \mathcal{G}_n].$$
Reverse martingales converge almost surely and in $L^1$ under integrability. They are used in exchangeability, de Finetti theory, ergodic theory, and tail σ-algebra arguments. They describe what remains when information is progressively forgotten rather than accumulated.
23.13 Martingale difference sequences
A martingale difference sequence satisfies
$$E[D_n \mid \mathcal{F}_{n-1}] = 0.$$
Then
$$M_n = \sum_{k=1}^{n} D_k$$
is a martingale. Martingale differences generalize independent centered variables; they allow dependence while preserving zero conditional drift.
Many limit theorems and concentration inequalities for independent sums extend to martingale differences with conditional variance and boundedness assumptions. The adapted conditional-mean-zero property is the replacement certificate for independence.
Chapter 24 — Martingale Limit and Concentration Tools
24.1 Azuma–Hoeffding inequality
If $M_n$ is a martingale with bounded increments
$$|M_k - M_{k-1}| \le c_k,$$
then
$$P(M_n - M_0 \ge t) \le \exp\!\left(-\frac{t^2}{2\sum_{k=1}^{n} c_k^2}\right).$$
A two-sided version gives
$$P(|M_n - M_0| \ge t) \le 2\exp\!\left(-\frac{t^2}{2\sum_{k=1}^{n} c_k^2}\right).$$
Azuma–Hoeffding is a concentration theorem for bounded martingale differences. It converts conditional fairness plus bounded step size into subgaussian tails. It is especially useful when independence is absent but exposure martingales can be constructed.
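To see how the bound compares with a true tail, the sketch below (an added illustration with hypothetical function names) takes the simplest martingale with $c_k = 1$, a fair $\pm 1$ walk, where the exact tail is a binomial sum.

```python
import math

def azuma_bound(t, c):
    """One-sided Azuma-Hoeffding bound exp(-t^2 / (2 * sum c_k^2))."""
    return math.exp(-t ** 2 / (2 * sum(ck ** 2 for ck in c)))

def exact_walk_tail(n, t):
    """P(S_n >= t) for a fair +/-1 walk, using S_n = 2 * (number of +1 steps) - n."""
    k_min = math.ceil((n + t) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

n = 100
table = [(t, exact_walk_tail(n, t), azuma_bound(t, [1] * n)) for t in (10, 20, 30, 40)]
for t, exact, bound in table:
    print(t, f"{exact:.3e}", f"{bound:.3e}")
```

The bound is valid at every $t$ but loose by a sizeable factor, which is the usual trade: it needs only bounded increments, not the exact law.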
24.2 Freedman inequality
Freedman’s inequality refines Azuma by using conditional variance. If $M_n$ is a martingale with $M_0 = 0$, increments bounded above by $b$, and predictable quadratic variation
$$V_n = \sum_{k=1}^{n} E[(\Delta M_k)^2 \mid \mathcal{F}_{k-1}],$$
then
$$P(M_n \ge t,\; V_n \le v) \le \exp\!\left(-\frac{t^2}{2(v + bt/3)}\right).$$
This inequality is stronger when actual variance is much smaller than the worst-case squared increment bound. It is the martingale analogue of Bernstein concentration. It separates two debts: fluctuation variance $v$ and maximum jump $b$.
24.3 Burkholder–Davis–Gundy inequalities
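The BDG inequalities relate moments of a martingale’s maximum to moments of its quadratic variation. In continuous time, for $p > 0$,
$$E\!\left[\sup_{t \le T} |M_t|^p\right] \asymp_p E\!\left[[M]_T^{p/2}\right]$$
for suitable martingales, for instance continuous local martingales started at $0$. The symbol $\asymp_p$ means two-sided bounds with constants depending only on $p$, and $[M]_T$ is quadratic variation.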
BDG is a deep path-control theorem. It says the size of a martingale path is governed by accumulated quadratic variation. This is foundational for stochastic integration, SDE estimates, tightness, and regularity of martingale-driven processes.
24.4 Exponential martingales
Exponential martingales are built by exponentiating a martingale and subtracting compensating drift. For Brownian motion $B_t$,
$$\exp\!\left(\theta B_t - \tfrac{1}{2}\theta^2 t\right)$$
is a martingale. In discrete time, exponential supermartingales underlie Chernoff, Hoeffding, Bernstein, and Freedman bounds.
The compensation term is the log-moment correction. Exponential martingales turn tail events into optional stopping or maximal inequality problems. They are also the mechanism behind Girsanov change of measure: exponential martingales become Radon–Nikodym densities.
24.5 Stopping and localization
Localization handles processes that are well-behaved only up to random times. A process $M_t$ is a local martingale if there exists an increasing sequence of stopping times $\tau_n \uparrow \infty$ such that
$$M_{t \wedge \tau_n}$$
is a martingale for each $n$.
Localization permits stochastic calculus for unbounded or explosive processes by proving statements on bounded stopped intervals and then passing to the limit. The liftback requires care: local martingales need not be true martingales, and expectation identities may fail without integrability or uniform integrability.
24.6 Quadratic variation
Quadratic variation measures accumulated squared increments. For a continuous martingale $M$,
$$[M]_t$$
is the continuous increasing process with $[M]_0 = 0$ such that
$$M_t^2 - [M]_t$$
is a martingale (in general, a local martingale). For Brownian motion,
$$[B]_t = t.$$
Quadratic variation is the second-order clock of martingale fluctuation. In stochastic calculus, it is why Itô’s formula has a second derivative term. Ordinary differentiable paths have zero quadratic variation; Brownian paths have finite nonzero quadratic variation. This is the analytic signature of stochastic motion.
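The contrast between smooth and Brownian paths shows up already on a fine grid. The sketch below is an added illustration: it simulates a Brownian path on $[0, 1]$ from Gaussian increments (grid size and seed are arbitrary choices) and compares its sum of squared increments with that of a sine curve on the same grid.

```python
import math
import random

def quadratic_variation(path):
    """Sum of squared increments of a path sampled on a grid."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

rng = random.Random(1)              # fixed seed for reproducibility
T, n = 1.0, 20000
dt = T / n

# Brownian path: independent N(0, dt) increments.
brownian = [0.0]
for _ in range(n):
    brownian.append(brownian[-1] + rng.gauss(0.0, math.sqrt(dt)))

# Smooth path sampled on the same grid.
smooth = [math.sin(2 * math.pi * k * dt) for k in range(n + 1)]

qv_brownian = quadratic_variation(brownian)
qv_smooth = quadratic_variation(smooth)
print(qv_brownian, qv_smooth)
```

On refinement the Brownian sum stabilizes near $T$, while the smooth sum shrinks in proportion to the mesh size.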
24.7 Martingale CLT
A martingale central limit theorem gives Gaussian limits for martingale difference arrays. If $D_{n,k}$ are martingale differences and the conditional variance sums converge,
$$\sum_{k} E[D_{n,k}^2 \mid \mathcal{F}_{n,k-1}] \to \sigma^2$$
in probability, and a conditional Lindeberg condition holds, then
$$\sum_{k} D_{n,k} \Rightarrow N(0, \sigma^2).$$
The theorem replaces independence with conditional mean-zero plus conditional variance stabilization. It is crucial in dependent asymptotics, stochastic approximation, statistics, Markov chains, and random environments. The conditional variance process is the carrier of limiting Gaussian scale.
24.8 Concentration via bounded differences
If a function $f(X_1, \dots, X_n)$ changes by at most $c_i$ when only coordinate $i$ is changed, and the $X_i$ are independent, define the Doob martingale
$$M_i = E[f(X_1, \dots, X_n) \mid X_1, \dots, X_i].$$
Its increments are bounded by $c_i$, so Azuma–Hoeffding yields
$$P(f - Ef \ge t) \le \exp\!\left(-\frac{2t^2}{\sum_i c_i^2}\right)$$
up to convention-dependent constants.
This is McDiarmid’s bounded differences inequality. It converts low sensitivity to individual inputs into concentration. The martingale is an exposure process: reveal inputs one at a time and track the conditional expectation.
24.9 Algorithmic and combinatorial applications
Martingale concentration applies to random graphs, randomized algorithms, hashing, load balancing, random permutations, and combinatorial optimization. The typical method is to expose random choices sequentially, define a Doob martingale, bound the effect of each exposure, and apply Azuma, Freedman, or a refined inequality.
The key is not independence alone but controlled sensitivity under information revelation. If a single input can change the output drastically, bounded differences fail. If conditional variances are small, Freedman can improve bounds. The proof engine is filtration design plus increment audit.
Chapter 25 — Random Sequences and Processes
25.1 Process as random element in path space
A stochastic process $(X_t)_{t \in T}$ can be viewed as a random element
$$X : \Omega \to S^T, \qquad X(\omega) = (X_t(\omega))_{t \in T}.$$
The path space $S^T$ carries a σ-algebra, often generated by coordinate maps. Under stronger regularity, the law may live on $C[0, T]$, $D[0, T]$, or another structured path space.
This viewpoint treats the whole trajectory as one random object. Finite-dimensional distributions become projections of the path law. Path properties such as continuity, bounded variation, hitting times, and oscillation become events in path space, requiring measurability and often regularity certificates.
25.2 Finite-dimensional distributions
The finite-dimensional distributions of a process are the laws of
$$(X_{t_1}, \dots, X_{t_k})$$
for finite choices $t_1, \dots, t_k$. These distributions describe all finite coordinate observations. They are necessary data for the process law.
For many process constructions, one first specifies finite-dimensional distributions and then checks consistency. But finite-dimensional distributions alone do not determine path regularity unless the path-space carrier is fixed. A process with continuous paths and a process with wild paths may share finite-dimensional distributions unless modifications are controlled.
25.3 Consistency conditions
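Finite-dimensional distributions $\mu_{t_1, \dots, t_k}$ must be compatible under permutation and marginalization. If one forgets a coordinate, the larger distribution must project to the smaller one:
$$(\pi)_* \mu_{t_1, \dots, t_k} = \mu_{s_1, \dots, s_m},$$
where $\pi$ is the coordinate projection onto the retained indices $\{s_1, \dots, s_m\} \subseteq \{t_1, \dots, t_k\}$. They must also be consistent under reordering of the indices.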
Consistency is the finite-data certificate for a global process law. Without it, no process can have those finite-dimensional distributions. With it, extension theorems can build a measure on a product path space under suitable assumptions.
25.4 Kolmogorov extension theorem
Kolmogorov’s extension theorem says that a consistent family of finite-dimensional distributions on suitable state spaces defines a probability measure on the infinite product space $S^T$. The coordinate process on that space then has the prescribed finite-dimensional laws.
The theorem provides existence of a process as a product-space random element. It does not guarantee continuous or measurable sample paths beyond the product σ-algebra. Regularity is a separate payload. This distinction is essential for Brownian motion: extension gives a process with Gaussian finite-dimensional distributions; continuity requires additional work.
25.5 Sample-path regularity
Sample-path regularity concerns properties of t↦Xt(ω): continuity, càdlàg behavior, bounded variation, Hölder regularity, differentiability, or jump structure. These are not determined merely by one-time marginal laws.
Kolmogorov’s continuity criterion is a standard regularity tool. If
$$E|X_t - X_s|^\alpha \le C|t - s|^{1 + \beta}$$
for some $\alpha, \beta > 0$, then the process has a continuous modification whose paths are Hölder continuous of every order below $\beta / \alpha$. Moment bounds on increments become path regularity certificates.
25.6 Separability
Separability reduces uncountable-time behavior to countable dense-time behavior. A process is separable, roughly, if its path properties can be determined by values on a countable dense subset, up to null sets. This prevents uncountable measurability traps.
In continuous-time probability, events like
$$\sup_{t \in [0, T]} X_t > a$$
may not be measurable without path regularity or separability. If paths are continuous or càdlàg, the supremum over an interval can be reduced to a supremum over rational times. This is why path-space regularity and separability are not optional technicalities.
25.7 Modification and indistinguishability
Two processes $X_t$ and $Y_t$ are modifications of each other if for every fixed $t$,
$$P(X_t = Y_t) = 1.$$
They are indistinguishable if
$$P(X_t = Y_t \text{ for all } t) = 1.$$
Indistinguishability is stronger, especially when $T$ is uncountable.
A fixed-time null set may depend on $t$, and uncountably many null sets cannot be unioned safely. If both processes have continuous (or right-continuous) paths, modification implies indistinguishability, because equality on a countable dense set extends to all times by continuity. Without regularity, the implication can fail.
25.8 Stationarity
A process is stationary if its finite-dimensional distributions are invariant under time shifts:
$$(X_{t_1 + h}, \dots, X_{t_k + h}) \overset{d}{=} (X_{t_1}, \dots, X_{t_k})$$
whenever the shifted indices are valid. Strict stationarity is full finite-dimensional invariance. Weak stationarity usually means constant mean and covariance depending only on time lag.
Stationarity is symmetry in time. It does not imply independence or ergodicity. A process may have the same distribution at all times while retaining long-range dependence or latent random environment. Stationarity supplies invariant law; ergodicity determines whether time averages collapse to constants.
25.9 Ergodicity
A stationary process is ergodic if shift-invariant events have probability 0 or 1. Equivalently, time averages converge to deterministic expectations rather than random invariant components. For an ergodic stationary sequence,
$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \to E[f(X_1)]$$
almost surely under integrability.
Without ergodicity, the limit is the conditional expectation given the invariant σ-algebra $\mathcal{I}$:
$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \to E[f(X_1) \mid \mathcal{I}].$$
Thus ergodicity is the gate that turns ensemble averages into time averages.
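A random invariant component is easy to produce. The sketch below is an added illustration with hypothetical names: an iid fair coin sequence (ergodic) is compared with a stationary but non-ergodic sequence in which a hidden bias of $0.2$ or $0.8$ is drawn once and then used forever.

```python
import random

def time_average(bias, n, rng):
    """Fraction of successes among n coin flips with the given success probability."""
    return sum(rng.random() < bias for _ in range(n)) / n

rng = random.Random(2)      # fixed seed for reproducibility
n = 20000

# Ergodic case: every run uses the same fair coin.
ergodic_runs = [time_average(0.5, n, rng) for _ in range(5)]

# Non-ergodic case: each run first draws its hidden bias, then flips that coin.
mixture_runs = [time_average(rng.choice((0.2, 0.8)), n, rng) for _ in range(5)]

print(ergodic_runs)
print(mixture_runs)
```

Both sequences have one-time marginal mean $0.5$, but only the first has time averages that settle there; the second converges to the hidden bias, which plays the role of $E[f(X_1) \mid \mathcal{I}]$.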
25.10 Mixing
Mixing is asymptotic independence between distant time regions. A strong mixing condition may take the form
$$\alpha(n) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{n}^{\infty}} |P(A \cap B) - P(A)P(B)| \to 0.$$
Different mixing notions measure different strengths of dependence decay.
Mixing usually implies ergodicity, but it is stronger. It supports CLTs, concentration, empirical process results, and statistical consistency for dependent data. The rate of mixing matters: summable mixing coefficients give stronger limit theorems than mere convergence to zero.
25.11 Process-level carrier debt
A stochastic process claim must declare its path carrier, filtration, finite-dimensional laws, regularity, modification class, and convergence topology. A statement about Xt at each fixed t is weaker than a statement about the whole path. A statement about finite-dimensional convergence is weaker than weak convergence in path space.
Common process-level errors include treating modifications as indistinguishable, assuming sample continuity from Gaussian marginals, using stopping times without a filtration, or applying continuous mapping arguments in the wrong path topology. Process probability is not coordinate probability repeated; it is probability on trajectories.
Chapter 26 — Markov Chains
26.1 Markov property
A discrete-time process $X_n$ is Markov if, conditional on the present, the future is independent of the past:
$$P(X_{n+1} \in A \mid X_0, \dots, X_n) = P(X_{n+1} \in A \mid X_n).$$
For time-homogeneous chains, this conditional law depends only on the value of $X_n$, not on $n$.
The Markov property is a conditional independence statement. It does not say the variables are independent; dependence passes through the current state. The present state is a sufficient statistic for future prediction. This is the carrier principle behind transition kernels and semigroups.
26.2 Transition kernels
A transition kernel $P(x, A)$ gives the probability of moving from state $x$ to a measurable set $A$. For countable spaces, this is a matrix
$$P_{ij} = P(X_{n+1} = j \mid X_n = i).$$
The $n$-step transition kernel is obtained by composition:
$$P^n(x, A) = P(X_n \in A \mid X_0 = x).$$
Kernels are probability-valued functions of the current state. They define the chain law once an initial distribution $\mu$ is specified. The finite-dimensional law is
$$\mu(dx_0)\, P(x_0, dx_1) \cdots P(x_{n-1}, dx_n).$$
This factorization is the Markov certificate.
26.3 Finite-state chains
For a finite state space, the transition kernel is a stochastic matrix $P$ with rows summing to one. If the distribution at time $n$ is the row vector $\mu_n$, then
$$\mu_{n+1} = \mu_n P, \qquad \mu_n = \mu_0 P^n.$$
Finite chains reduce many questions to matrix analysis. Stationary distributions solve
$$\pi = \pi P.$$
Long-term behavior depends on irreducibility, periodicity, and spectral structure. Finite-state chains are the cleanest setting for mixing time and convergence to equilibrium.
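The recursion $\mu_{n+1} = \mu_n P$ is short to implement. The sketch below is an added illustration: the three-state matrix is a hypothetical example chosen to be irreducible and aperiodic, so iterating from a point mass converges to the stationary distribution.

```python
def step(mu, P):
    """One step mu -> mu P for a row vector mu and a row-stochastic matrix P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

# Illustrative lazy birth-death chain on {0, 1, 2}; every row sums to one.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

mu = [1.0, 0.0, 0.0]        # start deterministically in state 0
for _ in range(200):
    mu = step(mu, P)
print(mu)
```

For this matrix the limit is $(0.25, 0.5, 0.25)$, which can be confirmed by substituting into $\pi = \pi P$. For a periodic chain the same iteration would oscillate instead of converging, even though a stationary distribution still exists.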
26.4 Countable-state chains
Countable-state chains require more classification than finite chains because recurrence, transience, and invariant measures become subtle. Transition probabilities still form a stochastic matrix, but stationary distributions may fail to exist or may not be normalizable.
A simple random walk on Z is recurrent but has no probability stationary distribution. A biased random walk on Z is transient. Countable chains force separation between invariant measures, stationary probabilities, recurrence classes, and positive recurrence.
26.5 Communicating classes
State $i$ reaches state $j$, written $i\to j$, if
$$P^n(i,j) > 0$$
for some $n\ge 0$. States communicate if $i\to j$ and $j\to i$. Communication is an equivalence relation (reflexivity uses $n=0$); equivalence classes are communicating classes.
A class is closed if the chain cannot leave it once entered. Irreducibility means all states communicate. Long-run behavior is analyzed class by class. Transient states may lead into closed recurrent classes; closed classes trap the chain.
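For a finite chain, reachability depends only on which entries of $P$ are positive, so communicating classes can be found by graph search. The sketch below is one straightforward way to do it; the four-state matrix and the function names are illustrative assumptions.

```python
def reachable(P, i):
    """Set of states j with P^n(i, j) > 0 for some n >= 0 (search over positive entries)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def communicating_classes(P):
    """Partition the states into classes: i and j share a class iff i -> j and j -> i."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = sorted(j for j in reach[i] if i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return classes


def is_closed(P, cls):
    """A class is closed if no transition leaves it."""
    members = set(cls)
    return all(P[i][j] == 0 for i in cls for j in range(len(P)) if j not in members)


# Hypothetical chain: states 0 and 1 leak into the closed cycle {2, 3}.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
classes = communicating_classes(P)
closed = [is_closed(P, c) for c in classes]
print(classes, closed)
```

Here the class $\{0,1\}$ is not closed, so its states are transient, while $\{2,3\}$ is closed and traps the chain.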
26.6 Recurrence and transience
A state $i$ is recurrent if, starting from $i$, the chain returns to $i$ with probability one. It is transient if the return probability is less than one. Equivalently, recurrence can be characterized by
$$\sum_{n=0}^{\infty} P^n(i,i) = \infty,$$
while transience corresponds to finiteness of this sum.
In irreducible chains, recurrence or transience is a class property. Positive recurrence means expected return time is finite; null recurrence means return occurs almost surely but with infinite expected return time. Stationary probability distributions exist for irreducible positive recurrent chains.
26.7 Stationary distributions
A stationary distribution $\pi$ satisfies
$$\pi P = \pi.$$
If $X_0\sim\pi$, then $X_n\sim\pi$ for all $n$. Stationarity is distributional invariance under the transition kernel.
For finite irreducible chains, a stationary distribution exists and is unique. For countable chains, existence of a probability stationary distribution corresponds to positive recurrence under irreducibility. Stationarity does not automatically mean convergence to stationarity; periodicity can obstruct convergence.
26.8 Reversibility
A chain is reversible with respect to a probability distribution $\pi$ if
$$\pi_i P_{ij} = \pi_j P_{ji}$$
for all states $i,j$. This detailed balance condition says the stationary flow from $i$ to $j$ equals the flow from $j$ to $i$.
Detailed balance implies that $\pi$ is stationary, by summing over $i$ and using $\sum_i P_{ji}=1$:
$$\sum_i \pi_i P_{ij} = \sum_i \pi_j P_{ji} = \pi_j.$$
Reversible chains are analytically tractable because the transition operator is self-adjoint in $L^2(\pi)$. This connects Markov chains to spectral theory.
26.9 Detailed balance
Detailed balance is the local equation
$$\pi(dx)\,P(x,dy) = \pi(dy)\,P(y,dx)$$
in general-state notation, read as an equality of measures on pairs $(x,y)$. It is stronger than stationarity, which only requires global balance:
$$\int \pi(dx)\,P(x,A) = \pi(A).$$
Detailed balance is widely used to construct Markov chains with a desired stationary distribution. In MCMC, Metropolis–Hastings transition rules are engineered to satisfy detailed balance. The cost is that reversible chains may be slower than nonreversible alternatives; reversibility is a certificate, not always an optimal design.
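To make the construction concrete, here is a small sketch of a Metropolis chain on a finite cycle, with the detailed balance and global balance equations checked entry by entry. The target weights, the nearest-neighbour proposal, and the name `metropolis_matrix` are assumptions chosen for illustration.

```python
def metropolis_matrix(weights):
    """Metropolis transition matrix on the cycle {0, ..., n-1} (n >= 3), targeting pi ∝ weights.

    Proposal: step to the left or right neighbour with probability 1/2 each.
    Acceptance: min(1, pi_j / pi_i); rejected proposals stay at i.
    """
    n = len(weights)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            P[i][j] += 0.5 * min(1.0, weights[j] / weights[i])
        P[i][i] = 1.0 - sum(P[i])  # holding probability from rejections
    return P


weights = [1.0, 2.0, 3.0, 4.0]  # hypothetical unnormalized target
Z = sum(weights)
pi = [w / Z for w in weights]
P = metropolis_matrix(weights)
n = len(pi)

# Local (detailed) balance: pi_i P_ij = pi_j P_ji for every pair.
detailed_balance_gap = max(
    abs(pi[i] * P[i][j] - pi[j] * P[j][i]) for i in range(n) for j in range(n)
)
# Global balance: sum_i pi_i P_ij = pi_j for every j.
stationarity_gap = max(
    abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) for j in range(n)
)
print(detailed_balance_gap, stationarity_gap)
```

Both gaps vanish up to rounding because $\pi_iP_{ij} = \tfrac12\min(w_i,w_j)/Z$ is symmetric in $i$ and $j$; note that only ratios of weights are needed, never $Z$ itself.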
26.10 Hitting times
For a set $A$, the hitting time is
$$\tau_A = \inf\{n\ge 0 : X_n\in A\}.$$
Hitting probabilities solve harmonic equations. For disjoint sets $A,B$ and $h(i) = P_i(\tau_A<\tau_B)$,
$$h(i) = \sum_j P_{ij}\,h(j)$$
on states outside $A\cup B$, with boundary values $h=1$ on $A$, $h=0$ on $B$.
Expected hitting times solve Poisson equations. If $g(i)=E_i[\tau_A]$, then
$$g(i) = 1 + \sum_j P_{ij}\,g(j)$$
outside $A$, with $g=0$ on $A$. Hitting times connect Markov chains to potential theory.
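The harmonic and Poisson equations can be solved numerically by repeatedly enforcing them at interior states. The sketch below does this for the symmetric gambler's ruin walk on $\{0,\dots,N\}$, an example chosen here for illustration: `h` is the probability of reaching $N$ before $0$, and `g` is the expected time to hit the set $\{0,N\}$. The sweep count is an assumption that is ample for $N=10$.

```python
def solve_gamblers_ruin(N, sweeps=2000):
    """Gauss-Seidel sweeps for the symmetric walk on {0, ..., N}.

    h(i) = P_i(hit N before 0): h = (h(i-1) + h(i+1)) / 2 inside, h(0) = 0, h(N) = 1.
    g(i) = E_i[time to hit {0, N}]: g = 1 + (g(i-1) + g(i+1)) / 2 inside, g = 0 on {0, N}.
    """
    h = [0.0] * (N + 1)
    g = [0.0] * (N + 1)
    h[N] = 1.0
    for _ in range(sweeps):
        for i in range(1, N):
            h[i] = 0.5 * h[i - 1] + 0.5 * h[i + 1]
            g[i] = 1.0 + 0.5 * g[i - 1] + 0.5 * g[i + 1]
    return h, g


N = 10
h, g = solve_gamblers_ruin(N)
print(h[3], g[3])  # compare with the closed forms i/N and i(N - i)
```

The closed-form solutions $h(i)=i/N$ and $g(i)=i(N-i)$ satisfy the same equations and boundary values, which gives an independent check on the iteration.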
26.11 Absorbing chains
An absorbing state a satisfies Paa=1. Once entered, it cannot be left. More generally, a closed class acts as an absorbing region. Absorbing chains are analyzed by separating transient and absorbing states.
For finite absorbing chains with transient submatrix $Q$, the fundamental matrix is
$$N = (I-Q)^{-1} = \sum_{n=0}^{\infty} Q^n.$$
The entry $N_{ij}$ gives the expected number of visits to transient state $j$ starting from $i$. Absorption probabilities and expected absorption times follow from $N$.
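A short sketch of how absorption quantities follow from $N$, using the Neumann series directly rather than a matrix inverse. The example is the symmetric walk on $\{0,1,2,3\}$ with absorbing endpoints, so the transient states are $1$ and $2$; the matrix `R` of one-step transient-to-absorbing probabilities and all function names are assumptions introduced for illustration.

```python
def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def fundamental_matrix(Q, terms=200):
    """Approximate N = sum_{n>=0} Q^n by a truncated Neumann series."""
    n = len(Q)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Q^0
    N = [row[:] for row in power]
    for _ in range(terms):
        power = matmul(power, Q)
        N = [[N[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return N


Q = [[0.0, 0.5],   # transient states 1, 2: moves among themselves
     [0.5, 0.0]]
R = [[0.5, 0.0],   # one-step absorption: columns are absorbing states 0 and 3
     [0.0, 0.5]]

N = fundamental_matrix(Q)
expected_steps = [sum(row) for row in N]   # expected time to absorption
absorb_probs = matmul(N, R)                # probability of ending in each absorbing state
print(N, expected_steps, absorb_probs)
```

Solving by hand, $N=\tfrac{1}{3}\begin{pmatrix}4&2\\2&4\end{pmatrix}$, so absorption takes $2$ steps on average from either transient state and state $1$ is absorbed at $0$ with probability $2/3$.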
26.12 Mixing time
For an irreducible aperiodic finite chain with stationary distribution $\pi$, the mixing time measures how long until the law is close to $\pi$. In total variation,
$$t_{\mathrm{mix}}(\varepsilon) = \min\Big\{t : \sup_x \|P^t(x,\cdot)-\pi\|_{TV} \le \varepsilon\Big\}.$$
Mixing time quantifies convergence to equilibrium. It depends on bottlenecks, spectral gap, conductance, coupling, and geometry of the state space. Stationarity alone does not give speed. Mixing analysis is about finite-time convergence, not merely limiting invariance.
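For small chains the definition can be evaluated directly by computing powers of $P$. The sketch below does this by brute force; the two-state chain, its stationary law, and the threshold $\varepsilon=0.25$ are illustrative assumptions.

```python
def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps, by direct matrix powers."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    for t in range(t_max + 1):
        if max(tv_distance(Pt[x], pi) for x in range(n)) <= eps:
            return t
        Pt = matmul(Pt, P)
    raise RuntimeError("chain did not mix within t_max steps")


P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]  # solves pi = pi P for this chain
t_mix = mixing_time(P, pi, eps=0.25)
print(t_mix)
```

For this chain the worst-case distance is $0.8\cdot 0.5^t$ (attained from state $1$), so it first drops below $0.25$ at $t=2$. Direct powers scale poorly with the size of the state space, which is why bounds via coupling or spectral gap matter.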
26.13 Coupling methods
A coupling of two chains constructs $(X_n,Y_n)$ on one probability space with each marginal following the chain, started from $x$ and $y$ respectively. If they meet and stay together after time $T$, then
$$\|P^n(x,\cdot) - P^n(y,\cdot)\|_{TV} \le P(T>n).$$
Coupling bounds mixing by controlling coalescence time.
Good couplings exploit contraction, monotonicity, shared randomness, or geometry. Coupling is not automatic; it is a designed joint carrier. The success of the method depends on how quickly the coupled chains meet under the chosen construction.
26.14 Spectral gap
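For a reversible finite chain, the spectral gap is
$$\gamma = 1-\lambda_2,$$
where $\lambda_2$ is the second-largest eigenvalue of $P$, with the real eigenvalues ordered $1=\lambda_1>\lambda_2\ge\cdots\ge\lambda_n\ge -1$ for an irreducible chain. A positive spectral gap implies exponential convergence to equilibrium provided negative eigenvalues do not dominate (for lazy chains all eigenvalues are nonnegative; in general the rate is set by the absolute gap $1-\max_{i\ge 2}|\lambda_i|$). Roughly,
$$\|P^n(x,\cdot)-\pi\|_{TV}$$
decays at a rate controlled by $(1-\gamma)^n$, with constants depending on $\pi$.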
The spectral gap measures relaxation speed in L2(π). It is connected to Poincaré inequalities, conductance, and geometry. Small spectral gap indicates bottlenecks or slow modes. Spectral methods are powerful but usually require reversibility or operator control.
26.15 Markov chain Monte Carlo preview
MCMC constructs a Markov chain whose stationary distribution is a target law $\pi$. After running the chain, samples are used to estimate
$$\int f\,d\pi.$$
Metropolis–Hastings, Gibbs sampling, and Langevin algorithms are standard examples.
The validity of MCMC requires more than stationarity. One needs convergence to stationarity, adequate mixing, diagnostic control, and error estimates. A chain that has π as invariant law but mixes slowly may be computationally useless. The MCMC audit is invariant distribution plus convergence rate plus sampling error.
Chapter 27 — Markov Processes
27.1 Markov kernels on general spaces
On a measurable state space $(S,\mathcal{S})$, a Markov kernel $P(x,A)$ satisfies: for each $x$, $A\mapsto P(x,A)$ is a probability measure, and for each $A\in\mathcal{S}$, $x\mapsto P(x,A)$ is measurable. It defines transition probabilities from states to measurable sets.
General kernels replace matrices. The Markov property becomes
$$P(X_{t+s}\in A \mid \mathcal{F}_t) = P_s(X_t,A)$$
for a transition family $(P_s)$. The measurable-space structure is essential; without it, the transition probability as a function of state may not be well-defined.
27.2 Semigroups
A Markov semigroup $(P_t)_{t\ge 0}$ acts on functions by
$$P_t f(x) = E_x[f(X_t)].$$
It satisfies
$$P_0 = I, \qquad P_{t+s} = P_t P_s.$$
The second identity is the Chapman–Kolmogorov equation and encodes the Markov property.
Semigroups are operator carriers for Markov processes. Instead of tracking paths directly, one studies how expectations evolve. Semigroup methods connect probability to PDE, functional analysis, spectral theory, and generators.
27.3 Strong Markov property
The strong Markov property says the Markov property holds at stopping times. If $\tau$ is a stopping time, then on $\{\tau<\infty\}$, conditional on $\mathcal{F}_\tau$, the post-$\tau$ process behaves like a fresh process started from $X_\tau$:
$$E[f(X_{\tau+t}) \mid \mathcal{F}_\tau] = P_t f(X_\tau).$$
This is stronger than the deterministic-time Markov property. It is essential for hitting times, excursions, renewal arguments, boundary problems, and potential theory. The stopping-time condition is load-bearing; arbitrary random times do not generally permit Markov restart.
27.4 Feller processes
A Feller semigroup acts on $C_0(S)$, the continuous functions vanishing at infinity, and satisfies
$$P_t : C_0(S)\to C_0(S), \qquad \lim_{t\downarrow 0}\|P_tf-f\|_\infty = 0.$$
A Markov process with such a semigroup is called a Feller process.
Feller structure connects stochastic processes to analytic generators. It gives regular dependence on initial conditions and supports construction through semigroup theory. Many diffusions, Lévy processes, and jump processes are Feller under suitable assumptions.
27.5 Generators
The generator $A$ of a semigroup is
$$Af = \lim_{t\downarrow 0}\frac{P_tf-f}{t}$$
for functions $f$ where the limit exists. It describes infinitesimal evolution. For a diffusion with drift $b$ and diffusion matrix $a$,
$$Af(x) = b(x)\cdot\nabla f(x) + \frac12\operatorname{Tr}\big(a(x)\nabla^2 f(x)\big).$$
The generator is the local analytic fingerprint of a Markov process. It determines PDEs, martingale problems, transition behavior, and invariant measures. The domain of $A$ is part of the data; writing a differential expression without specifying domain can be incomplete.
27.6 Resolvents
The resolvent of a semigroup is
$$R_\lambda f = \int_0^\infty e^{-\lambda t}P_tf\,dt, \qquad \lambda>0.$$
It satisfies
$$R_\lambda = (\lambda I - A)^{-1}$$
formally, when $A$ is the generator.
Resolvents encode potential behavior: expected discounted occupation times. They are useful for constructing processes, solving boundary value problems, and analyzing recurrence. In Markov process theory, semigroup, generator, and resolvent are three equivalent analytic views when the carrier is well-behaved.
27.7 Invariant measures
A probability measure $\pi$ is invariant if
$$\int P_t(x,A)\,\pi(dx) = \pi(A)$$
for all $t$ and measurable $A$. Equivalently,
$$\int P_tf\,d\pi = \int f\,d\pi$$
for all bounded measurable $f$. If $X_0\sim\pi$, then $X_t\sim\pi$ for all $t$.
Invariant measures describe equilibrium laws. Existence, uniqueness, and convergence to equilibrium require additional recurrence, irreducibility, Lyapunov, or compactness conditions. In continuous spaces, invariant measures may exist but convergence may fail or depend strongly on topology.
27.8 Harris recurrence
Harris recurrence is a recurrence notion for general-state Markov processes or chains. Roughly, a process is Harris recurrent if it visits every set of positive measure infinitely often with probability one, relative to an irreducibility measure.
This theory replaces communicating classes from countable chains. It provides the correct framework for ergodic theorems, invariant probability measures, and convergence in total variation on general spaces. Petite sets, minorization, and Lyapunov functions often appear as certificates for Harris recurrence and geometric ergodicity.
27.9 Regeneration
Regeneration occurs when a process probabilistically restarts, forgetting its past. A regeneration time $\tau$ splits the path into independent or conditionally independent cycles. For regenerative processes, long-run averages reduce to renewal-reward ratios:
$$\lim_{t\to\infty}\frac1t\int_0^t f(X_s)\,ds = \frac{E\int_0^\tau f(X_s)\,ds}{E\tau}$$
under suitable conditions.
Regeneration is a powerful replacement for independence. It decomposes dependent paths into iid cycles. In Markov chains, regeneration can be constructed through atoms or minorization/splitting techniques.
27.10 Continuous-time Markov chains
A continuous-time Markov chain on a countable state space has transition rates $q_{ij}$ for $i\ne j$, with
$$q_i = \sum_{j\ne i} q_{ij}.$$
The chain waits an exponential time with rate $q_i$ in state $i$, then jumps to $j$ with probability $q_{ij}/q_i$.
The generator matrix $Q$ has entries
$$Q_{ij} = q_{ij}\ (i\ne j), \qquad Q_{ii} = -q_i.$$
Transition probabilities satisfy the Kolmogorov equations:
$$\frac{d}{dt}P_t = P_tQ \ \text{(forward)}, \qquad \frac{d}{dt}P_t = QP_t \ \text{(backward)}.$$
Explosion control is required if infinitely many jumps can occur in finite time.
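For a finite state space both Kolmogorov equations are solved by $P_t=e^{tQ}$. As a rough sketch, the code below sums the Taylor series of the matrix exponential for a two-state chain and compares it with the closed form. The rates `a`, `b` and the helper `expm_taylor` are illustrative assumptions, and a plain Taylor sum is only reasonable when $t\|Q\|$ is small.

```python
import math


def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm_taylor(M, terms=40):
    """Truncated Taylor series exp(M) = sum_k M^k / k!; adequate only for small norms."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]  # now M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, b, t = 1.0, 2.0, 0.5          # hypothetical rates 0 -> 1 and 1 -> 0, and a time horizon
Q = [[-a, a],
     [b, -b]]
Pt = expm_taylor([[t * q for q in row] for row in Q])

# Closed form for the two-state chain: relaxation at rate a + b toward pi = (b, a) / (a + b).
decay = math.exp(-(a + b) * t)
closed_00 = b / (a + b) + a / (a + b) * decay
closed_10 = b / (a + b) * (1.0 - decay)
print(Pt[0][0], closed_00, Pt[1][0], closed_10)
```

The rows of `Pt` sum to one because the rows of $Q$ sum to zero, which is the matrix form of conservation of probability.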
27.11 Jump processes
Jump processes evolve by sudden discontinuities rather than continuous paths. A pure jump Markov process is specified by jump rates or a jump kernel. More generally, jump-diffusions combine continuous diffusion with jumps.
The generator often has the form
$$Af(x) = b(x)\cdot\nabla f(x) + \int \big(f(y)-f(x)\big)\,q(x,dy),$$
possibly with compensation for small jumps. Jump processes require càdlàg path spaces and careful handling of predictable compensators, intensities, and explosion.
27.12 Birth-death processes
A birth-death process is a continuous-time Markov chain on $\mathbb{N}$ with transitions only from $n$ to $n+1$ or $n-1$. Birth rates are $\lambda_n$, death rates are $\mu_n$. The generator is
$$Af(n) = \lambda_n[f(n+1)-f(n)] + \mu_n[f(n-1)-f(n)].$$
Birth-death processes model queues, populations, branching-like systems, and chemical reactions. Stationary distributions, when they exist, often satisfy detailed balance:
$$\pi_n\lambda_n = \pi_{n+1}\mu_{n+1}.$$
Recurrence and explosion depend on the rate growth.
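The detailed balance recursion gives $\pi_{n+1}=\pi_n\lambda_n/\mu_{n+1}$, so on a finite range the stationary law follows from a running product. The sketch below assumes a truncated state space $\{0,\dots,K\}$ and constant rates (a finite-buffer queue); both choices, and the function name, are illustrative.

```python
def birth_death_stationary(birth, death):
    """Stationary law on {0, ..., K} from detailed balance.

    birth[n] = lambda_n and death[n] = mu_{n+1} for n = 0, ..., K-1, so that
    pi_{n+1} = pi_n * lambda_n / mu_{n+1}.
    """
    weights = [1.0]
    for lam, mu in zip(birth, death):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


K = 30
lam, mu = 1.0, 2.0  # hypothetical constant arrival and service rates
pi = birth_death_stationary([lam] * K, [mu] * K)

# Global balance at an interior state n: inflow from n-1 and n+1 equals outflow from n.
n = 5
balance_gap = pi[n - 1] * lam + pi[n + 1] * mu - pi[n] * (lam + mu)
print(pi[:4], balance_gap)
```

With constant rates the weights are geometric with ratio $\lambda/\mu$. On the full state space $\mathbb{N}$ the same recursion applies, but the weights are normalizable only when their sum is finite, which here requires $\lambda<\mu$.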
27.13 Martingale problems
Given an operator $A$, the martingale problem asks for a process $X_t$ such that for all test functions $f$ in the domain,
$$f(X_t) - f(X_0) - \int_0^t Af(X_s)\,ds$$
is a martingale. Solving the martingale problem constructs a Markov process with generator $A$.
This formulation avoids directly solving SDEs or transition densities. It is robust under weak convergence and is central in diffusion theory. Well-posedness means existence and uniqueness in law. If the martingale problem is well-posed, the generator determines the process law.
Chapter 28 — Brownian Motion
28.1 Construction of Brownian motion
Brownian motion $B_t$ is a process with $B_0=0$, independent increments, Gaussian increments
$$B_t - B_s \sim N(0,\,t-s), \qquad s<t,$$
and continuous paths. Kolmogorov extension constructs a process with the correct finite-dimensional Gaussian laws; continuity is then obtained through moment estimates and a continuity theorem.
The construction separates law specification from path regularity. Gaussian finite-dimensional distributions alone do not automatically give continuous paths. Brownian motion is the continuous modification of the Gaussian process with covariance
$$E[B_sB_t] = s\wedge t.$$
28.2 Gaussian processes
A Gaussian process is a process whose every finite linear combination is Gaussian. It is determined by its mean function
$$m(t) = E[X_t]$$
and covariance function
$$K(s,t) = \operatorname{Cov}(X_s,X_t).$$
Brownian motion is centered with $K(s,t)=s\wedge t$.
Gaussianity reduces finite-dimensional laws to linear algebra. Positive semidefiniteness of the covariance kernel is the consistency condition. Path properties, however, depend on metric behavior induced by
$$d(s,t)^2 = E|X_s-X_t|^2.$$
Covariance determines law; regularity requires additional analysis of increments.
28.3 Independent increments
Brownian motion has independent increments: for
$$0\le t_0<t_1<\cdots<t_n,$$
the variables
$$B_{t_1}-B_{t_0},\ \dots,\ B_{t_n}-B_{t_{n-1}}$$
are independent. Each increment has variance equal to its time length.
Independent increments make Brownian motion a continuous-time analogue of random walk. The Markov property, martingale property, and Gaussian transition densities follow. Independent increments are stronger than uncorrelated increments, though for Gaussian processes uncorrelated implies independent.
28.4 Continuous paths
Brownian motion has continuous paths almost surely, but those paths are almost surely nowhere differentiable. Continuity follows from moment bounds such as
$$E|B_t-B_s|^p = C_p\,|t-s|^{p/2}$$
and Kolmogorov continuity criteria.
The paths are rough: for small $h$, increments are typically of size $\sqrt{h}$, so difference quotients behave like $h^{-1/2}$. This roughness is why stochastic calculus is not ordinary calculus. Brownian paths have finite quadratic variation but infinite total variation on intervals.
28.5 Scaling
Brownian motion satisfies the scaling property
$$(B_{ct})_{t\ge 0} \overset{d}{=} (\sqrt{c}\,B_t)_{t\ge 0}, \qquad c>0.$$
This follows from Gaussian increments: $B_{ct}$ has variance $ct$, matching $\sqrt{c}\,B_t$.
Scaling is a structural symmetry. It determines hitting-time distributions, fractal path behavior, local time scaling, and invariance principles. Brownian motion is self-similar with exponent 1/2. This exponent is the source of diffusive scaling.
28.6 Reflection principle
The reflection principle states, for Brownian motion and $a>0$,
$$P\Big(\sup_{0\le s\le t}B_s\ge a\Big) = 2P(B_t\ge a).$$
Paths that cross level $a$ are reflected after their first hitting time, giving a bijective probability argument.
This principle yields distributions of maxima and hitting times. For $\tau_a=\inf\{t:B_t=a\}$,
$$P(\tau_a\le t) = 2P(B_t\ge a).$$
Reflection uses continuity and symmetry of Brownian increments; it is not available for arbitrary martingales or jump processes without modification.
28.7 Hitting times
For Brownian motion, the hitting time of level $a$ is
$$\tau_a = \inf\{t\ge 0 : B_t=a\}.$$
Its distribution satisfies
$$P(\tau_a\le t) = 2\Big(1-\Phi\Big(\frac{a}{\sqrt{t}}\Big)\Big)$$
for $a>0$, where $\Phi$ is the standard normal distribution function. The density is
$$f_{\tau_a}(t) = \frac{a}{\sqrt{2\pi t^3}}\,e^{-a^2/(2t)}.$$
Hitting times connect Brownian motion to boundary value problems. Probabilities of hitting one boundary before another solve harmonic equations. Expected hitting times solve Poisson equations when finite. In unbounded domains, expectations may be infinite even when hitting occurs almost surely.
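A quick numerical consistency check between the distribution function and the density of $\tau_a$: integrating the density over $[0,t]$ should reproduce $2(1-\Phi(a/\sqrt{t}))$. The values of `a` and `t`, the midpoint rule, and the function names are assumptions made for this sketch.

```python
import math


def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hitting_cdf(a, t):
    """P(tau_a <= t) = 2 (1 - Phi(a / sqrt(t))) for a > 0."""
    return 2.0 * (1.0 - Phi(a / math.sqrt(t)))


def hitting_density(a, t):
    """Density a / sqrt(2 pi t^3) * exp(-a^2 / (2 t)) of tau_a."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))


def integrate_density(a, t, n=20_000):
    """Midpoint rule for the integral of the density over [0, t]."""
    h = t / n
    return h * sum(hitting_density(a, (k + 0.5) * h) for k in range(n))


a, t = 1.0, 2.0
from_cdf = hitting_cdf(a, t)
from_density = integrate_density(a, t)
print(from_cdf, from_density)
```

The density decays like $t^{-3/2}$, so $\int_0^\infty t\,f_{\tau_a}(t)\,dt$ diverges: level $a$ is hit almost surely, yet $E[\tau_a]=\infty$, which is the unbounded-domain caveat mentioned above.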
28.8 Brownian bridge
A Brownian bridge from 0 to 0 over $[0,1]$ is Brownian motion conditioned on $B_1=0$. It can be represented as
$$\beta_t = B_t - tB_1.$$
It is a centered Gaussian process with covariance
$$E[\beta_s\beta_t] = s\wedge t - st.$$
The Brownian bridge appears in empirical process limits, Kolmogorov–Smirnov statistics, conditioned diffusions, and Gaussian process theory. It is not Brownian motion because its increments are not independent; the endpoint constraint introduces global dependence.
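The bridge covariance follows in one line from the Brownian covariance $E[B_uB_v]=u\wedge v$. For $s,t\in[0,1]$, expanding the product in the representation $\beta_t=B_t-tB_1$ gives:

```latex
\begin{aligned}
E[\beta_s\beta_t]
  &= E\big[(B_s - sB_1)(B_t - tB_1)\big] \\
  &= E[B_sB_t] - t\,E[B_sB_1] - s\,E[B_tB_1] + st\,E[B_1^2] \\
  &= s\wedge t - ts - st + st \\
  &= s\wedge t - st .
\end{aligned}
```

In particular $\operatorname{Var}(\beta_t)=t(1-t)$, which vanishes at both endpoints and is largest at $t=1/2$.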
28.9 Brownian motion in ℝᵈ
A $d$-dimensional Brownian motion is
$$B_t = \big(B_t^{(1)},\dots,B_t^{(d)}\big),$$
where the coordinates are independent one-dimensional Brownian motions. Its transition density is
$$p_t(x,y) = \frac{1}{(2\pi t)^{d/2}}\exp\Big(-\frac{|y-x|^2}{2t}\Big).$$
The generator is
$$A = \frac12\Delta.$$
Thus Brownian motion is probabilistically tied to the heat equation:
$$\partial_t u = \frac12\Delta u.$$
The dimension $d$ changes recurrence, hitting probabilities, potential theory, and path intersection behavior.
28.10 Recurrence and transience
Brownian motion is recurrent in dimensions d=1,2 in the sense that it returns arbitrarily close to points or neighborhoods, while it is transient in dimensions d≥3. In d=1, points are hit almost surely. In d=2, points are polar in a finer sense, but neighborhoods are visited recurrently. In d≥3, Brownian motion tends to infinity.
This dimension transition reflects potential theory. The Green function changes behavior by dimension. Recurrence means infinite occupation of neighborhoods; transience means finite expected occupation. The threshold at dimension two is fundamental.
28.11 Quadratic variation
Brownian motion has quadratic variation
$$[B]_t = t.$$
For partitions $\Pi_n$ of $[0,t]$ with mesh going to zero,
$$\sum_{[u,v]\in\Pi_n}(B_v-B_u)^2 \to t$$
in probability, and along suitable partitions almost surely.
This property distinguishes Brownian motion from differentiable functions, whose quadratic variation is zero. Quadratic variation is why Itô calculus has correction terms. It is also the intrinsic clock of Brownian martingales.
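A small simulation makes the contrast visible: on a fine grid the sum of squared increments sits near $t$, while the sum of absolute increments grows like $\sqrt{n}$. This is only a sketch on a uniform grid with a fixed seed; the grid size and the function name are assumptions.

```python
import math
import random


def simulate_increments(t, n, seed=0):
    """n independent N(0, t/n) increments of Brownian motion on a uniform grid of [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


t, n = 1.0, 20_000
dB = simulate_increments(t, n)

quadratic_variation = sum(x * x for x in dB)   # concentrates near t
total_variation = sum(abs(x) for x in dB)      # of order sqrt(2 n t / pi), diverging with n
print(quadratic_variation, total_variation)
```

The squared-increment sum has mean $t$ and variance $2t^2/n$, so refining the grid tightens it around $t$, whereas the absolute-increment sum has mean $\sqrt{2nt/\pi}$ and blows up.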
28.12 Lévy characterization
Lévy’s characterization states that a continuous adapted process $M_t$ with $M_0=0$ is Brownian motion if $M_t$ is a martingale and
$$M_t^2 - t$$
is also a martingale. Equivalently, $M$ is a continuous martingale with quadratic variation $[M]_t=t$.
This theorem characterizes Brownian motion by martingale and quadratic variation properties rather than independent Gaussian increments. It is powerful in proving that transformed processes are Brownian under new measures or filtrations.
28.13 Donsker invariance principle
Donsker’s theorem says that rescaled random walks converge to Brownian motion. If $X_i$ are iid with mean zero and variance one, and
$$S_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^{\lfloor nt\rfloor}X_i$$
with interpolation, then
$$S_n \Rightarrow B$$
in a suitable path space.
This is the process-level CLT. It upgrades convergence of one-time sums to convergence of entire paths. The theorem explains Brownian motion as the universal scaling limit of many independent small-step systems. Tightness is the additional payload beyond finite-dimensional CLT.
28.14 Brownian motion as universal limit carrier
Brownian motion is universal because it is the canonical limit of centered finite-variance cumulative noise under diffusive scaling. It appears in random walks, empirical processes, martingales, queueing, statistical estimation, stochastic differential equations, and physics.
Universality has conditions. Heavy tails lead to stable Lévy processes; long-range dependence can lead to fractional Brownian motion; jumps lead to discontinuous Lévy limits; nonstandard dependence can change scaling. Brownian motion is the universal carrier only after finite-variance, weak-dependence, and diffusive-scaling debts are paid.
Chapter 29 — Stochastic Integration
29.1 Motivation
Ordinary integration fails for Brownian-driven equations because Brownian paths have infinite variation and are nowhere differentiable. Expressions like
$$\int_0^t H_s\,dB_s$$
cannot be defined as Riemann–Stieltjes integrals in the usual pathwise sense for general adapted $H$.
Stochastic integration builds an integral using probability, martingales, and L2 limits. The integrand must be nonanticipative, typically predictable. The integral is not just a pathwise area; it is a limit of sums where the integrand is evaluated using information before the Brownian increment.
29.2 Itô integral for simple processes
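For a simple predictable process
$$H_t = \sum_{k=0}^{n-1}H_k\,\mathbf{1}_{(t_k,t_{k+1}]}(t),$$
where $H_k$ is $\mathcal{F}_{t_k}$-measurable, define
$$\int_0^T H_t\,dB_t = \sum_{k=0}^{n-1}H_k\,(B_{t_{k+1}}-B_{t_k}).$$
Predictability ensures $H_k$ is known before the increment.
For such integrals,
$$E\Big[\int_0^T H_t\,dB_t\Big] = 0,$$
and the Itô isometry holds:
$$E\Big(\int_0^T H_t\,dB_t\Big)^2 = E\int_0^T H_t^2\,dt.$$
29.3 Extension by isometry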
The Itô isometry extends the integral from simple predictable processes to all predictable processes satisfying
$$E\int_0^T H_t^2\,dt < \infty.$$
Choose simple $H^n\to H$ in $L^2(\Omega\times[0,T])$, define
$$\int H\,dB = L^2\text{-}\lim_{n\to\infty}\int H^n\,dB.$$
The isometry guarantees the definition is independent of approximating sequence. This is the measure-theoretic construction of stochastic integration: build on simple predictable objects, prove an exact norm identity, then complete the space.
29.4 Predictable integrands
An Itô integrand must be predictable or at least progressively measurable with suitable integrability, depending on formulation. Predictability encodes nonanticipation: Ht may depend on past and present information but not future Brownian increments.
This condition is structural. If one uses midpoint values, right-endpoint values, or future-dependent integrands, different integrals arise, such as the Stratonovich integral (midpoint rule), the backward Itô integral (right endpoint), or anticipative integrals requiring Malliavin calculus. Itô integration is tied to adapted prediction and martingale structure.
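The dependence on the evaluation point can be seen on a single simulated path with integrand $H=B$. The sketch below compares left-endpoint (Itô), right-endpoint, and averaged-endpoint sums; the algebraic identities in the comments hold exactly for any path, while the statement that the squared increments sum to roughly $T$ is a property of Brownian sampling. Grid size, seed, and names are assumptions.

```python
import math
import random


def brownian_path(T, n, seed=1):
    """Brownian path sampled on a uniform grid of [0, T], as a list of n + 1 values."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    B = [0.0]
    for _ in range(n):
        B.append(B[-1] + rng.gauss(0.0, sd))
    return B


T, n = 1.0, 20_000
B = brownian_path(T, n)
dB = [B[k + 1] - B[k] for k in range(n)]

left_sum = sum(B[k] * dB[k] for k in range(n))       # Ito sum: integrand known before the increment
right_sum = sum(B[k + 1] * dB[k] for k in range(n))  # anticipating, right-endpoint sum
mid_sum = 0.5 * (left_sum + right_sum)               # averaged endpoints (Stratonovich-type)
qv = sum(x * x for x in dB)                          # sum of squared increments, close to T

# Exact identities for any path: left = (B_T^2 - qv) / 2, right - left = qv, mid = B_T^2 / 2.
print(left_sum, 0.5 * (B[-1] ** 2 - qv), right_sum - left_sum, mid_sum)
```

Since `qv` does not vanish as the grid is refined, the three sums converge to different limits, and only the left-endpoint version has mean zero.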
29.5 Local martingales
The stochastic integral
$$M_t = \int_0^t H_s\,dB_s$$
is a martingale when $H$ is square-integrable on finite horizons. Under local square integrability, it is a local martingale. Its quadratic variation is
$$[M]_t = \int_0^t H_s^2\,ds.$$
Local martingales are the natural output of stochastic integration. They behave like martingales up to localization times but may fail to be true martingales globally. This distinction matters for expectation identities, change of measure, and financial modeling.
29.6 Quadratic variation
For Itô integrals,
$$\Big[\int_0^{\cdot}H_s\,dB_s\Big]_t = \int_0^t H_s^2\,ds.$$
For two integrals,
$$\Big[\int H\,dB,\ \int K\,dB\Big]_t = \int_0^t H_sK_s\,ds.$$
Quadratic variation is the second-order bookkeeping of stochastic calculus. It replaces ordinary differential products. The symbolic rule is
$$(dB_t)^2 = dt, \qquad dB_t\,dt = 0, \qquad (dt)^2 = 0.$$
This rule is a mnemonic for rigorous quadratic variation calculations.
29.7 Itô formula
If $X_t$ satisfies
$$dX_t = b_t\,dt + \sigma_t\,dB_t$$
and $f\in C^2$, then
$$df(X_t) = f'(X_t)\,dX_t + \frac12 f''(X_t)\,\sigma_t^2\,dt.$$
Equivalently,
$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,b_s\,ds + \int_0^t f'(X_s)\,\sigma_s\,dB_s + \frac12\int_0^t f''(X_s)\,\sigma_s^2\,ds.$$
The second derivative term is the Itô correction caused by quadratic variation. The ordinary chain rule fails because Brownian increments have square size comparable to $dt$. Itô's formula is the fundamental calculus rule for diffusion processes.
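As a worked instance, take $X_t=B_t$ (so $b\equiv 0$, $\sigma\equiv 1$) and $f(x)=x^2$, with $f'(x)=2x$ and $f''(x)=2$. Substituting into the formula gives:

```latex
\begin{aligned}
d(B_t^2) &= 2B_t\,dB_t + \tfrac12\cdot 2\cdot 1^2\,dt = 2B_t\,dB_t + dt, \\
B_t^2 &= 2\int_0^t B_s\,dB_s + t, \\
\int_0^t B_s\,dB_s &= \tfrac12\big(B_t^2 - t\big).
\end{aligned}
```

The ordinary chain rule would predict $\tfrac12B_t^2$; the extra $-\tfrac12 t$ is the Itô correction. Taking expectations confirms the result is consistent with the integral having mean zero, since $E[B_t^2]=t$.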
29.8 Stochastic integration by parts
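For semimartingales $X,Y$,
$$d(X_tY_t) = X_t\,dY_t + Y_t\,dX_t + d[X,Y]_t.$$
Integrated,
$$X_tY_t = X_0Y_0 + \int_0^t X_s\,dY_s + \int_0^t Y_s\,dX_s + [X,Y]_t.$$
For processes with jumps, the integrands are the left limits $X_{s-}$ and $Y_{s-}$, so that they are predictable.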
The extra quadratic covariation term is the stochastic correction. If one process has finite variation, its quadratic covariation with a continuous martingale is zero. Integration by parts is essential for exponential martingales, product processes, and SDE transformations.
29.9 Exponential martingales
For Brownian motion,
$$\mathcal{E}_t(\theta B) = \exp\Big(\theta B_t - \frac12\theta^2 t\Big)$$
is a martingale. More generally, for suitable $H$,
$$\mathcal{E}_t\Big(\int H\,dB\Big) = \exp\Big(\int_0^t H_s\,dB_s - \frac12\int_0^t H_s^2\,ds\Big)$$
is a local martingale and, under conditions such as Novikov’s condition, is a true martingale.
Exponential martingales are density processes for changing measure. They also yield concentration bounds and solve linear stochastic equations. The correction $-\frac12\int H^2\,ds$ compensates quadratic variation.
29.10 Girsanov theorem
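Girsanov’s theorem describes how Brownian drift changes under an equivalent measure. If
$$Z_t = \exp\Big(-\int_0^t\theta_s\,dB_s - \frac12\int_0^t\theta_s^2\,ds\Big)$$
is a martingale and defines $dQ = Z_T\,dP$, then
$$\tilde{B}_t = B_t + \int_0^t\theta_s\,ds$$
is Brownian motion under $Q$ on $[0,T]$.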
The theorem is a change-of-measure engine. It transforms drift into likelihood ratio. It is central in filtering, finance, diffusion theory, rare-event simulation, and stochastic control. The martingale condition on Z is the integrability gate; without it, the measure change may fail.
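For a constant $\theta$ the statement can be checked at the terminal time with Gaussian moment generating functions alone. Writing $W_T = B_T+\theta T$ for the shifted variable (a symbol introduced here for the check) and $Z_T=\exp(-\theta B_T-\tfrac12\theta^2T)$, one computes under $Q$, using $E_P[e^{vB_T}]=e^{v^2T/2}$:

```latex
\begin{aligned}
E_Q\big[e^{uW_T}\big]
  &= E_P\big[Z_T\,e^{u(B_T+\theta T)}\big] \\
  &= e^{u\theta T-\frac12\theta^2T}\;E_P\big[e^{(u-\theta)B_T}\big] \\
  &= e^{u\theta T-\frac12\theta^2T}\;e^{\frac12(u-\theta)^2T} \\
  &= e^{\frac12u^2T}.
\end{aligned}
```

So $W_T\sim N(0,T)$ under $Q$, while $B_T$ itself has mean $-\theta T$ under $Q$. This verifies only the one-time marginal; the full theorem asserts the same for the whole path.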
29.11 Martingale representation theorem
In a Brownian filtration, every square-integrable martingale $M_t$ can be represented as
$$M_t = M_0 + \int_0^t H_s\,dB_s$$
for some predictable $H$ with suitable square integrability. This says Brownian stochastic integrals span martingales in the Brownian information carrier.
The theorem is foundational for hedging in mathematical finance, filtering, and stochastic control. It is filtration-specific. In a larger filtration or with jump noise, representation may require additional martingale drivers. The theorem is not a universal statement about all martingales; it is a carrier theorem for Brownian filtrations.