integral_help_1
Hi,
I need this ASAP. Can someone sit with it now (within 1 hour from now)? All problems except no. 6.
http://www.cambridge.org/9780521850155
MEASURES, INTEGRALS AND MARTINGALES
This is a concise and elementary introduction to measure and integration theory
as it is nowadays needed in many parts of analysis and probability theory. The
basic theory – measures, integrals, convergence theorems, Lp-spaces and multiple
integrals – is explored in the first part of the book. The second part then uses
the notion of martingales to develop the theory further, covering topics such as
Jacobi’s general transformation theorem, the Radon–Nikodým theorem,
differentiation of measures, Hardy–Littlewood maximal functions or general Fourier series.
Undergraduate calculus and an introductory course on rigorous analysis in $\mathbb{R}$ are
the only essential prerequisites, making this text suitable for both lecture courses
and for self-study. Numerous illustrations and exercises are included, and these
are not merely drill problems but are there to consolidate what has already been
learnt and to discover variants, sideways and extensions to the main material.
Hints and solutions will be available on the internet.
René Schilling is Professor of Stochastics at the University of Marburg.
MEASURES, INTEGRALS AND MARTINGALES
RENÉ L. SCHILLING
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
First published in print format
ISBN-13 978-0-521-85015-5
ISBN-13 978-0-521-61525-9
ISBN-13 978-0-511-34456-5
© Cambridge University Press 2005
2005
Information on this title: www.cambridge.org/9780521850155
This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
ISBN-10 0-511-34456-2
ISBN-10 0-521-85015-0
ISBN-10 0-521-61525-9
Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Contents
Prelude
Dependence chart
1 Prologue
Problems
2 The pleasures of counting
Problems
3 $\sigma$-algebras
Problems
4 Measures
Problems
5 Uniqueness of measures
Problems
6 Existence of measures
Problems
7 Measurable mappings
Problems
8 Measurable functions
Problems
9 Integration of positive functions
Problems
10 Integrals of measurable functions and null sets
Null sets and the ‘a.e.’
Problems
11 Convergence theorems and their applications
Parameter-dependent integrals
Riemann vs. Lebesgue integration
Examples
Problems
12 The function spaces $\mathcal{L}^p$, $1 \leq p \leq \infty$
Problems
13 Product measures and Fubini’s theorem
More on measurable functions
Distribution functions
Minkowski’s inequality for integrals
Problems
14 Integrals with respect to image measures
Convolutions
Problems
15 Integrals of images and Jacobi’s transformation rule
Jacobi’s transformation formula
Spherical coordinates and the volume of the unit ball
Continuous functions are dense in $\mathcal{L}^p(\lambda^n)$
Regular measures
Problems
16 Uniform integrability and Vitali’s convergence theorem
Different forms of uniform integrability
Problems
17 Martingales
Problems
18 Martingale convergence theorems
Problems
19 The Radon–Nikodým theorem and other applications of martingales
The Radon–Nikodým theorem
Martingale inequalities
The Hardy–Littlewood maximal theorem
Lebesgue’s differentiation theorem
The Calderón–Zygmund lemma
Problems
20 Inner product spaces
Problems
21 Hilbert space $\mathcal{H}$
Problems
22 Conditional expectations in $L^2$
On the structure of subspaces of $L^2$
Problems
23 Conditional expectations in $L^p$
Classical conditional expectations
Separability criteria for the spaces $L^p(X, \mathcal{A}, \mu)$
Problems
24 Orthonormal systems and their convergence behaviour
Orthogonal polynomials
The trigonometric system and Fourier series
The Haar system
The Haar wavelet
The Rademacher functions
Well-behaved orthonormal systems
Problems
Appendix A: lim inf and lim sup
Appendix B: Some facts from point-set topology
Topological spaces
Metric spaces
Normed spaces
Appendix C: The volume of a parallelepiped
Appendix D: Non-measurable sets
Appendix E: A summary of the Riemann integral
The (proper) Riemann integral
The fundamental theorem of integral calculus
Integrals and limits
Improper Riemann integrals
Further reading
References
Notation index
Name and subject index
Prelude
The purpose of this book is to give a straightforward and yet elementary
introduction to measure and integration theory that is within the grasp of second or
third year undergraduates. Indeed, apart from interest in the subject, the only
prerequisites for Chapters 1–13 are a course on rigorous $\epsilon$-$\delta$-analysis on the real
line and basic notions of linear algebra and calculus in $\mathbb{R}^n$. The first few chapters
form a concise (not to say minimalist) introduction to Lebesgue’s approach to
measure and integration, based on a 10-week, 30-hour lecture course for Sussex
University mathematics undergraduates. Chapters 14–24 are more advanced and
contain a selection of results from measure theory, probability theory and anal-
ysis. This material can be read linearly but it is also possible to select certain
topics; see the dependence chart on page xi. Although more challenging than the
first part, the prerequisites stay essentially the same and a reader who has worked
through and understood Chapters 1–13 will be well prepared for all that follows.
At some points, one or another concept from point-set topology will be (mostly
superficially) needed; those readers who are not familiar with the topic can look
up the basic results in Appendix B whenever the need arises.
Each chapter is followed by a section of Problems. They are not just drill
exercises but contain variants, excursions from and extensions of the material
presented in the text. The proofs of the core material do not depend on any
of the problems and it is an exception that I refer to a problem in one of the
proofs. Nevertheless I do advise you to attempt as many problems as possible.
The material in the Appendices – on upper and lower limits, basic topology and
the Riemann integral – is primarily intended as back-up, for when you want to
look something up.
Unlike many textbooks this is not an introduction to integration for analysts
or a probabilistic measure theory. I want to reach both (future) analysts and
(future) probabilists, and to provide a foundation which will be useful for both
communities and for further, more specialized, studies. It goes without saying
that I have to leave out many pet choices of each discipline. On the other hand,
I try to intertwine the subjects as far as possible, resulting – mostly in the latter
part of the book – in the consequent use of the martingale machinery which gives
‘probabilistic’ proofs of ‘analytic’ results.
Measure and integration theory is often seen as an abstract and dry subject,
disliked by many students. There are several reasons for this. One of them
is certainly the fact that measure theory has traditionally been based on a thor-
ough knowledge of real analysis in one and several dimensions. Many excellent
textbooks are written for such an audience but today’s undergraduates find it
increasingly hard to follow such tracts, which are often more aptly labelled grad-
uate texts. Another reason lies within the subject: measure theory has come a
long way and is, in its modern purist form, stripped of its motivating roots. If,
for example, one starts out with the basic definition of measures, it takes unrea-
sonably long until one arrives at interesting examples of measures – the proof
of existence and uniqueness of something as basic as Lebesgue measure already
needs the full abstract machinery – and it is not easy to entertain by constantly
referring to examples made up of delta functions and artificial discrete measures.
I try to alleviate this by postulating the existence and properties of Lebesgue
measure early on, then justifying the claims as we proceed with the abstract
theory.
Technically, measure and integration theory is no more difficult than, say,
complex function theory or vector calculus. Most proofs are even shorter and have
a very clear structure. The one big exception, Carathéodory’s extension theorem,
can be safely stated without proof since an understanding of the technique is not
really needed at the beginning; we will refer to the details of it only in Chapter 14
in connection with regularity questions. The other exception is the (classical
proof of the) Radon–Nikodým theorem, but we will follow a different route in
this book and use martingales to prove this and other results.
I am grateful to all students who went to my classes, challenged me to write,
rewrite and improve this text and who drew my attention – sometimes unbe-
knownst to them – to many weaknesses. I owe a great debt to the patience and
interest of my colleagues, in particular to Niels Jacob, Nick Bingham, David
Edmunds and Alexei Tyukov who read the whole text, and to Charles Goldie and
Alex Sobolev who commented on large parts of the manuscript. Without their
encouragement and help there would be more obscure passages, blunders and
typos in the pages to follow. It is a pleasure to acknowledge the interest and skill
of the Cambridge University Press and its editor, Roger Astley, in the preparation
of this book.
A few words on notation before getting started. I tried to keep unusual
and special notation to a minimum. However, a few remarks are in order: $\mathbb{N}$
means the natural numbers $1, 2, 3, \ldots$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. Positive or negative
is always understood in the non-strict sense $\geq 0$ or $\leq 0$; to exclude 0, I say strictly
positive/negative. A ‘+’ as sub- or superscript refers to the positive part of a
function or the positive members of a set. Finally, $a \vee b$ resp. $a \wedge b$ denote the
maximum resp. minimum of the numbers $a, b \in \mathbb{R}$. For any other general notation
there is a comprehensive index of notation at the end of the book.
In some statements I indicate alternatives using square brackets, i.e., ‘if A [B]
… then P [Q]’ should be read as ‘if A … then P’ and ‘if B … then Q’. The
end of a proof is marked by Halmos’ ‘tombstone’ symbol ∎, and Bourbaki’s
‘dangerous bend’ symbol ☡ in the margin identifies a passage which requires
some attention.
As with every book, one cannot give all the details at every instance. On the
other hand, the less experienced reader might glide over these places without even
noticing that some extra effort is needed; for these readers – and, hopefully, not
to the annoyance of all others – I use the symbol [✓] to indicate where some little
verification is appropriate.
Cross-referencing. Throughout the text chapters are numbered with arabic
numerals and appendices with capital letters. Formulae are numbered (n.k),
referring to formula k from Chapter n. For theorems and the like I write n.m for
Theorem m from Chapter n. The abbreviation Tn.m is sometimes used for
Theorem n.m (with D standing for Definition, L for Lemma, P for Proposition
and C for Corollary).
Dependence Chart
Chapters 2–12 contain core material which is needed in all later chapters. Prerequisites within Chapters 13–24 are shown by arrows, dashed arrows indicate a minor dependence.
[Figure: dependence chart with the following nodes]
§13.1–9 Product measure, Fubini T.
§13.10
§13.11–13 Distribution functs.
§13.14
§14.1–3 Image measure & integrals
§14.4–8 Convolution
§15.1–4 Integrals of direct images
§15.5–15 Jacobi’s Transformation T. (needs pf. of T. 5.1)
§15.16–17 $C_c$ is dense in $L^p$
§15.18–20 Regularity of measures
§16.1–7 Uniform integrability, Vitali
§16.8–9 Different forms of UI
§17 Martingales
§18 Martingale Convergence
§19.1–9 Radon–Nikodým Theorem
§19.11–12 Martingale ineq.
§19.14–18 Maximal functions
§19.20–21 Leb. Differentiation T.
§19.22 Calderón–Zygmund Lemma
§§20, 21 Inner products, Hilbert space
§22.1–4 Cond. Expectation in $L^2$
§22.5 Structure of subspaces of $L^2$
§23.1–13 Cond. Expectation in $L^p$
§23.14–18 Martingales & Cond. Expectation
§23.19–21 Separability of $L^p$
§24.1–15 Orth. polynomials, Fourier series
§24.16–18 Haar functions
§24.19–20 Haar wavelets
§24.21–23 Rademacher fns.
§24.24–28 Well-behaved ONSs
§24.29 Brownian motion
1
Prologue
The theme of this book is the problem of how to assign a size, a content, a
probability, etc. to certain sets. In everyday life this is usually pretty straightfor-
ward; we
• count: $\{a, b, c, \ldots, x, y, z\}$ has 26 letters;
• take measurements: length (in one dimension), area (in two dimensions),
volume (in three dimensions) or time;
• calculate: rates of radioactive decay or the odds to win the lottery.
In each case we compare (and express the outcome) with respect to some base
unit; most of the measurements just mentioned are intuitively clear. Nevertheless,
let’s have a closer look at areas:
[Figure: rectangle with length $\ell$ and width $w$]
area = length ($\ell$) × width ($w$). (1.1)
An even more flexible shape than the rectangle is the triangle:
[Figure: triangle with base $b$ and height $h$]
area = $\frac{1}{2}$ × base ($b$) × height ($h$). (1.2)
Triangles are indeed more basic than rectangles since we can represent every
rectangle, and actually any odd-shaped quadrangle, as the ‘sum’ of two non-
overlapping triangles:
[Figure: quadrangle divided into a shaded triangle and a white triangle]
area = area of shaded triangle + area of white triangle. (1.3)
In doing so we have tacitly assumed a few things. In (1.2) we have chosen a par-
ticular base line and the corresponding height arbitrarily. But the concept of area
should not depend on such a choice and the calculation this choice entails. Indepen-
dence of the area from the way we calculate it is called well-definedness. Plainly,
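[Figure: one triangle drawn three times, with base lines $b_1, b_2, b_3$ and corresponding heights $h_1, h_2, h_3$]
area = $\frac{1}{2} \times h_1 \times b_1 = \frac{1}{2} \times h_2 \times b_2 = \frac{1}{2} \times h_3 \times b_3$. (1.4)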
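As a numerical illustration of (1.4), the following sketch computes the area of one triangle three times, each time with a different side as base. The vertex coordinates and the helper name `area_with_base` are assumptions made for this example only; the height is obtained by dropping a perpendicular from the opposite vertex onto the base line.

```python
import math

def area_with_base(p, q, r):
    """Area of the triangle pqr, taking the side pq as base."""
    base = math.dist(p, q)
    # Foot of the perpendicular from r onto the line through p and q.
    t = ((r[0] - p[0]) * (q[0] - p[0]) + (r[1] - p[1]) * (q[1] - p[1])) / base**2
    foot = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    height = math.dist(r, foot)
    return 0.5 * base * height

# Hypothetical triangle, chosen only for illustration.
a, b, c = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
areas = [area_with_base(a, b, c), area_with_base(b, c, a), area_with_base(c, a, b)]
print(areas)
```

All three values agree up to rounding, which is what well-definedness demands.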
Notice that (1.4) allows us to pick the most convenient method to work out the
area. In (1.3) we actually used two facts:
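• the area of non-overlapping (disjoint) sets can be added, i.e.
$\operatorname{area}(A) = \alpha$, $\operatorname{area}(B) = \beta$, $A \cap B = \emptyset \implies \operatorname{area}(A \cup B) = \alpha + \beta$;
• congruent triangles have the same area, i.e. area([Figure: a triangle]) = area([Figure: a congruent copy of the triangle in a different position]).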
This shows that the least we should expect from a reasonable measure $\mu$ is that
it is
well-defined, takes values in $[0, \infty]$, and $\mu(\emptyset) = 0$; (1.5)
additive, i.e. $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$. (1.6)
The additional property that the measure $\mu$
is invariant under congruences (1.7)
turns out to be a very special property of length, area and volume, i.e. of Lebesgue
measure on $\mathbb{R}^n$.
The above rules allow us to measure arbitrarily odd-looking polygons using
the following recipe: dissect the polygon into non-overlapping triangles and add
their areas. But what about curved or even more complicated shapes, say,
[Figure: a shape with a curved boundary, marked ‘?’]
Here is one possibility for the circle: inscribe a regular $2^j$-gon, $j \in \mathbb{N}$, into the
circle, subdivide it into congruent triangles, find the area of each of these slices
and then add all $2^j$ pieces. In the next step increase $j \rightsquigarrow j + 1$ by doubling the
number of points on the circumference and repeat the above procedure. Eventually,
[Figure: circle with an inscribed regular $2^j$-gon; each slice has central angle $2\pi / 2^j$ rad]
area of circle = $\lim_{j \to \infty} 2^j \times$ area(triangle at step $j$). (1.8)
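The limit in (1.8) can be explored numerically. The sketch below assumes the elementary formula $\frac{1}{2} r^2 \sin\theta$ for the area of an isosceles triangle with two sides of length $r$ enclosing the angle $\theta$; this formula and the function name `inscribed_polygon_area` are not part of the text. Since the computation relies on the library values of $\pi$ and $\sin$, it illustrates the convergence and does not prove it.

```python
import math

def inscribed_polygon_area(r, j):
    """Area of the regular 2**j-gon inscribed in a circle of radius r."""
    n = 2 ** j                                   # number of triangular slices
    angle = 2 * math.pi / n                      # central angle of one slice
    triangle = 0.5 * r * r * math.sin(angle)     # area of one slice (assumed formula)
    return n * triangle

for j in (2, 3, 4, 8, 16):
    print(j, inscribed_polygon_area(1.0, j))
```

For radius 1 the values increase with $j$ and approach $\pi$.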
Again, there are a few problems: does the limit exist? Is it admissible to subdivide
a set into arbitrarily many subsets? Is the procedure independent of the particular
subdivision? In fact, nothing would have prevented us from paving the circle
with ever smaller squares! For a reasonable notion of measure the answer to all
of these questions should be yes and the way we pave the circle should not lead
to different results, as long as our tiles are disjoint. However, finite additivity
(1.6) is not enough for this and we have to use instead
($\sigma$-additivity) $\operatorname{area}\Big(\mathop{\dot{\bigcup}}_{j \in \mathbb{N}} A_j\Big) = \sum_{j \in \mathbb{N}} \operatorname{area}(A_j)$ (1.9)
where the notation $\mathop{\dot{\bigcup}}_j A_j$ means the disjoint union of the sets $A_j$, i.e. the union
where the sets $A_j$ are pairwise disjoint: $A_j \cap A_k = \emptyset$ if $j \neq k$; a corresponding
notation is used for unions of finitely many sets.
We will see that conditions (1.5) and (1.9) lead to the notion of measure which
is powerful enough to cater for all our everyday measuring needs and for much
more. We will also see that a good notion of measure allows us to introduce
integrals, basically starting with the naïve idea that the integral of a positive
function should stand for the area of the set between the graph of the function
and the abscissa.
Problems
1.1. Consider the two figures below.
They seem to indicate that there is no conclusive way to exhaust an area by squares
(see the extra square in the second figure). Can that be?
1.2. Use (1.8) to find the area of a circle with radius r.
2
The pleasures of counting
Set algebra and countability play a major rôle in measure theory. In this chapter
we briefly review notation and manipulations with sets and then introduce the
notion of countability. If you are not already acquainted with set algebra, you
should verify all statements in this chapter and work through the exercises.
Throughout this chapter X and Y denote two arbitrary sets. For any two sets
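$A, B$ (which are not necessarily subsets of a common set) we write
$A \cup B = \{x : x \in A \text{ or } x \in B \text{ or } x \in A \text{ and } B\}$
$A \cap B = \{x : x \in A \text{ and } x \in B\}$
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$;
in particular we write $A \mathbin{\dot{\cup}} B$ for the disjoint union, i.e. for $A \cup B$ if $A \cap B = \emptyset$.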
$A \subset B$ means that $A$ is contained in $B$ including the possibility that $A = B$; to
exclude the latter we write $A \subsetneq B$. If $A \subset X$, we set $A^c := X \setminus A$ for the complement
of $A$ (relative to $X$). Recall also the distributive laws for $A, B, C \subset X$
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (2.1)
and de Morgan’s identities
$(A \cap B)^c = A^c \cup B^c$
$(A \cup B)^c = A^c \cap B^c$ (2.2)
which also hold for arbitrarily many sets $A_i \subset X$, $i \in I$ ($I$ stands for an arbitrary
index set),
$\Big(\bigcap_{i \in I} A_i\Big)^c = \bigcup_{i \in I} A_i^c$
$\Big(\bigcup_{i \in I} A_i\Big)^c = \bigcap_{i \in I} A_i^c$ (2.3)
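The identities (2.1)–(2.3) are easy to spot-check on finite sets. The following sketch uses Python's built-in `set` type; the ground set `X`, the random family of subsets and the helper names are assumptions for illustration, and a check on one example is of course no substitute for the proof asked for in Problem 2.3.

```python
import random

X = set(range(12))                 # hypothetical finite ground set
rng = random.Random(0)             # fixed seed, so the run is reproducible

def random_subset():
    return {x for x in X if rng.random() < 0.5}

def complement(A):
    """Complement relative to X."""
    return X - A

family = [random_subset() for _ in range(5)]   # plays the role of (A_i), i in I
A, B, C = family[:3]

# Distributive laws (2.1).
distributive_ok = (
    A & (B | C) == (A & B) | (A & C)
    and A | (B & C) == (A | B) & (A | C)
)

# De Morgan's identities (2.3) for the whole family.
de_morgan_ok = (
    complement(set.intersection(*family)) == set.union(*(complement(S) for S in family))
    and complement(set.union(*family)) == set.intersection(*(complement(S) for S in family))
)
print(distributive_ok, de_morgan_ok)
```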
A map $f : X \to Y$ is called
injective (or one-one) $\iff$ $f(x) = f(x') \implies x = x'$
surjective (or onto) $\iff$ $f(X) := \{f(x) \in Y : x \in X\} = Y$
bijective $\iff$ $f$ is injective and surjective
Set operations and direct images under a map $f$ are not necessarily compatible:
indeed, we have, in general,
$f(A \cup B) = f(A) \cup f(B)$
$f(A \cap B) \neq f(A) \cap f(B)$
$f(A \setminus B) \neq f(A) \setminus f(B)$ (2.4)
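Inverse images and set operations are, however, always compatible. For $C, C_i, D \subset Y$
one has
$f^{-1}\Big(\bigcup_{i \in I} C_i\Big) = \bigcup_{i \in I} f^{-1}(C_i)$
$f^{-1}\Big(\bigcap_{i \in I} C_i\Big) = \bigcap_{i \in I} f^{-1}(C_i)$
$f^{-1}(C \setminus D) = f^{-1}(C) \setminus f^{-1}(D)$ (2.5)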
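A small finite example makes the contrast between (2.4) and (2.5) concrete. The sketch below takes the non-injective map $x \mapsto x^2$ on a five-point set; the sets and the helper names `image` and `preimage` are assumptions chosen for illustration.

```python
def image(f, A):
    """Direct image f(A)."""
    return {f(x) for x in A}

def preimage(f, X, C):
    """Inverse image of C under f, for f defined on X."""
    return {x for x in X if f(x) in C}

X = {-2, -1, 0, 1, 2}

def f(x):
    return x * x               # not injective on X

A, B = {-2, -1}, {1, 2}

# Direct images: the union is respected, intersection and difference are not.
print(image(f, A | B) == image(f, A) | image(f, B))
print(image(f, A & B), image(f, A) & image(f, B))
print(image(f, A - B), image(f, A) - image(f, B))

# Inverse images respect all three operations.
C, D = {0, 1}, {1, 4}
print(preimage(f, X, C | D) == preimage(f, X, C) | preimage(f, X, D))
print(preimage(f, X, C & D) == preimage(f, X, C) & preimage(f, X, D))
print(preimage(f, X, C - D) == preimage(f, X, C) - preimage(f, X, D))
```

Here $A \cap B = \emptyset$ although $f(A) = f(B)$, which is exactly the failure of injectivity that Lemma 2.1 rules out.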
If we have more information about f we can, of course, say more.
2.1 Lemma $f : X \to Y$ is injective if, and only if, $f(A \cap B) = f(A) \cap f(B)$ for all
$A, B \subset X$.
Proof ‘⇒’: Since $f(A \cap B) \subset f(A)$ and $f(A \cap B) \subset f(B)$, we always have
$f(A \cap B) \subset f(A) \cap f(B)$. Let us check the converse inclusion ‘⊃’. If $y \in
f(A) \cap f(B)$, we have $y = f(a)$ and $y = f(b)$ for some $a \in A$, $b \in B$. So,
$f(a) = y = f(b)$ and, by injectivity, $a = b$. This means that $a = b \in A \cap B$,
hence $y \in f(A \cap B)$ and $f(A) \cap f(B) \subset f(A \cap B)$ follows.
‘⇐’: Take $x, x' \in X$ with $f(x) = f(x')$ and set $A := \{x\}$, $B := \{x'\}$. Then
$\emptyset \neq f(\{x\}) \cap f(\{x'\}) = f(\{x\} \cap \{x'\})$ which is only possible if $\{x\} \cap \{x'\} \neq \emptyset$, i.e. if
$x = x'$. This shows that $f$ is injective.
2.2 Lemma $f : X \to Y$ is injective if, and only if, $f(X \setminus A) = f(X) \setminus f(A)$ for all
$A \subset X$.
Proof ‘⇒’: Assume that $f$ is injective. We show first that $f(x) \notin f(A)$ if, and
only if, $x \notin A$. Indeed, if $f(x) \notin f(A)$, then $x \notin A$; if $x \notin A$ but $f(x) \in f(A)$,
then we can find some $a \in A$ such that $f(a) = f(x) \in f(A)$. Since $f$ is injective,
$x = a \in A$ and we have found a contradiction. Thus
$f(X) \setminus f(A) = \{y \in Y : y = f(x),\ f(x) \notin f(A)\}$
$= \{y \in Y : y = f(x),\ x \notin A\}$
$= f(X \setminus A)$.
‘⇐’: Let $f(x) = f(x')$ and assume that $x \neq x'$. Then
$f(x) \in f(X \setminus \{x'\}) = f(X) \setminus f(\{x'\})$
which cannot happen as $f(x) \in f(\{x'\})$.
We can now start with the main topic of this chapter: counting.
2.3 Definition Two sets $X, Y$ have the same cardinality if there exists a bijection
$f : X \to Y$. In this case we write $\#X = \#Y$.
If there is an injection $g : X \to Y$, we say that the cardinality of $X$ is less than
or equal to the cardinality of $Y$ and write $\#X \leq \#Y$. If $\#X \leq \#Y$ but $\#X \neq \#Y$,
we say that $X$ is of strictly smaller cardinality than $Y$ and write $\#X < \#Y$ (in this
case, no injection $g : X \to Y$ can be surjective).
That Definition 2.3 is indeed counting becomes clear if we choose $Y = \mathbb{N}$ since
in this case $\#X = \#\mathbb{N}$ or $\#X \leq \#\mathbb{N}$ just means that we can label each $x \in X$ with
a unique tag from the set $\{1, 2, 3, \ldots\}$, i.e. we are numbering $X$. This particular
example is, in fact, of central importance.
2.4 Definition A set $X$ is countable if $\#X \leq \#\mathbb{N}$. If $\#\mathbb{N} < \#X$, the set $X$ is said
to be uncountable. The cardinality of $\mathbb{N}$ is called $\aleph_0$, aleph null.
Plainly, Definition 2.4 requires that we can find for every countable set some
enumeration $X = (x_1, x_2, x_3, \ldots)$ which may or may not be finite (and which may
contain any $x_j$ more than once).
Caution: Some authors reserve the word countable for the situation where $\#X =
\#\mathbb{N}$ while sets where $\#X \leq \#\mathbb{N}$ are called at most countable or finite or countable.
This has the effect that a countable set is always infinite. We do not adopt this
convention.
The following examples show that (countable) sets with infinitely many ele-
ments can behave strangely.
2.5 Examples (i) Finite sets are countable: $\{a, b, \ldots, z\} \to \{1, 2, \ldots, 26\}$ where
$a \leftrightarrow 1, \ldots, z \leftrightarrow 26$, is bijective and $\{1, 2, 3, \ldots, 26\} \to \mathbb{N}$ is clearly an injection.
Thus
$\#\{a, b, c, \ldots, z\} = \#\{1, 2, 3, \ldots, 26\} \leq \#\mathbb{N}$.
(ii) The even numbers are countable. This follows from the fact that the map
$f : \{2, 4, 6, \ldots, 2j, \ldots\} \to \mathbb{N}, \quad k \mapsto \frac{k}{2}$
is an injection and even a bijection.[✓] This means that there are ‘as many’ even
numbers as there are natural numbers.
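(iii) The set of integers $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$ is countable. The counting
scheme is shown in the figure (run through in clockwise orientation starting
from 0) or, more formally,
$g : k \in \mathbb{Z} \mapsto \begin{cases} 2k & \text{if } k > 0, \\ 2|k| + 1 & \text{if } k \leq 0, \end{cases}$
hence $\#\mathbb{Z} \leq \#\mathbb{N}$.[✓]
[Figure: the points $-2, -1, 0, 1, 2$ on a line, with arrows indicating the counting scheme]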
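The map $g$ of part (iii) can be written down directly. In the sketch below the inverse `g_inverse` is an addition for illustration and is not stated in the text.

```python
def g(k):
    """The counting map of Example 2.5(iii): integers -> natural numbers 1, 2, 3, ..."""
    return 2 * k if k > 0 else 2 * abs(k) + 1

def g_inverse(n):
    """Recover the integer k with g(k) = n, for n = 1, 2, 3, ..."""
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

# The integers in the order in which g counts them.
print([g_inverse(n) for n in range(1, 10)])
```

Strictly positive integers receive the even tags and the remaining integers the odd tags, so no tag is used twice.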
(iv) The Cartesian product $\mathbb{N} \times \mathbb{N} := \{(j, k) : j, k \in \mathbb{N}\}$ is countable. To see
this, arrange the pairs $(j, k)$ in an array and count along the diagonals:
  ⋮      ⋮      ⋮      ⋮      ⋮
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) …
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) …
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) …
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) …
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) …
[Figure: the diagonals of the array are numbered 1, 2, 3, 4, 5, …]
Notice that each line contains only finitely many elements, so that each diagonal
can be dealt with in finitely many steps. The map for the above counting scheme
is given by
$h : (j, k) \mapsto \frac{(j + k)(j + k - 1)}{2} - k + 1 \in \mathbb{N}, \quad (j, k) \in \mathbb{N} \times \mathbb{N}$. (2.6)
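Formula (2.6) can be tested on an initial block of diagonals. In the sketch below `h` is the map from (2.6); the inverse `h_inverse` is an assumption added for illustration and simply searches for the diagonal $j + k = s$ that contains a given number.

```python
def h(j, k):
    """The counting map (2.6) for pairs of natural numbers j, k >= 1."""
    s = j + k
    return s * (s - 1) // 2 - k + 1

def h_inverse(n):
    """Find the pair (j, k) with h(j, k) = n, for n >= 1."""
    s = 2
    while s * (s - 1) // 2 < n:    # the diagonal j + k = s ends at s(s-1)/2
        s += 1
    k = s * (s - 1) // 2 - n + 1
    return s - k, k

print([h_inverse(n) for n in range(1, 11)])
```

Each diagonal $j + k = s$ is mapped onto the block of $s - 1$ consecutive numbers ending at $s(s-1)/2$, which is why no number is hit twice or left out.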
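(v) The rational numbers $\mathbb{Q}$ are countable. To see this, set $Q^{\pm} := \{q \in \mathbb{Q} : \pm q > 0\}$.
Every element $\frac{m}{n} \in Q^+$ can be identified with at least one pair $(m, n) \in \mathbb{N} \times \mathbb{N}$, so that
$Q^+ \subset \Big\{ \underbrace{\tfrac{1}{1}}_{\text{①}},\ \underbrace{\tfrac{1}{2}, \tfrac{2}{1}}_{\text{②}},\ \underbrace{\tfrac{1}{3}, \tfrac{2}{2}, \tfrac{3}{1}}_{\text{③}},\ \underbrace{\tfrac{1}{4}, \tfrac{2}{3}, \tfrac{3}{2}, \tfrac{4}{1}}_{\text{④}},\ \ldots \Big\}$;
in the set $\{\ldots\}$ on the right we distinguish between cancelled and uncancelled
forms of a rational, i.e. $\frac{6}{18}, \frac{1}{3}, \frac{2}{6}, \frac{3}{9}$, etc. are counted whenever they appear.
The circled numbers ⓚ refer to the corresponding diagonals in the counting scheme in
part (iv). This shows that we can find injections $Q^+ \xrightarrow{\ i\ } \{\ldots\} \xrightarrow{\ j\ } \mathbb{N} \times \mathbb{N}$; the set
$\mathbb{N} \times \mathbb{N}$ is countable, thus $Q^+$ is countable[✓] and so is $Q^-$. Finally,
$\mathbb{Q} = Q^- \mathbin{\dot{\cup}} \{0\} \mathbin{\dot{\cup}} Q^+ = \{r_1, r_2, r_3, \ldots\} \mathbin{\dot{\cup}} \{0\} \mathbin{\dot{\cup}} \{q_1, q_2, q_3, \ldots\}$
and $p_1 := 0$, $p_{2k} := q_k$, $p_{2k+1} := r_k$ gives an enumeration $(p_1, p_2, p_3, \ldots)$ of $\mathbb{Q}$.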
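The enumeration of part (v) can be sketched with generators. The code below walks through the diagonals $m + n = s$, skips uncancelled repetitions such as $\frac{2}{2}$, and interleaves positive and negative rationals as in $p_1, p_2, p_3, \ldots$; the choice $r_k := -q_k$ for the negative rationals and the function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import count, islice

def positive_rationals():
    """Enumerate the strictly positive rationals diagonal by diagonal, without repeats."""
    seen = set()
    for s in count(2):                   # diagonal m + n = s
        for m in range(1, s):
            q = Fraction(m, s - m)       # Fraction cancels automatically
            if q not in seen:
                seen.add(q)
                yield q

def rationals():
    """p_1 = 0, p_{2k} = q_k, p_{2k+1} = -q_k."""
    yield Fraction(0)
    for q in positive_rationals():
        yield q
        yield -q

print([str(p) for p in islice(rationals(), 11)])
```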
2.6 Theorem Let $A_1, A_2, A_3, \ldots$ be countably many countable sets. Then $A = \bigcup_{j \in \mathbb{N}} A_j$
is countable, i.e. countable unions of countable sets are countable.
Proof Since each $A_j$ is countable we can find an enumeration
$A_j = (a_{j,1}, a_{j,2}, \ldots, a_{j,k}, \ldots)$
(if $A_j$ is a finite set, we repeat the last element of the list infinitely often), so that
$A = \bigcup_{j \in \mathbb{N}} A_j = (a_{j,k} : (j, k) \in \mathbb{N} \times \mathbb{N})$.
Using Example 2.5(iv) we can relabel $\mathbb{N} \times \mathbb{N}$ by $\mathbb{N}$ and (after deleting all duplicates)
we have found an enumeration.
It is not hard to see that for cardinalities ‘$\leq$’ is reflexive ($\#A \leq \#A$) and
transitive ($\#A \leq \#B$, $\#B \leq \#C \implies \#A \leq \#C$). Antisymmetry, which makes ‘$\leq$’
into a partial order relation, is less obvious. The proof of the following important
result is somewhat technical and can be left out at first reading.
2.7 Theorem (Cantor–Bernstein) Let $X, Y$ be two sets. If $\#X \leq \#Y$ and $\#Y \leq \#X$,
then $\#X = \#Y$.
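Proof By assumption,
$\#X \leq \#Y \iff$ there exists an injection $f : X \to Y$,
$\#Y \leq \#X \iff$ there exists an injection $g : Y \to X$.
In order to prove $\#X = \#Y$ we have to construct a bijection $h : X \to Y$.
Step 1. Without loss of generality we may assume that $Y \subset X$. Indeed, since
$g : Y \to g(Y)$ is a bijection, we know that $\#Y = \#g(Y)$ and it is enough to show
$\#g(Y) = \#X$. As $g(Y) \subset X$ we can simplify things and identify $g(Y)$ with $Y$,
i.e. assume that $g = \mathrm{id}$ or, equivalently, $Y \subset X$.
Step 2. Let $Y \subset X$ and $g = \mathrm{id}$. Recursively we define
$X_0 := X,\ \ldots,\ X_{j+1} := f(X_j), \qquad Y_0 := Y,\ \ldots,\ Y_{j+1} := f(Y_j)$.
As usual, we write $f^j := \underbrace{f \circ f \circ \cdots \circ f}_{j \text{ times}}$ and $f^0 := \mathrm{id}$. Then
$X_{j+1} = f^{j+1}(X) = f^j(f(X)) \overset{f(X) \subset Y}{\subset} f^j(Y) = Y_j \overset{Y \subset X}{\subset} f^j(X) = X_j$,
i.e. $X_{j+1} \subset Y_j \subset X_j$, and we can define a map $h : X \to Y$ by
$h(x) := \begin{cases} f(x) & \text{if } x \in X_j \setminus Y_j \text{ for some } j \in \mathbb{N}_0, \\ x & \text{if } x \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j. \end{cases}$
Step 3. The map $h$ is surjective: $h(X) = Y$. Indeed, we have by definition
$h(X) = \bigcup_{j \in \mathbb{N}_0} f(X_j \setminus Y_j) \cup \Big(\bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j\Big)^c$
$= \bigcup_{j \in \mathbb{N}_0} \big(f(X_j) \setminus f(Y_j)\big) \cup \Big(X \setminus Y \cup \bigcup_{j \in \mathbb{N}} X_j \setminus Y_j\Big)^c$ (by injectivity of $f$)
$= \underbrace{\bigcup_{j \in \mathbb{N}_0} \big(X_{j+1} \setminus Y_{j+1}\big)}_{=: A} \cup \Big[(X \setminus Y)^c \cap \Big(\bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1}\Big)^c\Big]$
$= A \cup [(X^c \cup Y) \cap A^c] = A \cup [Y \cap A^c] = Y \cap X = Y$
where we used that $A = \bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1} \subset \bigcup_{j \in \mathbb{N}_0} X_{j+1} \subset X_1 = f(X) \subset Y$.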
Step 4. The map $h$ is injective. To see this, let $x, x' \in X$ and $h(x) = h(x')$. We
have four possibilities:
(a) $x, x' \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$. Then $f(x) = h(x) = h(x') = f(x')$ so that
$x = x'$ since $f$ is injective.
(b) $x, x' \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j$. Then $x = h(x) = h(x') = x'$.
(c) $x \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x' \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. As $f(x) = h(x) =
h(x') = x'$ we see
$x' = f(x) \in f(X_j \setminus Y_j) = f(X_j) \setminus f(Y_j) = X_{j+1} \setminus Y_{j+1}$ (by injectivity of $f$)
which is impossible, i.e. (c) cannot occur.
(d) $x' \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. This is analogous to (c).
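The construction of $h$ can be traced on a concrete example. The sketch below assumes $X = \{0, 1, 2, \ldots\}$, $Y = \{1, 2, 3, \ldots\} \subset X$ and the injection $f(x) = x + 2$; none of these choices comes from the text. It uses the observation that, for injective $f$, $X_j \setminus Y_j = f^j(X \setminus Y)$, so $x$ lies in some $X_j \setminus Y_j$ exactly when pulling $x$ back through $f$ as often as possible ends in $X \setminus Y$.

```python
def f(x):
    """Assumed injection f : X -> Y with X = {0, 1, 2, ...} and Y = {1, 2, 3, ...}."""
    return x + 2

def f_preimage(y):
    """The unique x in X with f(x) = y, or None if y is not in f(X)."""
    return y - 2 if y >= 2 else None

def in_X_minus_Y(x):
    return x == 0                      # here X \ Y = {0}

def h(x):
    """The bijection h : X -> Y from Step 2 of the proof, for this example."""
    root = x
    while f_preimage(root) is not None:
        root = f_preimage(root)        # pull back through f as far as possible
    return f(x) if in_X_minus_Y(root) else x

print([h(x) for x in range(10)])
```

In this example $h$ shifts the even numbers by 2 and leaves the odd numbers fixed, so every element of $Y$ is hit exactly once.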
Theorem 2.7 says that #X < #Y and #Y < #X cannot occur at the same time;
it does not claim that we can compare the cardinality of any two sets X and Y ,
i.e. that ‘$\leq$’ is a linear ordering. This is indeed true but its proof requires the
axiom of choice, see Hewitt and Stromberg [20, p. 19].
Not all sets are countable. The following proof goes back to G. Cantor and is
called Cantor’s diagonal method.
2.8 Theorem The interval $(0, 1)$ is uncountable; its cardinality $\mathfrak{c} := \#(0, 1)$ is
called the continuum.
Proof Recall that we can write each $x \in (0, 1)$ as a decimal fraction, i.e. $x =
0.y_1 y_2 y_3 \ldots$ with $y_j \in \{0, 1, \ldots, 9\}$. If $x$ has a finite decimal representation, say
$x = 0.y_1 y_2 y_3 \ldots y_n$, $y_n \neq 0$, we replace the last digit $y_n$ by $y_n - 1$ and fill it up with
trailing 9s. For example, $0.24 = 0.2399\ldots$. This yields a unique representation
of $x$ by an infinite decimal expansion.
Assume that $(0, 1)$ were countable and let $\{x_1, x_2, \ldots\}$ be an enumeration
(containing no element more than once!). Then we can write
$x_1 = 0.a_{1,1} a_{1,2} a_{1,3} a_{1,4} \ldots$
$x_2 = 0.a_{2,1} a_{2,2} a_{2,3} a_{2,4} \ldots$
$x_3 = 0.a_{3,1} a_{3,2} a_{3,3} a_{3,4} \ldots$
$x_4 = 0.a_{4,1} a_{4,2} a_{4,3} a_{4,4} \ldots$
$\vdots$ (2.7)
and construct a new number $x := 0.y_1 y_2 y_3 \ldots \in (0, 1)$ with digits
$y_j := \begin{cases} 1 & \text{if } a_{j,j} = 5, \\ 5 & \text{if } a_{j,j} \neq 5. \end{cases}$ (2.8)
By construction, $x \neq x_j$ for any $x_j$ from the list (2.7): $x$ and $x_j$ differ at the $j$th
decimal. But then we have found a number $x \in (0, 1)$ which is not contained
in our supposedly complete enumeration of $(0, 1)$ and we have arrived at a
contradiction.
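The digit rule (2.8) is easy to apply to a finite excerpt of a list such as (2.7). In the sketch below the digit strings in `listing` are made-up sample data and `diagonal_digits` is a hypothetical helper name; a finite table can only illustrate the construction, not the theorem itself.

```python
def diagonal_digits(expansions):
    """Apply rule (2.8): digit j is 1 if the j-th digit of the j-th row is 5, else 5.

    expansions: list of digit strings; row j must contain at least j + 1 digits.
    """
    return "".join("1" if row[j] == "5" else "5" for j, row in enumerate(expansions))

# Made-up initial digits of x_1, ..., x_4 (after the leading "0.").
listing = ["5000", "3141", "9995", "2399"]
new_digits = diagonal_digits(listing)
print("0." + new_digits)
```

The new number differs from the $j$th listed number in its $j$th digit, so it cannot appear anywhere in the list.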
By $(0, 1)^{\mathbb{N}}$ we denote the set of all sequences $(x_j)_{j \in \mathbb{N}}$ where $x_j \in (0, 1)$.
2.9 Theorem We have $\#(0, 1)^{\mathbb{N}} = \mathfrak{c}$.
Proof We have to assign to every sequence $(x_j)_{j \in \mathbb{N}} \subset (0, 1)$ a unique number
$x \in (0, 1)$ – and vice versa. For this we write, as in the proof of Theorem 2.8,
each $x_j$ as a unique infinite decimal fraction
$x_j = 0.a_{j,1} a_{j,2} a_{j,3} a_{j,4} \ldots, \quad j \in \mathbb{N}$,
and we organize the array $(a_{j,k})_{j,k \in \mathbb{N}}$ into one sequence with the help of the
counting scheme of Example 2.5(iv):
$x := 0.\underbrace{a_{1,1}}_{\text{①}}\ \underbrace{a_{1,2}\, a_{2,1}}_{\text{②}}\ \underbrace{a_{1,3}\, a_{2,2}\, a_{3,1}}_{\text{③}}\ \underbrace{a_{1,4}\, a_{2,3}\, a_{3,2}\, a_{4,1}}_{\text{④}} \ldots$
(The circled numbers ⓚ refer to the corresponding diagonals in the counting scheme
of Example 2.5(iv).) Since the counting scheme was bijective, this procedure
is reversible, i.e. we can start with the decimal expansion of $x \in (0, 1)$ and get
a unique sequence of $x_j$s. We have thus found a bijection between $(0, 1)^{\mathbb{N}}$
and $(0, 1)$.
We write $\mathcal{P}(X)$ for the power set $\{A : A \subset X\}$ which is the family of all subsets
of a given set $X$. For finite sets it is clear that the power set is of strictly larger
cardinality than $X$. This is still true for infinite sets.
2.10 Theorem For any set $X$ we have $\#X < \#\mathcal{P}(X)$.
Proof We have to show that no injection $\Phi : X \to \mathcal{P}(X)$ can be surjective. Fix
such an injection and define
$B := \{x \in X : x \notin \Phi(x)\}$
(mind: $\Phi(x)$ is a set!). Clearly $B \in \mathcal{P}(X)$. If $\Phi$ were surjective, $B = \Phi(z)$ for
some element $z \in X$. Then, however,
$z \in B \overset{\text{def}}{\iff} z \notin \Phi(z) \iff z \notin B$ (since $\Phi(z) = B$)
which is impossible. Thus $\Phi$ cannot be surjective.
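For a small finite set the argument can be checked exhaustively. The sketch below runs through every map $\Phi : X \to \mathcal{P}(X)$ for a three-element set (injective or not) and confirms that the set $B$ from the proof is never among the values of $\Phi$; the set `X` and the helper names are assumptions made for illustration.

```python
from itertools import combinations, product

def power_set(X):
    elems = sorted(X)
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

def cantor_set(X, phi):
    """The set B = {x in X : x not in phi(x)} from the proof of Theorem 2.10."""
    return frozenset(x for x in X if x not in phi[x])

def b_is_always_missed(X):
    """Check every map phi : X -> P(X) and test that B is not a value of phi."""
    elems = sorted(X)
    for images in product(power_set(X), repeat=len(elems)):
        phi = dict(zip(elems, images))
        if cantor_set(X, phi) in images:
            return False
    return True

print(b_is_always_missed({0, 1, 2}))   # runs through 8**3 = 512 maps
```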
Problems
2.1. Let $A, B, C \subset X$ be sets. Show that
(i) $A \setminus B = A \cap B^c$;
(ii) $(A \setminus B) \setminus C = A \setminus (B \cup C)$;
(iii) $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$;
(iv) $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$;
(v) $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$.
2.2. Let $A, B, C \subset X$. The symmetric difference of $A$ and $B$ is defined as $A \,\triangle\, B :=
(A \setminus B) \cup (B \setminus A)$. Verify that
$(A \cup B \cup C) \setminus (A \cap B \cap C) = (A \,\triangle\, B) \cup (B \,\triangle\, C)$.
2.3. Prove de Morgan’s identities (2.2) and (2.3).
2.4. (i) Find examples which illustrate that $f(A \cap B) \neq f(A) \cap f(B)$ and $f(A \setminus B) \neq
f(A) \setminus f(B)$. In both relations one inclusion ‘⊂’ or ‘⊃’ is always true. Which
one?
(ii) Prove (2.5).
2.5. The indicator function of a set $A \subset X$ is defined by
$1_A(x) := \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$
Check that
(i) $1_{A \cap B} = 1_A 1_B$; (ii) $1_{A \cup B} = \min\{1_A + 1_B, 1\}$;
(iii) $1_{A \setminus B} = 1_A - 1_{A \cap B}$; (iv) $1_{A \cup B} = 1_A + 1_B - 1_{A \cap B}$;
(v) $1_{A \cup B} = \max\{1_A, 1_B\}$; (vi) $1_{A \cap B} = \min\{1_A, 1_B\}$.
2.6. Let $A, B, C \subset X$ and denote by $A \,\triangle\, B$ the symmetric difference as in Problem 2.2.
Show that
(i) $1_{A \triangle B} = 1_A + 1_B - 2 \cdot 1_A 1_B = 1_A + 1_B \bmod 2$;
(ii) $A \,\triangle\, (B \,\triangle\, C) = (A \,\triangle\, B) \,\triangle\, C$;
(iii) $\mathcal{P}(X)$ is a commutative ring (in the usual algebraists’ sense) with ‘addition’ $\triangle$
and ‘multiplication’ $\cap$.
[Hint: use indicator functions for (ii) and (iii).]
2.7. Let $f : X \to Y$ be a map, $A \subset X$ and $B \subset Y$. Show that, in general,
$f \circ f^{-1}(B) \subsetneq B$ and $f^{-1} \circ f(A) \supsetneq A$.
When does ‘=’ hold in these relations? Provide an example showing that the above
inclusions are strict.
2.8. Let $f$ and $g$ be two injective maps. Show that $f \circ g$, if it exists, is injective.
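2.9. Show that the following sets have the same cardinality as $\mathbb{N}$: $\{m \in \mathbb{N} : m \text{ is odd}\}$, $\mathbb{N} \times \mathbb{Z}$,
$\mathbb{Q}^m$ ($m \in \mathbb{N}$), $\bigcup_{m \in \mathbb{N}} \mathbb{Q}^m$.
2.10. Use Theorem 2.7 to show that $\#\mathbb{N} \times \mathbb{N} = \#\mathbb{N}$.
[Hint: $\#\mathbb{N} = \#\mathbb{N} \times \{1\}$ and $\mathbb{N} \times \{1\} \subset \mathbb{N} \times \mathbb{N}$.]
2.11. Show that if $E \subset F$ we have $\#E \leq \#F$. In particular, subsets of countable sets are
again countable.
2.12. Show that $\{0, 1\}^{\mathbb{N}} = \{\text{all infinite sequences consisting of 0 and 1}\}$ is uncountable.
[Hint: diagonal method.]
2.13. Show that the set $\mathbb{R}$ is uncountable and that $\#(0, 1) = \#\mathbb{R}$.
[Hint: find a bijection $f : (0, 1) \to \mathbb{R}$.]
2.14. Let $(A_j)_{j \in \mathbb{N}}$ be a sequence of sets of cardinality $\mathfrak{c}$. Show that $\#\bigcup_{j \in \mathbb{N}} A_j = \mathfrak{c}$.
[Hint: map $A_j$ bijectively onto $(j - 1, j)$ and use that $(0, 1) \subset \bigcup_{j=1}^{\infty} (j - 1, j) \subset \mathbb{R}$.]
2.15. Adapt the proof of Theorem 2.8 to show that $\#\{1, 2\}^{\mathbb{N}} \leq \#(0, 1) \leq \#\{0, 1\}^{\mathbb{N}}$ and
conclude that $\#(0, 1) = \#\{0, 1\}^{\mathbb{N}}$.
Remark. This is the reason for writing $\mathfrak{c} = 2^{\aleph_0}$.
[Hint: interpret $\{0, 1\}^{\mathbb{N}}$ as base-2 expansions of all numbers in $(0, 1)$ while $\{1, 2\}^{\mathbb{N}}$
are all infinite base-3 expansions lacking the digit 0.]
2.16. Extend Problem 2.15 to deduce $\#\{0, 1, 2, \ldots, n\}^{\mathbb{N}} = \#(0, 1)$ for all $n \in \mathbb{N}$.
2.17. Mimic the proof of Theorem 2.9 to show that $\#(0, 1)^2 = \mathfrak{c}$. Use the fact that
$\#\mathbb{R} = \#(0, 1)$ to conclude that $\#\mathbb{R}^2 = \mathfrak{c}$.
2.18. Show that the set of all infinite sequences of natural numbers $\mathbb{N}^{\mathbb{N}}$ has cardinality $\mathfrak{c}$.
[Hint: use that $\#\{0, 1\}^{\mathbb{N}} = \#\{1, 2\}^{\mathbb{N}}$, $\{1, 2\}^{\mathbb{N}} \subset \mathbb{N}^{\mathbb{N}} \subset \mathbb{R}^{\mathbb{N}}$ and $\#\mathbb{R}^{\mathbb{N}} = \#(0, 1)^{\mathbb{N}}$.]
2.19. Let $\mathcal{F} := \{F \subset \mathbb{N} : \#F < \infty\}$. Show that $\#\mathcal{F} = \#\mathbb{N}$.
[Hint: embed $\mathcal{F}$ into $\bigcup_{k \in \mathbb{N}} \mathbb{N}^k$ or show that $F \mapsto \sum_{j \in F} 2^j$ is a bijection between $\mathcal{F}$
and $\mathbb{N}$.]
2.20. Show – not using Theorem 2.10 – that $\#\mathbb{N}^{\mathbb{N}} > \#\mathbb{N}$. Conclude that there are more
than countably many maps $f : \mathbb{N} \to \mathbb{N}$.
[Hint: diagonal method.]
2.21. If $A \subset \mathbb{N}$ we can identify the indicator function $1_A : \mathbb{N} \to \{0, 1\}$ with the 0-1-sequence
$\big(1_A(j)\big)_{j \in \mathbb{N}}$, i.e., $1_A \in \{0, 1\}^{\mathbb{N}}$. Show that the map $\mathcal{P}(\mathbb{N}) \ni A \mapsto 1_A \in \{0, 1\}^{\mathbb{N}}$ is a
bijection and conclude that $\#\mathcal{P}(\mathbb{N}) = \mathfrak{c}$.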
3
$\sigma$-algebras
We have seen in the prologue that a reasonable measure should be able to deal with countable partitions. Therefore, a measure function should be defined on a system of sets which is stable whenever we repeat any of the basic set operations ($\cup$, $\cap$, ${}^c$) countably many times.
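3.1 Definition A $\sigma$-algebra $\mathcal{A}$ on a set $X$ is a family of subsets of $X$ with the following properties:
$$X \in \mathcal{A}, \tag{$\Sigma_1$}$$
$$A \in \mathcal{A} \implies A^c \in \mathcal{A}, \tag{$\Sigma_2$}$$
$$(A_j)_{j \in \mathbb{N}} \subset \mathcal{A} \implies \bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}. \tag{$\Sigma_3$}$$
A set $A \in \mathcal{A}$ is said to be ($\mathcal{A}$-)measurable.
3.2 Properties (of a $\sigma$-algebra) (i) $\emptyset \in \mathcal{A}$.
Indeed: $\emptyset = X^c \in \mathcal{A}$ by ($\Sigma_1$), ($\Sigma_2$).
(ii) $A, B \in \mathcal{A} \implies A \cup B \in \mathcal{A}$.
Indeed: set $A_1 = A$, $A_2 = B$, $A_3 = A_4 = \ldots = \emptyset$. Then $A \cup B = \bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$ by ($\Sigma_3$).
(iii) $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A} \implies \bigcap_{j \in \mathbb{N}} A_j \in \mathcal{A}$.
Indeed: if $A_j \in \mathcal{A}$, then $A_j^c \in \mathcal{A}$ by ($\Sigma_2$), hence $\bigcup_{j \in \mathbb{N}} A_j^c \in \mathcal{A}$ by ($\Sigma_3$) and, again by ($\Sigma_2$), $\bigcap_{j \in \mathbb{N}} A_j = \big(\bigcup_{j \in \mathbb{N}} A_j^c\big)^c \in \mathcal{A}$.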
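On a finite set the three axioms can be checked mechanically, which is a convenient way to experiment with small families. The following is only a sketch: the function name `is_sigma_algebra` is made up for illustration, sets are modelled as Python `frozenset`s, and the input is assumed to consist of subsets of a finite `X`. It relies on the observation that a finite family has only finitely many members, so closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(X, family):
    """Check (Sigma_1)-(Sigma_3) for a family of subsets of a finite set X."""
    X = frozenset(X)
    fam = {frozenset(A) for A in family}
    if X not in fam:                          # (Sigma_1): X belongs to the family
        return False
    if any(X - A not in fam for A in fam):    # (Sigma_2): closed under complements
        return False
    # (Sigma_3): the family is finite, so countable unions are finite unions,
    # and closure under pairwise unions is enough (induction on the number of sets).
    return all(A | B in fam for A, B in combinations(fam, 2))

X = {1, 2, 3, 4}
B = {1, 2}
print(is_sigma_algebra(X, [set(), B, X - B, X]))  # a family of the form {∅, B, B^c, X}
print(is_sigma_algebra(X, [set(), B, X]))         # same family with B^c left out
```

The two calls contrast a family that contains the complement of `B` with one that does not.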
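3.3 Examples (i) $\mathcal{P}(X)$ is a $\sigma$-algebra (the maximal $\sigma$-algebra in $X$).
(ii) $\{\emptyset, X\}$ is a $\sigma$-algebra (the minimal $\sigma$-algebra in $X$).
(iii) $\{\emptyset, B, B^c, X\}$, $B \subset X$, is a $\sigma$-algebra.
(iv) $\{\emptyset, B, X\}$ is no $\sigma$-algebra (unless $B = \emptyset$ or $B = X$).
(v) $\mathcal{A} := \{A \subset X : \#A \leq \#\mathbb{N} \text{ or } \#A^c \leq \#\mathbb{N}\}$ is a $\sigma$-algebra.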
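Proof: Let us verify ($\Sigma_1$)–($\Sigma_3$).
($\Sigma_1$): $X^c = \emptyset$ which is certainly countable.
($\Sigma_2$): if $A \in \mathcal{A}$, either $A$ or $A^c$ is by definition countable, so $A^c \in \mathcal{A}$.
($\Sigma_3$): if $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$, then two cases can occur:
• All $A_j$ are countable. Then $A = \bigcup_{j \in \mathbb{N}} A_j$ is a countable union of countable sets which is, by T2.6, itself countable.
• At least one $A_{j_0}$ is uncountable. Then $A_{j_0}^c$ must be countable, so that
$$\Big(\bigcup_{j \in \mathbb{N}} A_j\Big)^c = \bigcap_{j \in \mathbb{N}} A_j^c \subset A_{j_0}^c.$$
Hence $\big(\bigcup_{j \in \mathbb{N}} A_j\big)^c$ is countable (Problem 2.11) and so $\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$.
(vi) (trace $\sigma$-algebra) Let $E \subset X$ be any set and let $\mathcal{A}$ be some $\sigma$-algebra in $X$. Then
$$\mathcal{A}_E := E \cap \mathcal{A} := \{E \cap A : A \in \mathcal{A}\} \tag{3.1}$$
is a $\sigma$-algebra in $E$.
(vii) (pre-image $\sigma$-algebra) Let $f : X \to X'$ be a map and let $\mathcal{A}'$ be a $\sigma$-algebra in $X'$. Then
$$\mathcal{A} := f^{-1}(\mathcal{A}') := \{f^{-1}(A') : A' \in \mathcal{A}'\}$$
is a $\sigma$-algebra in $X$.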
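3.4 Theorem (and Definition) (i) The intersection $\bigcap_{i \in I} \mathcal{A}_i$ of arbitrarily many $\sigma$-algebras $\mathcal{A}_i$ in $X$ is again a $\sigma$-algebra in $X$.
(ii) For every system of sets $\mathcal{G} \subset \mathcal{P}(X)$ there exists a smallest (also: minimal, coarsest) $\sigma$-algebra containing $\mathcal{G}$. This is the $\sigma$-algebra generated by $\mathcal{G}$, denoted by $\sigma(\mathcal{G})$, and $\mathcal{G}$ is called its generator.
Proof (i) We check ($\Sigma_1$)–($\Sigma_3$). ($\Sigma_1$): since $X \in \mathcal{A}_i$ for all $i \in I$, $X \in \bigcap_i \mathcal{A}_i$. ($\Sigma_2$): if $A \in \bigcap_i \mathcal{A}_i$, then $A^c \in \mathcal{A}_i$ for all $i \in I$, so $A^c \in \bigcap_i \mathcal{A}_i$. ($\Sigma_3$): let $(A_k)_{k \in \mathbb{N}} \subset \bigcap_i \mathcal{A}_i$. Then $A_k \in \mathcal{A}_i$ for all $k \in \mathbb{N}$ and all $i \in I$, hence $\bigcup_{k \in \mathbb{N}} A_k \in \mathcal{A}_i$ for each $i \in I$ and so $\bigcup_{k \in \mathbb{N}} A_k \in \bigcap_{i \in I} \mathcal{A}_i$.
(ii) Consider the family
$$\mathcal{A} := \bigcap_{\substack{\mathcal{F} \ \sigma\text{-alg.} \\ \mathcal{F} \supset \mathcal{G}}} \mathcal{F}.$$
Since $\mathcal{G} \subset \mathcal{P}(X)$ and since $\mathcal{P}(X)$ is a $\sigma$-algebra, the above intersection is non-void. This means that the definition of $\mathcal{A}$ makes sense and yields, by part (i), a $\sigma$-algebra containing $\mathcal{G}$. If $\mathcal{A}'$ is a further $\sigma$-algebra with $\mathcal{A}' \supset \mathcal{G}$, then $\mathcal{A}'$ would be included in the intersection used for the definition of $\mathcal{A}$, hence $\mathcal{A} \subset \mathcal{A}'$. In this sense, $\mathcal{A}$ is the smallest $\sigma$-algebra containing $\mathcal{G}$.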
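The proof above is non-constructive, but when $X$ is finite the generated $\sigma$-algebra can be computed by brute force: keep adding complements and pairwise unions until nothing new appears. The sketch below assumes a finite ground set; the name `generate_sigma_algebra` is hypothetical, and the fixed-point loop is a finite-case shortcut rather than the intersection argument used in the proof.

```python
def generate_sigma_algebra(X, generator):
    """Smallest sigma-algebra on the finite set X containing every set in generator."""
    X = frozenset(X)
    fam = {frozenset(), X} | {frozenset(G) for G in generator}
    while True:
        # one closure step: all complements and all pairwise unions of current members
        new = {X - A for A in fam} | {A | B for A in fam for B in fam}
        if new <= fam:        # nothing new: fam is closed, hence a sigma-algebra
            return fam
        fam |= new

X = {1, 2, 3, 4}
for A in sorted(generate_sigma_algebra(X, [{1}]), key=lambda s: (len(s), sorted(s))):
    print(set(A) or "{}")
```

The loop terminates because the power set of a finite set is finite. Every set it adds must belong to any $\sigma$-algebra containing the generator, so the result is contained in the smallest such $\sigma$-algebra, and since the result is itself a $\sigma$-algebra the two coincide.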
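3.5 Remarks (i) If $\mathcal{G}$ is a $\sigma$-algebra, then $\mathcal{G} = \sigma(\mathcal{G})$.
(ii) For $A \subset X$ we have $\sigma(\{A\}) = \{\emptyset, A, A^c, X\}$.
(iii) If $\mathcal{F} \subset \mathcal{G} \subset \mathcal{A}$, then $\sigma(\mathcal{F}) \subset \sigma(\mathcal{G}) \subset \sigma(\mathcal{A}) \overset{3.5(\text{i})}{=} \mathcal{A}$.
On the Euclidean space $\mathbb{R}^n$ there is a canonical $\sigma$-algebra, which is generated by the open sets. Recall that
$$U \subset \mathbb{R}^n \text{ is open} \iff \forall\, x \in U \ \exists\, \epsilon > 0 : B_\epsilon(x) \subset U,$$
where $B_\epsilon(x) := \{y \in \mathbb{R}^n : |x - y| < \epsilon\}$ is the open ball with centre $x$ and radius $\epsilon$. A set is closed if its complement is open. The system of open sets in $X = \mathbb{R}^n$, $\mathcal{O}^n$, has the following properties:
$$\emptyset, X \in \mathcal{O}^n, \tag{$\mathcal{O}_1$}$$
$$U, V \in \mathcal{O}^n \implies U \cap V \in \mathcal{O}^n, \tag{$\mathcal{O}_2$}$$
$$U_i \in \mathcal{O}^n,\ i \in I \text{ (arbitrary)} \implies \bigcup_{i \in I} U_i \in \mathcal{O}^n. \tag{$\mathcal{O}_3$}$$
Note, however, that countable or arbitrary intersections of open sets need not be open [✓]. A family $\mathcal{O}$ of subsets of a general space $X$ satisfying the conditions ($\mathcal{O}_1$)–($\mathcal{O}_3$) is called a topology, and the pair $(X, \mathcal{O})$ is called a topological space; in analogy to $\mathbb{R}^n$, $U \in \mathcal{O}$ is said to be open while closed sets are exactly the complements of open sets; see Appendix B.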
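3.6 Definition The $\sigma$-algebra $\sigma(\mathcal{O}^n)$ generated by the open sets $\mathcal{O}^n$ of $\mathbb{R}^n$ is called Borel $\sigma$-algebra, and its members are the Borel sets or Borel measurable sets. We write $\mathcal{B}(\mathbb{R}^n)$ or $\mathcal{B}^n$ for the Borel sets in $\mathbb{R}^n$.
The Borel sets are fundamental for the study of measures on $\mathbb{R}^n$. Since the Borel $\sigma$-algebra depends on the topology of $\mathbb{R}^n$, $\mathcal{B}(\mathbb{R}^n)$ is often also called the topological $\sigma$-algebra.
3.7 Theorem Denote by $\mathcal{O}^n$, $\mathcal{C}^n$ and $\mathcal{K}^n$ the families of open, closed and compact¹ sets in $\mathbb{R}^n$. Then
$$\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{O}^n) = \sigma(\mathcal{C}^n) = \sigma(\mathcal{K}^n).$$
Proof Since compact sets are closed, we have $\mathcal{K}^n \subset \mathcal{C}^n$ and by Remark 3.5(iii), $\sigma(\mathcal{K}^n) \subset \sigma(\mathcal{C}^n)$. On the other hand, if $C \in \mathcal{C}^n$, then $C_k := C \cap \overline{B_k(0)}$ is² closed and bounded, hence $C_k \in \mathcal{K}^n$. By construction $C = \bigcup_{k \in \mathbb{N}} C_k$, thus $\mathcal{C}^n \subset \sigma(\mathcal{K}^n)$ and also $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{K}^n)$.
Since $(\mathcal{O}^n)^c := \{U^c : U \in \mathcal{O}^n\} = \mathcal{C}^n$ (and $(\mathcal{C}^n)^c = \mathcal{O}^n$) we have $\mathcal{C}^n = (\mathcal{O}^n)^c \subset \sigma(\mathcal{O}^n)$, hence $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{O}^n)$ and the converse inclusion is similar.
1 i.e. closed and bounded.
2 $B_k(0)$, $\overline{B_k(0)}$ denote the open, resp., closed balls with centre 0 and radius $k$.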
The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^n)$ is generated by many different systems of sets. For our purposes the most interesting generators are the families of open rectangles
$$\mathcal{J}^o = \mathcal{J}^{o,n} = \mathcal{J}^o(\mathbb{R}^n) = \{(a_1, b_1) \times \cdots \times (a_n, b_n) : a_j, b_j \in \mathbb{R}\}$$
and (from the right) half-open rectangles
$$\mathcal{J} = \mathcal{J}^n = \mathcal{J}(\mathbb{R}^n) = \{[a_1, b_1) \times \cdots \times [a_n, b_n) : a_j, b_j \in \mathbb{R}\}.$$
We use the convention that $[a_j, b_j) = (a_j, b_j) = \emptyset$ if $b_j \leq a_j$ and, of course, that $[a_1, b_1) \times \cdots \times \emptyset \times \cdots \times [a_n, b_n) = \emptyset$. Sometimes we use the shorthand $[[a, b)) = [a_1, b_1) \times \cdots \times [a_n, b_n)$ for vectors $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n)$ from $\mathbb{R}^n$. Finally, we write $\mathcal{J}_{\mathrm{rat}}$, $\mathcal{J}^o_{\mathrm{rat}}$ for the (half-)open rectangles with only rational endpoints. Notice that the half-open rectangles are intervals in $\mathbb{R}$, rectangles in $\mathbb{R}^2$, cuboids in $\mathbb{R}^3$, and hypercubes in dimensions $n > 3$.
[Figure: a half-open interval, rectangle and cuboid with corners labelled $a$ and $b$.]
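3.8 Theorem We have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n_{\mathrm{rat}}) = \sigma(\mathcal{J}^{o,n}_{\mathrm{rat}}) = \sigma(\mathcal{J}^n) = \sigma(\mathcal{J}^{o,n})$.
Proof We begin with open rectangles having rational endpoints. Since the open rectangle $((a, b))$ is an open set [✓], we find $\sigma(\mathcal{O}^n) \supset \sigma(\mathcal{J}^o) \supset \sigma(\mathcal{J}^o_{\mathrm{rat}})$.
Conversely, if $U \in \mathcal{O}^n$, we have
$$U = \bigcup_{I \in \mathcal{J}^o_{\mathrm{rat}},\ I \subset U} I. \tag{3.2}$$
Here '$\supset$' is clear from the definition and for the other direction '$\subset$' we fix $x \in U$. Since $U$ is open, there is some ball $B_\epsilon(x) \subset U$ and we can inscribe a square into $B_\epsilon(x)$ and then shrink this square to get a rectangle $I = I(x) \in \mathcal{J}^o_{\mathrm{rat}}$ containing $x$.
[Figure: an open set $U$ containing the ball $B_\epsilon(x)$, with the inscribed rectangle $I(x)$.]
Since every rectangle $I$ is uniquely determined by its main diagonal, there are at most $\#(\mathbb{Q}^n \times \mathbb{Q}^n) = \#\mathbb{N}$ many $I$ in the union (3.2). Thus
$$U \in \mathcal{O}^n \subset \sigma(\mathcal{J}^o_{\mathrm{rat}}),$$
proving the other inclusion $\sigma(\mathcal{O}^n) \subset \sigma(\mathcal{J}^o_{\mathrm{rat}})$, and so $\sigma(\mathcal{O}^n) = \sigma(\mathcal{J}^o) = \sigma(\mathcal{J}^o_{\mathrm{rat}})$.
Every half-open rectangle (with rational endpoints) can be written as
$$[a_1, b_1) \times \cdots \times [a_n, b_n) = \bigcap_{j \in \mathbb{N}} \big(a_1 - \tfrac{1}{j}, b_1\big) \times \cdots \times \big(a_n - \tfrac{1}{j}, b_n\big),$$
while every open rectangle (with rational endpoints) can be represented as
$$(c_1, d_1) \times \cdots \times (c_n, d_n) = \bigcup_{j \in \mathbb{N}} \big[c_1 + \tfrac{1}{j}, d_1\big) \times \cdots \times \big[c_n + \tfrac{1}{j}, d_n\big).$$
These formulae imply that $\mathcal{J} \subset \sigma(\mathcal{J}^o)$ and $\mathcal{J}^o \subset \sigma(\mathcal{J})$ [resp. $\mathcal{J}_{\mathrm{rat}} \subset \sigma(\mathcal{J}^o_{\mathrm{rat}})$ and $\mathcal{J}^o_{\mathrm{rat}} \subset \sigma(\mathcal{J}_{\mathrm{rat}})$], hence by Remark 3.5(iii), $\sigma(\mathcal{J}^o) = \sigma(\mathcal{J})$ [resp. $\sigma(\mathcal{J}^o_{\mathrm{rat}}) = \sigma(\mathcal{J}_{\mathrm{rat}})$] and the proof follows since we know $\sigma(\mathcal{J}^o_{\mathrm{rat}}) = \sigma(\mathcal{J}^o) = \sigma(\mathcal{O}^n)$ from the first part.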
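3.9 Remark The Borel sets of the real line $\mathbb{R}$ are also generated by any of the following systems:
$\{(-\infty, a) : a \in \mathbb{Q}\}$, $\{(-\infty, a) : a \in \mathbb{R}\}$,
$\{(-\infty, b] : b \in \mathbb{Q}\}$, $\{(-\infty, b] : b \in \mathbb{R}\}$,
$\{(c, \infty) : c \in \mathbb{Q}\}$, $\{(c, \infty) : c \in \mathbb{R}\}$,
$\{[d, \infty) : d \in \mathbb{Q}\}$, $\{[d, \infty) : d \in \mathbb{R}\}$.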
3.10 Remark One might think that $\sigma(\mathcal{G})$ can be explicitly constructed for any given $\mathcal{G}$ by adding to the family $\mathcal{G}$ all possible countable unions of its members and their complements:
$$\mathcal{G}^{\sigma c} := \Big\{ \bigcup_{j \in \mathbb{N}} G_j,\ \Big(\bigcup_{j \in \mathbb{N}} G_j\Big)^c : G_j \in \mathcal{G} \Big\}.$$
But $\mathcal{G}^{\sigma c}$ is not necessarily a $\sigma$-algebra [✓]. Even if we repeat this procedure countably often, i.e.
$$\mathcal{G}_n := \underbrace{(\ldots(\mathcal{G}^{\sigma c})^{\sigma c} \ldots)^{\sigma c}}_{n \text{ times}}, \qquad \hat{\mathcal{G}} := \bigcup_{n \in \mathbb{N}} \mathcal{G}_n,$$
we end up, in general, with a set that is too small: $\hat{\mathcal{G}} \subsetneq \sigma(\mathcal{G})$.³
This shows that the $\sigma$-operation produces a pretty big family; so big, in fact, that no approach using countably many countable set operations will give the whole of $\sigma(\mathcal{G})$. On the other hand, it is rather typical that a $\sigma$-algebra is given through its generator. In order to deal with these cases, we need the notion of Dynkin systems which will be introduced in Chapter 5.
3 A 'constructive' approach along these lines is nevertheless possible if we use transfinite induction, see Hewitt and Stromberg [20, Theorem 10.23] or Appendix D.
Problems
3.1. Let $\mathcal{A}$ be a $\sigma$-algebra. Show that
(i) if $A_1, A_2, \ldots, A_N \in \mathcal{A}$, then $A_1 \cap A_2 \cap \ldots \cap A_N \in \mathcal{A}$;
(ii) $A \in \mathcal{A}$ if, and only if, $A^c \in \mathcal{A}$;
(iii) if $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$ and $A \,\triangle\, B \in \mathcal{A}$.
3.2. Prove the assertions made in Example 3.3 (iv), (vi) and (vii).
[Hint: use (2.5) for (vii).]
3.3. Verify the assertions made in Remark 3.5.
3.4. Let $X = [0, 1]$. Find the $\sigma$-algebra generated by the sets
(i) $\big(0, \tfrac{1}{2}\big)$;
(ii) $\big[0, \tfrac{1}{4}\big)$, $\big(\tfrac{3}{4}, 1\big]$;
(iii) $\big[0, \tfrac{3}{4}\big]$, $\big[\tfrac{1}{4}, 1\big]$.
3.5. Let $A_1, A_2, \ldots, A_N$ be subsets of $X$.
(i) If the $A_j$ are disjoint and $\dot{\bigcup} A_j = X$, then $\#\sigma(A_1, A_2, \ldots, A_N) = 2^N$.
Remark. A set $A$ in a $\sigma$-algebra $\mathcal{A}$ is called an atom, if there is no non-empty proper subset $B \subsetneq A$ such that $B \in \mathcal{A}$. In this sense all $A_j$ are atoms.
(ii) Show that $\sigma(A_1, A_2, \ldots, A_N)$ consists of finitely many sets.
[Hint: show that $\sigma(A_1, A_2, \ldots, A_N)$ has only finitely many atoms.]
3.6. Verify the properties ($\mathcal{O}_1$)–($\mathcal{O}_3$) for open sets in $\mathbb{R}^n$. Is $\mathcal{O}^n$ a $\sigma$-algebra?
3.7. Find an example (e.g. in $\mathbb{R}$) showing that $\bigcap_{j \in \mathbb{N}} U_j$ need not be open even if all $U_j$ are open sets.
3.8. Prove any one of the assertions made in Remark 3.9.
3.9. Is this still true for the family $\mathbb{B}' := \{B_r(x) : x \in \mathbb{Q}^n,\ r \in \mathbb{Q}^+\}$?
[Hint: mimic the proof of T3.8.]
3.10. Let $\mathcal{O}^n$ be the collection of open sets (topology) in $\mathbb{R}^n$ and let $A \subset \mathbb{R}^n$ be an arbitrary subset. We can introduce a topology $\mathcal{O}_A$ on $A$ as follows: a set $V \subset A$ is called open (relative to $A$) if $V = U \cap A$ for some $U \in \mathcal{O}^n$. We write $\mathcal{O}_A$ for the open sets relative to $A$.
(i) Show that $\mathcal{O}_A$ is a topology on $A$, i.e. a family satisfying ($\mathcal{O}_1$)–($\mathcal{O}_3$).
(ii) If $A \in \mathcal{B}(\mathbb{R}^n)$, show that the trace $\sigma$-algebra $A \cap \mathcal{B}(\mathbb{R}^n)$ coincides with $\sigma(\mathcal{O}_A)$ (the latter is usually denoted by $\mathcal{B}(A)$: the Borel sets relative to $A$).
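3.11. Monotone classes. A family $\mathcal{M} \subset \mathcal{P}(X)$ is called a monotone class if it is stable under countable unions and countable intersections, i.e.
$$(A_j)_{j \in \mathbb{N}} \subset \mathcal{M} \implies \bigcup_{j \in \mathbb{N}} A_j,\ \bigcap_{j \in \mathbb{N}} A_j \in \mathcal{M}.$$
(i) Mimic the proof of T3.4 to show that for every $\mathcal{E} \subset \mathcal{P}(X)$ there is a smallest monotone class $\mathfrak{m}(\mathcal{E})$ containing $\mathcal{E}$.
(ii) Assume that $\emptyset \in \mathcal{E}$ and that $E \in \mathcal{E} \implies E^c \in \mathcal{E}$. Show that the system $\Sigma := \{B \in \mathfrak{m}(\mathcal{E}) : B^c \in \mathfrak{m}(\mathcal{E})\}$ is a $\sigma$-algebra.
(iii) Show that in (ii) $\mathcal{E} \subset \Sigma \subset \mathfrak{m}(\mathcal{E}) \subset \sigma(\mathcal{E})$ holds and conclude that $\mathfrak{m}(\mathcal{E}) = \sigma(\mathcal{E})$.
3.12. Alternative characterization of $\mathcal{B}(\mathbb{R}^n)$. In older books the Borel sets are often introduced as the smallest family $\mathcal{M}$ of sets which is stable under countable intersections and countable unions and which contains all open sets $\mathcal{O}^n$. The purpose of this exercise is to verify that $\mathcal{M} = \mathcal{B}(\mathbb{R}^n)$. Show that
(i) $\mathcal{M}$ is well-defined and $\mathcal{M} \subset \sigma(\mathcal{O}^n)$;
(ii) $U \in \mathcal{O}^n \implies U^c \in \mathcal{M}$, i.e. $\mathcal{M}$ contains all closed sets;
(iii) $\{B \in \mathcal{M} : B^c \in \mathcal{M}\}$ is a $\sigma$-algebra;
(iv) $\sigma(\mathcal{O}^n) \subset \{B \in \mathcal{M} : B^c \in \mathcal{M}\} \subset \mathcal{M}$.
[Hints: (i) mimic T3.4(ii); (ii) every closed set $F$ is the intersection of the open sets $U_n := F + B_{1/n}(0) := \{y : |x - y| < 1/n \text{ for some } x \in F\}$, $n \in \mathbb{N}$.]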
4
Measures
We are now ready to introduce one of the central concepts of measure and integration theory: measures. As before, $X$ is some set and $\mathcal{A}$ is a $\sigma$-algebra on $X$.
4.1 Definition A (positive) measure $\mu$ on $X$ is a mapping $\mu : \mathcal{A} \to [0, \infty]$ defined on a $\sigma$-algebra $\mathcal{A}$ satisfying
$$\mu(\emptyset) = 0, \tag{$M_1$}$$
and, for any countable family of pairwise disjoint sets $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$,
$$(\sigma\text{-additivity}) \qquad \mu\Big(\dot{\bigcup_{j \in \mathbb{N}}} A_j\Big) = \sum_{j \in \mathbb{N}} \mu(A_j). \tag{$M_2$}$$
If ($M_1$), ($M_2$) hold, but $\mathcal{A}$ is not a $\sigma$-algebra, $\mu$ is said to be a pre-measure.
Caution: ($M_2$) requires implicitly that $\dot{\bigcup}_j A_j$ is again in $\mathcal{A}$. This is clearly the case for $\sigma$-algebras, but needs special attention if one deals with pre-measures.
4.2 Definition Let $X$ be a set and $\mathcal{A}$ be a $\sigma$-algebra on $X$. The pair $(X, \mathcal{A})$ is called measurable space. If $\mu$ is a measure on $X$, $(X, \mathcal{A}, \mu)$ is called measure space.
A finite measure is a measure with $\mu(X) < \infty$ and a probability measure is a measure with $\mu(X) = 1$. The corresponding measure spaces are called finite measure space resp. probability space.
An exhausting sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ is an increasing sequence of sets $A_1 \subset A_2 \subset A_3 \subset \ldots$ such that $\bigcup_{j \in \mathbb{N}} A_j = X$. A measure $\mu$ is said to be $\sigma$-finite and $(X, \mathcal{A}, \mu)$ is called a $\sigma$-finite measure space, if $\mathcal{A}$ contains an exhausting sequence $(A_j)_{j \in \mathbb{N}}$ such that $\mu(A_j) < \infty$ for all $j \in \mathbb{N}$.
Let us derive some immediate properties of (pre-)measures.
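4.3 Proposition Let $(X, \mathcal{A}, \mu)$ be a measure space and $A, B \in \mathcal{A}$. Then
(i) $A \cap B = \emptyset \implies \mu(A \mathbin{\dot{\cup}} B) = \mu(A) + \mu(B)$ (finitely additive)
(ii) $A \subset B \implies \mu(A) \leq \mu(B)$ (monotone)
(iii) $A \subset B,\ \mu(A) < \infty \implies \mu(B \setminus A) = \mu(B) - \mu(A)$
(iv) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$ (strongly additive)
(v) $\mu(A \cup B) \leq \mu(A) + \mu(B)$ (subadditive)
Proof (i) Set $A_1 := A$, $A_2 := B$, $A_3 = A_4 = \ldots = \emptyset$. Then $(A_j)_{j \in \mathbb{N}}$ is a family of pairwise disjoint sets from $\mathcal{A}$. Moreover $A \mathbin{\dot{\cup}} B = \dot{\bigcup}_j A_j$ and by ($M_2$)
$$\mu(A \mathbin{\dot{\cup}} B) = \mu\Big(\dot{\bigcup_{j \in \mathbb{N}}} A_j\Big) = \sum_{j \in \mathbb{N}} \mu(A_j) = \mu(A) + \mu(B) + \mu(\emptyset) + \ldots = \mu(A) + \mu(B).$$
(ii) If $A \subset B$, we have $B = A \mathbin{\dot{\cup}} (B \setminus A)$, and by (i)
$$\mu(B) = \mu(A \mathbin{\dot{\cup}} (B \setminus A)) = \mu(A) + \mu(B \setminus A) \tag{4.1}$$
$$\geq \mu(A). \tag{4.2}$$
(iii) If $A \subset B$, we can subtract the finite number $\mu(A)$ from both sides of (4.1) to get $\mu(B) - \mu(A) = \mu(B \setminus A)$.
(iv) For all $A, B \in \mathcal{A}$ we have
$$A \cup B = \big(A \setminus (A \cap B)\big) \mathbin{\dot{\cup}} (A \cap B) \mathbin{\dot{\cup}} \big(B \setminus (A \cap B)\big)$$
and using (i) twice we get
$$\mu(A \cup B) = \mu(A \setminus (A \cap B)) + \mu(A \cap B) + \mu(B \setminus (A \cap B)).$$
Adding $\mu(A \cap B)$ (which may assume the value $+\infty$) on both sides and using again (4.1) yields
$$\mu(A \cup B) + \mu(A \cap B) = \mu(A \setminus (A \cap B)) + \mu(A \cap B) + \mu(B \setminus (A \cap B)) + \mu(A \cap B) = \mu(A) + \mu(B).$$
(v) From (iv) we get $\mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B) \geq \mu(A \cup B)$ for all $A, B \in \mathcal{A}$.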
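As a sanity check, the identities of Proposition 4.3 can be verified numerically for a concrete finite measure. The sketch below uses a weighted measure on the power set of a four-point set; the point weights and the names `weights` and `mu` are illustrative assumptions, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical point weights on X = {a, b, c, d}; mu(A) is the total weight of A.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(2)}

def mu(A):
    """Finite measure on the power set of X given by summing point weights."""
    return sum((weights[x] for x in A), Fraction(0))

A = {"a", "b"}
B = {"b", "c", "d"}

strongly_additive = mu(A | B) + mu(A & B) == mu(A) + mu(B)   # Proposition 4.3(iv)
subadditive = mu(A | B) <= mu(A) + mu(B)                     # Proposition 4.3(v)
monotone = mu(A & B) <= mu(A)                                # Proposition 4.3(ii)
difference = mu(A - (A & B)) == mu(A) - mu(A & B)            # Proposition 4.3(iii)
print(strongly_additive, subadditive, monotone, difference)
```

Such a check proves nothing, of course, but it is a quick way to catch a misremembered formula.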
So far we have not really used the $\sigma$-additivity of $\mu$ in its full strength. The next theorem shows that $\sigma$-additivity is, in fact, some kind of continuity condition for (pre-)measures.
We call a sequence of sets $(A_j)_{j \in \mathbb{N}}$ increasing, if $A_1 \subset A_2 \subset A_3 \subset \ldots$ and we write in this case $A_j \uparrow A$ with limit $A = \bigcup_j A_j$. Decreasing sequences of sets are defined accordingly and we write $A_j \downarrow A$ with limit $A = \bigcap_j A_j$. All $\sigma$-algebras are stable under increasing or decreasing limits of their members.
4.4 Theorem Let $(X, \mathcal{A})$ be a measurable space. A map $\mu : \mathcal{A} \to [0, \infty]$ is a measure if, and only if,
(i) $\mu(\emptyset) = 0$,
(ii) $\mu(A \mathbin{\dot{\cup}} B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$,
(iii) (continuity of measures from below) for any increasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \uparrow A \in \mathcal{A}$ we have
$$\mu(A) = \lim_{j \to \infty} \mu(A_j) \quad \Big(= \sup_{j \in \mathbb{N}} \mu(A_j)\Big).$$
If $\mu(A) < \infty$ for all $A \in \mathcal{A}$, (iii) can be replaced by either of the following equivalent conditions
(iii′) (continuity of measures from above) for any decreasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow A \in \mathcal{A}$ we have
$$\mu(A) = \lim_{j \to \infty} \mu(A_j) \quad \Big(= \inf_{j \in \mathbb{N}} \mu(A_j)\Big);$$
(iii′′) (continuity of measures at $\emptyset$) for any decreasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow \emptyset$ we have
$$\lim_{j \to \infty} \mu(A_j) = 0.$$
4.5 Remark With some obvious rewordings, P4.3 and T4.4 are still valid for pre-measures, i.e. for families $\mathcal{A}$ which are not $\sigma$-algebras. Of course, one has to make sure that $\emptyset \in \mathcal{A}$ and that $\mathcal{A}$ is stable under finite unions, intersections and differences of sets¹ (for P4.3) and, for T4.4, that increasing and decreasing sequences of the sets under consideration have their limits in $\mathcal{A}$. The proofs are literally the same.
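Proof (of Theorem 4.4) Let us, first of all, check that every measure $\mu$ enjoys all the properties (i)–(iii) and (iii′), (iii′′). Property (i) is clear from the definition of a measure and (ii) follows from P4.3(i). Let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be an increasing sequence of sets $A_j \uparrow A$ and set
$$B_1 := A_1, \quad \ldots, \quad B_{j+1} := A_{j+1} \setminus A_j.$$
Obviously, $B_j \in \mathcal{A}$, the $B_j$ are pairwise disjoint, $A_k = \dot{\bigcup}_{j=1}^{k} B_j$ and $\bigcup_{k \in \mathbb{N}} A_k = \dot{\bigcup}_{j \in \mathbb{N}} B_j = A$. Thus
$$\mu(A) = \mu\Big(\dot{\bigcup_{j \in \mathbb{N}}} B_j\Big) = \sum_{j \in \mathbb{N}} \mu(B_j) = \lim_{k \to \infty} \sum_{j=1}^{k} \mu(B_j) = \lim_{k \to \infty} \mu(B_1 \mathbin{\dot{\cup}} \ldots \mathbin{\dot{\cup}} B_k) = \lim_{k \to \infty} \mu(A_k).$$
[Figure: the increasing sets $A_k$ decomposed into the disjoint rings $B_1$, $B_2$, $B_3$.]
If $A_j \downarrow A$ we see easily that $(A_1 \setminus A_j) \uparrow (A_1 \setminus A)$ as $j \to \infty$. Since $\mu(A_1) < \infty$, the previous argument shows that
$$\mu(A_1 \setminus A) = \lim_{j \to \infty} \mu(A_1 \setminus A_j) = \lim_{j \to \infty} \big(\mu(A_1) - \mu(A_j)\big) = \mu(A_1) - \lim_{j \to \infty} \mu(A_j).$$
This means that $\mu(A_1) - \mu(A) = \mu(A_1) - \lim_{j \to \infty} \mu(A_j)$ and (iii′) follows. If we take, in particular, $A = \emptyset$, the above calculation also proves (iii′′).
Let us now assume that (i)–(iii) hold for the set-function $\mu : \mathcal{A} \to [0, \infty]$. In order to see that $\mu$ is a measure, we have to check ($M_2$). For this take a sequence $(B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define
$$A_k := B_1 \mathbin{\dot{\cup}} \ldots \mathbin{\dot{\cup}} B_k \in \mathcal{A}, \qquad A := \bigcup_{k \in \mathbb{N}} A_k = \dot{\bigcup_{j \in \mathbb{N}}} B_j. \tag{4.3}$$
Clearly $A_k \uparrow A$, and using repeatedly property (ii) we get $\mu(A_k) = \mu(B_1) + \cdots + \mu(B_k)$. From (iii) we conclude
$$\mu(A) = \lim_{k \to \infty} \mu(A_k) = \lim_{k \to \infty} \sum_{j=1}^{k} \mu(B_j) = \sum_{j \in \mathbb{N}} \mu(B_j).$$
∗ ∗ ∗
Finally assume that $\mu(A) < \infty$ for all $A \in \mathcal{A}$ and that (i), (ii) and (iii′) or (iii′′) hold. We will show that under the finiteness assumption (iii′)⇒(iii′′)⇒(iii); the assertion follows then from the considerations of the first part of the proof.
For (iii′)⇒(iii′′) there is nothing to show. For the remaining implication take a sequence $(B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define sets $A_k$ and $A$ as in (4.3). Then $A \setminus A_k \downarrow \emptyset$ and from (iii′′) we conclude that $\lim_{k \to \infty} \mu(A \setminus A_k) = 0$. Since $\mu(A_k) < \infty$ we get $\mu(A) = \lim_{k \to \infty} \mu(A_k)$ and (iii) follows.
1 Such a family is called a ring of sets.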
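4.6 Corollary Every measure [pre-measure] is $\sigma$-subadditive, i.e.
$$\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) \leq \sum_{j \in \mathbb{N}} \mu(A_j) \tag{4.4}$$
holds for all sequences $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of not necessarily disjoint sets [such that $\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$].²
Proof Since the arguments are virtually the same, we may assume that $\mathcal{A}$ is a $\sigma$-algebra, so that $\mu$ becomes a measure. Set $B_k := A_1 \cup \ldots \cup A_k \uparrow \bigcup_{j \in \mathbb{N}} A_j$ as $k \to \infty$. By T4.4(iii) and repeated applications of P4.3(v),
$$\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) = \lim_{k \to \infty} \mu(A_1 \cup \ldots \cup A_k) \leq \lim_{k \to \infty} \big(\mu(A_1) + \cdots + \mu(A_k)\big) = \sum_{j \in \mathbb{N}} \mu(A_j).$$
2 This is automatically fulfilled for a measure on a $\sigma$-algebra.
It is about time to give some examples of measures. At this stage this is, unfortunately, a somewhat difficult task! The main problem is that we have to explain for every set of the $\sigma$-algebra $\mathcal{A}$ what its measure $\mu(A)$ shall be. Since $\mathcal{A}$ can be very large (see Remark 3.10) this is, in general, only (explicitly!) possible if either $\mu$ or $\mathcal{A}$ is very simple. Nevertheless ...
4.7 Examples (i) (Dirac measure, unit mass) Let $(X, \mathcal{A})$ be any measurable space and let $x \in X$ be some point. Then $\delta_x : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by
$$\delta_x(A) := \begin{cases} 0 & \text{if } x \notin A, \\ 1 & \text{if } x \in A, \end{cases}$$
is a measure. It is called Dirac's delta measure or unit mass at the point $x$.
(ii) Consider $(\mathbb{R}, \mathcal{A})$ with $\mathcal{A}$ from Example 3.3(v) (i.e. $A \in \mathcal{A}$ if $A$ or $A^c$ is countable). Then $\gamma : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by
$$\gamma(A) := \begin{cases} 0 & \text{if } A \text{ is countable}, \\ 1 & \text{if } A^c \text{ is countable}, \end{cases}$$
is a measure.
(iii) (Counting measure) Let $(X, \mathcal{A})$ be a measurable space. Then
$$|A| := \begin{cases} \#A & \text{if } A \text{ is finite}, \\ +\infty & \text{if } A \text{ is infinite}, \end{cases}$$
defines a measure. It is called counting measure.
(iv) (Discrete probability measure) Let $\Omega = \{\omega_1, \omega_2, \ldots\}$ be a countable set and $(p_j)_{j \in \mathbb{N}}$ be a sequence of real numbers $p_j \in [0, 1]$ such that $\sum_{j \in \mathbb{N}} p_j = 1$. On $(\Omega, \mathcal{P}(\Omega))$ the set-function
$$P(A) = \sum_{j : \omega_j \in A} p_j = \sum_{j \in \mathbb{N}} p_j\, \delta_{\omega_j}(A), \qquad A \subset \Omega,$$
defines a probability measure. The triplet $(\Omega, \mathcal{P}(\Omega), P)$ is called discrete probability space.
(v) (Trivial measures) Let $(X, \mathcal{A})$ be a measurable space. Then
$$\mu(A) := \begin{cases} 0 & \text{if } A = \emptyset, \\ +\infty & \text{if } A \neq \emptyset, \end{cases} \qquad\text{and}\qquad \nu(A) := 0, \quad A \in \mathcal{A},$$
are measures.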
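Three of these examples are easy to model for finite sets, which may help in seeing how the discrete probability measure is built from Dirac measures. The following is a sketch only: the helper names `dirac`, `counting` and `discrete_probability` are made up for illustration, `counting` only handles finite Python sets, and the fair die at the end is an assumed example, not one taken from the text.

```python
from fractions import Fraction

def dirac(x):
    """Dirac measure (unit mass) at the point x, as a function of the set A."""
    return lambda A: 1 if x in A else 0

def counting(A):
    """Counting measure, restricted here to finite sets."""
    return len(A)

def discrete_probability(p):
    """P(A) = sum_j p_j * delta_{omega_j}(A) for weights p: omega -> p_omega."""
    if sum(p.values()) != 1:
        raise ValueError("the weights must sum to 1")
    return lambda A: sum((p[w] * dirac(w)(A) for w in p), Fraction(0))

# Assumed example: a fair die on Omega = {1, ..., 6}.
P = discrete_probability({k: Fraction(1, 6) for k in range(1, 7)})
print(P({2, 4, 6}), counting({2, 4, 6}), dirac(3)({1, 2}))
```

Using `Fraction` weights keeps the normalisation check exact; with floating-point weights the comparison with 1 would need a tolerance.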
Note that our list of examples does not include the most familiar of all measures:
length, area and volume.
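4.8 Definition The set-function $\lambda^n$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ that assigns every half-open rectangle $[[a, b)) = [a_1, b_1) \times \cdots \times [a_n, b_n) \in \mathcal{J}$ the value
$$\lambda^n\big([[a, b))\big) := \prod_{j=1}^{n} (b_j - a_j)$$
is called n-dimensional Lebesgue measure.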
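On half-open rectangles the defining formula is a plain product of side lengths, so it can be evaluated directly. The sketch below is an illustration under stated assumptions: the name `rectangle_volume` is hypothetical, rectangles are passed as corner tuples `a` and `b`, and a rectangle with some $b_j \leq a_j$ is treated as empty with value 0, following the convention for half-open rectangles used in this text.

```python
from math import prod

def rectangle_volume(a, b):
    """Value of lambda^n on the half-open rectangle [a_1, b_1) x ... x [a_n, b_n)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same dimension")
    if any(bj <= aj for aj, bj in zip(a, b)):
        return 0              # the rectangle is empty by convention
    return prod(bj - aj for aj, bj in zip(a, b))

a, b = (0, 0), (2, 3)
x = (5, -1)                                  # a translation vector
shifted_a = tuple(aj + xj for aj, xj in zip(a, x))
shifted_b = tuple(bj + xj for bj, xj in zip(b, x))
print(rectangle_volume(a, b), rectangle_volume(shifted_a, shifted_b))
```

The second call evaluates the same formula on the translated rectangle $x + [[a, b))$; the side lengths $b_j - a_j$ do not change under the shift.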
The problem here is that we do not know whether $\lambda^n$ is a measure in the sense of Definition 4.1: $\lambda^n$ is only explicitly given on the half-open rectangles $\mathcal{J}$ and it is not obvious at all that $\lambda^n$ is a pre-measure on $\mathcal{J}$; much less clear is the question if and how we can extend this pre-measure from $\mathcal{J}$ to a proper measure on $\sigma(\mathcal{J})$.
Over the next few chapters we will see that such an extension is indeed possible. But this requires some extra work and a more abstract approach. One of the main obstacles is, of course, that $\sigma(\mathcal{J})$ cannot be obtained by a bare-hands construction from $\mathcal{J}$.
Let us, meanwhile, note the upshot of what will be proved in the next chapters.
4.9 Theorem Lebesgue measure $\lambda^n$ exists, is a measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$ and is unique. Moreover, $\lambda^n$ enjoys the following additional properties for $B \in \mathcal{B}(\mathbb{R}^n)$:
(i) $\lambda^n$ is invariant under translations: $\lambda^n(x + B) = \lambda^n(B)$, $x \in \mathbb{R}^n$;
(ii) $\lambda^n$ is invariant under motions: $\lambda^n(R^{-1}(B)) = \lambda^n(B)$ where $R$ is a motion, i.e. a combination of translations, rotations and reflections;
(iii) $\lambda^n(M^{-1}(B)) = |\det M|^{-1}\, \lambda^n(B)$ for any invertible matrix $M \in \mathbb{R}^{n \times n}$.
The attentive reader will have noticed that the sets $x + B := \{x + y : y \in B\}$, $R^{-1}(B) := \{R^{-1}(y) : y \in B\}$ and $M^{-1}(B)$ must again be Borel sets, otherwise the statement of T4.9 would be senseless, cf. T5.8 and Chapter 7.
Problems
4.1. Extend Proposition 4.3(i), (iv) and (v) to finitely many sets $A_1, A_2, \ldots, A_N \in \mathcal{A}$.
4.2. Check that the set-functions defined in Example 4.7 are measures in the sense of Definition 4.1.
4.3. Is the set-function $\gamma$ of 4.7 (ii) still a measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$? And on the measurable space $(\mathbb{Q}, \mathbb{Q} \cap \mathcal{B}(\mathbb{R}))$?
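4.4. Let $X = \mathbb{R}$. For which $\sigma$-algebras are the following set-functions measures:
(i) $\mu(A) = \begin{cases} 0, & \text{if } A = \emptyset, \\ 1, & \text{if } A \neq \emptyset; \end{cases}$
(ii) $\nu(A) = \begin{cases} 0, & \text{if } A \text{ is finite}, \\ 1, & \text{if } A^c \text{ is finite?} \end{cases}$
4.5. Find an example showing that the finiteness condition in Theorem 4.4 (iii′) or (iii′′) is essential.
[Hint: use Lebesgue measure or the counting measure on infinite tails $[k, \infty) \downarrow \emptyset$.]
4.6. Let $(X, \mathcal{A})$ be a measurable space.
(i) Let $\mu, \nu$ be two measures on $(X, \mathcal{A})$. Show that for all $a, b \geq 0$ the set-function $\rho(A) := a\mu(A) + b\nu(A)$, $A \in \mathcal{A}$, is again a measure.
(ii) Let $\mu_1, \mu_2, \ldots$ be countably many measures on $(X, \mathcal{A})$ and let $(\alpha_j)_{j \in \mathbb{N}}$ be a sequence of positive numbers. Show that $\mu(A) := \sum_{j=1}^{\infty} \alpha_j \mu_j(A)$, $A \in \mathcal{A}$, is again a measure.
[Hint: to show $\sigma$-additivity use (and prove) the following helpful lemma: for any double sequence $\beta_{ij}$, $i, j \in \mathbb{N}$, of real numbers we have
$$\sup_{i \in \mathbb{N}} \sup_{j \in \mathbb{N}} \beta_{ij} = \sup_{j \in \mathbb{N}} \sup_{i \in \mathbb{N}} \beta_{ij}.$$
Thus $\lim_{i \to \infty} \lim_{j \to \infty} \beta_{ij} = \lim_{j \to \infty} \lim_{i \to \infty} \beta_{ij}$ if $i \mapsto \beta_{ij}$ and $j \mapsto \beta_{ij}$ increase when the other index is fixed.]
4.7. Let $(X, \mathcal{A}, \mu)$ be a measure space and $F \in \mathcal{A}$. Show that $\mathcal{A} \ni A \mapsto \mu(A \cap F)$ defines a measure.
4.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ a sequence of sets with $P(A_j) = 1$ for all $j \in \mathbb{N}$. Show that $P\big(\bigcap_{j \in \mathbb{N}} A_j\big) = 1$.
4.9. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $(A_j)_{j \in \mathbb{N}}, (B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $A_j \supset B_j$ for all $j \in \mathbb{N}$. Show that
$$\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) - \mu\Big(\bigcup_{j \in \mathbb{N}} B_j\Big) \leq \sum_{j \in \mathbb{N}} \big(\mu(A_j) - \mu(B_j)\big).$$
[Hint: show first that $\bigcup_j A_j \setminus \bigcup_k B_k \subset \bigcup_j (A_j \setminus B_j)$ then use C4.6.]
4.10. Null sets. Let $(X, \mathcal{A}, \mu)$ be a measure space. A set $N \in \mathcal{A}$ is called a null set or $\mu$-null set if $\mu(N) = 0$. We write $\mathcal{N}_\mu$ for the family of all $\mu$-null sets. Check that $\mathcal{N}_\mu$ has the following properties:
(i) $\emptyset \in \mathcal{N}_\mu$;
(ii) if $N \in \mathcal{N}_\mu$, $M \in \mathcal{A}$ and $M \subset N$, then $M \in \mathcal{N}_\mu$;
(iii) if $(N_j)_{j \in \mathbb{N}} \subset \mathcal{N}_\mu$, then $\bigcup_{j \in \mathbb{N}} N_j \in \mathcal{N}_\mu$.
4.11. Let $\lambda$ be one-dimensional Lebesgue measure.
(i) Show that for all $x \in \mathbb{R}$ the set $\{x\}$ is a Borel set with $\lambda(\{x\}) = 0$.
[Hint: consider the intervals $[x - 1/k, x + 1/k)$, $k \in \mathbb{N}$, and use Theorem 4.4.]
(ii) Prove that $\mathbb{Q}$ is a Borel set and that $\lambda(\mathbb{Q}) = 0$ in two ways:
a) by using the first part of the problem;
b) by considering the set $C(\epsilon) := \bigcup_{k \in \mathbb{N}} [q_k - \epsilon 2^{-k}, q_k + \epsilon 2^{-k})$, where $(q_k)_{k \in \mathbb{N}}$ is an enumeration of $\mathbb{Q}$, and letting $\epsilon \to 0$.
(iii) Use the trivial fact that $[0, 1] = \bigcup_{0 \leq x \leq 1} \{x\}$ to show that a non-countable union of null sets (here: $\{x\}$) is not necessarily a null set.
4.12. Determine all null sets of the measure $\delta_a + \delta_b$, $a, b \in \mathbb{R}$, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
4.13. Completion (1). We have seen in Problem 4.10 that measurable subsets of null sets are again null sets: if $M \in \mathcal{A}$, $M \subset N \in \mathcal{A}$, $\mu(N) = 0$, then $\mu(M) = 0$; but there might be subsets of $N$ which are not in $\mathcal{A}$. This motivates the following definition: a measure space $(X, \mathcal{A}^*, \mu)$ (or a measure $\mu$) is complete if all subsets of $\mu$-null sets are again in $\mathcal{A}^*$. In other words: if all subsets of a null set are null sets.
The following exercise shows that a measure space $(X, \mathcal{A}, \mu)$ which is not yet complete can be completed.
(i) $\mathcal{A}^* := \{A \cup N : A \in \mathcal{A},\ N \text{ is a subset of some } \mathcal{A}\text{-measurable null set}\}$ is a $\sigma$-algebra satisfying $\mathcal{A} \subset \mathcal{A}^*$.
(ii) $\bar{\mu}(A^*) := \mu(A)$ for $A^* = A \cup N \in \mathcal{A}^*$ is well-defined, i.e. it is independent of the way we can write $A^*$, say as $A^* = A \cup N = B \cup M$ where $A, B \in \mathcal{A}$ and $M, N$ are subsets of null sets.
(iii) $\bar{\mu}$ is a measure on $\mathcal{A}^*$ and $\bar{\mu}(A) = \mu(A)$ for all $A \in \mathcal{A}$.
(iv) $(X, \mathcal{A}^*, \bar{\mu})$ is complete.
(v) We have $\mathcal{A}^* = \{A^* \subset X : \exists\, A, B \in \mathcal{A},\ A \subset A^* \subset B,\ \mu(B \setminus A) = 0\}$.
4.14. Restriction. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $\mathcal{B} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Denote by $\nu := \mu|_{\mathcal{B}}$ the restriction of $\mu$ to $\mathcal{B}$.
(i) Show that $\nu$ is again a measure.
(ii) Assume that $\mu$ is a finite measure [a probability measure]. Is $\nu$ still a finite measure [a probability measure]?
(iii) Does $\nu$ inherit $\sigma$-finiteness from $\mu$?
4.15. Show that a measure space $(X, \mathcal{A}, \mu)$ is $\sigma$-finite if, and only if, there exists a sequence of measurable sets $(E_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $\bigcup_{j \in \mathbb{N}} E_j = X$ and $\mu(E_j) < \infty$ for all $j \in \mathbb{N}$.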
5
Uniqueness of measures
Before we embark on the proof of the existence of measures in the following chapter, let us first check whether it is enough to consider measures on some generator $\mathcal{G}$ of a $\sigma$-algebra; otherwise our construction of Lebesgue measure would be flawed from the start.
As mentioned in Remark 3.10 a major problem is that, apart from trivial cases, $\sigma(\mathcal{G})$ cannot be constructively obtained from $\mathcal{G}$. To overcome this obstacle we need a new concept.
5.1 Definition A family $\mathcal{D} \subset \mathcal{P}(X)$ is a Dynkin system if
$$X \in \mathcal{D}, \tag{$\Delta_1$}$$
$$D \in \mathcal{D} \implies D^c \in \mathcal{D}, \tag{$\Delta_2$}$$
$$(D_j)_{j \in \mathbb{N}} \subset \mathcal{D} \text{ pairwise disjoint} \implies \dot{\bigcup_{j \in \mathbb{N}}} D_j \in \mathcal{D}. \tag{$\Delta_3$}$$
5.2 Remark As for $\sigma$-algebras, cf. Properties 3.2, one sees that $\emptyset \in \mathcal{D}$ and that finite disjoint unions are again in $\mathcal{D}$: $D, E \in \mathcal{D},\ D \cap E = \emptyset \implies D \mathbin{\dot{\cup}} E \in \mathcal{D}$. Of course, every $\sigma$-algebra is a Dynkin system, but the converse is, in general, wrong [✓], Problem 5.2.
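A standard small example of a Dynkin system that is not a $\sigma$-algebra is the family of all subsets of $\{1, 2, 3, 4\}$ with an even number of elements. The sketch below checks this by brute force; it is an illustration chosen here and not necessarily the example intended in Problem 5.2, the function names are hypothetical, and the finite ground set lets us test only pairwise disjoint unions, since finite disjoint unions follow by induction.

```python
from itertools import combinations

def is_dynkin_system(X, family):
    """Check (Delta_1)-(Delta_3) for a family of subsets of a finite set X."""
    X = frozenset(X)
    fam = {frozenset(D) for D in family}
    if X not in fam:                                  # (Delta_1)
        return False
    if any(X - D not in fam for D in fam):            # (Delta_2)
        return False
    # (Delta_3): unions of two disjoint members; longer disjoint unions follow by induction
    return all(D | E in fam for D, E in combinations(fam, 2) if not (D & E))

def is_intersection_stable(family):
    """Check stability under finite intersections."""
    fam = {frozenset(D) for D in family}
    return all(D & E in fam for D, E in combinations(fam, 2))

X = {1, 2, 3, 4}
even_sets = [set(c) for r in (0, 2, 4) for c in combinations(sorted(X), r)]
print(is_dynkin_system(X, even_sets), is_intersection_stable(even_sets))
```

The family fails to be a $\sigma$-algebra because, for instance, $\{1, 2\} \cup \{2, 3\} = \{1, 2, 3\}$ has three elements; equivalently $\{1, 2\} \cap \{2, 3\} = \{2\}$ is not in the family.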
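5.3 Proposition Let $\mathcal{G} \subset \mathcal{P}(X)$. Then there is a smallest (also minimal, coarsest) Dynkin system $\delta(\mathcal{G})$ containing $\mathcal{G}$. $\delta(\mathcal{G})$ is called the Dynkin system generated by $\mathcal{G}$. Moreover, $\mathcal{G} \subset \delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.
Proof The proof that $\delta(\mathcal{G})$ exists parallels the proof of T3.4(ii). As in the case of $\sigma$-algebras, $\delta(\mathcal{D}) = \mathcal{D}$ if $\mathcal{D}$ is a Dynkin system (by minimality) and so $\delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$. Hence, $\mathcal{G} \subset \sigma(\mathcal{G})$ implies that $\delta(\mathcal{G}) \subset \delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$.
It is important to know when a Dynkin system is already a $\sigma$-algebra.
5.4 Lemma A Dynkin system $\mathcal{D}$ is a $\sigma$-algebra if, and only if, it is stable under finite intersections:¹ $D, E \in \mathcal{D} \implies D \cap E \in \mathcal{D}$.
Proof Since a $\sigma$-algebra is $\cap$-stable (cf. Properties 3.2, Problem 3.1) as well as a Dynkin system (Remark 5.2) it only remains to show that a $\cap$-stable Dynkin system $\mathcal{D}$ is a $\sigma$-algebra.
Let $(D_j)_{j \in \mathbb{N}}$ be a sequence of subsets in $\mathcal{D}$. We have to show that $D = \bigcup_{j \in \mathbb{N}} D_j \in \mathcal{D}$. Set $E_1 = D_1 \in \mathcal{D}$ and
$$E_{j+1} = (\ldots((D_{j+1} \setminus D_j) \setminus D_{j-1}) \setminus \ldots) \setminus D_1 = D_{j+1} \cap D_j^c \cap D_{j-1}^c \cap \ldots \cap D_1^c \in \mathcal{D},$$
where we used ($\Delta_2$) and the assumed $\cap$-stability of $\mathcal{D}$. The $E_j$ are obviously mutually disjoint and $D = \dot{\bigcup}_{j \in \mathbb{N}} E_j \in \mathcal{D}$ by ($\Delta_3$).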
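Lemma 5.4 is not applicable if $\mathcal{D}$ is given in terms of a generator $\mathcal{G}$, which is often the case. The next theorem is very important for applications as it extends Lemma 5.4 to the much more convenient setting of generators.
5.5 Theorem If $\mathcal{G} \subset \mathcal{P}(X)$ is stable under finite intersections, then $\delta(\mathcal{G}) = \sigma(\mathcal{G})$.
Proof We have already established $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$ in P5.3. If we knew that $\delta(\mathcal{G})$ were a $\sigma$-algebra, the minimality of $\sigma(\mathcal{G})$ and $\mathcal{G} \subset \delta(\mathcal{G})$ would immediately imply $\sigma(\mathcal{G}) \subset \delta(\mathcal{G})$, hence equality.
In view of L5.4 it is enough to show that $\delta(\mathcal{G})$ is $\cap$-stable. For this we fix some $D \in \delta(\mathcal{G})$ and introduce the family
$$\mathcal{D}_D := \{Q \subset X : Q \cap D \in \delta(\mathcal{G})\}.$$
Let us check that $\mathcal{D}_D$ is a Dynkin system: ($\Delta_1$) is obviously true. ($\Delta_2$): take $Q \in \mathcal{D}_D$. Then
$$Q^c \cap D = (Q^c \cup D^c) \cap D = (Q \cap D)^c \cap D = \big(\underbrace{(Q \cap D)}_{\in\, \delta(\mathcal{G})} \mathbin{\dot{\cup}} \underbrace{D^c}_{\in\, \delta(\mathcal{G})}\big)^c \tag{5.1}$$
and disjoint unions of sets from $\delta(\mathcal{G})$ are still in $\delta(\mathcal{G})$. Thus $Q^c \in \mathcal{D}_D$. ($\Delta_3$): let $(Q_j)_{j \in \mathbb{N}}$ be a sequence of pairwise disjoint sets from $\mathcal{D}_D$. By definition, $(Q_j \cap D)_{j \in \mathbb{N}}$ is a disjoint sequence in $\delta(\mathcal{G})$ and ($\Delta_3$) for the Dynkin system $\delta(\mathcal{G})$ shows
$$\Big(\dot{\bigcup_{j \in \mathbb{N}}} Q_j\Big) \cap D = \dot{\bigcup_{j \in \mathbb{N}}} (Q_j \cap D) \in \delta(\mathcal{G}),$$
which means that $\dot{\bigcup}_{j \in \mathbb{N}} Q_j \in \mathcal{D}_D$.
Since $\mathcal{G} \subset \delta(\mathcal{G})$ and since $\mathcal{G}$ is $\cap$-stable, we have $\mathcal{G} \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$ [✓]. But $\mathcal{D}_G$ is a Dynkin system and so $\delta(\mathcal{G}) \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$ (use P5.3, Problem 5.4). Consequently, if $D \in \delta(\mathcal{G})$ and $G \in \mathcal{G}$ we find because of $\delta(\mathcal{G}) \subset \mathcal{D}_G$ and the very definition of $\mathcal{D}_G$ that
$$G \cap D \in \delta(\mathcal{G}) \quad \forall\, G \in \mathcal{G},\ \forall\, D \in \delta(\mathcal{G}),$$
so $\mathcal{G} \subset \mathcal{D}_D$ for all $D \in \delta(\mathcal{G})$, and $\delta(\mathcal{G}) \subset \mathcal{D}_D$ for all $D \in \delta(\mathcal{G})$.
The latter just says that $\delta(\mathcal{G})$ is stable under intersections with $D \in \delta(\mathcal{G})$. By Lemma 5.4 $\delta(\mathcal{G})$ is a $\sigma$-algebra and the theorem is proved.
1 $\cap$-stable, for short.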
5.6 Remark The technique used in the proof of Theorem 5.5 is an extremely important and powerful tool. We will use it almost exclusively in this chapter to prove the uniqueness of measures theorem and some properties of Lebesgue measure $\lambda^n$.
5.7 Theorem (Uniqueness of measures). Assume that �X� �� is a measurable
space and that � = ���� is generated by a family � such that
• � is stable under finite intersections: G� H ∈ � =⇒ G ∩ H ∈ �;
• there exists an exhausting sequence �Gj �j∈� ⊂ � with Gj ↑ X.
Any two measures
� � that coincide on � and are finite for all members of the
exhausting sequence
�Gj � = ��Gj � <
, are equal on �, i.e.
�A� = ��A� for
all A ∈ �.
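Proof For $j \in \mathbb{N}$ we define
$$\mathcal{D}_j := \big\{ A \in \mathcal{A} : \mu(G_j \cap A) = \nu(G_j \cap A) \ (< \infty!) \big\}$$
and we claim that every $\mathcal{D}_j$ is a Dynkin system. $(\Delta_1)$ is clear. $(\Delta_2)$: if $A \in \mathcal{D}_j$
we have
$$\mu(G_j \cap A^c) = \mu(G_j \setminus A) = \mu(G_j) - \mu(G_j \cap A) = \nu(G_j) - \nu(G_j \cap A) = \nu(G_j \setminus A) = \nu(G_j \cap A^c),$$
so that $A^c \in \mathcal{D}_j$. $(\Delta_3)$: if $(A_k)_{k\in\mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, we get
$$\mu\Big(G_j \cap \dot{\bigcup_{k\in\mathbb{N}}} A_k\Big) = \mu\Big(\dot{\bigcup_{k\in\mathbb{N}}} (G_j \cap A_k)\Big) = \sum_{k\in\mathbb{N}} \mu(G_j \cap A_k) = \sum_{k\in\mathbb{N}} \nu(G_j \cap A_k) = \nu\Big(\dot{\bigcup_{k\in\mathbb{N}}} (G_j \cap A_k)\Big) = \nu\Big(G_j \cap \dot{\bigcup_{k\in\mathbb{N}}} A_k\Big),$$
and $\dot{\bigcup}_{k\in\mathbb{N}} A_k \in \mathcal{D}_j$ follows.
Since $\mathcal{G}$ is $\cap$-stable, we know from T5.5 that $\delta(\mathcal{G}) = \sigma(\mathcal{G})$; therefore,
$$\mathcal{D}_j \supset \mathcal{G} \implies \mathcal{D}_j \supset \delta(\mathcal{G}) = \sigma(\mathcal{G}) \quad \forall\, j \in \mathbb{N}.$$
On the other hand, $\mathcal{A} = \sigma(\mathcal{G}) \subset \mathcal{D}_j \subset \mathcal{A}$, which means that $\mathcal{A} = \mathcal{D}_j$ for all $j \in \mathbb{N}$,
and so
$$\mu(G_j \cap A) = \nu(G_j \cap A) \quad \forall\, j \in \mathbb{N},\ \forall\, A \in \mathcal{A}. \tag{5.2}$$
Using T4.4(iii) we can let $j \to \infty$ in (5.2) to get
$$\mu(A) = \lim_{j\to\infty} \mu(G_j \cap A) = \lim_{j\to\infty} \nu(G_j \cap A) = \nu(A) \quad \forall\, A \in \mathcal{A}.$$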
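That the $\cap$-stability of the generator cannot simply be dropped can be seen on a four-point set. The following is a minimal sketch, not taken from the text: the set `X`, the generator `G` and the two weight tables are hypothetical choices made for illustration. Both measures agree on the generator and on $X$, yet they differ on the generated $\sigma$-algebra.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({1, 2, 3, 4})
# Hypothetical generator that is NOT stable under intersections:
# {1,2} ∩ {2,3} = {2} does not belong to it.
G = [frozenset({1, 2}), frozenset({2, 3})]

# Two measures on subsets of X, described by their point masses (assumed values).
mu_w = {x: Fraction(1, 4) for x in X}
nu_w = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}


def measure(weights, A):
    """Measure of A as the sum of its point masses."""
    return sum((weights[x] for x in A), Fraction(0))


def generated_sigma_algebra(X, G):
    """Close G under complements and pairwise unions (enough on a finite set)."""
    family = {frozenset(), X, *G}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            if X - A not in family:
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family


sigma_G = generated_sigma_algebra(X, G)
agree_on_G = all(measure(mu_w, A) == measure(nu_w, A) for A in G + [X])
differ_on = [A for A in sigma_G if measure(mu_w, A) != measure(nu_w, A)]
```

Here `agree_on_G` is true while `differ_on` is non-empty; it contains, for instance, the set $\{2\}$, which is exactly the intersection missing from the generator.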
The following two theorems show why Lebesgue measure (if it exists) plays a
very special rôle indeed.
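5.8 Theorem (i) $n$-dimensional Lebesgue measure $\lambda^n$ is invariant under translations, i.e.
$$\lambda^n(x + B) = \lambda^n(B) \quad \forall\, x \in \mathbb{R}^n,\ \forall\, B \in \mathcal{B}(\mathbb{R}^n). \tag{5.3}$$
(ii) Every measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which is invariant under translations and
satisfies $\kappa = \mu([0,1)^n) < \infty$ is a multiple of Lebesgue measure: $\mu = \kappa\,\lambda^n$.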
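Proof First of all we should convince ourselves that
$$B \in \mathcal{B}(\mathbb{R}^n) \implies x + B \in \mathcal{B}(\mathbb{R}^n) \quad \forall\, x \in \mathbb{R}^n, \tag{5.4}$$
otherwise the statement of T5.8 would be senseless. For this set
$$\mathcal{A}_x := \big\{ B \in \mathcal{B}(\mathbb{R}^n) : x + B \in \mathcal{B}(\mathbb{R}^n) \big\} \subset \mathcal{B}(\mathbb{R}^n).$$
It is clear that $\mathcal{A}_x$ is a $\sigma$-algebra and that $\mathcal{J} \subset \mathcal{A}_x$. [✓] Hence, $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}) \subset \mathcal{A}_x \subset \mathcal{B}(\mathbb{R}^n)$
and (5.4) follows. We can now start the proof proper.
(i) Set $\nu(B) := \lambda^n(x + B)$ for some fixed $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. It is easy to
check that $\nu$ is a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. [✓] Take $I = [a_1, b_1) \times \cdots \times [a_n, b_n) \in \mathcal{J}$
and observe that
$$x + I = [a_1 + x_1, b_1 + x_1) \times \cdots \times [a_n + x_n, b_n + x_n) \in \mathcal{J},$$
so that
$$\nu(I) = \lambda^n(x + I) = \prod_{j=1}^n \big( (b_j + x_j) - (a_j + x_j) \big) = \prod_{j=1}^n (b_j - a_j) = \lambda^n(I).$$
This means that $\nu|_{\mathcal{J}} = \lambda^n|_{\mathcal{J}}$.² But $\mathcal{J}$ is $\cap$-stable,³ generates $\mathcal{B}(\mathbb{R}^n)$ and admits
the exhausting sequence
$$[-k, k)^n \uparrow \mathbb{R}^n, \qquad \lambda^n\big([-k, k)^n\big) = (2k)^n < \infty.$$
We can now invoke the uniqueness theorem T5.7 to see that $\lambda^n = \nu$ on the whole of $\mathcal{B}(\mathbb{R}^n)$.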
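(ii) Take $I \in \mathcal{J}$ as in part (i) but with rational endpoints $a_j, b_j \in \mathbb{Q}$. Thus there
is some $M \in \mathbb{N}$ and $k(I) \in \mathbb{N}$ and points $x^{(j)} \in \mathbb{R}^n$, such that
$$I = \dot{\bigcup_{j=1}^{k(I)}} \Big( x^{(j)} + \big[0, \tfrac{1}{M}\big)^n \Big),$$
i.e. we pave the rectangle $I$ by little squares $[0, \tfrac{1}{M})^n$ of side-length $1/M$, where $M$
is, say, the common denominator of all $a_j$ and $b_j$. Using the translation invariance
of $\mu$ and $\lambda^n$, we see
$$\mu(I) = k(I)\, \mu\big([0, \tfrac{1}{M})^n\big), \qquad \mu\big([0,1)^n\big) = M^n \mu\big([0, \tfrac{1}{M})^n\big),$$
$$\lambda^n(I) = k(I)\, \lambda^n\big([0, \tfrac{1}{M})^n\big), \qquad \underbrace{\lambda^n\big([0,1)^n\big)}_{=1} = M^n \lambda^n\big([0, \tfrac{1}{M})^n\big),$$
and dividing the top two and bottom two equalities gives
$$\mu(I) = \frac{k(I)}{M^n}\, \mu\big([0,1)^n\big), \qquad \lambda^n(I) = \frac{k(I)}{M^n}\, \lambda^n\big([0,1)^n\big) = \frac{k(I)}{M^n}.$$
Thus $\mu(I) = \mu\big([0,1)^n\big)\, \lambda^n(I) = \kappa\, \lambda^n(I)$ for all $I \in \mathcal{J}$ with rational endpoints and, as in
part (i), an application of the uniqueness theorem T5.7 finishes the proof.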
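The counting behind the paving argument can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper `paving_count` and the sample rectangle are hypothetical and not part of the text. It computes a common denominator $M$ and the number $k(I)$ of little cubes and compares $k(I)/M^n$ with the product of the side-lengths.

```python
from fractions import Fraction
from math import lcm, prod


def paving_count(a, b):
    """Return (M, k) for I = [a_1,b_1) x ... x [a_n,b_n) with rational endpoints.

    M is a common denominator of all endpoints and k is the number of
    translates of [0, 1/M)^n needed to pave I.
    """
    M = lcm(*(q.denominator for q in list(a) + list(b)))
    # Each side [a_j, b_j) splits into (b_j - a_j) * M intervals of length 1/M.
    k = prod(int((bj - aj) * M) for aj, bj in zip(a, b))
    return M, k


# Hypothetical example: I = [1/2, 2) x [0, 2/3) in dimension n = 2.
a = (Fraction(1, 2), Fraction(0))
b = (Fraction(2), Fraction(2, 3))
M, k = paving_count(a, b)
volume = prod(bj - aj for aj, bj in zip(a, b))
ratio = Fraction(k, M ** len(a))   # k(I) / M^n
```

For this rectangle `ratio` coincides with `volume`, in line with $\lambda^n(I) = k(I)/M^n$.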
Incidentally, Theorem 5.8 proves Theorem 4.9(i). Further properties of Lebesgue
measure will be studied in the following chapters, but first we concentrate on its
existence.
2 This is short for $\nu(I) = \lambda^n(I)$ $\forall\, I \in \mathcal{J}$.
3 Use $\prod_{j=1}^n [a_j, b_j) \cap \prod_{j=1}^n [a'_j, b'_j) = \prod_{j=1}^n [a_j \vee a'_j,\ b_j \wedge b'_j)$, where $\prod$ denotes the Cartesian product. [✓]
Problems
5.1. Verify the claims made in Remark 5.2.
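5.2. The following exercise shows that Dynkin systems and $\sigma$-algebras are, in general,
different: Let $X = \{1, 2, 3, \ldots, 2k-1, 2k\}$ for some fixed $k \in \mathbb{N}$. Then the family
$\mathcal{D} = \{A \subset X : \#A \text{ is even}\}$ is a Dynkin system, but not a $\sigma$-algebra.
5.3. Let $\mathcal{D}$ be a Dynkin system. Show that for all $A, B \in \mathcal{D}$ with $A \subset B$ the difference $B \setminus A \in \mathcal{D}$.
[Hint: use $R \setminus Q = ((R \cap Q) \mathbin{\dot\cup} R^c)^c$ where $R, Q \subset X$.]
5.4. Let $\mathcal{A}$ be a $\sigma$-algebra, $\mathcal{D}$ be a Dynkin system and $\mathcal{G} \subset \mathcal{H} \subset \mathcal{P}(X)$ two collections
of subsets of $X$. Show that
(i) $\sigma(\mathcal{A}) = \mathcal{A}$ and $\delta(\mathcal{D}) = \mathcal{D}$;
(ii) $\delta(\mathcal{G}) \subset \delta(\mathcal{H})$;
(iii) $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.
5.5. Let $A, B \subset X$. Compare $\delta(\{A, B\})$ and $\sigma(\{A, B\})$. When are they equal?
5.6. Show that Theorem 5.7 is still valid, if $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$ is not an increasing sequence
but any countable family of sets such that
$$(1)\ \bigcup_{j\in\mathbb{N}} G_j = X \qquad \text{and} \qquad (2)\ \nu(G_j) = \mu(G_j) < \infty.$$
[Hint: set $F_N := G_1 \cup \ldots \cup G_N = F_{N-1} \cup G_N$ and check by induction that
$\mu(F_N) = \nu(F_N)$; use then T5.7.]
5.7. Show that the half-open intervals $\mathcal{J}^n$ in $\mathbb{R}^n$ are stable under finite intersections.
[Hint: check that $I = \prod_{j=1}^n [a_j, b_j)$, $I' = \prod_{j=1}^n [a'_j, b'_j)$ satisfy $I \cap I' = \prod_{j=1}^n [a_j \vee a'_j,\ b_j \wedge b'_j)$.]
5.8. Dilations. Mimic the proof of Theorem 5.8(i) and show that $t \cdot B := \{tb : b \in B\}$
is a Borel set for all $B \in \mathcal{B}(\mathbb{R}^n)$ and $t > 0$. Moreover,
$$\lambda^n(t \cdot B) = t^n \lambda^n(B) \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n),\ \forall\, t > 0. \tag{5.5}$$
5.9. Invariant measures. Let $(X, \mathcal{A}, \mu)$ be a finite measure space where $\mathcal{A} = \sigma(\mathcal{G})$ for
some $\cap$-stable generator $\mathcal{G}$. Assume that $\theta : X \to X$ is a map such that $\theta^{-1}(A) \in \mathcal{A}$
for all $A \in \mathcal{A}$. Prove that
$$\mu(G) = \mu(\theta^{-1}(G)) \quad \forall\, G \in \mathcal{G} \implies \mu(A) = \mu(\theta^{-1}(A)) \quad \forall\, A \in \mathcal{A}.$$
(A measure $\mu$ with this property is called invariant w.r.t. the map $\theta$.)
5.10. Independence (1). Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{B}, \mathcal{C} \subset \mathcal{A}$ be two
sub-$\sigma$-algebras of $\mathcal{A}$. We call $\mathcal{B}$ and $\mathcal{C}$ independent, if
$$P(B \cap C) = P(B)\, P(C) \quad \forall\, B \in \mathcal{B},\ C \in \mathcal{C}.$$
Assume now that $\mathcal{B} = \sigma(\mathcal{G})$ and $\mathcal{C} = \sigma(\mathcal{H})$ where $\mathcal{G}$, $\mathcal{H}$ are $\cap$-stable collections
of sets. Prove that $\mathcal{B}$ and $\mathcal{C}$ are independent if, and only if,
$$P(G \cap H) = P(G)\, P(H) \quad \forall\, G \in \mathcal{G},\ H \in \mathcal{H}.$$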
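Before attempting a proof of the claim in Problem 5.2 it can help to test it by brute force in the smallest interesting case. The sketch below is only a sanity check for $k = 2$, not a solution; all variable names are hypothetical. It verifies the three Dynkin properties for the family of even-sized subsets and shows that stability under arbitrary unions fails.

```python
from itertools import combinations

X = frozenset(range(1, 5))          # k = 2, so X = {1, 2, 3, 4}
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
D = {A for A in subsets if len(A) % 2 == 0}

has_X = X in D
closed_under_complement = all(X - A in D for A in D)
# On a finite set, countable disjoint unions reduce to finite ones,
# and finite disjoint unions reduce to pairwise ones.
closed_under_disjoint_union = all(A | B in D for A in D for B in D if not (A & B))
# A sigma-algebra would also have to contain unions of overlapping sets.
closed_under_union = all(A | B in D for A in D for B in D)
```

The first three flags come out true and the last one false: for example $\{1,2\} \cup \{2,3\}$ has three elements.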
6
Existence of measures
In Chapter 4 we saw that it is not a trivial task to assign explicitly a $\mu$-value to
every set $A$ from a $\sigma$-algebra $\mathcal{A}$. Rather than doing this it is often more natural
to assign $\mu$-values to, say, rectangles (in the case of the Borel $\sigma$-algebra) or,
in general, to sets from some generator $\mathcal{G}$ of $\mathcal{A}$. Because of Theorem 4.4 (and
Remark 4.5) $\mu|_{\mathcal{G}}$ should be a pre-measure. If $\mathcal{G}$ and $\mu$ satisfy the conditions of
the uniqueness theorem 5.7, this approach will lead to a unique measure on $\mathcal{A}$,
provided we can extend $\mu$ from $\mathcal{G}$ onto $\sigma(\mathcal{G}) = \mathcal{A}$.
To get such an automatic extension the following (technically motivated) class
of generators is useful. A semi-ring is a family $\mathcal{S} \subset \mathcal{P}(X)$ with the following
properties:
$$\emptyset \in \mathcal{S}; \tag{S1}$$
$$S, T \in \mathcal{S} \implies S \cap T \in \mathcal{S}; \tag{S2}$$
$$\text{for } S, T \in \mathcal{S} \text{ there exist finitely many disjoint } S_1, S_2, \ldots, S_M \in \mathcal{S} \text{ such that } S \setminus T = \dot{\bigcup_{j=1}^{M}} S_j. \tag{S3}$$
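For half-open intervals on the real line the properties (S2) and (S3) can be made completely explicit. The following sketch uses a hypothetical encoding that is not from the text: a non-empty interval $[a,b)$ is stored as the tuple `(a, b)` and the empty set as `None`. The difference of two intervals is returned as a list of at most two disjoint intervals.

```python
EMPTY = None  # stand-in for the empty interval (assumption of this sketch)


def make(a, b):
    """Return [a, b) as a tuple, or EMPTY if the interval has no points."""
    return (a, b) if a < b else EMPTY


def intersect(s, t):
    """(S2): the intersection of two half-open intervals is half-open."""
    if s is EMPTY or t is EMPTY:
        return EMPTY
    return make(max(s[0], t[0]), min(s[1], t[1]))


def difference(s, t):
    """(S3): S \\ T as a list of at most two disjoint half-open intervals."""
    if s is EMPTY:
        return []
    if t is EMPTY:
        return [s]
    left = make(s[0], min(s[1], t[0]))     # part of S to the left of T
    right = make(max(s[0], t[1]), s[1])    # part of S to the right of T
    return [piece for piece in (left, right) if piece is not EMPTY]


middle_removed = difference((0, 5), (1, 3))
overlap = intersect((0, 2), (1, 5))
```

With these definitions `difference((0, 5), (1, 3))` yields the two pieces $[0,1)$ and $[3,5)$, while a set $T$ covering $S$ yields the empty list.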
The solution to our problems is the following deep extension theorem for
measures which goes back to Carathéodory [9].
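6.1 Theorem (Carathéodory) Let $\mathcal{S}$ be a semi-ring of subsets of $X$ and $\mu : \mathcal{S} \to [0, \infty]$
be a pre-measure, i.e. a set-function with
(i) $\mu(\emptyset) = 0$;
(ii) $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$, disjoint and $S = \dot{\bigcup}_{j\in\mathbb{N}} S_j \in \mathcal{S} \implies \mu(S) = \sum_{j\in\mathbb{N}} \mu(S_j)$.
Then $\mu$ has an extension to a measure $\mu$ on $\sigma(\mathcal{S})$. If, moreover, $\mathcal{S}$ contains an
exhausting sequence $(S_j)_{j\in\mathbb{N}}$, $S_j \uparrow X$ such that $\mu(S_j) < \infty$ for all $j \in \mathbb{N}$, then the
extension is unique.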
6.2 Remark From the Definition 4.1 of a measure it is clear that the conditions
6.1(i) and (ii) are necessary for $\mu$ to become a measure. Theorem 6.1 says
that they are even sufficient. Remarkable is the fact that (ii) is only needed
relative to $\mathcal{S}$ – its extension to $\sigma(\mathcal{S})$ is then automatic.
The proof of Carathéodory’s theorem is a bit involved and not particularly
rewarding when read superficially. Therefore we recommend skipping the proof
on first reading and resuming on p. 44.
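Proof (of Theorem 6.1) We begin with the construction of an auxiliary set-function
$\mu^* : \mathcal{P}(X) \to [0, \infty]$ which will, eventually, extend $\mu$. Define for each
$A \subset X$ the family of countable $\mathcal{S}$-coverings of $A$
$$\mathcal{C}(A) := \Big\{ (S_j)_{j\in\mathbb{N}} \subset \mathcal{S} : \bigcup_{j\in\mathbb{N}} S_j \supset A \Big\}$$
($\mathcal{C}(A) = \emptyset$ is possible since we do not require $X \in \mathcal{S}$), and set
$$\mu^*(A) := \inf\Big\{ \sum_{j\in\mathbb{N}} \mu(S_j) : (S_j)_{j\in\mathbb{N}} \in \mathcal{C}(A) \Big\} \tag{6.1}$$
where, as usual, $\inf \emptyset := +\infty$.
Step 1: Claim: $\mu^*$ has the following three properties:¹
$$\mu^*(\emptyset) = 0; \tag{OM1}$$
$$\text{(monotone)} \quad A \subset B \implies \mu^*(A) \leq \mu^*(B); \tag{OM2}$$
$$\text{($\sigma$-subadditive)} \quad (A_j)_{j\in\mathbb{N}} \subset \mathcal{P}(X) \implies \mu^*\Big( \bigcup_{j\in\mathbb{N}} A_j \Big) \leq \sum_{j\in\mathbb{N}} \mu^*(A_j). \tag{OM3}$$
(OM1) is obvious since we can take in (6.1) the constant sequence $S_1 = S_2 = \ldots = \emptyset$ which is clearly in $\mathcal{C}(\emptyset)$.
(OM2): if $B \supset A$, then each $\mathcal{S}$-cover of $B$ also covers $A$, i.e. $\mathcal{C}(B) \subset \mathcal{C}(A)$.
Therefore,
$$\mu^*(A) = \inf\Big\{ \sum_{j\in\mathbb{N}} \mu(S_j) : (S_j)_{j\in\mathbb{N}} \in \mathcal{C}(A) \Big\} \leq \inf\Big\{ \sum_{k\in\mathbb{N}} \mu(T_k) : (T_k)_{k\in\mathbb{N}} \in \mathcal{C}(B) \Big\} = \mu^*(B).$$
(OM3): without loss of generality we can assume that $\mu^*(A_j) < \infty$ for all $j \in \mathbb{N}$
and so $\mathcal{C}(A_j) \neq \emptyset$. Fix $\epsilon > 0$ and observe that by the very nature of the infimum
we find for each $A_j$ a cover $(S^j_k)_{k\in\mathbb{N}} \in \mathcal{C}(A_j)$ with
$$\sum_{k\in\mathbb{N}} \mu(S^j_k) \leq \mu^*(A_j) + \frac{\epsilon}{2^j}, \quad j \in \mathbb{N}. \tag{6.2}$$
The double sequence $(S^j_k)_{j,k\in\mathbb{N}}$ is an $\mathcal{S}$-cover of $A := \bigcup_{j\in\mathbb{N}} A_j$, and so
$$\mu^*(A) \leq \sum_{(j,k)\in\mathbb{N}\times\mathbb{N}} \mu(S^j_k) = \sum_{j\in\mathbb{N}} \sum_{k\in\mathbb{N}} \mu(S^j_k) \leq \sum_{j\in\mathbb{N}} \Big( \mu^*(A_j) + \frac{\epsilon}{2^j} \Big) = \sum_{j\in\mathbb{N}} \mu^*(A_j) + \epsilon,$$
where the second '$\leq$' follows from (6.2). Letting $\epsilon \to 0$ proves (OM3).
1 A set-function $\mu^* : \mathcal{P}(X) \to [0, \infty]$ satisfying (OM1)–(OM3) is called outer measure.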
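Step 2. Claim: $\mu^*$ extends $\mu$, i.e. $\mu^*(S) = \mu(S)$ $\forall\, S \in \mathcal{S}$.
Observe that $\mu$ can be uniquely extended to the set
$\mathcal{S}_\cup := \{ S_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_M : M \in \mathbb{N},\ S_j \in \mathcal{S} \}$
of all finite unions of disjoint $\mathcal{S}$-sets by
$$\bar\mu(S_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_M) := \sum_{j=1}^{M} \mu(S_j). \tag{6.3}$$
Since (6.3) is necessary for an additive set-function on $\mathcal{S}_\cup$, (6.3) implies the
uniqueness of the extension [✓] once we know that $\bar\mu$ is well-defined, that is,
independent of the particular representation of sets in $\mathcal{S}_\cup$. To see this assume that
$$S_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_M = T_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} T_N, \quad M, N \in \mathbb{N},\ S_j, T_k \in \mathcal{S}.$$
Then
$$S_j = S_j \cap (T_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} T_N) = \dot{\bigcup_{k=1}^{N}} (S_j \cap T_k),$$
and the additivity of $\mu$ on $\mathcal{S}$ shows
$$\mu(S_j) = \sum_{k=1}^{N} \mu(S_j \cap T_k).$$
Summing over $j = 1, 2, \ldots, M$ and swapping the rôles of $S_j$ and $T_k$ gives
$$\sum_{j=1}^{M} \mu(S_j) = \sum_{j=1}^{M} \sum_{k=1}^{N} \mu(S_j \cap T_k) = \sum_{k=1}^{N} \mu(T_k),$$
which proves that (6.3) does not depend on the representation of $\mathcal{S}_\cup$-sets.
The family $\mathcal{S}_\cup$ is clearly stable under finite disjoint unions. If $S, T \in \mathcal{S}_\cup$ we
find (notation as before)
$$S \cap T = (S_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_M) \cap (T_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} T_N) = \dot{\bigcup_{j,k=1}^{M,N}} \underbrace{(S_j \cap T_k)}_{\in\, \mathcal{S}} \in \mathcal{S}_\cup,$$
and, since by (S3) $S_j \setminus T_k \in \mathcal{S}_\cup$, also
$$S \setminus T = (S_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_M) \setminus (T_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} T_N) = \dot{\bigcup_{j=1}^{M}} \bigcap_{k=1}^{N} (S_j \cap T_k^c) = \dot{\bigcup_{j=1}^{M}} \underbrace{\bigcap_{k=1}^{N} \underbrace{(S_j \setminus T_k)}_{\in\, \mathcal{S}_\cup}}_{\in\, \mathcal{S}_\cup} \in \mathcal{S}_\cup,$$
where we used the $\cap$- and $\dot\cup$-stability of $\mathcal{S}_\cup$. Finally,
$$S \cup T = (S \setminus T) \mathbin{\dot\cup} (S \cap T) \mathbin{\dot\cup} (T \setminus S) \in \mathcal{S}_\cup,^2$$
and the prescription (6.3) can be used to extend $\mu$ to finite unions of $\mathcal{S}$-sets.
Let us show that $\bar\mu$ is $\sigma$-additive on $\mathcal{S}_\cup$, i.e. a pre-measure. For this take
$(T_k)_{k\in\mathbb{N}} \subset \mathcal{S}_\cup$ such that $T := \dot{\bigcup}_{k\in\mathbb{N}} T_k \in \mathcal{S}_\cup$. By the definition of the family
$\mathcal{S}_\cup$ we find a sequence of disjoint sets $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$ and a sequence of integers
$0 = n(0) \leq n(1) \leq n(2) \leq \ldots$ such that
$$T_k = S_{n(k-1)+1} \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} S_{n(k)}, \quad k \in \mathbb{N},$$
and $T = U_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} U_N$, where $U_\ell = \dot{\bigcup}_{j\in J_\ell} S_j \in \mathcal{S}$ [✓] with disjoint index sets
$J_1 \mathbin{\dot\cup} J_2 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} J_N = \mathbb{N}$ partitioning $\mathbb{N}$. Thus
$$\bar\mu(T) \overset{\text{def}}{=} \sum_{\ell=1}^{N} \mu(U_\ell) \overset{6.1\text{(ii)}}{=} \sum_{\ell=1}^{N} \sum_{j\in J_\ell} \mu(S_j) = \sum_{k\in\mathbb{N}} \sum_{j=n(k-1)+1}^{n(k)} \mu(S_j) \overset{\text{def}}{=} \sum_{k\in\mathbb{N}} \bar\mu(T_k),$$
which proves $\sigma$-additivity of $\bar\mu$.
Using the pre-measure $\bar\mu$ we get from Corollary 4.6 for any cover $(S_j)_{j\in\mathbb{N}} \in \mathcal{C}(S)$, $S \in \mathcal{S}$, that
$$\mu(S) = \bar\mu(S) = \bar\mu\Big( \bigcup_{j\in\mathbb{N}} (S_j \cap S) \Big) \leq \sum_{j\in\mathbb{N}} \bar\mu(S_j \cap S) = \sum_{j\in\mathbb{N}} \mu(S_j \cap S) \leq \sum_{j\in\mathbb{N}} \mu(S_j),$$
and passing to the infimum over $\mathcal{C}(S)$ shows $\mu(S) \leq \mu^*(S)$. The special cover
$(S, \emptyset, \emptyset, \ldots) \in \mathcal{C}(S)$, on the other hand, yields $\mu^*(S) \leq \mu(S)$ and this shows that
$\mu|_{\mathcal{S}} = \mu^*|_{\mathcal{S}}$.
2 This shows that $\mathcal{S}_\cup$ is the ring generated by $\mathcal{S}$, i.e. the smallest ring containing $\mathcal{S}$.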
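Step 3. Claim: $\mathcal{S} \subset \mathcal{A}^*$, where $\mathcal{A}^*$ is given by
$$\mathcal{A}^* := \big\{ A \subset X : \mu^*(Q) = \mu^*(Q \cap A) + \mu^*(Q \setminus A) \ \ \forall\, Q \subset X \big\}. \tag{6.4}$$
Let $S, T \in \mathcal{S}$. From (S3) we get
$$T = (S \cap T) \mathbin{\dot\cup} (T \setminus S) = (S \cap T) \mathbin{\dot\cup} \dot{\bigcup_{j=1}^{M}} S_j$$
for some mutually disjoint sets $S_j \in \mathcal{S}$, $j = 1, 2, \ldots, M$. Since $\mu$ is additive on $\mathcal{S}$
and $\mu^*$ is ($\sigma$-)subadditive by (OM3), we find
$$\mu^*(S \cap T) + \mu^*(T \setminus S) \leq \mu(S \cap T) + \sum_{j=1}^{M} \mu(S_j) = \mu(T). \tag{6.5}$$
Take any $B \subset X$ and some $\mathcal{S}$-cover $(T_j)_{j\in\mathbb{N}} \in \mathcal{C}(B)$. Using $\mu^*(T_j) = \mu(T_j)$ and
summing the inequality (6.5) for $T = T_j$ over $j \in \mathbb{N}$ yields
$$\sum_{j\in\mathbb{N}} \mu^*(T_j \setminus S) + \sum_{j\in\mathbb{N}} \mu^*(T_j \cap S) \leq \sum_{j\in\mathbb{N}} \mu^*(T_j),$$
and the $\sigma$-subadditivity (OM3) and monotonicity (OM2) of $\mu^*$ give (recall that
$B \subset \bigcup_{j\in\mathbb{N}} T_j$)
$$\mu^*(B \setminus S) + \mu^*(B \cap S) \leq \mu^*\Big( \bigcup_{j\in\mathbb{N}} T_j \setminus S \Big) + \mu^*\Big( \bigcup_{j\in\mathbb{N}} T_j \cap S \Big) \leq \sum_{j\in\mathbb{N}} \mu^*(T_j) = \sum_{j\in\mathbb{N}} \mu(T_j).$$
We can now pass to the inf over $\mathcal{C}(B)$ and find $\mu^*(B \setminus S) + \mu^*(B \cap S) \leq \mu^*(B)$.
Since the reverse inequality follows easily from the ($\sigma$-)subadditivity (OM3) of
$\mu^*$, $S \in \mathcal{A}^*$ holds for all $S \in \mathcal{S}$.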
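Step 4. Claim: $\mathcal{A}^*$ is a $\sigma$-algebra and $\mu^*$ is a measure on $(X, \mathcal{A}^*)$.
Clearly, $\emptyset \in \mathcal{A}^*$ and by the symmetry (w.r.t. $A$ and $A^c$) of definition (6.4) of
$\mathcal{A}^*$ we have $A \in \mathcal{A}^*$ if, and only if, $A^c \in \mathcal{A}^*$. Let us show that $\mathcal{A}^*$ is $\cup$-stable.
Using the ($\sigma$-)subadditivity (OM3) of $\mu^*$ we find for $A, A' \in \mathcal{A}^*$ and any $P \subset X$
$$\begin{aligned}
\mu^*(P \cap (A \cup A')) + \mu^*(P \setminus (A \cup A'))
&= \mu^*(P \cap (A \cup [A' \setminus A])) + \mu^*(P \setminus (A \cup A')) \\
&\leq \mu^*(P \cap A) + \mu^*(P \cap (A' \setminus A)) + \mu^*(P \setminus (A \cup A')) \\
&= \mu^*(P \cap A) + \mu^*((P \setminus A) \cap A') + \mu^*((P \setminus A) \setminus A') \\
&\overset{(6.4)}{=} \mu^*(P \cap A) + \mu^*(P \setminus A) && (6.6) \\
&\overset{(6.4)}{=} \mu^*(P), && (6.6')
\end{aligned}$$
where we used in the last two steps the definition (6.4) of $\mathcal{A}^*$ with $Q \mathrel{\hat{=}} P \setminus A$ and
$Q \mathrel{\hat{=}} P$, respectively. The reverse inequality follows from (OM3), hence equality,
and we conclude that $A \cup A' \in \mathcal{A}^*$.
If $A, A'$ are disjoint, the equality (6.6)=(6.6′) becomes, for $P := (A \mathbin{\dot\cup} A') \cap Q$, $Q \subset X$,
$$\mu^*(Q \cap (A \mathbin{\dot\cup} A')) = \mu^*(Q \cap A) + \mu^*(Q \cap A') \quad \forall\, Q \subset X,$$
and a simple induction argument yields
$$\mu^*(Q \cap (A_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j) \quad \forall\, Q \subset X$$
for all mutually disjoint $A_1, A_2, \ldots, A_M \in \mathcal{A}^*$.
In particular, if $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}^*$ is a sequence of pairwise disjoint sets, we find
for their union $A := \dot{\bigcup}_{j\in\mathbb{N}} A_j$ that
$$\mu^*(Q \cap A) \geq \mu^*(Q \cap (A_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j). \tag{6.7}$$
Since $A_1 \cup \ldots \cup A_M \in \mathcal{A}^*$, we can use the monotonicity (OM2) and (6.7) to deduce
$$\begin{aligned}
\mu^*(Q) &= \mu^*(Q \cap (A_1 \cup \ldots \cup A_M)) + \mu^*(Q \setminus (A_1 \cup \ldots \cup A_M)) \\
&\geq \mu^*(Q \cap (A_1 \cup \ldots \cup A_M)) + \mu^*(Q \setminus A) && (6.8) \\
&= \sum_{j=1}^{M} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A).
\end{aligned}$$
The left-hand side is independent of $M$; therefore, we can let $M \to \infty$ and get
$$\mu^*(Q) \geq \sum_{j=1}^{\infty} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A) \geq \mu^*(Q \cap A) + \mu^*(Q \setminus A). \tag{6.9}$$
The reverse inequality $\mu^*(Q) \leq \mu^*(Q \cap A) + \mu^*(Q \setminus A)$ follows at once from the
subadditivity of $\mu^*$. This means that equality holds throughout (6.9) and we get
$A \in \mathcal{A}^*$. If we take $Q := A$ in (6.9) we even see the $\sigma$-additivity of $\mu^*$ on $\mathcal{A}^*$.
So far we have seen that $\mathcal{A}^*$ is a $\cup$-stable Dynkin system. Because of $A \cap B = (A^c \cup B^c)^c$
we see that $\mathcal{A}^*$ is also $\cap$-stable and, by L5.4, a $\sigma$-algebra.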
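Step 5. Claim: $\mu^*$ is a measure on $\sigma(\mathcal{S})$ which extends $\mu$.
By step 3, $\mathcal{S} \subset \mathcal{A}^*$ and thus $\sigma(\mathcal{S}) \subset \sigma(\mathcal{A}^*) = \mathcal{A}^*$ since $\mathcal{A}^*$ is itself a $\sigma$-algebra
(step 4). Again by step 4, $\mu^*|_{\sigma(\mathcal{S})}$ is a measure which, by step 2, extends $\mu$.
Step 6. Uniqueness of $\mu^*|_{\sigma(\mathcal{S})}$. If there is an exhausting sequence $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$,
$S_j \uparrow X$ such that $\mu(S_j) < \infty$ for all $j \in \mathbb{N}$, it follows from T5.7 that any two
extensions of $\mu$ to $\sigma(\mathcal{S})$ coincide.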
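On a finite set the whole construction can be carried out by brute force, which makes the rôle of the outer measure (6.1) and of the measurability condition (6.4) tangible. The sketch below is an illustration under stated assumptions: the four-point set, the semi-ring and the values of the pre-measure are hypothetical. Since the semi-ring is finite, the infimum over countable covers reduces to a minimum over finite subfamilies.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({1, 2, 3, 4})
# Hypothetical semi-ring S on X together with a pre-measure mu (assumed values).
mu = {
    frozenset(): Fraction(0),
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({3, 4}): Fraction(1, 2),
    X: Fraction(1),
}


def powerset(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def outer_measure(A):
    """mu*(A) as in (6.1); for a finite S it suffices to scan finite subfamilies."""
    members = list(mu)
    best = None
    for r in range(1, len(members) + 1):
        for cover in combinations(members, r):
            if A <= frozenset().union(*cover):
                total = sum((mu[S] for S in cover), Fraction(0))
                best = total if best is None else min(best, total)
    return best  # None would encode +infinity (no cover); not reached since X is in S


subsets = powerset(X)
mu_star = {A: outer_measure(A) for A in subsets}
# Caratheodory's condition (6.4), tested against every Q in the power set.
measurable = [A for A in subsets
              if all(mu_star[Q] == mu_star[Q & A] + mu_star[Q - A] for Q in subsets)]
```

In this example `mu_star` agrees with `mu` on the semi-ring, the singleton $\{1\}$ gets outer measure $1/2$, and the $\mu^*$-measurable sets are exactly the four sets of the semi-ring, so that $\mu^*$ is not additive on the whole power set.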
6.3 Remark The core of Carathéodory's theorem 6.1 is the definition (6.4) of
$\mu^*$-measurable sets, i.e. of the $\sigma$-algebra $\mathcal{A}^*$. The proof shows that, in general, we
cannot expect $\mu^*$ to be ($\sigma$-)additive outside $\mathcal{A}^*$. In many situations the $\sigma$-algebra
$\mathcal{P}(X)$ is simply too big to support a non-trivial measure. Notable exceptions are
countable sets $X$ or Dirac measures [✓]. For $n$-dimensional Lebesgue measure, this
was first remarked by Hausdorff [19, pp. 401–402]. The general case depends on
the cardinality of $X$ and the behaviour of $\mu$ on one-point sets; see the discussion
in Oxtoby [33, Chapter 5].
Put in other words this says that even a household measure like Lebesgue
measure cannot assign a content to every set! In $\mathbb{R}^3$ (and higher dimensions)
we even have the Banach–Tarski paradox: the open balls $B_1(0)$ and $B_2(0)$ with
centre 0 and radii 1 resp. 2 have finite disjoint decompositions $B_1(0) = \dot{\bigcup}_{j=1}^{M} E_j$
and $B_2(0) = \dot{\bigcup}_{j=1}^{M} F_j$ such that for every $j = 1, 2, \ldots, M$ the sets $E_j$ and $F_j$ are
geometrically congruent (hence, should have the same Lebesgue measure); see
Stromberg [49] or Wagon [52]. Of course, not all of the sets $E_j$ and $F_j$ can be
Borel sets.
This brings us to the question if and how we can construct a non-Borel measurable
set, i.e. a set $A \in \mathcal{P}(\mathbb{R}^n) \setminus \mathcal{B}(\mathbb{R}^n)$. Such constructions are possible but
they are based on the axiom of choice, see for example Hewitt and Stromberg
[20, pp. 136–7], Oxtoby [33, pp. 22–3] or Appendix D.
∗ ∗ ∗
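Let us now apply Theorem 6.1 to prove the existence of $n$-dimensional Lebesgue
measure $\lambda^n$ which was defined for half-open rectangles $\mathcal{J}^n = \mathcal{J}(\mathbb{R}^n)$ in D4.8:
$$\lambda^n\big( [\![a, b)\!) \big) = \prod_{j=1}^{n} (b_j - a_j), \qquad [\![a, b)\!) = \prod_{j=1}^{n} [a_j, b_j) \in \mathcal{J}^n.$$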
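6.4 Proposition The family of $n$-dimensional rectangles $\mathcal{J}^n$ is a semi-ring.
Proof (By induction) It is obvious that $\mathcal{J}^1$ satisfies the properties (S1)–(S3) from
page 37. Assume that $\mathcal{J}^n$ is a semi-ring for some $n \geq 1$. From the definition of
rectangles it is clear that
$$\mathcal{J}^{n+1} = \mathcal{J}^n \times \mathcal{J}^1 := \big\{ I_n \times I_1 : I_n \in \mathcal{J}^n,\ I_1 \in \mathcal{J}^1 \big\}.$$
(S1) is obviously true and (S2) follows from the identity
$$(I_n \times I_1) \cap (J_n \times J_1) = (I_n \cap J_n) \times (I_1 \cap J_1) \tag{6.10}$$
where $I_n, J_n \in \mathcal{J}^n$ and $I_1, J_1 \in \mathcal{J}^1$. Since
$$\begin{aligned}
(J_n \times J_1)^c &= \big\{ (x, y) : x \notin J_n,\ y \notin J_1 \ \text{or}\ x \in J_n,\ y \notin J_1 \ \text{or}\ x \notin J_n,\ y \in J_1 \big\} \\
&= (J_n^c \times J_1^c) \mathbin{\dot\cup} (J_n \times J_1^c) \mathbin{\dot\cup} (J_n^c \times J_1)
\end{aligned}$$
we see, using (6.10),
$$\begin{aligned}
(I_n \times I_1) \setminus (J_n \times J_1) &= (I_n \times I_1) \cap (J_n \times J_1)^c \\
&= \big[ (I_n \setminus J_n) \times (I_1 \setminus J_1) \big] \mathbin{\dot\cup} \big[ (I_n \cap J_n) \times (I_1 \setminus J_1) \big] \mathbin{\dot\cup} \big[ (I_n \setminus J_n) \times (I_1 \cap J_1) \big].
\end{aligned}$$
Both $I_n \setminus J_n$ and $I_1 \setminus J_1$ are made up of finitely many disjoint rectangles from $\mathcal{J}^n$
and $\mathcal{J}^1$, and therefore $(I_n \times I_1) \setminus (J_n \times J_1)$ is a finite union of disjoint rectangles
from $\mathcal{J}^n \times \mathcal{J}^1$; thus (S3) holds.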
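The three-term decomposition used for (S3) translates directly into a small routine for rectangles in the plane. This is a sketch only; the encoding of a rectangle as a pair of half-open intervals and the helper names are hypothetical. It splits $(I_n \times I_1) \setminus (J_n \times J_1)$ into disjoint rectangles exactly along the three brackets of the proof.

```python
from itertools import product


def inter(s, t):
    """Intersection of half-open intervals [a,b), as a list with 0 or 1 entries."""
    a, b = max(s[0], t[0]), min(s[1], t[1])
    return [(a, b)] if a < b else []


def diff(s, t):
    """S \\ T for half-open intervals: at most two disjoint intervals."""
    pieces = [(s[0], min(s[1], t[0])), (max(s[0], t[1]), s[1])]
    return [(a, b) for a, b in pieces if a < b]


def rect_difference(I, J):
    """(I_n x I_1) \\ (J_n x J_1) as a list of disjoint rectangles."""
    (In, I1), (Jn, J1) = I, J
    groups = [
        (diff(In, Jn), diff(I1, J1)),    # (I_n \ J_n) x (I_1 \ J_1)
        (inter(In, Jn), diff(I1, J1)),   # (I_n ∩ J_n) x (I_1 \ J_1)
        (diff(In, Jn), inter(I1, J1)),   # (I_n \ J_n) x (I_1 ∩ J_1)
    ]
    return [(h, v) for hs, vs in groups for h, v in product(hs, vs)]


def area(rect):
    (a, b), (c, d) = rect
    return (b - a) * (d - c)


# Hypothetical example: J lies strictly inside I.
I = ((0, 4), (0, 3))
J = ((1, 2), (1, 2))
pieces = rect_difference(I, J)
total_area = sum(area(r) for r in pieces)
```

When $J$ sits strictly inside $I$, as here, the routine returns eight rectangles whose areas add up to the area of $I$ minus that of $J$.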
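In $\mathbb{R}^2$ it is easy to depict the two typical situations that occur in the proof of
(S3) in Proposition 6.4:
[Figure: two sketches of the rectangles $I_n \times I_1$ and $J_n \times J_1$. In the first, $J_n \times J_1$ overlaps a corner of $I_n \times I_1$ and the difference consists of 3 rectangles; in the second, $J_n \times J_1$ lies inside $I_n \times I_1$ and the difference consists of 8 rectangles.]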
The proof of Proposition 6.4 reveals a bit more: the Cartesian product of any two
semi-rings is again a semi-ring. [✓]
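6.5 Proposition $\lambda^n$ is a pre-measure on $\mathcal{J}^n$.
Proof It is enough to verify (i), (ii) and (iii′′) of Theorem 4.4 since $\lambda^n$ assigns
finite measure to every rectangle in $\mathcal{J}^n$.
We consider only the case $n = 2$, since $n = 1$ is similar but easier and $n \geq 3$ adds
only notational complications. Obviously, $\lambda^2(\emptyset) = 0$. To see additivity on $\mathcal{J}^2$, we
may as well cut $I = [a_1, b_1) \times [a_2, b_2)$ along one direction (say, along $j = 2$) to get
$I_1 = [a_1, b_1) \times [a_2, \gamma)$, $I_2 = [a_1, b_1) \times [\gamma, b_2)$
and reassemble it $I = I_1 \mathbin{\dot\cup} I_2$ (if $n \geq 3$ this is accomplished by a hyperplane).
[Figure: the rectangle $I$ with corners $(a_1, a_2)$ and $(b_1, b_2)$, cut at height $\gamma$ into $I_1$ and $I_2$.]
Thus
$$\begin{aligned}
\lambda^2(I_1) + \lambda^2(I_2) &= (b_1 - a_1)(\gamma - a_2) + (b_1 - a_1)(b_2 - \gamma) \\
&= (b_1 - a_1)(b_2 - \gamma + \gamma - a_2) \\
&= (b_1 - a_1)(b_2 - a_2) \\
&= \lambda^2(I).
\end{aligned}$$
Now let $(I_j)_{j\in\mathbb{N}} \subset \mathcal{J}^2$, $I_j = [\![a^{(j)}, b^{(j)})\!)$, be a decreasing sequence of rectangles
$I_j \downarrow \emptyset$. We have to show that $\lim_{j\to\infty} \lambda^2(I_j) = 0$. Since $I_j \downarrow \emptyset$, it is clear that
at least in one coordinate direction, say $k = 2$, we have $\lim_{j\to\infty} \big( b_2^{(j)} - a_2^{(j)} \big) = 0$,
otherwise $\bigcap_{j\in\mathbb{N}} I_j$ would contain a rectangle with side-lengths
$\lim_{j\to\infty} \big( b_k^{(j)} - a_k^{(j)} \big) > 0$ for $k = 1, 2$. But then
$$\lambda^2(I_j) = \prod_{k=1}^{2} \big( b_k^{(j)} - a_k^{(j)} \big) \leq \max_{k \neq 2} \big( b_k^{(j)} - a_k^{(j)} \big)^{2-1} \big( b_2^{(j)} - a_2^{(j)} \big) \xrightarrow{\ j\to\infty\ } 0,$$
where the first factor stays bounded because the rectangles $I_j$ decrease.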
6.6 Corollary (Existence of Lebesgue measure) There is a unique extension of
$n$-dimensional Lebesgue pre-measure $\lambda^n$ from $\mathcal{J}^n$ (Definition 4.8) to a measure
on the Borel sets $\mathcal{B}(\mathbb{R}^n)$. This extension is again denoted by $\lambda^n$ and is called
Lebesgue measure.
Proof We know from Theorem 3.8 that $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n)$. Since $[-k, k)^n \uparrow \mathbb{R}^n$
is an exhausting sequence of cubes and since $\lambda^n\big( [-k, k)^n \big) = (2k)^n < \infty$,
all conditions of Carathéodory's theorem 6.1 are fulfilled, and $\lambda^n$ extends to a
measure on $\mathcal{B}(\mathbb{R}^n)$.
6.7 Remark The uniqueness of Lebesgue measure and its properties (cf. Theorem
4.9) show that it is necessarily the familiar elementary-geometric volume (length,
area …)-function $\mathrm{vol}^{(n)}(\bullet)$ in the sense that $\mathrm{vol}^{(n)}$ can in only one way be extended
to a measure on the Borel $\sigma$-algebra.
Problems
6.1. Consider on $\mathbb{R}$ the family $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin.
Show that $\Sigma$ is a $\sigma$-algebra. Is it possible to extend a pre-measure $\mu$ on $\Sigma$ to a
measure on $\mathcal{B}(\mathbb{R})$? If so, is this extension unique? Continues in Problem 9.12.
6.2. Completion (2). Recall from Problem 4.13 that a measure space $(X, \mathcal{A}, \mu)$ is complete,
if every subset of a $\mu$-null set is a $\mu$-null set (thus, in particular, measurable).
Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space – i.e. there is an exhausting sequence
$(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ such that $\mu(A_j) < \infty$. As in the proof of Theorem 6.1 we write $\mu^*$ for
the outer measure (6.1) – now defined using $\mathcal{A}$-coverings – and $\mathcal{A}^*$ for the $\sigma$-algebra
defined by (6.4).
(i) Show that for every $Q \subset X$ there is some $A \in \mathcal{A}$ with $A \supset Q$ such that $\mu^*(Q) = \mu(A)$ and
that $\mu(N) = 0$ for all $N \subset A \setminus Q$ with $N \in \mathcal{A}$.
[Hint: since $\mu^*$ is defined as an infimum, every $Q$ with $\mu^*(Q) < \infty$ admits
a sequence $B_k \in \mathcal{A}$ with $B_k \supset Q$ and $\mu(B_k) - \mu^*(Q) \leq 1/k$. If $\mu^*(Q) = \infty$,
consider for each $j \in \mathbb{N}$ the set $Q \cap A_j$.]
(ii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is a complete measure space.
(iii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is the completion of $(X, \mathcal{A}, \mu)$ in the sense of
Problem 4.13.
6.3. (i) Show that non-void open sets in $\mathbb{R}$ (resp. $\mathbb{R}^n$) have always strictly positive
Lebesgue measure.
[Hint: let $U$ be open. Find a small ball in $U$ and inscribe a cube.]
(ii) Is (i) still true for closed sets?
6.4. (i) Show that $\lambda^1((a, b)) = b - a$ for all $a, b \in \mathbb{R}$, $a \leq b$.
[Hint: approximate $(a, b)$ by half-open intervals and use Theorem 4.4.]
(ii) Let $H \subset \mathbb{R}^2$ be a hyperplane which is perpendicular to the $x_1$-direction (that is
to say: $H$ is a translate of the $x_2$-axis). Show that $H \in \mathcal{B}(\mathbb{R}^2)$ and $\lambda^2(H) = 0$.
[Hint: consider the sets $A_k = [-\epsilon 2^{-k}, \epsilon 2^{-k}) \times [-k, k)$ and note that
$H \subset y + \bigcup_{k\in\mathbb{N}} A_k$ for some $y$.]
(iii) State and prove the $\mathbb{R}^n$-analogues of (i) and (ii).
6.5. Let $(X, \mathcal{A}, \mu)$ be a measure space such that all singletons $\{x\} \in \mathcal{A}$. A point $x$ is
called an atom, if $\mu(\{x\}) > 0$. A measure is called non-atomic or diffuse, if there
are no atoms.
(i) Show that one-dimensional Lebesgue measure $\lambda^1$ is diffuse.
(ii) Give an example of a non-diffuse measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
(iii) Show that for a diffuse measure $\mu$ on $(X, \mathcal{A})$ all countable sets are null sets.
(iv) Show that every probability measure $P$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ can be decomposed into
a sum of two measures $\mu + \nu$, where $\mu$ is diffuse and $\nu$ is a measure of the
form $\nu = \sum_{j\in\mathbb{N}} \epsilon_j \delta_{x_j}$, $\epsilon_j > 0$, $x_j \in \mathbb{R}$.
[Hint: since $P(\mathbb{R}) = 1$, there are at most $k$ points $y^{(k)}_1, y^{(k)}_2, \ldots, y^{(k)}_k$ such that
$\frac{1}{k-1} > P(\{y^{(k)}_j\}) \geq \frac{1}{k}$. Find by recursion (in $k$) all points satisfying such a
relation. There are at most countably many of these $y^{(k)}_j$. Relabel them as
$x_1, x_2, \ldots$. These are the atoms of $P$. Now take $\epsilon_j = P(\{x_j\})$, define $\nu$ as stated
and prove that $\nu$ and $P - \nu$ are measures.]
6.6. A set $A \subset \mathbb{R}^n$ is called bounded, if it can be contained in a ball $B_r(0) \supset A$ of finite
radius $r$. A set $A \subset \mathbb{R}^n$ is called connected, if we can go along a curve from any
point $a \in A$ to any other point $a' \in A$ without ever leaving $A$, cf. Appendix B.
(i) Construct an open and unbounded set in $\mathbb{R}$ with finite, strictly positive Lebesgue
measure.
[Hint: try unions of ever smaller open intervals centred around $n \in \mathbb{N}$.]
(ii) Construct an open, unbounded and connected set in $\mathbb{R}^2$ with finite, strictly
positive Lebesgue measure.
[Hint: try a union of adjacent, ever longer, ever thinner rectangles.]
(iii) Is there a connected, open and unbounded set in $\mathbb{R}$ with finite, strictly positive
Lebesgue measure?
6.7. Let $\lambda := \lambda^1|_{[0,1]}$ be Lebesgue measure on $([0,1], \mathcal{B}[0,1])$. Show that for every $\epsilon > 0$
there is a dense open set $U \subset [0,1]$ with $\lambda(U) \leq \epsilon$.
[Hint: take an enumeration $(q_j)_{j\in\mathbb{N}}$ of $\mathbb{Q} \cap (0,1)$ and make each $q_j$ the centre of a
small open interval.]
6.8. Let $\lambda = \lambda^1$ be Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Show that $N \in \mathcal{B}(\mathbb{R})$ is a null set
if, and only if, for every $\epsilon > 0$ there is an open set $U = U_\epsilon \supset N$ such that $\lambda(U) < \epsilon$.
[Hint: sufficiency is trivial, for necessity use $\lambda^*$ constructed in Theorem 6.1 (6.1)
from $\lambda|_{\mathcal{J}}$ and observe that by Theorem 6.1 $\lambda|_{\mathcal{B}(\mathbb{R})} = \lambda^*|_{\mathcal{B}(\mathbb{R})}$. This gives the required
open cover.]
6.9. Borel–Cantelli lemma (1) – the direct half. Prove the following theorem.
Theorem (Borel–Cantelli lemma). Let $(\Omega, \mathcal{A}, P)$ be a probability space. For
every sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ we have
$$\sum_{j=1}^{\infty} P(A_j) < \infty \implies P\Big( \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j \Big) = 0. \tag{6.11}$$
[Hint: use Theorem 4.4 and the fact that $P\big( \bigcup_{j \geq n} A_j \big) \leq \sum_{j \geq n} P(A_j)$.]
Remark. This is the 'easy' or direct half of the so-called Borel–Cantelli lemma; for the
more difficult part see T18.9. The condition $\omega \in \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j$ means that $\omega$ happens
to be in infinitely many of the $A_j$ and the lemma gives a simple sufficient condition
when certain events happen almost surely not infinitely often, i.e. only finitely often
with probability one.
6.10. Non-measurable sets (1). Let $\mu$ be a measure on $\mathcal{A} = \{\emptyset, [0,1), [1,2), [0,2)\}$,
$X = [0,2)$, such that $\mu([0,1)) = \mu([1,2)) = \frac{1}{2}$ and $\mu([0,2)) = 1$. Denote by $\mu^*$
and $\mathcal{A}^*$ the outer measure and $\sigma$-algebra which appear in the proof of Theorem 6.1.
(i) Find $\mu^*((a, b))$ and $\mu^*(\{a\})$ for all $0 \leq a < b < 2$ if we use $\mathcal{S} = \mathcal{A}$ in T6.1;
(ii) Show that $(0,1), \{0\} \notin \mathcal{A}^*$.
6.11. Non-measurable sets (2). Consider on $X = \mathbb{R}$ the $\sigma$-algebra
$\mathcal{A} := \{A \subset \mathbb{R} : A \text{ or } A^c \text{ is countable}\}$
from Example 3.3(v) and the measure $\gamma(A)$ from 4.7(ii)
which is 0 or 1 according to $A$ or $A^c$ being countable. Denote by $\gamma^*$ and $\mathcal{A}^*$ the
outer measure and $\sigma$-algebra which appear in the proof of Theorem 6.1.
(i) Find $\gamma^*$ if we use $\mathcal{S} = \mathcal{A}$ in T6.1;
(ii) Show that no set $B \subset \mathbb{R}$, such that both $B$ and $B^c$ are uncountable, is in $\mathcal{A}$ or
in $\mathcal{A}^*$.
7
Measurable mappings
In this chapter we consider maps $T : X \to X'$ between two measurable spaces
$(X, \mathcal{A})$ and $(X', \mathcal{A}')$ which respect the measurable structures, that is $\sigma$-algebras,
on $X$ and $X'$. Such maps can be used to transport a given measure $\mu$, defined
on $(X, \mathcal{A})$, onto $(X', \mathcal{A}')$. We have already used this technique in Theorem 5.8,
where we considered shifts of sets: $A \rightsquigarrow x + A$, but it is in probability theory
where this concept is truly fundamental: you use it whenever you speak of the
'distribution' of a 'random variable'.
7.1 Definition Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be two measurable spaces. A map $T : X \to X'$
is called $\mathcal{A}/\mathcal{A}'$-measurable (or measurable unless this is too ambiguous) if the
pre-image of every measurable set is a measurable set:
$$T^{-1}(A') \in \mathcal{A} \quad \forall\, A' \in \mathcal{A}'. \tag{7.1}$$
A random variable is a measurable map from a probability space to any measurable space.
Note that $T^{-1}(\mathcal{A}') \subset \mathcal{A}$ is a common shorthand for (7.1).
In the language of Definition 7.1 the translation (and its inverse) which we
used in Theorem 5.8 is a $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^n)$-measurable map:
$$\tau_x : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y - x \qquad \text{and} \qquad \tau_x^{-1} : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y + x. \tag{7.2}$$
In fact, Theorem 5.8 states that
$$\lambda^n(B) = \lambda^n(x + B) \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n) \tag{7.3}$$
and this requires $x + B = \tau_{-x}(B) = \tau_x^{-1}(B)$ to be a Borel set! Our proof of T5.8
needed (and proved) this for rectangles $B \in \mathcal{J}^n$ and not for all Borel sets – but
this is good enough even in the most general case. The following lemma shows
that measurability needs only to be checked for the sets of a generator.
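7.2 Lemma Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and let $\mathcal{A}' = \sigma(\mathcal{G}')$. Then
$T : X \to X'$ is $\mathcal{A}/\mathcal{A}'$-measurable if, and only if, $T^{-1}(\mathcal{G}') \subset \mathcal{A}$, i.e. if
$$T^{-1}(G') \in \mathcal{A} \quad \forall\, G' \in \mathcal{G}'. \tag{7.4}$$
Proof If $T$ is $\mathcal{A}/\mathcal{A}'$-measurable, we have $T^{-1}(\mathcal{G}') \subset T^{-1}(\mathcal{A}') \subset \mathcal{A}$, and (7.4) is
obviously satisfied.
Conversely, consider the system $\Sigma' := \{A' \subset X' : T^{-1}(A') \in \mathcal{A}\}$. By (7.4),
$\mathcal{G}' \subset \Sigma'$ and it is not difficult to see that $\Sigma'$ is itself a $\sigma$-algebra since $T^{-1}$
commutes with all set-operations. [✓] Therefore,
$$\mathcal{A}' = \sigma(\mathcal{G}') \subset \sigma(\Sigma') = \Sigma' \implies T^{-1}(A') \in \mathcal{A} \quad \forall\, A' \in \mathcal{A}'.$$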
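On finite sets the statement of Lemma 7.2 can be observed directly. The sketch below is an illustration under assumptions: the sets `X`, `Xp`, the map `T` and the generators are hypothetical examples, and `sigma_algebra` is a helper written for this purpose. It checks the pre-images of the generator first and then the pre-images of every set in the generated $\sigma$-algebra.

```python
from itertools import combinations


def sigma_algebra(X, G):
    """Smallest sigma-algebra on the finite set X containing the family G."""
    family = {frozenset(), frozenset(X)} | {frozenset(A) for A in G}
    while True:
        new = {frozenset(X) - A for A in family}
        new |= {A | B for A, B in combinations(family, 2)}
        if new <= family:
            return family
        family |= new


def preimage(T, A, X):
    """T^{-1}(A) for a map T given as a dictionary on X."""
    return frozenset(x for x in X if T[x] in A)


X = range(6)
Xp = range(3)
T = {x: x % 3 for x in X}                 # hypothetical map T : X -> X'
Gp = [{0}, {1}]                           # generator G' on X'
A_prime = sigma_algebra(Xp, Gp)           # A' = sigma(G')
A = sigma_algebra(X, [{0, 3}, {1, 4}])    # a sigma-algebra A on X

on_generator = all(preimage(T, G, X) in A for G in Gp)
on_everything = all(preimage(T, B, X) in A for B in A_prime)
```

Both flags are true: checking the two generating sets already settles the measurability of `T` with respect to all eight sets of $\sigma(\mathcal{G}')$.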
On a topological space $(X, \mathcal{O})$ – see Appendix B – we consider usually the
(topological) Borel $\sigma$-algebra $\mathcal{B}(X) := \sigma(\mathcal{O})$. The interplay between measurability
and topology is often quite intricate. One of the simple and extremely useful
aspects is the fact that continuous maps are measurable; let us check this for $\mathbb{R}^n$.
7.3 Example Every continuous map $T : \mathbb{R}^n \to \mathbb{R}^m$ is $\mathcal{B}^n/\mathcal{B}^m$-measurable.
From calculus¹ we know that $T$ is continuous if, and only if,
$$T^{-1}(U) \subset \mathbb{R}^n \text{ is open} \quad \forall \text{ open } U \subset \mathbb{R}^m. \tag{7.5}$$
Since the open sets $\mathcal{O}^m$ in $\mathbb{R}^m$ generate the Borel $\sigma$-algebra $\mathcal{B}^m$, we can use (7.5)
to deduce
$$T^{-1}(\mathcal{O}^m) \subset \mathcal{O}^n \subset \sigma(\mathcal{O}^n) = \mathcal{B}^n.$$
By Lemma 7.2, $T^{-1}(\mathcal{B}^m) \subset \mathcal{B}^n$ which means that $T$ is measurable.
Caution: Not every measurable map is continuous, e.g. $x \mapsto \mathbf{1}_{[-1,1]}(x)$.
7.4 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2, 3$, be measurable spaces and $T : X_1 \to X_2$,
$S : X_2 \to X_3$ be $\mathcal{A}_1/\mathcal{A}_2$- resp. $\mathcal{A}_2/\mathcal{A}_3$-measurable maps. Then $S \circ T : X_1 \to X_3$
is $\mathcal{A}_1/\mathcal{A}_3$-measurable.
Proof For $A_3 \in \mathcal{A}_3$ we have
$$(S \circ T)^{-1}(A_3) = T^{-1}\big( \underbrace{S^{-1}(A_3)}_{\in\, \mathcal{A}_2} \big) \in T^{-1}(\mathcal{A}_2) \subset \mathcal{A}_1.$$
1 See also Appendix B, Theorem B.12 and B.19.
Often we find ourselves in a situation where $T : X \to X'$ is given and where
$X'$ is equipped with a natural $\sigma$-algebra $\mathcal{A}'$ – e.g. if $X' = \mathbb{R}$ and $\mathcal{A}' = \mathcal{B}(\mathbb{R})$ –
but no $\sigma$-algebra is specified in $X$. Then the question arises: is there a (smallest)
$\sigma$-algebra on $X$ which makes $T$ measurable? An obvious, but nevertheless
useless, candidate is $\mathcal{P}(X)$, which renders every map measurable. [✓] From
Example 3.3(vii) we know that $T^{-1}(\mathcal{A}')$ is a $\sigma$-algebra in $X$ but we cannot remove a
single set from it without endangering the measurability of $T$. [✓] Let us formalize
this observation.
7.5 Definition (and Lemma) Let $(T_i)_{i\in I}$ be arbitrarily many mappings $T_i : X \to X_i$
from the same space $X$ into measurable spaces $(X_i, \mathcal{A}_i)$. The smallest
$\sigma$-algebra on $X$ that makes all $T_i$ simultaneously measurable [✓] is
$$\sigma(T_i : i \in I) := \sigma\Big( \bigcup_{i\in I} T_i^{-1}(\mathcal{A}_i) \Big). \tag{7.6}$$
We say that $\sigma(T_i : i \in I)$ is generated by the family $(T_i)_{i\in I}$.
Although $T_i^{-1}(\mathcal{A}_i)$ is a $\sigma$-algebra this is, in general, no longer true for
$\bigcup_{i\in I} T_i^{-1}(\mathcal{A}_i)$ if $\#I > 1$; this explains why we have to use the $\sigma$-hull in (7.6).
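A four-point example shows concretely why the union in (7.6) has to be closed up again. The sketch assumes a hypothetical setting not taken from the text: $X = \{0,1\}^2$ with the two coordinate projections `T1`, `T2`, each mapping into $\{0,1\}$ equipped with its power set.

```python
from itertools import product

X = list(product([0, 1], repeat=2))      # X = {0,1} x {0,1}


def T1(x):
    """First coordinate projection."""
    return x[0]


def T2(x):
    """Second coordinate projection."""
    return x[1]


def pullback(T):
    """T^{-1}(P({0,1})): the pre-images of all four subsets of {0, 1}."""
    targets = [set(), {0}, {1}, {0, 1}]
    return {frozenset(x for x in X if T(x) in B) for B in targets}


union = pullback(T1) | pullback(T2)
A = frozenset(x for x in X if T1(x) == 0)   # belongs to T1^{-1}(...)
B = frozenset(x for x in X if T2(x) == 0)   # belongs to T2^{-1}(...)
intersection_in_union = (A & B) in union
```

The union consists of six sets, and `intersection_in_union` is false: the single point $(0,0) = A \cap B$ is missing, so the union is not a $\sigma$-algebra.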
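7.6 Theorem Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and $T : X \to X'$ be an
$\mathcal{A}/\mathcal{A}'$-measurable map. For every measure $\mu$ on $(X, \mathcal{A})$,
$$\mu'(A') := \mu(T^{-1}(A')), \quad A' \in \mathcal{A}', \tag{7.7}$$
defines a measure on $(X', \mathcal{A}')$.
Proof If $A' = \emptyset$, then $T^{-1}(\emptyset) = \emptyset$ and $\mu'(\emptyset) = \mu(\emptyset) = 0$. If $(A'_j)_{j\in\mathbb{N}} \subset \mathcal{A}'$ is a
sequence of mutually disjoint sets, then
$$\mu'\Big( \dot{\bigcup_{j\in\mathbb{N}}} A'_j \Big) = \mu\Big( T^{-1}\Big( \dot{\bigcup_{j\in\mathbb{N}}} A'_j \Big) \Big) \overset{[✓]}{=} \mu\Big( \dot{\bigcup_{j\in\mathbb{N}}} T^{-1}(A'_j) \Big) = \sum_{j\in\mathbb{N}} \mu\big( T^{-1}(A'_j) \big) = \sum_{j\in\mathbb{N}} \mu'(A'_j).$$
Notice that we have seen a special case of Theorem 7.6 in the proof of
Theorem 5.8 when considering translates of Lebesgue measure:
$\lambda^n(x + B) = \lambda^n(\tau_x^{-1}(B))$.
7.7 Definition The measure $\mu'(\bullet)$ of Theorem 7.6 is called the image measure
of $\mu$ under $T$ and is denoted by $T(\mu)(\bullet)$ or $\mu \circ T^{-1}(\bullet)$.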
7.8 Example Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\xi : \Omega \to \mathbb{R}$ be a random
variable, i.e. an $\mathcal{A}/\mathcal{B}$-measurable map. Then²
$$\xi(P)(A) = P(\xi^{-1}(A)) = P(\{\omega : \xi(\omega) \in A\}) = P(\xi \in A)$$
is again a probability measure, called the law or distribution of the random
variable $\xi$.
More concretely, if $(\Omega, \mathcal{A}, P)$ describes throwing two fair dice, i.e.
$\Omega := \{(j, k) : 1 \leq j, k \leq 6\}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\{(j, k)\}) = 1/36$, we could ask for the
total number of points thrown: $\xi : \Omega \to \{2, 3, \ldots, 12\}$, $\xi((j, k)) := j + k$, which
is a measurable map. [✓] The law of $\xi$ is then given in the table below:

j            2     3     4     5    6     7    8     9    10    11    12
P(ξ = j)   1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
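The table can be recomputed mechanically as an image measure. The following sketch uses exact fractions; the helper name `image_measure` is hypothetical. It pushes the uniform distribution on the 36 outcomes forward under the map $(j,k) \mapsto j + k$, in the sense of Theorem 7.6.

```python
from collections import defaultdict
from fractions import Fraction

# Sample space of two fair dice with the uniform probability P.
omega = [(j, k) for j in range(1, 7) for k in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def xi(w):
    """Total number of points thrown."""
    return w[0] + w[1]


def image_measure(P, T):
    """Point masses of the image measure T(P) on the (finite) range of T."""
    law = defaultdict(Fraction)      # Fraction() is 0
    for w, p in P.items():
        law[T(w)] += p               # mass of w is moved to the point T(w)
    return dict(law)


law = image_measure(P, xi)
```

The dictionary `law` assigns, for example, mass $1/36$ to the value 2, mass $1/6$ to 7 and mass $5/36$ to 6 and 8, and its values sum to 1.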
We close this section with some transformation formulae for Lebesgue measure.
Recall that $O(n)$ is the set of all orthogonal $n \times n$ matrices: $T \in O(n)$ if, and
only if, ${}^tT \cdot T = \mathrm{id}$. Orthogonal matrices preserve lengths and angles, i.e. we have
for all $x, y \in \mathbb{R}^n$
$$\langle x, y \rangle = \langle Tx, Ty \rangle \iff \|x\| = \|Tx\| \tag{7.8}$$
where $\langle x, y \rangle = \sum_{j=1}^{n} x_j y_j$ and $\|x\|^2 = \langle x, x \rangle$ denote the usual Euclidean scalar
product, resp. norm.
7.9 Theorem If $T \in O(n)$, then $\lambda^n = T(\lambda^n)$.
Proof Since $T$ is a linear orthogonal map it is continuous and by (7.8) even an
isometry,
$$\|Tx - Ty\| = \|T(x - y)\| = \|x - y\|,$$
hence measurable by Example 7.3. Therefore, the image measure $\mu(B) := \lambda^n(T^{-1}(B))$
is well-defined (by T7.6) and satisfies for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(T^{-1}(x + B)) = \lambda^n(T^{-1}x + T^{-1}B) \overset{5.8}{=} \lambda^n(T^{-1}B) = \mu(B)$$
and, again by Theorem 5.8, $\mu(B) = \kappa\, \lambda^n(B)$ for all $B \in \mathcal{B}(\mathbb{R}^n)$. To determine
the constant $\kappa$ we choose $B = B_1(0)$. Since $T \in O(n)$, (7.8) implies
$B_1(0) = \{x : \|x\| < 1\} = \{x : \|Tx\| < 1\} = T^{-1}B_1(0)$ and thus
$$\lambda^n(B_1(0)) = \lambda^n(T^{-1}B_1(0)) = \mu(B_1(0)) = \kappa\, \lambda^n(B_1(0)).$$
As $0 < \lambda^n(B_1(0)) < \infty$, we have $\kappa = 1$, and the theorem follows.
2 We use the shorthand $\{\xi \in A\}$ for $\xi^{-1}(A)$ and $P(\xi \in A)$ for $P(\{\xi \in A\})$.
Theorem 7.9 is a particular case of the following general change-of-variable
formula for Lebesgue measure. Recall that $GL(n, \mathbb{R})$ is the set of all invertible
$n \times n$ matrices, i.e. $S \in GL(n, \mathbb{R}) \iff \det S \neq 0$.
7.10 Theorem Let $S \in GL(n, \mathbb{R})$. Then
$$S(\lambda^n) = |\det S^{-1}|\, \lambda^n = \frac{1}{|\det S|}\, \lambda^n. \tag{7.9}$$
Proof Since $S$ is invertible, both $S$ and $S^{-1}$ are linear maps on $\mathbb{R}^n$, and as
such continuous and measurable (Example 7.3). Set $\mu(B) := \lambda^n(S^{-1}(B))$ for
$B \in \mathcal{B}(\mathbb{R}^n)$. Then we have for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(S^{-1}(x + B)) = \lambda^n(S^{-1}x + S^{-1}B) \overset{5.8}{=} \lambda^n(S^{-1}B) = \mu(B),$$
and from Theorem 5.8 we conclude that
$$\mu(B) = \mu\big( [0,1)^n \big)\, \lambda^n(B) = \lambda^n\big( S^{-1}([0,1)^n) \big)\, \lambda^n(B).$$
From elementary geometry we know that $S^{-1}([0,1)^n)$ is a parallelepiped spanned
by the vectors $S^{-1}e_j$, $j = 1, 2, \ldots, n$, where $e_j = (0, \ldots, 0, 1, 0, \ldots)$ has its entry 1 in the $j$th position. Its geometric
volume is
$$\mathrm{vol}^{(n)}\big( S^{-1}([0,1)^n) \big) = |\det S^{-1}| = \frac{1}{|\det S|},$$
see also Appendix C. By Remark 6.7, $\mathrm{vol}^{(n)} = \lambda^n$ (at least on the Borel sets) and
the proof is finished.
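In dimension $n = 2$ the volume identity used at the end of the proof can be checked by hand or, as in the following sketch, with exact fractions. The matrix `S` is a hypothetical example and the shoelace formula for the area of a polygon is brought in only for this illustration; it is not part of the text.

```python
from fractions import Fraction


def det2(S):
    """Determinant of a 2x2 matrix given as nested lists."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]


def inverse2(S):
    """Inverse of an invertible 2x2 matrix, with exact rational entries."""
    d = Fraction(det2(S))
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]


def shoelace(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2


S = [[2, 1], [0, 3]]                       # hypothetical example, det S = 6
S_inv = inverse2(S)
v1 = (S_inv[0][0], S_inv[1][0])            # S^{-1} e_1 (first column)
v2 = (S_inv[0][1], S_inv[1][1])            # S^{-1} e_2 (second column)
origin = (Fraction(0), Fraction(0))
# Corners of the parallelogram S^{-1}([0,1)^2), listed along its boundary.
corners = [origin, v1, (v1[0] + v2[0], v1[1] + v2[1]), v2]
area = shoelace(corners)
```

For this matrix the parallelogram has area $1/6$, which is $1/|\det S|$ as claimed in (7.9).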
Theorem 7.9 or 7.10 allow us to complete the characterization of Lebesgue
measure announced earlier in Theorem 4.9. A motion is a linear transformation
of the form
$$M = \tau_x \circ T$$
where $\tau_x(y) = y - x$ is a translation and $T \in O(n)$ is an orthogonal map (${}^tT \cdot T = \mathrm{id}$).
In particular, congruent sets are connected by motions.
7.11 Corollary Lebesgue measure is invariant under motions: $\lambda^n = M(\lambda^n)$ for
all motions $M$ in $\mathbb{R}^n$. In particular, congruent sets have the same measure.
Proof We know that $M$ is of the form $\tau_x \circ T$. Since $\det T = \pm 1$, we get
$$M(\lambda^n) = \tau_x(T(\lambda^n)) \overset{7.10}{=} \tau_x(\lambda^n) \overset{5.8}{=} \lambda^n.$$
Problems
7.1. Use Lemma 7.2 to show that $\tau_x$ of (7.2), i.e. $\tau_x(y) = y - x$, $x, y \in \mathbb{R}^n$, is $\mathcal{B}^n/\mathcal{B}^n$-measurable.
7.2. Show that $\Sigma'$ defined in the proof of Lemma 7.2 is a $\sigma$-algebra.
7.3. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps.
(i) Show that for every $i \in I$ the smallest $\sigma$-algebra in $X$ that makes $T_i$ measurable is given by $T_i^{-1}(\mathcal{A}_i)$.
(ii) Show that $\sigma\big(\bigcup_{i \in I} T_i^{-1}(\mathcal{A}_i)\big)$ is the smallest $\sigma$-algebra in $X$ that makes all $T_i$, $i \in I$, simultaneously measurable.
7.4. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps. Show that a map $f$ from a measurable space $(F, \mathcal{F})$ to $(X, \sigma(T_i : i \in I))$ is measurable if, and only if, all maps $T_i \circ f$ are $\mathcal{F}/\mathcal{A}_i$-measurable.
7.5. Use Problem 7.4 to show that a function $f : \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto (f_1(x), \ldots, f_m(x))$ is measurable if, and only if, all coordinate maps $f_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, 2, \ldots, m$, are measurable.
[Hint: show that the coordinate projections $x = (x_1, \ldots, x_n) \mapsto x_j$ are measurable.]
7.6. Let $T : (X, \mathcal{A}) \to (X', \mathcal{A}')$ be a measurable map. Under which circumstances is the family of sets $T(\mathcal{A})$ a $\sigma$-algebra?
7.7. Use image measures to give a new proof of Problem 5.8, i.e. show that
$$\lambda^n(t \cdot B) = t^n \lambda^n(B) \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n),\ \forall\, t > 0.$$
7.8. Let $T : X \to Y$ be any map. Show that $T^{-1}(\sigma(\mathcal{G})) = \sigma(T^{-1}(\mathcal{G}))$ holds for arbitrary families $\mathcal{G}$ of subsets of $Y$.
7.9. Stieltjes measure (1). Throughout this exercise $(X, \mathcal{A}) = (\mathbb{R}, \mathcal{B}^1)$ and $\lambda = \lambda^1$ is one-dimensional Lebesgue measure.
(i) Let $\mu$ be a measure on $(\mathbb{R}, \mathcal{B}^1)$. Show that
$$F_\mu(x) := \begin{cases} \mu([0, x)), & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ -\mu([x, 0)), & \text{if } x < 0 \end{cases}$$
is a monotonically increasing and left-continuous function $F_\mu : \mathbb{R} \to \mathbb{R}$.
Remark. Increasing and left-continuous functions are called Stieltjes functions.
(ii) Let $F : \mathbb{R} \to \mathbb{R}$ be a Stieltjes function (cf. part (i)). Show that
$$\nu_F([a, b)) := F(b) - F(a), \quad \forall\, a, b \in \mathbb{R},\ a < b,$$
has a unique extension to a measure on $\mathcal{B}^1$.
[Hint: check the assumptions of Theorem 6.1 with $\mathcal{S} = \{[a, b) : a \le b\}$.]
(iii) Use part (i) to show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}^1)$ with $\mu([-n, n)) < \infty$, $n \in \mathbb{N}$, can be written in the form $\nu_F$ as in (ii) with some Stieltjes function $F = F_\mu$ as in (i).
(iv) Which Stieltjes function $F$ corresponds to $\lambda$?
(v) Which Stieltjes function $F$ corresponds to $\delta_0$?
(vi) Show that $F_\mu$ as in (i) is continuous at $x \in \mathbb{R}$ if, and only if, $\mu(\{x\}) = 0$.
(vii) Show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}^1)$ which has no atoms (see Problem 6.5) can be written as image measure of $\lambda$.
[Hint: $\mu$ has no atoms implies that $F_\mu$ is continuous. So $G = F_\mu^{-1}$ exists and can be made left-continuous. Finally $\mu([a, b)) = F_\mu(b) - F_\mu(a) = \lambda(G^{-1}([a, b)))$.]
(viii) Is (vii) true for measures with atoms, say, $\mu = \delta_0$?
[Hint: determine $F_{\delta_0}^{-1}$. Is it measurable?]
7.10. Cantor's ternary set. Let $(X, \mathcal{A}) = ([0, 1], [0, 1] \cap \mathcal{B}^1)$, $\lambda = \lambda^1|_{[0,1]}$, and set $E_0 = [0, 1]$. Remove the open middle third of $E_0$ to get $E_1 = I_1^1 \mathbin{\dot\cup} I_1^2$. Remove the open middle thirds of $I_1^j$, $j = 1, 2$, to get $E_2 = I_2^1 \mathbin{\dot\cup} I_2^2 \mathbin{\dot\cup} I_2^3 \mathbin{\dot\cup} I_2^4$ and so forth.
(i) Make a sketch of $E_0, E_1, E_2, E_3$.
(ii) Prove that each $E_n$ is compact. Conclude that $C := \bigcap_{n \in \mathbb{N}_0} E_n$ is non-void and compact.
(iii) The set $C$ is called the Cantor set or Cantor's discontinuum. It satisfies
$$C \cap \bigcup_{n \in \mathbb{N}} \bigcup_{k \in \mathbb{N}_0} \left( \frac{3k+1}{3^n}, \frac{3k+2}{3^n} \right) = \emptyset.$$
(iv) Find the value of $\lambda(E_n)$ and show that $\lambda(C) = 0$.
(v) Show that $C$ does not contain any open interval. Conclude that the interior (of the closure) of $C$ is empty.
Remark. Sets with empty interior are called nowhere dense.
(vi) We can write $x \in [0, 1)$ as a base-3 ternary fraction, i.e. $x = 0.x_1x_2x_3\ldots$ where $x_j \in \{0, 1, 2\}$, which is short for $x = \sum_{j=1}^\infty x_j 3^{-j}$. (E.g. $\frac{1}{3} = 0.1 = 0.02222\ldots$; note that this representation is not unique [✓], which is important for this exercise.)
Show that $x \in C$ if, and only if, $x$ has a ternary representation involving only 0s and 2s.
[Hint: the numbers in $(\frac{1}{3}, \frac{2}{3})$, the first interval to be removed, are all of the form $0.1{*}{*}{*}\ldots$, i.e. they contain at least one '1', while in $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$ we have numbers of the form $0.0{*}{*}{*}\ldots$ to $0.022222\ldots$ and $0.2{*}{*}{*}\ldots$ to $0.2222\ldots$, respectively. The next step eliminates the $0.01{*}{*}{*}\ldots$s and $0.21{*}{*}{*}\ldots$s, etc.]
(vii) Use (vi) to show that $C$ is not countable and has even the same cardinality as $[0, 1]$. Nevertheless, $\lambda(C) = 0 \neq 1 = \lambda([0, 1])$.
7.11. Factorization lemma. Let $X$ be a set, $(Y, \mathcal{A})$ be a measurable space and $T : X \to Y$ be a surjective map. Show that a function $f : X \to \mathbb{R}$ is $\sigma(T)/\mathcal{B}^1$-measurable if, and only if, there exists some $\mathcal{A}/\mathcal{B}^1$-measurable function $g : Y \to \mathbb{R}$ such that $f = g \circ T$.
[Hint: show first that $T(x) = T(x')$ implies $f(x) = f(x')$.]
Remark. The result is actually true for any map $T : X \to Y$, but the proof is quite difficult if $T(X) \notin \mathcal{A}$. The problem is that one has to extend the $T(X) \cap \mathcal{A}/\mathcal{B}^1$-measurable function $g : T(X) \to \mathbb{R}$ to an $\mathcal{A}/\mathcal{B}^1$-measurable function $g : Y \to \mathbb{R}$.
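As a companion to Problem 7.10, the construction of the sets $E_n$ can be carried out mechanically. The sketch below is an illustration only: the function names are hypothetical, and each closed interval is stored as a pair of exact fractions. It builds the first few levels and sums the interval lengths, which is one way to experiment with part (iv) before proving anything.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))   # left closed third
        out.append((b - third, b))   # right closed third
    return out


def cantor_level(n):
    """Return the 2**n closed intervals whose union is E_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


def total_length(intervals):
    """Lebesgue measure of a finite disjoint union of intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))


lengths = [total_length(cantor_level(n)) for n in range(6)]
print(lengths)
```

The printed lengths shrink by a fixed factor at every step, which suggests the limit needed for $\lambda(C)$.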
8
Measurable functions
A measurable function is a measurable map $u : X \to \mathbb{R}$ from some measurable space $(X, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Measurable functions will play a central rôle in the theory of integration. Recall that $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable¹ ($\mathcal{B} = \mathcal{B}(\mathbb{R})$) if
$$u^{-1}(B) \in \mathcal{A} \quad \forall\, B \in \mathcal{B} \qquad (8.1)$$
which is, due to Lemma 7.2, equivalent to
$$u^{-1}(G) \in \mathcal{A} \quad \forall\, G \text{ from a generator } \mathcal{G} \text{ of } \mathcal{B}. \qquad (8.2)$$
As we have seen in Remark 3.9, $\mathcal{B}$ is generated by all sets of the form $[a, \infty)$ (or $(b, \infty)$ or $(-\infty, c]$ or $(-\infty, d)$) with $a, b, c, d \in \mathbb{R}$ or $\mathbb{Q}$, and we need
$$u^{-1}([a, \infty)) = \{x \in X : u(x) \in [a, \infty)\} = \{x \in X : u(x) \ge a\} \in \mathcal{A}, \qquad (8.3)$$
with similar expressions for the other types of intervals. Let us introduce the following useful shorthand notation:
$$\{u \ge v\} := \{x \in X : u(x) \ge v(x)\} \qquad (8.4)$$
and $\{u > v\}$, $\{u \le v\}$, $\{u < v\}$, $\{u = v\}$, $\{u \neq v\}$, $\{u \in B\}$, etc. which are defined in a similar fashion.
In this new notation measurability of functions reads as
8.1 Lemma Let $(X, \mathcal{A})$ be a measurable space. The function $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable if, and only if, one, hence all, of the following conditions hold
(i) $\{u \ge a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(ii) $\{u > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iii) $\{u \le a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iv) $\{u < a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$).
Proof Combine Remark 3.9 and Lemma 7.2.
1 We will frequently drop the $\mathcal{B}$ since $\mathbb{R}$ is naturally equipped with the Borel $\sigma$-algebra and just say that $u$ is $\mathcal{A}$-measurable.
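It is sometimes practical to admit the values $+\infty$ and $-\infty$ in some calculations. To do this properly, consider the extended real line $\bar{\mathbb{R}} := [-\infty, +\infty]$. If we agree that $-\infty < x$ and $y < +\infty$ for all $x, y \in \mathbb{R}$, then $\bar{\mathbb{R}}$ inherits the ordering from $\mathbb{R}$ as well as the usual rules of addition and multiplication of elements from $\mathbb{R}$. The latter need to be augmented as follows: for all $x \in \mathbb{R}$ we have
$$x + (+\infty) = (+\infty) + x = +\infty, \qquad x + (-\infty) = (-\infty) + x = -\infty,$$
$$(+\infty) + (+\infty) = +\infty, \qquad (-\infty) + (-\infty) = -\infty,$$
and, if $x \in (0, \infty]$,
$$(\pm x)(+\infty) = (+\infty)(\pm x) = \pm\infty,$$
$$(\pm x)(-\infty) = (-\infty)(\pm x) = \mp\infty,$$
$$0 \cdot (\pm\infty) = (\pm\infty) \cdot 0 = 0,^2 \qquad \frac{1}{\pm\infty} = 0.$$
Caution: $\bar{\mathbb{R}}$ is not a field. Expressions of the form $\infty - \infty$ and $\frac{\pm\infty}{\pm\infty}$ must be avoided.²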
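Functions which take values in $\bar{\mathbb{R}}$ are called numerical functions. The Borel $\sigma$-algebra $\bar{\mathcal{B}} = \mathcal{B}(\bar{\mathbb{R}})$ is defined by
$$B^* \in \bar{\mathcal{B}} \iff B^* = B \cup S \text{ for some } B \in \mathcal{B} \text{ and } S \in \big\{\emptyset, \{-\infty\}, \{+\infty\}, \{-\infty, +\infty\}\big\} \qquad (8.5)$$
and it is not hard to see that $\bar{\mathcal{B}}$ is again a $\sigma$-algebra whose trace w.r.t. $\mathbb{R}$ is $\mathcal{B}(\mathbb{R})$. [✓]
8.2 Lemma $\mathcal{B}(\mathbb{R}) = \mathbb{R} \cap \mathcal{B}(\bar{\mathbb{R}})$.
Moreover,
8.3 Lemma $\bar{\mathcal{B}}$ is generated by all sets of the form $[a, \infty]$ (or $(b, \infty]$ or $[-\infty, c]$ or $[-\infty, d)$) where $a$ (or $b, c, d$) is from $\mathbb{R}$ or $\mathbb{Q}$.
2 Conventions are tricky. The rationale behind our definitions is to understand '$\pm\infty$' in every instance as the limit of some (possibly each time different) sequence, and '0' as a bona fide zero. Then $0 \cdot (\pm\infty) = 0 \cdot \lim_n a_n = \lim_n (0 \cdot a_n) = \lim_n 0 = 0$ while expressions of the type $\infty - \infty$ or $\frac{\pm\infty}{\pm\infty}$ become $\lim_n a_n - \lim_n b_n$ or $\frac{\lim_n a_n}{\lim_n b_n}$ where two sequences compete and do not lead to unique results.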
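Proof Set $\Sigma := \sigma(\{[a, \infty] : a \in \mathbb{R}\})$. Since
$$[a, \infty] = [a, \infty) \cup \{+\infty\} \quad \text{and} \quad [a, \infty) \in \mathcal{B},$$
we see that $[a, \infty] \in \bar{\mathcal{B}}$ and $\Sigma \subset \bar{\mathcal{B}}$. On the other hand,
$$[a, b) = [a, \infty] \setminus [b, \infty] \in \Sigma \quad \forall\, -\infty < a \le b < \infty,$$
which means that $\mathcal{B} \subset \Sigma \subset \bar{\mathcal{B}}$. Since also
$$\{+\infty\} = \bigcap_{j \in \mathbb{N}} [j, \infty], \qquad \{-\infty\} = \bigcap_{j \in \mathbb{N}} [-\infty, -j) = \bigcap_{j \in \mathbb{N}} [-j, \infty]^c$$
we have $\{-\infty\}, \{+\infty\} \in \Sigma$ which entails that all sets of the form
$$B, \quad B \cup \{+\infty\}, \quad B \cup \{-\infty\}, \quad B \cup \{-\infty, +\infty\} \in \Sigma, \quad \forall\, B \in \mathcal{B};$$
therefore, $\bar{\mathcal{B}} \subset \Sigma$.
The proofs for $a \in \mathbb{Q}$ and the other generating systems are similar.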
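8.4 Definition Let $(X, \mathcal{A})$ be a measurable space. We write $\mathcal{M} := \mathcal{M}(\mathcal{A})$ and $\mathcal{M}_{\bar{\mathbb{R}}} := \mathcal{M}_{\bar{\mathbb{R}}}(\mathcal{A})$ for the families of real-valued $\mathcal{A}/\mathcal{B}$-measurable and numerical $\mathcal{A}/\bar{\mathcal{B}}$-measurable functions on $X$.
8.5 Examples Let $(X, \mathcal{A})$ be a measurable space.
(i) The indicator function $f(x) := 1_A(x)$ is measurable if, and only if, $A \in \mathcal{A}$. This follows easily from Lemma 8.1 and
$$\{1_A > \lambda\} = \begin{cases} \emptyset, & \text{if } \lambda \ge 1, \\ A, & \text{if } 1 > \lambda \ge 0, \\ X, & \text{if } \lambda < 0. \end{cases}$$
(ii) Let $A_1, A_2, \ldots, A_M \in \mathcal{A}$ be mutually disjoint sets and $y_1, \ldots, y_M \in \mathbb{R}$. Then the function
$$g(x) := \sum_{j=1}^M y_j 1_{A_j}(x) \qquad (8.6)$$
is measurable.
This follows from Lemma 8.1 and the fact (compare with the picture!) that
$$\{g > \lambda\} = \dot{\bigcup_{j \,:\, y_j > \lambda}} A_j \in \mathcal{A}.$$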
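[Figure: graph of a step function taking the values $y_1, \ldots, y_4$ on the sets $A_1, \ldots, A_4$, with a horizontal level $\lambda$; here $\{f > \lambda\} = A_2 \mathbin{\dot\cup} A_4$.]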
Functions of the form (8.6) are the building blocks for all measurable functions as well as for the definition of the integral.
8.6 Definition A simple function $g : X \to \mathbb{R}$ on a measurable space $(X, \mathcal{A})$ is a function of the form (8.6) with finitely many sets $A_1, \ldots, A_M \in \mathcal{A}$ and $y_1, \ldots, y_M \in \mathbb{R}$. The set of simple functions is denoted by $\mathcal{E}$ or $\mathcal{E}(\mathcal{A})$.
If the sets $A_j$, $1 \le j \le M$, are mutually disjoint we call
$$\sum_{j=0}^M y_j 1_{A_j}(x) \qquad (8.7)$$
with $y_0 := 0$ and $A_0 := (A_1 \cup \ldots \cup A_M)^c$ a standard representation of $g$.
Caution: The representations (8.7) are not unique.
8.7 Examples (continued)
(iii) If a measurable function $h : X \to \mathbb{R}$ attains only finitely many values $y_1, y_2, \ldots, y_M \in \mathbb{R}$, then it is a simple function.
Indeed: set $B_j := \{h = y_j\} = \{h \le y_j\} \setminus \{h < y_j\} \in \mathcal{A}$, $j = 1, 2, \ldots, M$, and note that the $B_j$ are disjoint. Thus
$$h(x) = \sum_{j=1}^M y_j 1_{B_j}(x) = \sum_{j=1}^M y_j 1_{\{h = y_j\}}(x).$$
Since every simple function attains only finitely many values, this shows that every simple function has at least one standard representation. In particular, $\mathcal{E}(\mathcal{A}) \subset \mathcal{M}(\mathcal{A})$ consists of measurable functions.
(iv) $f, g \in \mathcal{E}(\mathcal{A}) \implies f \pm g,\ fg \in \mathcal{E}(\mathcal{A})$.
Indeed: let $f = \sum_{j=0}^M y_j 1_{A_j}$ and $g = \sum_{k=0}^N z_k 1_{B_k}$ be standard representations of $f$ and $g$.
[Figure: graphs of the simple functions $f$ and $g$.]
It is not hard to see (use the picture!) that
$$f \pm g = \sum_{j=0}^M \sum_{k=0}^N (y_j \pm z_k)\, 1_{A_j \cap B_k}, \qquad fg = \sum_{j=0}^M \sum_{k=0}^N y_j z_k\, 1_{A_j \cap B_k},$$
and that $(A_j \cap B_k) \cap (A_{j'} \cap B_{k'}) = \emptyset$ whenever $(j, k) \neq (j', k')$. After relabelling and merging the double indexation into a single index, this shows that $f \pm g, fg \in \mathcal{E}(\mathcal{A})$. Notice that $(A_j \cap B_k)_{j,k}$ is the common refinement of the partitions $(A_j)_j$ and $(B_k)_k$ and that inside each of the sets $A_j \cap B_k$ the functions $f$ and $g$ do not change their respective values.
(v) $f \in \mathcal{E}(\mathcal{A}) \implies f^+, f^- \in \mathcal{E}(\mathcal{A})$.
Here we use the following notation: for a function $u : X \to \bar{\mathbb{R}}$ we write
$$u^+(x) := \max\{u(x), 0\}, \qquad u^-(x) := -\min\{u(x), 0\} \qquad (8.8)$$
for the positive ($u^+$) and negative ($u^-$) parts of $u$.
[Figure: graph of a function $u$ together with its positive part $u^+$ and negative part $u^-$.]
Obviously,
$$u = u^+ - u^- \quad \text{and} \quad |u| = u^+ + u^-. \qquad (8.9)$$
(vi) $f \in \mathcal{E}(\mathcal{A}) \implies |f| \in \mathcal{E}(\mathcal{A})$.
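Our next theorem reveals the fundamental rôle of simple functions.
8.8 Theorem Let $(X, \mathcal{A})$ be a measurable space. Every $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical function $u : X \to \bar{\mathbb{R}}$ is the pointwise limit of simple functions: $u(x) = \lim_{j \to \infty} f_j(x)$, $f_j \in \mathcal{E}(\mathcal{A})$ and $|f_j| \le |u|$.
If $u \ge 0$, all $f_j$ can be chosen to be positive and increasing towards $u$ so that $u = \sup_{j \in \mathbb{N}} f_j$.
Proof Assume first that $u \ge 0$. Fix $j \in \mathbb{N}$ and define level sets
$$A_k^j := \begin{cases} \{k 2^{-j} \le u < (k+1) 2^{-j}\}, & k = 0, 1, 2, \ldots, j 2^j - 1, \\ \{u \ge j\}, & k = j 2^j, \end{cases}$$
which slice up the graph of $u$ horizontally as shown in the picture. The approximating simple functions are
$$f_j(x) := \sum_{k=0}^{j 2^j} k 2^{-j}\, 1_{A_k^j}(x)$$
[Figure: the graph of $u$ sliced horizontally into strips of height $2^{-j}$ up to the level $j$, together with the step function $f_j$.]
and from the picture it is easy to see that
• $|f_j(x) - u(x)| \le 2^{-j}$ if $x \in \{u < j\}$;
• $A_k^j = \{k 2^{-j} \le u\} \cap \{u < (k+1) 2^{-j}\}$, $\{u \ge j\} \in \mathcal{A}$;
• $0 \le f_j \le u$ and $f_j \uparrow u$.
For a general $u$, we consider its positive and negative parts $u^\pm$. Since $u^+ \ge 0$, we have
$$\{u^+ > \lambda\} = \begin{cases} \{u > \lambda\}, & \text{if } \lambda \ge 0, \\ X, & \text{if } \lambda < 0, \end{cases}$$
and since $u^- = (-u)^+$, we have $\{u^\pm > \lambda\} \in \mathcal{A}$ for all $\lambda \in \mathbb{R}$. Thus $u^\pm$ are positive measurable functions, and we can construct, as above, simple functions $g_j \uparrow u^+$ and $h_j \uparrow u^-$. Clearly, $f_j := g_j - h_j \xrightarrow{j \to \infty} u^+ - u^- = u$ as well as $|f_j| = g_j + h_j \le u^+ + u^- = |u|$, and we are done.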
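The level-set construction in the proof of Theorem 8.8 amounts to rounding $u(x)$ down to the dyadic grid of mesh $2^{-j}$ and capping the result at $j$. The sketch below illustrates this pointwise; the function name `dyadic_approx` and the sample values are hypothetical and only serve to check the three bullet points of the proof numerically.

```python
import math


def dyadic_approx(u_value, j):
    """Value f_j(x) at a point x where u(x) = u_value >= 0 (may be math.inf)."""
    if u_value >= j:
        # x lies in the top level set {u >= j}, i.e. k = j * 2**j.
        return float(j)
    # x lies in {k 2^-j <= u < (k+1) 2^-j}; f_j takes the lower endpoint.
    k = math.floor(u_value * 2**j)
    return k / 2**j


samples = [0.0, 0.3, 1.7, 5.25, math.inf]
for value in samples:
    print(value, [dyadic_approx(value, j) for j in range(1, 7)])
```

For each sample the printed list increases with $j$, stays below the sample value, and is within $2^{-j}$ of it as soon as $j$ exceeds the value; at a point where $u = +\infty$ the approximations are simply $j$.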
8.9 Corollary Let $(X, \mathcal{A})$ be a measurable space. If $u_j : X \to \bar{\mathbb{R}}$, $j \in \mathbb{N}$, are measurable functions, then so are
$$\sup_{j \in \mathbb{N}} u_j, \quad \inf_{j \in \mathbb{N}} u_j, \quad \limsup_{j \to \infty} u_j, \quad \liminf_{j \to \infty} u_j,$$
and, whenever it exists, $\lim_{j \to \infty} u_j$.
Before we prove Corollary 8.9 let us stress again that expressions of the type $\sup_{j \in \mathbb{N}} u_j$ or $u_j \xrightarrow{j \to \infty} u$, etc. are always understood in a pointwise, $x$-by-$x$ sense, i.e. they are short for $\sup_{j \in \mathbb{N}} u_j(x) := \sup\{u_j(x) : j \in \mathbb{N}\}$ or $\lim_{j \to \infty} u_j(x) = u(x)$ at each $x$ (or for a specified range).
The infimum 'inf' and supremum 'sup' are familiar from calculus. Recall the following useful formula
$$\inf_{j \in \mathbb{N}} u_j(x) = -\sup_{j \in \mathbb{N}} (-u_j(x)) \qquad (8.10)$$
which allows us to express an inf as a sup, and vice versa. Recall also the definition of the lower resp. upper limits lim inf and lim sup,
$$\liminf_{j \to \infty} u_j(x) := \sup_{k \in \mathbb{N}} \Big( \inf_{j \ge k} u_j(x) \Big) = \lim_{k \to \infty} \Big( \inf_{j \ge k} u_j(x) \Big), \qquad (8.11)$$
$$\limsup_{j \to \infty} u_j(x) := \inf_{k \in \mathbb{N}} \Big( \sup_{j \ge k} u_j(x) \Big) = \lim_{k \to \infty} \Big( \sup_{j \ge k} u_j(x) \Big); \qquad (8.12)$$
more details can be found in Appendix A.
In the extended real line $\bar{\mathbb{R}}$ lim inf and lim sup always exist (but they may attain the values $+\infty$ and $-\infty$) and we have
$$\liminf_{j \to \infty} u_j(x) \le \limsup_{j \to \infty} u_j(x). \qquad (8.13)$$
Moreover, $\lim_{j \to \infty} u_j(x)$ exists [and is finite] if, and only if, upper and lower limits coincide, $\liminf_{j \to \infty} u_j(x) = \limsup_{j \to \infty} u_j(x)$ [and are finite]; in this case all three limits have the same value.
Proof (of Corollary 8.9) We show that $\sup_j u_j$ and $(-1)u = -u$ (for a measurable function $u$) are again measurable. Observe that for all $a \in \mathbb{R}$
$$\Big\{ \sup_{j \in \mathbb{N}} u_j > a \Big\} = \bigcup_{j \in \mathbb{N}} \underbrace{\{u_j > a\}}_{\in \mathcal{A}} \in \mathcal{A}.$$
The inclusion '$\supset$' is trivial since $a < u_j(x) \le \sup_{j \in \mathbb{N}} u_j(x)$ always holds; the direction '$\subset$' follows by contradiction: if $u_j(x) \le a$ for all $j \in \mathbb{N}$, then also $\sup_{j \in \mathbb{N}} u_j(x) \le a$. This proves the measurability of $\sup_{j \in \mathbb{N}} u_j$.
If $u$ is measurable, we have for all $a \in \mathbb{R}$
$$\{-u > a\} = \{u < -a\} \in \mathcal{A},$$
which shows that $-u$ is also measurable.
The measurability of $\inf_{j \in \mathbb{N}} u_j$, $\liminf_{j \to \infty} u_j$ and $\limsup_{j \to \infty} u_j$ follows now from formulae (8.10)–(8.12), which can be written down in terms of $\sup_j$s and several multiplications by $(-1)$. If $\lim_{j \to \infty} u_j$ exists, it coincides with $\liminf_{j \to \infty} u_j = \limsup_{j \to \infty} u_j$ and inherits their measurability.
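8.10 Corollary Let $u, v$ be $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical functions. Then the functions
$$u \pm v, \quad uv, \quad u \vee v := \max\{u, v\}, \quad u \wedge v := \min\{u, v\} \qquad (8.14)$$
are $\mathcal{A}/\bar{\mathcal{B}}$-measurable (whenever they are defined).
[Figure: graphs of $u$ and $v$ with $\max\{u, v\}$ and $\min\{u, v\}$ marked.]
The maximum $u \vee v$ and minimum $u \wedge v$ of two functions is always meant pointwise, i.e.
$$(u \vee v)(x) = \max\{u(x), v(x)\} \overset{[✓]}{=} \tfrac{1}{2}\big(u(x) + v(x) + |u(x) - v(x)|\big),$$
$$(u \wedge v)(x) = \min\{u(x), v(x)\} \overset{[✓]}{=} \tfrac{1}{2}\big(u(x) + v(x) - |u(x) - v(x)|\big).$$
Proof (of Corollary 8.10) If $u, v \in \mathcal{E}$ are simple functions, all functions in (8.14) are again simple functions [✓] and, therefore, measurable. For general $u, v \in \mathcal{M}_{\bar{\mathbb{R}}}$ choose sequences $(f_j)_{j \in \mathbb{N}}, (g_j)_{j \in \mathbb{N}} \subset \mathcal{E}$ of simple functions such that $f_j \xrightarrow{j \to \infty} u$ and $g_j \xrightarrow{j \to \infty} v$. The claim now follows from the usual rules for limits.
8.11 Corollary A function $u$ is $\mathcal{A}/\bar{\mathcal{B}}$-measurable if, and only if, $u^\pm$ are $\mathcal{A}/\bar{\mathcal{B}}$-measurable.
8.12 Corollary If $u, v$ are $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical functions, then
$$\{u < v\}, \quad \{u \le v\}, \quad \{u = v\}, \quad \{u \neq v\} \in \mathcal{A}.$$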
Let us finally show an interesting result on the structure of $\sigma(T)$-measurable functions; see also Problem 7.11 of the previous chapter.
8.13 Lemma (Factorization lemma) Let $T : (X, \mathcal{A}) \to (X', \mathcal{A}')$ be an $\mathcal{A}/\mathcal{A}'$-measurable map and let $\sigma(T) \subset \mathcal{A}$ be the $\sigma$-algebra generated by $T$. Then $u = w \circ T$ for some $\mathcal{A}'/\bar{\mathcal{B}}$-measurable function $w : X' \to \bar{\mathbb{R}}$ if, and only if, $u : X \to \bar{\mathbb{R}}$ is $\sigma(T)/\bar{\mathcal{B}}$-measurable.
Proof Suppose that $u$ is $\sigma(T)$-measurable. If $u$ is an indicator function, $u = 1_A$ with $A \in \sigma(T)$, we know from the definition of $\sigma(T)$ that $A = T^{-1}(A')$ for some $A' \in \mathcal{A}'$. Thus
$$u = 1_A = 1_{T^{-1}(A')} = 1_{A'} \circ T,$$
and $w := 1_{A'}$ will do. This consideration remains true for simple functions $u \in \mathcal{E}(\sigma(T))$ since they are just sums of scalar multiples of indicator functions; [✓] hence $u = w \circ T$ for a suitable $w \in \mathcal{E}(\mathcal{A}')$.
We can now use Theorem 8.8 and approximate the $\sigma(T)$-measurable function $u$ by a sequence $(u_j)_{j \in \mathbb{N}} \subset \mathcal{E}(\sigma(T))$. By what was said above, $u_j = w_j \circ T$ for suitable $w_j \in \mathcal{E}(\mathcal{A}')$. Then $w := \limsup_{j \to \infty} w_j$ is measurable by C8.9 and satisfies
$$w(T) = \limsup_{j \to \infty} w_j(T) \overset{(*)}{=} \lim_{j \to \infty} w_j(T) = \lim_{j \to \infty} u_j = u,$$
where we used for the equality marked $(*)$ the fact that the limit $\lim_{j \to \infty} u_j$ exists.
The converse, that $u := w \circ T$ is $\sigma(T)$-measurable, is obvious.
Problems
8.1. Show directly that condition (i) of Lemma 8.1 is equivalent to either of (ii), (iii), (iv).
8.2. Verify that $\bar{\mathcal{B}} = \mathcal{B}(\bar{\mathbb{R}})$ defined in (8.5) is a $\sigma$-algebra. Moreover, prove that $\mathcal{B}(\mathbb{R}) = \mathbb{R} \cap \mathcal{B}(\bar{\mathbb{R}})$.
8.3. Let $(X, \mathcal{A})$ be a measurable space.
(i) Let $f, g : X \to \mathbb{R}$ be measurable functions. Show that for every $A \in \mathcal{A}$ the function $h(x) := f(x)$, if $x \in A$, and $h(x) := g(x)$, if $x \notin A$, is measurable.
(ii) Let $(f_j)_{j \in \mathbb{N}}$ be a sequence of measurable functions and let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $\bigcup_{j \in \mathbb{N}} A_j = X$. Suppose that $f_j|_{A_j \cap A_k} = f_k|_{A_j \cap A_k}$ for all $j, k \in \mathbb{N}$ and set $f(x) := f_j(x)$ if $x \in A_j$. Show that $f : X \to \mathbb{R}$ is measurable.
8.4. Let $(X, \mathcal{A})$ be a measurable space and let $\mathcal{C} \subsetneq \mathcal{A}$ be a sub-$\sigma$-algebra. Show that $\mathcal{M}(\mathcal{C}) \subsetneq \mathcal{M}(\mathcal{A})$.
8.5. Show that $f \in \mathcal{E}$ implies that $f^\pm \in \mathcal{E}$. Is the converse valid?
8.6. Show that for every real-valued function $u = u^+ - u^-$ and $|u| = u^+ + u^-$.
8.7. Scrutinize the proof of Theorem 8.8 and check that bounded [positive] measurable functions $u \in \mathcal{M}(\mathcal{A})$ can be approximated uniformly by an [increasing] sequence $(f_j)_{j \in \mathbb{N}} \subset \mathcal{E}(\mathcal{A})$ of [positive] simple functions.
8.8. Show that every continuous function $u : \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}/\mathcal{B}$ measurable.
[Hint: check that for continuous functions $\{f > \lambda\}$ is an open set.]
8.9. Show that $x \mapsto \max\{x, 0\}$ and $x \mapsto \min\{x, 0\}$ are continuous, and by Problem 8.8 or Example 7.3, measurable functions from $\mathbb{R} \to \mathbb{R}$. Conclude that on any measurable space $(X, \mathcal{A})$ positive and negative parts $u^\pm$ of a measurable function $u : X \to \mathbb{R}$ are measurable.
8.10. Check that the approximating sequence $(f_j)_{j \in \mathbb{N}}$ for $u$ in Theorem 8.8 consists of $\sigma(u)$-measurable functions.
8.11. Complete the proofs of Corollaries 8.11 and 8.12.
8.12. Let $u : \mathbb{R} \to \mathbb{R}$ be differentiable. Explain why $u$ and $u' = du/dx$ are measurable.
8.13. Find $\sigma(u)$, i.e. the $\sigma$-algebra generated by $u$, for the following functions:
$f, g, h : \mathbb{R} \to \mathbb{R}$, (i) $f(x) = x$, (ii) $g(x) = x^2$, (iii) $h(x) = |x|$;
$F, G : \mathbb{R}^2 \to \mathbb{R}$, (iv) $F(x, y) = x + y$, (v) $G(x, y) = x^2 + y^2$.
[Hint: under $f, g, h$ the pre-images of intervals are (unions of) intervals, under $F$ we get strips in the plane, under $G$ annuli and discs.]
8.14. Consider $(\mathbb{R}, \mathcal{B})$ and $u : \mathbb{R} \to \mathbb{R}$. Show that $\{x\} \in \sigma(u)$ for all $x \in \mathbb{R}$ if, and only if, $u$ is injective.
8.15. Let $\lambda$ be one-dimensional Lebesgue measure. Find $\lambda \circ u^{-1}$, if $u(x) = |x|$.
8.16. Let $u : \mathbb{R} \to \mathbb{R}$ be measurable. Which of the following functions are measurable:
$$u(x - 2), \quad e^{u(x)}, \quad \sin(u(x) + 8), \quad u''(x), \quad \operatorname{sgn} u(x - 7)?$$
8.17. One can show that there are non-Borel measurable sets $A \subset \mathbb{R}$, cf. Appendix D. Taking this fact for granted, show that measurability of $|u|$ does not, in general, imply the measurability of $u$. (The converse is, of course, true: measurability of $u$ always guarantees that of $|u|$.)
8.18. Show that every increasing function $u : \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}/\mathcal{B}$ measurable. Under which additional condition(s) do we have $\sigma(u) = \mathcal{B}$?
[Hint: show that $\{u < \lambda\}$ is an interval by distinguishing three cases: $u$ is continuous and strictly increasing when passing the level $\lambda$, $u$ jumps over the level $\lambda$, $u$ is 'flat' at level $\lambda$. Make a picture of these situations.]
9
Integration of positive functions
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.
Recall that $\mathcal{M}^+$ [$\mathcal{M}^+_{\bar{\mathbb{R}}}$] are the $\mathcal{A}$-measurable positive real [numerical] functions and $\mathcal{E}$ [$\mathcal{E}^+$] are the [positive] simple functions.
The fundamental idea of integration is to measure the area between the graph of a function and the abscissa. For a positive simple function $f \in \mathcal{E}^+$ in standard representation¹ this is easily done:
$$\text{if } f = \sum_{j=0}^M y_j 1_{A_j} \in \mathcal{E}^+ \text{ then } \sum_{j=0}^M y_j\, \mu(A_j) \qquad (9.1)$$
should be the $\mu$-area enclosed by the graph and the abscissa.
[Figure: graph of a simple function $f$; the rectangle over $A_j$ has height $y_j$ and base $\mu(A_j)$.]
There is only the problem that (9.1) might depend on the particular (standard) representation of $f$, and this should not happen.
9.1 Lemma Let $\sum_{j=0}^M y_j 1_{A_j} = \sum_{k=0}^N z_k 1_{B_k}$ be two standard representations of the same function $f \in \mathcal{E}^+$. Then
$$\sum_{j=0}^M y_j\, \mu(A_j) = \sum_{k=0}^N z_k\, \mu(B_k).$$
1 In the sense of Definition 8.6. By Example 8.7(iii) every $f \in \mathcal{E}$ has a standard representation.
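Proof Since $A_0 \mathbin{\dot\cup} A_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} A_M = X = B_0 \mathbin{\dot\cup} B_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} B_N$ we get
$$A_j = \dot{\bigcup_{k=0}^{N}} (A_j \cap B_k) \quad \text{and} \quad B_k = \dot{\bigcup_{j=0}^{M}} (B_k \cap A_j).$$
Using the (finite) additivity of $\mu$ we see that
$$\sum_{j=0}^M y_j\, \mu(A_j) = \sum_{j=0}^M y_j \sum_{k=0}^N \mu(A_j \cap B_k) = \sum_{j=0}^M \sum_{k=0}^N y_j\, \mu(A_j \cap B_k) \qquad (9.2)$$
(since all $y_j$ are positive, the above sums always exist in $[0, \infty]$). Similarly,
$$\sum_{k=0}^N z_k\, \mu(B_k) = \sum_{k=0}^N z_k \sum_{j=0}^M \mu(A_j \cap B_k) = \sum_{k=0}^N \sum_{j=0}^M z_k\, \mu(A_j \cap B_k). \qquad (9.3)$$
But $y_j = z_k$ whenever $A_j \cap B_k \neq \emptyset$, while for $A_j \cap B_k = \emptyset$ we have $\mu(A_j \cap B_k) = \mu(\emptyset) = 0$. Thus
$$y_j\, \mu(A_j \cap B_k) = z_k\, \mu(A_j \cap B_k) \quad \forall\, (j, k),$$
and (9.2) and (9.3) have the same value.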
Lemma 9.1 justifies the following definition based on (9.1).
9.2 Definition Let $f = \sum_{j=0}^M y_j 1_{A_j} \in \mathcal{E}^+$ be a simple function in standard representation. Then the number
$$I_\mu(f) = \sum_{j=0}^M y_j\, \mu(A_j) \in [0, \infty]$$
(which is independent of the representation of $f$) is called the ($\mu$-)integral of $f$.
9.3 Properties (of $I_\mu : \mathcal{E}^+ \to [0, \infty]$). Let $f, g \in \mathcal{E}^+$. Then
(i) $I_\mu(1_A) = \mu(A)$ $\forall\, A \in \mathcal{A}$;
(ii) $I_\mu(\lambda f) = \lambda\, I_\mu(f)$ $\forall\, \lambda \ge 0$; (positive homogeneous)
(iii) $I_\mu(f + g) = I_\mu(f) + I_\mu(g)$; (additive)
(iv) $f \le g \implies I_\mu(f) \le I_\mu(g)$. (monotone)
Proof (i) and (ii) are obvious from the definition of $I_\mu$. (iii): take standard representations
$$f = \sum_{j=0}^M y_j 1_{A_j} \quad \text{and} \quad g = \sum_{k=0}^N z_k 1_{B_k}$$
and observe that, as in Example 8.7(iv),
$$f + g = \sum_{j=0}^M \sum_{k=0}^N (y_j + z_k)\, 1_{A_j \cap B_k} \in \mathcal{E}^+$$
is a standard representation of $f + g$. Thus
$$\begin{aligned} I_\mu(f + g) &= \sum_{j=0}^M \sum_{k=0}^N (y_j + z_k)\, \mu(A_j \cap B_k) \\ &= \sum_{j=0}^M y_j \sum_{k=0}^N \mu(A_j \cap B_k) + \sum_{k=0}^N z_k \sum_{j=0}^M \mu(A_j \cap B_k) \\ &\overset{(9.2),(9.3)}{=} \sum_{j=0}^M y_j\, \mu(A_j) + \sum_{k=0}^N z_k\, \mu(B_k) \\ &= I_\mu(f) + I_\mu(g). \end{aligned}$$
(iv): If $f \le g$, then $g = f + (g - f)$ where $g - f \in \mathcal{E}^+$, see Examples 8.7(iv). By part (iii) of this proof,
$$I_\mu(g) = I_\mu(f) + I_\mu(g - f) \ge I_\mu(f)$$
since $I_\mu(\bullet)$ is positive.
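Lemma 9.1 and Properties 9.3(iii) can be tried out on a small example. The following sketch assumes a hypothetical finite measure space $X = \{0, \ldots, 5\}$ with point masses stored in `MASS`; a simple function is represented as a list of pairs $(y_j, A_j)$ with disjoint sets covering $X$. None of these names come from the text.

```python
from fractions import Fraction

# Hypothetical point masses mu({x}) on X = {0, ..., 5}.
MASS = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1),
        3: Fraction(2), 4: Fraction(0), 5: Fraction(1, 6)}


def mu(A):
    """Measure of a subset A of X."""
    return sum((MASS[x] for x in A), Fraction(0))


def I_mu(rep):
    """Sum of y_j * mu(A_j) over a representation [(y_j, A_j), ...]."""
    return sum((y * mu(A) for y, A in rep), Fraction(0))


def add(rep_f, rep_g):
    """Representation of f + g on the common refinement A_j & B_k."""
    return [(y + z, A & B) for y, A in rep_f for z, B in rep_g if A & B]


# Two different partitions representing the same function f.
rep1 = [(Fraction(0), {4, 5}), (Fraction(2), {0, 1}), (Fraction(3), {2, 3})]
rep2 = [(Fraction(0), {4}), (Fraction(0), {5}), (Fraction(2), {0}),
        (Fraction(2), {1}), (Fraction(3), {2, 3})]
# A second simple function g = indicator of {0, 2, 4}.
rep_g = [(Fraction(1), {0, 2, 4}), (Fraction(0), {1, 3, 5})]

print(I_mu(rep1), I_mu(rep2))
print(I_mu(add(rep1, rep_g)), I_mu(rep1) + I_mu(rep_g))
```

Both representations of $f$ give the same value, and the value for $f + g$ computed on the common refinement agrees with the sum of the two separate values.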
In Theorem 8.8 we have seen that every $u \in \mathcal{M}^+$ can be written as an increasing limit of simple functions; by Corollary 8.9, suprema of simple functions are again measurable, so that
$$u \in \mathcal{M}^+ \iff u = \sup_{j \in \mathbb{N}} f_j, \quad f_j \in \mathcal{E}^+, \quad f_j \le f_{j+1} \le \ldots$$
We will use this to 'inscribe' simple functions (which we know how to integrate) below the graph of a positive measurable function $u$ and exhaust the $\mu$-area below $u$.
9.4 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. The ($\mu$-)integral of a positive numerical function $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ is given by
$$\int u\, d\mu = \sup\big\{ I_\mu(g) : g \le u,\ g \in \mathcal{E}^+ \big\} \in [0, \infty]. \qquad (9.4)$$
If we need to emphasize the integration variable, we also write $\int u(x)\, \mu(dx)$ or $\int u(x)\, d\mu(x)$.
The key observation is that the integral $\int \ldots\, d\mu$ extends $I_\mu$, i.e.
9.5 Lemma For all $f \in \mathcal{E}^+$ we have $\int f\, d\mu = I_\mu(f)$.
Proof Let $f \in \mathcal{E}^+$. Since $f \le f$, $f$ is an admissible function in the supremum appearing in (9.4), hence
$$I_\mu(f) \le \sup\big\{ I_\mu(g) : g \le f,\ g \in \mathcal{E}^+ \big\} \overset{\text{def}}{=} \int f\, d\mu.$$
On the other hand, $\mathcal{E}^+ \ni g \le f$ implies that $I_\mu(g) \le I_\mu(f)$ by Properties 9.3(iv), and
$$\int f\, d\mu \overset{\text{def}}{=} \sup\big\{ I_\mu(g) : g \le f,\ g \in \mathcal{E}^+ \big\} \le I_\mu(f).$$
The next result is the first of several convergence theorems. It shows, in particular, that we could have defined (9.4) using any increasing sequence $f_j \uparrow u$ of simple functions $f_j \in \mathcal{E}^+$.
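9.6 Theorem (Beppo Levi) Let $(X, \mathcal{A}, \mu)$ be a measure space. For an increasing sequence of numerical functions $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$, $0 \le u_j \le u_{j+1} \le \ldots$, we have $u = \sup_{j \in \mathbb{N}} u_j \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ and
$$\int \sup_{j \in \mathbb{N}} u_j\, d\mu = \sup_{j \in \mathbb{N}} \int u_j\, d\mu. \qquad (9.5)$$
Note that we can write $\lim_{j \to \infty}$ instead of $\sup_{j \in \mathbb{N}}$ in (9.5) since the supremum of an increasing sequence is its limit. Moreover, (9.5) holds in $[0, +\infty]$, i.e. the case '$+\infty = +\infty$' is possible.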
Proof (of Theorem 9.6) That $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ follows from Corollary 8.9.
Step 1. Claim: $u, v \in \mathcal{M}^+_{\bar{\mathbb{R}}}$, $u \le v \implies \int u\, d\mu \le \int v\, d\mu$.
This follows from the monotonicity of the supremum since every simple $f \in \mathcal{E}^+$ with $f \le u$ also satisfies $f \le v$, and so
$$\int u\, d\mu = \sup\big\{ I_\mu(f) : f \le u,\ f \in \mathcal{E}^+ \big\} \le \sup\big\{ I_\mu(f) : f \le v,\ f \in \mathcal{E}^+ \big\} = \int v\, d\mu.$$
Step 2. Claim: $\sup_{j \in \mathbb{N}} \int u_j\, d\mu \le \int \sup_{j \in \mathbb{N}} u_j\, d\mu$; this shows '$\ge$' in (9.5).
Because of step 1 and $u_j \le u = \sup_{j \in \mathbb{N}} u_j$ we see
$$\int u_j\, d\mu \le \int u\, d\mu \quad \forall\, j \in \mathbb{N}.$$
The right-hand side is independent of $j$, so that we may take the supremum over all $j \in \mathbb{N}$ on the left.
Step 3. Claim: $f \le u$, $f \in \mathcal{E}^+ \implies I_\mu(f) \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu$.
This will prove '$\le$' in (9.5) since the right-hand side does not depend on $f$ and so we may take the supremum over all $f \in \mathcal{E}^+$ with $f \le u$ on the left (which is, by definition, the integral $\int u\, d\mu$).
To prove the claim we fix some $f \in \mathcal{E}^+$, $f \le u$. Since $u = \sup_{j \in \mathbb{N}} u_j$ we can find [✓] for every $\alpha \in (0, 1)$ and every $x \in X$ some $N(x, \alpha) \in \mathbb{N}$ with
$$\alpha f(x) \le u_j(x) \quad \forall\, j \ge N(x, \alpha),$$
which means that the sets $B_j := \{\alpha f \le u_j\}$ increase as $j \uparrow \infty$ towards $X$ and are, by Corollary 8.12, measurable as $f, u_j \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. By the very definition of the $B_j$
$$\alpha\, 1_{B_j} f \le 1_{B_j} u_j \le u_j$$
and, if $f = \sum_{k=0}^M y_k 1_{A_k}$, we get from Lemma 9.5 and step 1
$$\alpha \sum_{k=0}^M y_k\, \mu(A_k \cap B_j) = I_\mu(\alpha\, 1_{B_j} f) \le \int u_j\, d\mu \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu. \qquad (9.6)$$
At this point we use the $\sigma$-additivity of $\mu$ (in the guise of T4.4(iii)) to get
$$\mu(A_k \cap B_j) \uparrow \mu(A_k \cap X) = \mu(A_k) \quad \text{as } B_j \uparrow X,\ j \uparrow \infty,$$
which implies (the far right of (9.6) no longer depends on $j$)
$$\alpha\, I_\mu(f) = \alpha \sum_{k=0}^M y_k\, \mu(A_k) \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu.$$
Since we were free in our choice of $\alpha \in (0, 1)$, we can make $\alpha \to 1$, and the claim and the theorem follow.
One can see the next corollary just as a special case of Theorem 9.6. Its true meaning, however, is that it allows us to calculate the integral of a measurable function using any approximating sequence of elementary functions, and this is a considerable simplification of the original definition (9.4).
9.7 Corollary Let $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then
$$\int u\, d\mu = \lim_{j \to \infty} \int f_j\, d\mu$$
holds for every increasing sequence $(f_j)_{j \in \mathbb{N}} \subset \mathcal{E}^+$ with $\lim_{j \to \infty} f_j = u$.
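To see Corollary 9.7 at work, one can take a concrete case. The sketch below assumes (purely for illustration) the measure space $([0, 1), \mathcal{B}, \lambda)$ and $u(x) = x^2$, and uses the dyadic simple functions $f_j$ from the proof of Theorem 8.8. Since $u < 1 \le j$, the only non-empty level sets are $\{k 2^{-j} \le x^2 < (k+1) 2^{-j}\} = [\sqrt{k 2^{-j}}, \sqrt{(k+1) 2^{-j}})$ for $k < 2^j$, whose Lebesgue measure is explicit. The function name is hypothetical.

```python
import math


def integral_of_fj(j):
    """I_lambda(f_j) for the dyadic approximation f_j of u(x) = x**2 on [0, 1)."""
    n = 2**j
    total = 0.0
    for k in range(n):
        left = math.sqrt(k / n)         # left endpoint of the level set
        right = math.sqrt((k + 1) / n)  # right endpoint of the level set
        total += (k / n) * (right - left)
    return total


approximations = [integral_of_fj(j) for j in range(1, 11)]
for j, value in enumerate(approximations, start=1):
    print(j, value)
```

The printed values increase with $j$ and approach $1/3$, the familiar value of $\int_0^1 x^2\, dx$, with an error of at most $2^{-j}$ because $0 \le u - f_j \le 2^{-j}$ on a set of measure 1.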
9.8 Properties (of the integral) Let $u, v \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then
(i) $\int 1_A\, d\mu = \mu(A)$ $\forall\, A \in \mathcal{A}$;
(ii) $\int \alpha u\, d\mu = \alpha \int u\, d\mu$ $\forall\, \alpha \ge 0$; (positive homogeneous)
(iii) $\int (u + v)\, d\mu = \int u\, d\mu + \int v\, d\mu$; (additive)
(iv) $u \le v \implies \int u\, d\mu \le \int v\, d\mu$. (monotone)
Proof (i) follows from Properties 9.3(i) and Lemma 9.5 and (ii), (iii) follow from the corresponding properties of $I_\mu$, Corollary 9.7 and the usual rules for limits. (iv) has been proved in step 1 of the proof of Theorem 9.6.
9.9 Corollary Let $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then $\sum_{j=1}^\infty u_j$ is measurable and we have
$$\int \sum_{j=1}^\infty u_j\, d\mu = \sum_{j=1}^\infty \int u_j\, d\mu \qquad (9.7)$$
(including the possibility $+\infty = +\infty$).
Proof Set $s_M = u_1 + u_2 + \cdots + u_M$ and apply Properties 9.8(iii) and T9.6.
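9.10 Examples Let $(X, \mathcal{A})$ be a measurable space.
(i) Let $\mu = \delta_y$ be the Dirac measure for fixed $y \in X$. Then
$$\int u\, d\delta_y = \int u(x)\, \delta_y(dx) = u(y) \quad \forall\, u \in \mathcal{M}^+_{\bar{\mathbb{R}}}.$$
Indeed: for any $f \in \mathcal{E}^+$ with standard representation $f = \sum_{j=0}^M \phi_j 1_{A_j}$, we know that $y \in X$ lies in exactly one of the $A_j$, say $y \in A_{j_0}$. Then
$$\int f(x)\, \delta_y(dx) = \int \sum_{j=0}^M \phi_j 1_{A_j}(x)\, \delta_y(dx) = \sum_{j=0}^M \phi_j\, \delta_y(A_j) = \phi_{j_0} = f(y).$$
Now take any sequence of simple functions $f_k \uparrow u$. By Corollary 9.7
$$\int u(x)\, \delta_y(dx) = \lim_{k \to \infty} \int f_k(x)\, \delta_y(dx) = \lim_{k \to \infty} f_k(y) = u(y).$$
(ii) Let $(X, \mathcal{A}, \mu) = \big(\mathbb{N}, \mathcal{P}(\mathbb{N}), \sum_{j=1}^\infty \alpha_j \delta_j\big)$. As we have seen in Problem 4.6(ii), $\mu$ is indeed a measure and $\mu(\{k\}) = \alpha_k$. On the other hand, all $\mathcal{P}(\mathbb{N})$-measurable functions $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ are of the form
$$u(k) = \sum_{j=1}^\infty u_j 1_{\{j\}}(k) \quad \forall\, k \in \mathbb{N}$$
for a suitable sequence $(u_j)_{j \in \mathbb{N}} \subset [0, \infty]$.² Thus by Corollary 9.9,
$$\int u\, d\mu = \int \sum_{j=1}^\infty u_j 1_{\{j\}}\, d\mu = \sum_{j=1}^\infty \int u_j 1_{\{j\}}\, d\mu = \sum_{j=1}^\infty u_j\, \mu(\{j\}) = \sum_{j=1}^\infty u_j \alpha_j.$$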
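We close this chapter with another convergence theorem due to P. Fatou and which is often called Fatou's lemma.
9.11 Theorem (Fatou) Let $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$ be a sequence of positive measurable numerical functions. Then $u = \liminf_{j \to \infty} u_j$ is measurable and
$$\int \liminf_{j \to \infty} u_j\, d\mu \le \liminf_{j \to \infty} \int u_j\, d\mu.$$
Proof Recall that $\liminf_{j \to \infty} u_j = \sup_{k \in \mathbb{N}} \inf_{j \ge k} u_j$ always exists in $\bar{\mathbb{R}}$; the measurability of lim inf was shown in C8.9. Applying T9.6 to the increasing sequence $\big(\inf_{j \ge k} u_j\big)_{k \in \mathbb{N}}$ (which is in $\mathcal{M}^+_{\bar{\mathbb{R}}}$ by C8.9) we find
$$\int \liminf_{j \to \infty} u_j\, d\mu \overset{9.6}{=} \sup_{k \in \mathbb{N}} \Big( \int \inf_{j \ge k} u_j\, d\mu \Big) \overset{9.8\text{(iv)}}{\le} \sup_{k \in \mathbb{N}} \Big( \inf_{\ell \ge k} \int u_\ell\, d\mu \Big) = \liminf_{\ell \to \infty} \int u_\ell\, d\mu,$$
where we used that $\inf_{j \ge k} u_j \le u_\ell$ for all $\ell \ge k$ and the monotonicity of the integral, cf. Properties 9.8(iv).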
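The inequality in Fatou's lemma can be strict. A minimal sketch, assuming a hypothetical two-point space $X = \{0, 1\}$ with $\mu(\{0\}) = \mu(\{1\}) = \tfrac12$ and the alternating indicators $u_j = 1_{\{j \bmod 2\}}$: each $u_j$ has integral $\tfrac12$, but the pointwise lim inf is zero. The lim inf is computed over a finite horizon, which is exact here only because the sequence is 2-periodic.

```python
from fractions import Fraction

# Hypothetical two-point measure space.
MASS = {0: Fraction(1, 2), 1: Fraction(1, 2)}
HORIZON = 50  # finite truncation; exact for this periodic sequence


def u(j, x):
    """u_j is the indicator of the single point j mod 2."""
    return 1 if x == j % 2 else 0


def integral(f):
    """Integral of f over the finite space X with respect to mu."""
    return sum((f(x) * MASS[x] for x in MASS), Fraction(0))


def liminf(seq):
    """sup_k inf_{j >= k} seq(j), truncated to j < HORIZON."""
    return max(min(seq(j) for j in range(k, HORIZON))
               for k in range(HORIZON - 2))


lhs = integral(lambda x: liminf(lambda j: u(j, x)))   # integral of liminf u_j
rhs = liminf(lambda j: integral(lambda x: u(j, x)))   # liminf of the integrals
print(lhs, rhs)
```

Here the left-hand side is $0$ while the right-hand side is $\tfrac12$, so equality cannot be expected in general.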
Problems
9.1. Let $f : X \to \mathbb{R}$ be a positive simple function of the form $f(x) = \sum_{j=1}^m \phi_j 1_{A_j}(x)$, $\phi_j \ge 0$, $A_j \in \mathcal{A}$, but not necessarily disjoint. Show that $I_\mu(f) = \sum_{j=1}^m \phi_j\, \mu(A_j)$.
[Hint: use additivity and positive homogeneity of $I_\mu$.]
9.2. Complete the proof of Properties 9.8 (of the integral).
9.3. Find an example showing that an 'increasing sequence of functions' is, in general, different from a 'sequence of increasing functions'.
2 This means that we can identify $\mathcal{P}(\mathbb{N})$-measurable functions $f : \mathbb{N} \to [0, \infty]$ and arbitrary sequences $(u_j)_{j \in \mathbb{N}} \subset [0, \infty]$ by $u(k) = u_k$.
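9.4. Complete the proof of Corollary 9.9 and show that (9.7) is actually equivalent to (9.5) in Beppo Levi's theorem 9.6.
9.5. Let $(X, \mathcal{A}, \mu)$ be a measure space and $u \in \mathcal{M}^+(\mathcal{A})$. Show that the set-function $A \mapsto \int 1_A u\, d\mu$, $A \in \mathcal{A}$, is a measure.
9.6. Prove: Every function $u : \mathbb{N} \to \mathbb{R}$ on $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$ is measurable.
9.7. Let $(X, \mathcal{A})$ be a measurable space and $(\mu_j)_{j \in \mathbb{N}}$ be a sequence of measures thereon. Set, as in 9.10(ii), $\mu = \sum_{j \in \mathbb{N}} \mu_j$. By Problem 4.6(ii) this is again a measure. Show that
$$\int u\, d\mu = \sum_{j \in \mathbb{N}} \int u\, d\mu_j \quad \forall\, u \in \mathcal{M}^+(\mathcal{A}).$$
[Instructions: (1) consider $u = 1_A$. (2) consider $u = f \in \mathcal{E}^+$. (3) approximate $u \in \mathcal{M}^+$ by an increasing sequence of simple functions and use Theorem 9.6. To interchange increasing limits/suprema use the hint to Problem 4.6(ii).]
9.8. Reverse Fatou lemma. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+(\mathcal{A})$. If $u_j \le u$ for all $j \in \mathbb{N}$ and some $u \in \mathcal{M}^+(\mathcal{A})$ with $\int u\, d\mu < \infty$, then
$$\limsup_{j \to \infty} \int u_j\, d\mu \le \int \limsup_{j \to \infty} u_j\, d\mu.$$
9.9. Fatou's lemma for measures. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $(A_j)_{j \in \mathbb{N}}$, $A_j \in \mathcal{A}$, be a sequence of measurable sets. We set
$$\liminf_{j \to \infty} A_j = \bigcup_{k \in \mathbb{N}} \bigcap_{j \ge k} A_j \quad \text{and} \quad \limsup_{j \to \infty} A_j = \bigcap_{k \in \mathbb{N}} \bigcup_{j \ge k} A_j. \qquad (9.8)$$
(i) Prove that $1_{\liminf_{j \to \infty} A_j} = \liminf_{j \to \infty} 1_{A_j}$ and $1_{\limsup_{j \to \infty} A_j} = \limsup_{j \to \infty} 1_{A_j}$.
[Hint: check first that $1_{\bigcap_{j \in \mathbb{N}} A_j} = \inf_{j \in \mathbb{N}} 1_{A_j}$ and $1_{\bigcup_{j \in \mathbb{N}} A_j} = \sup_{j \in \mathbb{N}} 1_{A_j}$.]
(ii) Prove that $\mu\big(\liminf_{j \to \infty} A_j\big) \le \liminf_{j \to \infty} \mu(A_j)$.
(iii) Prove that $\limsup_{j \to \infty} \mu(A_j) \le \mu\big(\limsup_{j \to \infty} A_j\big)$ if $\mu$ is a finite measure.
(iv) Provide an example showing that (iii) fails if $\mu$ is not finite.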
9.10. Let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be a sequence of disjoint sets such that $\dot{\bigcup}_{j \in \mathbb{N}} A_j = X$. Show that for every $u \in \mathcal{M}^+(\mathcal{A})$
$$\int u\, d\mu = \sum_{j=1}^\infty \int 1_{A_j} u\, d\mu.$$
Use this to construct on a $\sigma$-finite measure space $(X, \mathcal{A}, \mu)$ a function $w$ which satisfies $w(x) > 0$ for all $x \in X$ and $\int w\, d\mu < \infty$.
9.11. Kernels. Let $(X, \mathcal{A}, \mu)$ be a measure space. A map $N : X \times \mathcal{A} \to [0, \infty]$ is called kernel if
$A \mapsto N(x, A)$ is a measure for every $x \in X$,
$x \mapsto N(x, A)$ is a measurable function for every $A \in \mathcal{A}$.
(i) Show that $\mathcal{A} \ni A \mapsto \mu N(A) := \int N(x, A)\, \mu(dx)$ is a measure on $(X, \mathcal{A})$.
(ii) For $u \in \mathcal{M}^+(\mathcal{A})$ define $Nu(x) := \int u(y)\, N(x, dy)$. Show that $u \mapsto Nu$ is additive, positive homogeneous and $Nu(\bullet) \in \mathcal{M}^+(\mathcal{A})$.
(iii) Let $\mu N$ be the measure introduced in (i). Show that $\int u\, d(\mu N) = \int Nu\, d\mu$ for all $u \in \mathcal{M}^+(\mathcal{A})$.
[Hint: consider in each part of this problem first indicator functions $u = 1_A$, then simple functions $u \in \mathcal{E}^+(\mathcal{A})$ and then approximate $u \in \mathcal{M}^+(\mathcal{A})$ by simple functions using 8.8 and 9.6.]
9.12. (Continuation of Problem 6.1) Consider on $\mathbb{R}$ the $\sigma$-algebra $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin. Set $A^+ := A \cap [0, \infty)$, $A^- := (-\infty, 0] \cap A$ and consider their symmetrizations $A^\pm_\sigma := A^\pm \cup (-A^\pm) \in \Sigma$. Show that for every $u \in \mathcal{M}^+(\Sigma)$ with $0 \le u \le 1$ and for every measure $\mu$ on $(\mathbb{R}, \Sigma)$ the set-function
$$\mathcal{B}(\mathbb{R}) \ni A \mapsto \int 1_{A^+_\sigma} u\, d\mu + \int 1_{A^-_\sigma} (1 - u)\, d\mu$$
is a measure on $\mathcal{B}(\mathbb{R})$ that extends $\mu$.
Why does this not contradict the uniqueness theorem 5.7 for measures?
10
Integrals of measurable functions and null sets
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be a measure space.
Let us briefly review how we constructed the integral for positive measurable
functions $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Guided by the idea that the integral should be the area
between the graph of a function and the x-axis, we defined for indicator functions
$\int 1_A \, d\mu = \mu(A)$ and extended this definition by linearity to all positive simple
functions $\mathcal{E}^+$ which are just linear combinations of indicator functions (there was
an issue about well-definedness which was addressed in L9.1). Since all positive
measurable functions can be obtained as increasing limits of simple functions
(Theorem 8.8), we could then define the integral of $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ by exhausting the
area below $u$ with elementary functions $f \le u$, see Definition 9.4. Beppo Levi's
theorem (in the form of C9.7) finally allowed us to replace the sup by an increasing
limit. The integral turned out to be positive homogeneous, additive and monotone.
We want to extend this integral now to not necessarily positive measurable
functions $u \in \mathcal{M}_{\bar{\mathbb{R}}}$ by linearity. The fundamental observation here is that
$$u \in \mathcal{M}_{\bar{\mathbb{R}}} \iff u = u^+ - u^-, \quad u^+, u^- \in \mathcal{M}^+_{\bar{\mathbb{R}}}$$
(cf. Corollary 8.11). This remark suggests the following definition.
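10.1 Definition A function $u : X \to \bar{\mathbb{R}}$ on a measure space $(X, \mathcal{A}, \mu)$ is said to be
($\mu$-)integrable, if it is $\mathcal{A}/\bar{\mathcal{B}}$-measurable and if the integrals
$\int u^+ \, d\mu$, $\int u^- \, d\mu < \infty$
are finite. In this case we call
$$\int u \, d\mu := \int u^+ \, d\mu - \int u^- \, d\mu \in (-\infty, \infty) \tag{10.1}$$
the ($\mu$-)integral of $u$. We write $\mathcal{L}^1(\mu)$ [$\mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$] for the set of all real-valued
[numerical] $\mu$-integrable functions.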
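In case we need to exhibit the integration variable, we write
$$\int u \, d\mu = \int u(x) \, \mu(dx) = \int u(x) \, d\mu(x).$$
If $\mu = \lambda^n$, we call $\int u \, d\lambda^n$ the (n-dimensional) Lebesgue integral and $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\lambda^n)$
is said to be Lebesgue integrable.1 Traditionally one writes $\int u(x) \, dx$ or $\int u \, dx$
for the formally more correct $\int u \, d\lambda^n$. If we want to stress $X$ or $\mathcal{A}$, etc., we will
also write $\mathcal{L}^1_{\bar{\mathbb{R}}}(X)$ or $\mathcal{L}^1_{\bar{\mathbb{R}}}(\mathcal{A})$, etc.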
10.2 Remark In the definition of the integral for positive $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ we did allow
that $\int u \, d\mu = \infty$. Since we want to avoid the case '$\infty - \infty$' in (10.1), we impose
the finiteness condition $\int u^\pm \, d\mu < \infty$. In particular, a positive function is said to
be integrable only if the integral is finite:
$$u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu), \; u \ge 0 \iff u \in \mathcal{M}^+_{\bar{\mathbb{R}}} \text{ and } \int u \, d\mu < \infty$$
(which is clear since for positive functions $u^+ = u$ and $u^- = 0$).
Caution: Some authors call $u$ $\mu$-integrable (in the wide sense) whenever
$\int u^+ \, d\mu - \int u^- \, d\mu$ makes sense in $\bar{\mathbb{R}}$, i.e. whenever it is not of the form '$\infty - \infty$'. We will
not use this convention.
Let us briefly summarize the most important integrability criteria.
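10.3 Theorem Let $u \in \mathcal{M}_{\bar{\mathbb{R}}}$. Then the following conditions are equivalent:
(i) $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(ii) $u^+, u^- \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(iii) $|u| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(iv) $\exists\, w \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $w \ge 0$ such that $|u| \le w$.
Proof (i)⇔(ii): this is just the definition of integrability.
(ii)⇒(iii): since $|u| = u^+ + u^-$, we can use the additivity of the integral on
$\mathcal{M}^+_{\bar{\mathbb{R}}}$, see 9.8(iii), to get $\int |u| \, d\mu = \int u^+ \, d\mu + \int u^- \, d\mu < \infty$.
(iii)⇒(iv): take $w := |u|$.
(iv)⇒(i): we have to show that $u^\pm \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$. Since $u^\pm \le |u| \le w$ we find by
the monotonicity of the integral 9.8(iv) that $\int u^\pm \, d\mu \le \int w \, d\mu < \infty$.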
1 The letter $\lambda$ is in honour of H. Lebesgue who was one of the pioneers of modern integration theory. If $\mu$
is other than $\lambda^n$, $\int \ldots \, d\mu$ is sometimes called the abstract Lebesgue integral.
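It is now easy to see that the properties 9.8 of the integral on $\mathcal{M}^+_{\bar{\mathbb{R}}}$ extend to
the set $\mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$:
10.4 Theorem Let $(X, \mathcal{A}, \mu)$ be a measure space and $u, v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $\alpha \in \mathbb{R}$. Then
(i) $\alpha u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int \alpha u \, d\mu = \alpha \int u \, d\mu$; (homogeneous)
(ii) $u + v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int (u + v) \, d\mu = \int u \, d\mu + \int v \, d\mu$ (additive)
(whenever $u + v$ is defined);
(iii) $\min\{u, v\}, \max\{u, v\} \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$; (lattice property)
(iv) $u \le v \implies \int u \, d\mu \le \int v \, d\mu$; (monotone)
(v) $\left| \int u \, d\mu \right| \le \int |u| \, d\mu$. (triangle inequality)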
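Proof There are principally two ways to prove this theorem: either we consider
positive and negative parts for (i)–(v) and show that their integrals are finite, or
we use T10.3(iii), (iv). Doing this we find
(i) $|\alpha u| = |\alpha| \cdot |u| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(ii).
(ii) $|u + v| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(iii).
(iii) $|\max\{u, v\}| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and
$|\min\{u, v\}| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(iii).
(iv) If $u \le v$, we find that $u^+ \le v^+$ and $v^- \le u^-$. Thus
$$\int u \, d\mu = \int u^+ \, d\mu - \int u^- \, d\mu \overset{9.8(iv)}{\le} \int v^+ \, d\mu - \int v^- \, d\mu = \int v \, d\mu.$$
(v) Using $\pm u \le |u|$ we deduce from (iv) that
$$\left| \int u \, d\mu \right| = \max\left\{ \int u \, d\mu, \; -\int u \, d\mu \right\} \le \max\left\{ \int |u| \, d\mu, \; \int |-u| \, d\mu \right\} = \int |u| \, d\mu.$$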
10.5 Remark If $u(x) \pm v(x)$ is defined in $\bar{\mathbb{R}}$ for all $x \in X$ – i.e. if we can exclude
'$\infty - \infty$' – then T10.4(i),(ii) just say that the integral is linear:
$$\int (\alpha u + \beta v) \, d\mu = \alpha \int u \, d\mu + \beta \int v \, d\mu, \quad \alpha, \beta \in \mathbb{R}. \tag{10.2}$$
This is always true for real-valued $u, v \in \mathcal{L}^1(\mu)$, i.e. $\mathcal{L}^1(\mu)$ is a vector space with
addition and scalar ($\mathbb{R}$) multiplication defined by
$$(u + v)(x) := u(x) + v(x), \quad (\alpha \cdot u)(x) := \alpha \cdot u(x),$$
and
$$\int \ldots \, d\mu : \mathcal{L}^1(\mu) \to \mathbb{R}, \quad u \mapsto \int u \, d\mu,$$
is a positive linear functional.
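10.6 Examples Let us reconsider the examples from 9.10:
(i) On $(X, \mathcal{A}, \delta_y)$, $y \in X$ fixed, we have $\int u(x) \, \delta_y(dx) = u(y)$ and
$$u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\delta_y) \iff u \in \mathcal{M}_{\bar{\mathbb{R}}} \text{ and } |u(y)| < \infty.$$
(ii) On $\big(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu := \sum_{j=1}^{\infty} \alpha_j \delta_j\big)$ every $u : \mathbb{N} \to \mathbb{R}$ is measurable, cf. Problem 9.6.
From 9.10(ii) we know that $\int |u| \, d\mu = \sum_{j=1}^{\infty} \alpha_j |u(j)|$, so that
$$u \in \mathcal{L}^1(\mu) \iff \sum_{j=1}^{\infty} \alpha_j |u(j)| < \infty.$$
If $\alpha_1 = \alpha_2 = \ldots = 1$, $\mathcal{L}^1(\mu)$ is called the set of summable sequences and
customarily denoted by $\ell^1(\mathbb{N}) = \big\{ (x_j)_{j\in\mathbb{N}} \subset \mathbb{R} : \sum_{j=1}^{\infty} |x_j| < \infty \big\}$. This space
is important in functional analysis.
(iii) Let $(\Omega, \mathcal{A}, P)$ be a probability space. Then every bounded measurable function
('random variable') $\xi \in \mathcal{M}(\mathcal{A})$, $C := \sup_{\omega\in\Omega} |\xi(\omega)| < \infty$, is integrable.
This follows immediately from
$$\int |\xi| \, dP \le \int \sup_{\omega\in\Omega} |\xi(\omega)| \, P(d\omega) = C \int P(d\omega) = C < \infty.$$
Caution: Not every P-integrable function is bounded.
For $A \in \mathcal{A}$ and $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ [or $\mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$] we know from 8.5(i) and C8.10 [and
10.3(iv) using $|1_A u| \le |u|$] that $1_A u$ is again measurable [or integrable].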
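10.7 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ or $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}(\mathcal{A})$.
Then
$$\int_A u \, d\mu := \int 1_A u \, d\mu = \int 1_A(x) u(x) \, \mu(dx), \quad \forall\, A \in \mathcal{A}.$$
Of course, $\int_X u \, d\mu = \int u \, d\mu$.
10.8 Lemma On the measure space $(X, \mathcal{A}, \mu)$ let $u \in \mathcal{M}^+$. The set-function
$$\nu : A \mapsto \int_A u \, d\mu = \int 1_A u \, d\mu, \quad A \in \mathcal{A},$$
is a measure on $(X, \mathcal{A})$. It is called the measure with density (function) $u$ with
respect to $\mu$ and denoted by $\nu = u \mu$.
Proof Exercise.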
If $\nu$ has a density w.r.t. $\mu$, one writes traditionally $d\nu/d\mu$ for the density
function. This notation is to be understood in a purely symbolical way; it is
motivated by the well-known fundamental theorem of integral and differential
calculus (for Riemann integrals)
$$u(b) - u(a) = \int_a^b u'(x) \, dx$$
where $u' = d(u' \lambda^1)/d\lambda^1$ (in our notation) $= du/dx$. At least if $u'(x) \ge 0$ one
can show that $\nu[a, b) := u(b) - u(a)$ defines a measure and that, taking the
fundamental theorem of integral calculus for granted, $\nu = u' \lambda^1 = u' \, dx$, compare
with Problem 7.9. A more advanced discussion of derivatives can be found in
Chapter 19, Theorem 19.20 and Appendix E.16–E.19.
Null sets and the ‘a.e.’
We will now discuss the behaviour of integrable functions on null sets which we
have already encountered in Problem 4.10.
Let $(X, \mathcal{A}, \mu)$ be a measure space. A ($\mu$-)null set $N \in \mathcal{N}_\mu$ is a measurable set
$N \in \mathcal{A}$ satisfying
$$N \in \mathcal{N}_\mu \iff N \in \mathcal{A} \text{ and } \mu(N) = 0. \tag{10.3}$$
If a property $\Pi = \Pi(x)$ is true for all $x \in X$ apart from some $x$ contained in a
null set $N \in \mathcal{N}_\mu$, we say that $\Pi(x)$ holds for ($\mu$-)almost all (a.a.) $x \in X$ or that
$\Pi$ holds ($\mu$-)almost everywhere (a.e.). In other words,
$$\Pi \text{ holds a.e.} \iff \{x : \Pi(x) \text{ fails}\} \subset N \in \mathcal{N}_\mu,$$
but we do not a priori require that the set $\{\Pi \text{ fails}\}$ is itself measurable. Typically
we are interested in properties $\Pi(x)$ of the type: $u(x) = v(x)$, $u(x) \le v(x)$, etc.
and we say, for example,
$$u = v \text{ a.e.} \iff \{x : u(x) \ne v(x)\} \text{ is (contained in) a } \mu\text{-null set.}$$
Caution: The assertions '$u$ enjoys a property $\Pi$ a.e.' and '$u$ is a.e. equal to $v$
which satisfies $\Pi$ everywhere' are, in general, far apart; see in this connection
Problem 10.14.
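10.9 Theorem Let $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ be a numerical integrable function on a measure
space $(X, \mathcal{A}, \mu)$. Then
(i) $\int |u| \, d\mu = 0 \iff |u| = 0$ a.e. $\iff \mu\{u \ne 0\} = 0$;
(ii) $\int_N u \, d\mu = 0 \quad \forall\, N \in \mathcal{N}_\mu$.
Proof Let us begin with (ii). Obviously, $\min\{|u|, j\} \uparrow |u|$ as $j \uparrow \infty$. By Beppo
Levi's theorem 9.6 we find
$$\begin{aligned}
\left| \int_N u \, d\mu \right| = \left| \int 1_N u \, d\mu \right|
&\overset{10.4(v)}{\le} \int 1_N |u| \, d\mu \\
&\overset{9.6}{=} \sup_{j\in\mathbb{N}} \int 1_N \min\{|u|, j\} \, d\mu \le \sup_{j\in\mathbb{N}} \int j \, 1_N \, d\mu \\
&= \sup_{j\in\mathbb{N}} \Big( j \int 1_N \, d\mu \Big) = \sup_{j\in\mathbb{N}} \big( \underbrace{j \, \mu(N)}_{=0} \big) = 0.
\end{aligned}$$
The second equivalence in (i) is clear since, due to the measurability of $u$, the
set $\{u \ne 0\}$ is not just a subset of a null set, but measurable, hence a proper null
set. In order to see '⇐' of the first equivalence, we use (ii) with $N = \{u \ne 0\}$:
$$\int |u| \, d\mu = \int_{\{|u| \ne 0\}} |u| \, d\mu + \int_{\{|u| = 0\}} |u| \, d\mu
= \int_{\{|u| \ne 0\}} |u| \, d\mu + \int_{\{|u| = 0\}} 0 \, d\mu \overset{(ii)}{=} 0.$$
For '⇒' we use the so-called Markov inequality: for $A \in \mathcal{A}$ and $c > 0$ we have
$$\begin{aligned}
\mu(\{|u| \ge c\} \cap A) &= \int 1_{\{|u| \ge c\} \cap A}(x) \, \mu(dx) \\
&= \int_A \frac{c}{c} \, 1_{\{|u| \ge c\}}(x) \, \mu(dx) \\
&\le \frac{1}{c} \int_A |u(x)| \, 1_{\{|u| \ge c\}}(x) \, \mu(dx) \\
&\le \frac{1}{c} \int_A |u(x)| \, \mu(dx),
\end{aligned} \tag{10.4}$$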
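and for $A = X$ this inequality implies that
$$\mu\{|u| > 0\} = \mu\Big( \bigcup_{j\in\mathbb{N}} \big\{ |u| \ge \tfrac{1}{j} \big\} \Big)
\overset{4.6}{\le} \sum_{j\in\mathbb{N}} \mu\big\{ |u| \ge \tfrac{1}{j} \big\}
\le \sum_{j\in\mathbb{N}} \Big( j \underbrace{\int |u| \, d\mu}_{=0} \Big) = 0.$$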
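10.10 Corollary Let $u, v \in \mathcal{M}_{\bar{\mathbb{R}}}$ such that $u = v$ $\mu$-almost everywhere. Then
(i) $u, v \ge 0 \implies \int u \, d\mu = \int v \, d\mu$;2
(ii) $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu) \implies v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int u \, d\mu = \int v \, d\mu$.
Proof Since $u, v$ are measurable, $N := \{u \ne v\} \in \mathcal{N}_\mu$. Therefore (i) follows from
$$\begin{aligned}
\int u \, d\mu = \int_{N^c} u \, d\mu + \int_N u \, d\mu
&\overset{10.9(i)}{=} \int_{N^c} v \, d\mu + 0 \qquad (\text{use that } u = v \text{ on } N^c) \\
&\overset{10.9(i)}{=} \int_{N^c} v \, d\mu + \int_N v \, d\mu = \int v \, d\mu.
\end{aligned}$$
For (ii) we observe first that $u = v$ a.e. implies that $u^\pm = v^\pm$ a.e. and then apply
(i) to positive and negative parts: $\int v^\pm \, d\mu = \int u^\pm \, d\mu < \infty$; the claim follows.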
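10.11 Corollary If $u \in \mathcal{M}_{\bar{\mathbb{R}}}$ and $v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $v \ge 0$, then
$$|u| \le v \text{ a.e.} \implies u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu).$$
Proof We have $u^\pm \le |u| \le v$ a.e., and by C10.10 $\int u^\pm \, d\mu \le \int v \, d\mu < \infty$. This
shows that $u$ is integrable.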
10.12 Proposition (Markov inequality) For all $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $A \in \mathcal{A}$ and $c > 0$
$$\mu(\{|u| \ge c\} \cap A) \le \frac{1}{c} \int_A |u| \, d\mu, \tag{10.5}$$
and if $A = X$, in particular,
$$\mu\{|u| \ge c\} \le \frac{1}{c} \int |u| \, d\mu. \tag{10.6}$$
Proof See (10.4) in the proof of Theorem 10.9(i).
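The Markov inequality (10.6) can be checked by hand on a finite set carrying a weighted counting measure. The following is a minimal sketch under that assumption; the weights, the values of u and the helper name `markov_sides` are illustrative choices, not part of the text.

```python
from fractions import Fraction

def markov_sides(weights, values, c):
    """Return (mu{|u| >= c}, (1/c) * integral of |u| d mu).

    Assumption for illustration: mu is a weighted counting measure on a
    finite set, weights[i] is the mass of point i and values[i] is u there.
    """
    lhs = sum(w for w, v in zip(weights, values) if abs(v) >= c)
    rhs = sum(w * abs(v) for w, v in zip(weights, values)) / c
    return lhs, rhs

# A probability measure on four points and a function u on them.
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
values = [0, -1, 3, 10]

for c in (1, 2, 5):
    lhs, rhs = markov_sides(weights, values, Fraction(c))
    print(c, lhs, rhs, lhs <= rhs)
```

Exact rational arithmetic is used so that the comparison of both sides is not blurred by rounding.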
2 including, possibly, $+\infty = +\infty$.
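10.13 Corollary If $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, then $u$ is almost everywhere $\mathbb{R}$-valued. In particular,
we can find a version $\tilde{u} \in \mathcal{L}^1(\mu)$ such that $\tilde{u} = u$ a.e. and $\int \tilde{u} \, d\mu = \int u \, d\mu$.
Proof Set $N := \{|u| = \infty\} = \{u = +\infty\} \cup \{u = -\infty\} \in \mathcal{A}$ (a disjoint union). Now
$$N = \bigcap_{j\in\mathbb{N}} \{|u| \ge j\}$$
and by 3.4(iii′)3 and the Markov inequality we get
$$\mu(N) = \lim_{j\to\infty} \mu\{|u| \ge j\} \le \lim_{j\to\infty} \Big( \frac{1}{j} \underbrace{\int |u| \, d\mu}_{< \infty} \Big) = 0.$$
The function $\tilde{u} := 1_{N^c} u$ is real-valued, measurable and coincides outside $N$ with
$u$. From C10.10 we deduce that $\tilde{u}$ is integrable (and even $\in \mathcal{L}^1(\mu)$) with $\int \tilde{u} \, d\mu = \int u \, d\mu$.
Corollary 10.13 allows us to identify (up to null sets) functions from $\mathcal{L}^1_{\bar{\mathbb{R}}}$ and
$\mathcal{L}^1$. Since $\mathcal{L}^1$ is a much nicer space – it is a vector space and we need not take
any precautions when adding functions, etc. – we will work from now on only
with $\mathcal{L}^1$. The corresponding statements for $\mathcal{L}^1_{\bar{\mathbb{R}}}$ are then easily derived.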
We close this section with a technique which will be useful in many applications
later on.
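10.14 Corollary Let $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra.
(i) If $u, w \in \mathcal{L}^1(\mathcal{G})$ and if $\int_G u \, d\mu = \int_G w \, d\mu$ for all $G \in \mathcal{G}$, then $u = w$ $\mu$-a.e.
(ii) If $u, w \in \mathcal{M}^+(\mathcal{G})$ and if $\int_G u \, d\mu = \int_G w \, d\mu$ for all $G \in \mathcal{G}$, then $u = w$ $\mu$-a.e.
under the additional assumption that $\mu|_{\mathcal{G}}$ is $\sigma$-finite.4
Proof (i) Since $u$ and $w$ are $\mathcal{G}$-measurable functions, we have $G \cap \{u \ge w\}$, $G \cap \{u < w\} \in \mathcal{G}$
for all $G \in \mathcal{G}$. Thus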
∫
G
�u − w� d� =
∫
G∩
u�w�
�u − w� d� +
∫
G∩
u
whenever the expressions involved make sense/are finite, then:
(i) $\mu\{|u| > c\} \le \frac{1}{c} \int |u| \, d\mu$;
(ii) $\mu\{|u| > c\} \le \frac{1}{c^p} \int |u|^p \, d\mu$ for all $0 < p < \infty$;
(iii) $\mu\{|u| \ge c\} \le \frac{1}{\varphi(c)} \int \varphi(|u|) \, d\mu$ for an increasing function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$;
(iv) $\mu\big(\big\{ u \ge \alpha \int u \, d\mu \big\}\big) \le \frac{1}{\alpha}$;
(v) $\mu\{|u| < c\} \le \frac{1}{\psi(c)} \int \psi(|u|) \, d\mu$ for a decreasing function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$;
(vi) $P\big( |X - EX| \ge \alpha \sqrt{VX} \big) \le \frac{1}{\alpha^2}$, where $(\Omega, \mathcal{A}, P)$ is a probability space and,
in probabilistic jargon, $X$ is a random variable (i.e. a measurable function
$X : \Omega \to \mathbb{R}$), $EX = \int X \, dP$ the expectation or mean value and $VX = \int (X - EX)^2 \, dP$ the variance.
Remark. This is Chebyshev's inequality.
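10.6. Show that $\int |u|^p \, d\mu < \infty$ implies that $|u|$ is a.e. real-valued (in the sense $(-\infty, \infty)$-valued!).
Is this still true if we have $\int \arctan(u) \, d\mu < \infty$?
10.7. Let $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ be a sequence of pairwise disjoint sets. Show that
$$u \, 1_{\bigcup_j A_j} \in \mathcal{L}^1(\mu) \iff u \, 1_{A_n} \in \mathcal{L}^1(\mu) \text{ and } \sum_{j=1}^{\infty} \int_{A_j} |u| \, d\mu < \infty.$$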
10.8. Generalized Fatou lemma. Assume that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$. Prove:
(i) If $u_j \ge v$ for all $j \in \mathbb{N}$ and some $v \in \mathcal{L}^1(\mu)$, then
$$\int \liminf_{j\to\infty} u_j \, d\mu \le \liminf_{j\to\infty} \int u_j \, d\mu.$$
(ii) If $u_j \le w$ for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1(\mu)$, then
$$\limsup_{j\to\infty} \int u_j \, d\mu \le \int \limsup_{j\to\infty} u_j \, d\mu.$$
(iii) Find examples that show that the upper and lower bounds in (i) and (ii) are
necessary.
[Hint: mimic and scrutinize the proof of Fatou's Lemma 9.11 especially when it
comes to the application of Beppo Levi's theorem. What goes wrong if we do not
have this upper/lower bound? Note that we have an 'invisible' $v = 0$ in T9.11.]
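10.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Show that for $u \in \mathcal{M}(\mathcal{A})$
$$u \in \mathcal{L}^1(P) \iff \sum_{j=0}^{\infty} P\{|u| \ge j\} < \infty.$$
10.10. Independence (2). Let $(\Omega, \mathcal{A}, P)$ be a probability space. Recall the notion of
independence of two $\sigma$-algebras $\mathcal{B}, \mathcal{C} \subset \mathcal{A}$ introduced in Problem 5.10. Show that
$u \in \mathcal{M}^+(\mathcal{B})$ and $w \in \mathcal{M}^+(\mathcal{C})$ satisfy
$$\int u w \, dP = \int u \, dP \cdot \int w \, dP$$
and that for $u \in \mathcal{M}(\mathcal{B})$ and $w \in \mathcal{M}(\mathcal{C})$
$$u \in \mathcal{L}^1(\mathcal{B}) \text{ and } w \in \mathcal{L}^1(\mathcal{C}) \implies u w \in \mathcal{L}^1(\mathcal{A}).$$
Find an example proving that this fails if $\mathcal{B}$ and $\mathcal{C}$ are not independent.
[Hint: start with simple functions and use Beppo Levi's theorem 9.6.]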
10.11. Completion (3). Let $(X, \mathcal{A}^*, \bar{\mu})$ be the completion of $(X, \mathcal{A}, \mu)$ – cf. Problems 4.13, 6.2.
(i) Show that for every $f^* \in \mathcal{E}^+(\mathcal{A}^*)$ there are $f, g \in \mathcal{E}^+(\mathcal{A})$ with $f \le f^* \le g$
and $\mu(f \ne g) = 0$ as well as $\int f \, d\mu = \int f^* \, d\bar{\mu} = \int g \, d\mu$.
(ii) $u^* : X \to \mathbb{R}$ is $\mathcal{A}^*$-measurable if, and only if, there exist $\mathcal{A}$-measurable
functions $u, w : X \to \bar{\mathbb{R}}$ with $u \le u^* \le w$ and $u = w$ $\mu$-a.e.
(iii) If $u^* \in \mathcal{L}^1(\bar{\mu})$, then $u, w$ from (ii) can be chosen from $\mathcal{L}^1(\mu)$ such that
$\int u \, d\mu = \int u^* \, d\bar{\mu} = \int w \, d\mu$.
[Hint: (i) use Problem 4.13(v). (ii) for '⇒' consider the sets $\{u^* > \alpha\}$ and
use 4.13(v). The other direction is harder. For this consider first step functions
using again 4.13(v) and then general functions by monotone convergence. (iii) by
4.13(iii), $\mu = \bar{\mu}$ on $\mathcal{A}$, and thus $\int f \, d\mu = \int f \, d\bar{\mu}$ for $\mathcal{A}$-measurable $f$.]
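10.12. Completion (4). Inner measure and outer measure. Let $(X, \mathcal{A}, \mu)$ be a finite
measure space. Define for every $E \subset X$ the outer resp. inner measure
$$\mu^*(E) := \inf\{\mu(A) : A \in \mathcal{A}, \; A \supset E\} \quad\text{and}\quad \mu_*(E) := \sup\{\mu(A) : A \in \mathcal{A}, \; A \subset E\}.$$
(i) Show that
$$\begin{aligned}
\mu_*(E) &\le \mu^*(E), \\
\mu^*(E \cup F) &\le \mu^*(E) + \mu^*(F), \\
\mu_*(E) + \mu^*(E^c) &= \mu(X), \\
\mu_*(E) + \mu_*(F) &\le \mu_*(E \cup F).
\end{aligned}$$
(ii) For every $E \subset X$ there exist sets $E_*, E^* \in \mathcal{A}$ such that $\mu(E_*) = \mu_*(E)$ and
$\mu(E^*) = \mu^*(E)$.
[Hint: use the definition of the infimum to find sets $E^n \supset E$ such that
$\mu(E^n) - \mu^*(E) \le \frac{1}{n}$ and consider $\bigcap_n E^n \in \mathcal{A}$.]
(iii) Show that $\mathcal{A}^* := \{E \subset X : \mu_*(E) = \mu^*(E)\}$ is a $\sigma$-algebra and that it is the
completion of $\mathcal{A}$ w.r.t. $\mu$. Conclude, in particular, that $\mu^*|_{\mathcal{A}^*} = \mu_*|_{\mathcal{A}^*} = \bar{\mu}$ if
$\bar{\mu}$ is the completion of $\mu$.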
10.13. Let $(X, \mathcal{A}, \mu)$ be a measure space and $u \in \mathcal{M}(\mathcal{A})$. Assume that
$u = w$ almost everywhere w.r.t. $\mu$. When can we say that $w \in \mathcal{M}(\mathcal{A})$?
10.14. 'a.e.' is a tricky business. When working with 'a.e.' properties one has to be
extremely careful. For example, the assertions '$u$ is continuous a.e.' and '$u$ is
a.e. equal to an (everywhere) continuous function' are far apart! Illustrate this
by considering the functions $u = 1_{\mathbb{Q}}$ and $u = 1_{[0,\infty)}$.
10.15. Let $\mu$ be a $\sigma$-finite measure on the measurable space $(X, \mathcal{A})$. Show that there
exists a finite measure $P$ on $(X, \mathcal{A})$ such that $\mathcal{N}_\mu = \mathcal{N}_P$, i.e. $\mu$ and $P$ have the
same null sets.
10.16. Construct an example showing that for $u, w \in \mathcal{M}^+(\mathcal{G})$ the equality $\int_G u \, d\mu = \int_G w \, d\mu$
for all $G \in \mathcal{G}$ does not necessarily imply that $u = w$ almost everywhere.
[Hint: In view of 10.14 $\mu|_{\mathcal{G}}$ cannot be $\sigma$-finite. Consider on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ the
measure $\mu = m \lambda^1$ where $m = 1_{\{|x| \le 1\}} + \infty \, 1_{\{|x| > 1\}}$, $u \equiv 1$ and $w = 1_{\{|x| \le 1\}} + 2 \cdot 1_{\{|x| > 1\}}$.
Then all Borel subsets of $\{|x| > 1\}$ have either $\mu$-measure 0 or $+\infty$, thus $\int_B u \, d\mu = \int_B w \, d\mu$
for all $B \in \mathcal{B}(\mathbb{R})$ while $\mu(u \ne w) = \infty$.]
11
Convergence theorems and their applications
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.
One of the shortfalls of the Riemann integral is the fact that we do not have
sufficiently general results that allow us to interchange limits and integrals –
typically one has to assume uniform convergence for this. This has partly to do
with the fact that the set of Riemann integrable functions is somewhat limited, see
Theorem 11.8. The classical counterexample for this defect is Dirichlet's jump
function $x \mapsto 1_{\mathbb{Q} \cap [0,1]}(x)$ which is not Riemann integrable since its upper function
is $1_{[0,1]}$ while the lower function is $0 \cdot 1_{[0,1]}$.
For the Lebesgue integral on $\mathcal{M}^+_{\bar{\mathbb{R}}}$ we have already seen more powerful convergence
results in the form of Beppo Levi's theorem 9.6 or Fatou's lemma 9.11. They
can deal with Dirichlet's jump function: for any enumeration of $\mathbb{Q} = \{q_j : j \in \mathbb{N}\}$
we get
$$\begin{aligned}
\int 1_{\mathbb{Q} \cap [0,1]} \, d\lambda^1 &= \int \sup_{N\in\mathbb{N}} 1_{\{q_1, \ldots, q_N\} \cap [0,1]} \, d\lambda^1
\overset{9.6}{=} \sup_{N\in\mathbb{N}} \int 1_{\{q_1, \ldots, q_N\} \cap [0,1]} \, d\lambda^1 \\
&= \sup_{N\in\mathbb{N}} \underbrace{\lambda^1\big( \{q_j \in [0,1] : 1 \le j \le N\} \big)}_{=0} = 0.
\end{aligned}$$
In this chapter we study systematically convergence theorems for $\mathcal{L}^1(\mu)$ and
some of their most important applications. The first is a generalization of Beppo
Levi's theorem 9.6.
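11.1 Theorem (Monotone convergence). Let $(X, \mathcal{A}, \mu)$ be a measure space.
(i) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be an increasing sequence of integrable functions
$u_1 \le u_2 \le \ldots$ with limit $u = \sup_{j\in\mathbb{N}} u_j$. Then $u \in \mathcal{L}^1(\mu)$ if, and only if,
$\sup_{j\in\mathbb{N}} \int u_j \, d\mu < +\infty$, in which case
$$\sup_{j\in\mathbb{N}} \int u_j \, d\mu = \int \sup_{j\in\mathbb{N}} u_j \, d\mu.$$
(ii) Let $(v_k)_{k\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be a decreasing sequence of integrable functions $v_1 \ge v_2 \ge \ldots$
with limit $v = \inf_{k\in\mathbb{N}} v_k$. Then $v \in \mathcal{L}^1(\mu)$ if, and only if, $\inf_{k\in\mathbb{N}} \int v_k \, d\mu > -\infty$,
in which case
$$\inf_{k\in\mathbb{N}} \int v_k \, d\mu = \int \inf_{k\in\mathbb{N}} v_k \, d\mu.$$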
Proof Obviously, (i) implies (ii) as $u_j = -v_j$ fulfils all the assumptions of (i).
To see (i), we remark that $u_j - u_1 \in \mathcal{L}^1(\mu)$ defines an increasing sequence of
positive functions
$$0 \le u_j - u_1 \le u_{j+1} - u_1 \le \ldots$$
for which we may use the Beppo Levi theorem 9.6:
$$0 \le \sup_{j\in\mathbb{N}} \int (u_j - u_1) \, d\mu = \int \sup_{j\in\mathbb{N}} (u_j - u_1) \, d\mu. \tag{11.1}$$
Assume that $u \in \mathcal{L}^1(\mu)$. Since the 'sup' in (11.1) stands for an increasing limit,
we find that
$$\sup_{j\in\mathbb{N}} \int u_j \, d\mu = \int (u - u_1) \, d\mu + \int u_1 \, d\mu
\overset{(10.2)}{=} \int u \, d\mu - \int u_1 \, d\mu + \int u_1 \, d\mu = \int u \, d\mu < \infty.$$
Conversely, if $\sup_{j\in\mathbb{N}} \int u_j \, d\mu < \infty$, we see from (11.1) that $u - u_1 \in \mathcal{L}^1(\mu)$ and,
as $u_1 \in \mathcal{L}^1(\mu)$, $u = (u - u_1) + u_1 \in \mathcal{L}^1(\mu)$ by (10.2). Therefore, (11.1) implies
$$\int u \, d\mu = \int (u - u_1) \, d\mu + \int u_1 \, d\mu = \sup_{j\in\mathbb{N}} \int u_j \, d\mu < \infty.$$
One of the most useful and versatile convergence theorems is the following.
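11.2 Theorem (Lebesgue. Dominated convergence). Let $(X, \mathcal{A}, \mu)$ be a measure
space and $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be a sequence of functions such that $|u_j| \le w$ for
all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1_+(\mu)$. If $u(x) = \lim_{j\to\infty} u_j(x)$ exists for almost every
$x \in X$, then $u \in \mathcal{L}^1(\mu)$ and we have
(i) $\lim_{j\to\infty} \int |u_j - u| \, d\mu = 0$;
(ii) $\lim_{j\to\infty} \int u_j \, d\mu = \int \lim_{j\to\infty} u_j \, d\mu = \int u \, d\mu$.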
Proof Since all $u_j$ are measurable, $N = \{x : \lim_j u_j(x) \text{ does not exist}\}$ is measurable,
hence $N \in \mathcal{N}_\mu$, and we can assume that $N = \emptyset$ as the integral over the
null set $N$ gives no contribution, cf. Theorem 10.9(ii) – alternatively we could
consider $1_{N^c} u$ and $1_{N^c} u_j$ instead of $u$ and $u_j$.
From $|u_j| \le w$ we get $|u| = \lim_{j\to\infty} |u_j| \le w$, and $u \in \mathcal{L}^1(\mu)$ by C10.11(iv).
Therefore,
$$\left| \int u_j \, d\mu - \int u \, d\mu \right| = \left| \int (u_j - u) \, d\mu \right| \overset{10.4(v)}{\le} \int |u_j - u| \, d\mu$$
which means that (i) implies (ii). Since
$$|u_j - u| \le |u_j| + |u| \le 2w \quad \forall\, j \in \mathbb{N}$$
we get $2w - |u_j - u| \ge 0$ and Fatou's lemma 9.11 tells us that
$$\begin{aligned}
\int 2w \, d\mu &= \int \liminf_{j\to\infty} (2w - |u_j - u|) \, d\mu \\
&\le \liminf_{j\to\infty} \int (2w - |u_j - u|) \, d\mu \\
&= \int 2w \, d\mu - \limsup_{j\to\infty} \int |u_j - u| \, d\mu.^1
\end{aligned}$$
Thus $0 \le \liminf_{j\to\infty} \int |u_j - u| \, d\mu \le \limsup_{j\to\infty} \int |u_j - u| \, d\mu \le 0$, and
consequently $\lim_{j\to\infty} \int |u_j - u| \, d\mu = 0$.
11.3 Remark The uniform boundedness assumption
$$|u_j| \le w \quad \forall\, j \in \mathbb{N} \text{ and some } w \in \mathcal{L}^1_+ \tag{11.2}$$
is very important for Theorem 11.2. To see this, consider $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda^1)$ and set
$$u_j(x) = j \, 1_{[0, 1/j]}(x) \xrightarrow{j\to\infty} \infty \cdot 1_{\{0\}}(x) \overset{\text{a.e.}}{=} 0$$
whereas $\int u_j \, d\lambda = j \cdot \frac{1}{j} = 1 \ne 0 = \int \infty \cdot 1_{\{0\}} \, d\lambda$.
The only obvious possibility to weaken (11.2) would be to require it to hold
only almost everywhere. Lebesgue's theorem gives merely sufficient – but easily
verifiable – conditions for the interchange of limits and integrals; the ultimate
version for such a result with necessary and sufficient conditions will be given in
the form of Vitali's convergence theorem 16.6 in Chapter 16 below.
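The counterexample from Remark 11.3 is easy to tabulate. The sketch below evaluates $u_j = j\,1_{[0,1/j]}$ at one fixed point and computes its integral as height times interval length; the function names and the sample point are illustrative assumptions.

```python
from fractions import Fraction

def u(j, x):
    """u_j(x) = j on the interval [0, 1/j] and 0 elsewhere."""
    return j if 0 <= x <= Fraction(1, j) else 0

def integral_u(j):
    """Lebesgue integral of u_j: height j times the length 1/j of [0, 1/j]."""
    return j * Fraction(1, j)

x = Fraction(1, 100)          # any fixed x > 0 is eventually outside [0, 1/j]
indices = (10, 100, 101, 1000)
pointwise = [u(j, x) for j in indices]
integrals = [integral_u(j) for j in indices]
print(pointwise)   # values at the fixed point x
print(integrals)   # the integrals do not follow the pointwise limit
```

The pointwise values drop to 0 as soon as $1/j < x$, while every integral equals 1, so no integrable majorant $w$ can exist for this sequence.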
∗ ∗ ∗
1 Recall that $\liminf_{j\to\infty} (-x_j) = -\limsup_{j\to\infty} x_j$.
Let us now have a look at two of the most important applications of the
convergence theorems.
Parameter-dependent integrals
Again $(X, \mathcal{A}, \mu)$ is some measure space.
11.4 Theorem (Continuity lemma) Let $\emptyset \ne (a, b) \subset \mathbb{R}$ be a non-degenerate
open interval and $u : (a, b) \times X \to \mathbb{R}$ be a function satisfying
(a) $x \mapsto u(t, x)$ is in $\mathcal{L}^1(\mu)$ for every fixed $t \in (a, b)$;
(b) $t \mapsto u(t, x)$ is continuous for every fixed $x \in X$;
(c) $|u(t, x)| \le w(x)$ for all $(t, x) \in (a, b) \times X$ and some $w \in \mathcal{L}^1_+(\mu)$.
Then the function $v : (a, b) \to \mathbb{R}$ given by
$$t \mapsto v(t) = \int u(t, x) \, \mu(dx) \tag{11.3}$$
is continuous.
Proof Let us, first of all, remark that (11.3) is well-defined thanks to assumption
(a). We are going to show that for any $t \in (a, b)$ and every sequence $(t_j)_{j\in\mathbb{N}} \subset (a, b)$
with $\lim_{j\to\infty} t_j = t$ we have $\lim_{j\to\infty} v(t_j) = v(t)$. This proves continuity
of $v$ at the point $t$.
Because of (b), $u(\cdot, x)$ is continuous and, therefore,
$$u_j(x) = u(t_j, x) \xrightarrow{j\to\infty} u(t, x) \quad\text{and}\quad |u_j(x)| \le w(x) \quad \forall\, x \in X.$$
Thus we can use Lebesgue's dominated convergence theorem, and conclude
$$\lim_{j\to\infty} v(t_j) = \lim_{j\to\infty} \int u(t_j, x) \, \mu(dx) = \int \lim_{j\to\infty} u(t_j, x) \, \mu(dx) = \int u(t, x) \, \mu(dx) = v(t).$$
A very similar consideration leads to
11.5 Theorem (Differentiability lemma) Let $\emptyset \ne (a, b) \subset \mathbb{R}$ be a non-degenerate
open interval and $u : (a, b) \times X \to \mathbb{R}$ be a function satisfying
(a) $x \mapsto u(t, x)$ is in $\mathcal{L}^1(\mu)$ for every fixed $t \in (a, b)$;
(b) $t \mapsto u(t, x)$ is differentiable for every fixed $x \in X$;
(c) $|\partial_t u(t, x)| \le w(x)$ for all $(t, x) \in (a, b) \times X$ and some $w \in \mathcal{L}^1_+(\mu)$.
Then the function $v : (a, b) \to \mathbb{R}$ given by
$$t \mapsto v(t) = \int u(t, x) \, \mu(dx) \tag{11.4}$$
is differentiable and its derivative is
$$\partial_t v(t) = \int \partial_t u(t, x) \, \mu(dx).^2 \tag{11.5}$$
Proof Let $t \in (a, b)$ and fix some sequence $(t_j)_{j\in\mathbb{N}} \subset (a, b)$ such that $t_j \ne t$ and
$\lim_{j\to\infty} t_j = t$. Set
$$u_j(x) = \frac{u(t_j, x) - u(t, x)}{t_j - t} \xrightarrow{j\to\infty} \partial_t u(t, x),$$
which shows, in particular, that $x \mapsto \partial_t u(t, x)$ is measurable. By the mean
value theorem of differential calculus and (c) we see for some intermediate value
$\theta = \theta(j, x) \in (a, b)$
$$|u_j(x)| = \big| \partial_t u(t, x) \big|_{t = \theta} \big| \le w(x) \quad \forall\, j \in \mathbb{N}.$$
Thus $u_j \in \mathcal{L}^1$, and the sequence $(u_j)_{j\in\mathbb{N}}$ satisfies all conditions of the dominated
convergence theorem 11.2. Finally,
$$\begin{aligned}
\partial_t v(t) &= \lim_{j\to\infty} \frac{v(t_j) - v(t)}{t_j - t}
= \lim_{j\to\infty} \int \frac{u(t_j, x) - u(t, x)}{t_j - t} \, \mu(dx) \\
&= \lim_{j\to\infty} \int u_j(x) \, \mu(dx)
\overset{11.2}{=} \int \lim_{j\to\infty} u_j(x) \, \mu(dx)
= \int \partial_t u(t, x) \, \mu(dx).
\end{aligned}$$
Later in this chapter we will give examples of how to apply the continuity and
differentiability lemmas.
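A numerical sanity check of formula (11.5) is possible for a concrete integrand. The sketch below assumes $X = [0,1]$ with Lebesgue measure and $u(t,x) = e^{-tx}$, so that $|\partial_t u(t,x)| = x e^{-tx} \le 1 =: w(x)$ for $t > 0$; the integrand, the midpoint rule and all function names are illustrative choices, not taken from the text.

```python
import math

def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def v(t):
    """v(t) = integral over [0, 1] of u(t, x) = exp(-t * x)."""
    return midpoint(lambda x: math.exp(-t * x), 0.0, 1.0)

def integral_of_dt_u(t):
    """Integral over [0, 1] of the partial derivative d/dt exp(-t * x) = -x * exp(-t * x)."""
    return midpoint(lambda x: -x * math.exp(-t * x), 0.0, 1.0)

t, h = 1.0, 1e-5
difference_quotient = (v(t + h) - v(t - h)) / (2 * h)  # approximates v'(t)
print(difference_quotient, integral_of_dt_u(t))
```

Both numbers should agree up to discretisation error; for this integrand the exact value at $t = 1$ is $2/e - 1$.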
Riemann vs. Lebesgue integration
From here to the end of this chapter we choose $(X, \mathcal{A}, \mu) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$.
2 This formula is very effectively remembered as '$\partial_t \int \ldots = \int \partial_t \ldots$'
Let us briefly recall the definition of the Riemann integral (see Appendix E
for a more detailed discussion). Consider on the finite interval $[a, b] \subset \mathbb{R}$ the
partitions
$$\pi = \{a = t_0 < t_1 < \ldots < t_{k(\pi)} = b\},$$
define for a given function $u : [a, b] \to \mathbb{R}$
$$m_j = \inf_{x \in [t_{j-1}, t_j]} u(x), \quad M_j = \sup_{x \in [t_{j-1}, t_j]} u(x), \quad j = 1, 2, \ldots, k(\pi),$$
and introduce the lower resp. upper Darboux sums
$$S_\pi[u] = \sum_{j=1}^{k(\pi)} m_j (t_j - t_{j-1}) \quad\text{resp.}\quad S^\pi[u] = \sum_{j=1}^{k(\pi)} M_j (t_j - t_{j-1}).$$
11.6 Definition A bounded function $u : [a, b] \to \mathbb{R}$ is said to be Riemann integrable,
if the values
$$\int_* u = \sup_\pi S_\pi[u] = \inf_\pi S^\pi[u] = \int^* u$$
(sup, inf range over all partitions $\pi$ of $[a, b]$) coincide and are finite. Their
common value is called the Riemann integral of $u$ and denoted by $(R)\int_a^b u(x) \, dx$
or $\int_a^b u(x) \, dx$.
What is going on here? First of all, it is not difficult to see that lower [upper]
Darboux sums increase [decrease] if we add points to the partition $\pi$, i.e. the
sup [inf] in Definition 11.6 makes sense.
Moreover, to $S_\pi[u]$ and $S^\pi[u]$ there correspond simple functions, namely $\sigma_\pi[u]$
and $\Sigma^\pi[u]$ given by
$$\sigma_\pi[u](x) = \sum_{j=1}^{k(\pi)} m_j \, 1_{[t_{j-1}, t_j)}(x) \quad\text{and}\quad \Sigma^\pi[u](x) = \sum_{j=1}^{k(\pi)} M_j \, 1_{[t_{j-1}, t_j)}(x)$$
which satisfy $\sigma_\pi[u](x) \le u(x) \le \Sigma^\pi[u](x)$ and which increase resp. decrease as
$\pi$ refines.
[Figure: the simple functions $\sigma_\pi[u]$ and $\Sigma^\pi[u]$ over the partition points $t_j, t_{j+1}, t_{j+2}, t_{j+3}$.]
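The behaviour of lower and upper Darboux sums under refinement can be watched on a simple example. The sketch below assumes an increasing function, so that the infimum and supremum over each subinterval sit at its left and right endpoints; the test function $u(x) = x^2$ on $[0,1]$ and the helper names are illustrative choices.

```python
def darboux_sums(u, partition):
    """Lower and upper Darboux sums of an INCREASING function u.

    Assumption: u is increasing, hence on [left, right] the infimum is
    u(left) and the supremum is u(right).
    """
    lower = upper = 0.0
    for left, right in zip(partition, partition[1:]):
        lower += u(left) * (right - left)
        upper += u(right) * (right - left)
    return lower, upper

def equidistant(a, b, k):
    """Partition of [a, b] into k subintervals of equal length."""
    return [a + (b - a) * i / k for i in range(k + 1)]

def square(x):
    return x * x

# The partitions for k = 4, 8, 16 are nested, so the lower sums increase
# and the upper sums decrease towards the integral 1/3.
for k in (4, 8, 16):
    print(k, darboux_sums(square, equidistant(0.0, 1.0, k)))
```

For an increasing function on an equidistant partition the gap between upper and lower sum is $(u(b) - u(a))(b-a)/k$, which makes the convergence of both sums to a common value visible.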
11.7 Remark The above construction gives the 'usual' integral which is often
introduced as the anti-derivative. Unfortunately, this notion of integration is
somewhat insufficient. Nice general convergence theorems (such as monotone or
dominated convergence) hold only under unnatural restrictions or are not available
at all. Moreover, it cannot deal with functions of the type $x \mapsto 1_{\mathbb{Q} \cap [0,1]}(x)$:
the smallest upper function $\Sigma^\pi$ is $1_{[0,1]}$ while the largest lower function $\sigma_\pi$ is
identically 0. Thus the Riemann integral of $1_{\mathbb{Q} \cap [0,1]}$ does not exist, whereas by
T10.9(ii) the Lebesgue integral $\int 1_{\mathbb{Q} \cap [0,1]} \, d\lambda = 0$.
Roughly speaking, the reason for this is the fact that the Riemann sums partition
the domain of the function without taking into account the shape of the function,
thus slicing up the area under the function vertically. Lebesgue's approach is
exactly the opposite: the domain is partitioned according to the values of the
function at hand, leading to a horizontal decomposition of the area.
[Figure: two panels, 'Lebesgue' and '(equidistant) Riemann'.]
There is a beautifully simple connection with Lebesgue integrals which characterizes
at the same time the class of Riemann integrable functions. It may
come as a surprise that one needs the notion of Lebesgue null sets to understand
Riemann's integral completely.
11.8 Theorem Let $u : [a, b] \to \mathbb{R}$ be a measurable function.
(i) If $u$ is Riemann integrable, then $u$ is in $\mathcal{L}^1(\lambda)$ and the Lebesgue and Riemann
integrals coincide: $\int_{[a,b]} u \, d\lambda = (R)\int_a^b u(x) \, dx$.
(ii) A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if, and only if, the
points in $[a, b]$ where $f$ is discontinuous are a Lebesgue null set.
Caution: Theorem 11.8(ii) is often phrased in the following way: $f$ is Riemann
integrable if, and only if, $f$ is (Lebesgue) a.e. continuous. Although correct,
this is a dangerous way of putting things since one is led to read this statement
(incorrectly) as 'if $f = \varphi$ a.e. with $\varphi \in C[a, b]$, then $f$ is Riemann integrable'.
That this is wrong is easily seen from $f = 1_{\mathbb{Q} \cap [a,b]}$ and $\varphi \equiv 0$; see Problem 10.14
and 11.16.
Proof (of Theorem 11.8) (i) As $u$ is Riemann integrable, we find a sequence of
partitions $\pi(j)$ of $[a, b]$ such that
$$\lim_{j\to\infty} S_{\pi(j)}[u] = \int_* u = \int^* u = \lim_{j\to\infty} S^{\pi(j)}[u].$$
Without loss of generality we may assume that the partitions are nested $\pi(j) \subset \pi(j+1) \subset \ldots$
– otherwise we could switch to the increasing sequence $\pi(1) \cup \ldots \cup \pi(j)$
of partitions, where we also observe that the lower [upper] Riemann sums
increase [decrease] as the partitions refine. The corresponding simple functions
$\sigma_{\pi(j)}[u]$ and $\Sigma^{\pi(j)}[u]$ increase and decrease towards
$$\sigma[u] = \sup_{j\in\mathbb{N}} \sigma_{\pi(j)}[u] \le u \le \inf_{j\in\mathbb{N}} \Sigma^{\pi(j)}[u] = \Sigma[u],$$
and from the monotone convergence theorem 11.1 we conclude
$$\int_* u = \lim_{j\to\infty} S_{\pi(j)}[u] = \lim_{j\to\infty} \int_{[a,b]} \sigma_{\pi(j)}[u] \, d\lambda = \int_{[a,b]} \sigma[u] \, d\lambda \tag{11.6}$$
and also
$$\int^* u = \lim_{j\to\infty} S^{\pi(j)}[u] = \lim_{j\to\infty} \int_{[a,b]} \Sigma^{\pi(j)}[u] \, d\lambda = \int_{[a,b]} \Sigma[u] \, d\lambda. \tag{11.7}$$
In other words $\sigma[u], \Sigma[u] \in \mathcal{L}^1(\lambda)$. Since $u$ is Riemann integrable,
$$\int_{[a,b]} \big( \underbrace{\Sigma[u] - \sigma[u]}_{\ge 0} \big) \, d\lambda = \int_{[a,b]} \Sigma[u] \, d\lambda - \int_{[a,b]} \sigma[u] \, d\lambda = \int^* u - \int_* u = 0,$$
which implies by Theorem 10.9(i) that $\Sigma[u] = \sigma[u]$ Lebesgue a.e. Thus
$$\{u \ne \Sigma[u]\} \cup \{u \ne \sigma[u]\} \subset \{\Sigma[u] \ne \sigma[u]\} \in \mathcal{N}_\lambda \tag{11.8}$$
and by Corollary 10.10(ii) we conclude that $u$ is Lebesgue integrable.
(ii) We continue to use the notation from part (i). The set $\Pi = \bigcup_{j\in\mathbb{N}} \pi(j)$ of
all partition points is countable, and by Problem 6.5(i),(iii) a Lebesgue null set. If
$f$ is Riemann integrable, we can find for $\epsilon > 0$ and each $x \in [a, b]$ some $n_{\epsilon,x} \in \mathbb{N}$
such that for some suitable $t_{j_0-1}, t_{j_0} \in \pi(n_{\epsilon,x})$ we have $x \in [t_{j_0-1}, t_{j_0})$ and
$$\big| \Sigma^{\pi(j)}[f](x) - \Sigma[f](x) \big| + \big| \sigma_{\pi(j)}[f](x) - \sigma[f](x) \big| \le \epsilon \quad \forall\, j \ge n_{\epsilon,x}.$$
By construction of the Riemann integral, all $x, y \in [t_{j_0-1}, t_{j_0})$ satisfy
$$|f(x) - f(y)| \le M_{j_0} - m_{j_0} = \Sigma^{\pi(n_{\epsilon,x})}[f](x) - \sigma_{\pi(n_{\epsilon,x})}[f](x) \le \epsilon + \big( \Sigma[f](x) - \sigma[f](x) \big).$$
This inequality shows on the one hand that
$$\{x : f(x) \text{ is not continuous}\} \subset \Pi \cup \underbrace{\{\Sigma[f] \ne \sigma[f]\}}_{\in \mathcal{N}_\lambda \text{ by (i)}} \in \mathcal{N}_\lambda$$
is a null set if $f$ is Lebesgue integrable. On the other hand, the above inequality
shows also that
$$\{\Sigma[f] = \sigma[f]\} \subset \{x : f(x) \text{ is continuous}\} \cup \Pi,$$
so that (11.6), (11.7) imply $\int^* f = \int_* f$, i.e. $f$ is Riemann integrable.
∗ ∗ ∗
Let us finally discuss improper Riemann integrals of the type
$$(R)\int_0^\infty u(x) \, dx = \lim_{a\to\infty} (R)\int_0^a u(x) \, dx, \tag{11.9}$$
provided the limit exists (cf. Appendix E for other types of improper integrals).
11.9 Corollary Let $u : [0, \infty) \to \mathbb{R}$ be a measurable function which is Riemann
integrable for every interval $[0, N]$, $N \in \mathbb{N}$. Then $u \in \mathcal{L}^1[0, \infty)$ if, and only if,
$$\lim_{N\to\infty} (R)\int_0^N |u(x)| \, dx < \infty. \tag{11.10}$$
In this case, $(R)\int_0^\infty u(x) \, dx = \int_{[0,\infty)} u \, d\lambda$.
Proof Using Theorem 11.8 we see that Riemann integrability of $u$ implies
Riemann integrability of $u^\pm$. Moreover,
$$(R)\int_0^N u^\pm(x) \, dx = \int_{[0,N]} u^\pm(x) \, \lambda(dx) = \int u^\pm \, 1_{[0,N]} \, d\lambda. \tag{11.11}$$
If $u$ is Riemann integrable and satisfies (11.9) and (11.10), the limit $N \to \infty$
of the left side of (11.11) exists and guarantees that the right-hand side has also
a finite limit. The monotone convergence theorem 11.1 together with Theorem 10.3(ii)
shows that $u \in \mathcal{L}^1[0, \infty)$.
Conversely, if $u$ is Lebesgue integrable, then so are $u^\pm$, $u \, 1_{[0,a]}$ and $u^\pm \, 1_{[0,a]}$
for every $a > 0$. Since $u$ is Riemann integrable over each interval $[0, N]$, we
see from Theorem 11.8 that $u$ and $u^\pm$ are Riemann integrable over each interval
$[0, a]$. The monotone convergence theorem 11.1 shows that for every increasing
sequence $a_j \uparrow \infty$
$$\lim_{j\to\infty} \int u^\pm \, 1_{[0,a_j]} \, d\lambda = \int u^\pm \, d\lambda < \infty,$$
which yields that the limits (11.9), (11.10) exist.
11.10 Remark We can avoid in T11.8(i) and C11.9 the assumption that $u$ is Borel
measurable. If we admit an arbitrary $u$, our proofs show that $u$ is outside a subset
of a null set equal to the Borel measurable function $\Sigma$ – to wit: $\{u \ne \Sigma\} \subset \{\sigma \ne \Sigma\} \in \mathcal{N}_\lambda$,
but $\{u \ne \Sigma\}$ is not necessarily measurable. In other words, $u$ becomes
automatically measurable w.r.t. the completed Borel $\sigma$-algebra. This entails, of
course, that we have to replace $\lambda$ and $\mathcal{L}^1(\lambda)$ with the completed versions $\bar{\lambda}$ and
$\mathcal{L}^1(\bar{\lambda})$, see Problems 4.13, 6.2, 10.11 and 10.12.
11.11 Remark Lebesgue integration does not allow cancellations, but improper
Riemann integrals do. More precisely: the limit (11.9) can make sense even
if limN →��R�
∫ N
0 �u�x�� dx = �. This is illustrated by the following example,
which is typical in the theory of Fourier series:
The function x �→ s�x� = sin x
x
, x ∈ �0� ��, is improperly Riemann integrable but
not Lebesgue integrable.
For a > 0 we can find N = N�a� ∈ � such that N� � a < �N + 1��. Thus
∫ �
0
sin x
x
dx = lim
a→�
(∫ N�
0
sin x
x
dx +
∫ a
N�
sin x
x
dx
)
= lim
N →�
N −1∑
j=0
∫ �j+1��
j�
sin x
x
dx
︸ ︷︷ ︸
= aj
�
where we used∣∣∣∣ lima→�
∫ a
N�
sin x
x
dx
∣∣∣∣ � limN →�
∫ �N +1��
N�
∣∣∣∣
sin x
x
∣∣∣∣ dx � limN →�
�
N�
= 0�
Observe that the $a_j$ have alternating signs since
\[ a_j = \int_{j\pi}^{(j+1)\pi} \frac{\sin x}{x}\,dx = \int_0^\pi \frac{\sin(y + j\pi)}{y + j\pi}\,dy = (-1)^j \int_0^\pi \frac{\sin y}{y + j\pi}\,dy \]
both as Riemann and Lebesgue integrals, by Theorem 11.8. Further,
\[ |a_j| = \int_0^\pi \frac{\sin y}{y + j\pi}\,dy \le \int_0^\pi \frac{\sin y}{y + jy}\,dy = \frac{1}{j+1} \int_0^\pi \frac{\sin y}{y}\,dy, \]
and also
\[ |a_j| = \int_0^\pi \frac{\sin y}{y + j\pi}\,dy \ge \underbrace{\int_0^\pi \frac{\sin y}{y + (j+1)\pi}\,dy}_{=\,|a_{j+1}|} \ge \int_0^\pi \frac{\sin y}{\pi + (j+1)\pi}\,dy = \frac{2}{(j+2)\pi}. \]
Since the function $y \mapsto \frac{\sin y}{y}$ is continuous and has a finite limit as $y \downarrow 0$ [✓], we see that $C = \int_0^\pi \frac{\sin y}{y}\,dy < \infty$, so that
\[ \frac{2/\pi}{j+2} \le |a_{j+1}| \le |a_j| \le \frac{C}{j+1}. \]
This and Leibniz's convergence test prove that the alternating series $\sum_{j=0}^\infty a_j$ converges conditionally but not absolutely, i.e. we get a finite improper Riemann integral, but the Lebesgue integral does not exist.
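The behaviour described in Remark 11.11 can be observed numerically. The following is only an illustrative sketch: the helper names `simpson`, `sinc` and `a` are introduced here and are not part of the text, and the integrals $a_j$ are approximated by a composite Simpson rule rather than computed exactly.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def sinc(x):
    # sin(x)/x, extended continuously by the value 1 at x = 0
    return 1.0 if x == 0.0 else math.sin(x) / x

def a(j):
    # approximation of a_j, the integral of sin(x)/x over [j*pi, (j+1)*pi]
    return simpson(sinc, j * math.pi, (j + 1) * math.pi)

terms = [a(j) for j in range(2000)]
signed_sum = sum(terms)                      # partial sum of the alternating series
absolute_sum = sum(abs(t) for t in terms)    # partial sum of the series of |a_j|

print("signed partial sum  :", signed_sum)
print("absolute partial sum:", absolute_sum)
```

The signed partial sums settle down, while the partial sums of $|a_j|$ keep growing roughly like a harmonic series, in line with the two-sided bound on $|a_j|$ above.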
Examples
As we have seen in this chapter, the Lebesgue integral provides very powerful
tools that justify the interchange of limits and integrals. On the other hand,
the Riemann theory is quite handy when it comes to calculating the primitive
(anti-derivative) of some concrete integrand. Theorem 11.8 tells us when we can
switch between these two notions.
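11.12 Example Let $f_\alpha(x) = x^\alpha$, $x > 0$ and $\alpha \in \mathbb{R}$. Then
\[ f_\alpha \in \mathcal{L}^1(0, 1) \iff \alpha > -1, \qquad f_\alpha \in \mathcal{L}^1[1, \infty) \iff \alpha < -1. \]
We show only the first assertion; the second follows similarly (or, indeed, from C11.9). Since $f_\alpha$ is continuous, it is Borel measurable, and since $f_\alpha \ge 0$ it is enough to show that $\int_{(0,1)} f_\alpha\, d\lambda < \infty$. For $\alpha \ne -1$ we find
\[ \int_{(0,1)} x^\alpha\, dx \overset{9.6}{=} \lim_{j\to\infty} \int x^\alpha \mathbf{1}_{[1/j,1)}(x)\, \lambda(dx) \overset{11.8}{=} \lim_{j\to\infty} (R)\!\int_{1/j}^{1} x^\alpha\, dx = \lim_{j\to\infty} \left[ \frac{x^{\alpha+1}}{\alpha+1} \right]_{1/j}^{1} = \lim_{j\to\infty} \left( \frac{1}{\alpha+1} - \frac{1}{j^{\alpha+1}(\alpha+1)} \right), \]
and the last limit is finite if, and only if, $\alpha > -1$. (For $\alpha = -1$ the primitive is $\ln x$, and $(R)\!\int_{1/j}^{1} x^{-1}\,dx = \ln j \to \infty$.)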
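As a quick sanity check of Example 11.12, the truncated integrals over $[1/j, 1]$ can be evaluated from the primitive for a few exponents. This is a minimal sketch; the function name `truncated_integral` is hypothetical and only used for this illustration.

```python
import math

def truncated_integral(alpha, j):
    """Riemann integral of x**alpha over [1/j, 1], computed from the primitive."""
    if alpha == -1.0:
        return math.log(j)  # the primitive of 1/x is ln x
    return (1.0 - j ** (-(alpha + 1.0))) / (alpha + 1.0)

for alpha in (-0.5, -1.0, -1.5):
    values = [truncated_integral(alpha, 10 ** k) for k in (2, 4, 6)]
    print(alpha, values)
```

For $\alpha = -1/2$ the values approach $1/(\alpha+1) = 2$, whereas for $\alpha = -1$ and $\alpha = -3/2$ they grow without bound as $j$ increases.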
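11.13 Example The function $f(x) = x^\alpha e^{-\beta x}$, $x > 0$, is Lebesgue integrable over $(0, \infty)$ for all $\alpha > -1$ and $\beta > 0$. (For $\beta = 0$ the function $x^\alpha$ is not integrable over $(0, \infty)$ for any $\alpha$, by Example 11.12.)
Measurability of $f$ follows from its continuity. Using the exponential series, we find for all $N \in \mathbb{N}$ and $x > 0$
\[ \frac{(\beta x)^N}{N!} \le \sum_{j=0}^\infty \frac{(\beta x)^j}{j!} = e^{\beta x} \implies e^{-\beta x} \le \frac{N!}{\beta^N}\, x^{-N}. \]
As $e^{-\beta x} \le 1$ for $x > 0$, we obtain the following majorization:
\[ f(x) = x^\alpha e^{-\beta x} \le \underbrace{x^\alpha \mathbf{1}_{(0,1)}(x)}_{\in \mathcal{L}^1(0,1) \text{ if } \alpha > -1 \text{ by Example 11.12}} + \underbrace{\frac{N!}{\beta^N}\, x^{\alpha - N} \mathbf{1}_{[1,\infty)}(x)}_{\in \mathcal{L}^1[1,\infty) \text{ if } \alpha - N < -1 \text{ by Example 11.12}} \in \mathcal{L}^1(0, \infty), \tag{11.12} \]
and $f \in \mathcal{L}^1(0, \infty)$ follows from T10.3(iv).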
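The majorization (11.12) can be spot-checked on a grid of points. The sketch below uses one arbitrary choice of parameters ($\alpha = -0.5$, $\beta = 0.3$, $N = 2$, so that $\alpha - N < -1$); the function names `f` and `majorant` are chosen for this illustration only, and a grid check is of course no substitute for the proof.

```python
import math

def f(x, alpha, beta):
    return x ** alpha * math.exp(-beta * x)

def majorant(x, alpha, beta, N):
    # x**alpha on (0, 1), and N!/beta**N * x**(alpha - N) on [1, infinity)
    if x < 1.0:
        return x ** alpha
    return math.factorial(N) / beta ** N * x ** (alpha - N)

alpha, beta, N = -0.5, 0.3, 2  # alpha > -1, beta > 0 and alpha - N < -1
xs = [0.01 * k for k in range(1, 20001)]
# largest value of f - majorant on the grid; it should not be positive
worst = max(f(x, alpha, beta) - majorant(x, alpha, beta, N) for x in xs)
print("max of f - majorant on the grid:", worst)
```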
11.14 Example (Euler's Gamma function) The parameter-dependent integral
\[ \Gamma(t) = \int_{(0,\infty)} x^{t-1} e^{-x}\, \lambda(dx), \qquad t > 0, \tag{11.13} \]
is called the Gamma function. It has the following properties:
(i) $\Gamma$ is continuous;
(ii) $\Gamma$ is arbitrarily often differentiable; (see Problem 11.13(i))
(iii) $t\,\Gamma(t) = \Gamma(t + 1)$, in particular $\Gamma(n + 1) = n!$; (see Problem 11.13(ii))
(iv) $\ln \Gamma(t)$ is convex. (see Problem 11.13(iii))
Example 11.13 shows that the Gamma function is well-defined for all $t > 0$. We prove (i) and (ii) first for every interval $(a, b)$ where $0 < a < b < \infty$. Since both continuity and differentiability are local properties, i.e. they need to be checked locally at each point, (i) and (ii) follow for the half-line if we let $a \to 0$ and $b \to \infty$.
(i) We apply the continuity lemma T11.4. Set $u(t, x) = x^{t-1} e^{-x}$. We have already seen in Example 11.13 that $u(t, \cdot) \in \mathcal{L}^1(0, \infty)$ for all $t > 0$; the continuity of $u(\cdot, x)$ is clear and all that remains is to find a uniform (for $t \in (a, b)$) dominating function. An argument similar to (11.12) gives for $N > b + 1$
\begin{align*}
x^{t-1} e^{-x} &\le x^{t-1}\, \mathbf{1}_{(0,1)}(x) + N!\, x^{t-1-N}\, \mathbf{1}_{[1,\infty)}(x) \\
&\le x^{a-1}\, \mathbf{1}_{(0,1)}(x) + N!\, x^{b-1-N}\, \mathbf{1}_{[1,\infty)}(x) \\
&\le x^{a-1}\, \mathbf{1}_{(0,1)}(x) + N!\, x^{-2}\, \mathbf{1}_{[1,\infty)}(x).
\end{align*}
The expression on the right no longer depends on $t$, and is integrable according to Example 11.12. (Note that $N = N(b)$ depends on the fixed interval $(a, b)$, but not on $t$.) This shows that $\Gamma(t) = \int_{(0,\infty)} u(t, x)\, \lambda(dx)$ is continuous for all $t \in (a, b)$.
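(ii) We apply the differentiability lemma T11.5. The integrand $u(t, \cdot)$ is integrable, and $u(\cdot, x)$ is differentiable for fixed $x > 0$. In fact,
\[ \frac{\partial u(t, x)}{\partial t} = \frac{\partial}{\partial t}\, x^{t-1} e^{-x} = x^{t-1} e^{-x} \ln x. \]
We still have to show that $\frac{\partial}{\partial t} u(t, x)$ has an integrable majorant uniformly for all $t \in (a, b)$. First we observe that $\ln x \le x$, thus
\[ \left| \frac{\partial}{\partial t} u(t, x) \right| \le x^{t} e^{-x} \le x^{b} e^{-x} \qquad \forall\, a < t < b,\ x \ge 1. \]
For $0 < x < 1$ we use $|\ln x| = \ln \frac{1}{x}$, so that
\[ \left| \frac{\partial}{\partial t} u(t, x) \right| = x^{t-1} e^{-x} \ln \frac{1}{x} \le x^{a-1} e^{-x} \ln \frac{1}{x} \qquad \forall\, a < t < b,\ 0 < x < 1, \]
and since $a > 0$, we find some $\epsilon > 0$ with $a - \epsilon - 1 > -1$, so that
\[ \left| \frac{\partial}{\partial t} u(t, x) \right| \le x^{a-1-\epsilon} e^{-x}\, \underbrace{x^{\epsilon} \ln \frac{1}{x}}_{\to 0 \text{ as } x \to 0\ {}^{3}} \le C\, x^{a-1-\epsilon} e^{-x} \qquad \forall\, a < t < b,\ 0 < x < 1. \]
Combining these calculations, we arrive at
\[ \left| \frac{\partial}{\partial t} u(t, x) \right| \le C\, x^{a-1-\epsilon} e^{-x}\, \mathbf{1}_{(0,1)}(x) + x^{b} e^{-x}\, \mathbf{1}_{[1,\infty)}(x) \qquad \forall\, a < t < b, \]
which is an integrable majorant (by Examples 11.12, 11.13) independent of $t \in (a, b)$. This shows that $\Gamma(t)$ is differentiable on $(a, b)$, with derivative
\[ \Gamma'(t) = \int_{(0,\infty)} x^{t-1} e^{-x} \ln x\, \lambda(dx), \qquad t \in (a, b). \]
A similar calculation proves that $\Gamma^{(n)}$ exists for every $n \in \mathbb{N}$; see Problem 11.13.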
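The formula for $\Gamma'(t)$ obtained by differentiating under the integral sign can be compared numerically with a difference quotient of the Gamma function. This is a rough sketch under several assumptions not taken from the text: the integral is truncated at $x = 60$, a composite Simpson rule is used, and the sample point $t = 2.5$ is arbitrary. `math.gamma` is the standard-library Gamma function; the other names are introduced only for this illustration.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def gamma_prime_integrand(t):
    def g(x):
        # x**(t-1) * exp(-x) * ln(x); for t > 1 it tends to 0 as x -> 0
        if x == 0.0:
            return 0.0
        return x ** (t - 1.0) * math.exp(-x) * math.log(x)
    return g

t = 2.5
# truncated integral over [0, 60]; the neglected tail is of order exp(-60)
integral = simpson(gamma_prime_integrand(t), 0.0, 60.0, 60000)

h = 1e-5
finite_difference = (math.gamma(t + h) - math.gamma(t - h)) / (2 * h)

print("integral formula :", integral)
print("difference quot. :", finite_difference)
print("t*Gamma(t) vs Gamma(t+1):", t * math.gamma(t), math.gamma(t + 1))
```

The last line also illustrates property (iii), $t\,\Gamma(t) = \Gamma(t+1)$, at the same sample point.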
Problems
11.1. Adapt the proof of Theorem 11.2 to show that any sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}$ with $\lim_{j\to\infty} u_j(x) = u(x)$ and $|u_j| \le g$ for some $g$ with $g^p \in \mathcal{L}^1_+$ satisfies
\[ \lim_{j\to\infty} \int |u_j - u|^p\, d\mu = 0. \]
[Hint: mimic the proof of 11.2 using $|u_j - u|^p \le (|u_j| + |u|)^p \le 2^p g^p$.]
11.2. Give an alternative proof of Lebesgue’s dominated convergence theorem 11.2(ii)
using the generalized Fatou theorem from Problem 10.8.
3 To see this, use the substitution $x = \exp(-t)$: $\lim_{x\to 0} x^{\epsilon} \ln \frac{1}{x} = \lim_{t\to\infty} e^{-\epsilon t}\, t = 0$ if $\epsilon > 0$.
11.3. Prove the following result of W. H. Young [56]; among statisticians it is also known
as Pratt’s lemma, cf. J. W. Pratt [36].
Theorem (Young; Pratt): Let �fk�k� �gk�k and �Gk�k be sequences of integrable
functions on a measure space �X� �� ��. If
(i) fk�x�
k→�−−→ f�x�, gk�x�
k→�−−→ g�x�, Gk�x�
k→�−−→ G�x� for all x ∈ X,
(ii) gk�x� � fk�x� � Gk�x� for all k ∈ � and all x ∈ X,
(iii)
∫
gk d�
k→�−−→ ∫ g d� and ∫ Gk d�
k→�−−→ ∫ G d� with ∫ g d� and ∫ G d� finite,
then limk→�
∫
fk d� =
∫
f d� and
∫
f d� is finite.
Explain why this generalizes Lebesgue’s dominated convergence theorem 11.2(ii).
11.4. Let $(u_j)_{j\in\mathbb{N}}$ be a sequence of integrable functions on $(X, \mathcal{A}, \mu)$. Show that, if $\sum_{j=1}^\infty \int |u_j|\, d\mu < \infty$, the series $\sum_{j=1}^\infty u_j$ converges a.e. to a real-valued function $u(x)$, and that in this case
\[ \int \sum_{j=1}^\infty u_j\, d\mu = \sum_{j=1}^\infty \int u_j\, d\mu. \]
[Hint: use C9.9 to see that the series $\sum_j u_j$ converges absolutely for almost all $x \in X$. The rest is then dominated convergence.]
11.5. Let $(u_j)_{j\in\mathbb{N}}$ be a sequence of positive integrable functions on a measure space $(X, \mathcal{A}, \mu)$. Assume that the sequence decreases to 0: $u_1 \ge u_2 \ge u_3 \ge \ldots$ and $u_j \downarrow 0$. Show that $\sum_{j=1}^\infty (-1)^j u_j$ converges, is integrable and that the integral is given by
\[ \int \sum_{j=1}^\infty (-1)^j u_j\, d\mu = \sum_{j=1}^\infty (-1)^j \int u_j\, d\mu. \]
[Hint: mimic the proof of the Leibniz test for alternating series.]
11.6. Give an example of a sequence of integrable functions �uj �j∈� with uj �x�
j→�−−→ u�x�
for all x and an integrable function u but such that limj→�
∫
uj d� �=
∫
u d�. Does
this contradict Lebesgue’s dominated convergence theorem 11.2?
11.7. Let � be one-dimensional Lebesgue measure. Show that for every integrable
function u, the integral function
x �→
∫
�0�x�
u�t� ��dt�� x > 0�
is continuous. What happens if we exchange � for a general measure �?
11.8. Consider the functions
(i) u�x� = 1
x
� x ∈ �1� ��� (ii) v�x� = 1
x2
� x ∈ �1� ���
(iii) w�x� = 1√
x
� x ∈ �0� 1�� (iv) y�x� = 1
x
� x ∈ �0� 1��
and check whether they are Lebesgue integrable in the regions given – what would
happen if we consider � 1
2
� 2� instead?
[Hint: consider first uk = u 1�1�k�, resp., wk = w 1�1/k�1�, etc. and use monotone
convergence and the fact that Riemann and Lebesgue integrals coincide if both
exist.]
11.9. Show that the function � � x �→ exp�−x�� is �1�dx�-integrable over the set �0� ��
for every � > 0.
[Hint: find dominating integrable functions u resp. w if 0 � x � 1 resp. 1 < x < �
and glue them together by u 1�0�1� + w 1�1��� to get an overall integrable upper
bound.]
11.10. Show that for every parameter � > 0 the function x �→ (sin x
x
)3
e−�x is integrable
over �0� �� and continuous as a function of the parameter.
[Hint: find piecewise dominating integrable functions like in Problem 11.9; use
the continuity lemma 11.4.]
11.11. Show that the function
G � → �� G�x� =
∫
�\�0
sin�tx�
t �1 + t2� dt
is differentiable and find G�0� and G′�0�. Use a limit argument, integration by
parts for
∫
�−n�n� � � � dt and the formula t
t sin�tx� = x
x sin�tx� to show that
x G′�x� =
∫
�
2t sin�tx�
�1 + t2�2 dt�
11.12. Denote by � one-dimensional Lebesgue measure. Prove that
(i)
∫
�1���
e−x ln�x� ��dx� = lim
k→�
∫
�1�k�
(
1 − x
k
)k
ln�x� ��dx�.
(ii)
∫
�0�1�
e−x ln�x� ��dx� = lim
k→�
∫
�0�1�
(
1 − x
k
)k
ln�x� ��dx�.
11.13. Euler’s Gamma function. Show that the function
��t� =
∫
�0���
e−x xt−1 dx� t > 0�
(i) � � � is m-times differentiable with � �m��t� = ∫
�0��� e
−x xt−1 �log x�m dx.
[Hint: take t ∈ �a� b�, use induction in m. Note that �e−x xt−1�log x�m� �
xm+t−1 e−x � Mx−2 for x � 1, and � M ′x�−1 for x < 1 and some � > 0 because
limx→0 x
a−�� log x�m = 0 – use, e.g. the substitution x = e−y .]
(ii) � � � satisfies ��t + 1� = t��t�.
[Hint: use integration by parts for
∫ n
1/n
� � � dt and let n → �.]
(iii) � � � and is logarithmically convex, i.e. t �→ ln ��t� is convex.
[Hint: calculate
(
ln ��t�
)′′
and show that this is positive.]
11.14. Show that x �→ xnf�u� x�, f�u� x� = eux/�ex + 1�, 0 < u < 1, is integrable over �
and that g�u� = ∫ xnf�u� x� dx, 0 < u < 1, is arbitrarily often differentiable.
11.15. Moment generating function. Let X be a random variable on the probabil-
ity space ��� �� P�. The function �X �t� =
∫
e−tX dP is called the moment
generating function. Show that �X is m-times differentiable at t = 0+ if the
absolute mth moment
∫ �X�m dP exists. If this is the case, the following formulae
hold:
(i)
∫
Xk dP = �−1�k d
k
dtk
�X �t�
∣∣∣
t=0+
for all 0 � k � m.
(ii) �X �t� =
m∑
k=0
∫
Xk dP
k! �−1�
ktk + o�tm�. (f�t� = o�tm� means that
limt→0 f�t�/t
m = 0.)
(iii)
∣∣∣∣�X �t� −
m−1∑
k=0
∫
Xk dP
k! �−1�
ktk
∣∣∣∣ �
�t�m
m!
∫
�X�m dP.
(iv) If
∫ �X�k dP < � for all k ∈ �, then
�X �t� =
�∑
k=0
∫
Xk dP
k! �−1�
ktk
for all t within the convergence radius of the series.
11.16. Consider the functions u�x� = 1�∩�0�1� and v�x� = 1�n−1 n∈�
�x�. Prove or disprove:
(i) The function u is 1 on the rationals and 0 otherwise. Thus u is continu-
ous everywhere except the set � ∩ �0� 1�. Since this is a null set, u is a.e.
continuous, hence Riemann integrable by Theorem 11.8.
(ii) The function v is 0 everywhere but for the values x = 1/n, n ∈ �. Thus v
is continuous everywhere except a countable set, i.e. a null set, and v is a.e.
continuous, hence Riemann integrable by Theorem 11.8.
(iii) The functions u and v are Lebesgue integrable and
∫
u d� = ∫ v d� = 0.
(iv) The function u is not Riemann integrable.
11.17. Construct a sequence of functions �uj �j∈� which are Riemann integrable but con-
verge to a limit uj
j→�−−→ u which is not Riemann integrable.
[Hint: consider, e.g. uj = 1�q1 �q2 �����qj
where �qj �j is an enumeration of �.]
11.18. Assume that u �0� �� → � is positive and improperly Riemann integrable. Show
that u is also Lebesgue integrable.
11.19. Fresnel integrals. Show that the following improper Riemann integrals exist:
∫ �
0
sin x2 dx and
∫ �
0
cos x2 dx�
Do they exist as Lebesgue integrals?
Remark. The above integrals have the value 1
2
√
�
2
. This can be proved by
Cauchy’s theorem or the residue theorem.
11.20. Frullani’s integral. Let f �0� �� → � be a continuous function such that
limx→0 f�x� = m and limx→� f�x� = M. Show that the two-sided improper Riemann
integral
lim
r→0
s→�
∫ s
r
f�bx� − f�ax�
x
dx = �M − m� ln b
a
exists for all a� b > 0. Does this integral have a meaning as Lebesgue integral?
[Hint: use the mean value theorem for integrals, E.12.]
11.21. Denote by � one-dimensional Lebesgue measure on the interval �0� 1�.
(i) Show that for all k ∈ �0 one has
∫
�0�1�
�x ln x�k ��dx� = �−1�k
(
1
k + 1
)k+1
��k + 1��
(ii) Use (i) to conclude that
∫
�0�1�
x−x ��dx� =
�∑
k=1
k−k.
[Hint: note that x−x = e−x ln x and use the exponential series.]
12
The function spaces $\mathcal{L}^p$, $1 \le p \le \infty$
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.
We will now discuss functions whose (absolute) $p$th power or $p$th (absolute) moment is integrable. More precisely, we are interested in the sets
\[ \mathcal{L}^p(\mu) := \left\{ u : X \to \mathbb{R} \,:\, u \in \mathcal{M},\ \int |u|^p\, d\mu < \infty \right\}, \qquad p \in [1, \infty). \tag{12.1} \]
As usual, we suppress $\mu$ if the choice of measure is clear, and we write $\mathcal{L}^p(X)$ or $\mathcal{L}^p(\mathcal{A})$ if we want to stress the underlying space or $\sigma$-algebra. It is convenient to have the following notation:
\[ \|u\|_p := \left( \int |u(x)|^p\, \mu(dx) \right)^{1/p}. \tag{12.2} \]
Clearly, $u \in \mathcal{L}^p(\mu) \iff u \in \mathcal{M}$ and $\|u\|_p < \infty$. It is no accident that the notation $\|\cdot\|_p$ resembles the symbol for a norm:$^1$ indeed, we have because of T10.9(i)
\[ \|u\|_p = 0 \iff u = 0 \text{ a.e.}, \tag{12.3} \]
and for all $\alpha \in \mathbb{R}$
\[ \|\alpha u\|_p = \left( \int |\alpha u|^p\, d\mu \right)^{1/p} = \left( |\alpha|^p \int |u|^p\, d\mu \right)^{1/p} = |\alpha| \cdot \|u\|_p. \tag{12.4} \]
The triangle inequality for $\|\cdot\|_p$ and deeper results on $\mathcal{L}^p$ depend much on the following elementary inequality.
12.1 Lemma (Young's inequality) Let $p, q \in (1, \infty)$ be conjugate numbers, i.e. $\frac{1}{p} + \frac{1}{q} = 1$ or $q = \frac{p}{p-1}$. Then
\[ AB \le \frac{A^p}{p} + \frac{B^q}{q} \tag{12.5} \]
holds for all $A, B \ge 0$; equality occurs if, and only if, $B = A^{p-1}$.
1 See Appendix B.
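Proof There are various methods to prove (12.5), but probably the most intuitive one is through the following picture:
[Figure: the curve $\eta = \xi^{p-1}$ (equivalently $\xi = \eta^{q-1}$) in the $(\xi, \eta)$-plane, with the shaded pieces $S_1$ and $S_2$ between the graph and the axes, and the rectangle with side lengths $A$ and $B$.]
The shaded area representing the pieces $S_1$ and $S_2$ between the graph and the $\xi$- resp. $\eta$-axis is given by
\[ \int_0^A \xi^{p-1}\, d\xi = \frac{A^p}{p} \qquad\text{and}\qquad \int_0^B \eta^{q-1}\, d\eta = \frac{B^q}{q} \]
respectively. The picture shows that their combined area is greater than the area of the darker rectangle, thus $\frac{A^p}{p} + \frac{B^q}{q} \ge AB$. Equality obtains if, and only if, the lighter shaded area vanishes, i.e. if $B = A^{p-1}$.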
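Young's inequality (12.5) and its equality case are easy to probe numerically. The sketch below evaluates the gap $A^p/p + B^q/q - AB$ on a grid for the arbitrary choice $p = 3$; the function name `young_gap` is introduced for this illustration only.

```python
def young_gap(A, B, p):
    """Return A**p/p + B**q/q - A*B for the conjugate exponent q = p/(p-1)."""
    q = p / (p - 1.0)
    return A ** p / p + B ** q / q - A * B

p = 3.0
grid = [0.1 * k for k in range(0, 51)]

# the gap should never be negative (up to rounding) ...
smallest_gap = min(young_gap(A, B, p) for A in grid for B in grid)
# ... and it should vanish along the curve B = A**(p-1)
gap_on_curve = max(abs(young_gap(A, A ** (p - 1.0), p)) for A in grid)

print("smallest gap on the grid:", smallest_gap)
print("largest |gap| on B = A^(p-1):", gap_on_curve)
```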
We can now prove the following fundamental inequality.
12.2 Theorem (Hölder's inequality) Assume that $u \in \mathcal{L}^p(\mu)$ and $v \in \mathcal{L}^q(\mu)$ where $p, q \in (1, \infty)$ are conjugate numbers: $\frac{1}{p} + \frac{1}{q} = 1$. Then $uv \in \mathcal{L}^1(\mu)$, and the following inequality holds:
\[ \left| \int uv\, d\mu \right| \le \int |uv|\, d\mu \le \|u\|_p \cdot \|v\|_q. \tag{12.6} \]
Equality occurs if, and only if, $|u(x)|^p / \|u\|_p^p = |v(x)|^q / \|v\|_q^q$ a.e.
Proof The first inequality of (12.6) follows directly from T10.4(v). For the other inequality we may assume that $\|u\|_p > 0$ and $\|v\|_q > 0$, since otherwise $uv = 0$ a.e. by (12.3) and there is nothing to show. We use (12.5) with
\[ A := \frac{|u(x)|}{\|u\|_p} \qquad\text{and}\qquad B := \frac{|v(x)|}{\|v\|_q} \]
to get
\[ \frac{|u(x) v(x)|}{\|u\|_p \|v\|_q} \le \frac{|u(x)|^p}{p\, \|u\|_p^p} + \frac{|v(x)|^q}{q\, \|v\|_q^q}. \]
Integrating both sides of this inequality over $x$ yields
\[ \frac{\int |u(x) v(x)|\, \mu(dx)}{\|u\|_p \|v\|_q} \le \frac{\|u\|_p^p}{p\, \|u\|_p^p} + \frac{\|v\|_q^q}{q\, \|v\|_q^q} = \frac{1}{p} + \frac{1}{q} = 1. \]
Equality can only happen if we have equality in (12.5). Because of our choice of $A$ and $B$, the condition for equality from L12.1 becomes $|v(x)| / \|v\|_q = \big( |u(x)| / \|u\|_p \big)^{p-1}$ a.e. Raising both sides to the $q$th power gives $|v(x)|^q / \|v\|_q^q = |u(x)|^p / \|u\|_p^p$ since $(p - 1) q = p$.
Hölder's inequality with $p = q = 2$ is usually called the Cauchy–Schwarz inequality.
12.3 Corollary (Cauchy–Schwarz inequality) Let $u, v \in \mathcal{L}^2(\mu)$. Then $uv \in \mathcal{L}^1(\mu)$ and
\[ \int |uv|\, d\mu \le \|u\|_2 \cdot \|v\|_2. \tag{12.7} \]
Equality occurs if, and only if, $|u(x)|^2 / \|u\|_2^2 = |v(x)|^2 / \|v\|_2^2$ a.e.
Another consequence of Hölder's inequality is the Minkowski or triangle inequality for $\|\cdot\|_p$.
12.4 Corollary (Minkowski's inequality) Let $u, v \in \mathcal{L}^p(\mu)$, $p \in [1, \infty)$. Then $u + v \in \mathcal{L}^p(\mu)$ and
\[ \|u + v\|_p \le \|u\|_p + \|v\|_p. \tag{12.8} \]
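Proof Since
\[ |u + v|^p \le (|u| + |v|)^p \le 2^p \max\{ |u|^p, |v|^p \} \le 2^p \big( |u|^p + |v|^p \big), \]
we get that $|u + v|^p \in \mathcal{L}^1(\mu)$ or $u + v \in \mathcal{L}^p(\mu)$. Now
\begin{align*}
\int |u + v|^p\, d\mu &= \int |u + v| \cdot |u + v|^{p-1}\, d\mu \\
&\le \int |u| \cdot |u + v|^{p-1}\, d\mu + \int |v| \cdot |u + v|^{p-1}\, d\mu \qquad \text{(if $p = 1$ the proof stops here)} \\
&\overset{12.2}{\le} \|u\|_p \cdot \big\| |u + v|^{p-1} \big\|_q + \|v\|_p \cdot \big\| |u + v|^{p-1} \big\|_q.
\end{align*}
Dividing both sides by $\big\| |u + v|^{p-1} \big\|_q$ (if this quantity is zero, then $u + v = 0$ a.e. and (12.8) is trivial) proves our claim since
\[ \big\| |u + v|^{p-1} \big\|_q = \left( \int |u + v|^{(p-1)q}\, d\mu \right)^{1/q} = \left( \int |u + v|^p\, d\mu \right)^{1 - 1/p}, \]
where we also used that $q = \frac{p}{p-1}$.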
12.5 Remarks (i) Formulae (12.4) and (12.8) imply
\[ u, v \in \mathcal{L}^p(\mu) \implies \alpha u + \beta v \in \mathcal{L}^p(\mu) \qquad \forall\, \alpha, \beta \in \mathbb{R}, \]
which shows that $\mathcal{L}^p(\mu)$ is a vector space.
(ii) Formulae (12.3), (12.4) and (12.8) show that $\|\cdot\|_p$ is a semi-norm for $\mathcal{L}^p$: the definiteness of a norm is not fulfilled since
\[ \|u\|_p = 0 \text{ only implies that } u(x) = 0 \text{ for almost every } x \]
but not for all $x$. There is a standard recipe to fix this: since $\mathcal{L}^p$-functions can be altered on null sets without affecting their integration behaviour, we introduce the following equivalence relation: we call $u, v \in \mathcal{L}^p(\mu)$ equivalent if they differ on at most a $\mu$-null set, i.e.
\[ u \sim v \iff \{ u \ne v \} \in \mathcal{N}_\mu. \]
The quotient space $L^p(\mu) := \mathcal{L}^p(\mu)/{\sim}$ consists of all equivalence classes of $\mathcal{L}^p$-functions. If $[u]_p \in L^p(\mu)$ denotes the equivalence class induced by the function $u \in \mathcal{L}^p(\mu)$, it is not hard to see that
\[ [\alpha u + \beta v]_p = \alpha [u]_p + \beta [v]_p \qquad\text{and}\qquad [uv]_1 = [u]_p [v]_q \]
hold, turning $L^p(\mu)$ into a bona fide vector space with the canonical norm
\[ \big\| [u]_p \big\|_p := \inf\big\{ \|w\|_p \,:\, w \in \mathcal{L}^p,\ w \sim u \big\} \]
for quotient spaces. Fortunately, $\| [u]_p \|_p = \|u\|_p$ and later on we will often follow the usual abuse of notation and identify $[u]$ with $u$.
(iii) All results of this chapter are still valid for $\bar{\mathbb{R}}$-valued numerical functions. Indeed, if $f \in \mathcal{M}_{\bar{\mathbb{R}}}$ and $\int |f|^p\, d\mu < \infty$, then
\begin{align*}
\mu\big( \{ |f| = \infty \} \big) = \mu\big( \{ |f|^p = \infty \} \big) &= \mu\Big( \bigcap_{j\in\mathbb{N}} \{ |f|^p > j \} \Big) \overset{4.4}{=} \lim_{j\to\infty} \mu\big( \{ |f|^p > j \} \big) \\
&\overset{10.12}{\le} \lim_{j\to\infty} \frac{1}{j} \int |f|^p\, d\mu = 0,
\end{align*}
by the Markov inequality. This means, however, that $f$ is a.e. $\mathbb{R}$-valued, so sums and products of such functions are always defined outside a $\mu$-null set. In particular there is no need to distinguish between the classes $L^p(\mu) := L^p_{\mathbb{R}}(\mu)$ and $L^p_{\bar{\mathbb{R}}}(\mu)$.
∗ ∗ ∗
We will need the concept of convergence of a sequence in the space $\mathcal{L}^p(\mu)$. A sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ is said to be convergent in $\mathcal{L}^p(\mu)$ with limit $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$ if, and only if,
\[ \lim_{j\to\infty} \|u_j - u\|_p = 0. \]
Remember, however, that $\mathcal{L}^p$-limits are only almost everywhere unique. If $u, w$ are both $\mathcal{L}^p$-limits of the same sequence $(u_j)_{j\in\mathbb{N}}$, we have
\[ \|u - w\|_p \overset{12.4}{\le} \lim_{j\to\infty} \big( \|u - u_j\|_p + \|u_j - w\|_p \big) = 0, \]
implying only $u = w$ almost everywhere.
We call $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ an ($\mathcal{L}^p$-) Cauchy sequence, if
\[ \forall\, \epsilon > 0 \quad \exists\, N_\epsilon \in \mathbb{N} \quad \forall\, j, k \ge N_\epsilon \,:\, \|u_j - u_k\|_p < \epsilon. \]
Note that these definitions reduce convergence in $\mathcal{L}^p$ to convergence questions of the semi-norm $\|\cdot\|_p$ in $\mathbb{R}^+$. This means that, apart from uniqueness, many formal properties of limits in $\mathbb{R}$ carry over to $\mathcal{L}^p$ – most of them even with the same proof!
Caution: Pointwise convergence of a sequence $u_j(x) \to u(x)$ of $\mathcal{L}^p$-functions $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p$ does not guarantee convergence in $\mathcal{L}^p$ – but in view of Lebesgue's dominated convergence theorem 11.2, the additional condition that
\[ |u_j(x)| \le g(x) \quad \text{for some function } g \in \mathcal{L}^p_+ \]
is sufficient since $|u_j - u|^p \le (|u_j| + |u|)^p \le 2^p g^p$ and $|u_j(x) - u(x)| \to 0$ [✓].
Clearly, a convergent sequence $(u_j)_{j\in\mathbb{N}}$ is also a Cauchy sequence,
\[ \|u_j - u_k\|_p \le \|u_j - u\|_p + \|u - u_k\|_p < 2\epsilon \qquad \forall\, j, k \ge N_\epsilon; \]
the converse of this assertion is also true, but much more difficult to prove. We start with a simple observation:
12.6 Lemma For any sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, of positive functions $u_j \ge 0$ we have
\[ \Big\| \sum_{j=1}^\infty u_j \Big\|_p \le \sum_{j=1}^\infty \|u_j\|_p. \tag{12.9} \]
Proof Repeated applications of Minkowski's inequality (12.8) show that
\[ \Big\| \sum_{j=1}^N u_j \Big\|_p \le \sum_{j=1}^N \|u_j\|_p \le \sum_{j=1}^\infty \|u_j\|_p, \]
and since the right-hand side is independent of $N$, the inequality remains valid even if we pass to the $\sup_{N\in\mathbb{N}}$ on the left. By Beppo Levi's theorem 9.6, we find
\[ \sup_{N\in\mathbb{N}} \Big\| \sum_{j=1}^N u_j \Big\|_p^p = \sup_{N\in\mathbb{N}} \int \Big( \sum_{j=1}^N u_j \Big)^p d\mu = \int \Big( \sup_{N\in\mathbb{N}} \sum_{j=1}^N u_j \Big)^p d\mu = \int \Big( \sum_{j=1}^\infty u_j \Big)^p d\mu, \]
and the proof follows.
The completeness of $\mathcal{L}^p$ was proved by E. Fischer (for $p = 2$) and F. Riesz (for $1 \le p < \infty$).
12.7 Theorem (Riesz–Fischer) The spaces $\mathcal{L}^p(\mu)$, $p \in [1, \infty)$, are complete, i.e. every Cauchy sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ converges to some limit $u \in \mathcal{L}^p(\mu)$.
Proof The main difficulty here is to identify the limit $u$. By the definition of a Cauchy sequence we find numbers
\[ 1 < n(1) < n(2) < \ldots < n(k) < \ldots \]
such that
\[ \big\| u_{n(k+1)} - u_{n(k)} \big\|_p < 2^{-k}, \qquad k \in \mathbb{N}. \]
To find $u$, we turn the sequence into a series by
\[ u_{n(k+1)} = \sum_{j=0}^{k} \big( u_{n(j+1)} - u_{n(j)} \big), \qquad u_{n(0)} := 0, \tag{12.10} \]
and the limit as $k \to \infty$ would formally be $u := \sum_{j=0}^\infty \big( u_{n(j+1)} - u_{n(j)} \big)$, if we can make sense of this infinite sum. Since
\[ \Big\| \sum_{j=0}^\infty \big| u_{n(j+1)} - u_{n(j)} \big| \Big\|_p \overset{(12.9)}{\le} \sum_{j=0}^\infty \big\| u_{n(j+1)} - u_{n(j)} \big\|_p \le \big\| u_{n(1)} \big\|_p + \sum_{j=1}^\infty \frac{1}{2^j}, \tag{12.11} \]
we conclude with C10.13 that $\big( \sum_{j=0}^\infty | u_{n(j+1)} - u_{n(j)} | \big)^p < \infty$ a.e., so that $u = \sum_{j=0}^\infty \big( u_{n(j+1)} - u_{n(j)} \big)$ is a.e. (absolutely) convergent.
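Let us show that $u = \mathcal{L}^p\text{-}\lim_{k\to\infty} u_{n(k)}$. For this, observe that by (12.10), the (ordinary) triangle inequality and (12.11),
\begin{align*}
\big\| u - u_{n(k)} \big\|_p &= \Big\| \sum_{j=k}^\infty \big( u_{n(j+1)} - u_{n(j)} \big) \Big\|_p \overset{\text{def}}{=} \Big\| \, \Big| \sum_{j=k}^\infty \big( u_{n(j+1)} - u_{n(j)} \big) \Big| \, \Big\|_p \\
&\le \Big\| \sum_{j=k}^\infty \big| u_{n(j+1)} - u_{n(j)} \big| \Big\|_p \overset{(12.9)}{\le} \sum_{j=k}^\infty \big\| u_{n(j+1)} - u_{n(j)} \big\|_p \xrightarrow{k\to\infty} 0.
\end{align*}
Finally, using that $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence, we get, for all $\epsilon > 0$ and suitable $N_\epsilon \in \mathbb{N}$,
\[ \| u - u_j \|_p \le \big\| u - u_{n(k)} \big\|_p + \big\| u_{n(k)} - u_j \big\|_p \le \big\| u - u_{n(k)} \big\|_p + \epsilon \qquad \forall\, j, n(k) \ge N_\epsilon. \]
Letting $k \to \infty$ shows $\| u - u_j \|_p \le \epsilon$ if $j \ge N_\epsilon$.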
The proof of Theorem 12.7 shows even a weak form of pointwise convergence:
12.8 Corollary Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, with $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$. Then there exists a subsequence $(u_{n(k)})_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty} u_{n(k)}(x) = u(x)$ holds for almost every $x \in X$.
Proof Since $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$, it is also an $\mathcal{L}^p$-Cauchy sequence and the claim follows from (12.11).
As we have already remarked, pointwise convergence alone does not guarantee convergence in $\mathcal{L}^p$, not even of a subsequence, see Problem 12.7. Let us repeat the following sufficient criterion, which we have already proved on page 109.
12.9 Theorem Let $(u_j)_{j\in\mathbb{N}} \subset L^p(\mu)$, $p \in [1, \infty)$, be a sequence of functions such that $|u_j| \le w$ for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^p_+(\mu)$. If $u(x) = \lim_{j\to\infty} u_j(x)$ exists for almost every $x \in X$, then
\[ u \in \mathcal{L}^p(\mu) \qquad\text{and}\qquad \lim_{j\to\infty} \| u - u_j \|_p = 0. \]
Of a different flavour is the next result which is sometimes called F. Riesz's convergence theorem.
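12.10 Theorem (Riesz) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, be a sequence such that $\lim_{j\to\infty} u_j(x) = u(x)$ for almost every $x \in X$ and some $u \in \mathcal{L}^p(\mu)$. Then
\[ \lim_{j\to\infty} \| u_j - u \|_p = 0 \iff \lim_{j\to\infty} \| u_j \|_p = \| u \|_p. \tag{12.12} \]
Proof The direction '$\Rightarrow$' in (12.12) follows from the lower triangle inequality$^2$ $\big| \|u_j\|_p - \|u\|_p \big| \le \|u_j - u\|_p$ for $\|\cdot\|_p$.
For '$\Leftarrow$' we observe that
\[ |u_j - u|^p \le (|u_j| + |u|)^p \le 2^p \max\{ |u_j|^p, |u|^p \} \le 2^p \big( |u_j|^p + |u|^p \big), \]
and we can apply Fatou's lemma 9.11 to the sequence
\[ 2^p \big( |u_j|^p + |u|^p \big) - |u_j - u|^p \ge 0 \]
to get
\begin{align*}
2^{p+1} \int |u|^p\, d\mu &= \int \liminf_{j\to\infty} \Big( 2^p \big( |u_j|^p + |u|^p \big) - |u_j - u|^p \Big)\, d\mu \\
&\le \liminf_{j\to\infty} \Big( 2^p \int |u_j|^p\, d\mu + 2^p \int |u|^p\, d\mu - \int |u_j - u|^p\, d\mu \Big) \\
&= 2^{p+1} \int |u|^p\, d\mu - \limsup_{j\to\infty} \int |u_j - u|^p\, d\mu,
\end{align*}
where we used that $\lim_{j\to\infty} \int |u_j|^p\, d\mu = \int |u|^p\, d\mu$. This shows that
\[ \limsup_{j\to\infty} \int |u_j - u|^p\, d\mu = 0, \qquad\text{hence}\qquad \lim_{j\to\infty} \int |u_j - u|^p\, d\mu = 0. \]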
Let us note the following structural result on $\mathcal{L}^p$, which will become important later on.
12.11 Corollary The simple $p$-integrable functions $\mathcal{E} \cap \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, are a dense subset of $\mathcal{L}^p(\mu)$, i.e. for every $u \in \mathcal{L}^p(\mu)$ one can find a sequence $(f_j)_{j\in\mathbb{N}} \subset \mathcal{E}$ such that $\lim_{j\to\infty} \| f_j - u \|_p = 0$.
Proof Assume first that $u \in \mathcal{L}^p_+(\mu)$ is positive. By Theorem 8.8 we find an increasing sequence $(f_j)_{j\in\mathbb{N}}$ of positive simple functions with $\sup_{j\in\mathbb{N}} f_j = u$. Since $0 \le f_j \le u$, we have $f_j \in \mathcal{L}^p(\mu)$ as well as $\sup_{j\in\mathbb{N}} \int |f_j|^p\, d\mu = \int |u|^p\, d\mu$ [✓]. We can now apply Theorem 12.10 and deduce that $\lim_{j\to\infty} \| f_j - u \|_p = 0$.
2 Follows exactly as $\big| |a| - |b| \big| \le |a - b|$ follows from $|a + b| \le |a| + |b|$, $a, b \in \mathbb{R}$.
For a general $u \in \mathcal{L}^p(\mu)$, we consider its positive and negative parts $u^\pm$ and construct, as before, sequences $g_j, h_j \in \mathcal{E} \cap \mathcal{L}^p(\mu)$ with $g_j \to u^+$ and $h_j \to u^-$ in $\mathcal{L}^p(\mu)$. But then $g_j - h_j \in \mathcal{E} \cap \mathcal{L}^p(\mu)$, and
\[ \| u - (g_j - h_j) \|_p \le \| u^+ - g_j \|_p + \| u^- - h_j \|_p \xrightarrow{j\to\infty} 0 \]
finishes the proof.
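With a special choice of $(X, \mathcal{A}, \mu)$ we can see that integrals generalize infinite series.
12.12 Example Consider the counting measure $\mu = \sum_{j=1}^\infty \delta_j$, cf. Example 4.7(iii), on the measurable space $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$. As we have seen in Examples 9.10(ii) and 10.6(ii), a function $u : \mathbb{N} \to \mathbb{R}$ is $\mu$-integrable if, and only if,
\[ \sum_{j=1}^\infty |u(j)| < \infty, \qquad\text{in which case}\qquad \int_{\mathbb{N}} u\, d\mu = \sum_{j=1}^\infty u(j). \]
In a similar way one shows that $v \in \mathcal{L}^p(\mu)$ if, and only if, $\sum_{j=1}^\infty |v(j)|^p < \infty$. Functions $u : \mathbb{N} \to \mathbb{R}$ are determined by their values $(u(1), u(2), u(3), \ldots)$ and every sequence $(a_j)_{j\in\mathbb{N}} \subset \mathbb{R}$ defines a function $u$ by $u(j) := a_j$. This means that we can identify the function $u$ with the sequence $(u(j))_{j\in\mathbb{N}}$ of real numbers. Thus
\[ \mathcal{L}^p(\mu) = \Big\{ u : \mathbb{N} \to \mathbb{R} \,:\, \sum_{j=1}^\infty |u(j)|^p < \infty \Big\} = \Big\{ (a_j)_{j\in\mathbb{N}} \subset \mathbb{R} \,:\, \sum_{j=1}^\infty |a_j|^p < \infty \Big\} =: \ell^p(\mathbb{N}), \]
the latter being a so-called sequence space. Note that in this context Hölder's and Minkowski's inequalities become
\[ \sum_{j=1}^\infty |a_j b_j| \le \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} \Big( \sum_{j=1}^\infty |b_j|^q \Big)^{1/q} \tag{12.13} \]
if $p, q \in (1, \infty)$ are conjugate numbers, and
\[ \Big( \sum_{j=1}^\infty |a_j \pm b_j|^p \Big)^{1/p} \le \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} + \Big( \sum_{j=1}^\infty |b_j|^p \Big)^{1/p}. \tag{12.14} \]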
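For finite sequences the inequalities (12.13) and (12.14) can be tested directly. The following sketch uses randomly generated sequences and the arbitrary exponent $p = 3$; the helper `p_norm` is a name introduced here, and the random data are purely illustrative.

```python
import random

def p_norm(seq, p):
    """(sum |x|^p)^(1/p) for a finite sequence."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)

random.seed(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent
a = [random.uniform(-1.0, 1.0) for _ in range(1000)]
b = [random.uniform(-1.0, 1.0) for _ in range(1000)]

# Hoelder's inequality for sequences, cf. (12.13)
holder_lhs = sum(abs(x * y) for x, y in zip(a, b))
holder_rhs = p_norm(a, p) * p_norm(b, q)

# Minkowski's inequality for sequences, cf. (12.14)
minkowski_lhs = p_norm([x + y for x, y in zip(a, b)], p)
minkowski_rhs = p_norm(a, p) + p_norm(b, p)

print(holder_lhs, "<=", holder_rhs)
print(minkowski_lhs, "<=", minkowski_rhs)
```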
We close this chapter with a useful convexity, resp. concavity, inequality. Recall that a function $\Phi : [a, b] \to \mathbb{R}$ on an interval $[a, b] \subset \bar{\mathbb{R}}$ is convex [concave] if
\[ \Phi(tx + (1 - t)y) \le t\,\Phi(x) + (1 - t)\,\Phi(y), \quad 0 < t < 1 \qquad \big[\, \Phi(tx + (1 - t)y) \ge t\,\Phi(x) + (1 - t)\,\Phi(y), \quad 0 < t < 1 \,\big] \tag{12.15} \]
holds for all $x, y \in [a, b]$. Geometrically this means that the graph of a convex [concave] function $\Phi$ between the points $(x, \Phi(x))$ and $(y, \Phi(y))$ lies below [above] the chord linking $(x, \Phi(x))$ and $(y, \Phi(y))$.
[Figure: A concave function $\Phi$, with the points $z$, $x$ and $y$ marked.]
Convex [concave] functions have nice properties: they are continuous in $(a, b)$ and if $\Phi'$ exists, it is increasing [decreasing]. If $\Phi$ is twice differentiable, convexity [concavity] is equivalent to $\Phi'' \ge 0$ [$\Phi'' \le 0$]. Further details and proofs can be found in Boas [8]. For our purposes we need the following lemma.
12.13 Lemma A convex [concave] function $\Phi : [a, b] \to \mathbb{R}$ has at every point in the open interval $(a, b)$ a finite right-hand derivative $\Phi'_+$ and satisfies
\[ \Phi(x) \ge \Phi'_+(y)\,(x - y) + \Phi(y) \quad \forall\, x, y \in (a, b) \qquad \big[\, \Phi(x) \le \Phi'_+(y)\,(x - y) + \Phi(y) \quad \forall\, x, y \in (a, b) \,\big]. \tag{12.16} \]
In particular, a convex [concave] function is the upper [lower] envelope of all linear functions below [above] its graph
\[ \Phi(x) = \sup\big\{ \ell(x) \,:\, \ell(z) = \alpha z + \beta \le \Phi(z)\ \ \forall\, z \in (a, b) \big\} \qquad \big[\, \Phi(x) = \inf\big\{ \ell(x) \,:\, \ell(z) = \alpha z + \beta \ge \Phi(z)\ \ \forall\, z \in (a, b) \big\} \,\big]. \tag{12.17} \]
Proof Since the graph of a convex [concave] function looks like a smile [frown], the last statement of the lemma is intuitively clear. A rigorous argument uses (12.16) which says that $\Phi$ admits at every point a tangent below [above] its graph.
We show (12.16) only for concave functions. Pick numbers
\[ z < \xi < y < \eta < x \]
in $(a, b)$ and choose $t = t(y, \eta, x) \in (0, 1)$ such that $\eta = ty + (1 - t)x$, $t' = t'(\xi, y, \eta)$ such that $y = t'\xi + (1 - t')\eta$ and $t'' = t''(z, \xi, \eta)$ such that $\xi = t''z + (1 - t'')\eta$. Using these values in (12.15) we see after some simple manipulations that
\[ \frac{\Phi(x) - \Phi(y)}{x - y} \le \frac{\Phi(\eta) - \Phi(y)}{\eta - y} \le \frac{\Phi(\eta) - \Phi(\xi)}{\eta - \xi} \le \frac{\Phi(z) - \Phi(\xi)}{z - \xi}, \]
cf. the picture on page 114. This shows that $\frac{\Phi(\eta) - \Phi(y)}{\eta - y}$ is bounded and increasing as $\eta \downarrow y$. Therefore, the right-hand derivative $\Phi'_+(y) := \lim_{\eta \downarrow y} \frac{\Phi(\eta) - \Phi(y)}{\eta - y}$ exists and is finite. In particular, $\Phi$ is continuous from the right at the point $y$, so that $\lim_{\eta \downarrow y} \Phi(\eta) = \Phi(y)$. Letting $\eta \to y$ in the above chain of inequalities therefore yields
\[ \frac{\Phi(x) - \Phi(y)}{x - y} \le \Phi'_+(y) \le \frac{\Phi(y) - \Phi(\xi)}{y - \xi} \qquad \forall\, \xi < y < x, \]
and rearranging these inequalities gives (12.16): the first one covers points $x > y$, the second one covers points $\xi < y$.
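12.14 Theorem (Jensen's inequality) Let $\Lambda : [0, \infty) \to [0, \infty)$ be a concave and $V : [0, \infty) \to [0, \infty)$ be a convex function. For any $w \in \mathcal{L}^1_+(\mu)$ we have
\[ \frac{\int \Lambda(u)\, w\, d\mu}{\int w\, d\mu} \le \Lambda\left( \frac{\int uw\, d\mu}{\int w\, d\mu} \right) \qquad \forall\, u \in \mathcal{M}^+, \tag{12.18} \]
\[ V\left( \frac{\int uw\, d\mu}{\int w\, d\mu} \right) \le \frac{\int V(u)\, w\, d\mu}{\int w\, d\mu} \qquad \forall\, u \in \mathcal{M}^+. \tag{12.19} \]
If $uw \in \mathcal{L}^1(\mu)$, then $\Lambda(u)\, w \in \mathcal{L}^1(\mu)$.
Proof We prove only (12.18) since (12.19) is similar. If the right-hand side of (12.18) is infinite, there is nothing to show. Therefore we may assume $\int uw\, d\mu < \infty$. Since $\Lambda(x)$ is concave, we find for any $\ell(x) := \alpha x + \beta \ge \Lambda(x)$ that
\[ \frac{\int \Lambda(u)\, w\, d\mu}{\int w\, d\mu} \le \frac{\int (\alpha u + \beta)\, w\, d\mu}{\int w\, d\mu} = \alpha\, \frac{\int uw\, d\mu}{\int w\, d\mu} + \beta = \ell\left( \frac{\int uw\, d\mu}{\int w\, d\mu} \right) \]
and the inequality follows from (12.17) if we pass to the inf over all linear functions satisfying $\ell \ge \Lambda$.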
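For a discrete measure with finitely many atoms, Jensen's inequalities (12.18) and (12.19) reduce to statements about weighted averages, which can be checked directly. The sketch below assumes the concrete choices $\Lambda(x) = \sqrt{x}$ (concave) and $V(x) = x^2$ (convex) together with random data; none of these choices come from the text.

```python
import math
import random

random.seed(1)
u = [random.uniform(0.0, 10.0) for _ in range(500)]  # values of u >= 0
w = [random.uniform(0.1, 1.0) for _ in range(500)]   # weights w > 0

total_w = sum(w)
mean_u = sum(ui * wi for ui, wi in zip(u, w)) / total_w  # weighted mean of u

# (12.18) with the concave function sqrt
concave_lhs = sum(math.sqrt(ui) * wi for ui, wi in zip(u, w)) / total_w
concave_rhs = math.sqrt(mean_u)

# (12.19) with the convex function x -> x**2
convex_lhs = mean_u ** 2
convex_rhs = sum(ui ** 2 * wi for ui, wi in zip(u, w)) / total_w

print(concave_lhs, "<=", concave_rhs)
print(convex_lhs, "<=", convex_rhs)
```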
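12.15 The case $p = \infty$. In Theorem 12.2 and Corollary 12.4 we avoided the cases $p = 1$ or $\infty$. This can be overcome by introducing the space $\mathcal{L}^\infty(\mu)$:
\[ \mathcal{L}^\infty(\mu) = \{ u \in \mathcal{M}(\mathcal{A}) \,:\, u \text{ is a.e. bounded} \}. \tag{12.20} \]
Obviously, $\mathcal{L}^\infty(\mu)$ is a vector space, and we can introduce by
\[ \|u\|_\infty := \inf\big\{ C \,:\, \mu(\{ |u| > C \}) = 0 \big\} \tag{12.21} \]
a norm [✓] which is, for continuous $u$, just $\|u\|_\infty = \sup |u|$. Interpreting $p = 1$ and $q = \infty$ as conjugate numbers, it is not hard to verify T12.2 and C12.4 for these values of $p$ and $q$. The completeness of $\mathcal{L}^\infty(\mu)$ is much easier to prove than T12.7: if $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{L}^\infty(\mu)$, we set
\[ A_{k,\ell} := \{ |u_k| > \|u_k\|_\infty \} \cup \{ |u_k - u_\ell| > \|u_k - u_\ell\|_\infty \}, \qquad A := \bigcup_{k,\ell\in\mathbb{N}} A_{k,\ell}. \]
By definition, $\mu(A_{k,\ell}) = 0$ and $\mu(A) = 0$, so that $\| u_j \mathbf{1}_A \|_\infty = 0$ for all $j \in \mathbb{N}$. On the set $A^c$, however, $(u_j)_{j\in\mathbb{N}}$ converges uniformly to a bounded function $u$, i.e. $u \mathbf{1}_{A^c} \in \mathcal{L}^\infty(\mu)$ as well as $\| u_j - u \mathbf{1}_{A^c} \|_\infty \to 0$.
As in Remark 12.5 we write $L^\infty(\mu)$ for $\mathcal{L}^\infty(\mu)/{\sim}$, where $u \sim v$ means that $\{ u \ne v \} \in \mathcal{N}_\mu$ is a $\mu$-null set.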
Note also that T12.10 and C12.11 are no longer true for p = �. This can
be seen on ���
� �� from uj �x� �= e−�x/j�
j→�−−−→ 1��x� for the former and from
u�x� = ∑�j=−� 1�2j�2j+1��x� for the latter.[�]
Problems
12.1. Let �X� �� �� be a finite measure space and let 1 � q < p < �.
(i) Show that �u�q � ��X�1/q−1/p �u�p.
[Hint: use Hölder’s inequality for u · 1.]
(ii) Conclude that �p��� ⊂ �q��� for all p � q � 1 and that a Cauchy sequence
in �p is also a Cauchy sequence in �q .
(iii) Is this still true if the measure � is not finite?
12.2. Let �X� �� �� be a general measure space and 1 � p � r � q � �. Prove that
�p��� ∩ �q��� ⊂ �r ��� by establishing the inequality
�u�r � �u��p · �u�1−�q ∀ u ∈ �p��� ∩ �q����
with � = � 1
r
− 1
q
�/� 1
p
− 1
q
�.
[Hint: use Hölder’s inequality.]
12.3. Extend the proof of Hölder’s inequality 12.2 to p = 1 and q = �, i.e. show that
∫
uv d� � �u�1 · �v�� (12.22)
holds for all u ∈ �1��� and v ∈ �����.
12.4. Generalized Hölder inequality. Iterate Hölder’s inequality to derive the following
generalization:
∫
�u1 · u2 · � � � · uN � d� � �u1�p1 · �u2�p2 · � � � · �uN �pN (12.23)
for all pj ∈ �1� �� such that
∑N
j=1 p
−1
j = 1 and all measurable uj ∈ �.
12.5. Young functions. Let � � �0� �� → �0� �� be a strictly increasing continuous
function such that ��0� = 0 and lim�→� ���� = �. Denote by ���� �= �−1���
the inverse function. The functions
��A� �=
∫
�0�A�
���� �1�d�� and ��B� �=
∫
�0�B�
���� �1�d�� (12.24)
are called conjugate Young functions. Adapt the proof of L12.1 to show the
following general Young’s inequality:
AB � ��A� + ��B� ∀ A� B � 0� (12.25)
[Hint: interpret ��A� and ��B� as areas below the graph of ����, resp. ����.]
12.6. Let 1 � p < � and u� uk ∈ �p��� such that
∑�
k=1 �u − uk�p < �. Show that
limk→� uk�x� = u�x� almost everywhere.
[Hint: mimic the proof of the Riesz–Fischer theorem using
∑
j �uj+1 − uj �.]
12.7. Consider one-dimensional Lebesgue measure on �0� 1�. Verify that the sequence
un�x� �= n 1�0�1/n��x�, n ∈ �, converges pointwise to the function u ≡ 0, but that
no subsequence of un converges in �
p-sense for any p � 1.
12.8. Let p� q ∈ �1� �� be conjugate indices, i.e. p−1 + q−1 = 1 and assume that �uk�k∈� ⊂
�p and �wk�k∈� ⊂ �q are sequences with limits u and w in �p, resp. �q -sense.
Show that ukwk converges in �
1 to the function uw.
12.9. Prove that �uj �j∈� ⊂ �2 converges in �2 if, and only if, limn�m→�
∫
unum d� exists.
[Hint: verify and use the identity �u − w�22 = �u�22 + �w�22 − 2
∫
uw d�.]
12.10. Let �X� �� �� be a finite measure space. Show that every measurable u � 0 with∫
exp�hu�x�� ��dx� < � for some h > 0 is in �p for every p � 1.
[Hint: check that �t�N /N ! � e�t� implies u ∈ �N , N ∈ �; then use Problem 12.1.]
12.11. Let � be Lebesgue measure in �0� �� and p� q � 1 arbitrary.
(i) Show that un�x� �= n
�x + n�−� (
∈ �� � > 1) is for every n ∈ � in �p���.
(ii) Show that vn�x� �= n� e−nx (� ∈ �) is for every n ∈ � in �q���.
12.12. Let u�x� = �x
+ x��−1, x�
� � > 0. For which p � 1 is u ∈ �p��1� �0� ���?
12.13. Consider the measure space �� =
1� 2� � � � � n�� ���� ��, n � 2 where � is the
counting measure. Show that
(∑n
j=1 �xj �p
)1/p
is a norm if p ∈ �1� ��, but not for
p ∈ �0� 1�.
[Hint: you can identify �p��� with �n.]
12.14. Let $(X, \mathcal{A}, \mu)$ be a measure space. The space $\mathcal{L}^p(\mu)$ is called separable, if there exists a countable dense subset $\mathcal{D}_p \subset \mathcal{L}^p(\mu)$. Show that $\mathcal{L}^p(\mu)$, $p \in (1,\infty)$, is separable if, and only if, $\mathcal{L}^1(\mu)$ is separable.
[Hint: use Riesz’s convergence theorem 12.10.]
12.15. Let $u_n \in \mathcal{L}^p$, $p \ge 1$, for all $n \in \mathbb{N}$. What can you say about $u$ and $w$ if you know that $\lim_{n\to\infty} \int |u_n - u|^p\,d\mu = 0$ and $\lim_{n\to\infty} u_n(x) = w(x)$ for almost every $x$?
12.16. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $u \in \mathcal{L}^1$ be strictly positive with $\int u\,d\mu = 1$. Show that
\[ \int (\log u)\,d\mu \le \mu(X) \log\frac{1}{\mu(X)}. \]
12.17. Let $u$ be a positive measurable function on $[0,1]$. Which of the following is larger:
\[ \int_{(0,1)} u(x) \log u(x)\,\lambda(dx) \quad\text{or}\quad \int_{(0,1)} u(s)\,\lambda(ds) \cdot \int_{(0,1)} \log u(t)\,\lambda(dt)\,? \]
[Hint: show that $\log x \le x \log x$, $x > 0$, and assume first that $\int u\,d\lambda = 1$, then consider $u / \int u\,d\lambda$.]
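12.18. Let $(X, \mathcal{A}, \mu)$ be a measure space and $p \in (0,1)$. The conjugate index is given by $q := p/(p-1) < 0$, so that $p^{-1} + q^{-1} = 1$. Prove for all measurable $u, v, w : X \to (0,\infty)$ with $\int u^p\,d\mu, \int v^p\,d\mu < \infty$ and $0 < \int w^q\,d\mu < \infty$ the inequalities
\[ \int uw\,d\mu \ge \Big(\int u^p\,d\mu\Big)^{1/p} \Big(\int w^q\,d\mu\Big)^{1/q} \]
and
\[ \Big(\int (u+v)^p\,d\mu\Big)^{1/p} \ge \Big(\int u^p\,d\mu\Big)^{1/p} + \Big(\int v^p\,d\mu\Big)^{1/p}. \]
[Hint: consider Hölder’s inequality for $u$ and $1/w$.]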
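12.19. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $u \in \mathcal{M}(\mathcal{A})$ be a bounded function with $\|u\|_\infty > 0$. Prove that for all $n \in \mathbb{N}$:
(i) $M_n := \int |u|^n\,d\mu \in (0,\infty)$;
(ii) $M_{n+1} M_{n-1} \ge M_n^2$;
(iii) $\mu(X)^{-1/n}\,\|u\|_n \le M_{n+1}/M_n \le \|u\|_\infty$;
(iv) $\lim_{n\to\infty} M_{n+1}/M_n = \|u\|_\infty$.
[Hint: (ii) – use Hölder’s inequality; (iii) – use Jensen’s inequality for the lower estimate, Hölder’s inequality for the upper estimate; (iv) – observe that
\[ \int u^n\,d\mu \ge \int_{\{u > \|u\|_\infty - \epsilon\}} (\|u\|_\infty - \epsilon)^n\,d\mu = \mu\big(\{u > \|u\|_\infty - \epsilon\}\big)\,(\|u\|_\infty - \epsilon)^n, \]
take the $n$th root and let $n \to \infty$.]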
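12.20. Let $(X, \mathcal{A}, \mu)$ be a general measure space and let $u \in \bigcap_{p \ge 1} \mathcal{L}^p(\mu)$. Then
\[ \lim_{p\to\infty} \|u\|_p = \|u\|_\infty \]
where $\|u\|_\infty = \infty$ if $u$ is unbounded.
[Hint: start with $\|u\|_\infty < \infty$. Show that for any sequence $q_n \to \infty$ one has $\|u\|_{p+q_n} \le \|u\|_\infty^{q_n/(p+q_n)} \cdot \|u\|_p^{p/(p+q_n)}$ and conclude that $\limsup_{p\to\infty} \|u\|_p \le \|u\|_\infty$. The other estimate follows from $\|u\|_p \ge \mu\big(\{|u| > (1-\epsilon)\|u\|_\infty\}\big)^{1/p}\,(1-\epsilon)\|u\|_\infty$ and $p \to \infty$, $\epsilon \to 0$, see also the hint to Problem 12.19, where $\mu(\{\dots\})$ is finite in view of the Markov inequality.
If $\|u\|_\infty = \infty$, use part one of the hint and observe that
\[
\begin{aligned}
\liminf_{p\to\infty}\, \sup_{k\in\mathbb{N}} \big\| |u| \wedge k \big\|_p
&\ge \sup_{k\in\mathbb{N}}\, \lim_{p\to\infty} \big\| |u| \wedge k \big\|_p
= \sup_{k\in\mathbb{N}} \big\| |u| \wedge k \big\|_\infty \\
&= \sup_{k\in\mathbb{N}}\, \sup_x \big(|u(x)| \wedge k\big)
= \sup_x\, \sup_{k\in\mathbb{N}} \big(|u(x)| \wedge k\big)
= \|u\|_\infty = \infty.]
\end{aligned}
\]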
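The limit in Problem 12.20 can be observed numerically in the simplest possible setting. The sketch below assumes counting measure on a three-point space and a hypothetical function with values $1, -2, 3$ (neither is from the text); the $p$-norms should decrease towards $\|u\|_\infty = 3$.

```python
# Illustration only: counting measure on a three-point space,
# hypothetical values of u chosen so that the sup-norm is 3.
u = [1.0, -2.0, 3.0]

def norm(p: float) -> float:
    """The p-norm of u with respect to counting measure."""
    return sum(abs(v) ** p for v in u) ** (1.0 / p)

exponents = (1, 2, 10, 100, 300)
norms = [norm(p) for p in exponents]
sup_norm = max(abs(v) for v in u)
```

Here `norms[0]` is $6$ and `norms[-1]` agrees with `sup_norm` up to rounding.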
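12.21. Let $(X, \mathcal{A}, \mu)$ be a measure space and $1 \le p < \infty$. Show that $f \in \mathcal{E} \cap \mathcal{L}^p(\mu)$ if, and only if, $f \in \mathcal{E}$ and $\mu(\{f \ne 0\}) < \infty$. In particular, $\mathcal{E} \cap \mathcal{L}^p(\mu) = \mathcal{E} \cap \mathcal{L}^1(\mu)$.
12.22. Use Jensen’s inequality (12.18) to derive Hölder’s and Minkowski’s inequalities.
Instructions: use
\[ \Lambda(x) = x^{1/q},\ x \ge 0, \qquad w = |f|^p \quad\text{and}\quad u = |g|^q\,|f|^{-p}\,\mathbf{1}_{\{f \ne 0\}} \]
for Hölder’s inequality and
\[ \Lambda(x) = (x^{1/p} + 1)^p,\ x \ge 0, \qquad w = |f|^p\,\mathbf{1}_{\{f \ne 0\}} \quad\text{and}\quad u = |f|^{-p}\,|g|^p\,\mathbf{1}_{\{f \ne 0\}} \]
for Minkowski’s inequality.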
13
Product measures and Fubini’s theorem
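Lebesgue measure on $\mathbb{R}^n$ has, inherent in its definition, an interesting additional property: if $n > d \ge 1$,
\[
\begin{aligned}
\lambda^n\big([a_1,b_1) \times \cdots \times [a_n,b_n)\big)
&= (b_1-a_1)\cdots(b_d-a_d)\cdot(b_{d+1}-a_{d+1})\cdots(b_n-a_n) \\
&= \lambda^d\big([a_1,b_1)\times\cdots\times[a_d,b_d)\big)\cdot\lambda^{n-d}\big([a_{d+1},b_{d+1})\times\cdots\times[a_n,b_n)\big),
\end{aligned}
\tag{13.1}
\]
i.e. it is – at least for rectangles – the product of Lebesgue measures in lower-dimensional spaces. In this chapter we will see that (13.1) remains true for any product $A \times B$ of sets $A \in \mathcal{B}(\mathbb{R}^d)$ and $B \in \mathcal{B}(\mathbb{R}^{n-d})$. More importantly, we will prove the following version of Cavalieri’s principle
\[
\begin{aligned}
\lambda^n(E) = \int \mathbf{1}_E\,d\lambda^n
&= \int \Big[\int \mathbf{1}_E(x, y_0)\,\lambda^d(dx)\Big]\,\lambda^{n-d}(dy_0) \\
&= \int \Big[\int \mathbf{1}_E(x_0, y)\,\lambda^{n-d}(dy)\Big]\,\lambda^d(dx_0)
\end{aligned}
\]
which just says that we carve up the set $E \subset \mathbb{R}^n$ horizontally or vertically, measure the volume of the slices and ‘sum’ them up along the other direction to get the volume of the whole set $E$.
[Figure: a set $E$ with its horizontal section $\mathbf{1}_E(x, y_0)$ and its vertical section $\mathbf{1}_E(x_0, y)$; axes $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^{n-d}$.]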
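To see the slicing formula at work numerically, the following sketch (an illustration, not part of the text) takes $E$ to be the unit disc in $\mathbb{R}^2$, so that $n = 2$ and $d = 1$. The horizontal slice at height $y_0$ is an interval of length $2\sqrt{1 - y_0^2}$, and ‘summing’ these lengths with a midpoint rule should approximate $\lambda^2(E) = \pi$. The function names and the number of slices are assumptions made for the example.

```python
import math

def slice_length(y0: float) -> float:
    """lambda^1 of the slice {x : (x, y0) in E} for the unit disc E."""
    return 2.0 * math.sqrt(1.0 - y0 * y0) if abs(y0) < 1.0 else 0.0

def area_by_slices(num_slices: int = 100_000) -> float:
    """Midpoint-rule approximation of the outer integral over y0 in (-1, 1)."""
    h = 2.0 / num_slices
    return sum(slice_length(-1.0 + (i + 0.5) * h) for i in range(num_slices)) * h

area = area_by_slices()
```

By symmetry of the disc, slicing vertically instead of horizontally gives the same computation, in line with the two iterated integrals above.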
Clearly, we should be careful about the measurability of products of sets. Recall the following simple rules for Cartesian products of sets $A, A', A_i \subset X$, $i \in I$, and $B, B' \subset Y$:
\[
\begin{aligned}
\Big(\bigcup_{i\in I} A_i\Big) \times B &= \bigcup_{i\in I} (A_i \times B), \\
\Big(\bigcap_{i\in I} A_i\Big) \times B &= \bigcap_{i\in I} (A_i \times B), \\
(A \times B) \cap (A' \times B') &= (A \cap A') \times (B \cap B'), \\
A^c \times B &= (X \times B) \setminus (A \times B), \\
A \times B \subset A' \times B' &\iff A \subset A' \text{ and } B \subset B',
\end{aligned}
\tag{13.2}
\]
which are easily derived from the formula
\[ A \times B = (A \times Y) \cap (X \times B) = \pi_1^{-1}(A) \cap \pi_2^{-1}(B), \]
where $\pi_1 : X \times Y \to X$ and $\pi_2 : X \times Y \to Y$ are the coordinate projections, and the compatibility of inverse mappings and set operations. To treat measurability, we assume throughout this chapter that
$(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are $\sigma$-finite measure spaces.
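Some of the rules in (13.2) can be checked mechanically on small finite sets. The sketch below uses hypothetical sets `X`, `Y`, `A`, `A2`, `B`, `B2` (chosen only for illustration) and Python's built-in set operations; it is a spot check on one example, not a proof.

```python
from itertools import product

# Hypothetical finite ground sets and subsets, for illustration only.
X, Y = {1, 2, 3, 4}, {"a", "b", "c"}
A, A2 = {1, 2}, {2, 3}
B, B2 = {"a", "b"}, {"b", "c"}

def times(S, T):
    """Cartesian product S x T as a set of pairs."""
    return set(product(S, T))

# (A x B) ∩ (A' x B') = (A ∩ A') x (B ∩ B')
rule_intersection = (times(A, B) & times(A2, B2)) == times(A & A2, B & B2)
# A^c x B = (X x B) \ (A x B)
rule_complement = times(X - A, B) == (times(X, B) - times(A, B))
# A x B = (A x Y) ∩ (X x B)
rule_projection = times(A, B) == (times(A, Y) & times(X, B))
```

All three flags evaluate to `True` for these sets.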
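Following (13.1) we want to define a measure $\rho$ on rectangles of the form $A \times B$ such that $\rho(A \times B) = \mu(A)\nu(B)$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$. The first problem which we encounter is that the family
\[ \mathcal{A} \times \mathcal{B} := \{A \times B : A \in \mathcal{A},\ B \in \mathcal{B}\} \tag{13.3} \]
is, in general, no $\sigma$-algebra.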
13.1 Lemma Let $\mathcal{A}$ and $\mathcal{B}$ be two $\sigma$-algebras (or only semi-rings). Then $\mathcal{A} \times \mathcal{B}$ is a semi-ring.1
Proof Literally the same as the induction step in the proof of P6.4.
13.2 Definition Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be two measurable spaces. Then $\mathcal{A} \otimes \mathcal{B} := \sigma(\mathcal{A} \times \mathcal{B})$ is called a product $\sigma$-algebra, and $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is the product of measurable spaces.
The following lemma is quite useful since it allows us to reduce considerations for $\mathcal{A} \otimes \mathcal{B}$ to generators $\mathcal{F}$ and $\mathcal{G}$ of $\mathcal{A}$ and $\mathcal{B}$ – just as we did in (13.1).
1 See $(S_1)$–$(S_3)$ on p. 37 for the definition of a semi-ring.
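13.3 Lemma If $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$ and if $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $(F_j)_{j\in\mathbb{N}} \subset \mathcal{F}$, $F_j \uparrow X$ and $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$, $G_j \uparrow Y$, then
\[ \sigma(\mathcal{F} \times \mathcal{G}) = \sigma(\mathcal{A} \times \mathcal{B}) \overset{\text{def}}{=} \mathcal{A} \otimes \mathcal{B}. \]
Proof Since $\mathcal{F} \times \mathcal{G} \subset \mathcal{A} \times \mathcal{B}$ we have $\sigma(\mathcal{F} \times \mathcal{G}) \subset \mathcal{A} \otimes \mathcal{B}$. On the other hand, the system
\[ \Sigma := \{A \in \mathcal{A} : A \times G \in \sigma(\mathcal{F} \times \mathcal{G}) \ \ \forall\, G \in \mathcal{G}\} \]
is a $\sigma$-algebra: Let $A, A_j \in \Sigma$, $j \in \mathbb{N}$, and $G \in \mathcal{G}$; $(\Sigma_1)$ follows from
\[ X \times G = \bigcup_{j\in\mathbb{N}} \underbrace{(F_j \times G)}_{\in\, \mathcal{F}\times\mathcal{G}} \in \sigma(\mathcal{F} \times \mathcal{G}), \]
$(\Sigma_2)$ from $A^c \times G = (X \times G) \setminus (A \times G) \in \sigma(\mathcal{F} \times \mathcal{G})$, and $(\Sigma_3)$ from
\[ \Big(\bigcup_{j\in\mathbb{N}} A_j\Big) \times G = \bigcup_{j\in\mathbb{N}} \underbrace{(A_j \times G)}_{\in\, \sigma(\mathcal{F}\times\mathcal{G})} \in \sigma(\mathcal{F} \times \mathcal{G}). \]
Obviously, $\mathcal{F} \subset \Sigma \subset \mathcal{A}$, and therefore $\mathcal{A} = \Sigma$; by the very definition of $\Sigma$ we conclude that $\mathcal{A} \times \mathcal{G} \subset \sigma(\mathcal{F} \times \mathcal{G})$. A similar consideration shows $\mathcal{F} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$. This means that for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$
\[ A \times B = (A \times Y) \cap (X \times B) = \bigcup_{j,k\in\mathbb{N}} \underbrace{(A \times G_k)}_{\in\, \sigma(\mathcal{F}\times\mathcal{G})} \cap \underbrace{(F_j \times B)}_{\in\, \sigma(\mathcal{F}\times\mathcal{G})}, \]
so that $\mathcal{A} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$ and thus $\mathcal{A} \otimes \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$.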
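If the generators $\mathcal{F}, \mathcal{G}$ are rich enough, we have not too many choices of measures $\rho$ with $\rho(F \times G) = \mu(F)\nu(G)$. In fact,
13.4 Theorem (Uniqueness of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two measure spaces and assume that $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$. If
• $\mathcal{F}, \mathcal{G}$ are $\cap$-stable,
• $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $F_j \uparrow X$ and $G_k \uparrow Y$ with $\mu(F_j) < \infty$ and $\nu(G_k) < \infty$ for all $j, k \in \mathbb{N}$,
then there is at most one measure $\rho$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ satisfying
\[ \rho(F \times G) = \mu(F)\nu(G) \qquad \forall\, F \in \mathcal{F},\ G \in \mathcal{G}. \]
Proof By Lemma 13.3 $\mathcal{F} \times \mathcal{G}$ generates $\mathcal{A} \otimes \mathcal{B}$. Moreover, $\mathcal{F} \times \mathcal{G}$ inherits the $\cap$-stability of $\mathcal{F}$ and $\mathcal{G}$[✓], the sequence $F_j \times G_j$ increases towards $X \times Y$ and $\rho(F_j \times G_j) = \mu(F_j)\nu(G_j) < \infty$. These were the assumptions of the uniqueness theorem 5.7, showing that there is at most one such product measure $\rho$.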
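As so often, it is the existence which is more difficult than uniqueness.
13.5 Theorem (Existence of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. Then the set-function
\[ \rho : \mathcal{A} \times \mathcal{B} \to [0,\infty], \qquad \rho(A \times B) := \mu(A)\nu(B), \]
extends uniquely to a $\sigma$-finite measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
\[ \rho(E) = \iint \mathbf{1}_E(x,y)\,\mu(dx)\,\nu(dy) = \iint \mathbf{1}_E(x,y)\,\nu(dy)\,\mu(dx) \tag{13.4} \]
holds2 for all $E \in \mathcal{A} \otimes \mathcal{B}$. In particular, the functions
\[ x \mapsto \mathbf{1}_E(x,y), \quad y \mapsto \mathbf{1}_E(x,y), \quad x \mapsto \int \mathbf{1}_E(x,y)\,\nu(dy), \quad y \mapsto \int \mathbf{1}_E(x,y)\,\mu(dx) \]
are $\mathcal{A}$, resp. $\mathcal{B}$-measurable for every fixed $y \in Y$, resp. $x \in X$.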
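Proof Uniqueness of $\rho$ follows from T13.4. Existence: Let $(A_j)_{j\in\mathbb{N}}$, $(B_j)_{j\in\mathbb{N}}$ be sequences in $\mathcal{A}$ resp. $\mathcal{B}$ with $A_j \uparrow X$, $B_j \uparrow Y$ and $\mu(A_j), \nu(B_j) < \infty$. Clearly, $E_j := A_j \times B_j \uparrow X \times Y$.
For every $j \in \mathbb{N}$ we consider the family $\mathcal{D}_j$ of all subsets $D \subset X \times Y$ satisfying the following conditions:
• $x \mapsto \mathbf{1}_{D \cap E_j}(x,y)$ and $y \mapsto \mathbf{1}_{D \cap E_j}(x,y)$ are measurable,
• $x \mapsto \int \mathbf{1}_{D \cap E_j}(x,y)\,\nu(dy)$ and $y \mapsto \int \mathbf{1}_{D \cap E_j}(x,y)\,\mu(dx)$ are measurable,
• $\iint \mathbf{1}_{D \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \iint \mathbf{1}_{D \cap E_j}(x,y)\,\nu(dy)\,\mu(dx)$.
That $\mathcal{A} \times \mathcal{B} \subset \mathcal{D}_j$ follows from
\[
\begin{aligned}
\iint \mathbf{1}_{(A \times B) \cap E_j}(x,y)\,\mu(dx)\,\nu(dy)
&= \iint \mathbf{1}_{A \cap A_j}(x)\,\mathbf{1}_{B \cap B_j}(y)\,\mu(dx)\,\nu(dy) \\
&= \mu(A \cap A_j) \int \mathbf{1}_{B \cap B_j}(y)\,\nu(dy) \\
&= \mu(A \cap A_j)\,\nu(B \cap B_j) \\
&= \dots = \iint \mathbf{1}_{(A \times B) \cap E_j}(x,y)\,\nu(dy)\,\mu(dx),
\end{aligned}
\]
where the ellipsis $\dots$ stands for the same calculations run through backwards. In each step the measurability conditions needed to perform the integrations are fulfilled because of the product structure.[✓] In particular, $X \times Y, \emptyset, E_k \in \mathcal{D}_j$.
2 We use the symbols $\int \dots d\mu$ like brackets, i.e. $\iint \dots d\mu\,d\nu = \int \big(\int \dots d\mu\big)\,d\nu$.
If $D \in \mathcal{D}_j$, then $\mathbf{1}_{D^c \cap E_j} = \mathbf{1}_{E_j} - \mathbf{1}_{E_j \cap D}$ and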
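\[
\begin{aligned}
\iint \mathbf{1}_{D^c \cap E_j}(x,y)\,\mu(dx)\,\nu(dy)
&= \int \Big(\int \mathbf{1}_{E_j}(x,y)\,\mu(dx) - \int \mathbf{1}_{E_j \cap D}(x,y)\,\mu(dx)\Big)\,\nu(dy) \\
&= \iint \mathbf{1}_{E_j}(x,y)\,\mu(dx)\,\nu(dy) - \iint \mathbf{1}_{E_j \cap D}(x,y)\,\mu(dx)\,\nu(dy) \\
&= \iint \mathbf{1}_{E_j}(x,y)\,\nu(dy)\,\mu(dx) - \iint \mathbf{1}_{E_j \cap D}(x,y)\,\nu(dy)\,\mu(dx) \\
&\qquad \text{(by definition, since } E_j, D \in \mathcal{D}_j\text{)} \\
&= \dots = \iint \mathbf{1}_{D^c \cap E_j}(x,y)\,\nu(dy)\,\mu(dx).
\end{aligned}
\]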
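Again, in each step the measurability conditions hold since measurable functions form a vector space. If $(D_k)_{k\in\mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, $D := \dot\bigcup_{k\in\mathbb{N}} D_k$, the linearity of the integral and Beppo Levi’s theorem in the form of C9.9 show that
\[
\begin{aligned}
\iint \mathbf{1}_{D \cap E_j}(x,y)\,\mu(dx)\,\nu(dy)
&= \int \Big(\sum_{k=1}^\infty \int \mathbf{1}_{D_k \cap E_j}(x,y)\,\mu(dx)\Big)\,\nu(dy) \\
&= \sum_{k=1}^\infty \iint \mathbf{1}_{D_k \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) \\
&= \sum_{k=1}^\infty \iint \mathbf{1}_{D_k \cap E_j}(x,y)\,\nu(dy)\,\mu(dx) \\
&\qquad \text{(by definition, since } D_k \in \mathcal{D}_j\text{)} \\
&= \dots = \iint \mathbf{1}_{D \cap E_j}(x,y)\,\nu(dy)\,\mu(dx)
\end{aligned}
\]
and the measurability conditions hold since measurability is preserved under sums and increasing limits.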
The last three calculations show that $\mathcal{D}_j$ is a Dynkin system containing the $\cap$-stable family $\mathcal{A} \times \mathcal{B}$. By Theorem 5.5, $\mathcal{A} \otimes \mathcal{B} \subset \mathcal{D}_j$ for every $j \in \mathbb{N}$. Since $E_j \uparrow X \times Y$, Beppo Levi’s theorem 9.6 proves (13.4) along with the measurability of the functions $\mathbf{1}_E(\bullet, y)$, $\mathbf{1}_E(x, \bullet)$, $\int \mathbf{1}_E(\bullet, y)\,\nu(dy)$ and $\int \mathbf{1}_E(x, \bullet)\,\mu(dx)$ since $\mathcal{M}$ is stable under pointwise limits.
Replacing in the above calculations $E_j$ by $X \times Y$ finally proves that
\[ E \mapsto \rho(E) := \iint \mathbf{1}_E(x,y)\,\mu(dx)\,\nu(dy) \]
is indeed a measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ with $\rho(A \times B) = \mu(A)\nu(B)$.
13.6 Definition Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. The unique measure $\rho$ constructed in Theorem 13.5 is called the product of the measures $\mu$ and $\nu$, denoted by $\mu \times \nu$. $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$ is called the product measure space.
Returning to the example considered at the beginning we find
13.7 Corollary If $n > d \ge 1$,
\[ \big(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \lambda^n\big) = \big(\mathbb{R}^d \times \mathbb{R}^{n-d},\ \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^{n-d}),\ \lambda^d \times \lambda^{n-d}\big). \]
The next step is to see how we can integrate w.r.t. $\mu \times \nu$. The following two results are often stated together as the Fubini or Fubini–Tonelli theorem. We prefer to distinguish between them since the first result, Theorem 13.8, says that we can always swap iterated integrals of positive functions (even if we get $+\infty$), whereas 13.9 applies to more general functions but requires the (iterated) integrals to be finite.
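13.8 Theorem (Tonelli) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to [0,\infty]$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
(i) $x \mapsto u(x,y)$, $y \mapsto u(x,y)$ are $\mathcal{A}$, resp. $\mathcal{B}$-measurable for all $y \in Y$, resp. $x \in X$;
(ii) $x \mapsto \int_Y u(x,y)\,\nu(dy)$, $y \mapsto \int_X u(x,y)\,\mu(dx)$ are $\mathcal{A}$, resp. $\mathcal{B}$-measurable;
(iii) $\displaystyle \int_{X \times Y} u\,d(\mu \times \nu) = \int_Y \int_X u(x,y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y u(x,y)\,\nu(dy)\,\mu(dx)$
with values in $[0,\infty]$.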
Proof Since $u$ is positive and $\mathcal{A} \otimes \mathcal{B}$-measurable, we find an increasing sequence of simple functions $f_j \in \mathcal{E}^+(\mathcal{A} \otimes \mathcal{B})$ with $\sup_{j\in\mathbb{N}} f_j = u$. Each $f_j$ is of the form $f_j(x,y) = \sum_{k=0}^{N(j)} \alpha_k \mathbf{1}_{E_k}(x,y)$, where $\alpha_k \ge 0$ and the $E_k \in \mathcal{A} \otimes \mathcal{B}$, $0 \le k \le N(j)$, are disjoint. By Theorem 13.5, the fact that $\mathcal{M}(\mathcal{A} \otimes \mathcal{B})$ is a vector space and the linearity of the integral we conclude that
\[ x \mapsto f_j(x,y), \quad y \mapsto f_j(x,y), \quad x \mapsto \int_Y f_j(x,y)\,\nu(dy), \quad y \mapsto \int_X f_j(x,y)\,\mu(dx) \]
are measurable functions and (i), (ii) follow from the usual Beppo Levi argument since $\mathcal{M}(\mathcal{A})$ and $\mathcal{M}(\mathcal{B})$ are stable under increasing limits, cf. C8.9. Linearity of the integral and Theorem 13.5 also show
\[ \int_{X \times Y} f_j\,d(\mu \times \nu) = \int_Y \int_X f_j\,d\mu\,d\nu = \int_X \int_Y f_j\,d\nu\,d\mu \qquad \forall\, j \in \mathbb{N} \]
and (iii) follows from several applications of Beppo Levi’s theorem 9.6.
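13.9 Corollary (Fubini’s theorem) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. If at least one of the following three integrals is finite
\[ \int_{X \times Y} |u|\,d(\mu \times \nu), \qquad \int_Y \int_X |u(x,y)|\,\mu(dx)\,\nu(dy), \qquad \int_X \int_Y |u(x,y)|\,\nu(dy)\,\mu(dx), \]
then all three integrals are finite, $u \in \mathcal{L}^1(\mu \times \nu)$, and
(i) $x \mapsto u(x,y)$ is in $\mathcal{L}^1(\mu)$ for $\nu$-a.e. $y \in Y$;
(ii) $y \mapsto u(x,y)$ is in $\mathcal{L}^1(\nu)$ for $\mu$-a.e. $x \in X$;
(iii) $y \mapsto \int_X u(x,y)\,\mu(dx)$ is in $\mathcal{L}^1(\nu)$;
(iv) $x \mapsto \int_Y u(x,y)\,\nu(dy)$ is in $\mathcal{L}^1(\mu)$;
(v) $\displaystyle \int_{X \times Y} u\,d(\mu \times \nu) = \int_Y \int_X u(x,y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y u(x,y)\,\nu(dy)\,\mu(dx)$.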
Proof Tonelli’s theorem 13.8 shows that in $[0,\infty]$
\[ \int_{X \times Y} |u|\,d(\mu \times \nu) = \int_Y \int_X |u|\,d\mu\,d\nu = \int_X \int_Y |u|\,d\nu\,d\mu. \tag{13.5} \]
If one of the integrals is finite, all of them are finite and $u \in \mathcal{L}^1(\mu \times \nu)$ follows. Again by Tonelli’s theorem, $x \mapsto u^\pm(x,y)$ is $\mathcal{A}$-measurable and $y \mapsto \int u^\pm(x,y)\,\mu(dx)$ is $\mathcal{B}$-measurable. Since $u^\pm \le |u|$, (13.5) and C10.13 show that
\[ \int_X u^\pm(x,y)\,\mu(dx) \le \int_X |u(x,y)|\,\mu(dx) < \infty \quad \text{for } \nu\text{-a.e. } y \in Y \]
and
\[ \int_Y \int_X u^\pm(x,y)\,\mu(dx)\,\nu(dy) \le \int_Y \int_X |u(x,y)|\,\mu(dx)\,\nu(dy) < \infty. \]
This proves (i) and (iii); (ii) and (iv) are shown in a similar way. Finally, (v) follows for $u^+$ and $u^-$ from Theorem 13.8 and for $u = u^+ - u^-$ by linearity, since (i)–(iv) exclude the possibility of ‘$\infty - \infty$’.
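The integrability condition in Corollary 13.9 cannot be dropped. A standard illustration (not taken from the text) uses counting measure on $\mathbb{N} \times \mathbb{N}$, where iterated integrals are iterated sums: put $a_{jk} = 1$ if $k = j$, $a_{jk} = -1$ if $k = j + 1$, and $a_{jk} = 0$ otherwise. Every row and every column has at most two non-zero entries, so the inner sums below are exact; the function names and the cut-off `N` are assumptions made for the sketch.

```python
def a(j: int, k: int) -> int:
    """Entry a_{jk} for j, k >= 1: +1 on the diagonal, -1 just right of it."""
    if k == j:
        return 1
    if k == j + 1:
        return -1
    return 0

def row_sum(j: int) -> int:
    """Sum over k of a_{jk}; only k = j and k = j + 1 contribute."""
    return sum(a(j, k) for k in range(1, j + 2))

def col_sum(k: int) -> int:
    """Sum over j of a_{jk}; only j = k - 1 and j = k contribute."""
    return sum(a(j, k) for j in range(1, k + 1))

N = 1000
rows_then_cols = sum(row_sum(j) for j in range(1, N + 1))  # sum_j sum_k a_{jk}
cols_then_rows = sum(col_sum(k) for k in range(1, N + 1))  # sum_k sum_j a_{jk}
```

Both partial sums are independent of `N`, so the two iterated sums are $0$ and $1$. This is consistent with Corollary 13.9 because $\sum_{j,k} |a_{jk}| = \infty$, i.e. none of the three integrals of $|u|$ is finite.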
More on measurable functions
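There is an alternative way to introduce the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$. Recall that the coordinate projections
\[ \pi_j : X_1 \times X_2 \to X_j, \qquad (x_1, x_2) \mapsto x_j, \qquad j = 1, 2, \]
induce the $\sigma$-algebra $\sigma(\pi_1, \pi_2)$ on $X_1 \times X_2$ which is by Definition 7.5 the smallest $\sigma$-algebra such that both $\pi_1$ and $\pi_2$ are measurable maps.
13.10 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2$, and $(Z, \mathcal{C})$ be measurable spaces. Then
(i) $\mathcal{A}_1 \otimes \mathcal{A}_2 = \sigma(\pi_1, \pi_2)$;
(ii) $T : Z \to X_1 \times X_2$ is $\mathcal{C}/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable if, and only if, $\pi_j \circ T$ is $\mathcal{C}/\mathcal{A}_j$-measurable $(j = 1, 2)$;
(iii) if $S : X_1 \times X_2 \to Z$ is measurable, then $S(x_1, \bullet)$ and $S(\bullet, x_2)$ are $\mathcal{A}_2/\mathcal{C}$- resp. $\mathcal{A}_1/\mathcal{C}$-measurable for every $x_1 \in X_1$, resp. $x_2 \in X_2$.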
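Proof (i) Since $\pi_1^{-1}(x) = \{x\} \times X_2$, $\pi_2^{-1}(y) = X_1 \times \{y\}$ and $A_1 \times A_2 = (A_1 \times X_2) \cap (X_1 \times A_2)$, we have
\[ \sigma(\pi_1, \pi_2) \overset{7.5}{=} \sigma\big(\pi_1^{-1}(\mathcal{A}_1), \pi_2^{-1}(\mathcal{A}_2)\big) = \sigma\big(\{A_1 \times X_2,\ X_1 \times A_2 : A_j \in \mathcal{A}_j\}\big), \]
which shows that $\mathcal{A}_1 \times \mathcal{A}_2 \subset \sigma(\pi_1, \pi_2) \subset \mathcal{A}_1 \otimes \mathcal{A}_2$, hence $\sigma(\pi_1, \pi_2) = \mathcal{A}_1 \otimes \mathcal{A}_2$.
(ii) If $T : Z \to X_1 \times X_2$ is measurable, then so is $\pi_j \circ T$ by part (i) and T7.4. Conversely, if $\pi_j \circ T$, $j = 1, 2$, are measurable we find
\[
\begin{aligned}
T^{-1}(A_1 \times A_2) &= T^{-1}\big(\pi_1^{-1}(A_1) \cap \pi_2^{-1}(A_2)\big) \\
&= T^{-1}\big(\pi_1^{-1}(A_1)\big) \cap T^{-1}\big(\pi_2^{-1}(A_2)\big) \\
&= (\pi_1 \circ T)^{-1}(A_1) \cap (\pi_2 \circ T)^{-1}(A_2) \in \mathcal{C}.
\end{aligned}
\]
Since $\mathcal{A}_1 \times \mathcal{A}_2$ generates $\mathcal{A}_1 \otimes \mathcal{A}_2$, $T$ is measurable by L7.2.
(iii) Fix $x_1 \in X_1$ and consider $y \mapsto S(x_1, y)$. Then $S(x_1, \bullet) = S \circ i_{x_1}(\bullet)$, where $i_{x_1} : X_2 \to X_1 \times X_2$, $y \mapsto (x_1, y)$. By part (ii), $i_{x_1}$ is $\mathcal{A}_2/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable since the maps $\pi_j \circ i_{x_1}(x_2) = x_j$ are $\mathcal{A}_2/\mathcal{A}_j$-measurable $(j = 1, 2)$. The claim follows now from T7.4.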
Distribution functions
Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. For $u \in \mathcal{M}^+(\mathcal{A})$ the decreasing, left-continuous[✓] numerical function
\[ \mathbb{R} \ni t \mapsto \mu(\{u \ge t\}) \]
is called the distribution function of $u$ (under $\mu$).
The next theorem shows that Lebesgue integrals still represent the area between the graph of a function and the abscissa.
13.11 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $u : X \to [0,\infty)$ be $\mathcal{A}$-measurable. Then
\[ \int u\,d\mu = \int_{(0,\infty)} \mu(\{u \ge t\})\,\lambda^1(dt) \in [0,\infty]. \tag{13.6} \]
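Proof Consider the function $U(x,t) := (u(x), t)$ on $X \times [0,\infty)$. By Theorem 13.10(ii), $U$ is $\mathcal{A} \otimes \mathcal{B}[0,\infty)$-measurable, thus
\[ E := \{(x,t) : u(x) \ge t\} \in \mathcal{A} \otimes \mathcal{B}[0,\infty). \]
An application of Tonelli’s theorem 13.8 shows
\[
\begin{aligned}
\int u(x)\,\mu(dx) &= \iint \mathbf{1}_{(0, u(x)]}(t)\,\lambda^1(dt)\,\mu(dx) \\
&= \iint_{X \times (0,\infty)} \mathbf{1}_E(x,t)\,\lambda^1(dt)\,\mu(dx) \\
&= \iint_{(0,\infty) \times X} \mathbf{1}_E(x,t)\,\mu(dx)\,\lambda^1(dt) \\
&= \int_{(0,\infty)} \mu(\{u \ge t\})\,\lambda^1(dt).
\end{aligned}
\]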
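Formula (13.6) becomes a counting identity in the simplest discrete case. The sketch below assumes a five-point space with counting measure and hypothetical integer values of $u$ (both chosen only for illustration). Since $t \mapsto \mu(\{u \ge t\})$ is then constant on each interval $(k-1, k]$, the integral on the right-hand side reduces to a finite sum.

```python
# Hypothetical values u(x) on a five-point space X with counting measure.
u = [3, 0, 2, 5, 2]

def mu_superlevel(t: int) -> int:
    """mu({u >= t}) for counting measure."""
    return sum(1 for value in u if value >= t)

# Left-hand side of (13.6): the integral of u w.r.t. counting measure.
lhs = sum(u)

# Right-hand side: t -> mu({u >= t}) is constant on (k - 1, k], k = 1, 2, ...,
# so the Lebesgue integral over (0, infinity) is a finite sum of these values.
rhs = sum(mu_superlevel(k) for k in range(1, max(u) + 1))
```

Both sides count the same unit cells below the graph of $u$, once column by column and once row by row.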
If $\phi : [0,\infty) \to [0,\infty)$ is continuously differentiable, increasing and $\phi(0) = 0$, we even have in the setting of Theorem 13.11
\[
\begin{aligned}
\int \phi \circ u\,d\mu &= \int_{(0,\infty)} \mu(\{\phi(u) \ge t\})\,\lambda^1(dt) \\
&\overset{(*)}{=} \int_0^\infty \mu(\{\phi(u) \ge t\})\,dt \\
&\overset{t = \phi(s)}{=} \int_0^\infty \phi'(s)\,\mu(\{\phi(u) \ge \phi(s)\})\,ds \\
&= \int_0^\infty \phi'(s)\,\mu(\{u \ge s\})\,ds.
\end{aligned}
\]
The problem with this calculation is the step marked $(*)$ where we equate a Lebesgue integral with a Riemann integral. By Theorem 11.8(ii) we can do this if $t \mapsto \mu(\{\phi(u) \ge t\})$ is Lebesgue a.e. continuous and bounded. Boundedness is not a problem since we may consider $\mu(\{\phi(u) \ge t\}) \wedge N$, $N \in \mathbb{N}$, and let $N \to \infty$ using T9.6. For the a.e. continuity we need
13.12 Lemma Every monotone function $\phi : \mathbb{R} \to \mathbb{R}$ has at most countably many discontinuities and is, in particular, Lebesgue a.e. continuous.
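Proof Without loss of generality we may assume that $\phi$ increases. Therefore, the one-sided limits $\lim_{s \uparrow t} \phi(s) = \phi(t-) \le \phi(t+) = \lim_{s \downarrow t} \phi(s)$ exist in $\mathbb{R}$, so that $\phi$ can only have jump discontinuities where $\phi(t-) < \phi(t+)$. Define for all $\epsilon > 0$
\[ J^\epsilon := \{t \in \mathbb{R} : \Delta\phi(t) = \phi(t+) - \phi(t-) \ge \epsilon\}. \]
Since on every compact interval $[a,b]$ and for every $\epsilon > 0$
\[ 0 \le \phi(b) - \phi(a) = \frac{\phi(b) - \phi(a)}{\epsilon}\,\epsilon < \infty, \]
we can have at most $\big[\frac{\phi(b) - \phi(a)}{\epsilon}\big]$ jumps of size $\epsilon$ or larger in the interval $[a,b]$, that is $\#\big([a,b] \cap J^\epsilon\big) < \infty$. Therefore, the set of all discontinuities of $\phi$
\[ J := \{t \in \mathbb{R} : \Delta\phi(t) > 0\} = \bigcup_{j,k\in\mathbb{N}} [-j, j] \cap J^{1/k} \]
is a countable set, hence a Lebesgue null set.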
Since $t \mapsto \mu(\{\phi(u) \ge t\})$ is decreasing, we finally have
13.13 Corollary Let $(X, \mathcal{A}, \mu)$ be $\sigma$-finite and let $\phi : [0,\infty) \to [0,\infty)$ with $\phi(0) = 0$ be increasing and continuously differentiable. Then
\[ \int \phi \circ u\,d\mu = \int_0^\infty \phi'(s)\,\mu(\{u \ge s\})\,ds \tag{13.7} \]
holds for all $u \in \mathcal{M}^+(\mathcal{A})$; the right-hand side is an improper Riemann integral. Moreover, $\phi \circ u \in \mathcal{L}^1(\mu)$ if, and only if, this Riemann integral is finite.
In the important special case where $\phi(t) = t^p$, $p \ge 1$, (13.7) reads
\[ \|u\|_p^p = \int |u|^p\,d\mu = \int_0^\infty p\,s^{p-1}\,\mu(\{|u| \ge s\})\,ds. \tag{13.8} \]
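Formula (13.8) can be tested on an example where both sides are known. The sketch below assumes $u(x) = x$ on $(0,1)$ with Lebesgue measure (a choice made for illustration), so that $\mu(\{|u| \ge s\}) = \max(0, 1 - s)$ and $\|u\|_p^p = 1/(p+1)$; the right-hand side of (13.8) is approximated by a midpoint rule whose step count is an arbitrary assumption.

```python
def distribution(s: float) -> float:
    """lambda({x in (0,1) : x >= s}) for u(x) = x."""
    return max(0.0, 1.0 - s)

def norm_p_to_the_p(p: float, steps: int = 200_000) -> float:
    """Midpoint rule for the integral of p * s**(p-1) * distribution(s) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += p * s ** (p - 1) * distribution(s)
    return total * h

# Right-hand side of (13.8), numerically, against the exact value 1/(p+1).
approx = {p: norm_p_to_the_p(p) for p in (1.0, 2.0, 3.5)}
exact = {p: 1.0 / (p + 1.0) for p in approx}
```

The integration range stops at $s = 1$ because the distribution function vanishes beyond that point.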
Minkowski’s inequality for integrals
The following inequality is a generalization of Minkowski’s inequality C12.4 to double integrals. In some sense it is also a theorem on the change of the order of iterated integrals, but equality is only obtained if $p = 1$.
13.14 Theorem (Minkowski’s inequality for integrals) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
\[ \bigg(\int_X \Big(\int_Y |u(x,y)|\,\nu(dy)\Big)^p \mu(dx)\bigg)^{1/p} \le \int_Y \Big(\int_X |u(x,y)|^p\,\mu(dx)\Big)^{1/p} \nu(dy) \]
holds for all $p \in [1,\infty)$, with equality for $p = 1$.
Proof If $p = 1$, the assertion follows directly from Tonelli’s theorem 13.8. If $p > 1$ we set
\[ U_k(x) := \Big(\int_Y |u(x,y)|\,\nu(dy) \wedge k\Big)\,\mathbf{1}_{A_k}(x) \]
where $A_k \in \mathcal{A}$ is a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Without loss of generality we may assume that $U_k(x) > 0$ on a set of positive $\mu$-measure, otherwise the left-hand side of the above inequality would be 0 (using Beppo Levi’s theorem 9.6) and there would be nothing to prove. By Tonelli’s theorem and Hölder’s inequality T12.2 with $\frac1p + \frac1q = 1$ or $q = \frac{p}{p-1}$, we find
\[
\begin{aligned}
\int_X U_k^p(x)\,\mu(dx) &\le \int_X U_k^{p-1}(x) \Big(\int_Y |u(x,y)|\,\nu(dy)\Big)\,\mu(dx) \\
&= \int_Y \int_X U_k^{p-1}(x)\,|u(x,y)|\,\mu(dx)\,\nu(dy) \\
&\le \int_Y \Big(\int_X U_k^p(x)\,\mu(dx)\Big)^{1 - 1/p} \Big(\int_X |u(x,y)|^p\,\mu(dx)\Big)^{1/p} \nu(dy).
\end{aligned}
\]
The claim follows upon dividing both sides by $\big(\int_X U_k^p(x)\,\mu(dx)\big)^{1 - 1/p}$ and letting $k \to \infty$ with Beppo Levi’s theorem 9.6.
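When $\mu$ and $\nu$ are counting measures on finite sets, Theorem 13.14 is the triangle inequality for the $\ell^p$-norms of the columns of a matrix. The sketch below checks this on a hypothetical $2 \times 2$ matrix `u[x][y]` (values chosen only for illustration) for a few exponents.

```python
# Hypothetical matrix u[x][y]; rows index x in X, columns index y in Y.
u = [[1.0, -2.0], [3.0, 4.0]]

def lhs(p: float) -> float:
    """(sum_x (sum_y |u(x,y)|)**p)**(1/p): inner integral over Y first."""
    return sum(sum(abs(v) for v in row) ** p for row in u) ** (1.0 / p)

def rhs(p: float) -> float:
    """sum_y (sum_x |u(x,y)|**p)**(1/p): p-norm in x of each column, then summed."""
    num_cols = len(u[0])
    return sum(
        sum(abs(row[y]) ** p for row in u) ** (1.0 / p) for y in range(num_cols)
    )

exponents = (1.0, 1.5, 2.0, 3.0, 10.0)
gaps = [rhs(p) - lhs(p) for p in exponents]  # expected to be non-negative
```

For $p = 1$ the two sides coincide (both equal the sum of all $|u(x,y)|$), in line with the equality case of the theorem.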
Problems
13.1. Prove the rules (13.2) for Cartesian products.
13.2. Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two $\sigma$-finite measure spaces. Show that $A \times N$, where $A \in \mathcal{A}$ and $N \in \mathcal{B}$, $\nu(N) = 0$, is a $\mu \times \nu$-null set.
13.3. Denote by $\lambda$ Lebesgue measure on $(0,\infty)$. Prove that the following iterated integrals exist and that
\[ \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y\,\lambda(dx)\,\lambda(dy) = \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y\,\lambda(dy)\,\lambda(dx). \]
Does this imply that the double integral exists?
13.4. Denote by $\lambda$ Lebesgue measure on $(0,1)$. Show that the following iterated integrals exist, but yield different values:
\[ \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,\lambda(dx)\,\lambda(dy) \ne \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,\lambda(dy)\,\lambda(dx). \]
What does this tell about the double integral?
13.5. Denote by $\lambda$ Lebesgue measure on $(-1,1)$. Show that the iterated integrals exist, coincide,
\[ \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2}\,\lambda(dx)\,\lambda(dy) = \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2}\,\lambda(dy)\,\lambda(dx) \]
but that the double integral does not exist.
13.6. (i) Prove that $\int_{(0,\infty)} e^{-tx}\,\lambda(dt) = \frac{1}{x}$ for all $x > 0$.
(ii) Use (i) and Fubini’s theorem to show that the sine integral
\[ \lim_{n\to\infty} \int_{(0,n)} \frac{\sin x}{x}\,\lambda(dx) = \frac{\pi}{2}. \]
13.7. Let $\mu(A) := \#A$ be the counting measure and $\lambda$ be Lebesgue measure on the measurable space $([0,1], \mathcal{B}[0,1])$. Denote by $\Delta := \{(x,y) \in [0,1]^2 : x = y\}$ the diagonal in $[0,1]^2$. Check that
\[ \int_{[0,1]} \int_{[0,1]} \mathbf{1}_\Delta(x,y)\,\lambda(dx)\,\mu(dy) \ne \int_{[0,1]} \int_{[0,1]} \mathbf{1}_\Delta(x,y)\,\mu(dy)\,\lambda(dx). \]
Does this contradict Tonelli’s theorem?
13.8. (i) State Tonelli’s and Fubini’s theorems for spaces of sequences, i.e. for the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu)$ where $\mu := \sum_{j\in\mathbb{N}} \delta_j$, and obtain criteria when one can interchange two infinite summations.
(ii) Using similar considerations as in part (i) deduce the following.
Lemma Let $(A_j)_j$ be countably many (i.e. a finite or countably infinite number of) mutually disjoint sets whose union is $\mathbb{N}$, and let $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}$ be a sequence. Then
\[ \sum_{k\in\mathbb{N}} x_k = \sum_j \sum_{k \in A_j} x_k \]
in the sense that if either side converges absolutely, so does the other, in which case both sides are equal.
13.9. Let $u : \mathbb{R} \to [0,\infty]$ be a Borel measurable function. Denote by $S(u) := \{(x,y) : 0 \le y \le u(x)\}$ the set above the abscissa and below the graph $\Gamma(u) := \{(x, u(x)) : x \in \mathbb{R}\}$ of $u$.
(i) Show that $S(u) \in \mathcal{B}(\mathbb{R}^2)$.
(ii) Is it true that $\lambda^2(S(u)) = \int u\,d\lambda^1$?
(iii) Show that $\Gamma(u) \in \mathcal{B}(\mathbb{R}^2)$ and that $\lambda^2(\Gamma(u)) = 0$.
[Hint: (i) – use T8.8 to approximate $u$ by simple functions $f_j \uparrow u$. Thus $S(u) = \bigcup_j S(f_j)$ and $S(f_j) \in \mathcal{B}(\mathbb{R}^2)$ is easy to see; alternatively, use T13.10, set $U(x,y) := (u(x), y)$ and observe that $S(u) = U^{-1}(C)$ for the closed set $C := \{(x,y) : x \ge y\}$; (ii) – use Tonelli’s theorem; (iii) – use $\Gamma(u) \subset S(u) \setminus S((u - \epsilon)^+)$ or $\Gamma(u) = U^{-1}(\{(x,y) : x = y\})$; show first that $\lambda^2(\Gamma(u) \cap [-n,n]^2) = 0$ for every $n \in \mathbb{N}$ and observe that $\Gamma(u) \cap [-n,n]^2 = \Gamma\big((u\,\mathbf{1}_{[-n,n]}) \wedge n\big)$.]
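13.10. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $u \in \mathcal{M}^+(\mathcal{A})$ be a $[0,\infty]$-valued measurable function. Show that the set
\[ Y := \{y \in \mathbb{R} : \mu(\{x : u(x) = y\}) \ne 0\} \subset \mathbb{R} \]
is countable.
[Hint: assume that $u \in \mathcal{L}^1_+(\mu)$. Set $Y_{\epsilon,\eta} := \{y > \epsilon : \mu(\{u = y\}) > \eta\}$ and observe that for $t_1, \dots, t_N \in Y_{\epsilon,\eta}$ we have $N \epsilon \eta \le \sum_{j=1}^N t_j\,\mu(\{u = t_j\}) \le \int u\,d\mu$. Thus $Y_{\epsilon,\eta}$ is a finite set, and $Y = \bigcup_{k,n\in\mathbb{N}} Y_{\frac1n, \frac1k}$ is countable. If $u$ is not integrable, consider $(u \wedge m)\,\mathbf{1}_{A_m}$, $m \in \mathbb{N}$, where $A_m \uparrow X$ is an exhaustion.]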
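13.11. Completion (5). Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be any two measure spaces such that $\mathcal{A} \ne \mathcal{P}(X)$ and such that $\mathcal{B}$ contains non-empty null sets.
(i) Show that $\mu \times \nu$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is not complete, even if both $\mu$ and $\nu$ were complete.
(ii) Conclude from (i) that neither $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}), \lambda \times \lambda)$ nor the product of the completed spaces $(\mathbb{R}^2, \mathcal{B}^*(\mathbb{R}) \otimes \mathcal{B}^*(\mathbb{R}), \bar\lambda \times \bar\lambda)$ are complete.
[Hint: you may assume in (ii) that $\mathcal{B}(\mathbb{R}), \mathcal{B}^*(\mathbb{R}) \ne \mathcal{P}(\mathbb{R})$.]
13.12. Let $\mu$ be a bounded measure on the measure space $([0,\infty), \mathcal{B}[0,\infty))$.
(i) Show that $A \in \mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ if, and only if, $A = \bigcup_{j\in\mathbb{N}} B_j \times \{j\}$, where $(B_j)_{j\in\mathbb{N}} \subset \mathcal{B}[0,\infty)$.
(ii) Show that there exists a unique measure $\pi$ on $\mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ satisfying
\[ \pi(B \times \{n\}) = \int_B e^{-t}\,\frac{t^n}{n!}\,\mu(dt). \]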
13.13. Stieltjes measure (2). Stieltjes integrals. This continues Problem 7.9. Let $\mu$ and $\nu$ be two measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu((-n,n]), \nu((-n,n]) < \infty$ for all $n \in \mathbb{N}$, and denote by
\[
F(x) := \begin{cases} \mu((0,x]), & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -\mu((x,0]), & \text{if } x < 0 \end{cases}
\qquad\text{and}\qquad
G(x) := \begin{cases} \nu((0,x]), & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -\nu((x,0]), & \text{if } x < 0 \end{cases}
\]
the associated right-continuous distribution functions (in Problem 7.9 we considered left-continuous distribution functions). Moreover, set $\Delta F(x) = F(x) - F(x-)$ and $\Delta G(x) = G(x) - G(x-)$.
(i) Show that $F, G$ are increasing, right-continuous and that $\Delta F(x) = 0$ if, and only if, $\mu(\{x\}) = 0$. Moreover, $F$ and $\mu$ are in one-to-one correspondence.
(ii) Since measures and distribution functions are in one-to-one correspondence, it is customary to write $\int u\,d\mu = \int u\,dF$, etc.
If $a < b$ we set $B := \{(x,y) : a < x \le b,\ x \le y \le b\}$. Show that $B$ is measurable and that
\[ \mu \times \nu(B) = \int_{(a,b]} F(s)\,dG(s) - F(a)\big(G(b) - G(a)\big). \]
(iii) Integration by parts. Show that
F�b�G�b� − F�a�G�a�
=
∫
�a�b�
F�s� dG�s� +
∫
�a�b�
G�s−� dF�s�
=
∫
�a�b�
F�s−� dG�s� +
∫
�a�b�
G�s−� dF�s� + ∑
a 0 some �� ∈ Cc��n�
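such that $\|u - \phi_\epsilon\|_p \le \epsilon$.
By the lower triangle inequality for $\|\bullet\|_p$ and the translation invariance of $\lambda^n$ we find for any two $x, x' \in \mathbb{R}^n$
\[ \Big|\,\|u(x + \bullet) - u\|_p - \|u(x' + \bullet) - u\|_p\,\Big| \le \|u(x + \bullet) - u(x' + \bullet)\|_p \overset{(14.5)}{=} \|u(x - x' + \bullet) - u\|_p. \]
Using again the triangle inequality and translation invariance we get for every $R > 0$ and all $x, x'$ with $|x - x'| < R/2$
\[
\begin{aligned}
\|u(x - x' + \bullet) - u\|_p
&\le \big\|(u(x - x' + \bullet) - u)\,\mathbf{1}_{B_R(0)}\big\|_p + \big\|(u(x - x' + \bullet) - u)\,\mathbf{1}_{B_R^c(0)}\big\|_p \\
&\overset{[✓]}{\le} \big\|(u(x - x' + \bullet) - u)\,\mathbf{1}_{B_R(0)}\big\|_p + 2\Big(\int_{B_{R/2}^c(0)} |u|^p\,d\lambda^n\Big)^{1/p}.
\end{aligned}
\]
Since $u \in \mathcal{L}^p(\lambda^n)$, it follows from the monotone convergence theorem 11.1 that $\lim_{R\to\infty} \int_{B_R^c(0)} |u|^p\,d\lambda^n = 0$, so that we can achieve
\[ \Big(\int_{B_{R/2}^c(0)} |u|^p\,d\lambda^n\Big)^{1/p} \le \epsilon \qquad \forall\, R > R_\epsilon. \]
Since $\phi_\epsilon$ is continuous with compact support, it is uniformly continuous, which means that there is a $\delta = \delta_{\epsilon,R} > 0$ such that for all $y \in \mathbb{R}^n$, $|x| < \delta$, and any fixed $R > R_\epsilon$ we have $|\phi_\epsilon(x + y) - \phi_\epsilon(y)| \le \epsilon / \lambda^n(B_R(0))^{1/p}$.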
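Another application of the triangle inequality for $\|\bullet\|_p$ and translation invariance yields
\[
\begin{aligned}
\big\|(u(x - x' + \bullet) - u)\,\mathbf{1}_{B_R(0)}\big\|_p
&\le \big\|(u(x - x' + \bullet) - \phi_\epsilon(x - x' + \bullet))\,\mathbf{1}_{B_R(0)}\big\|_p + \big\|(u - \phi_\epsilon)\,\mathbf{1}_{B_R(0)}\big\|_p \\
&\qquad + \big\|(\phi_\epsilon(x - x' + \bullet) - \phi_\epsilon)\,\mathbf{1}_{B_R(0)}\big\|_p \\
&\le 2\,\|u - \phi_\epsilon\|_p + \Big(\int_{B_R(0)} \underbrace{\big|\phi_\epsilon(x - x' + y) - \phi_\epsilon(y)\big|^p}_{\le\, \epsilon^p/\lambda^n(B_R(0)) \text{ if } |x - x'| < \delta}\,\lambda^n(dy)\Big)^{1/p} \\
&\le 3\epsilon.
\end{aligned}
\]
Combining the above estimates, we get
\[ \|u(x + \bullet) - u(x' + \bullet)\|_p \le 5\epsilon \qquad \forall\, x, x' : |x - x'| < \delta\ (< R/2), \]
which is but uniform continuity.
(ii) We have for any $x, x' \in \mathbb{R}^n$
\[
\begin{aligned}
\big|u \star v(x) - u \star v(x')\big| &\le \int \big|v(y)u(x - y) - v(y)u(x' - y)\big|\,\lambda^n(dy) \\
&\le \|v\|_\infty\,\|u(x - \bullet) - u(x' - \bullet)\|_1 \\
&\overset{(14.4)}{=} \|v\|_\infty\,\|u(x + \bullet) - u(x' + \bullet)\|_1,
\end{aligned}
\]
and continuity follows from part (i). The boundedness of $u \star v$ is proved with a similar calculation.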
Problems
14.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $T : X \to X$ be a bijective measurable map whose inverse $T^{-1} : X \to X$ is again measurable. Show that for every $f \in \mathcal{M}^+(\mathcal{A})$ one has
\[ \int u\,dT(f\mu) = \int u \circ T\,f\,d\mu = \int u\,f \circ T^{-1}\,dT(\mu) = \int u\,d\big(f \circ T^{-1}\,T(\mu)\big). \]
14.2. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(Y, \mathcal{B})$ be a measurable space. Assume that $T : A \to B$, $A \in \mathcal{A}$, $B \in \mathcal{B}$, is an invertible measurable map. Show that
\[ T(\mu)|_B = T(\mu|_A) \]
with the restrictions $\mu|_A(\bullet) := \mu(A \cap \bullet)$ and $T(\mu)|_B := T(\mu)(B \cap \bullet)$.
14.3. Let $\mu$ be a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ and $x, y, z \in \mathbb{R}^n$. Find $\delta_x \star \delta_y$ and $\delta_z \star \mu$.
14.4. Let $\mu, \nu$ be two $\sigma$-finite measures on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Show that $\mu \star \nu$ has no atoms (cf. Problem 6.5) if $\mu$ has no atoms.
14.5. Let $P$ be a probability measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ and denote by $P^{\star n}$ the $n$-fold convolution product $P \star P \star \cdots \star P$. Show that
\[ \int |\omega|\,P^{\star n}(d\omega) \le n \int |\omega|\,P(d\omega); \]
if $\int |\omega|\,P(d\omega) < \infty$, then
\[ \int \omega\,P^{\star n}(d\omega) = n \int \omega\,P(d\omega). \]
14.6. Let $p : \mathbb{R} \to \mathbb{R}$ be a polynomial and $u \in C_c(\mathbb{R})$. Show that $u \star p$ exists and is again a polynomial.
14.7. Let $w : \mathbb{R} \to \mathbb{R}$ be an increasing (hence, measurable, by Problem 8.18) and bounded function. Show that for every $u \in \mathcal{L}^1(\lambda^1)$ the convolution $u \star w$ is again increasing, bounded and continuous.
14.8. Assume that $u \in C_c(\mathbb{R})$ and $w \in C^\infty(\mathbb{R})$. Show that $u \star w$ exists, is of class $C^\infty$ and satisfies $\partial^j(u \star w) = u \star (\partial^j w)$.
14.9. Young’s inequality. Adapt the proof of Theorem 14.6 and show that
\[ \|u \star w\|_r \le \|u\|_p \cdot \|w\|_q \]
for all $p, q, r \in [1,\infty)$, $u \in \mathcal{L}^p$, $w \in \mathcal{L}^q$ and $r^{-1} + 1 = p^{-1} + q^{-1}$.
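14.10. Friedrichs mollifiers. Let $\phi : \mathbb{R}^n \to \mathbb{R}^+$ be a $C^\infty$-function such that $\int \phi\,d\lambda^n = 1$ and $\operatorname{supp} \phi = \overline{B_1(0)}$. For $\epsilon > 0$ define the function $\phi_\epsilon(x) := \epsilon^{-n} \phi(x/\epsilon)$. The function $\phi_\epsilon \star u$ is called the Friedrichs mollifier of $u \in \mathcal{L}^p$, $1 \le p < \infty$.
(i) Show that $\phi(x) := \kappa \exp\big(1/(|x|^2 - 1)\big)\,\mathbf{1}_{B_1(0)}(x)$ has, for a suitable $\kappa > 0$, the properties mentioned above. Determine $\kappa$.
(ii) Show that $\phi_\epsilon \in C_c^\infty(\mathbb{R}^n)$, $\operatorname{supp} \phi_\epsilon = \overline{B_\epsilon(0)}$, and $\|\phi_\epsilon\|_1 = 1$.
(iii) Show that $\operatorname{supp} u \star \phi_\epsilon \subset \operatorname{supp} u + \operatorname{supp} \phi_\epsilon = \{y : \exists\, x \in \operatorname{supp} u : |x - y| \le \epsilon\}$.
(iv) Show that $\phi_\epsilon \star u$ is in $C^\infty \cap \mathcal{L}^p$ and
\[ \|\phi_\epsilon \star u\|_p \le \|u\|_p \qquad \forall\, \epsilon > 0. \]
(v) Show that $L^p$-$\lim_{\epsilon\to 0} \phi_\epsilon \star u = u$.
[Hint: split the region of integration as in the proof of Theorem 14.8 and use the uniform boundedness shown in (iv).]
14.11. Define $\phi : \mathbb{R} \to \mathbb{R}$ by $\phi(x) := (1 - \cos x)\,\mathbf{1}_{[0,2\pi)}(x)$ and let $u(x) := 1$, $v(x) := \phi'(x)$, and $w(x) := \int_{(-\infty,x)} \phi(t)\,dt$. Then
(i) $u \star v(x) = 0$ for all $x \in \mathbb{R}$;
(ii) $v \star w(x) = \phi \star \phi(x) > 0$ for all $x \in (0, 4\pi)$;
(iii) $(u \star v) \star w \equiv 0 \ne u \star (v \star w)$.
Does this contradict the associativity of the convolution which was used in Theorem 14.6?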
15
Integrals of images and Jacobi’s transformation rule1
The previous chapter dealt with image measures and, by their definition, with measures of pre-images of sets. Sometimes one needs to know the measure of the direct image of a set under $T : X \to Y$. If $T^{-1}$ exists and is measurable, we can apply the results of Chapter 14 to $S := T^{-1}$ and we are done. If, however, $T^{-1}$ is not measurable, the direct image $T(A)$ of a set $A \in \mathcal{A}$ need not be $\mathcal{A}'$-measurable; in particular, an expression of the type $\mu'(T(A))$ – here $\mu'$ is any measure on $(X', \mathcal{A}')$ – may not be well-defined, let alone a measure. Let us consider this problem in a very particular setting, where
\[ X \subset \mathbb{R}^n, \quad X' \subset \mathbb{R}^d \quad\text{and}\quad \mu, \mu' \text{ are Lebesgue measures } \lambda^n, \text{ resp. } \lambda^d. \]
We need some notation: if $\Phi : X \to \mathbb{R}^d$, we write $\Phi = (\Phi_1, \Phi_2, \dots, \Phi_d)$ for its components and we set for vectors $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and matrices $A = (a_{jk})_{j=1,\dots,n;\ k=1,\dots,d}$
\[ |x|_\infty := \max_{1 \le j \le n} |x_j|, \qquad |A|_\infty := \max_{\substack{1 \le j \le n \\ 1 \le k \le d}} |a_{jk}|. \tag{15.1} \]
A set $F \subset \mathbb{R}^n$ [$G \subset \mathbb{R}^n$] is called an $F_\sigma$-set [$G_\delta$-set] if it is the countable union of closed sets [countable intersection of open sets], i.e. if
\[ F = \bigcup_{\ell\in\mathbb{N}} C_\ell \qquad \Big[\,G = \bigcap_{\ell\in\mathbb{N}} U_\ell\,\Big] \tag{15.2} \]
for closed sets $C_\ell$ [open sets $U_\ell$]. Obviously, both $F_\sigma$- and $G_\delta$-sets are Borel sets; but, in general, neither are $F_\sigma$-sets closed nor are $G_\delta$-sets open.
1 The proofs in this chapter can be left out at first reading.
15.1 Theorem Let F ⊂ �n be an F
-set and � F → �d be an
-Hölder
continuous map, that is
� �x� − �y��� � L �x − y�
� ∀ x� y ∈ F (15.3)
with constant L and exponent
∈ �0� 1�. For every F
-set E ⊂ �n, �F ∩ E� is
an F
-set in �
d, hence Borel measurable. If d �
n, we have
�d� �F ∩ E�� � Ld �n�E�� (15.4)
Proof Since E� F are F
-sets, they have representations E =
⋃
j∈� �j and
F = ⋃j∈� Cj with closed sets �j � Cj ⊂ �n. Moreover,
E = ⋃
k∈�
E ∩ Bk�0� =
⋃
k�j∈�
�j ∩ Bk�0� =�
⋃
�∈�
K��
where �K���∈� is an enumeration of the family ��j ∩ Bk�0��j�k∈� of closed and
bounded, hence compact, sets. Thus Cj ∩ K� is a compact set, and since images
of compact sets under continuous maps are compact, we see that �Cj ∩ K�� is
compact and, in particular, closed. So,
�F ∩ E� (2.4)= ⋃
j��∈�
�Cj ∩ K��
is an F
-set.
Assume now that d �
n. If �n�E� = �, (15.4) is trivial and we will consider
only the case �n�E� < �. The proof of Carathéodory’s extension theorem 6.1 –
in particular (6.1) – for � = �n and the semi-ring � = � n of n-dimensional
half-open rectangles (cf. P6.4) shows that we can find for every � > 0 a sequence
�J �j �j∈� ⊂ � n with
E ⊂ ⋃
j∈�
J �j and
∑
j∈�
�n�J �j � � �
n�E� + �� (15.5)
Without loss of generality we can assume that all J �j are squares, i.e. have sides
of equal length s�j < s < 1, otherwise we could subdivide each J
�
j into finitely
many non-overlapping squares of this type.[�] So,
�d� �F ∩ E�� � �d
( ⋃
j∈�
�F ∩ J �j �
) 4.6
�
∑
j∈�
�d
(
�F ∩ J �j �
)
� (15.6)
which means that it is enough to check (15.4) for a square E = J of side-length
s < 1 and centre c ∈ �n. Because of (15.3),
�J� =
(
n×
k=1
[
ck − 12 s� ck + 12 s
)) ⊂
d×
k=1
[
k�c� − L2 s1/
� k�c� + L2 s1/
)
and (notice that �n�J� � 1 and d/
n � 1)
�d� �F ∩ J�� � �Ls1/
�d = Ld (�n�J�)d/�
n� � Ld �n�J��
From (15.5), (15.6) we conclude
�d� �F ∩ E�� � ∑
j∈�
Ld �n�J �j � � L
d
(
�n�E� + �)
and the claim follows upon letting � → 0.
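Theorem 15.1 can be improved if we use the completed Borel $\sigma$-algebra $\mathcal{B}^*(\mathbb{R}^n)$, cf. Problems 4.13, 6.2, 10.11 and 10.12. Recall that
\[ B^* \in \mathcal{B}^*(\mathbb{R}^n) \iff B^* = B \cup N \text{ for some } B \in \mathcal{B}(\mathbb{R}^n) \text{ and a subset } N \text{ of a Borel null set.} \]
The advantage of $\mathcal{B}^*(\mathbb{R}^n)$ over $\mathcal{B}(\mathbb{R}^n)$ is that $\alpha$-Hölder continuous maps $\Phi : \mathbb{R}^n \to \mathbb{R}^d$ map $\mathcal{B}^*(\mathbb{R}^n)$-measurable sets into $\mathcal{B}^*(\mathbb{R}^d)$-sets if $d \ge \alpha n$; this is not true for $\mathcal{B}(\mathbb{R}^n)$. To see this we need a few preparations.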
15.2 Lemma Let $B \in \mathcal{B}(\mathbb{R}^n)$ be a Borel set. Then there exists an $F_\sigma$-set $F$ and a $G_\delta$-set $G$ such that
\[ F \subset B \subset G \quad\text{and}\quad \lambda^n(F) = \lambda^n(B) = \lambda^n(G). \]
Proof The proof consists of three stages:
Step 1: Construction of the set $G$. If $\lambda^n(B) = \infty$, we take $G = \mathbb{R}^n$. If $\lambda^n(B) < \infty$, we find as in the proof of Theorem 15.1 (or as in Carathéodory's extension theorem 6.1, (6.1), with $\mu = \lambda^n$ and $\mathcal{S} = \mathcal{J}^n$) for every $k \in \mathbb{N}$ a sequence of half-open squares $(J_j^k)_{j\in\mathbb{N}} \subset \mathcal{J}^n$ of side-length $s_j$ such that
\[ B \subset \bigcup_{j\in\mathbb{N}} J_j^k \quad\text{and}\quad \sum_{j\in\mathbb{N}} \lambda^n(J_j^k) \le \lambda^n(B) + \frac1k. \]
We can now enlarge $J_j^k$ by moving the lower left corner by $\epsilon_j := (s_j^n + 2^{-j}/k)^{1/n} - s_j$ units 'to the left' in each coordinate direction. The new open square $\tilde J_j^k$ has volume
\[ \lambda^n(\tilde J_j^k) = \lambda^n(J_j^k) + \frac1k\, 2^{-j}, \]
[Figure: the half-open square $J_j^k$ of side-length $s_j$ inside the enlarged open square $\tilde J_j^k$, whose lower left corner is shifted by $\epsilon_j$.]
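and we see that the open sets $G^k := \bigcup_{j\in\mathbb{N}} \tilde J_j^k \supset B$ satisfy
\[ \lambda^n(G^k) \overset{4.6}{\le} \sum_{j\in\mathbb{N}} \lambda^n(\tilde J_j^k) = \sum_{j\in\mathbb{N}} \lambda^n(J_j^k) + \frac1k \le \Big( \lambda^n(B) + \frac1k \Big) + \frac1k. \]
Thus $G := \bigcap_{k\in\mathbb{N}} G^k$ is a $G_\delta$-set with $G \supset B$, and
\[ \lambda^n(B) \le \lambda^n(G) \overset{4.4}{=} \lim_{k\to\infty} \lambda^n(G^k) \le \lim_{k\to\infty} \Big( \lambda^n(B) + \frac2k \Big) = \lambda^n(B). \]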
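The enlargement used in Step 1 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample side-lengths are illustrative assumptions. It confirms that shifting the corner by $\epsilon_j$ increases the volume of the $j$-th square by exactly $2^{-j}/k$, so that the total excess stays below $1/k$.

```python
def enlargement(s: float, n: int, j: int, k: int) -> float:
    """Shift eps_j = (s^n + 2^-j / k)^(1/n) - s from Step 1 (s = side-length)."""
    return (s**n + 2.0 ** (-j) / k) ** (1.0 / n) - s


def enlarged_volume(s: float, n: int, j: int, k: int) -> float:
    """Volume of the enlarged n-dimensional square with side s + eps_j."""
    return (s + enlargement(s, n, j, k)) ** n


def total_excess(k: int, n: int, sides) -> float:
    """Sum of the volume increments over the squares j = 1, 2, ..."""
    return sum(
        enlarged_volume(s, n, j, k) - s**n
        for j, s in enumerate(sides, start=1)
    )
```

For instance, `total_excess(4, 3, [0.5] * 30)` adds up the increments $2^{-j}/4$ for $j = 1, \dots, 30$ and therefore stays just below $1/4$.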
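Step 2: Construction of the set $F$ if $\lambda^n(B) < \infty$. Denote by $\bar B$ the closure² of $B$. Since $\bar B \setminus B$ is a Borel set, we find as in step 1 open sets $U^k$ with
\[ \bar B \setminus B \subset U^k \quad\text{and}\quad \lambda^n(U^k) \le \lambda^n(\bar B \setminus B) + \frac1k. \tag{15.7} \]
Observe that
\[ B \subset (B \setminus U^k) \cup (U^k \cap B) \subset (\bar B \setminus U^k) \cup \big( U^k \setminus (\bar B \setminus B) \big), \]
so that by the subadditivity of measures
\[ \lambda^n(B) \le \lambda^n(\bar B \setminus U^k) + \lambda^n\big( U^k \setminus (\bar B \setminus B) \big) = \lambda^n(\bar B \setminus U^k) + \lambda^n(U^k) - \lambda^n(\bar B \setminus B) \overset{(15.7)}{\le} \lambda^n(\bar B \setminus U^k) + \frac1k. \]
By construction, $C_k := \bar B \setminus U^k \subset \bar B \setminus (\bar B \setminus B) = B$ is a closed set and $F := \bigcup_{k\in\mathbb{N}} C_k \subset B$ is an $F_\sigma$-set satisfying
\[ \lambda^n(B) - \frac1k \le \lambda^n(C_k) \le \lambda^n\Big( \bigcup_{j\in\mathbb{N}} C_j \Big) = \lambda^n(F) \le \lambda^n(B). \]
The claim follows as $k \to \infty$.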
Step 3: Construction of the set $F$ if $\lambda^n(B) = \infty$. Setting
\[ B_j := B \cap \big( B_j(0) \setminus B_{j-1}(0) \big), \quad j \in \mathbb{N}, \]
we get a disjoint partitioning of $B = \dot\bigcup_{j\in\mathbb{N}} B_j$ where each set $B_j$ is a Borel set with finite volume. Applying step 2 to each $B_j$, we find $F_\sigma$-sets $F_j \subset B_j$ with $\lambda^n(F_j) = \lambda^n(B_j)$, $j \in \mathbb{N}$. Since the $B_j$ are mutually disjoint, so are the $F_j$, and since $F := \bigcup_{j\in\mathbb{N}} F_j$ is again an $F_\sigma$-set (cf. Problem 15.1) we end up with $F \subset B$ and
\[ \lambda^n(F) = \sum_{j\in\mathbb{N}} \lambda^n(F_j) = \sum_{j\in\mathbb{N}} \lambda^n(B_j) = \lambda^n(B). \]
The proof of the lemma is now complete.
² i.e. the smallest closed set containing $B$, cf. Appendix B, Definition B.3(iii).
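15.3 Lemma Let $\Phi : \mathbb{R}^n \to \mathbb{R}^d$ be an $\alpha$-Hölder continuous map with $\alpha \in (0, 1]$ and $d \ge \alpha n$. If $N^*$ is a subset of a Borel null set $N \in \mathcal{B}(\mathbb{R}^n)$, then $\Phi(N^*)$ is a subset of a Borel null set $M \in \mathcal{B}(\mathbb{R}^d)$.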
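Proof Since $N^* \subset N \in \mathcal{B}(\mathbb{R}^n)$ where $\lambda^n(N) = 0$, we can repeat the argument of the proof of Theorem 15.1 to find for $k \in \mathbb{N}$ a covering of $N$ by half-open squares $J_j^k \in \mathcal{J}^n$ such that
\[ N \subset \bigcup_{j\in\mathbb{N}} J_j^k \quad\text{and}\quad \lambda^n\Big( \bigcup_{j\in\mathbb{N}} J_j^k \Big) \le \sum_{j\in\mathbb{N}} \lambda^n(J_j^k) \le \frac1k. \]
Since $\lambda^n(J_j^k) = \lambda^n(\bar J_j^k)$, where $\bar J_j^k$ is the closed square, we have also
\[ \lambda^n\Big( \bigcup_{j\in\mathbb{N}} \bar J_j^k \Big) \le \sum_{j\in\mathbb{N}} \lambda^n(\bar J_j^k) \le \frac1k. \]
Applying T15.1 to the $F_\sigma$-set $F^k := \bigcup_{j\in\mathbb{N}} \bar J_j^k$ shows that $\bigcap_{k\in\mathbb{N}} \Phi(F^k) \in \mathcal{B}(\mathbb{R}^d)$ as well as
\[ \lambda^d(\Phi(F^k)) \le L^d\, \lambda^n(F^k) \le \frac{L^d}{k}. \]
Since $\bigcap_{k\in\mathbb{N}} \Phi(F^k) \supset \Phi(N) \supset \Phi(N^*)$, we conclude
\[ \lambda^d\Big( \bigcap_{\ell\in\mathbb{N}} \Phi(F^\ell) \Big) \le \lambda^d(\Phi(F^k)) \le \frac{L^d}{k} \xrightarrow{k\to\infty} 0. \]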
Lemma 15.3 is just a special case of the following theorem which has already
been announced above.
15.4 Theorem Let $F \subset \mathbb{R}^n$ be an $F_\sigma$-set and let $\Phi : F \to \mathbb{R}^d$ be an $\alpha$-Hölder continuous map with exponent $\alpha \in (0, 1]$. If $d \ge \alpha n$, then $\Phi$ maps the completed Borel $\sigma$-algebra $F \cap \mathcal{B}^*(\mathbb{R}^n)$ into $\mathcal{B}^*(\mathbb{R}^d)$, and the inequality (15.4) holds for all $B \in \mathcal{B}^*(\mathbb{R}^n)$ with the completed Lebesgue measures³ $\bar\lambda^n$ and $\bar\lambda^d$.
³ See Problems 4.13, 6.2, 10.11, 10.12, 13.3 for the completion of measures and their properties.
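Proof Pick $B^* \in \mathcal{B}^*(\mathbb{R}^n)$ and write $B^* = B \cup N^*$ where $B \in \mathcal{B}(\mathbb{R}^n)$ and $N^*$ is a subset of a Borel null set $N \in \mathcal{B}(\mathbb{R}^n)$. According to L15.2 we have $B^* = E \cup M^* \cup N^* =: E \cup N^{**}$ where $E$ is an $F_\sigma$-set, $\lambda^n(E) = \lambda^n(B)$, and $M^*$, $N^{**} := N^* \cup M^*$ are subsets of Borel null sets. Thus
\[ \Phi(B^*) = \Phi(E \cup N^{**}) = \Phi(E) \cup \Phi(N^{**}) \]
and $\Phi(E)$ is an $F_\sigma$-set, see T15.1, and $\Phi(N^{**})$ is contained in a Borel null set $\subset \mathbb{R}^d$, see L15.3, hence $\Phi(B^*) \in \mathcal{B}^*(\mathbb{R}^d)$. Finally, by T15.1,
\[ \bar\lambda^d\big( \Phi(F \cap B^*) \big) = \bar\lambda^d\big( \Phi(F \cap (E \cup N^{**})) \big) \le \bar\lambda^d\big( \Phi(F \cap E) \cup \Phi(N^{**}) \big) = \lambda^d\big( \Phi(F \cap E) \big) \overset{15.1}{\le} L^d\, \lambda^n(E) = L^d\, \lambda^n(B) = L^d\, \bar\lambda^n(B^*). \]
Let us stress that both Hölder continuity of $\Phi$ and the condition $d \ge \alpha n$ are crucial for Theorem 15.4; one can find counterexamples if we have only $\Phi \in C(F, \mathbb{R}^d)$ or $d < \alpha n$.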
Jacobi’s transformation formula
One of the most interesting situations arises if $\Phi = (\Phi_1, \dots, \Phi_n) : \mathbb{R}^n_x \to \mathbb{R}^n_y$ (we write $\mathbb{R}^n_x$ if we want to indicate the generic variable in order to distinguish between the domain and range of $\Phi$) is a $C^1$-map with everywhere defined inverse $\Phi^{-1} : \mathbb{R}^n_y \to \mathbb{R}^n_x$ which is again a $C^1$-map. Such maps are called $C^1(\mathbb{R}^n, \mathbb{R}^n)$-diffeomorphisms. As usual, we write $D\Phi(x) := \big( \frac{\partial}{\partial x_j} \Phi_k(x) \big)_{j,k=1,\dots,n}$ for the Jacobian at the point $x \in \mathbb{R}^n_x$. By Taylor's theorem we find for all $x, x' \in K$ from a compact set $K \subset \mathbb{R}^n_x$
\[ |\Phi_k(x) - \Phi_k(x')| \le \sum_{j=1}^n \Big| \frac{\partial}{\partial x_j} \Phi_k(\xi) \Big| \cdot |x_j - x'_j| \le n \sup_{\xi\in K} |D\Phi(\xi)|_\infty \cdot |x - x'|_\infty, \tag{15.8} \]
i.e. $\Phi$ is locally Lipschitz (1-Hölder) continuous with Lipschitz constant $L = L_K = n \sup_{\xi\in K} |D\Phi(\xi)|_\infty$.
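The Lipschitz bound (15.8) can be tried out on a concrete map. The sketch below rests on several assumptions that are not in the text: the map `phi`, the compact set $K = [0,1]^2$, and the reading of $|D\Phi|_\infty$ as the largest absolute entry of the Jacobian are all illustrative choices. It samples random pairs of points in $K$ and reports the largest observed difference quotient, which should not exceed $L_K$.

```python
import math
import random


def phi(x: float, y: float):
    """Illustrative C^1-map on the plane (an assumption, not from the text)."""
    return (x + x**3, y + math.sin(x))


# On K = [0,1]^2 the Jacobian [[1 + 3x^2, 0], [cos x, 1]] has largest
# absolute entry 4 (attained at x = 1), so L_K = n * 4 with n = 2.
SUP_JACOBIAN = 4.0
LIPSCHITZ = 2 * SUP_JACOBIAN


def worst_ratio(samples: int = 10_000, seed: int = 0) -> float:
    """Largest observed max_k |phi_k(x) - phi_k(x')| / |x - x'|_inf on K."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x, y, x2, y2 = (rng.random() for _ in range(4))
        dist = max(abs(x - x2), abs(y - y2))
        if dist == 0.0:
            continue
        a, b = phi(x, y), phi(x2, y2)
        diff = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        worst = max(worst, diff / dist)
    return worst
```

A call such as `worst_ratio() <= LIPSCHITZ` should evaluate to `True`; the bound in (15.8) is not sharp, so the observed ratio is typically well below $L_K$.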
15.5 Theorem (Jacobi's transformation theorem) Let $\Phi : \mathbb{R}^n_x \to \mathbb{R}^n_y$ be a $C^1$-diffeomorphism. Then
\[ \lambda^n(\Phi(B)) = \int_B |\det D\Phi(x)|\, \lambda^n(dx) \tag{15.9} \]
holds for all Borel sets $B \in \mathcal{B}(\mathbb{R}^n_x)$.
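Formula (15.9) can be sanity-checked on an example where both sides are known. The following sketch assumes the illustrative diffeomorphism $\Phi(x, y) = (x + x^3,\, y + y^3)$ of the plane, which is not taken from the text; it maps $B = [0,1)^2$ onto $[0,2)^2$, so the left-hand side of (15.9) equals 4. The right-hand side is approximated with a midpoint rule.

```python
def det_jacobian(x: float, y: float) -> float:
    """|det D Phi(x, y)| for the assumed map Phi(x, y) = (x + x^3, y + y^3)."""
    return (1.0 + 3.0 * x * x) * (1.0 + 3.0 * y * y)


def integral_of_det(n: int = 200) -> float:
    """Midpoint-rule approximation of the integral of |det D Phi| over [0,1)^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += det_jacobian(x, y)
    return total * h * h


# Phi maps [0,1)^2 onto [0,2)^2, whose Lebesgue measure is 4.
IMAGE_AREA = 2.0 * 2.0
```

With the default grid, `integral_of_det()` agrees with `IMAGE_AREA` up to the discretization error of the midpoint rule.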
The proof of Theorem 15.5 is based on two auxiliary results.
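15.6 Lemma Let $\mu$ and $\nu$ be two measures on the measurable space $(X, \mathcal{A})$ and let $\mathcal{S}$ be a semi-ring such that $\sigma(\mathcal{S}) = \mathcal{A}$. If $\mu|_{\mathcal{S}} \le \nu|_{\mathcal{S}}$⁴ and if there is a sequence $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$ with $S_j \uparrow X$, then $\mu \le \nu$.
Proof It is clear from the properties of $\mu$ and $\nu$ that $\rho := \nu - \mu : \mathcal{S} \to [0, \infty)$ is a pre-measure. By T6.1, $\rho$ has a unique extension to a measure $\tilde\rho$ on $\mathcal{A}$ and
\[ \nu(S) = (\rho + \mu)(S) = (\tilde\rho + \mu)(S) \quad \forall\, S \in \mathcal{S}, \]
where $\tilde\rho + \mu$ is the unique extension of the pre-measure $(\rho + \mu)|_{\mathcal{S}}$ to a measure on $\mathcal{A}$. But the measures $\tilde\rho + \mu$ and $\nu$ satisfy
\[ \nu(S) = \rho(S) + \mu(S) = \tilde\rho(S) + \mu(S) = (\tilde\rho + \mu)(S) \quad \forall\, S \in \mathcal{S} \]
and we conclude from the uniqueness of the extensions that $\nu = \tilde\rho + \mu$ on $\mathcal{A}$, i.e. $\nu(A) - \mu(A) = \tilde\rho(A) \ge 0$ for all $A \in \mathcal{A}$.
Caution: Lemma 15.6 fails if $\mathcal{S}$ is not a semi-ring; see Problem 15.4.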
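15.7 Lemma For every $C^1$-diffeomorphism $\Phi : \mathbb{R}^n_x \to \mathbb{R}^n_y$ we have
\[ \lambda^n(\Phi(J)) \le \int_J |\det D\Phi(x)|\, \lambda^n(dx) \quad \forall\, J \in \mathcal{J}(\mathbb{R}^n_x). \]
Proof Let $J = [[a, b))$, $a, b \in \mathbb{R}^n_x$, and note that $\bar J = [[a, b]]$ is a compact set. Since $D(\Phi^{-1})$ is continuous, we find on the compact set $\Phi(\bar J)$
\[ L := \sup_{x\in\bar J} \big| (D\Phi(x))^{-1} \big|_\infty \le \sup_{y\in\Phi(\bar J)} \big| D(\Phi^{-1})(y) \big|_\infty, \tag{15.10} \]
where we used the inverse function theorem.[✓] Since $D\Phi$ is uniformly continuous on $\bar J$, we find for a given $\epsilon > 0$ some $\delta > 0$ such that
\[ \sup_{\substack{x, x' \in [[a,b]] \\ |x - x'|_\infty \le \delta}} |D\Phi(x) - D\Phi(x')|_\infty \le \frac{\epsilon}{L}. \tag{15.11} \]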
⁴ This is short for $\mu(S) \le \nu(S)$ for all $S \in \mathcal{S}$.
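Partition $J$ into $N$ disjoint half-open squares $J_1, \dots, J_N \in \mathcal{J}(\mathbb{R}^n_x)$ of the same side-length $< \delta$. Since $D\Phi$ and $\det D\Phi$ are continuous functions[✓], we can find for each $\ell = 1, 2, \dots, N$ a point
\[ x_\ell \in \bar J_\ell \quad\text{such that}\quad |\det D\Phi(x_\ell)| = \inf_{x\in\bar J_\ell} |\det D\Phi(x)|. \]
Set $T_\ell := (D\Phi)(x_\ell) \in \mathbb{R}^{n\times n}$ and observe that
\[ D(T_\ell^{-1} \circ \Phi)(x) = T_\ell^{-1} \circ (D\Phi)(x) = \mathrm{id}_n + T_\ell^{-1} \circ \big( D\Phi(x) - D\Phi(x_\ell) \big) \]
($\mathrm{id}_n$ is the identity matrix in $\mathbb{R}^{n\times n}$). The estimates (15.10), (15.11) show that
\[ \sup_{x\in\bar J_\ell} \big| D(T_\ell^{-1} \circ \Phi)(x) \big|_\infty \le 1 + L\, \frac{\epsilon}{L} = 1 + \epsilon \quad \forall\, 1 \le \ell \le N, \]
i.e. $T_\ell^{-1} \circ \Phi$ is Lipschitz (1-Hölder) continuous with constant $1 + \epsilon$, see (15.8). Therefore, the special transformation rule T6.10 for Lebesgue measure and T15.1 show
\[ \lambda^n\big( \Phi(J_\ell) \big) = \lambda^n\big( T_\ell \circ T_\ell^{-1} \circ \Phi(J_\ell) \big) = |\det T_\ell| \cdot \lambda^n\big( T_\ell^{-1} \circ \Phi(J_\ell) \big) \le |\det T_\ell|\, (1 + \epsilon)^n\, \lambda^n(J_\ell). \]
Since $J = \dot\bigcup_{\ell=1}^N J_\ell$ and $|\det T_\ell| \le |\det D\Phi(x)|$ for all $x \in J_\ell$, we get
\[ \lambda^n(\Phi(J)) \le \sum_{\ell=1}^N \lambda^n(\Phi(J_\ell)) \le (1 + \epsilon)^n \sum_{\ell=1}^N |\det T_\ell|\, \lambda^n(J_\ell) \le (1 + \epsilon)^n \sum_{\ell=1}^N \int_{J_\ell} |\det D\Phi(x)|\, \lambda^n(dx) = (1 + \epsilon)^n \int_J |\det D\Phi(x)|\, \lambda^n(dx), \]
and the proof is finished by letting $\epsilon \to 0$.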
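We can finally proceed to the proof of Theorem 15.5.
Proof (of Theorem 15.5) Set $\Psi := \Phi^{-1}$. Since $\Psi$ is continuous, $\mu := \lambda^n \circ \Phi = \Psi(\lambda^n)$ is a measure on $\mathcal{B}(\mathbb{R}^n_x)$, compare T7.6 and D7.7. The determinant $\det D\Phi$ is also continuous, thus $\nu(A) := \int_A |\det D\Phi(x)|\, \lambda^n(dx)$ defines a measure on $\mathcal{B}(\mathbb{R}^n_x)$, see L10.8. From Lemma 15.7 we know that $\mu(J) \le \nu(J) < \infty$ for all rectangles $J \in \mathcal{J}(\mathbb{R}^n_x)$, and Lemma 15.6 shows that $\mu \le \nu$ holds on the whole of $\mathcal{B}(\mathbb{R}^n_x)$, i.e.
\[ \lambda^n(\Phi(X)) \le \int_X |\det D\Phi(x)|\, \lambda^n(dx) \quad \forall\, X \in \mathcal{B}(\mathbb{R}^n_x). \tag{15.12} \]
This proves '$\le$' of (15.9). For the other direction our strategy is to apply Lemma 15.7 to the inverse function $\Psi = \Phi^{-1}$. If $X = \Phi^{-1}(Y)$, $Y \in \mathcal{B}(\mathbb{R}^n_y)$, (15.12) becomes
\[ \int 1_Y(y)\, \lambda^n(dy) = \lambda^n(Y) \le \int 1_{\Phi^{-1}(Y)}(x)\, |\det D\Phi(x)|\, \lambda^n(dx) = \int 1_Y(\Phi(x))\, |\det D\Phi(x)|\, \lambda^n(dx), \]
and with exactly the same arguments which we used to prove Theorem 14.1, this inequality is easily extended from indicator functions to all $u \in \mathcal{M}^+(\mathcal{B}(\mathbb{R}^n_y))$:
\[ \int_{\mathbb{R}^n_y} u(y)\, \lambda^n(dy) \le \int_{\mathbb{R}^n_x} u(\Phi(x))\, |\det D\Phi(x)|\, \lambda^n(dx). \tag{15.13} \]
Switching in (15.13) the rôles of $\mathbb{R}^n_x \leftrightarrow \mathbb{R}^n_y$, $x \leftrightarrow y$ and considering the $C^1$-diffeomorphism $\Psi : \mathbb{R}^n_y \to \mathbb{R}^n_x$ (instead of $\Phi$) and the measurable[✓] function $u(x) := 1_{\Phi(A)} \circ \Phi(x)\, |\det D\Phi(x)|$ for some $A \in \mathcal{B}(\mathbb{R}^n_x)$ yields
\begin{align*}
\int_{\mathbb{R}^n_x} 1_{\Phi(A)} \circ \Phi(x)\, |\det D\Phi(x)|\, \lambda^n(dx)
&\le \int_{\mathbb{R}^n_y} \big( 1_{\Phi(A)} \circ \Phi\; |\det D\Phi| \big) \circ \Phi^{-1}(y)\, |\det D(\Phi^{-1})(y)|\, \lambda^n(dy) \\
&= \int_{\mathbb{R}^n_y} 1_{\Phi(A)}(y) \cdot |\det (D\Phi) \circ \Phi^{-1}(y)| \cdot |\det D(\Phi^{-1})(y)|\, \lambda^n(dy) \\
&= \int_{\mathbb{R}^n_y} 1_{\Phi(A)}(y) \cdot \big| \det \big[ (D\Phi) \circ \Phi^{-1}(y) \cdot D(\Phi^{-1})(y) \big] \big|\, \lambda^n(dy) \\
&= \int_{\mathbb{R}^n_y} 1_{\Phi(A)}(y)\, \lambda^n(dy) = \lambda^n(\Phi(A)),
\end{align*}
where the matrix product in square brackets is the identity because $\mathrm{id}_n = D(\mathrm{id}_n) = D(\Phi \circ \Phi^{-1}) = (D\Phi) \circ \Phi^{-1} \cdot D(\Phi^{-1})$. This proves that for all $A \in \mathcal{B}(\mathbb{R}^n_x)$
\[ \int_{\mathbb{R}^n_x} 1_A(x)\, |\det D\Phi(x)|\, \lambda^n(dx) = \int_{\mathbb{R}^n_x} 1_{\Phi(A)} \circ \Phi(x)\, |\det D\Phi(x)|\, \lambda^n(dx) \le \lambda^n(\Phi(A)), \]
and, together with the converse inequality (15.12), the theorem follows.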
If $X \subset \mathbb{R}^n_x$, $Y \subset \mathbb{R}^n_y$ are open sets and $\Phi : X \to Y$ is a $C^1$-diffeomorphism, we still can apply Theorem 15.5 to $A = \Phi^{-1}(B)$, $A \in X \cap \mathcal{B}(\mathbb{R}^n_x)$, $B \in Y \cap \mathcal{B}(\mathbb{R}^n_y)$ to get
\[ \lambda^n\big|_Y = \Phi(\nu)\big|_Y = \Phi\big( \nu\big|_X \big), \qquad \nu(\bullet) := \int_{\bullet\, \cap X} |\det D\Phi(x)|\, \lambda^n(dx), \tag{15.14} \]
i.e. Theorem 14.1 yields the following important result.
15.8 Corollary (General transformation theorem) Let $X, Y \subset \mathbb{R}^n$ be open sets and $\Phi : X \to Y$ be a $C^1$-diffeomorphism. A function $u : Y \to \bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$ if, and only if, the function $u \circ \Phi \cdot |\det D\Phi| : X \to \bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$. In this case
\[ \int_Y u(y)\, \lambda^n(dy) = \int_X u(\Phi(x))\, |\det D\Phi(x)|\, \lambda^n(dx). \tag{15.15} \]
For many applications we need a somewhat reinforced version of C15.8 since $\Phi$ is often only almost everywhere a diffeomorphism. The following simple generalization takes care of that. Recall that $\bar\lambda^n$ is the completed Lebesgue measure, cf. Problems 4.13, 6.2, 10.11, 10.12, 13.11.
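15.9 Corollary Let $\Phi : X \to \mathbb{R}^n_y$ be a $C^1$-map on a measurable set $X \in \mathcal{B}^*(\mathbb{R}^n_x)$ whose open interior is denoted by $X^\circ$. If $X \setminus X^\circ$ is a $\bar\lambda^n$-null set⁵ and $\Phi|_{X^\circ}$ is a $C^1$-diffeomorphism onto $\Phi(X^\circ)$, then
\[ \int_{\Phi(X)} u(y)\, \bar\lambda^n(dy) = \int_X u \circ \Phi(x)\, |\det D\Phi(x)|\, \bar\lambda^n(dx) \tag{15.16} \]
holds for all $\mathcal{B}^*$-measurable positive functions $u : \Phi(X) \to [0, \infty]$. Moreover, $u : \Phi(X) \to \bar{\mathbb{R}}$ is $\bar\lambda^n$ integrable if, and only if, $u \circ \Phi \cdot |\det D\Phi| : X \to \bar{\mathbb{R}}$ is $\bar\lambda^n$ integrable; in this case (15.16) remains valid.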
Proof The argument proving C15.8 remains literally valid for $\bar\lambda^n$, i.e. the difficulty of C15.9 is not the completion of the measure but the fact that $\Phi$ is only almost everywhere a diffeomorphism.
Since $\bar\lambda^n(X \setminus X^\circ) = 0$, we get $\Phi(X) \setminus \Phi(X^\circ) \subset \Phi(X \setminus X^\circ)$, cf. Chapter 2, which is again a $\bar\lambda^n$-null set by Lemma 15.3. In view of C10.10 we can alter $\mathcal{L}^1$-functions on null sets, which means that the equality
\[ \int_{\Phi(X^\circ)} u\, d\bar\lambda^n = \int \big( 1_{\Phi(X^\circ)} \cdot u \big) \circ \Phi \cdot |\det D\Phi|\, d\bar\lambda^n \]
from C15.8 immediately implies (15.16).
⁵ i.e. a subset of a Borel null set.
15.10 Remark Formulae (15.9) and (15.15) have the following interesting interpretation in connection with the Radon–Nikodým theorem 19.2 and Lebesgue's differentiation theorem for measures T19.20, in particular C19.21:
\[ \frac{d(\lambda^n \circ \Phi)}{d\lambda^n}(x) = |\det D\Phi(x)| = \lim_{r\to 0} \frac{\lambda^n(\Phi(B_r(x)))}{\lambda^n(B_r(x))}. \]
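In dimension one the limit in Remark 15.10 can be computed by hand. The sketch below assumes the illustrative diffeomorphism $\Phi(x) = x + x^3$ of the real line (not from the text). Since $\Phi$ is increasing, $\Phi(B_r(x))$ is the interval $(\Phi(x - r), \Phi(x + r))$, and the ratio of lengths equals $1 + 3x^2 + r^2$, which tends to $\Phi'(x)$ as $r \to 0$.

```python
def phi(x: float) -> float:
    """Assumed increasing C^1-diffeomorphism of the real line."""
    return x + x**3


def dphi(x: float) -> float:
    """Derivative of phi, i.e. |det D Phi(x)| in dimension one."""
    return 1.0 + 3.0 * x**2


def volume_ratio(x: float, r: float) -> float:
    """Length of phi(B_r(x)) divided by the length 2r of B_r(x)."""
    return (phi(x + r) - phi(x - r)) / (2.0 * r)
```

For example, `volume_ratio(1.0, 1e-3)` differs from `dphi(1.0)` only by about $r^2 = 10^{-6}$.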
Spherical coordinates and the volume of the unit ball
Some of the most interesting applications of Corollaries 15.8 and 15.9 are coor-
dinate changes.
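15.11 Example (Planar polar coordinates) Consider the map
\[ P : (0, \infty) \times (0, 2\pi) \to \mathbb{R}^2 \setminus \big( [0, \infty) \times \{0\} \big), \qquad P(r, \theta) := (r\cos\theta,\; r\sin\theta), \]
which introduces polar coordinates $(r, \theta)$ in $\mathbb{R}^2$. It is not hard to see that $P$ is bijective and even a $C^1$-diffeomorphism. The determinant of the Jacobian is given by
\[ \det\Big( \frac{\partial P(r, \theta)}{\partial (r, \theta)} \Big) = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r\cos^2\theta + r\sin^2\theta = r. \]
Since $[0, \infty) \times \{0\}$ is a $\lambda^2$-null set, we can apply Corollary 15.8 (or 15.9) and find for every $u : \mathbb{R}^2 \to \mathbb{R}$, $u \in \mathcal{L}^1(\mathbb{R}^2, \lambda^2)$
\[ \int_{\mathbb{R}^2} u(x, y)\, d\lambda^2(x, y) = \int_{(0,\infty)\times(0,2\pi)} r\, u(r\cos\theta, r\sin\theta)\, d\lambda^2(r, \theta) = \int_{(0,\infty)} \int_{(0,2\pi)} r\, u(r\cos\theta, r\sin\theta)\, d\lambda^1(\theta)\, d\lambda^1(r), \]
where we used Fubini's theorem 13.9 for the last equality. This shows, in particular, that
\[ u \in \mathcal{L}^1(\mathbb{R}^2) \iff (r, \theta) \mapsto r\, u(r\cos\theta, r\sin\theta) \in \mathcal{L}^1\big( (0, \infty) \times (0, 2\pi) \big). \]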
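The polar-coordinate formula of Example 15.11 can be compared numerically with the Cartesian integral. The sketch below uses the illustrative integrand $u(x, y) = x^2 e^{-(x^2+y^2)}$, whose integral over the plane is $\pi/2$; the truncation radius `R` and the grid sizes are assumptions chosen so that the neglected tails are negligible.

```python
import math


def u(x: float, y: float) -> float:
    """Assumed test integrand; its integral over the plane is pi / 2."""
    return x * x * math.exp(-(x * x + y * y))


def cartesian_integral(R: float = 6.0, n: int = 300) -> float:
    """Midpoint rule for the integral of u over the square [-R, R]^2."""
    h = 2.0 * R / n
    total = 0.0
    for i in range(n):
        x = -R + (i + 0.5) * h
        for j in range(n):
            y = -R + (j + 0.5) * h
            total += u(x, y)
    return total * h * h


def polar_integral(R: float = 6.0, n_r: int = 300, n_theta: int = 300) -> float:
    """Midpoint rule for the integral of r * u(r cos t, r sin t) over (0,R) x (0,2 pi)."""
    hr = R / n_r
    ht = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_theta):
            t = (j + 0.5) * ht
            total += r * u(r * math.cos(t), r * math.sin(t))
    return total * hr * ht
```

Both `cartesian_integral()` and `polar_integral()` should agree with `math.pi / 2` up to a small discretization error; the extra factor `r` in the polar version is the Jacobian determinant computed above.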
A simple but quite interesting application of planar polar coordinates is the following formula which plays a central rôle in probability theory: this is where the norming factor $\frac{1}{\sqrt{2\pi}}$ for the Gaussian distribution comes from.
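15.12 Example We have
\[ \int_{\mathbb{R}} e^{-x^2}\, d\lambda^1(x) = \sqrt{\pi}. \tag{15.17} \]
Proof: We use the following trick: by Tonelli's theorem 13.8
\[ \Big( \int_{\mathbb{R}} e^{-x^2}\, d\lambda^1(x) \Big)^2 = \int_{\mathbb{R}} \int_{\mathbb{R}} e^{-x^2} e^{-y^2}\, d\lambda^1(x)\, d\lambda^1(y) = \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\, d\lambda^2(x, y) = \int_{(0,\infty)} \int_{(0,2\pi)} r\, e^{-r^2}\, d\lambda^1(r)\, d\lambda^1(\theta). \]
Since $r e^{-r^2}$ is positive and improperly Riemann integrable[✓], we know that Lebesgue and Riemann integrals coincide (cf. 11.8, 11.18), and therefore
\[ \Big( \int_{\mathbb{R}} e^{-x^2}\, d\lambda^1(x) \Big)^2 = \lambda^1(0, 2\pi) \int_0^\infty r\, e^{-r^2}\, dr = 2\pi \Big[ -\tfrac12 e^{-r^2} \Big]_0^\infty = \pi. \]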
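Both sides of the computation in Example 15.12 can be approximated numerically. In the sketch below the truncation bound `R` and the number of grid points `n` are assumptions; the tails of $e^{-x^2}$ beyond $|x| = 8$ are far below floating-point precision.

```python
import math


def gaussian_integral(R: float = 8.0, n: int = 4000) -> float:
    """Midpoint rule for the integral of exp(-x^2) over [-R, R]."""
    h = 2.0 * R / n
    return h * sum(math.exp(-((-R + (i + 0.5) * h) ** 2)) for i in range(n))


def radial_integral(R: float = 8.0, n: int = 4000) -> float:
    """2 pi times the midpoint rule for the integral of r * exp(-r^2) over (0, R)."""
    h = R / n
    midpoints = ((i + 0.5) * h for i in range(n))
    return 2.0 * math.pi * h * sum(r * math.exp(-r * r) for r in midpoints)
```

The square of `gaussian_integral()` and the value of `radial_integral()` should both be close to `math.pi`, mirroring the two ends of the chain of equalities above.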
Polar coordinates also exist in higher dimensions but, unfortunately, the formulae become quite messy. The idea here is that we parametrize $\mathbb{R}^n$ by the radius $r \in [0, \infty)$, and $n - 1$ angles $\theta \in [0, 2\pi)$ and $\omega \in (-\pi/2, \pi/2)^{n-2}$, so that $x = P(r, \theta, \omega)$. The Jacobian is now of the form $r^{n-1} J(\theta, \omega)$ and, if we denote by $v = u \circ P$ the function $u$ expressed in polar coordinates, the transformation formula gives
[Figure: the angles $\theta$ and $\omega$.]
\[ \int_{\mathbb{R}^n} u\, d\lambda^n = \iiint_{(0,\infty)\times(0,2\pi)\times(-\pi/2,\pi/2)^{n-2}} r^{n-1}\, v(r, \theta, \omega)\, |\det J(\theta, \omega)|\, d\lambda^1(r)\, d\lambda^1(\theta)\, d\lambda^{n-2}(\omega). \]
We will not give further details but settle for the slightly simpler case of spherical coordinates which will lead to a similar formula. Let $S^{n-1} := \{x \in \mathbb{R}^n : |x|^2 = 1\}$ be the unit sphere of $\mathbb{R}^n$ ($|x|^2 = x_1^2 + \dots + x_n^2$ is the Euclidean norm) and set
\[ \Phi : \mathbb{R}^n \setminus \{0\} \to (0, \infty) \times S^{n-1}, \qquad x \mapsto (|x|, \vartheta(x)), \]
where $\vartheta(x) := x/|x| \in S^{n-1}$ is the directional unit vector for $x$. Obviously, $\Phi$ is bijective, differentiable and has a differentiable inverse $\Phi^{-1}(r, s) = r \cdot s$.
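15.13 Theorem On $\mathcal{B}(S^{n-1}) = S^{n-1} \cap \mathcal{B}(\mathbb{R}^n)$ there exists a measure $\sigma^{n-1}$ which is invariant under rotations and satisfies
\[ \int_{\mathbb{R}^n} u\, d\lambda^n = \iint_{(0,\infty)\times S^{n-1}} r^{n-1}\, u(rs)\, \lambda^1(dr)\, \sigma^{n-1}(ds) \tag{15.18} \]
for all $u \in \mathcal{L}^1(\mathbb{R}^n)$. In other words, $\Phi(\lambda^n) = \mu \times \sigma^{n-1}$ where $\mu(dr) = r^{n-1}\, 1_{(0,\infty)}(r)\, \lambda^1(dr)$; in particular
\[ u \in \mathcal{L}^1(\mathbb{R}^n, \lambda^n) \iff r^{n-1}\, u(rs) \in \mathcal{L}^1\big( (0, \infty) \times S^{n-1},\; \lambda^1 \times \sigma^{n-1} \big). \]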
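Proof We define $\sigma^{n-1}$ by
\[ \sigma^{n-1}(A) := n\, \lambda^n\big( \vartheta^{-1}(A) \cap B_1(0) \big) \quad \forall\, A \in \mathcal{B}(S^{n-1}), \]
which is an image measure, hence a measure, cf. T7.6. Since $\vartheta^{-1}$ and $\lambda^n$ are invariant w.r.t. rotations around the origin, see T7.9, it is obvious that $\sigma^{n-1}$ inherits this property, too.
Both $\Phi$ and $\Phi^{-1}$ are continuous, hence measurable. Therefore,
\[ \Phi^{-1}\big( \mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1}) \big) \subset \mathcal{B}(\mathbb{R}^n) \quad\text{and}\quad \Phi\big( \mathcal{B}(\mathbb{R}^n) \big) \subset \mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1}), \]
which shows that $\mathcal{B}(\mathbb{R}^n) = \Phi^{-1}\big( \mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1}) \big)$. To see (15.18), fix $A \in \mathcal{B}(S^{n-1})$ and consider first the set $B := \{x \in \mathbb{R}^n : |x| \in [a, b),\; \vartheta(x) \in A\} = \vartheta^{-1}(A) \cap \{x : a \le |x| < b\}$, which is clearly a Borel set of $\mathbb{R}^n$. Thus
\begin{align*}
\lambda^n(B) &= \lambda^n\big( \vartheta^{-1}(A) \cap \{x : a \le |x| < b\} \big) \\
&= \lambda^n\big( \vartheta^{-1}(A) \cap B_b(0) \big) - \lambda^n\big( \vartheta^{-1}(A) \cap B_a(0) \big) \\
&= b^n\, \lambda^n\big( \vartheta^{-1}(A) \cap B_1(0) \big) - a^n\, \lambda^n\big( \vartheta^{-1}(A) \cap B_1(0) \big) \\
&= (b^n - a^n)\, \lambda^n\big( \vartheta^{-1}(A) \cap B_1(0) \big),
\end{align*}
where we used that $\lambda^n(a \cdot B) = a^n \lambda^n(B)$, cf. T7.10 or Problems 5.8, 7.7, and that $\vartheta^{-1}$ is invariant under dilations. This shows
\[ \lambda^n(B) = \frac1n\, (b^n - a^n)\, \sigma^{n-1}(A) = \int_{[a,b)} r^{n-1}\, \sigma^{n-1}(A)\, \lambda^1(dr) = \mu \times \sigma^{n-1}\big( [a, b) \times A \big). \]
Since the family $\{[a, b) \times A : a < b,\; A \in \mathcal{B}(S^{n-1})\}$ generates $\mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1})$, see Lemma 13.3, and satisfies the conditions of the uniqueness theorem 5.7, the above relation extends to all sets $B' \in \mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1})$. Since $\mathcal{B}(\mathbb{R}^n) = \Phi^{-1}\big( \mathcal{B}(0,\infty) \otimes \mathcal{B}(S^{n-1}) \big)$, we have $B' = \Phi(B)$ for some $B \in \mathcal{B}(\mathbb{R}^n)$, so that
\[ \lambda^n(B) = \lambda^n\big( \Phi^{-1} \circ \Phi(B) \big) = \lambda^n\big( \Phi^{-1}(B') \big) = \mu \times \sigma^{n-1}(B'). \]
All other assertions follow now from Theorem 14.1 on image integrals and Fubini's theorem 13.9.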
Let us note the particularly interesting case where $u(x) = f(|x|)$ is rotationally invariant.
15.14 Corollary If $u(x) = f(|x|)$ is a rotationally invariant function, then $u \in \mathcal{L}^1(\mathbb{R}^n, \lambda^n)$ if, and only if, $r \mapsto r^{n-1} f(r) \in \mathcal{L}^1((0, \infty), \lambda^1)$. In this case
\[ \int_{\mathbb{R}^n} f(|x|)\, \lambda^n(dx) = n\, \omega_n \int_{(0,\infty)} r^{n-1} f(r)\, \lambda^1(dr), \]
where $\omega_n = \lambda^n(B_1(0))$ denotes the volume of the unit ball in $\mathbb{R}^n$. In particular, we get for the functions $f_\alpha(x) := |x|^\alpha$, $\alpha \in \mathbb{R}$,
\[ f_\alpha \in \mathcal{L}^1(B_1(0) \setminus \{0\}) \iff \alpha > -n, \qquad f_\alpha \in \mathcal{L}^1(\mathbb{R}^n \setminus B_1(0)) \iff \alpha < -n. \]
Proof The integral formula follows from (15.18) where the constant $n\, \omega_n := \sigma^{n-1}(S^{n-1})$.⁶ That $\omega_n$ must be the volume of $B_1(0)$ is immediately clear if we choose $u(x) = 1_{B_1(0)}(x)$. The integrability of $f_\alpha$ follows now from Example 11.12.
Let us finally determine $\omega_n$, the volume of the unit ball in $\mathbb{R}^n$. For this we use the same method which we employed in Example 15.12:
\[ (\sqrt{\pi})^n \overset{(15.17)}{=} \Big( \int e^{-t^2}\, \lambda^1(dt) \Big)^n = \int \cdots \int e^{-(x_1^2 + \dots + x_n^2)}\, \lambda^1(dx_1) \cdots \lambda^1(dx_n) \overset{15.14}{=} n\, \omega_n \int_{(0,\infty)} r^{n-1} e^{-r^2}\, \lambda^1(dr). \]
Since $r^{n-1} e^{-r^2}$ is positive and improperly Riemann integrable[✓], Riemann and Lebesgue integrals coincide (use 11.8, 11.18), and we find after a change of variables according to $s = r^2$
\[ (\sqrt{\pi})^n = n\, \omega_n \int_0^\infty r^{n-1} e^{-r^2}\, dr = \omega_n\, \frac{n}{2} \int_0^\infty s^{n/2 - 1} e^{-s}\, ds = \omega_n\, \frac{n}{2}\, \Gamma\big( \tfrac{n}{2} \big), \]
see Example 11.14. Since $\frac{n}{2}\, \Gamma(\frac{n}{2}) = \Gamma(\frac{n}{2} + 1)$, we have finally established
15.15 Corollary $\omega_n = \lambda^n(B_1(0)) = \dfrac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}$.
⁶ This is, actually, the surface area of the unit ball $B_1(0)$ in $\mathbb{R}^n$.
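The closed formula of Corollary 15.15 is easy to evaluate. The following minimal sketch uses Python's standard `math.gamma`; the function names are illustrative. It also returns the constant $n\,\omega_n$ from Corollary 15.14, the surface area of the unit sphere.

```python
import math


def unit_ball_volume(n: int) -> float:
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def unit_sphere_area(n: int) -> float:
    """n * omega_n, the total mass of the surface measure on S^(n-1)."""
    return n * unit_ball_volume(n)
```

The familiar low-dimensional values are recovered: the interval $(-1, 1)$ has length 2, the unit disc has area $\pi$, and the unit ball in $\mathbb{R}^3$ has volume $4\pi/3$ and surface area $4\pi$.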
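Continuous functions are dense in $\mathcal{L}^p(\mathbb{R}^n)$
We will now establish a result that is closely related to Lemma 15.2: we show that the continuous functions with compact support $C_c(\mathbb{R}^n)$ are dense in the space of Lebesgue $p$-integrable functions $\mathcal{L}^p(\mathbb{R}^n)$, $1 \le p < \infty$, that is, if $u \in \mathcal{L}^p(\mathbb{R}^n)$, then
\[ \forall\, \epsilon > 0 \;\; \exists\, \phi = \phi_{\epsilon,u} \in C_c(\mathbb{R}^n) \;:\; \|u - \phi\|_p \le \epsilon. \]
Since every compact set $K \subset \mathbb{R}^n$ is bounded, we find for some sufficiently large $R > 0$ that $K \subset [-R, R]^n$, hence $\lambda^n(K) \le (2R)^n$. Thus for $\phi \in C_c(\mathbb{R}^n)$ with support $\operatorname{supp} \phi := \overline{\{\phi \ne 0\}} \subset K$,
\[ \|\phi\|_p^p = \int |\phi|^p\, d\lambda^n = \int_K |\phi|^p\, d\lambda^n \le \sup_{x\in K} |\phi(x)|^p\, (2R)^n < \infty, \]
so that $C_c(\mathbb{R}^n) \subset \mathcal{L}^p(\mathbb{R}^n)$ (measurability is clear because of continuity).
Our strategy will be to approximate first indicator functions of Borel sets and simple functions. For this we need the following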
15.16 Lemma (Urysohn) Let $K \subset \mathbb{R}^n$ be a compact set and $U \supset K$ be an open set. Then there exists a continuous function $\phi = \phi_{K,U} \in C(\mathbb{R}^n)$ such that $1_K \le \phi \le 1_U$.
Proof Let $d(x, A) := \inf_{y\in A} |x - y|$ be the distance of the point $x \in \mathbb{R}^n$ from the set $A \subset \mathbb{R}^n$. For $x, x' \in \mathbb{R}^n$ we have
\[ d(x, A) = \inf_{y\in A} |x - y| \le \inf_{y\in A} \big( |x - x'| + |x' - y| \big) = |x - x'| + d(x', A), \]
which shows, due to the symmetry in $x$ and $x'$, that $|d(x, A) - d(x', A)| \le |x - x'|$, or, in other words, that $x \mapsto d(x, A)$ is continuous. It is now easy to see that the function
\[ \phi(x) := \frac{d(x, U^c)}{d(x, K) + d(x, U^c)} \]
is continuous and satisfies $1_K \le \phi \le 1_U$.
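The function constructed in the proof of Lemma 15.16 can be written out explicitly in one dimension. The sketch below assumes the illustrative choice $K = [0, 1]$ and $U = (-1, 2)$; the function names and default endpoints are not from the text.

```python
def dist_to_K(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Distance from x to the compact interval K = [a, b]."""
    return max(a - x, 0.0, x - b)


def dist_to_Uc(x: float, c: float = -1.0, d: float = 2.0) -> float:
    """Distance from x to the complement of the open interval U = (c, d)."""
    return max(min(x - c, d - x), 0.0)


def urysohn(x: float) -> float:
    """d(x, U^c) / (d(x, K) + d(x, U^c)); equals 1 on K and 0 outside U."""
    num = dist_to_Uc(x)
    # The denominator never vanishes because K is contained in U.
    return num / (dist_to_K(x) + num)
```

On $K$ the function equals 1, outside $U$ it equals 0, and in between it interpolates continuously; for instance `urysohn(-0.5)` evaluates to `0.5`.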
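15.17 Theorem $C_c(\mathbb{R}^n)$ is a dense subset of $\mathcal{L}^p(\mathbb{R}^n)$, $1 \le p < \infty$.
Proof We have already verified that $C_c(\mathbb{R}^n) \subset \mathcal{L}^p(\mathbb{R}^n)$.
Step 1: $C(\mathbb{R}^n) \cap \mathcal{L}^p(\mathbb{R}^n)$ is dense in $\mathcal{E}(\mathcal{B}(\mathbb{R}^n)) \cap \mathcal{L}^p(\mathbb{R}^n)$. Let $B \in \mathcal{B}(\mathbb{R}^n)$ such that $1_B \in \mathcal{L}^p(\mathbb{R}^n)$ (i.e. $\lambda^n(B) < \infty$). In steps 1, 2 of the proof of Lemma 15.2 we constructed for such sets open sets $U_\epsilon$ and closed sets $C_\epsilon$ such that
\[ C_\epsilon \subset B \subset U_\epsilon \quad\text{and}\quad |\lambda^n(U_\epsilon) - \lambda^n(B)| + |\lambda^n(B) - \lambda^n(C_\epsilon)| \le \epsilon^p. \]
By the continuity of measures T4.4(iii) we find for the closed and bounded, hence compact, sets $\overline{B_j(0)} \cap C_\epsilon \uparrow C_\epsilon$ that $\lim_{j\to\infty} \lambda^n\big( \overline{B_j(0)} \cap C_\epsilon \big) = \lambda^n(C_\epsilon)$. This means that we can replace $C_\epsilon$ by a compact set $K_\epsilon \subset C_\epsilon$ and still have
\[ K_\epsilon \subset B \subset U_\epsilon \quad\text{and}\quad |\lambda^n(U_\epsilon) - \lambda^n(K_\epsilon)| \le (2\epsilon)^p. \]
Using Lemma 15.16 we find a continuous function $\phi_\epsilon := \phi_{U_\epsilon, K_\epsilon} \in C(\mathbb{R}^n)$ with $1_{K_\epsilon} \le \phi_\epsilon \le 1_{U_\epsilon}$. As $1_{K_\epsilon} \le 1_B \le 1_{U_\epsilon}$ we have, in particular,
\[ \|1_B - \phi_\epsilon\|_p \le \|1_B - 1_{K_\epsilon}\|_p + \|1_{K_\epsilon} - \phi_\epsilon\|_p \le 2\, \|1_{U_\epsilon} - 1_{K_\epsilon}\|_p \le 4\epsilon, \]
which also shows that $\phi_\epsilon \in \mathcal{L}^p(\mathbb{R}^n)$. Since any $f \in \mathcal{E}(\mathcal{B}(\mathbb{R}^n)) \cap \mathcal{L}^p(\mathbb{R}^n)$ has a standard representation of the form $f = \sum_{j=0}^M y_j 1_{B_j}$ where $y_0 = 0$ and $B_1, \dots, B_M$ are Borel sets of finite volume, it is clear that $C(\mathbb{R}^n) \cap \mathcal{L}^p(\mathbb{R}^n)$ is dense in the set of all $p$th power integrable simple functions.
Step 2: $C(\mathbb{R}^n) \cap \mathcal{L}^p(\mathbb{R}^n)$ is dense in $\mathcal{L}^p(\mathbb{R}^n)$. Fix $\epsilon > 0$. Since $\mathcal{E}(\mathcal{B}(\mathbb{R}^n)) \cap \mathcal{L}^p(\mathbb{R}^n)$ is dense in $\mathcal{L}^p(\mathbb{R}^n)$, cf. C12.11, there exists some $f_\epsilon \in \mathcal{E}(\mathcal{B}(\mathbb{R}^n)) \cap \mathcal{L}^p(\mathbb{R}^n)$ such that
\[ \|f_\epsilon - u\|_p \le \epsilon. \]
Using step 1 we find some $\phi_\epsilon \in C(\mathbb{R}^n) \cap \mathcal{L}^p(\mathbb{R}^n)$ with
\[ \|\phi_\epsilon - f_\epsilon\|_p \le \epsilon, \]
and the claim follows from Minkowski's inequality for $\|\bullet\|_p$:
\[ \|\phi_\epsilon - u\|_p \le \|\phi_\epsilon - f_\epsilon\|_p + \|f_\epsilon - u\|_p \le 2\epsilon. \]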
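Step 3: $C_c(\mathbb{R}^n)$ is dense in $\mathcal{L}^p(\mathbb{R}^n)$. Let $\phi_\epsilon \in C(\mathbb{R}^n)$ be the function constructed in step 2. Using Lemma 15.16 we obtain a sequence of functions $\chi_j$ such that $1_{\overline{B_j(0)}} \le \chi_j \le 1_{B_{j+1}(0)}$. Obviously, $\chi_j \phi_\epsilon \xrightarrow{j\to\infty} \phi_\epsilon$, $|\chi_j \phi_\epsilon| \le |\phi_\epsilon|$ and $\chi_j \phi_\epsilon \in C_c(\mathbb{R}^n)$. Lebesgue's dominated convergence theorem 11.2 (or 12.9) therefore shows that
\[ \lim_{j\to\infty} \|u - \chi_j \phi_\epsilon\|_p = \|u - \phi_\epsilon\|_p \le 2\epsilon, \]
and the theorem is proved.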
Regular measures
The seemingly innocuous question whether the continuous functions are a dense subset of $\mathcal{L}^p$ is, even for Lebesgue measure in $\mathbb{R}^n$, quite hard to answer, as we have seen in Theorem 15.17. In general measure spaces, such results require a connection between measure and topology that reaches further than just considering the Borel (= topological) $\sigma$-algebra on a topological space $(X, \mathcal{O})$. This connection is made in the following
15.18 Definition Let $(X, \mathcal{O})$ be a topological space, denote by $\mathcal{K}$ the compact subsets of $X$ and let $\mu$ be a measure on $(X, \mathcal{B})$, $\mathcal{B} = \sigma(\mathcal{O})$. The measure $\mu$ is called outer regular if
\[ \mu(B) = \inf\{\mu(U) : U \in \mathcal{O},\; U \supset B\} \quad \forall\, B \in \mathcal{B}, \]
and (compact) inner regular if
\[ \mu(B) = \sup\{\mu(K) : K \in \mathcal{K},\; K \subset B\} \quad \forall\, B \in \mathcal{B}. \]
For Lebesgue measure $\lambda^n$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ we have proved outer and inner regularity in Lemma 15.2, see also step 1 in the proof of Theorem 15.17 and Problem 15.2. Let us note, without proof, the following characterization of outer regular measures.
15.19 Theorem Let $(X, \rho)$ be a complete separable metric space⁷ and denote the open sets by $\mathcal{O}$ and the compact sets by $\mathcal{K}$. Every measure $\mu$ on $(X, \mathcal{B}(X))$ which is locally finite, i.e. every $x \in X$ has an open neighbourhood $U = U(x)$ of finite measure $\mu(U) < \infty$, is both outer regular and inner regular, i.e.
\[ \mu(B) = \inf\{\mu(U) : U \in \mathcal{O},\; U \supset B\} = \sup\{\mu(K) : K \in \mathcal{K},\; K \subset B\}. \]
⁷ cf. Appendix B.
A proof can be found in Bauer [6, §26]. Note the analogy to Lemma 15.2 and the proof of Theorem 15.17 where we (essentially) verified Theorem 15.19 for Lebesgue measure. Also note that the measure $\mu$ in Theorem 15.19 is $\sigma$-finite: since $X$ is separable, there is a countable dense subset $D \subset X$, and the collection
\[ \mathfrak{B} = \big\{ B_r(d) : r \in \mathbb{Q}^+,\; d \in D,\; B_r(d) \subset U(d),\; U(d) \text{ as in T15.19} \big\} \]
is a countable family of open balls with finite $\mu$-measure. Moreover, since every $U \in \mathcal{O}$ can be written in the form⁸
\[ U = \bigcup_{\mathfrak{B} \ni B_r(d) \subset U} B_r(d) \]
we find that $X = \bigcup_{N=1}^\infty \bigcup_{j=1}^N B_{r_j}(d_j)$ with $\mu(B_{r_j}(d_j)) < \infty$.
Almost the same argument that was used in the proof of Theorem 15.17 is valid in the abstract setting.
15.20 Theorem Let $(X, \mathcal{O})$ be a topological space and $\mu$ be an outer regular measure on $(X, \mathcal{B}(X))$. Then the set
\[ C_{\mathrm{fin}}(X) := \{u : X \to \mathbb{R} \;:\; u \text{ is continuous},\; \mu(\{u \ne 0\}) < \infty\} \]
is dense in $L^p(X, \mathcal{B}, \mu)$, $1 \le p < \infty$.
Proof Let $A \in \mathcal{B}$ be a set with $\mu(A) < \infty$. Since $\mu$ is outer regular, we find for every $\epsilon > 0$ some $U \in \mathcal{O}$ such that
\[ A \subset U \quad\text{and}\quad \mu(U) - \epsilon^p \le \mu(A) \le \mu(U). \]
Literally as in step 2 of the proof of Lemma 15.2 we can find some closed set $F$ with
\[ F \subset A \quad\text{and}\quad \mu(F) \le \mu(A) \le \mu(F) + \epsilon^p, \]
and, consequently, $\mu(U) - \mu(F) \le 2\epsilon^p$. The rest of the proof is now as in T15.17.
Problems
15.1. Let $F, F_1, F_2, F_3, \dots$ be $F_\sigma$-sets in $\mathbb{R}^n$. Show that
(i) $F_1 \cap F_2 \cap \dots \cap F_N$ is for every $N \in \mathbb{N}$ an $F_\sigma$-set;
(ii) $\bigcup_{j\in\mathbb{N}} F_j$ is an $F_\sigma$-set;
⁸ This is similar to (3.2) in the proof of T3.8: the inclusion '⊂' is obvious, for '⊃' fix $x \in U$. Then there exists some $r \in \mathbb{Q}^+$ with $B_r(x) \subset U$. Since $D$ is dense, $x \in B_{r/2}(d)$ for some $d \in D$ with $\rho(d, x) < r/4$, so that $x \in B_{r/2}(d) \subset U$.
(iii) $F^c$ and $\bigcap_{j\in\mathbb{N}} F_j^c$ are $G_\delta$-sets;
(iv) all closed sets are $F_\sigma$-sets.
15.2. Prove the following corollary to Lemma 15.2: Lebesgue measure $\lambda^n$ on $\mathbb{R}^n$ is outer regular, i.e.
\[ \lambda^n(B) = \inf\{\lambda^n(U) : U \supset B,\; U \text{ open}\} \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n), \]
and inner regular, i.e.
\[ \lambda^n(B) = \sup\{\lambda^n(F) : F \subset B,\; F \text{ closed}\} = \sup\{\lambda^n(K) : K \subset B,\; K \text{ compact}\} \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n). \]
15.3. Completion (6). Combine Problems 15.2 and 10.12 to show that the completion $\bar\lambda^n$ of $n$-dimensional Lebesgue measure is again inner and outer regular.
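15.4. Consider the Borel $\sigma$-algebra $\mathcal{B}[0, \infty)$ and write $\lambda = \lambda^1|_{[0,\infty)}$ for Lebesgue measure on the half-line $[0, \infty)$.
(i) Show that $\mathcal{G} := \{[a, \infty) : a \ge 0\}$ generates $\mathcal{B}[0, \infty)$.
(ii) Show that $\mu(B) := \int_B 1_{[2,4]}\, \lambda(dx)$ and $\rho(B) := \mu(5 \cdot B)$, $B \in \mathcal{B}[0, \infty)$, are measures on $\mathcal{B}[0, \infty)$ such that $\rho|_{\mathcal{G}} \le \mu|_{\mathcal{G}}$ but not $\rho \le \mu$ in general.
Why does this not contradict Lemma 15.6?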
15.5. Use Jacobi's transformation formula to recover Theorem 5.8(i), Problem 5.8 and Theorem 7.10. Show, in particular, that for all integrable functions $u : \mathbb{R}^n \to [0, \infty)$
\begin{align*}
\int u(x + y)\, \lambda^n(dx) &= \int u(x)\, \lambda^n(dx) && \forall\, y \in \mathbb{R}^n, \\
\int u(t x)\, \lambda^n(dx) &= \frac{1}{t^n} \int u(x)\, \lambda^n(dx) && \forall\, t > 0, \\
\int u(A x)\, \lambda^n(dx) &= \frac{1}{|\det A|} \int u(x)\, \lambda^n(dx) && \forall\, A \in GL(n, \mathbb{R}).
\end{align*}
In particular, the l.h.s. of the above equalities exists and is finite if, and only if, the r.h.s. exists and is finite.
Why can't we use 15.5 and 15.8 to prove these formulae?
15.6. Arc-length. Let f � � → � be a twice continuously differentiable function and
denote by �f �= ��t� f�t�� � t ∈ �� its graph. Define a function � � → �2 by
�x� �= �x� f�x��. Then
(i) � � → �f is a C1-diffeomorphism and det D �x� = 1 + �f ′�x��2.
(ii)
�= �� det D � �1� is a measure on �f .
(iii)
∫
�f
u�x� y� d
�x� y� = ∫
�
u�t� f�t��
√
1 + �f ′�t��2 d�1�t� with the understand-
ing that whenever one side of the equality makes sense (measurability!) and
is finite, so does the other.
The measure
is called canonical surface measure on �f . This name is justified
by the following compatibility property w.r.t. �2: Let n�x� be a unit normal vector
to �f at the point �x� f�x�� and define a map ̃ � � × � → �2 by ̃�x� r� �=
�x� + r n�x�. Then
(iv) n�x� = �−f ′�x�� 1�/
√
1 + �f ′�x��2 and det D ̃�x� r� = 1+�f ′�x��2 −r f ′′�x�.
Conclude that for every compact interval �c� d� there exists some � > 0 such
that ̃��c�d�×�−���� is a C1-diffeomorphism.
(v) Let C ⊂ �f ��c�d� and r < � with � as in (iv). Make a sketch of the set
C�r� �= ̃( −1�C� × �−r� r�) and show that it is Borel measurable.
(vi) Use dominated convergence to show that for every x ∈ �c� d�
lim
r↓0
1
2r
∫
�−r�r�
∣∣det D ̃�x� r�
∣∣�1�dr� =
∣∣det D ̃�x� 0�
∣∣�
(vii) Use the general transformation theorem 15.8, Tonelli’s theorem 13.8, (vi)
and dominated convergence to show that
lim
r↓0
�2�C�r�� =
∫
−1 �C�
� det D �x�� �1�dx��
(viii) Conclude that
∫ √
1 + �f ′�t��2 dt is the arc-length of the graph of �f .
15.7. Let � �d → M ⊂ �n, d � n, be a C1-diffeomorphism.
(i) Show that �M �=
(� det D � �d) is a measure on M. Find a formula for∫
M
u d�M .
(ii) Show that for a dilation �r � �
n → �n, x �→ r x, r > 0, we have
∫
M
u�r �� r n d�M ��� =
∫
�r �M�
u��� d�M ����
(iii) Let M = ��x� = 1� = Sn−1 be the unit sphere in �n, so that d = n − 1. Show
that for every integrable u ∈ 1��n� and
�= �M
∫
u�x��n�dx� =
∫
�0���
∫
��x�=r�
u�x�
�dx� �1�dr�
=
∫
�0���
∫
��x�=1�
u�r x�
�dx� �1�dr��
Remark. With somewhat more effort it is possible to show the analogue of the
approximation formula in Problem 15.6(vii) for �M ; all that changes are technical
details, the idea of the proof is the same, cf. Stroock [50, pp. 94–101] for a nice
presentation.
15.8. In Example 11.14 we introduced Euler's Gamma function:
\[ \Gamma(t) = \int_{(0,\infty)} x^{t-1} e^{-x}\, \lambda^1(dx). \]
Show that $\Gamma(\frac12) = \sqrt{\pi}$.
15.9. 3-d polar coordinates. Define $\Phi : [0, \infty) \times [0, 2\pi) \times [-\pi/2, \pi/2) \to \mathbb{R}^3$ by
\[ \Phi(r, \theta, \omega) := (r\cos\theta\cos\omega,\; r\sin\theta\cos\omega,\; r\sin\omega). \]
Show that $|\det D\Phi(r, \theta, \omega)| = r^2 \cos\omega$ and find the integral formula for the coordinate change from Cartesian to polar coordinates $(x, y, z) \rightsquigarrow (r, \theta, \omega)$.
15.10. Compute for $m, n \in \mathbb{N}$ the integral $\iint_{B_1(0)} x^m y^n\, d\lambda^2(x, y)$.
16
Uniform integrability and Vitali’s
convergence theorem
Lebesgue’s dominated convergence theorem 11.2 gives sufficient conditions
which allow us to interchange limits and integrals. A crucial ingredient is the
assumption that $|u_j| \le w$ a.e. for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1_+(\mu)$. This condition
is not necessary, but a slightly weaker one is indeed necessary and sufficient in
order to swap limits and integrals. The key idea is to control the size of the sets
where the uj exceed a given reference function. This is the rationale behind the
next definition.
16.1 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{F} \subset \mathcal{M}(\mathcal{A})$ be a family of measurable functions. We call $\mathcal{F}$ uniformly integrable (also: equi-integrable) if
\[ \forall\, \epsilon > 0 \;\; \exists\, w_\epsilon \in \mathcal{L}^1_+(\mu) \;:\; \sup_{u\in\mathcal{F}} \int_{\{|u| > w_\epsilon\}} |u|\, d\mu < \epsilon. \tag{16.1} \]
Note that there are other (but for $\mu(X) < \infty$ usually equivalent) definitions of uniform integrability, see Theorem 16.8 below for a discussion. We follow the universal formulation due to G. A. Hunt [21, p. 33].
The other key assumption in Theorem 11.2 was that $u_j(x) \xrightarrow{j\to\infty} u(x)$ for (almost) all $x \in X$; we can weaken this assumption, too.
16.2 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. A sequence of $\mathcal{A}$-measurable numerical functions $u_j : X \to \bar{\mathbb{R}}$ converges in measure¹ if
\[ \forall\, \epsilon > 0,\;\; \forall\, A \in \mathcal{A},\; \mu(A) < \infty \;:\; \lim_{j\to\infty} \mu\big( \{|u_j - u| > \epsilon\} \cap A \big) = 0 \tag{16.2} \]
holds for some $u \in \mathcal{M}(\mathcal{A})$. We write $\mu\text{-}\lim_{j\to\infty} u_j = u$ or $u_j \xrightarrow{\mu} u$.
¹ If $\mu$ is a probability measure one usually speaks of convergence in probability.
16.3 Example Convergence in measure is strictly weaker than pointwise convergence. To see this, take $(X, \mathcal{A}, \mu) = ([0, 1], \mathcal{B}[0, 1], \lambda^1|_{[0,1]})$ and set
\[ u_n(x) := 1_{[j 2^{-k},\, (j+1) 2^{-k}]}(x), \qquad n = j + 2^k, \quad 0 \le j < 2^k. \]
This is a sequence of rectangular functions of width $2^{-k}$ which move in $2^k$ steps through $[0, 1]$, then jump back to $x = 0$, halve their width and start moving again. Obviously,
\[ \lambda^1(\{|u_n| > \epsilon\}) = 2^{-k} \xrightarrow{n = n(k) \to \infty} 0 \quad \forall\, \epsilon \in (0, 1), \]
so that $u_n \xrightarrow{\lambda^1} 0$ in measure, but the pointwise limit $\lim_{n\to\infty} u_n(x)$ does not exist anywhere.[✓]
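The sequence of Example 16.3 is easy to simulate. The following sketch (function names are illustrative) decodes $n = j + 2^k$, evaluates $u_n$ at a point, and returns the measure $2^{-k}$ of the set where $u_n = 1$. At a fixed non-dyadic point such as $x = 0.3$, every block $2^k \le n < 2^{k+1}$ contains exactly one index with $u_n(x) = 1$, so the values 0 and 1 both occur infinitely often.

```python
def decompose(n: int):
    """Write n = j + 2^k with 0 <= j < 2^k and return (j, k); requires n >= 1."""
    k = n.bit_length() - 1
    return n - (1 << k), k


def u(n: int, x: float) -> int:
    """Indicator of the interval [j 2^-k, (j+1) 2^-k] evaluated at x."""
    j, k = decompose(n)
    width = 2.0 ** (-k)
    return 1 if j * width <= x <= (j + 1) * width else 0


def measure_of_support(n: int) -> float:
    """Lebesgue measure 2^-k of the set where u_n equals 1."""
    _, k = decompose(n)
    return 2.0 ** (-k)
```

The measure `measure_of_support(n)` tends to zero as `n` grows, while `u(n, 0.3)` keeps returning to the value 1 once in every dyadic block.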
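16.4 Lemma Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, and $(w_k)_{k\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$. Then
(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$ implies $u_j \xrightarrow{\mu} u$;
(ii) $\lim_{k\to\infty} w_k(x) = w(x)$ a.e. implies $w_k \xrightarrow{\mu} w$.
Proof (i) follows immediately from the Markov inequality P10.12,
\[ \mu\big( \{|u_j - u| > \epsilon\} \cap A \big) \le \mu\big( \{|u_j - u| > \epsilon\} \big) = \mu\big( \{|u_j - u|^p > \epsilon^p\} \big) \le \frac{1}{\epsilon^p}\, \|u_j - u\|_p^p. \]
(ii) Observe that for all $\epsilon > 0$
\[ \{|w_k - w| > \epsilon\} \subset \{\epsilon \wedge |w_k - w| \ge \epsilon\}. \]
An application of the Markov inequality P10.12 yields
\[ \mu\big( \{|w_k - w| > \epsilon\} \cap A \big) \le \mu\big( \{\epsilon \wedge |w_k - w| \ge \epsilon\} \cap A \big) \le \frac{1}{\epsilon} \int_A \epsilon \wedge |w_k - w|\, d\mu = \frac{1}{\epsilon} \int \big( \epsilon \wedge |w_k - w| \big)\, 1_A\, d\mu. \]
If $\mu(A) < \infty$, the function $\epsilon\, 1_A \in \mathcal{L}^1_+(\mu)$ is integrable, dominates the integrand $\big( \epsilon \wedge |w_k - w| \big) 1_A$, and Lebesgue's dominated convergence theorem 11.2 implies that $\lim_{k\to\infty} \int_A (\epsilon \wedge |w_k - w|)\, d\mu = 0$.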
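16.5 Lemma Assume that $(X, \mathcal{A}, \mu)$ is $\sigma$-finite and that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$ converges in measure to $u$. Then $u$ is a.e. unique.
Proof Let $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ be a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Suppose that $u$ and $w$ are two measurable functions such that $u_j \xrightarrow{\mu} u$ and $u_j \xrightarrow{\mu} w$. Because of $|u - w| \le |u - u_j| + |u_j - w|$ we find for all $j, n \in \mathbb{N}$ that
\[ \big\{ |u - w| > \tfrac{2}{n} \big\} \subset \big\{ |u - u_j| > \tfrac{1}{n} \big\} \cup \big\{ |u_j - w| > \tfrac{1}{n} \big\}. \]
Therefore,
\[ \mu\big( A_k \cap \{ |u - w| > \tfrac{2}{n} \} \big) \le \mu\big( A_k \cap \{ |u - u_j| > \tfrac{1}{n} \} \big) + \mu\big( A_k \cap \{ |u_j - w| > \tfrac{1}{n} \} \big) \xrightarrow{j\to\infty} 0 \]
holds for all $k, n \in \mathbb{N}$, i.e. $A_k \cap \{ |u - w| > \frac{2}{n} \}$ is a null set for all $k, n \in \mathbb{N}$; but then $\{u \ne w\} \subset \bigcup_{n\in\mathbb{N}} \{ |u - w| > \frac{2}{n} \} = \bigcup_{k,n\in\mathbb{N}} \big( A_k \cap \{ |u - w| > \frac{2}{n} \} \big)$ is also a null set, and we are done.
Caution: Limits in measure on a non-$\sigma$-finite measure space $(X, \mathcal{A}, \mu)$ need not be unique, see Problem 16.6.
We are now ready for the main result of this chapter, which generalizes Lebesgue's dominated convergence theorem 11.2.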
16.6 Theorem (Vitali) Let $(X, \mathcal{A}, \mu)$ be $\sigma$-finite and let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1, \infty)$, be a sequence which converges in measure to some measurable function $u \in \mathcal{M}(\mathcal{A})$. Then the following assertions are equivalent:
(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$;
(ii) $(|u_j|^p)_{j\in\mathbb{N}}$ is a uniformly integrable family;
(iii) $\lim_{j\to\infty} \int |u_j|^p\, d\mu = \int |u|^p\, d\mu$.
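Proof (iii)⇒(ii): Since $\lim_{j\to\infty} \int |u_j|^p\, d\mu = \int |u|^p\, d\mu$, there exists some constant $C < \infty$ such that $\sup_{j\in\mathbb{N}} \int |u_j|^p\, d\mu \le C$, and for every $\epsilon > 0$ there is some $N_\epsilon \in \mathbb{N}$ such that
\[ \Big| \int |u_j|^p\, d\mu - \int |u|^p\, d\mu \Big| \le \epsilon^p \quad \forall\, j \ge N_\epsilon. \]
Setting $w_\epsilon := \max\{|u_1|, |u_2|, \dots, |u_{N_\epsilon}|, |u|\}$, we have $w_\epsilon \in \mathcal{L}^p_+(\mu)$[✓] and we see for every $\epsilon \in (0, 1)$ that
\[ \big\{ |u_j| > \tfrac{1}{\epsilon} w_\epsilon \big\} = \emptyset \quad \forall\, j \le N_\epsilon, \qquad \big\{ |u_j| > \tfrac{1}{\epsilon} w_\epsilon \big\} \subset \big\{ \epsilon |u_j| > |u| \big\} \quad \forall\, j \in \mathbb{N}. \]
This implies for all $j \in \mathbb{N}$ that
\begin{align*}
\int_{\{|u_j| > \frac{1}{\epsilon} w_\epsilon\}} |u_j|^p\, d\mu
&\le \Big| \int_{\{|u_j| > \frac{1}{\epsilon} w_\epsilon\}} \big( |u_j|^p - |u|^p \big)\, d\mu \Big| + \int_{\{|u_j| > \frac{1}{\epsilon} w_\epsilon\}} |u|^p\, d\mu \\
&\le \epsilon^p + \int_{\{\epsilon |u_j| > |u|\}} |u|^p\, d\mu \\
&\le \epsilon^p + \epsilon^p \sup_{j\in\mathbb{N}} \int |u_j|^p\, d\mu \le (1 + C)\, \epsilon^p.
\end{align*}
Since
\[ w_\epsilon \in \mathcal{L}^p_+(\mu) \iff w_\epsilon^p \in \mathcal{L}^1_+(\mu) \quad\text{and}\quad \big\{ |u_j| > \tfrac{1}{\epsilon} w_\epsilon \big\} = \big\{ |u_j|^p > \tfrac{1}{\epsilon^p} w_\epsilon^p \big\}, \tag{16.3} \]
we have established the uniform integrability of $(|u_j|^p)_{j\in\mathbb{N}}$.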
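(ii)⇒(i): Let us first check that the double sequence $(|u_j - u_k|^p)_{j,k\in\mathbb{N}}$ is again uniformly integrable. In view of (16.3), our assumption reads
\[ \int_{\{|u_j| > w\}} |u_j|^p\,d\mu < \epsilon \qquad \forall\, j \in \mathbb{N} \tag{16.4} \]
for some suitable $w = w_\epsilon \in \mathcal{L}^p_+(\mu)$. From $|a - b| \le |a| + |b| \le 2\max\{|a|, |b|\}$ we deduce
\[ \int_{\{|u_j - u_k| > 2w\}} |u_j - u_k|^p\,d\mu \le 2^p \int_{\{|u_j - u_k| > 2w\}} \big(|u_j| \vee |u_k|\big)^p\,d\mu, \]
and since $|u_j - u_k| \le |u_j| + |u_k|$ we get
\[ \{|u_j - u_k| > 2w\} \subset \{|u_j| > w\} \cup \{|u_k| > w\}. \]
Consequently,
\begin{align*}
\int_{\{|u_j - u_k| > 2w\}} |u_j - u_k|^p\,d\mu
&\le 2^p \Big\{ \int_{\{|u_j| > w\} \cap \{|u_k| > w\}} + \int_{\{|u_j| > w \ge |u_k|\}} + \int_{\{|u_k| > w \ge |u_j|\}} \Big\}\, |u_j|^p \vee |u_k|^p\,d\mu \\
&\le 2^p \Big\{ \int_{\{|u_j| > w\} \cap \{|u_k| > w\}} |u_j|^p\,d\mu + \int_{\{|u_j| > w\} \cap \{|u_k| > w\}} |u_k|^p\,d\mu \Big\} \\
&\qquad + 2^p \int_{\{|u_j| > w\}} |u_j|^p\,d\mu + 2^p \int_{\{|u_k| > w\}} |u_k|^p\,d\mu \\
&\overset{(16.4)}{\le} 4 \cdot 2^p\,\epsilon = 2^{p+2}\,\epsilon.
\end{align*}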
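From this we conclude that for $W = 2w \in \mathcal{L}^p_+(\mu)$, $\delta > 0$ and large $R > 0$
\begin{align*}
\int |u_j - u_k|^p\,d\mu
&= \int_{\{|u_j - u_k| > W\}} |u_j - u_k|^p\,d\mu + \int_{\{|u_j - u_k| \le W\}} |u_j - u_k|^p\,d\mu \\
&\le 2^{p+2}\,\epsilon + \int_{\{|u_j - u_k| \le W \wedge \delta\}} |u_j - u_k|^p\,d\mu + \int_{\{\delta < |u_j - u_k| \le W\}} |u_j - u_k|^p\,d\mu \\
&\le 2^{p+2}\,\epsilon + \int \delta^p \wedge W^p\,d\mu + \int_{\{|u_j - u_k| > \delta\} \cap \{W > R\}} W^p\,d\mu + \int_{\{|u_j - u_k| > \delta\} \cap \{\delta < W \le R\}} W^p\,d\mu \\
&\le 2^{p+2}\,\epsilon + \int \delta^p \wedge W^p\,d\mu + \int_{\{W > R\}} W^p\,d\mu + R^p\,\mu\big(\{|u_j - u_k| > \delta\} \cap \{\delta < W \le R\}\big).
\end{align*}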
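Letting first $j, k \to \infty$ we find because of $u_j \xrightarrow{\mu} u$ that [✓]
\[ \limsup_{j,k\to\infty} \int |u_j - u_k|^p\,d\mu \le 2^{p+2}\,\epsilon + \int \delta^p \wedge W^p\,d\mu + \int_{\{W > R\}} W^p\,d\mu. \]
The last two terms vanish as $\delta \to 0$ and $R \to \infty$ by the dominated convergence theorem 12.9, so that $\lim_{j,k\to\infty} \int |u_j - u_k|^p\,d\mu = 0$. Since $\mathcal{L}^p(\mu)$ is complete (cf. T12.7), $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p(\mu)$ to a limit $\tilde{u} \in \mathcal{L}^p(\mu)$.
Due to Lemma 16.4, $\mathcal{L}^p$-convergence also implies $u_j \xrightarrow{\mu} \tilde{u}$ and, by Lemma 16.5, we have $u = \tilde{u}$ a.e., hence $\mathcal{L}^p$-$\lim_{j\to\infty} u_j = u$.
(i)⇒(iii): is a consequence of the lower triangle inequality for the $\mathcal{L}^p$-norm, cf. the first part of the proof of Theorem 12.10.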
16.7 Remark Vitali's theorem 16.6 still holds for measure spaces $(X, \mathcal{A}, \mu)$ which are not $\sigma$-finite. In this case, however, we can no longer identify the $\mathcal{L}^p$-limit and the theorem reads: If $u_j \xrightarrow{\mu} u$, then the following are equivalent:
(i) $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$;
(ii) $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable;
(iii) $(\|u_j\|_p)_{j\in\mathbb{N}}$ converges in $\mathbb{R}$.
The reason for this is evident from the proof of T16.6: the last few lines of the step (ii)⇒(i) require $\sigma$-finiteness of $(X, \mathcal{A}, \mu)$.
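A small numerical illustration of Theorem 16.6 may help here. The sketch below assumes Lebesgue measure on [0, 1] and the spike functions n·1_(0,1/n) (compare Problem 16.4(ii)); the helper names `measure_above` and `integral` are illustrative and not part of the text. The spikes converge to 0 in measure, but their integrals stay equal to 1, so (iii) and hence (i) fail for p = 1. With heights sqrt(n) instead of n the integrals do tend to 0.

```python
from fractions import Fraction
import math

def measure_above(height, width, eps):
    """Lebesgue measure of {u > eps} for the step function u = height * 1_(0, width)."""
    return width if height > eps else Fraction(0)

def integral(height, width):
    """Integral of u = height * 1_(0, width) over [0, 1]."""
    return height * width

for n in (1, 10, 100, 1000):
    width = Fraction(1, n)
    print(n,
          measure_above(n, width, Fraction(1, 2)),  # shrinks like 1/n: convergence in measure
          integral(n, width),                        # stays 1: no convergence in L^1
          float(integral(math.sqrt(n), width)))      # shrinks like n**-0.5 for the milder spikes
```

Exact rational arithmetic is used for the spikes of height n so that the value 1 of the integral is not blurred by rounding.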
Different forms of uniform integrability2
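In view of Vitali's convergence theorem 16.6 one is led to suspect that uniform integrability is essentially a sufficient (and also necessary, if $(X, \mathcal{A}, \mu)$ is $\sigma$-finite) condition for weak sequential relative compactness in $\mathcal{L}^1(\mu)$, i.e.
\[
(16.1) \implies
\begin{cases}
\text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j(k)})_{k\in\mathbb{N}} \text{ such that} \\
\lim_{k\to\infty} \int u_{j(k)} \cdot \phi\,d\mu \text{ exists for all } \phi \in \mathcal{L}^\infty(\mu)
\end{cases}
\]
(see Dunford and Schwartz [15, pp. 289–90, 386–7]). In $\mathcal{L}^p(\mu)$, $1 < p < \infty$, uniform boundedness of $\mathcal{F} \subset \mathcal{L}^p(\mu)$ is enough for this:
\[
\sup_{u\in\mathcal{F}} \|u\|_p < \infty \iff
\begin{cases}
\text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j(k)})_{k\in\mathbb{N}} \text{ such that} \\
\lim_{k\to\infty} \int u_{j(k)} \cdot \phi\,d\mu \text{ exists for all } \phi \in \mathcal{L}^q(\mu),\ \tfrac{1}{p} + \tfrac{1}{q} = 1.
\end{cases}
\]
This is a consequence of the reflexivity of the spaces $\mathcal{L}^p(\mu)$, $p > 1$.
Let us give various equivalent conditions for uniform integrability.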
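16.8 Theorem Let $(X, \mathcal{A}, \mu)$ be some measure space and $\mathcal{F} \subset \mathcal{L}^1(\mu)$. Then the following statements (i)–(iv) are equivalent:
(i) $\mathcal{F}$ is uniformly integrable, i.e. (16.1) holds;
(ii) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
b) $\forall\, \epsilon > 0\ \exists\, w_\epsilon \in \mathcal{L}^1_+(\mu),\ \delta > 0\ \forall\, B \in \mathcal{A} :\ \int_B w_\epsilon\,d\mu < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u|\,d\mu < \epsilon$;
(iii) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
b) $\forall\, \epsilon > 0\ \exists\, K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty :\ \sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u|\,d\mu < \epsilon$;
c) $\forall\, \epsilon > 0\ \exists\, \delta > 0\ \forall\, B \in \mathcal{A} :\ \mu(B) < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u|\,d\mu < \epsilon$;
(iv) a) $\forall\, \epsilon > 0\ \exists\, K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty :\ \sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u|\,d\mu < \epsilon$;
b) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u|\,d\mu = 0$.
If $(X, \mathcal{A}, \mu)$ is a $\sigma$-finite measure space, (i)–(iv) are also equivalent to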
2 This section can be left out at first reading.
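(v) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
b) $\lim_{j\to\infty} \sup_{u\in\mathcal{F}} \int_{A_j} u\,d\mu = 0$ for every decreasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$, $A_j \downarrow \emptyset$. [Note: $\mu(A_j) < \infty$ is not assumed.]
If $(X, \mathcal{A}, \mu)$ is a finite measure space, (i)–(v) are also equivalent to
(vi) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u|\,d\mu = 0$;
(vii) $\sup_{u\in\mathcal{F}} \int \Phi(|u|)\,d\mu < \infty$ for some increasing, convex function $\Phi : [0, \infty) \to [0, \infty)$ such that $\lim_{t\to\infty} \frac{\Phi(t)}{t} = \infty$.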
16.9 Remark Almost any combination of the above criteria appears in the literature as uniform integrability or under different names. Here is a short list:
(ii-a) – uniform boundedness
(iii-b) – tightness
(iii-c) – uniform absolute continuity
(v-b) – uniform $\sigma$-additivity
(vii) – de la Vallée Poussin’s condition
(iii) – Dieudonné’s condition (weak seq. relative compactness)
(v) – Dunford–Pettis condition (weak seq. relative compactness)
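To get a feeling for criterion (vi) of Theorem 16.8, one can evaluate the tail integrals for two families of spikes on [0, 1] with Lebesgue measure. The following is only a sketch: the families, the helper names `tail_integral` and `sup_tail`, and the cut-off `n_max` (which replaces the supremum over all n by a maximum over finitely many n) are assumptions made for illustration. For heights n the tails never drop below 1, for heights sqrt(n) they are smaller than 1/R.

```python
from fractions import Fraction
import math

def tail_integral(height, width, R):
    """Integral of |u| over {|u| > R} for the step function u = height * 1_(0, width)."""
    return height * width if height > R else Fraction(0)

def sup_tail(height_of, R, n_max=10_000):
    """Largest tail integral among u_n = height_of(n) * 1_(0, 1/n), n = 1, ..., n_max."""
    return max(tail_integral(height_of(n), Fraction(1, n), R) for n in range(1, n_max + 1))

for R in (10, 50):
    tall = sup_tail(lambda n: n, R)             # heights n: not uniformly integrable
    mild = sup_tail(lambda n: math.sqrt(n), R)  # heights sqrt(n): tails shrink with R
    print(R, tall, float(mild))
```

The family with heights sqrt(n) is bounded in L^2, which is the situation of Problem 16.13.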
Proof (of Theorem 16.8) First we show (iv)⇒(iii)⇒(ii)⇒(i)⇒(iv) for general measure spaces, then (ii)⇒(v)⇒(i) for $\sigma$-finite measure spaces and, finally, for finite measure spaces (iv)⇒(vi)⇒(vii)⇒(i).
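(iv)⇒(iii): Condition (iii-b) is clear. Given $\epsilon > 0$ we can pick $K = K_{\epsilon/2} \in \mathcal{A}$ and $R = R_{\epsilon/2} > 0$ such that
\[
\int |u|\,d\mu = \int_{K \cap \{|u| > R\}} |u|\,d\mu + \int_{K \cap \{|u| \le R\}} |u|\,d\mu + \int_{K^c} |u|\,d\mu \le \frac{\epsilon}{2} + R\,\mu(K) + \frac{\epsilon}{2} < \infty,
\]
uniformly for all $u \in \mathcal{F}$. Setting $\delta := \frac{\epsilon}{2R}$ we see for every $B \in \mathcal{A}$ with $\mu(B) < \delta$ that
\[
\int_B |u|\,d\mu = \int_{B \cap \{|u| > R\}} |u|\,d\mu + \int_{B \cap \{|u| \le R\}} |u|\,d\mu \le \int_{\{|u| > R\}} |u|\,d\mu + R\,\mu(B) \le \frac{\epsilon}{2} + R\,\delta = \epsilon,
\]
and (iii) follows.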
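(iii)⇒(ii): Condition (ii-a) is clear. Given $\epsilon > 0$ we pick $K = K_\epsilon \in \mathcal{A}$ with $\mu(K) < \infty$ and $\delta = \delta_\epsilon > 0$ and set $w_\epsilon := 1_{K_\epsilon}$. If $B \in \mathcal{A}$ is such that $\mu(B \cap K_\epsilon) = \int_B w_\epsilon\,d\mu < \delta$, we get from (iii-c) and (iii-b) that
\[ \int_B |u|\,d\mu = \int_{B \cap K_\epsilon} |u|\,d\mu + \int_{B \cap K_\epsilon^c} |u|\,d\mu \le \epsilon + \epsilon, \]
uniformly for all $u \in \mathcal{F}$ which is just (ii-b).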
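(ii)⇒(i): Take $w = w_\epsilon$ and $\delta = \delta_\epsilon > 0$ as in (ii). If $R > \frac{1}{\delta} \sup_{u\in\mathcal{F}} \int |u|\,d\mu$ we see
\[ \int |u|\,d\mu \ge \int_{\{|u| > Rw\}} |u|\,d\mu \ge R \int_{\{|u| > Rw\}} w\,d\mu, \]
and so
\[ \int_{\{|u| > Rw\}} w\,d\mu \le \frac{1}{R} \sup_{u\in\mathcal{F}} \int |u|\,d\mu \le \delta. \]
From (ii-b) we infer that $\sup_{u\in\mathcal{F}} \int_{\{|u| > Rw\}} |u|\,d\mu \le \epsilon$.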
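(i)⇒(iv): Let $w = w_\epsilon$ be as in (i) resp. (16.1). Since $\{|u| \le w\} \cap \{|u| \ge R\} \subset \{w \ge R\}$, we have
\begin{align*}
\int_{\{|u| > R\}} |u|\,d\mu
&= \int_{\{|u| > w\} \cap \{|u| > R\}} |u|\,d\mu + \int_{\{|u| \le w\} \cap \{|u| > R\}} |u|\,d\mu \\
&\le \int_{\{|u| > w\}} |u|\,d\mu + \int_{\{|w| > R\}} w\,d\mu \\
&\le \epsilon + \int w\,1_{\{|w| > R\}}\,d\mu. \tag{16.5}
\end{align*}
From the dominated convergence theorem 11.2 we see that the right-hand side tends (uniformly for all $u \in \mathcal{F}$) to $\epsilon$ as $R \to \infty$ and (iv-b) follows. To see (iv-a) we choose $r = r_\epsilon > 0$ so small that $\int_{\{w \le r\}} w\,d\mu \le \int w \wedge r\,d\mu \le \epsilon$; this is possible since by Lebesgue's convergence theorem 11.2 $\lim_{r\to 0} \int |w| \wedge r\,d\mu = 0$. By the Markov inequality P10.12 we see $\mu(\{w > r\}) \le \frac{1}{r} \int w\,d\mu < \infty$, and we get for $K := \{w > r\}$
\begin{align*}
\sup_{u\in\mathcal{F}} \int_{K^c} |u|\,d\mu
&= \sup_{u\in\mathcal{F}} \Big( \int_{\{w \le r\} \cap \{|u| > w\}} |u|\,d\mu + \int_{\{w \le r\} \cap \{|u| \le w\}} |u|\,d\mu \Big) \\
&\overset{(16.1)}{\le} \epsilon + \sup_{u\in\mathcal{F}} \int_{\{w \le r\} \cap \{|u| \le w\}} |u|\,d\mu \\
&\le \epsilon + \int_{\{w \le r\}} w\,d\mu \le 2\epsilon.
\end{align*}
This proves (iv).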
Assume for the rest of the proof that $\mu$ is $\sigma$-finite.
(ii)⇒(v): (v-a) is clear. If $A_j \downarrow \emptyset$ we see from the monotone convergence theorem 11.1 that $\lim_{j\to\infty} \int_{A_j} w\,d\mu = 0$, so that we have by (ii-b) $\sup_{u\in\mathcal{F}} \int_{A_j} u\,d\mu \le \sup_{u\in\mathcal{F}} \int_{A_j} |u|\,d\mu < \epsilon$ for sufficiently large $j \in \mathbb{N}$.
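(v)⇒(i): Note that for the positive, resp. negative parts $u^\pm$ of $u$
\[ \int_{A_j} u^\pm\,d\mu = \int_{A_j \cap \{\pm u \ge 0\}} (\pm u)\,d\mu \quad\text{and}\quad A_j \cap \{\pm u \ge 0\} \downarrow \emptyset, \]
which implies that we may replace $u$ in (v-b) by $|u|$. Since $\mu$ is $\sigma$-finite, we can find an exhausting sequence $E_k \in \mathcal{A}$, $E_k \uparrow X$, $\mu(E_k) < \infty$. The function
\[ w := \sum_{k\in\mathbb{N}} \frac{2^{-k}}{1 + \mu(E_k)}\,1_{E_k} \]
is clearly positive and $\in \mathcal{L}^1_+(\mu)$. Assume (i) false; in particular,
\[ \exists\, \epsilon > 0\ \forall\, j \in \mathbb{N} :\ \sup_{u\in\mathcal{F}} \int_{\{|u| > j w\}} |u|\,d\mu > \epsilon. \]
But $A_j := \{|u| > j w\} \downarrow \emptyset$ and (v) (with the above discussed modification) will then lead to a contradiction.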
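Assume for the rest of the proof that $\mu$ is finite.
(iv)⇒(vi): is trivial.
(vi)⇒(vii): For $u \in \mathcal{F}$ we set $\alpha_n := \alpha_n(u) := \mu(\{|u| > n\})$ and define
\[ \Phi(t) := \int_{[0,t)} \phi(s)\,\lambda(ds), \qquad \phi(s) := \sum_{n=1}^\infty \gamma_n\,1_{[n,n+1)}(s). \]
We will now determine the numbers $\gamma_1, \gamma_2, \gamma_3, \ldots$. Clearly,
\[ \Phi(t) = \sum_{n=1}^\infty \gamma_n \int_{[0,t)} 1_{[n,n+1)}(s)\,\lambda(ds) = \sum_{n=1}^\infty \gamma_n\,\big[(t - n)^+ \wedge 1\big] \]
and
\[ \int \Phi(|u|)\,d\mu = \sum_{n=1}^\infty \gamma_n \int \big[(|u| - n)^+ \wedge 1\big]\,d\mu \le \sum_{n=1}^\infty \gamma_n\,\mu(\{|u| > n\}). \tag{16.6} \]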
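If we can construct $(\gamma_n)_{n\in\mathbb{N}}$ such that it increases to $\infty$ and (16.6) is finite (uniformly for all $u \in \mathcal{F}$), then we are done: $\phi(s)$ will increase to $\infty$, $\Phi(t)$ will be convex3 and satisfy
\[ \frac{\Phi(t)}{t} = \frac{1}{t} \int_{[0,t)} \phi(s)\,\lambda(ds) \ge \frac{1}{t} \int_{[t/2,t)} \phi(s)\,\lambda(ds) \ge \frac{1}{2}\,\phi\Big(\frac{t}{2}\Big) \uparrow \infty. \]
By assumption we can find an increasing sequence $(r_j)_{j\in\mathbb{N}} \subset \mathbb{N}$ such that $\lim_{j\to\infty} r_j = \infty$ and $\int_{\{|u| > r_j\}} |u|\,d\mu \le 2^{-j}$. Thus
\begin{align*}
\sum_{k=r_j}^\infty \mu(\{|u| > k\})
&= \sum_{k=r_j}^\infty \sum_{\ell=k}^\infty \mu(\{\ell < |u| \le \ell + 1\}) \\
&= \sum_{\ell=r_j}^\infty \sum_{k=r_j}^{\ell} \mu(\{\ell < |u| \le \ell + 1\}) \\
&\le \sum_{\ell=r_j}^\infty \ell\,\mu(\{\ell < |u| \le \ell + 1\}).
\end{align*}
Now sum the above inequality over $j = 1, 2, 3, \ldots$ to get
\begin{align*}
\sum_{j=1}^\infty \sum_{k=r_j}^\infty \mu(\{|u| > k\})
&\le \sum_{j=1}^\infty \sum_{\ell=r_j}^\infty \ell\,\mu(\{\ell < |u| \le \ell + 1\}) \\
&\le \sum_{j=1}^\infty \sum_{\ell=r_j}^\infty \int_{\{\ell < |u| \le \ell + 1\}} |u|\,d\mu \\
&= \sum_{j=1}^\infty \underbrace{\int_{\{|u| > r_j\}} |u|\,d\mu}_{\le 2^{-j} \text{ by assumption}} \le 1,
\end{align*}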
3 Usually one argues that $\Phi'' \ge 0$ a.e., but for this we need to know that the monotone function $\phi = \Phi'$ is almost everywhere differentiable – and this requires Lebesgue's differentiation theorem 19.20. Here is an alternative elementary argument: it is not hard to see that $\Phi : (a, b) \to \mathbb{R}$ is convex if, and only if, $\frac{\Phi(y) - \Phi(x)}{y - x} \le \frac{\Phi(z) - \Phi(x)}{z - x}$ holds for all $a < x < y < z < b$, use e.g. the technique of the proof of Lemma 12.13. Since $\Phi(x) = \int_0^x \phi(s)\,ds$ (by L13.12 and T11.8), this is the same as
\begin{align*}
\frac{1}{y - x} \int_x^y \phi(s)\,ds \le \frac{1}{z - x} \int_x^z \phi(s)\,ds
&\iff \frac{1}{y - x} \int_x^y \phi(s)\,ds \le \frac{1}{z - y} \int_y^z \phi(s)\,ds \\
&\iff \int_0^1 \phi(s(y - x) + x)\,ds \le \int_0^1 \phi(s(z - y) + y)\,ds.
\end{align*}
The latter inequality follows from the fact that $\phi$ is increasing and $s(y - x) + x \in [x, y]$ while $s(z - y) + y \in [y, z]$ for $0 \le s \le 1$.
and interchange the order of summation in the first double sum on the left:
\[
\sum_{j=1}^\infty \sum_{k=r_j}^\infty \mu(\{|u| > k\}) = \sum_{k=1}^\infty \underbrace{\Big( \sum_{j=1}^\infty 1_{[1,k]}(r_j) \Big)}_{=:\ \gamma_k} \mu(\{|u| > k\}) \le 1.
\]
This finishes the construction of the sequence $(\gamma_k)_{k\in\mathbb{N}}$.
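The construction just completed can be tried out numerically. The sketch below is an illustration under assumptions: the sequence r_j = 2^j (only j = 1, ..., 10, so the values are meaningful for t up to 1024) is a hypothetical choice, and the names `weight` and `Phi` are not from the text. `weight(k, r)` counts the indices j with r_j <= k, which is the coefficient of mu({|u| > k}) after interchanging the sums, and `Phi` evaluates the resulting piecewise linear function.

```python
def weight(k, r):
    """Number of indices j with r_j <= k, i.e. the sum over j of 1_[1,k](r_j)."""
    return sum(1 for rj in r if rj <= k)

def Phi(t, r):
    """Phi(t) = sum over n of weight(n) * min((t - n)^+, 1); only n < t contribute."""
    total = 0.0
    n = 1
    while n < t:
        total += weight(n, r) * min(t - n, 1)
        n += 1
    return total

r = [2**j for j in range(1, 11)]   # hypothetical choice r_j = 2^j, j = 1, ..., 10
for t in (4, 16, 64, 256):
    print(t, Phi(t, r), Phi(t, r) / t)   # the ratio Phi(t)/t keeps growing
```

On integers the increments Phi(n+1) - Phi(n) equal weight(n), which is non-decreasing in n; this is the discrete form of the convexity claimed for the function in (vii).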
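(vii)⇒(i): Since $\mu(X) < \infty$, constants are integrable and we may take $w_\epsilon(x) := r_\epsilon$ for all $x \in X$. Fix $\epsilon > 0$ and choose $r_\epsilon$ so big that $t^{-1}\,\Phi(t) > 1/\epsilon$ for all $t > r_\epsilon$. Then
\[ \int_{\{|u| > r_\epsilon\}} |u|\,d\mu \le \int_{\{|u| > r_\epsilon\}} \epsilon\,\Phi(|u|)\,d\mu \le \epsilon \int \Phi(|u|)\,d\mu, \]
and (i) follows.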
Problems
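16.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$. Prove that
\[ \lim_{k\to\infty} \mu\Big(\Big\{\sup_{j\ge k} |u_j| > \epsilon\Big\}\Big) = 0 \quad \forall\, \epsilon > 0 \implies u_j \xrightarrow{j\to\infty} 0 \text{ a.e.} \]
[Hint: $|u_j| \to 0$ a.e. if, and only if, $\mu\big(\bigcup_{j\ge k} \{|u_j| > \epsilon\}\big)$ is small for all $\epsilon > 0$ and big $k \ge k_\epsilon$.]
16.2. Show that for a sequence $(u_j)_{j\in\mathbb{N}}$ of measurable functions on a finite measure space
\[ \lim_{k\to\infty} \mu\Big(\Big\{\sup_{j\ge k} |u_j| > \epsilon\Big\}\Big) = \mu\Big(\limsup_{j\to\infty} \{|u_j| > \epsilon\}\Big) \quad \forall\, \epsilon > 0, \]
and combine this with Problem 16.1 to give a new criterion for a.e. convergence.
16.3. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$. Show that $u_j \xrightarrow{j\to\infty} u$ in measure if, and only if, $u_j - u_k \xrightarrow{j,k\to\infty} 0$ in measure.
16.4. Consider one-dimensional Lebesgue measure $\lambda$ on $([0, 1], \mathcal{B}[0, 1])$. Compare the convergence behaviour (a.e., $\mathcal{L}^p$, in measure) of the following sequences:
(i) $f_{n,j} := n\,1_{[(j-1)/n,\,j/n]}$, $n \in \mathbb{N}$, $1 \le j \le n$ run through in lexicographical order;
(ii) $g_n := n\,1_{(0,1/n)}$, $n \in \mathbb{N}$;
(iii) $h_n := a_n (1 - nx)^+$, $n \in \mathbb{N}$, $x \in [0, 1]$ and a sequence $(a_n)_{n\in\mathbb{N}} \subset \mathbb{R}_+$.
16.5. Let $(u_j)_{j\in\mathbb{N}}$, $(w_j)_{j\in\mathbb{N}}$ be two sequences of measurable functions on $(X, \mathcal{A}, \mu)$. Suppose that $u_j \xrightarrow{\mu} u$ and $w_j \xrightarrow{\mu} w$. Show that $a u_j + b w_j$, $a, b \in \mathbb{R}$, $\max\{u_j, w_j\}$, $\min\{u_j, w_j\}$ and $|u_j|$ converge in measure and find their limits.
16.6. Let $(X, \mathcal{A}, \mu)$ be a measure space which is not $\sigma$-finite. Construct an example of a sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$ which converges in measure but whose limit is not unique. Can this happen in a $\sigma$-finite measure space?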
[Hint: let $X_{\sigma f} := \bigcup \{F : \mu(F) < \infty\}$ be the $\sigma$-finite part of $X$. Show that $X \setminus X_{\sigma f} \ne \emptyset$, that every measurable $E \subset X \setminus X_{\sigma f}$ satisfies $\mu(E) = \infty$ and that we can change every limit of $(u_j)_{j\in\mathbb{N}}$ outside $X_{\sigma f}$.]
16.7. (i) Prove, without using Vitali's convergence theorem, the following
Theorem (Bounded convergence). Let $(X, \mathcal{A}, \mu)$ be a measure space, $A \in \mathcal{A}$ be a set with $\mu(A) < \infty$ and $(u_j)_{j\in\mathbb{N}}$ be a sequence of measurable functions. Suppose that all $u_j$ vanish on $A^c$, that $|u_j| \le C$ for all $j \in \mathbb{N}$ and some constant $C > 0$ and that $u_j \xrightarrow{\mu} u$. Then $L^1$-$\lim_j u_j = u$.
(ii) Use one-dimensional Lebesgue measure and the sequence $u_j = 1_{[j,j+1]}$ to show that the assumption $\mu(A) < \infty$ is really needed in (i).
(iii) As $L^1$-limit the function $u$ is unique but, as we have seen in Problem 16.6, this is not the case for limits in measure. Why does the uniqueness of the limit in (i) not contradict Problem 16.6?
16.8. Let ��� �� P� be a probability space. Define for two random variables X� Y
���X� Y � �= inf
{
� > 0 � P���X − Y � � � � � �}
(i) �� is a pseudo-metric on the space of random variables ����, i.e. �� satisfies
properties �d2�, �d3� of a metric, cf. Appendix B, Definition B.15.
(ii) A sequence �Xj �j∈� ⊂ ���� converges in probability to a random variable
X if, and only if, ���Xj � X�
j→�−−→ 0.
(iii) �� is a complete pseudo-metric on ����, i.e. every ��-Cauchy sequence
converges in probability to some limit in ����.
(iv) Show that
g��X� Y � �=
∫ �X − Y �
1 + �X − Y � dP and ���X� Y � �=
∫
��X − Y � ∧ 1� dP
are pseudo-metrics on ���� which have the same Cauchy sequences as ��.
16.9. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. Suppose that $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ satisfies $\mu(A_j) \xrightarrow{j\to\infty} 0$. Show that
\[ \lim_{j\to\infty} \int_{A_j} u\,d\mu = 0 \quad \forall\, u \in \mathcal{L}^1(\mu). \]
[Hint: use Vitali's convergence theorem 16.6.]
16.10. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_n)_{n\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$.
(i) Let $(x_n)_{n\in\mathbb{N}} \subset \mathbb{R}$. Show that $x_n \xrightarrow{n\to\infty} 0$ if, and only if, every subsequence $(x_{n_k})_{k\in\mathbb{N}}$ satisfies $x_{n_k} \xrightarrow{k\to\infty} 0$.
(ii) Show that $u_n \xrightarrow{\mu} u$ if, and only if, every subsequence $(u_{n_k})_{k\in\mathbb{N}}$ has a sub-subsequence $(\tilde{u}_{n_k})_{k\in\mathbb{N}}$ which converges a.e. to $u$ on every set $A \in \mathcal{A}$ of finite $\mu$-measure.
[Hint: use L16.4 for necessity. For sufficiency show that $\tilde{u}_{n_k} \to u$ in measure, hence the sequence of reals $\mu(A \cap \{|u_{n_k} - u| > \epsilon\})$ has a subsequence converging to 0; use (i) to conclude that $\mu(A \cap \{|u_n - u| > \epsilon\}) \to 0$.]
(iii) Use part (ii) to show that $u_n \xrightarrow{\mu} u$ entails that $\Phi \circ u_n \xrightarrow{\mu} \Phi \circ u$ for every continuous function $\Phi : \mathbb{R} \to \mathbb{R}$.
16.11. Let $\mathcal{F}$ and $\mathcal{G}$ be two families of uniformly integrable functions on an arbitrary measure space $(X, \mathcal{A}, \mu)$. Show that
(i) every finite collection of functions $\{f_1, \ldots, f_n\} \subset \mathcal{L}^1(\mu)$ is uniformly integrable.
(ii) $\mathcal{F} \cup \{f_1, \ldots, f_n\}$, $f_1, \ldots, f_n \in \mathcal{L}^1(\mu)$, is uniformly integrable.
(iii) $\mathcal{F} + \mathcal{G} := \{f + g : f \in \mathcal{F},\ g \in \mathcal{G}\}$ is uniformly integrable.
(iv) c.h.$(\mathcal{F}) := \{t f + (1 - t)\phi : f, \phi \in \mathcal{F},\ 0 \le t \le 1\}$ ('c.h.' stands for convex hull) is uniformly integrable.
(v) the closure of c.h.$(\mathcal{F})$ in the space $\mathcal{L}^1$ is uniformly integrable.
16.12. Assume that $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable. Show that
\[ \lim_{k\to\infty} \frac{1}{k} \int \sup_{j\le k} |u_j|\,d\mu = 0. \]
16.13. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Adapt the proof of Theorem 16.8 to show that a sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(P)$ is uniformly integrable if it is bounded in some space $\mathcal{L}^p(P)$ with $p > 1$, i.e. if $\sup_{j\in\mathbb{N}} \|u_j\|_p < \infty$.
Use Vitali's convergence theorem 16.6 to construct an example illustrating that $\mathcal{L}^1$-boundedness of $(u_j)_{j\in\mathbb{N}}$ does not guarantee uniform integrability.
16.14. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $\mathcal{F} \subset \mathcal{L}^1(\mu)$ be a family of integrable functions. Show that $\mathcal{F}$ is uniformly integrable if, and only if, $\sum_{j=1}^\infty j\,\mu(\{j < |f| \le j + 1\})$ converges uniformly for all $f \in \mathcal{F}$.
[Hint: compare (vi)⇒(vii) of the proof of Theorem 16.8.]
17
Martingales
Martingales are a key tool of modern probability theory, in particular, when
it comes to a.e. convergence assertions and related limit theorems. The ori-
gins of martingale techniques can be traced back to analysis papers by Kac,
Marcinkiewicz, Paley, Steinhaus, Wiener and Zygmund from the early 1930s on
independent (or orthogonal) functions and the convergence of certain series of
functions, see e.g. the paper by Marcinkiewicz and Zygmund [28] which contains
many references. The theory of martingales as we know it now goes back to
Doob and most of the material of this and the following chapter can be found in
his seminal monograph [13] from 1953.
We want to understand martingales as an analysis tool which will be useful
for the study of Lp- and almost everywhere convergence and, in particular, for
the further development of measure and integration theory. Our presentation
differs somewhat from the standard way to introduce martingales – conditional
expectations will be defined later in Chapter 22 – but the results and their proofs
are pretty much the usual ones. The only difference is that we develop the theory
for $\sigma$-finite measure spaces rather than just for probability spaces. Those readers
who are familiar with martingales and the language of conditional expectations
we ask for patience until Chapter 23, in particular Theorem 23.9, when we catch
up with these notions.
Throughout this chapter $(X, \mathcal{A}, \mu)$ is a measure space which admits a filtration, i.e. an increasing sequence
\[ \mathcal{A}_0 \subset \mathcal{A}_1 \subset \cdots \subset \mathcal{A}_j \subset \cdots \subset \mathcal{A} \]
of sub-$\sigma$-algebras of $\mathcal{A}$. If $(X, \mathcal{A}_0, \mu)$ is $\sigma$-finite1 we call $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ a $\sigma$-finite filtered measure space. This will always be the case from now on. Finally, we write $\mathcal{A}_\infty := \sigma(\mathcal{A}_j : j = 0, 1, 2, \ldots)$ for the smallest $\sigma$-algebra generated by all $\mathcal{A}_j$.
1 i.e. $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}_0$ with $A_j \uparrow X$ and $\mu(A_j) < \infty$.
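17.1 Definition Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. A sequence of $\mathcal{A}$-measurable functions $(u_j)_{j\in\mathbb{N}}$ is called a martingale (w.r.t. the filtration $(\mathcal{A}_j)_{j\in\mathbb{N}}$), if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ for each $j \in \mathbb{N}$ and if
\[ \int_A u_{j+1}\,d\mu = \int_A u_j\,d\mu \qquad \forall\, A \in \mathcal{A}_j. \tag{17.1} \]
We say that $(u_j)_{j\in\mathbb{N}}$ is a submartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and
\[ \int_A u_{j+1}\,d\mu \ge \int_A u_j\,d\mu \qquad \forall\, A \in \mathcal{A}_j, \tag{17.2} \]
and a supermartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and
\[ \int_A u_{j+1}\,d\mu \le \int_A u_j\,d\mu \qquad \forall\, A \in \mathcal{A}_j. \tag{17.3} \]
If we want to emphasize the underlying filtration, we write $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$.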
17.2 Remark (i) It is enough to assume instead of (17.1) that $\int_G u_{j+1}\,d\mu = \int_G u_j\,d\mu$ for all $G \in \mathcal{G}_j$ where $\mathcal{G}_j$ is a generator of $\mathcal{A}_j$ containing an exhausting sequence $(G_k)_{k\in\mathbb{N}} \subset \mathcal{G}_j$ with $G_k \uparrow X$. This follows from the fact that
\[
\int_A u_{j+1}\,d\mu = \int_A u_j\,d\mu \iff \underbrace{\int_A (u_{j+1}^+ + u_j^-)\,d\mu}_{=:\ \nu(A)} = \underbrace{\int_A (u_{j+1}^- + u_j^+)\,d\mu}_{=:\ \rho(A)}
\]
where $\nu, \rho$ are finite measures on $\mathcal{A}_j$ and from the uniqueness theorem 5.7: $\nu|_{\mathcal{G}_j} = \rho|_{\mathcal{G}_j}$ implies – under our assumptions on $\mathcal{G}_j$ – that $\nu = \rho$ on $\mathcal{A}_j$. (For sub- or supermartingales we need, in addition, that $\mathcal{G}_j$ is a semi-ring, cf. Lemma 15.6.)
(ii) Set $\mathcal{G}_j := \{A \in \mathcal{A}_j : \mu(A) < \infty\}$. It is not hard to see that $\mathcal{G}_j$ is a semi-ring and that, because of $\sigma$-finiteness, $\sigma(\mathcal{G}_j) = \mathcal{A}_j$. Therefore (i) means that it is enough to assume (17.1)–(17.3) for all sets in $\mathcal{G}_j$, i.e. for all sets with finite $\mu$-measure.
(iii) Condition (17.2) in Definition 17.1 is equivalent to
\[ \int \phi\,u_{j+1}\,d\mu \ge \int \phi\,u_j\,d\mu \qquad \forall\, \phi \in \mathcal{L}^\infty_+(\mathcal{A}_j). \tag{17.2′} \]
Indeed: Since $\phi := 1_A \in \mathcal{L}^\infty_+(\mathcal{A}_j)$ for all $A \in \mathcal{A}_j$, (17.2′) implies (17.2). Conversely, if $\phi \in \mathcal{E}_+(\mathcal{A}_j)$ is a simple function, (17.2′) follows from (17.2) by linearity. For general $\phi \in \mathcal{L}^\infty_+(\mathcal{A}_j)$, we find by T8.8 a sequence of $\mathcal{A}_j$-measurable simple functions $\phi_k$ such that $\phi_k \le \phi$ and $\phi_k \uparrow \phi$. Since $\phi\,u_j,\ \phi\,u_{j+1} \in \mathcal{L}^1(\mu)$, we can use Lebesgue's dominated convergence theorem 11.2 and get
\[ \int \phi\,u_{j+1}\,d\mu = \lim_{k\to\infty} \int \phi_k\,u_{j+1}\,d\mu \overset{(17.2′)}{\ge} \lim_{k\to\infty} \int \phi_k\,u_j\,d\mu = \int \phi\,u_j\,d\mu. \]
Similar statements hold for martingales (17.1) and supermartingales (17.3).
(iv) With some obvious (notational) changes in Definition 17.1 we can also consider other index sets such as $\mathbb{N}_0$, $\mathbb{Z}$ or $-\mathbb{N}$.
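17.3 Examples Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space.
(i) $(u_j)_{j\in\mathbb{N}}$ is a martingale if, and only if, it is both a sub- and a supermartingale.
(ii) $(u_j)_{j\in\mathbb{N}}$ is a supermartingale if, and only if, $(-u_j)_{j\in\mathbb{N}}$ is a submartingale.
(iii) Let $(u_j)_{j\in\mathbb{N}}$ and $(w_j)_{j\in\mathbb{N}}$ be [sub-]martingales and let $\alpha, \beta$ be [positive] real numbers. Then $(\alpha u_j + \beta w_j)_{j\in\mathbb{N}}$ is a [sub-]martingale.
(iv) Let $(u_j)_{j\in\mathbb{N}}$ be a submartingale. Then $(u_j^+)_{j\in\mathbb{N}}$ is a submartingale.
Indeed: Take $A \in \mathcal{A}_j$ and observe that $\{u_j \ge 0\} \in \mathcal{A}_j$. Then
\[
\int_A u_{j+1}^+\,d\mu \ge \int_{A \cap \{u_j \ge 0\}} u_{j+1}^+\,d\mu \ge \int_{A \cap \{u_j \ge 0\}} u_{j+1}\,d\mu \overset{(17.2)}{\ge} \int_{A \cap \{u_j \ge 0\}} u_j\,d\mu = \int_A u_j^+\,d\mu.
\]
(v) Let $(u_j)_{j\in\mathbb{N}}$ be a martingale. Then $(|u_j|)_{j\in\mathbb{N}}$ is a submartingale. This follows from $|u_j| = 2u_j^+ - u_j$, (iii) and (iv).
(vi) Let $(u_j)_{j\in\mathbb{N}}$ be a martingale. If $u_j \in \mathcal{L}^p(\mathcal{A}_j)$ for some $p \in [1, \infty)$, then $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale.
Indeed: Note that $|y|^p - |x|^p = \int_{|x|}^{|y|} p\,t^{p-1}\,dt \ge p\,|x|^{p-1}(|y| - |x|)$ for all $x, y \in \mathbb{R}$ where we set, as usual, $\int_x^y = -\int_y^x$ if $x > y$. If we take $y = u_{j+1}$ and $x = u_j$ and integrate over $A \in \mathcal{A}_j$, we find by dominated convergence T11.2
\begin{align*}
\int_A \big(|u_{j+1}|^p - |u_j|^p\big)\,d\mu
&\ge p \int (1_A |u_j|^{p-1})\,(|u_{j+1}| - |u_j|)\,d\mu \\
&= \lim_{N\to\infty} p \int \big[\underbrace{(1_A |u_j|^{p-1}) \wedge N}_{\in\, \mathcal{L}^\infty_+(\mathcal{A}_j)}\big]\,(|u_{j+1}| - |u_j|)\,d\mu \overset{(17.2′),(v)}{\ge} 0,
\end{align*}
since $(|u_j|)_{j\in\mathbb{N}}$ is, by (v), a submartingale.
(vii) Let $u_j \in \mathcal{L}^1(\mathcal{A}_j)$, $j \in \mathbb{N}$, and $u_1 \le u_2 \le u_3 \le \ldots$. Then $(u_j)_{j\in\mathbb{N}}$ is a submartingale.
(viii) Let $(X, \mathcal{A}, \mu) = \big([0, 1), \mathcal{B}[0, 1), \lambda := \lambda^1|_{[0,1)}\big)$ and consider the finite ($\sigma$-)algebras generated by all dyadic intervals of $[0, 1)$ of length $2^{-j}$, $j \in \mathbb{N}_0$:
\[ \mathcal{A}^\Delta_j := \sigma\big([0, 2^{-j}), \ldots, [k 2^{-j}, (k+1) 2^{-j}), \ldots, [(2^j - 1) 2^{-j}, 1)\big). \]
Obviously, $\mathcal{A}^\Delta_0 \subset \mathcal{A}^\Delta_1 \subset \cdots \subset \mathcal{B}[0, 1)$ and $([0, 1), \mathcal{B}[0, 1), \mathcal{A}^\Delta_j, \lambda)$ is a ($\sigma$-)finite filtered measure space. Then $(u_j)_{j\in\mathbb{N}_0}$, $u_j := 2^j\,1_{[0, 2^{-j})}$, is a martingale.
Indeed: Since the sets $[k 2^{-j}, (k+1) 2^{-j})$, $k = 0, 1, \ldots, 2^j - 1$ are a disjoint partition of $[0, 1)$, every $A \in \mathcal{A}^\Delta_j$ consists of a (finite) disjoint union of such sets. If $[0, 2^{-j}) \subset A$, we have
\[
\int_A u_{j+1}\,d\lambda = \int 2^{j+1}\,1_{A \cap [0, 2^{-(j+1)})}\,d\lambda = 2^{j+1}\,2^{-(j+1)} = 2^j\,2^{-j} = \int 2^j\,1_{A \cap [0, 2^{-j})}\,d\lambda = \int_A u_j\,d\lambda
\]
and, otherwise,
\[
\int_A u_{j+1}\,d\lambda = \int_A 2^{j+1}\,1_{[0, 2^{-(j+1)})}\,d\lambda = 0 = \int_A 2^j\,1_{[0, 2^{-j})}\,d\lambda = \int_A u_j\,d\lambda.
\]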
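(ix) Let $(X, \mathcal{A}, \mu) = \big([0, \infty)^n, \mathcal{B}([0, \infty)^n), \lambda = \lambda^n|_{[0,\infty)^n}\big)$ and consider the $\sigma$-algebras generated by the lattice of half-open dyadic squares of side-length $2^{-j}$, $j \in \mathbb{N}_0$,
\[ \mathcal{A}^\Delta_j := \sigma\big(z + [0, 2^{-j})^n : z \in 2^{-j}\mathbb{N}_0^n\big), \qquad j \in \mathbb{N}_0. \]
Then $\mathcal{A}^\Delta_0 \subset \mathcal{A}^\Delta_1 \subset \cdots \subset \mathcal{B}([0, \infty)^n)$, and $([0, \infty)^n, \mathcal{B}([0, \infty)^n), \mathcal{A}^\Delta_j, \lambda)$ is a $\sigma$-finite filtered measure space.
For every real-valued function $u \in \mathcal{L}^1([0, \infty)^n, \lambda)$ we can define an $\mathcal{A}^\Delta_j$-measurable step function $u_j$ on the dyadic squares in $\mathcal{A}^\Delta_j$ by
\begin{align*}
u_j(x) &:= \sum_{z \in 2^{-j}\mathbb{N}_0^n} \frac{\int_{z + [0, 2^{-j})^n} u\,d\lambda}{\lambda\big(z + [0, 2^{-j})^n\big)}\,1_{z + [0, 2^{-j})^n}(x) \\
&= \sum_{z \in 2^{-j}\mathbb{N}_0^n} \Big\{ \int u\,\frac{1_{z + [0, 2^{-j})^n}}{\lambda\big(z + [0, 2^{-j})^n\big)}\,d\lambda \Big\}\,1_{z + [0, 2^{-j})^n}(x). \tag{17.4}
\end{align*}
Then $(u_j, \mathcal{A}^\Delta_j)_{j\in\mathbb{N}}$ is a martingale.
Indeed: Since the sets $z + [0, 2^{-j})^n$ are disjoint for different $z \in 2^{-j}\mathbb{N}_0^n$, the sums in (17.4) are actually finite sums.
That $u_j \in \mathcal{L}^1(\mathcal{A}^\Delta_j)$ is clear from the construction. To see (17.1), fix $z' \in 2^{-j}\mathbb{N}_0^n$ and $j \in \mathbb{N}_0$ and observe that for all $k = j, j+1, j+2, \ldots$
\begin{align*}
\int_{z' + [0, 2^{-j})^n} u_k(x)\,\lambda(dx)
&= \sum_{z \in 2^{-k}\mathbb{N}_0^n} \Big\{ \int u\,\frac{1_{z + [0, 2^{-k})^n}}{\lambda\big(z + [0, 2^{-k})^n\big)}\,d\lambda \Big\} \cdot \int 1_{z + [0, 2^{-k})^n}\,1_{z' + [0, 2^{-j})^n}\,d\lambda \\
&= \sum_{\substack{z \in 2^{-k}\mathbb{N}_0^n \\ z + [0, 2^{-k})^n \subset z' + [0, 2^{-j})^n}} \int u\,\frac{1_{z + [0, 2^{-k})^n}}{\lambda\big(z + [0, 2^{-k})^n\big)}\,d\lambda \cdot \lambda\big(z + [0, 2^{-k})^n\big) \\
&= \sum_{\substack{z \in 2^{-k}\mathbb{N}_0^n \\ z + [0, 2^{-k})^n \subset z' + [0, 2^{-j})^n}} \int_{z + [0, 2^{-k})^n} u(x)\,\lambda(dx) \\
&= \int_{z' + [0, 2^{-j})^n} u(x)\,\lambda(dx).
\end{align*}
The r.h.s. is independent of $k$ and, therefore, we get
\[ \int_{z' + [0, 2^{-j})^n} u_j\,d\lambda = \int_{z' + [0, 2^{-j})^n} u\,d\lambda = \int_{z' + [0, 2^{-j})^n} u_{j+1}\,d\lambda. \]
Since $\mathcal{A}^\Delta_j$ is generated by (disjoint unions of) squares of the form $z' + [0, 2^{-j})^n$, $z' \in 2^{-j}\mathbb{N}_0^n$, the claim follows from Remark 17.2(i).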
(x) Assume that $(X, \mathcal{A}, \mu)$ is a probability space, i.e. a measure space where $\mu(X) = 1$. A family of real functions $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mathcal{A})$ is called independent, if
\[ \mu\Big( \bigcap_{j=1}^M u_j^{-1}(B_j) \Big) = \prod_{j=1}^M \mu\big(u_j^{-1}(B_j)\big) \tag{17.5} \]
holds for all $M \in \mathbb{N}$ and any choice of $B_1, B_2, \ldots, B_M \in \mathcal{B}(\mathbb{R})$. If $\mathcal{A}_k := \sigma(u_1, u_2, \ldots, u_k)$ is the $\sigma$-algebra generated by $u_1, u_2, \ldots, u_k$, then the sequence of partial sums
\[ s_k := u_1 + u_2 + \cdots + u_k, \qquad k \in \mathbb{N}, \]
is an $(\mathcal{A}_k)_{k\in\mathbb{N}}$-submartingale if, and only if, $\int u_j\,d\mu \ge 0$ for all $j$.
To see this we need an auxiliary result which is of some interest on its own: If $u_1, u_2, \ldots, u_{k+1}$ are independent integrable functions, then
\[ \int_A u_{k+1}\,d\mu = \mu(A) \int u_{k+1}\,d\mu \qquad \forall\, A \in \sigma(u_1, u_2, \ldots, u_k) \tag{17.6} \]
and
\[ \int \phi\,u_{k+1}\,d\mu = \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall\, \phi \in \mathcal{L}^1(\sigma(u_1, \ldots, u_k)). \tag{17.7} \]
In particular, integrable independent functions satisfy
\[ \int \prod_{j=1}^k u_j\,d\mu = \prod_{j=1}^k \int u_j\,d\mu. \]
The proof of (17.6) and (17.7) will be given in Scholium 17.4 below.
Returning to the original problem, we find for all $A \in \mathcal{A}_k$ that
\[
\int_A s_{k+1}\,d\mu = \int_A (s_k + u_{k+1})\,d\mu = \int_A s_k\,d\mu + \int_A u_{k+1}\,d\mu \overset{(17.6)}{=} \int_A s_k\,d\mu + \mu(A) \int u_{k+1}\,d\mu.
\]
Thus $\int u_{k+1}\,d\mu \ge 0$ is necessary and sufficient for $(s_k)_{k\in\mathbb{N}}$ to be a submartingale.
(xi) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1_+(\mathcal{A}) \cap \mathcal{L}^\infty_+(\mathcal{A})$ be independent functions (in the sense of (x)). Then $p_k := u_0 \cdot u_1 \cdot \ldots \cdot u_k$, $k \in \mathbb{N}$, is a submartingale w.r.t. the filtration $\mathcal{A}_k := \sigma(u_0, u_1, \ldots, u_k)$ if, and only if, $\int u_j\,d\mu \ge 1$ for all $j$. This follows directly from
\[
\int_A p_{k+1}\,d\mu = \int 1_A\,p_k\,u_{k+1}\,d\mu \overset{(17.7)}{=} \int 1_A\,p_k\,d\mu \cdot \int u_{k+1}\,d\mu = \int_A p_k\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall\, A \in \mathcal{A}_k.
\]
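The dyadic examples 17.3(viii) and (ix) lend themselves to an exact check with rational arithmetic. The sketch below is an illustration under simplifying assumptions: for (ix) it works on [0, 1) in dimension n = 1 and takes u to be a step function given by its values on 2^J equal cells, and all function names are hypothetical. Because every set of the level-j dyadic σ-algebra is a finite disjoint union of level-j intervals, it is enough to compare integrals over those intervals.

```python
from fractions import Fraction

def spike_integral(i, a, b):
    """Integral over [a, b), 0 <= a < b <= 1, of u_i = 2**i * 1_[0, 2**-i)."""
    overlap = max(min(b, Fraction(1, 2**i)) - a, Fraction(0))
    return 2**i * overlap

def spike_is_martingale(j):
    """Check (17.1) for Example (viii) on every interval [k 2**-j, (k+1) 2**-j)."""
    h = Fraction(1, 2**j)
    return all(spike_integral(j + 1, k * h, (k + 1) * h) == spike_integral(j, k * h, (k + 1) * h)
               for k in range(2**j))

def dyadic_averages(cells, j):
    """Values of u_j from (17.4) for n = 1: averages of the cell values over 2**j blocks."""
    block = len(cells) // 2**j
    return [sum(cells[i * block:(i + 1) * block], Fraction(0)) / block for i in range(2**j)]

def averages_are_martingale(cells, j):
    """Integral of u_{j+1} over each level-j interval equals the integral of u_j over it."""
    coarse, fine = dyadic_averages(cells, j), dyadic_averages(cells, j + 1)
    return all(coarse[i] * Fraction(1, 2**j) == (fine[2 * i] + fine[2 * i + 1]) * Fraction(1, 2**(j + 1))
               for i in range(2**j))

cells = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary sample values on 8 cells of length 1/8
print([spike_is_martingale(j) for j in range(5)])
print([averages_are_martingale(cells, j) for j in range(3)])
```

Testing the spike sequence on a finer interval such as [1/2, 1) with j = 0 shows why the sets must come from the level-j σ-algebra: there u_1 integrates to 0 while u_0 integrates to 1/2.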
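17.4 Scholium (on independent functions) (i) Let $u_1, u_2, \ldots, u_{k+1}$ be independent integrable functions on the probability space $(X, \mathcal{A}, \mu)$. Then
\[ \int_A u_{k+1}\,d\mu = \mu(A) \int u_{k+1}\,d\mu \qquad \forall\, A \in \sigma(u_1, u_2, \ldots, u_k) \tag{17.6} \]
and
\[ \int \phi\,u_{k+1}\,d\mu = \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall\, \phi \in \mathcal{L}^1(\sigma(u_1, \ldots, u_k)). \tag{17.7} \]
Proof. We begin with (17.6). Pick a set $A_M := \bigcap_{j=1}^M u_j^{-1}(B_j)$, $B_1, \ldots, B_M \in \mathcal{B}(\mathbb{R})$, $M \le k$, from the generator of $\mathcal{A}_k = \sigma(u_1, u_2, \ldots, u_k)$. Because of Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions $(f_\ell)_{\ell\in\mathbb{N}} \subset \mathcal{E}(\sigma(u_{k+1}))$ such that $|f_\ell| \le |u_{k+1}|$ and $\lim_{\ell\to\infty} f_\ell = u_{k+1}$. For the standard representations $f_\ell = \sum_{j=0}^{N(\ell)} y_j^\ell\,1_{H_j^\ell}$, $H_j^\ell \in \sigma(u_{k+1})$, we get using dominated convergence T11.2
\begin{align*}
\int_{A_M} u_{k+1}\,d\mu
&\overset{11.2}{=} \lim_{\ell\to\infty} \int_{A_M} \sum_{j=0}^{N(\ell)} y_j^\ell\,1_{H_j^\ell}\,d\mu \\
&= \lim_{\ell\to\infty} \sum_{j=0}^{N(\ell)} y_j^\ell\,\mu(A_M \cap H_j^\ell) \\
&\overset{(17.5)}{=} \lim_{\ell\to\infty} \sum_{j=0}^{N(\ell)} y_j^\ell\,\mu(A_M)\,\mu(H_j^\ell) \\
&\overset{11.2}{=} \mu(A_M) \int u_{k+1}\,d\mu,
\end{align*}
where we applied (17.5) for $H_j^\ell \in \sigma(u_{k+1})$ ($\iff H_j^\ell = u_{k+1}^{-1}(C_j^\ell)$ with some suitable $C_j^\ell \in \mathcal{B}(\mathbb{R})$) and $A_M$. This proves (17.6) for a generator of $\mathcal{A}_k$ which satisfies the conditions stated in Remark 17.2(i); a similar argument as the one in this remark now proves that (17.6) holds for all $A \in \mathcal{A}_k$.
For (17.7) let us first assume that $\phi$ is bounded. Set $\mathcal{A}_k := \sigma(u_1, \ldots, u_k)$. By Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions $(f_\ell)_{\ell\in\mathbb{N}} \subset \mathcal{E}(\mathcal{A}_k)$ such that $|f_\ell| \le |\phi|$ and $\lim_{\ell\to\infty} f_\ell = \phi$. For the standard representations $f_\ell = \sum_{j=0}^{N(\ell)} y_j^\ell\,1_{A_j^\ell}$, $A_j^\ell \in \mathcal{A}_k$, we get using dominated convergence T11.2 and (17.6)
\begin{align*}
\int \phi\,u_{k+1}\,d\mu
&\overset{11.2}{=} \lim_{\ell\to\infty} \int \sum_{j=0}^{N(\ell)} y_j^\ell\,1_{A_j^\ell}\,u_{k+1}\,d\mu \\
&= \lim_{\ell\to\infty} \sum_{j=0}^{N(\ell)} y_j^\ell\,\mu(A_j^\ell) \int u_{k+1}\,d\mu \\
&= \lim_{\ell\to\infty} \int \sum_{j=0}^{N(\ell)} y_j^\ell\,1_{A_j^\ell}\,d\mu \cdot \int u_{k+1}\,d\mu \\
&\overset{11.2}{=} \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu.
\end{align*}
If $\phi$ is integrable but not bounded, we apply the previous calculation to the bounded functions $|\phi|_\ell := |\phi| \wedge \ell$ and use dominated convergence on the right and monotone convergence on the left to get
\[
\int |\phi| \cdot |u_{k+1}|\,d\mu \overset{9.6}{=} \lim_{\ell\to\infty} \int |\phi|_\ell \cdot |u_{k+1}|\,d\mu = \lim_{\ell\to\infty} \int |\phi|_\ell\,d\mu \cdot \int |u_{k+1}|\,d\mu \overset{11.2}{=} \int |\phi|\,d\mu \cdot \int |u_{k+1}|\,d\mu.
\]
This shows, in particular, that $\phi\,u_{k+1} \in \mathcal{L}^1(\mathcal{A})$. We can therefore apply dominated convergence to $\phi_\ell := (-\ell) \vee \phi \wedge \ell$ to derive
\[
\int \phi\,u_{k+1}\,d\mu = \lim_{\ell\to\infty} \int \phi_\ell\,u_{k+1}\,d\mu = \lim_{\ell\to\infty} \int \phi_\ell\,d\mu \cdot \int u_{k+1}\,d\mu = \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu.
\]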
(ii) In Example 17.3(x) we assumed the existence of infinitely many independent functions. As a matter of fact, this is a not completely trivial matter. If we want to construct finitely many independent functions $u_1, u_2, \ldots, u_n$, we can proceed as follows. Replace the probability space $(X, \mathcal{A}, \mu)$ by the $n$-fold product measure space $(X^n, \mathcal{A}^{\otimes n}, \mu^{\times n})$ (which is again a probability space [✓]) and define $\tilde{u}_j(x_1, \ldots, x_n) := u_j(x_j)$ for $j = 1, 2, \ldots, n$. Since each of the new functions $\tilde{u}_j$ depends only on the variable $x_j$, their independence follows from a simple Fubini-type argument. A similar argument can be applied to countably many functions – provided we know how to construct infinite-dimensional products.
We will not follow this route but construct instead countably many independent functions $(X_j)_{j\in\mathbb{N}}$ on the probability space $([0, 1), \mathcal{B}[0, 1), \lambda := \lambda^1|_{[0,1)})$ which are identically distributed, i.e. the image measures satisfy $X_1(\lambda) = X_j(\lambda)$ for all $j \in \mathbb{N}$ with a Bernoulli distribution $X_1(\lambda) = p\,\delta_1 + (1 - p)\,\delta_0$, $p \in (0, 1)$.
Consider the interval map $\theta_p : [0, 1) \to [0, 1)$
\[ \theta_p(x) := \frac{x}{p}\,1_{[0,p)}(x) + \frac{x - p}{1 - p}\,1_{[p,1)}(x), \]
and its iterates $\theta_p^n := \underbrace{\theta_p \circ \cdots \circ \theta_p}_{n \text{ times}}$, see the pictures for the graphs of $\theta_p$ and $\theta_p^2$.
[Figure: graphs of $\theta_p$ and $\theta_p^2$ on $[0, 1)$, with axis ticks at $p^2$, $p$, $2p - p^2$ and $1$.]
Define
\[ X_n(x) := 1_{[0,p)}(\theta_p^{n-1}(x)), \qquad n \in \mathbb{N}. \]
In the first step the interval $[0, 1)$ is split according to $p : (1 - p)$ into two intervals $[0, p)$ and $[p, 1)$ and $X_1$ is 1 on the left segment and 0 on the right. The subsequent iterations split each of the intervals of the previous step – say, step $n - 1$ – into two new sub-intervals according to the ratio $p : (1 - p)$, and we define $X_n$ to be 1 on each new left sub-interval and 0 otherwise, see the picture for $n = 1, 2$. Thus $\lambda(\{X_n = 1\}) = p$ and $\lambda(\{X_n = 0\}) = 1 - p$, which means that the $X_n$ are identically Bernoulli distributed.
To see independence, fix $\epsilon_j \in \{0, 1\}$, and observe that $\{X_1 = \epsilon_1\} \cap \{X_2 = \epsilon_2\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\}$ exactly determines the segment before the $n$th split. Since each split preserves the proportion between $p$ and $1 - p$, we find
\[
\lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = 1\}\big) = \lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\}\big) \cdot p,
\]
so that
\[
\lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = \epsilon_n\}\big) = p^{\epsilon_1 + \cdots + \epsilon_n}\,(1 - p)^{n - \epsilon_1 - \cdots - \epsilon_n} = \prod_{j=1}^n \lambda(\{X_j = \epsilon_j\}).
\]
This shows that the $X_j$ are all independent.
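The splitting argument can be replayed with exact fractions. The following sketch is illustrative only; the function names `interval_map`, `X` and `cylinder` and the sample value p = 1/3 are assumptions. `cylinder` follows the verbal description (left part for the digit 1, right part for the digit 0), while `X` uses the iterated interval map, so the loop checks that both descriptions agree and that the lengths have the product form derived above.

```python
from fractions import Fraction
from itertools import product

def interval_map(x, p):
    """x/p on [0, p) and (x - p)/(1 - p) on [p, 1)."""
    return x / p if x < p else (x - p) / (1 - p)

def X(n, x, p):
    """X_n(x): indicator of [0, p) evaluated at the (n-1)-fold iterate of the interval map."""
    for _ in range(n - 1):
        x = interval_map(x, p)
    return 1 if x < p else 0

def cylinder(eps, p):
    """Interval [a, b) obtained by repeated splitting in the ratio p : (1 - p)."""
    a, b = Fraction(0), Fraction(1)
    for e in eps:
        split = a + p * (b - a)
        a, b = (a, split) if e == 1 else (split, b)
    return a, b

p = Fraction(1, 3)                       # sample parameter
for eps in product((0, 1), repeat=3):
    a, b = cylinder(eps, p)
    mid = (a + b) / 2
    digits = tuple(X(k, mid, p) for k in range(1, 4))
    print(eps, digits, b - a, p**sum(eps) * (1 - p)**(3 - sum(eps)))
```

Evaluating X at the midpoint is just a convenient way to pick one point of each interval.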
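For later reference purposes let us derive some formulae for the arithmetic means $\frac{1}{n} S_n := \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$. The mean value is
\[
\frac{1}{n} \int S_n\,d\lambda = \frac{1}{n} \int (X_1 + \cdots + X_n)\,d\lambda = \int X_1\,d\lambda = 1 \cdot p + 0 \cdot (1 - p) = p,
\]
while the variance is given by
\begin{align*}
\int \Big[\frac{1}{n}(S_n - np)\Big]^2\,d\lambda
&= \frac{1}{n^2} \int \Big( \sum_{j=1}^n (X_j - p) \Big)^2\,d\lambda \\
&= \frac{1}{n^2} \sum_{j,k=1}^n \int (X_j - p)(X_k - p)\,d\lambda \\
&= \frac{1}{n^2} \sum_{j=1}^n \int (X_j - p)^2\,d\lambda && \text{(independence)} \\
&= \frac{1}{n} \int (X_1 - p)^2\,d\lambda && \text{(identical distr.)} \\
&= \frac{1}{n} \big( (1 - p)^2 p + p^2 (1 - p) \big) \\
&= \frac{1}{n}\,p(1 - p).
\end{align*}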
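Both formulae can be confirmed by brute-force enumeration. The sketch below assumes only what was derived above, namely that the outcome (X_1, ..., X_n) = (e_1, ..., e_n) has probability p^(e_1+...+e_n) (1-p)^(n-e_1-...-e_n); the function name `mean_and_variance` is illustrative.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(n, p):
    """Mean of S_n / n and the integral of (S_n / n - p)^2, summed over all 0-1 outcomes."""
    mean = var = Fraction(0)
    for eps in product((0, 1), repeat=n):
        s = sum(eps)
        weight = p**s * (1 - p)**(n - s)     # probability of this outcome
        mean += weight * Fraction(s, n)
        var += weight * (Fraction(s, n) - p)**2
    return mean, var

p = Fraction(1, 3)
for n in (1, 2, 5):
    print(n, mean_and_variance(n, p), p * (1 - p) / n)
```

The enumeration has 2^n terms, so it is only meant for small n.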
In the next chapter we study the convergence behaviour of a martingale $(u_j)_{j\in\mathbb{N}}$; therefore, it is natural to ask questions of the type from which index $j$ onwards does $u_j(x)$ exceed a certain threshold, etc. This means that we must be able to admit indices $\tau$ which may depend on the argument $x$ of $u_j(x)$: $u_{\tau(x)}(x)$. The problem is measurability.
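17.5 Definition Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. A stopping time is a map $\tau : X \to \mathbb{N} \cup \{\infty\}$ which satisfies $\{\tau \le j\} \in \mathcal{A}_j$ for all $j \in \mathbb{N}$. The associated $\sigma$-algebra is given by
\[ \mathcal{A}_\tau := \big\{ A \in \mathcal{A} : A \cap \{\tau \le j\} \in \mathcal{A}_j \ \ \forall\, j \in \mathbb{N} \big\}. \]
As usual, we write $u_\tau(x)$ instead of the more correct $u_{\tau(x)}(x)$.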
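17.6 Lemma Let $\sigma, \tau$ be stopping times on a $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$.
(i) $\sigma \wedge \tau$, $\sigma \vee \tau$, $\sigma + k$, $k \in \mathbb{N}_0$ are stopping times.
(ii) $\{\sigma < \tau\} \in \mathcal{A}_\sigma \cap \mathcal{A}_\tau$ and $\mathcal{A}_\sigma \subset \mathcal{A}_\tau$ if $\sigma \le \tau$.
(iii) If $u_j$ is a sequence of real functions such that $u_j \in \mathcal{M}(\mathcal{A}_j)$, then $u_\sigma$ is $\mathcal{A}_\sigma / \mathcal{B}(\mathbb{R})$-measurable.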
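Proof (i) follows immediately from the identities
\begin{align*}
\{\sigma \wedge \tau \le j\} &= \{\sigma \le j\} \cup \{\tau \le j\} \in \mathcal{A}_j, \\
\{\sigma \vee \tau \le j\} &= \{\sigma \le j\} \cap \{\tau \le j\} \in \mathcal{A}_j, \\
\{\sigma + k \le j\} &= \{\sigma \le j - k\} \in \mathcal{A}_{(j-k) \vee 0} \subset \mathcal{A}_j.
\end{align*}
(ii) Since for all $j \in \mathbb{N}$
\begin{align*}
\{\sigma < \tau\} \cap \{\sigma \le j\}
&= \bigcup_{k=1}^j \{\sigma = k\} \cap \{k < \tau\} \\
&= \bigcup_{k=1}^j \underbrace{\{\sigma \le k\}}_{\in\, \mathcal{A}_k} \cap \underbrace{\{\sigma \le k - 1\}^c}_{\in\, \mathcal{A}_k} \cap \underbrace{\{\tau \le k\}^c}_{\in\, \mathcal{A}_k} \in \mathcal{A}_j,
\end{align*}
we find that $\{\sigma < \tau\} \in \mathcal{A}_\sigma$, while a similar calculation for $\{\sigma < \tau\} \cap \{\tau \le j\}$ yields $\{\sigma < \tau\} \in \mathcal{A}_\tau$.
If $\sigma \le \tau$ we find for $A \in \mathcal{A}_\sigma$
\[
A \cap \{\tau \le j\} = A \cap \underbrace{\{\sigma \le \tau\}}_{=\, X} \cap \{\tau \le j\} = \underbrace{A \cap \{\sigma \le j\}}_{\in\, \mathcal{A}_j} \cap \underbrace{\{\tau \le j\}}_{\in\, \mathcal{A}_j} \in \mathcal{A}_j,
\]
i.e. $A \in \mathcal{A}_\tau$, hence $\mathcal{A}_\sigma \subset \mathcal{A}_\tau$.
(iii) We have for all $B \in \mathcal{B}(\mathbb{R})$ and $j \in \mathbb{N} \cup \{\infty\}$
\begin{align*}
\{u_\sigma \in B\} \cap \{\sigma \le j\}
&= \bigcup_{k=1}^j \{u_k \in B\} \cap \{\sigma = k\} \\
&= \bigcup_{k=1}^j \underbrace{\{u_k \in B\}}_{\in\, \mathcal{A}_k} \cap \underbrace{\{\sigma \le k\}}_{\in\, \mathcal{A}_k} \cap \underbrace{\{\sigma \le k - 1\}^c}_{\in\, \mathcal{A}_k} \in \mathcal{A}_j.
\end{align*}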
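On a finite model the stopping time condition of Definition 17.5 can be tested mechanically. The sketch below assumes a hypothetical coin-tossing space {0, 1}^N in which the j-th σ-algebra is generated by the first j coordinates, so that a set belongs to it exactly when membership depends only on those coordinates; the value N + 1 stands in for ∞, and all names are illustrative. A first hitting time passes the test, the time of the last 1 (which looks into the future) does not.

```python
from itertools import product

def first_hit(omega, level):
    """First j with omega_1 + ... + omega_j >= level; len(omega) + 1 stands in for 'never'."""
    total = 0
    for j, x in enumerate(omega, start=1):
        total += x
        if total >= level:
            return j
    return len(omega) + 1

def last_one(omega):
    """Position of the last 1 in omega; len(omega) + 1 if there is none."""
    positions = [j for j, x in enumerate(omega, start=1) if x == 1]
    return positions[-1] if positions else len(omega) + 1

def is_stopping_time(tau, N):
    """Check that {tau <= j} depends only on the first j coordinates, for j = 1, ..., N."""
    for j in range(1, N + 1):
        seen = {}
        for omega in product((0, 1), repeat=N):
            flag = tau(omega) <= j
            if seen.setdefault(omega[:j], flag) != flag:
                return False
    return True

N = 4
print(is_stopping_time(lambda w: first_hit(w, 2), N))              # first hitting time
print(is_stopping_time(last_one, N))                               # needs the future
print(is_stopping_time(lambda w: min(first_hit(w, 2), 3), N))      # minimum, Lemma 17.6(i)
print(is_stopping_time(lambda w: first_hit(w, 2) + 1, N))          # shift, Lemma 17.6(i)
```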
The next result is a very useful characterization of (sub-)martingales.
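17.7 Theorem Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. For a sequence $(u_j)_{j\in\mathbb{N}}$, $u_j \in \mathcal{L}^1(\mathcal{A}_j)$, the following assertions are equivalent:
(i) $(u_j)_{j\in\mathbb{N}}$ is a submartingale;
(ii) $\int u_\sigma\,d\mu \le \int u_\tau\,d\mu$ for all bounded stopping times $\sigma \le \tau$;
(iii) $\int_A u_\sigma\,d\mu \le \int_A u_\tau\,d\mu$ for all bounded stopping times $\sigma \le \tau$ and $A \in \mathcal{A}_\sigma$.
Proof (i)⇒(ii): Let $\sigma \le \tau \le N$ be two stopping times. By Lemma 17.6 $u_\sigma$ is measurable, and since
\[ \int |u_\sigma|\,d\mu = \sum_{j=1}^N \int_{\{\sigma = j\}} |u_j|\,d\mu \le \sum_{j=1}^N \int |u_j|\,d\mu < \infty, \]
we find that $u_\sigma, u_\tau \in \mathcal{L}^1(X, \mathcal{A}, \mu)$.
Step 1: $\tau - \sigma \le 1$. In this case
\[ \{\sigma < \tau\} \cap \{\sigma = j\} = \{\tau > j\} \cap \{\sigma = j\} = \{\tau \le j\}^c \cap \{\sigma = j\} \in \mathcal{A}_j, \]
and we see
\begin{align*}
\int u_\sigma\,d\mu
&= \int_{\{\sigma = \tau\}} u_\sigma\,d\mu + \sum_{j=1}^{N-1} \int_{\{\sigma < \tau\} \cap \{\sigma = j\}} u_j\,d\mu \\
&\overset{(17.2)}{\le} \int_{\{\sigma = \tau\}} u_\sigma\,d\mu + \sum_{j=1}^{N-1} \int_{\{\sigma < \tau\} \cap \{\sigma = j\}} u_{j+1}\,d\mu \\
&= \int_{\{\sigma = \tau\}} u_\tau\,d\mu + \int_{\{\sigma < \tau\}} u_\tau\,d\mu && \text{(use } \tau - \sigma \le 1\text{)} \\
&= \int u_\tau\,d\mu.
\end{align*}
Step 2: if $\sigma \le \tau \le N$ we introduce (at most $N$) intermediate stopping times $\rho_j := (\sigma + j) \wedge \tau$, $j = 0, 1, 2, \ldots, k \le N$. For some $k \le N$ we get $\sigma = \rho_0 \le \rho_1 \le \ldots \le \rho_k = \tau$ while $\rho_{j+1} - \rho_j \le 1$. Repeating step 1 from above $k$ times yields
\[ \int u_\sigma\,d\mu = \int u_{\rho_0}\,d\mu \le \int u_{\rho_1}\,d\mu \le \ldots \le \int u_{\rho_k}\,d\mu = \int u_\tau\,d\mu. \]
(ii)⇒(iii): Note that for any $A \in \mathcal{A}_\sigma$ the function $\rho := \rho_A := \sigma\,1_A + \tau\,1_{A^c}$ is again a bounded stopping time. This follows from
\[ \{\rho \le j\} = \big(\{\sigma \le j\} \cap A\big) \cup \big(\{\tau \le j\} \cap A^c\big) \in \mathcal{A}_j, \qquad j \in \mathbb{N}, \]
where we used that $A \in \mathcal{A}_\sigma \subset \mathcal{A}_\tau$, cf. Lemma 17.6. Since $\rho \le \tau$, (ii) shows
\[ \int (u_\sigma\,1_A + u_\tau\,1_{A^c})\,d\mu = \int u_\rho\,d\mu \le \int u_\tau\,d\mu, \]
which is but $\int_A u_\sigma\,d\mu \le \int_A u_\tau\,d\mu$.
(iii)⇒(i): Take $\sigma = j$ and $\tau = j + 1$.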
17.8 Remark One should read Theorem 17.7(iii) in the following way:
Let $\tau_1 \le \tau_2 \le \dots \le \tau_k \le N$ be bounded stopping times. Then
\[
(u_j, \mathcal{A}_j)_{j \in \mathbb{N}} \text{ is a submartingale} \implies (u_{\tau_j}, \mathcal{A}_{\tau_j})_{j = 1, \dots, k} \text{ is a submartingale.}
\]
This statement is often called the optional sampling theorem.
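Theorem 17.7(ii) can be checked by hand in a small finite model. The sketch below assumes a fair $\pm 1$ coin-tossing walk $S_j$ on all $2^N$ paths (uniform measure), takes the submartingale $u_j = S_j^2$ and two bounded stopping times $\sigma \le \tau \le N$ (first passage above level 1 resp. 2, capped at $N$). The model and the function names are assumptions made for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

N = 6
PATHS = list(product((-1, 1), repeat=N))
WEIGHT = Fraction(1, len(PATHS))  # fair coin: uniform measure on the paths

def walk(path):
    """Return [S_1, ..., S_N]; S_j sits at index j - 1."""
    sums, total = [], 0
    for step in path:
        total += step
        sums.append(total)
    return sums

def capped_hit(path, level):
    """Bounded stopping time: first j with S_j >= level, capped at N."""
    for j, s in enumerate(walk(path), start=1):
        if s >= level:
            return j
    return N

def sigma(path):
    return capped_hit(path, 1)

def tau(path):
    return capped_hit(path, 2)

def integral_at(stop):
    """Integral of u_stop for the submartingale u_j = S_j ** 2."""
    return sum(WEIGHT * walk(p)[stop(p) - 1] ** 2 for p in PATHS)

print(integral_at(sigma), integral_at(tau), integral_at(lambda p: N))
```

The three printed integrals are non-decreasing from left to right, as (ii) predicts for $\sigma \le \tau \le N$.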
Problems
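Unless otherwise stated $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ will be a $\sigma$-finite filtered measure space.
17.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a martingale. Set
$\mathcal{A}_0 := \{\emptyset, X\}$. Show that $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}_0}$ is a martingale if, and only if, $u_0 = \int u_1 \, d\mu$.
17.2. Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a (sub-, super-)martingale and let $(\mathcal{B}_j)_{j \in \mathbb{N}}$ and $(\mathcal{C}_j)_{j \in \mathbb{N}}$ be
filtrations in $\mathcal{A}$ which are smaller resp. larger than $(\mathcal{A}_j)_{j \in \mathbb{N}}$, i.e. such that
$\mathcal{B}_j \subset \mathcal{A}_j \subset \mathcal{C}_j$.
(a) Show that $(u_j, \mathcal{B}_j)_{j \in \mathbb{N}}$ is again a (sub-, super-)martingale.
(b) Show that $(u_j, \mathcal{C}_j)_{j \in \mathbb{N}}$ is, in general, no longer a (sub-, super-)martingale.
17.3. Completion (7). Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a submartingale and denote by $\mathcal{A}_j^*$ the
completion of $\mathcal{A}_j$. Then $(u_j, \mathcal{A}_j^*)_{j \in \mathbb{N}}$ is still a submartingale.
17.4. Show that $(u_j)_{j \in \mathbb{N}}$ is a submartingale if, and only if, $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ for all $j \in \mathbb{N}$ and
\[
\int_A u_j \, d\mu \le \int_A u_k \, d\mu \quad \forall\, j < k, \ \forall\, A \in \mathcal{A}_j.
\]
Find similar statements for martingales and supermartingales.
17.5. Prove the assertion made in Remark 17.2(ii).
17.6. Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a martingale with $u_j \in \mathcal{L}^2(\mathcal{A}_j)$. Show that
\[
\int u_j u_k \, d\mu = \int u_{j \wedge k}^2 \, d\mu.
\]
[Hint: assume that $j < k$. Approximate $u_j$ by simple functions from $\mathcal{E}(\mathcal{A}_j)$, use
dominated convergence and (17.1).]
17.7. Martingale transform Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a martingale and let $(f_j)_{j \in \mathbb{N}}$ be a sequence
of bounded functions such that $f_j \in \mathcal{M}(\mathcal{A}_j)$ for every $j \in \mathbb{N}$. Set $f_0 := 0$ and
$u_0 := \int u_1 \, d\mu$. Then the so-called martingale transform
\[
(f \bullet u)_k := \sum_{j=1}^{k} f_{j-1} \cdot (u_j - u_{j-1}), \quad k \in \mathbb{N},
\]
is again a martingale w.r.t. $(\mathcal{A}_j)_{j \in \mathbb{N}}$.
17.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent
identically distributed random variables with $X_j \in \mathcal{L}^2(\mathcal{A})$ and $\int X_j \, dP = 0$. Set
$\mathcal{A}_j := \sigma(X_1, X_2, \dots, X_j)$.
(i) Show, without using Example 17.3(vi), that $S_n^2 := (X_1 + X_2 + \dots + X_n)^2$ is a
submartingale w.r.t. $(\mathcal{A}_n)_{n \in \mathbb{N}}$.
(ii) Show that there exists a constant $\kappa$ such that $S_n^2 - \kappa n$ is a martingale w.r.t. $(\mathcal{A}_n)_{n \in \mathbb{N}}$.
17.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent
random variables with $X_j \in \mathcal{L}^2(\mathcal{A})$, $\int X_j \, dP = 0$ and $\int X_j^2 \, dP = \sigma_j^2$. Set
$\mathcal{A}_j := \sigma(X_1, X_2, \dots, X_j)$ and $A_j := \sigma_1^2 + \dots + \sigma_j^2$. Show that
\[
M_n := S_n^2 - A_n = \Bigl( \sum_{j=1}^{n} X_j \Bigr)^2 - \sum_{j=1}^{n} \sigma_j^2
\]
is a martingale.
[Hint: use formulae (17.6), (17.7) and Remark 17.2(ii).]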
17.10. Martingale difference sequence Let $(d_j)_{j \in \mathbb{N}}$ be a sequence in $\mathcal{L}^2(\mathcal{A}) \cap \mathcal{L}^1(\mathcal{A})$ and
define $\mathcal{A}_0 := \{\emptyset, X\}$ and $\mathcal{A}_j := \sigma(d_1, d_2, \dots, d_j)$. Suppose that for each $j \in \mathbb{N}$
\[
\int_A d_j \, d\mu = 0 \quad \forall\, A \in \mathcal{A}_{j-1}.
\]
Show that $(u_n^2)_{n \in \mathbb{N}}$ where $u_n := d_1 + \dots + d_n$ is a submartingale which satisfies
\[
\int u_n^2 \, d\mu = \sum_{j=1}^{n} \int d_j^2 \, d\mu.
\]
Show that on $([0,1], \mathcal{B}[0,1], \lambda^1)$ the sequence $d_j(x) := \operatorname{sgn} \sin(2^j \pi x)$, $x \in [0,1]$, $j \in \mathbb{N}$, is
a martingale difference sequence. (See Chapter 24, in particular pp. 299 and 302
for more details.)
17.11. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent
identically Bernoulli $(p, 1-p)$-distributed random variables with values $\pm 1$,
i.e. such that $P(X_j = 1) = p$ and $P(X_j = -1) = 1 - p$ – this can be constructed as
in Scholium 17.4. Set $S_n := X_1 + \dots + X_n$. Then $\bigl(\frac{1-p}{p}\bigr)^{S_n}$ is a martingale w.r.t. the
filtration given by $\mathcal{A}_n := \sigma(X_1, \dots, X_n)$.
17.12. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, let $\nu$ be a further measure on $\mathcal{A}$ and
let $(A_{n,j})_{j \in \mathbb{N}} \subset \mathcal{A}$ be for each $n \in \mathbb{N}$ a sequence of mutually disjoint sets such that
$X = \dot{\bigcup}_{j \in \mathbb{N}} A_{n,j}$. Assume, moreover, that each set $A_{n,j}$ is the union of finitely many
sets from the sequence $(A_{n+1,k})_{k \in \mathbb{N}}$. Show that
(i) the $\sigma$-algebras $\mathcal{A}_n := \sigma(A_{n,j} : j \in \mathbb{N})$ form a filtration;
(ii) if $\mu(A_{n,j}) > 0$ for all $n, j \in \mathbb{N}$, then
\[
u_n := \sum_{j=1}^{\infty} \frac{\nu(A_{n,j})}{\mu(A_{n,j})} \, 1_{A_{n,j}}
\]
is a martingale w.r.t. $(\mathcal{A}_n)_{n \in \mathbb{N}}$.
17.13. Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a supermartingale and $u_j \ge 0$ a.e. Prove that $u_k = 0$ a.e. implies
that $u_{k+j} = 0$ a.e. for all $j \in \mathbb{N}$.
17.14. Verify that the family $\mathcal{A}_\tau$ defined in Definition 17.5 is indeed a $\sigma$-algebra.
17.15. Show that $\tau$ is a stopping time if, and only if, $\{\tau = j\} \in \mathcal{A}_j$ for all $j \in \mathbb{N}$.
17.16. Show that, in the notation of Lemma 17.6, $\mathcal{A}_{\sigma \wedge \tau} = \mathcal{A}_\sigma \cap \mathcal{A}_\tau$ for any two stopping
times $\sigma, \tau$.
18
Martingale convergence theorems
Throughout this chapter $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ is a $\sigma$-finite filtered measure space.
One of the foremost applications of martingales is to convergence theorems.
Let us begin with the following simple observation for a sequence $(u_j)_{j \in \mathbb{N}}$ of real
numbers. If $(u_j)_{j \in \mathbb{N}}$ has a limit $\ell = \lim_{j \to \infty} u_j$ and if we know that $\ell \in (a, b)$,
only finitely many of the $u_j$ can be outside of $(a, b)$. In particular, if infinitely
many $u_j$ are bigger than $b$ and infinitely many smaller than $a$, then the sequence
has no limit at all. We call any occurrence of
\[
u_j \le a \ \text{ and } \ u_{j+k} \ge b \quad (\text{for some } k \in \mathbb{N})
\]
an upcrossing of $[a, b]$ – the picture below shows three such upcrossings if
$j = 0, 1, \dots, N$ – and we have just observed that, if for some pair $a, b \in \mathbb{R}$, $a < b$,
\[
\#\bigl\{\text{upcrossings of } [a, b]\bigr\} = \infty \implies (u_j)_{j \in \mathbb{N}} \text{ has no limit.} \tag{18.1}
\]
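Counting upcrossings of a finite sequence is a simple two-state scan: wait until the sequence is $\le a$, then wait until it is $\ge b$, and repeat. The following is a minimal sketch of this bookkeeping; the function name `count_upcrossings` is an assumption made for illustration.

```python
def count_upcrossings(values, a, b):
    """Number of completed upcrossings of [a, b] by a finite sequence."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the sequence has dropped to <= a
    for u in values:
        if not below and u <= a:
            below = True          # start of a potential upcrossing
        elif below and u >= b:
            count += 1            # upcrossing completed
            below = False
    return count

oscillating = [(-1) ** j for j in range(10)]      # 1, -1, 1, -1, ...
convergent = [1 / j for j in range(1, 101)]       # decreases to 0
print(count_upcrossings(oscillating, -0.5, 0.5))
print(count_upcrossings(convergent, 0.2, 0.4))
```

For the oscillating sequence the count grows with the length of the sequence, in line with (18.1); for the convergent one it stays at zero.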
For a submartingale we can estimate the average number of upcrossings over any
interval:
[Figure: a path $(u_j)_{j = 0, 1, \dots, N}$ with three upcrossings of $[a, b]$; the levels $a$ and $b$ and the quantity $(u_N - a)^-$ are marked.]
18.1 Lemma (Doob’s upcrossing estimate) Let $(u_j)_{j \in \mathbb{N}}$ be a submartingale and
denote by $U([a, b]; N; x)$ the number of upcrossings of $(u_j(x))_{j \in \mathbb{N}}$ across $[a, b]$
which occur for $1 \le j \le N$. Then
\[
\int_A U([a, b]; N) \, d\mu \le \frac{1}{b - a} \int_A (u_N - a)^+ \, d\mu \quad \forall\, A \in \mathcal{A}_0.
\]
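Proof In order to keep track of the upcrossings we introduce the following
stopping times[✓], cf. Problem 18.1: $\tau_0 := 0$ and
\[
\sigma_k := \inf\{j > \tau_{k-1} : u_j \le a\} \wedge N, \qquad \tau_k := \inf\{j > \sigma_k : u_j \ge b\} \wedge N
\]
(as usual we set $\inf \emptyset = +\infty$). Then
\[
\tau_0 = 0 < \sigma_1 \le \tau_1 \le \sigma_2 \le \dots \le \sigma_N = \tau_N = N.
\]
By the very definition of an upcrossing we find
\[
(b - a) \, U([a, b]; N) \le \underbrace{(u_{\tau_1} - a)}_{\ge\, b - a} + \underbrace{(u_{\tau_2} - u_{\sigma_2})}_{\ge\, b - a} + \dots + (u_{\tau_N} - u_{\sigma_N}),
\]
and integrating both sides of this inequality over $A \in \mathcal{A}_0$ yields, after some simple
rearrangements,
\[
\begin{aligned}
(b - a) \int_A U([a, b]; N) \, d\mu
&\le - \int_A a \, d\mu + \overbrace{\int_A (u_{\tau_1} - u_{\sigma_2}) \, d\mu}^{\le\, 0} + \dots + \overbrace{\int_A (u_{\tau_{N-1}} - u_{\sigma_N}) \, d\mu}^{\le\, 0} + \int_A u_{\tau_N} \, d\mu \\
&\overset{17.7}{\le} \int_A (u_{\tau_N} - a) \, d\mu \le \int_A (u_{\tau_N} - a)^+ \, d\mu.
\end{aligned}
\]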
The upcrossing lemma is the basis for all martingale convergence theorems.
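18.2 Theorem (Submartingale convergence) Let $(u_j, \mathcal{A}_j)_{j \in \mathbb{N}}$ be a submartingale
on the $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$.
If $\sup_{j \in \mathbb{N}} \int u_j^+ \, d\mu < \infty$, then $u_\infty(x) := \lim_{j \to \infty} u_j(x)$ exists in $\mathbb{R}$ for almost all $x \in X$
and defines an $\mathcal{A}_\infty$-measurable function.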
Before we give the details of the proof, let us note some immediate conse-
quences.
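18.3 Corollary Under any of the following conditions the pointwise limit $\lim_{j \to \infty} u_j$
exists a.e. in $\mathbb{R}$:
(i) $(u_j)_{j \in \mathbb{N}}$ is a supermartingale and $\sup_{j \in \mathbb{N}} \int u_j^- \, d\mu < \infty$.
(ii) $(u_j)_{j \in \mathbb{N}}$ is a positive supermartingale.
(iii) $(u_j)_{j \in \mathbb{N}}$ is a martingale and $\sup_{j \in \mathbb{N}} \int |u_j| \, d\mu < \infty$.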
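Proof (of Theorem 18.2) In view of (18.1) we have
\[
\Bigl\{ x : \lim_{j \to \infty} u_j(x) \text{ does not exist} \Bigr\}
= \Bigl\{ x : \limsup_{j \to \infty} u_j(x) > \liminf_{j \to \infty} u_j(x) \Bigr\}
= \bigcup_{a \,\ldots} \ \ldots
\]
… $0 \ \ \exists\, w_\varepsilon \in \mathcal{L}^1_+(\mathcal{A}, \mu)$:
\[
\sup_{j \in \mathbb{N}} \int_{\{|u_j| > w_\varepsilon\}} |u_j| \, d\mu < \varepsilon.
\]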
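18.6 Theorem (Convergence of UI submartingales) Let $(u_j)_{j \in \mathbb{N}}$ be a submartingale
on the $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$. Then the following
assertions are equivalent:
(i) $u_\infty(x) = \lim_{j \to \infty} u_j(x)$ exists a.e., $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty, \mu)$,
$\lim_{j \to \infty} \int u_j \, d\mu = \int u_\infty \, d\mu$, and $(u_j)_{j \in \mathbb{N} \cup \{\infty\}}$ is a submartingale.
(ii) $(u_j)_{j \in \mathbb{N}}$ is uniformly integrable.
(iii) $(u_j)_{j \in \mathbb{N}}$ converges in $\mathcal{L}^1(\mathcal{A}_\infty)$.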
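Proof (i)⇒(ii): Since $\mu|_{\mathcal{A}_0}$ is $\sigma$-finite, we can fix an exhausting sequence
$(A_k)_{k \in \mathbb{N}} \subset \mathcal{A}_0$ with $A_k \uparrow X$ and $\mu(A_k) < \infty$. It is not hard to see that the
function $w := \sum_{k=1}^{\infty} 2^{-k} (1 + \mu(A_k))^{-1} 1_{A_k}$ is strictly positive, $w > 0$, and
integrable, $w \in \mathcal{L}^1(\mathcal{A}_0, \mu)$. Because of $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty, \mu)$, we find for every $\varepsilon > 0$
some $\kappa > 0$ and some $N \in \mathbb{N}$ such that
\[
\int_{\{u_\infty^+ > \kappa\}} u_\infty^+ \, d\mu + \int_{A_j^c} u_\infty^+ \, d\mu < \varepsilon \quad \text{for all } j \ge N.
\]
Example 17.3(iv) shows that $(u_j^+)_{j \in \mathbb{N} \cup \{\infty\}}$ is still a submartingale, so that
for every $L > 0$
\[
\begin{aligned}
\int_{\{u_j^+ > L w\}} u_j^+ \, d\mu
&\le \int_{\{u_j^+ > L w\}} u_\infty^+ \, d\mu \\
&\le \int_{\{u_j^+ > L w\} \cap \{u_\infty^+ \le \kappa\} \cap A_N} u_\infty^+ \, d\mu + \int_{\{u_\infty^+ > \kappa\} \cup A_N^c} u_\infty^+ \, d\mu \\
&\le \kappa \, \mu\bigl( \{u_j^+ > L w\} \cap A_N \bigr) + \varepsilon \\
&\le \kappa \, \mu\bigl( \{u_j^+ > L \, 2^{-N} (1 + \mu(A_N))^{-1}\} \bigr) + \varepsilon,
\end{aligned}
\]
where we used that $w \ge 2^{-N} (1 + \mu(A_N))^{-1}$ on $A_N$. The Markov inequality
P10.12 and the submartingale property imply
\[
\sup_{j \in \mathbb{N}} \int_{\{u_j^+ > L w\}} u_j^+ \, d\mu
\le \kappa \, \frac{2^N (1 + \mu(A_N))}{L} \sup_{j \in \mathbb{N}} \int u_j^+ \, d\mu + \varepsilon
\le \kappa \, \frac{2^N (1 + \mu(A_N))}{L} \int u_\infty^+ \, d\mu + \varepsilon.
\]
Since we may choose $L > 0$ arbitrarily large, we have found that $(u_j^+)_{j \in \mathbb{N}}$ is
uniformly integrable. From $\lim_{j \to \infty} u_j = u_\infty$ a.e., we conclude $\lim_{j \to \infty} u_j^+ = u_\infty^+$,
and Vitali’s convergence theorem 16.6 shows that $\lim_{j \to \infty} \int u_j^+ \, d\mu = \int u_\infty^+ \, d\mu$.
Thus
\[
\int |u_j| \, d\mu = \int (2 u_j^+ - u_j) \, d\mu \xrightarrow{j \to \infty} \int (2 u_\infty^+ - u_\infty) \, d\mu = \int |u_\infty| \, d\mu,
\]
and another application of Vitali’s theorem proves that $(u_j)_{j \in \mathbb{N}}$ is uniformly
integrable.
(ii)⇒(iii): Because of uniform integrability we have for some $\varepsilon > 0$ and a
suitable $w_\varepsilon \in \mathcal{L}^1(\mathcal{A})$
\[
\int |u_j| \, d\mu = \int_{\{|u_j| > w_\varepsilon\}} |u_j| \, d\mu + \int_{\{|u_j| \le w_\varepsilon\}} |u_j| \, d\mu \le \varepsilon + \int w_\varepsilon \, d\mu < \infty,
\]
and the martingale convergence theorem 18.2 guarantees that the pointwise limit
$u_\infty = \lim_{j \to \infty} u_j$ exists a.e.; $\mathcal{L}^1$-convergence follows from Vitali’s convergence
theorem 16.6.
(iii)⇒(i): Since $\mathcal{L}^1$-$\lim_{j \to \infty} u_j = u$ exists we find (e.g. as in Theorem 12.10)
that $\sup_{j \in \mathbb{N}} \int |u_j| \, d\mu < \infty$. By the martingale convergence theorem 18.2, the
pointwise limit $u_\infty = \lim_{j \to \infty} u_j$ exists a.e. On the other hand, by Corollary 12.8,
$u = \lim_{k \to \infty} u_{j(k)}$ a.e. for some subsequence. This implies that $u = u_\infty$ a.e. and, in
particular, that $u_\infty = \mathcal{L}^1$-$\lim_{j \to \infty} u_j$; this entails $\lim_{j \to \infty} \int_A u_j \, d\mu = \int_A u_\infty \, d\mu$ for
all $A \in \mathcal{A}$.[✓] Since $(u_j)_{j \in \mathbb{N}}$ is a submartingale, we find for all $k > j$ and $A \in \mathcal{A}_j$
\[
\int_A u_j \, d\mu \le \int_A u_k \, d\mu \xrightarrow{k \to \infty} \int_A u_\infty \, d\mu,
\]
so that $(u_j)_{j \in \mathbb{N} \cup \{\infty\}}$ is also a submartingale.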
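Again, $\mathcal{L}^1$-convergence of backwards (sub-)martingales holds under much
weaker assumptions.
18.7 Theorem Let $(w_\ell, \mathcal{A}_\ell)_{\ell \in -\mathbb{N}}$ be a backwards submartingale and assume that
$\mu|_{\mathcal{A}_{-\infty}}$ is $\sigma$-finite. Then
(i) $\lim_{j \to +\infty} w_{-j} = w_{-\infty} \in [-\infty, \infty)$ exists a.e.
(ii) $\mathcal{L}^1$-$\lim_{j \to +\infty} w_{-j} = w_{-\infty}$ if, and only if, $\inf_{j \in \mathbb{N}} \int w_{-j} \, d\mu > -\infty$. In this case,
$(w_\ell, \mathcal{A}_\ell)_{\ell \in -\mathbb{N} \cup \{-\infty\}}$ is a submartingale and $w_{-\infty}$ is a.e. real-valued.
For a backwards martingale, the condition in (ii) is automatically satisfied.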
Proof Part (i) has already been proved in Corollary 18.5. For (ii) we start with
the observation that for a backwards submartingale
\[
\sup_{j \in \mathbb{N}} \int |w_{-j}| \, d\mu < \infty \iff \inf_{j \in \mathbb{N}} \int w_{-j} \, d\mu > -\infty \iff \lim_{j \to +\infty} \int w_{-j} \, d\mu \in \mathbb{R}.
\]
Indeed: the second equivalence follows from the submartingale property,
\[
\int w_{-j-1} \, d\mu \le \int w_{-j} \, d\mu \le \int w_{-1} \, d\mu,
\]
while ‘⇐’ of the first equivalence derives from the fact that $(w_\ell^+)_{\ell \in -\mathbb{N}}$ is again a
submartingale, cf. Example 17.3(iv), and
\[
\int |w_{-j}| \, d\mu = \int (2 w_{-j}^+ - w_{-j}) \, d\mu \le 2 \int w_{-1}^+ \, d\mu - \int w_{-j} \, d\mu;
\]
the other direction ‘⇒’ is obvious. With exactly the same reasoning which
was used in the proof of T18.6, (i)⇒(ii), we can now show that $(w_\ell^+)_{\ell \in -\mathbb{N}}$ and
$(w_\ell)_{\ell \in -\mathbb{N}}$ are uniformly integrable (of course, the function $w_\varepsilon$ used as a bound
for uniform integrability is now $\mathcal{A}_{-\infty}$-measurable). The submartingale property
of $(w_\ell)_{\ell \in -\mathbb{N} \cup \{-\infty\}}$ follows literally with the same arguments as the corresponding
assertion in (iii)⇒(i) of T18.6.
We close this chapter with a simple but far-reaching application of the (back-
wards) martingale convergence theorem.
18.8 Example (Kolmogorov’s strong law of large numbers) For every sequence
$(X_j)_{j \in \mathbb{N}}$ of identically distributed independent random variables on the probability
space $(\Omega, \mathcal{A}, P)$ – that is, all $X_j : \Omega \to \mathbb{R}$ are measurable, independent functions
(in the sense of Example 17.3(x) and Scholium 17.4) such that $X_j(P) = X_1(P)$
for all $j \in \mathbb{N}$ – the strong law of large numbers holds, i.e. the limit
\[
\lim_{n \to \infty} \frac{1}{n} \bigl( X_1(\omega) + \dots + X_n(\omega) \bigr)
\]
exists and is finite for a.e. $\omega \in \Omega$ if, and only if, the $X_j$ are integrable. If this is
the case, the above limit is given by $\int X_1 \, dP$.
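The statement can be watched numerically. The following is a minimal sketch, assuming (purely for illustration) that the $X_j$ are uniformly distributed on $[0, 1)$, so that $\int X_1 \, dP = 1/2$; the helper name `running_means` and the fixed seed are choices made here, not part of the text. A simulation only illustrates the a.e. convergence, it does not prove it.

```python
import random

def running_means(sample, n):
    """Return the means (X_1 + ... + X_k) / k for k = 1, ..., n."""
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += sample()      # draw X_k
        means.append(total / k)
    return means

rng = random.Random(0)         # fixed seed so that the run is reproducible
means = running_means(rng.random, 100_000)   # X_j uniform on [0, 1), mean 1/2

for k in (10, 1_000, 100_000):
    # distance of the k-th running mean from the integral of X_1
    print(k, abs(means[k - 1] - 0.5))
```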
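Sufficiency: Suppose the $X_j$ are integrable. Then $Y_j := X_j - \int X_j \, dP$ are again
independent identically distributed random variables with zero mean: $\int Y_j \, dP = 0$.
Set
\[
S_n := Y_1 + Y_2 + \dots + Y_n \quad \text{and} \quad \mathcal{A}_{-n} := \sigma(S_n, S_{n+1}, S_{n+2}, \dots).
\]
Then $\bigl( \frac{1}{n} S_n, \mathcal{A}_{-n} \bigr)_{n \in \mathbb{N}}$ is a backwards martingale. In fact, any function of
$(Y_1, Y_2, \dots, Y_n, S_n)$ is independent of $(Y_{n+1}, Y_{n+2}, \dots)$, and (17.6) yields for every set of the
form $A = \bigcap_{j=1}^{N} \{Y_{n+j} \in B_j\} \cap \{S_n \in B_0\}$, $B_0, \dots, B_N \in \mathcal{B}(\mathbb{R})$, $N \in \mathbb{N}$, and all
$k = 1, 2, \dots, n$
\[
\begin{aligned}
\int_A Y_k \, dP
&= \int_{\bigcap_{j=1}^{N} \{Y_{n+j} \in B_j\}} 1_{\{S_n \in B_0\}} Y_k \, dP \\
&= \int_{\{S_n \in B_0\}} Y_k \, dP \cdot P\Bigl( \bigcap_{j=1}^{N} \{Y_{n+j} \in B_j\} \Bigr) \qquad (\text{by (17.6)}) \\
&= \int_{\{S_n \in B_0\}} Y_1 \, dP \cdot P\Bigl( \bigcap_{j=1}^{N} \{Y_{n+j} \in B_j\} \Bigr),
\end{aligned}
\]
noting that the $Y_k$ are identically distributed. Summing over $k = 1, \dots, n$ gives
\[
\int_A S_n \, dP = n \int_{\{S_n \in B_0\}} Y_1 \, dP \cdot P\Bigl( \bigcap_{j=1}^{N} \{Y_{n+j} \in B_j\} \Bigr) = n \int_A Y_1 \, dP.
\]
This means that $\int_A Y_1 \, dP = \int_A \frac{1}{n} S_n \, dP$ for all $n \in \mathbb{N}$ and all sets $A$ from a generator
of $\mathcal{A}_{-n}$ which clearly satisfies the conditions of Remark 17.2(i), proving that
$\bigl( \frac{1}{n} S_n, \mathcal{A}_{-n} \bigr)_{n \in \mathbb{N}}$ is a backwards martingale. Theorem 18.7 now guarantees that
\[
L := \lim_{n \to \infty} \frac{S_n}{n} = \lim_{n \to \infty} \frac{S_{n^2}}{n^2} \quad \text{exists a.e. and in } \mathcal{L}^1.
\]
It remains to show that $L = 0$ a.e. Note that $\lim_{n \to \infty} S_n / n^2 = 0$ a.e.; since $e^{-|x|} \le 1$
and since constants are integrable, the dominated convergence theorem 11.2 and
independence (17.7) show
\[
\begin{aligned}
\int \bigl[ e^{-|L|} \bigr]^2 \, dP
&= \int \lim_{n \to \infty} \Bigl[ \exp\Bigl( -\Bigl| \frac{S_n}{n} \Bigr| \Bigr) \exp\Bigl( -\Bigl| \frac{S_{n^2} - S_n}{n^2} \Bigr| \Bigr) \Bigr] dP \\
&= \lim_{n \to \infty} \int \exp\Bigl( -\Bigl| \frac{S_n}{n} \Bigr| \Bigr) \exp\Bigl( -\Bigl| \frac{S_{n^2} - S_n}{n^2} \Bigr| \Bigr) dP \\
&= \lim_{n \to \infty} \Bigl( \int \exp\Bigl( -\Bigl| \frac{S_n}{n} \Bigr| \Bigr) dP \int \exp\Bigl( -\Bigl| \frac{S_{n^2} - S_n}{n^2} \Bigr| \Bigr) dP \Bigr) \\
&= \Bigl( \int e^{-|L|} \, dP \Bigr)^2.
\end{aligned}
\]
Thus
\[
\int \Bigl( e^{-|L|} - \int e^{-|L|} \, dP \Bigr)^2 dP = \int \bigl[ e^{-|L|} \bigr]^2 \, dP - \Bigl( \int e^{-|L|} \, dP \Bigr)^2 = 0,
\]
and we conclude with Theorem 10.9(i) that $e^{-|L|} = \int e^{-|L|} \, dP$ a.e.; as a
consequence, $L$ is almost everywhere constant. Using $L = \mathcal{L}^1$-$\lim_{n \to \infty} S_n / n$, we get
\[
L = \int L \, dP = \lim_{n \to \infty} \underbrace{\int \frac{S_n}{n} \, dP}_{= 0} = 0 \quad \text{a.e.}
\]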
Necessity: Suppose the a.e. limit $L = \lim_{n \to \infty} \frac{1}{n} (X_1(\omega) + \dots + X_n(\omega))$ exists
and is finite. If all $X_j$ were positive, we could argue as follows: the truncated
random variables $X_j^c := X_j \wedge c$ are still independent and identically distributed.
Since they are also integrable, the sufficiency direction of Kolmogorov’s law
shows that for all $c > 0$
\[
\int X_1^c \, dP = \lim_{n \to \infty} \frac{X_1^c + \dots + X_n^c}{n} \le \lim_{n \to \infty} \frac{X_1 + \dots + X_n}{n} = L.
\]
Letting $c \to \infty$, Beppo Levi’s theorem 9.6 proves $\int X_1 \, dP < \infty$.
Such a simple argument is not available in the general case. For this we need
the converse or ‘difficult’ half of the Borel–Cantelli lemma (cf. Problem 6.9).
18.9 Theorem (Borel–Cantelli) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$. Then
\[
\sum_{j=1}^{\infty} P(A_j) < \infty \implies P\bigl( \limsup_{j \to \infty} A_j \bigr) = 0;
\]
if the sets $A_j$ are pairwise independent,¹ then
\[
\sum_{j=1}^{\infty} P(A_j) = \infty \implies P\bigl( \limsup_{j \to \infty} A_j \bigr) = 1.
\]
Proof Recall that $\limsup_j A_j = \bigcap_k \bigcup_{j \ge k} A_j$. Thus $\omega \in \limsup_j A_j$ if, and only
if, $\omega$ appears in infinitely many of the $A_j$. This shows that $\limsup_j A_j = \bigl\{ \sum_{j=1}^{\infty} 1_{A_j} = \infty \bigr\}$.
The first of the two implications follows thus: by the Beppo Levi theorem for
series C9.9, we see
\[
\int \sum_{j=1}^{\infty} 1_{A_j} \, dP = \sum_{j=1}^{\infty} \int 1_{A_j} \, dP = \sum_{j=1}^{\infty} P(A_j) < \infty.
\]
Corollary 10.13 then shows $\sum_{j=1}^{\infty} 1_{A_j} < \infty$ a.e., and $P(\limsup_{j \to \infty} A_j) = 0$ follows.
¹ i.e. $P(A_j \cap A_k) = P(A_j) P(A_k)$ for all $j \ne k$.
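For the second implication we set $S_n := \sum_{j=1}^{n} 1_{A_j}$ and $S := \sum_{j=1}^{\infty} 1_{A_j}$. Then
$m_n := \int S_n \, dP = \sum_{j=1}^{n} P(A_j)$ and, by pairwise independence,
\[
\begin{aligned}
\int (S_n - m_n)^2 \, dP
&= \sum_{j,k=1}^{n} \int (1_{A_j} - P(A_j)) (1_{A_k} - P(A_k)) \, dP \\
&= \sum_{j=1}^{n} \int (1_{A_j} - P(A_j))^2 \, dP \\
&= \sum_{j=1}^{n} P(A_j) (1 - P(A_j)) \le m_n.
\end{aligned}
\]
Since $S_n \le S$, we can use Markov’s inequality P10.12 to get
\[
\begin{aligned}
P\bigl( S \le \tfrac{1}{2} m_n \bigr)
&\le P\bigl( S_n \le \tfrac{1}{2} m_n \bigr) = P\bigl( S_n - m_n \le -\tfrac{1}{2} m_n \bigr) \\
&\le P\bigl( |S_n - m_n| \ge \tfrac{1}{2} m_n \bigr) = P\bigl( |S_n - m_n|^2 \ge \tfrac{1}{4} m_n^2 \bigr) \\
&\le \frac{4}{m_n^2} \int |S_n - m_n|^2 \, dP \le \frac{4}{m_n}.
\end{aligned}
\]
By assumption $m_n \xrightarrow{n \to \infty} \infty$, hence $P(S < \infty) = \lim_{n \to \infty} P(S \le \frac{1}{2} m_n) = 0$.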
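The chain of estimates above can be checked numerically for concrete probabilities. The sketch below assumes fully independent events (stronger than the pairwise independence needed in the proof) with $P(A_j) = 1/j$, so that $m_n \to \infty$; it computes the exact distribution of $S_n$ by a standard convolution and compares $P(S_n \le \frac{1}{2} m_n)$ with the bound $4 / m_n$. The function names are illustrative assumptions.

```python
def count_distribution(probs):
    """Distribution of S_n = 1_{A_1} + ... + 1_{A_n} for independent events."""
    dist = [1.0]  # dist[k] = P(S = k); start with S_0 = 0
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # the next event does not occur
            nxt[k + 1] += mass * p     # the next event occurs
        dist = nxt
    return dist

def lower_tail_and_bound(probs):
    """Return (P(S_n <= m_n / 2), 4 / m_n) where m_n is the mean of S_n."""
    m = sum(probs)
    dist = count_distribution(probs)
    tail = sum(mass for k, mass in enumerate(dist) if k <= m / 2)
    return tail, 4 / m

for n in (10, 100, 1000):
    probs = [1 / j for j in range(1, n + 1)]  # harmonic series: m_n diverges
    print(n, lower_tail_and_bound(probs))
```

Since $m_n$ grows only logarithmically here, the bound $4/m_n$ decays slowly; the computed tail probability always stays below it.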
18.8 Example (continued) We can now continue with the proof of the necessity
part of Kolmogorov’s strong law of large numbers. Since the a.e. limit exists,
we get
\[
\frac{X_n}{n} = \frac{S_n}{n} - \frac{n - 1}{n} \, \frac{S_{n-1}}{n - 1} \xrightarrow{n \to \infty} 0,
\]
which shows that $\omega \in A_n := \{|X_n| > n\}$ happens only for finitely many $n$. In other
words, $P\bigl( \sum_{j=1}^{\infty} 1_{A_j} = \infty \bigr) = 0$; since the $A_n$ are all independent, the Borel–Cantelli
lemma T18.9 shows that $\sum_{j=1}^{\infty} P(A_j) < \infty$. Thus
∫
X1 dP =
�∑
j=1
∫
�j−1� X1
observe that Mn∧�� � C + � and that
∫
An∧�� dP � �K + c�2.]
19
The Radon–Nikodým theorem and other applications
of martingales
After our excursion into the theory of martingales we want to apply martingales
to continue the development of measure and integration theory. The central topics
of this chapter are
• the Radon–Nikodým theorem 19.2 and Lebesgue’s decomposition theorem 19.9;
• the Hardy–Littlewood maximal theorem 19.17;
• Lebesgue’s differentiation theorem 19.20.
For the last two we need (maximal) inequalities for martingales. These will be
treated in a short interlude which is also of independent interest.
The Radon–Nikodým theorem
Let $(X, \mathcal{A}, \mu)$ be a measure space. We have seen in Lemma 10.8 that for any
$f \in \mathcal{L}^1_+(\mathcal{A})$ – or indeed for $f \in \mathcal{M}^+(\mathcal{A})$ – the set-function $\nu := f \mu$ given by
$\nu(A) := \int_A f(x) \, \mu(dx)$ is again a measure. From Theorem 10.9(ii) we know that
\[
N \in \mathcal{A}, \ \mu(N) = 0 \implies \nu(N) = 0. \tag{19.1}
\]
This observation motivates the following
19.1 Definition Let $\mu, \nu$ be two measures on the measurable space $(X, \mathcal{A})$. If
(19.1) holds, we call $\nu$ absolutely continuous w.r.t. $\mu$ and write $\nu \ll \mu$.
Measures with densities are always absolutely continuous w.r.t. their base
measure: $f \mu \ll \mu$. Remarkably, the converse is also true.
19.2 Theorem (Radon–Nikodým). Let $\mu, \nu$ be two measures on the measurable
space $(X, \mathcal{A})$. If $\mu$ is $\sigma$-finite, then the following assertions are equivalent
(i) $\nu(A) = \int_A f(x) \, \mu(dx)$ for some a.e. unique $f \in \mathcal{M}^+(\mathcal{A})$;
(ii) $\nu \ll \mu$.
The unique function $f$ is called the Radon–Nikodým derivative and (traditionally)
denoted by $f = d\nu / d\mu$.
Above we have just verified that (i)⇒(ii). The converse direction is less
obvious and we want to use a martingale argument for its proof. For this we
need a few more preparations which extend the notion of martingale to directed
index sets.
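Let $(I, \le)$ be any partially ordered index set. We call $I$ upwards filtering or
upwards directed if
\[
\alpha, \beta \in I \implies \exists\, \gamma \in I : \alpha \le \gamma, \ \beta \le \gamma. \tag{19.2}
\]
A family $(\mathcal{A}_\alpha)_{\alpha \in I}$ of sub-$\sigma$-algebras of $\mathcal{A}$ is called a filtration if
\[
\alpha, \beta \in I, \ \alpha \le \beta \implies \mathcal{A}_\alpha \subset \mathcal{A}_\beta;
\]
as before, we set $\mathcal{A}_\infty := \sigma\bigl( \bigcup_{\alpha \in I} \mathcal{A}_\alpha \bigr)$, and we treat $\infty$ as the biggest element of
$I \cup \{\infty\}$, i.e. $\alpha < \infty$ for all $\alpha \in I$. If a $\sigma$-algebra $\mathcal{A}_0 \subset \mathcal{A}_\alpha$ for all $\alpha \in I$ and if
$\mu|_{\mathcal{A}_0}$ is $\sigma$-finite, we call $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ a $\sigma$-finite filtered measure space.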
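19.3 Definition Let $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ be a $\sigma$-finite filtered measure space. A
family of measurable functions $(u_\alpha)_{\alpha \in I}$ is called a martingale (w.r.t. the filtration
$(\mathcal{A}_\alpha)_{\alpha \in I}$), if $u_\alpha \in \mathcal{L}^1(\mathcal{A}_\alpha)$ for each $\alpha \in I$ and if
\[
\int_A u_\beta \, d\mu = \int_A u_\alpha \, d\mu \quad \forall\, \alpha \le \beta, \ \forall\, A \in \mathcal{A}_\alpha. \tag{19.3}
\]
The notion of convergence along an upwards filtering set is slightly more
complicated than for the index set $\mathbb{N}$. We say
\[
u = \mathcal{L}^1\text{-}\lim_{\alpha \in I} u_\alpha \iff \forall\, \varepsilon > 0 \ \exists\, \gamma_\varepsilon \in I \ \forall\, \alpha \ge \gamma_\varepsilon : \|u - u_\alpha\|_1 < \varepsilon.
\]
We can now extend Theorem 18.6.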
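19.4 Theorem Let $I$ be an upwards filtering index set, $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ be a
$\sigma$-finite measure space and $(u_\alpha, \mathcal{A}_\alpha)_{\alpha \in I}$ be a martingale. Then the following
assertions are equivalent.
(i) There exists a unique $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$ such that $(u_\alpha, \mathcal{A}_\alpha)_{\alpha \in I \cup \{\infty\}}$ is a
martingale. In this case $u_\infty = \mathcal{L}^1$-$\lim_{\alpha \in I} u_\alpha$.
(ii) $(u_\alpha, \mathcal{A}_\alpha)_{\alpha \in I}$ is uniformly integrable.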
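Proof (i)⇒(ii): (compare with T18.6) Denote by $(A_j)_{j \in \mathbb{N}}$ an exhausting sequence
in $\mathcal{A}_0$. Since $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$, we find for every $\varepsilon > 0$ some $\kappa > 0$ and $N \in \mathbb{N}$
such that
\[
\int_{\{|u_\infty| > \kappa\}} |u_\infty| \, d\mu + \int_{A_j^c} |u_\infty| \, d\mu \le \varepsilon \quad \forall\, j \ge N.
\]
Clearly, the function $w(x) := \sum_{j \in \mathbb{N}} 2^{-j} (1 + \mu(A_j))^{-1} 1_{A_j}(x)$ is in $\mathcal{L}^1_+(\mathcal{A}_0)$, $w > 0$
and, as $(|u_\alpha|)_{\alpha \in I \cup \{\infty\}}$ is a submartingale (cf. Example 17.3(v)), we find for every
$L > 0$
\[
\begin{aligned}
\sup_{\alpha \in I} \int_{\{|u_\alpha| > L w\}} |u_\alpha| \, d\mu
&\le \sup_{\alpha \in I} \int_{\{|u_\alpha| > L w\}} |u_\infty| \, d\mu \\
&\le \sup_{\alpha \in I} \int_{\{|u_\alpha| > L w\} \cap A_N \cap \{|u_\infty| \le \kappa\}} |u_\infty| \, d\mu + \int_{A_N^c} |u_\infty| \, d\mu + \int_{\{|u_\infty| > \kappa\}} |u_\infty| \, d\mu \\
&\le \kappa \sup_{\alpha \in I} \mu\bigl( \{ |u_\alpha| > L \, 2^{-N} (1 + \mu(A_N))^{-1} \} \bigr) + \varepsilon
\end{aligned}
\]
(use for the last step that $w(x) \ge 2^{-N} (1 + \mu(A_N))^{-1}$ for $x \in A_N$). By Markov’s
inequality P10.12 and the submartingale property we get, with the same constant $2^N (1 + \mu(A_N)) / L$ as in the proof of T18.6,
\[
\sup_{\alpha \in I} \int_{\{|u_\alpha| > L w\}} |u_\alpha| \, d\mu
\le \kappa \, \frac{2^{N} (1 + \mu(A_N))}{L} \sup_{\alpha \in I} \int |u_\alpha| \, d\mu + \varepsilon
\le \kappa \, \frac{2^{N} (1 + \mu(A_N))}{L} \int |u_\infty| \, d\mu + \varepsilon,
\]
and (ii) follows since we can choose $L > 0$ as large as we want.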
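(ii)⇒(i): Step 1: uniqueness. Assume that $u, w \in \mathcal{L}^1(\mathcal{A}_\infty)$ are two functions
which close the martingale $(u_\alpha)_{\alpha \in I}$, i.e. functions satisfying
\[
\int_A u \, d\mu = \int_A w \, d\mu = \int_A u_\alpha \, d\mu \quad \forall\, A \in \mathcal{A}_\alpha, \ \alpha \in I.
\]
Since $u$ and $w$ are integrable functions, the family
\[
\Sigma := \Bigl\{ A \in \mathcal{A}_\infty : \int_A u \, d\mu = \int_A w \, d\mu \Bigr\}
\]
is a $\sigma$-algebra which satisfies $\bigcup_{\alpha \in I} \mathcal{A}_\alpha \subset \Sigma \subset \mathcal{A}_\infty$. Since $\mathcal{A}_\infty$ is generated by the
$\mathcal{A}_\alpha$, we get $\Sigma = \mathcal{A}_\infty$, which means that $\int_A u \, d\mu = \int_A w \, d\mu$ holds for all $A \in \mathcal{A}_\infty$.
Now Corollary 10.14 applies and we get $u = w$ almost everywhere.
Step 2: existence of the limit. We claim that
\[
\forall\, \varepsilon > 0 \ \exists\, \gamma_\varepsilon \in I \ \forall\, \alpha, \beta \ge \gamma_\varepsilon : \int |u_\alpha - u_\beta| \, d\mu < \varepsilon. \tag{19.4}
\]
Otherwise we could find a sequence $(\alpha_j)_{j \in \mathbb{N}} \subset I$ such that $\int |u_{\alpha_{j+1}} - u_{\alpha_j}| \, d\mu > \varepsilon$
for all $j \in \mathbb{N}$. Since $I$ is upwards filtering, we can assume that $(\alpha_j)_{j \in \mathbb{N}}$ is an
increasing sequence.[✓] Because of (ii), $(u_{\alpha_j}, \mathcal{A}_{\alpha_j})_{j \in \mathbb{N}}$ is a uniformly integrable
martingale with index set $\mathbb{N}$ which is, by construction, not an $\mathcal{L}^1$-Cauchy sequence.
This contradicts Theorem 18.6.
We will now prove the existence of the $\mathcal{L}^1$-limit. Pick in (19.4) $\varepsilon = \frac{1}{n}$ and
choose $\gamma_{1/n}$. Since $I$ is upwards directed, we can assume that $\gamma_{1/n}$ increases
as $n \to \infty$;[✓] thus $(u_{\gamma_{1/n}})_{n \in \mathbb{N}} \subset \mathcal{L}^1(\mathcal{A}_\infty)$ is an $\mathcal{L}^1$-Cauchy sequence. By
Theorem 18.6 it converges in $\mathcal{L}^1(\mathcal{A}_\infty)$ and a.e. to some $u_\infty := \lim_{n \to \infty} u_{\gamma_{1/n}} \in \mathcal{L}^1(\mathcal{A}_\infty)$.
Moreover, for all $A \in \mathcal{A}_\infty$ and $\alpha > \gamma_{1/n}$ we have
\[
\int_A |u_\alpha - u_\infty| \, d\mu \le \underbrace{\int_A |u_\alpha - u_{\gamma_{1/n}}| \, d\mu}_{\le\, 1/n \text{ by (19.4)}} + \int_A |u_{\gamma_{1/n}} - u_\infty| \, d\mu \le \frac{2}{n}.
\]
This shows, in particular, that $1_A u_\alpha \xrightarrow{\mathcal{L}^1} 1_A u_\infty$ for all $A \in \mathcal{A}_\infty$, and in view of
step 1, $u_\infty$ is the only possible limit. The same argument that we used in (iii)⇒(i)
of T18.6 now yields that $(u_\alpha)_{\alpha \in I \cup \{\infty\}}$ is still a martingale.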
Theorem 19.4 does not claim that $u_\alpha \to u_\infty$ a.e. along $I$. This is, in general, false
for non-linearly ordered index sets $I$, see e.g. Dieudonné [12].
That uncountable, partially ordered index sets are not at all artificial is shown
by the following example which will be essential for the proof of Theorem 19.2.
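19.5 Example Let $(X, \mathcal{A}, \mu)$ be a finite measure space and assume that $\nu$ is a
measure such that $\nu \ll \mu$. Set
\[
I := \Bigl\{ \alpha = \{A_1, A_2, \dots, A_n\} : n \in \mathbb{N}, \ A_j \in \mathcal{A} \ \text{and} \ \dot{\bigcup}_{j=1}^{n} A_j = X \Bigr\}
\]
and define an order relation ‘$\le$’ on $I$ through
\[
\alpha \le \alpha' \iff \forall\, A \in \alpha : A = A_1' \,\dot\cup\, \dots \,\dot\cup\, A_\ell' \ \text{ where } A_k' \in \alpha', \ \ell \in \mathbb{N}.
\]
Since the common refinement $\beta$ of any two elements $\alpha, \alpha' \in I$,
\[
\beta := \{ A \cap A' : A \in \alpha, \ A' \in \alpha' \},
\]
is again in $I$ and satisfies $\alpha \le \beta$ and $\alpha' \le \beta$, it is clear that $(I, \le)$ is upwards
filtering. In particular,
\[
(\mathcal{A}_\alpha)_{\alpha \in I} \quad \text{where} \quad \mathcal{A}_\alpha := \sigma(A : A \in \alpha)
\]
is a filtration as $\mathcal{A}_\alpha \subset \mathcal{A}_{\alpha'}$ whenever $\alpha \le \alpha'$. Moreover, $(f_\alpha, \mathcal{A}_\alpha)_{\alpha \in I}$ defined by
\[
f_\alpha := \sum_{A \in \alpha} \frac{\nu(A)}{\mu(A)} \, 1_A, \qquad \Bigl( \frac{\nu(A)}{\mu(A)} := 0 \ \text{ if } \mu(A) = 0 \Bigr)
\]
is a martingale. Indeed, if $\alpha \le \beta$, $\alpha, \beta \in I$, then
\[
\int_A f_\alpha \, d\mu = \frac{\nu(A)}{\mu(A)} \, \mu(A)
= \begin{cases} \nu(A) & \text{if } \mu(A) > 0 \\ 0 & \text{if } \mu(A) = 0 \end{cases}
= \nu(A)
\]
as $\nu \ll \mu$. Similarly, for $A \in \alpha$ with $A = B_1 \,\dot\cup\, \dots \,\dot\cup\, B_\ell$ and $B_1, \dots, B_\ell \in \beta$
\[
\int_A f_\beta \, d\mu = \sum_{k=1}^{\ell} \int_{B_k} f_\beta \, d\mu = \sum_{k=1}^{\ell} \frac{\nu(B_k)}{\mu(B_k)} \, \mu(B_k)
= \sum_{k \,:\, \mu(B_k) > 0} \nu(B_k) \overset{(*)}{=} \sum_{k=1}^{\ell} \nu(B_k) = \nu(A),
\]
where we used in $(*)$ that $\nu \ll \mu$, i.e. $\nu(B_k) = 0$ if $\mu(B_k) = 0$. Thus $\int_A f_\alpha \, d\mu = \int_A f_\beta \, d\mu$
for all $A \in \alpha$, hence on $\mathcal{A}_\alpha$ since all $A \in \alpha$ are disjoint and generate $\mathcal{A}_\alpha$
([✓], cf. also Remark 17.2(i)).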
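On a finite set the construction of Example 19.5 reduces to elementary arithmetic. The following is a minimal sketch, assuming a six-point set $X = \{0, \dots, 5\}$ with hypothetical weights for $\mu$ and $\nu$ (chosen so that $\nu \ll \mu$); partitions are lists of sets, and `density` computes $f_\alpha$. It prints, for each block $A$ of a coarse partition $\alpha$, the values $\nu(A)$, $\int_A f_\alpha \, d\mu$ and $\int_A f_\beta \, d\mu$ for a refinement $\beta$.

```python
from fractions import Fraction

# Hypothetical weights on X = {0, ..., 5}; nu vanishes wherever mu does.
MU = [Fraction(w) for w in (1, 2, 0, 1, 3, 1)]
NU = [Fraction(w) for w in (2, 1, 0, 0, 3, 4)]

def measure(weights, block):
    """Measure of a block (a set of points) under the given point weights."""
    return sum(weights[x] for x in block)

def density(partition):
    """f_alpha as a list: nu(A)/mu(A) on each block A, and 0 if mu(A) = 0."""
    f = [Fraction(0)] * len(MU)
    for block in partition:
        m = measure(MU, block)
        ratio = measure(NU, block) / m if m > 0 else Fraction(0)
        for x in block:
            f[x] = ratio
    return f

def integral(f, block):
    """Integral of f over the block with respect to mu."""
    return sum(f[x] * MU[x] for x in block)

alpha = [{0, 1, 2}, {3, 4, 5}]
beta = [{0}, {1, 2}, {3, 4}, {5}]          # a refinement of alpha
finest = [{x} for x in range(len(MU))]     # here f is the density of nu w.r.t. mu

for block in alpha:
    print(sorted(block), measure(NU, block),
          integral(density(alpha), block), integral(density(beta), block))
print(density(finest))
```

The three numbers printed for each block coincide, which is the martingale property (19.3) for this pair $\alpha \le \beta$.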
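What Example 19.5 really says is that
\[
\nu(A) = \int_A f_\alpha \, d\mu \quad \forall\, A \in \mathcal{A}_\alpha, \tag{19.5}
\]
or $\nu|_{\mathcal{A}_\alpha} \ll \mu|_{\mathcal{A}_\alpha}$ and $d(\nu|_{\mathcal{A}_\alpha}) / d(\mu|_{\mathcal{A}_\alpha}) = f_\alpha$. Heuristically we should expect that,
if $f_\alpha \xrightarrow{\alpha \to \infty} f_\infty$ exists, $f_\infty$ is the Radon–Nikodým derivative $d\nu / d\mu = f_\infty$. This
idea can be made rigorous and is the basis for the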
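Proof (of Theorem 19.2 (ii)⇒(i)) Let us first assume that $\mu$ and $\nu$ are finite measures.
Denote by $(f_\alpha, \mathcal{A}_\alpha)_{\alpha \in I}$ the martingale of Example 19.5. It is enough to show that
\[
f_\infty = \mathcal{L}^1\text{-}\lim_{\alpha \in I} f_\alpha \ \text{ exists and that } \ \mathcal{A} = \mathcal{A}_\infty. \tag{19.6}
\]
Indeed, (19.6) combined with (19.5) implies
\[
\nu(A) = \int_A f_\infty \, d\mu \quad \forall\, A \in \bigcup_{\alpha \in I} \mathcal{A}_\alpha,
\]
and the uniqueness theorem 5.7 for measures extends this equality to $\mathcal{A}_\infty = \sigma\bigl( \bigcup_{\alpha \in I} \mathcal{A}_\alpha \bigr)$.
Since $A \in \mathcal{A}$ is trivially contained in $\mathcal{A}_\alpha$ where $\alpha := \{A, A^c\}$ – at
this point we use the finiteness of the measure $\mu$ – we see
\[
\mathcal{A} \supset \mathcal{A}_\infty = \sigma\Bigl( \bigcup_{\alpha \in I} \mathcal{A}_\alpha \Bigr) \supset \bigcup_{\alpha \in I} \mathcal{A}_\alpha \supset \mathcal{A},
\]
and all that remains is to prove the existence of the limit in (19.6). In view of
Theorem 19.4 we have to show that $(f_\alpha, \mathcal{A}_\alpha)_{\alpha \in I}$ is uniformly integrable.
We claim that $\sup_{\alpha \in I} \nu(\{f_\alpha > R\}) \le \varepsilon$ for all large enough $R = R_\varepsilon > 0$.
Otherwise we could find some $\varepsilon_0 > 0$ with $\nu(\{f_\alpha > n\}) > \varepsilon_0$ for all $n \in \mathbb{N}$, so that
$\nu\bigl( \bigcap_{n \in \mathbb{N}} \{f_\alpha > n\} \bigr) > 0$ by the continuity of measures, T4.4. Since $\mu$ is a finite
measure,
\[
\mu\Bigl( \bigcap_{n \in \mathbb{N}} \{f_\alpha > n\} \Bigr) \overset{4.4}{=} \inf_{n \in \mathbb{N}} \mu(\{f_\alpha > n\})
\overset{10.12}{\le} \lim_{n \to \infty} \frac{1}{n} \int f_\alpha \, d\mu = \lim_{n \to \infty} \frac{\nu(X)}{n} = 0,
\]
which contradicts the fact that $\nu \ll \mu$. Finally,
\[
\int_{\{|f_\alpha| > R\}} |f_\alpha| \, d\mu = \int_{\{f_\alpha > R\}} f_\alpha \, d\mu = \nu(\{f_\alpha > R\}) \le \varepsilon
\]
if $R = R_\varepsilon > 0$ is sufficiently large, and uniform integrability follows since the
constant function $R \in \mathcal{L}^1(\mu)$. The uniqueness of $f_\infty$ follows also from Theorem 19.4.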
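Assume that $\mu$ is finite and $\nu(X) = \infty$.
Denote by $\mathcal{F} := \{F \in \mathcal{A} : \nu(F) < \infty\}$ the sets with finite $\nu$-measure. Obviously,
$\mathcal{F}$ is $\cup$-stable, and the constant
\[
c := \sup_{F \in \mathcal{F}} \mu(F) \le \mu(X) < \infty
\]
can be approximated by an increasing sequence $(F_j)_{j \in \mathbb{N}} \subset \mathcal{F}$ such that
$c = \mu\bigl( \bigcup_{j \in \mathbb{N}} F_j \bigr) = \sup_{j \in \mathbb{N}} \mu(F_j)$.[✓] When restricted to the set $F_\infty := \bigcup_{j \in \mathbb{N}} F_j$, $\nu$ is by
definition $\sigma$-finite, while for $A \subset F_\infty^c$, $A \in \mathcal{A}$, we have
\[
\text{either } \ \mu(A) = \nu(A) = 0 \ \text{ or } \ 0 < \mu(A) < \nu(A) = \infty. \tag{19.7}
\]
In fact, if $\nu(A) < \infty$, then $F_j \cup A \in \mathcal{F}$ for all $j \in \mathbb{N}$, which implies that
\[
c \ge \mu\Bigl( \bigcup_{j \in \mathbb{N}} F_j \,\dot\cup\, A \Bigr) = \mu(F_\infty \,\dot\cup\, A) = \mu(F_\infty) + \mu(A) = c + \mu(A),
\]
that is $\mu(A) = 0$, hence $\nu(A) = 0$ by absolute continuity; if, however, $\nu(A) = \infty$
we have again by absolute continuity that $\mu(A) > 0$. Define now
\[
\nu_j := \nu\bigl( \bullet \cap (F_j \setminus F_{j-1}) \bigr), \quad \mu_j := \mu\bigl( \bullet \cap (F_j \setminus F_{j-1}) \bigr), \quad (F_0 := \emptyset),
\]
and it is clear that $\nu_j \ll \mu_j$ for every $j \in \mathbb{N}$. Since $\mu_j, \nu_j$ are finite measures, the
first part of this proof shows that $\nu_j = f_j \mu_j$. Obviously, the function
\[
f(x) := \begin{cases} f_j(x) & \text{if } x \in F_j \setminus F_{j-1}, \\ \infty & \text{if } x \in F_\infty^c, \end{cases} \tag{19.8}
\]
fulfils $\nu = f \mu$. By construction, $f$ is unique on the set $F_\infty$. But since every
density $\tilde{f}$ of $\nu$ with respect to $\mu$ satisfies
\[
\nu\bigl( \{\tilde{f} \le n\} \cap F_\infty^c \bigr) = \int_{\{\tilde{f} \le n\} \cap F_\infty^c} \tilde{f} \, d\mu \le n \, \mu\bigl( \{\tilde{f} \le n\} \cap F_\infty^c \bigr) < \infty,
\]
the alternative (19.7) reveals that $\mu\bigl( \{\tilde{f} \le n\} \cap F_\infty^c \bigr) = \nu\bigl( \{\tilde{f} \le n\} \cap F_\infty^c \bigr) = 0$ for all
$n \in \mathbb{N}$, i.e. that $\tilde{f}|_{F_\infty^c} = \infty$. In other words: $f$, as defined in (19.8), is also unique
on $F_\infty^c$.
Assume that $\mu$ is $\sigma$-finite and $\nu(X) \le \infty$.
Let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be an exhausting sequence with $A_j \uparrow X$ and $\mu(A_j) < \infty$. Then
the measures
\[
h \mu \ \text{ and } \ \mu \quad \text{where} \quad h(x) := \sum_{j=1}^{\infty} \frac{2^{-j}}{1 + \mu(A_j)} \, 1_{A_j}(x)
\]
have the same null sets.[✓] Therefore $\nu \ll \mu$ if, and only if, $\nu \ll h \mu$. Since $h \mu$
is a finite measure[✓], the first two parts of the proof show that $\nu = f \cdot (h \mu) = (f h) \mu$
for a suitable density $f \in \mathcal{M}^+(\mathcal{A})$. The last equality needs proof: if
$f = \sum_{j=0}^{M} y_j 1_{A_j}$ is a positive simple function,
\[
\nu(A) = \int_A \sum_{j=0}^{M} y_j 1_{A_j} \, d(h \mu) = \sum_{j=0}^{M} y_j \int 1_{A_j \cap A} \, h \, d\mu = \int_A (f h) \, d\mu,
\]
and the general case follows from Beppo Levi’s theorem 9.6. Uniqueness is clear
as $f$ is $(h \mu)$-a.e. unique, which implies that $f h$ is $\mu$-a.e. unique since $h > 0$.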
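19.6 Corollary Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu = f \mu$. Then
(i) $\nu(X) < \infty \iff f \in \mathcal{L}^1(\mu)$;
(ii) $\nu$ is $\sigma$-finite $\iff \mu(\{f = \infty\}) = 0$.
Proof The first assertion (i) is obvious. For (ii) assume first that $\mu(\{f = \infty\}) = 0$.
Since $\mu$ is $\sigma$-finite, we find an exhausting sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \uparrow X$
and $\mu(A_j) < \infty$. The sets
\[
B_k := \{0 \le f \le k\}, \quad B_\infty := \{f = \infty\}
\]
obviously satisfy $\bigcup_{k \in \mathbb{N}} (B_k \cup B_\infty) = X$ as well as $\nu(B_\infty) = 0$ and
\[
\nu(B_k \cap A_j) = \int_{B_k \cap A_j} f \, d\mu \le k \int_{A_j} d\mu = k \, \mu(A_j) < \infty.
\]
This shows that $\bigl( A_j \cap (B_k \cup B_\infty) \bigr)_{j,k \in \mathbb{N}}$ is an exhausting sequence for $\nu$, which
means that $\nu$ is $\sigma$-finite.
Conversely, let $\nu$ be $\sigma$-finite and assume that $\mu(\{f = \infty\}) > 0$. As we can find
one exhausting sequence $(C_k)_{k \in \mathbb{N}} \subset \mathcal{A}$ for both $\mu$ and $\nu$[✓], we see that
\[
\{f = \infty\} = \bigcup_{k \in \mathbb{N}} \bigl( \{f = \infty\} \cap C_k \bigr) \supset \{f = \infty\} \cap C_{k_0}
\]
for some fixed $k_0 \in \mathbb{N}$ with $\mu(\{f = \infty\} \cap C_{k_0}) > 0$. But then
\[
\nu(C_{k_0}) \ge \int_{\{f = \infty\} \cap C_{k_0}} f \, d\mu = \infty,
\]
which is impossible since $\nu(C_{k_0}) < \infty$.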
It is clear that not all measures are absolutely continuous with respect to each
other. In some sense, the next notion is the opposite of absolute continuity.
19.7 Definition Two measures $\mu, \nu$ on a measurable space $(X, \mathcal{A})$ are called
(mutually) singular if there is a set $N \in \mathcal{A}$ such that $\nu(N) = 0 = \mu(N^c)$. We
write in this case $\mu \perp \nu$ (or $\nu \perp \mu$ as ‘$\perp$’ is symmetric).
19.8 Examples Let $(X, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Then
(i) $\delta_x \perp \lambda^n$ for all $x \in \mathbb{R}^n$;
(ii) $f \mu \perp g \mu$ if $\operatorname{supp} f \cap \operatorname{supp} g = \emptyset$.¹
The measures $\mu$ and $\nu$ are singular, if they have disjoint ‘supports’, that is, if
$\mu$ lives in a region of $X$ which is not charged by $\nu$ and vice versa. In this sense,
Example 19.8(ii) is the model case for singular measures. In general, however,
two measures are neither purely absolutely continuous nor purely singular, but
are a mixture of both.
19.9 Theorem (Lebesgue decomposition) Let $\mu, \nu$ be two $\sigma$-finite measures
on a measurable space $(X, \mathcal{A})$. Then there exists a (up to null sets) unique
decomposition $\nu = \nu^\circ + \nu^\perp$ where $\nu^\circ \ll \mu$ and $\nu^\perp \perp \mu$.
¹ $\operatorname{supp} f := \overline{\{f \ne 0\}}$.
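Proof Obviously $\mu + \nu$ is still a $\sigma$-finite measure[✓], and $\nu \ll (\mu + \nu)$. In this
situation Theorem 19.2 applies and shows that
\[
\nu = f (\mu + \nu) = f \mu + f \nu. \tag{19.9}
\]
For any $\varepsilon > 0$ we conclude, in particular, that
\[
\nu(\{f \ge 1 + \varepsilon\}) = \int_{\{f \ge 1 + \varepsilon\}} f \, d(\mu + \nu)
\ge (1 + \varepsilon) \, \mu(\{f \ge 1 + \varepsilon\}) + (1 + \varepsilon) \, \nu(\{f \ge 1 + \varepsilon\}),
\]
i.e. $\mu(\{f \ge 1 + \varepsilon\}) = \nu(\{f \ge 1 + \varepsilon\}) = 0$ for all $\varepsilon$, hence $\mu(\{f > 1\}) = \nu(\{f > 1\}) = 0$.
Without loss of generality we may therefore assume that $0 \le f \le 1$. In
this case (19.9) can be rewritten as
\[
(1 - f) \, \nu = f \mu, \tag{19.10}
\]
and on the set $N := \{f = 1\}$ we have
\[
\mu(N) = \int_{\{f = 1\}} d\mu = \int_{\{f = 1\}} f \, d\mu \overset{(19.10)}{=} \int_{\{f = 1\}} (1 - f) \, d\nu = 0.
\]
Therefore, $\mu \perp \nu^\perp$ where $\nu^\perp := \nu(\bullet \cap \{f = 1\})$, and for $\nu^\circ := \nu(\bullet \cap \{f < 1\})$ we
get from (19.10)
\[
\nu^\circ(A) = \nu(A \cap \{f < 1\}) = \int_{A \cap \{f < 1\}} d\nu = \int_{A \cap \{f < 1\}} \frac{f}{1 - f} \, d\mu \quad \forall\, A \in \mathcal{A},
\]
showing that $\nu^\circ \ll \mu$.
The uniqueness (up to null sets) of this decomposition follows directly from
the uniqueness of the Radon–Nikodým derivative $\frac{f}{1 - f} \, 1_{\{f < 1\}}$.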
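The proof is constructive, and on a finite set it can be followed step by step. The sketch below assumes a five-point set with hypothetical point weights for $\mu$ and $\nu$; it computes $f = d\nu / d(\mu + \nu)$ pointwise, puts the mass of $\nu$ on $\{f = 1\}$ into the singular part and uses the density $f / (1 - f)$ on $\{f < 1\}$ for the absolutely continuous part. The function name `lebesgue_decomposition` is an assumption made for illustration.

```python
from fractions import Fraction

MU = [Fraction(w) for w in (1, 2, 0, 0, 3)]   # hypothetical weights of mu
NU = [Fraction(w) for w in (2, 0, 5, 0, 1)]   # hypothetical weights of nu

def lebesgue_decomposition(mu, nu):
    """Return (absolutely continuous part, singular part, density w.r.t. mu)."""
    ac, sing, dens = [], [], []
    for m, n in zip(mu, nu):
        total = m + n
        # f = d(nu)/d(mu + nu) at this point (set to 0 on (mu + nu)-null points)
        f = n / total if total > 0 else Fraction(0)
        if f == 1:
            # the set N = {f = 1}: mu-null, carries the singular part
            ac.append(Fraction(0))
            sing.append(n)
            dens.append(Fraction(0))
        else:
            # on {f < 1}: density f / (1 - f) with respect to mu
            ac.append(n)
            sing.append(Fraction(0))
            dens.append(f / (1 - f))
    return ac, sing, dens

ac, sing, dens = lebesgue_decomposition(MU, NU)
print(ac, sing, dens)
```

In this example the singular part sits on the single point where $\mu$ vanishes but $\nu$ does not, and the absolutely continuous part equals the density times $\mu$ at every point.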
19.10 Remark We have used the martingale convergence theorem to prove the
Radon–Nikodým theorem. But the connection between these two theorems is
much deeper. For measures with values in a Banach space (‘vector measures’)
the Radon–Nikodým theorem holds if, and only if, the pointwise martingale con-
vergence theorem is valid. One should add that the Radon–Nikodým theorem for
Banach spaces is intimately connected with the geometry of Banach spaces. Note,
however, that the techniques required in the theory of vector measures are dis-
tinctly different from those in the real case. For more on this see Diestel-Uhl [11,
Chapter V.2], Benyamini-Lindenstrauss [7, Chapter 5.2] or Métivier [29, § 11].
Martingale inequalities
Martingales will allow us to prove maximal inequalities which are useful and
important both in analysis and probability theory. In order to ease the exposition
we introduce the following (quite common) shorthand notation:
\[
u_N^*(x) := \max_{1 \le j \le N} |u_j(x)| \quad \text{and} \quad u_\infty^*(x) := \lim_{N \to \infty} u_N^*(x) = \sup_{j \in \mathbb{N}} |u_j(x)|.
\]
The following simple lemma is the key to all maximal inequalities.
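19.11 Lemma Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space and let
$(u_j)_{j \in \mathbb{N}}$ be a submartingale. Then we have for all $s > 0$
\[
\mu\Bigl( \Bigl\{ \max_{1 \le j \le N} u_j \ge s \Bigr\} \Bigr) \le \frac{1}{s} \int_{\{\max_{1 \le j \le N} u_j \ge s\}} u_N \, d\mu \le \frac{1}{s} \int u_N^+ \, d\mu. \tag{19.11}
\]
If $u_j \in \mathcal{L}^p_+(\mathcal{A})$ or if $(u_j)_{j \in \mathbb{N}} \subset \mathcal{L}^p(\mathcal{A})$, $p \in [1, \infty)$, is a martingale, then
\[
\mu(\{u_N^* \ge s\}) \le \frac{1}{s^p} \int_{\{u_N^* \ge s\}} |u_N|^p \, d\mu \le \frac{1}{s^p} \int |u_N|^p \, d\mu. \tag{19.12}
\]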
Proof Consider the stopping time when $u_j$ exceeds the level $s$ for the first time:
\[ \sigma := \inf\{j \le N : u_j \ge s\} \wedge (N+1), \qquad (\inf\emptyset = +\infty) \]
and set $A := \{\max_{1\le j\le N} u_j \ge s\} = \bigcup_{j=1}^N \{u_j \ge s\} = \{\sigma \le N\} \in \mathcal{A}_\sigma$, where we used
Lemma 17.6. From Theorem 17.7(iii) and the fact that $u_\sigma \ge s$ on $A$, we conclude
\[ \mu\Big(\underbrace{\bigcup_{j=1}^N \{u_j \ge s\}}_{=\,A}\Big) \le \int_A \frac{u_\sigma}{s}\,d\mu = \frac1s\int_A u_\sigma\,d\mu \le \frac1s\int_A u_N\,d\mu \le \frac1s\int u_N^+\,d\mu. \]
The second inequality (19.12) follows along the same lines since, under our
assumptions, $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale, cf. Example 17.3(vi).
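The weak-type estimate (19.11) can be checked by brute force on a small example. The sketch below assumes the simple symmetric random walk $u_j = x_1 + \cdots + x_j$ with independent fair signs $x_i = \pm 1$ (a martingale, hence a submartingale); the function name `weak_type_check` and the parameters are illustrative only and not taken from the text.

```python
from fractions import Fraction
from itertools import product


def weak_type_check(N, s):
    """Evaluate the three terms of (19.11) for the simple symmetric random walk.

    All 2**N sign paths carry the same probability, so the integrals
    reduce to finite sums that can be computed exactly with fractions.
    """
    weight = Fraction(1, 2 ** N)
    lhs = Fraction(0)   # mu({max_j u_j >= s})
    mid = Fraction(0)   # (1/s) * integral of u_N over {max_j u_j >= s}
    rhs = Fraction(0)   # (1/s) * integral of the positive part of u_N
    for steps in product((-1, 1), repeat=N):
        partial = 0
        running_max = None
        for x in steps:
            partial += x
            running_max = partial if running_max is None else max(running_max, partial)
        if running_max >= s:
            lhs += weight
            mid += weight * partial / s
        rhs += weight * max(partial, 0) / s
    return lhs, mid, rhs


lhs, mid, rhs = weak_type_check(N=10, s=3)
print(lhs, mid, rhs)
assert lhs <= mid <= rhs
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding; the enumeration is only feasible for small `N`.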
The next theorem is commonly referred to as Doob’s maximal inequality.
19.12 Theorem (Doob’s maximal $L^p$-inequality) Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite
filtered measure space, $1 < p < \infty$ and let $(u_j)_{j\in\mathbb{N}}$ be a martingale or
$(|u_j|^p)_{j\in\mathbb{N}}$ be a submartingale. Then we have
\[ \|u_N^*\|_p \le \frac{p}{p-1}\,\|u_N\|_p \le \frac{p}{p-1}\max_{1\le j\le N}\|u_j\|_p. \]
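Proof It is enough to consider the case where $(u_j)_{j\in\mathbb{N}}$ is a martingale; the situation
where $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale is similar and simpler.
If $\|u_N\|_p = \infty$, the inequality is trivial; if $u_N \in \mathcal{L}^p(\mu)$, then $u_1, \ldots, u_{N-1} \in \mathcal{L}^p(\mu)$
since $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale by 17.3(vi). Thus
\[ u_N^* \le |u_1| + |u_2| + \cdots + |u_N| \implies u_N^* \in \mathcal{L}^p(\mu) \]
and using (13.8) of Corollary 13.13 and Tonelli’s theorem 13.8 we find
\begin{align*}
\int (u_N^*)^p\,d\mu &\overset{(13.8)}{=} p\int_0^\infty s^{p-1}\,\mu(\{u_N^*\ge s\})\,ds \\
&\overset{(19.12)}{\le} p\int_0^\infty s^{p-2}\Big(\int |u_N|\,\mathbf{1}_{\{u_N^*\ge s\}}\,d\mu\Big)\,ds \\
&\overset{13.8}{=} p\int |u_N|\Big(\int_0^{u_N^*} s^{p-2}\,ds\Big)\,d\mu \\
&= \frac{p}{p-1}\int |u_N|\,(u_N^*)^{p-1}\,d\mu,
\end{align*}
where the inequality is (19.12) for the exponent 1.
Hölder’s inequality T12.2 with $\frac1p + \frac1q = 1$, i.e. $q = \frac{p}{p-1}$, yields
\[ \int (u_N^*)^p\,d\mu \le \frac{p}{p-1}\Big(\int |u_N|^p\,d\mu\Big)^{1/p}\Big(\int (u_N^*)^p\,d\mu\Big)^{1-1/p}, \]
and the claim follows.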
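A quick numerical illustration of Theorem 19.12 may help to see how much room the constant $p/(p-1)$ leaves. The following sketch again assumes the simple symmetric random walk as the martingale; `doob_lp_norms` is a hypothetical helper name, and floating-point arithmetic is used since non-integer powers appear.

```python
from itertools import product


def doob_lp_norms(N, p):
    """Return (||u*_N||_p, ||u_N||_p) for the simple symmetric random walk.

    The norms are computed by enumerating all 2**N equally likely paths.
    """
    weight = 1.0 / 2 ** N
    max_moment = 0.0
    end_moment = 0.0
    for steps in product((-1, 1), repeat=N):
        partial = 0
        running_max = 0
        for x in steps:
            partial += x
            running_max = max(running_max, abs(partial))  # u*_N uses |u_j|
        max_moment += weight * running_max ** p
        end_moment += weight * abs(partial) ** p
    return max_moment ** (1 / p), end_moment ** (1 / p)


for p in (1.5, 2.0, 4.0):
    max_norm, end_norm = doob_lp_norms(N=10, p=p)
    bound = p / (p - 1) * end_norm
    print(p, end_norm, max_norm, bound)
    assert end_norm <= max_norm <= bound
```

The lower bound `end_norm <= max_norm` is the trivial half of the improvement stated in Problem 19.13.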
Using the continuity of measures T4.4, resp. Beppo Levi’s Theorem 9.6 we
derive from (19.11), resp. Theorem 19.12 the following result.
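19.13 Corollary Let $(u_j)_{j\in\mathbb{N}}$ be a martingale on the $\sigma$-finite filtered measure
space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$. Then
\[ \mu(\{u_\infty^* \ge s\}) \le \frac1s \sup_{j\in\mathbb{N}} \|u_j\|_1, \tag{19.13} \]
\[ \|u_\infty^*\|_p \le \frac{p}{p-1}\sup_{j\in\mathbb{N}}\|u_j\|_p, \qquad p \in (1,\infty). \tag{19.14} \]
If $(u_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is a martingale, we may replace $\sup_{j\in\mathbb{N}}\|u_j\|_p$, $p \in [1,\infty)$, in
(19.13) and (19.14) by $\|u_\infty\|_p$.
An inequality of the form (19.13) is a so-called weak-type maximal inequality,
as opposed to the strong-type $(p,p)$ inequalities of the form (19.14).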
If $p = 1$ and $(u_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is a martingale, we cannot expect a $(1,1)$ strong-type
inequality like (19.14) and we have to settle for the weak-type maximal inequality
(19.13) instead. Otherwise, the best we can hope for is
\[ \|u_\infty^*\|_1 \le \frac{e}{e-1}\Big(\mu(X) + \int |u_\infty|\,(\log|u_\infty|)^+\,d\mu\Big) \qquad\text{if } \mu(X) < \infty \]
or
\[ \int_{\{u_\infty^* \ge 1\}} u_\infty^*\,d\mu \le \frac{e}{e-1}\Big(\|u_\infty\|_1 + \int |u_\infty|\,(\log|u_\infty|)^+\,d\mu\Big) \qquad\text{else.} \]
Details can be found in Doob [13, pp. 313–4] apart from some obvious modifications
if $\mu(X) = \infty$.
The Hardy–Littlewood maximal theorem
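Doob’s martingale inequalities T19.12 and C19.13 can be seen as abstract versions
of the classical Hardy–Littlewood estimates for maximal functions in $\mathbb{R}^n$. To
prepare the ground we begin with a dyadic example.
19.14 Example Consider in $\mathbb{R}^n$ the half-open squares
\[ Q_k(z) := z + [0, 2^{-k})^n, \qquad k \in \mathbb{Z},\ z \in 2^{-k}\mathbb{Z}^n, \]
with lower left corner $z$ and side-length $2^{-k}$. Then
\[ \mathcal{A}_k^{[0]} := \sigma\big(Q_k(z) : z \in 2^{-k}\mathbb{Z}^n\big), \qquad k \in \mathbb{Z}, \]
defines a (two-sided infinite) filtration
\[ \ldots \subset \mathcal{A}_{-2}^{[0]} \subset \mathcal{A}_{-1}^{[0]} \subset \mathcal{A}_{0}^{[0]} \subset \mathcal{A}_{1}^{[0]} \subset \mathcal{A}_{2}^{[0]} \subset \ldots \]
of sub-$\sigma$-algebras of $\mathcal{B}(\mathbb{R}^n)$. The superscript ‘$[0]$’ indicates that the square lattice
in each $\mathcal{A}_k^{[0]}$ contains some square with the origin $0 \in \mathbb{R}^n$ as lower left corner.
Just as in Example 17.3(ix) one sees that for a function $f \in \mathcal{L}^1(\lambda^n)$
\[ f_k(x) := \sum_{z\in 2^{-k}\mathbb{Z}^n} \frac{1}{\lambda^n(Q_k(z))}\int_{Q_k(z)} f\,d\lambda^n\;\mathbf{1}_{Q_k(z)}(x), \qquad k \in \mathbb{Z}, \tag{19.15} \]
is a martingale – if you are unhappy about the two-sided infinite index set,
then think of $\big(f_k, \mathcal{A}_k^{[0]}\big)_{k\in\mathbb{N}_0}$ as a martingale and of $\big(f_{-k}, \mathcal{A}_{-k}^{[0]}\big)_{k\in\mathbb{N}_0}$ as a backwards
martingale.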
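The martingale property of the dyadic averages (19.15) amounts to the statement that averaging $f_{k+1}$ over the coarser squares of level $k$ gives back $f_k$. The sketch below illustrates this in a simplified discrete setting that is an assumption of this example, not of the text: dimension $n = 1$, the function restricted to $[0, 1)$, and $f$ represented by its values on a uniform grid. The name `dyadic_average` and the test function are hypothetical.

```python
import math


def dyadic_average(samples, k):
    """Replace grid values of f on [0, 1) by their means over the 2**k
    dyadic intervals of length 2**-k; the result is f_k on the same grid."""
    n, blocks = len(samples), 2 ** k
    if n % blocks:
        raise ValueError("the grid must refine the dyadic partition of level k")
    size = n // blocks
    averaged = []
    for b in range(blocks):
        mean = sum(samples[b * size:(b + 1) * size]) / size
        averaged.extend([mean] * size)
    return averaged


# Hypothetical test function, sampled at 1024 midpoints of [0, 1).
grid = [(i + 0.5) / 1024 for i in range(1024)]
f = [math.sin(2 * math.pi * t) + t * t for t in grid]
f2 = dyadic_average(f, 2)
f3 = dyadic_average(f, 3)

# Martingale property: averaging f_3 over the level-2 intervals returns f_2.
assert all(math.isclose(a, b, abs_tol=1e-12)
           for a, b in zip(dyadic_average(f3, 2), f2))
```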
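For the square maximal function
\[ f^*_{[0]}(x) := \sup_{k\in\mathbb{Z}} |f_k(x)| = \sup\bigg\{ \frac{1}{\lambda^n(Q)}\bigg|\int_Q f\,d\lambda^n\bigg| \;:\; Q \in \bigcup_{k\in\mathbb{Z}} \mathcal{A}_k^{[0]},\ x \in Q \bigg\} \]
and the submartingale $(|f_k|)_{k\in\mathbb{Z}}$, cf. Example 17.3(v), Doob’s inequalities become
\[ \lambda^n\big(\big\{f^*_{[0]} \ge s\big\}\big) \le \frac1s \sup_{k\in\mathbb{Z}} \int |f_k|\,d\lambda^n \le \frac1s \int |f|\,d\lambda^n. \]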
The classical Hardy–Littlewood maximal function is defined similarly to the
square maximal function from Example 19.14, the only difference being that one
uses balls rather than squares.
19.15 Definition The Hardy–Littlewood maximal function of the function $u \in \mathcal{L}^p(\lambda^n)$, $1 \le p < \infty$, is defined by
\[ u^*(x) := \sup_{B \,:\, B \ni x} \frac{1}{\lambda^n(B)}\int_B |u|\,d\lambda^n, \]
where $B \subset \mathbb{R}^n$ stands for a generic (open or closed) ball of any radius.
From the Hölder inequality we see that for all sets $A$ with finite Lebesgue measure
\[ \int_A |u|\,d\lambda^n \le \big(\lambda^n(A)\big)^{1-1/p}\Big(\int_A |u|^p\,d\lambda^n\Big)^{1/p}, \qquad 1 \le p < \infty, \]
so that $u^*$ is well-defined. However, since $u^*$ is given by a (possibly uncountable)
supremum, it is not obvious whether $u^*$ is Borel measurable.
19.16 Lemma Let $u \in \mathcal{L}^p(\lambda^n)$, $1 \le p < \infty$. The Hardy–Littlewood maximal
function satisfies
\[ u^*(x) = \sup\bigg\{ \frac{1}{\lambda^n(B_r(c))}\int_{B_r(c)} |u|\,d\lambda^n \;:\; r \in \mathbb{Q}^+,\ c \in \mathbb{Q}^n,\ x \in B_r(c) \bigg\}. \]
In particular, $u^*$ is Borel measurable.
Proof Since $\mathbb{Q}^+ \times \mathbb{Q}^n$ is countable, the formula shows that $u^*$ arises from a
countable supremum of Borel measurable functions and is, by Corollary 8.9,
again Borel measurable.
The inequality ‘$\ge$’ is clear since every ball with rational centre and radius is
admissible in the definition of the maximal function $u^*$. To see ‘$\le$’, we fix $x \in \mathbb{R}^n$
and pick some generic (open or closed) ball $B$ with $x \in B$. Given some $\epsilon > 0$ we
can find $r \in \mathbb{Q}^+$ and $c \in \mathbb{Q}^n$ such that $B' := B_r(c) \subset B$, $\frac12\lambda^n(B) \le \lambda^n(B')$ and
\[ \lambda^n(B\setminus B')^{1-1/p} \le \frac{\lambda^n(B)}{\|u\|_p}\,\frac{\epsilon}{2} \le \frac{\lambda^n(B')}{\|u\|_p}\,\epsilon. \]
Then
\begin{align*}
\frac{1}{\lambda^n(B)}\int_B |u|\,d\lambda^n &\le \frac{1}{\lambda^n(B')}\int_B |u|\,d\lambda^n \\
&= \frac{1}{\lambda^n(B')}\int_{B\setminus B'} |u|\,d\lambda^n + \frac{1}{\lambda^n(B')}\int_{B'} |u|\,d\lambda^n \\
&\le \frac{1}{\lambda^n(B')}\,\lambda^n(B\setminus B')^{1-1/p}\,\|u\|_p + \frac{1}{\lambda^n(B')}\int_{B'} |u|\,d\lambda^n \\
&\le \epsilon + \sup_{B' \,:\, x\in B'} \frac{1}{\lambda^n(B')}\int_{B'} |u|\,d\lambda^n
\end{align*}
(the supremum ranges over all balls $B'$ with rational radius and centre s.t. $x \in B'$),
where we again used Hölder’s inequality in the penultimate line. Since $\epsilon$ and $B$
were arbitrary, the inequality ‘$\le$’ follows by considering the supremum over all
balls with $x \in B$ and then letting $\epsilon \to 0$.
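We will see now that $u^*$ is in $\mathcal{L}^p$ if $1 < p < \infty$.
19.17 Theorem (Hardy, Littlewood) Let $u \in \mathcal{L}^p(\lambda^n)$, $1 \le p < \infty$, and write $u^*$
for the maximal function. Then
\[ \lambda^n(\{u^* \ge s\}) \le \frac{c_n}{s}\,\|u\|_1, \qquad s > 0,\ p = 1, \tag{19.16} \]
\[ \|u^*\|_p \le \frac{p\,c_n}{p-1}\,\|u\|_p, \qquad 1 < p < \infty, \tag{19.17} \]
with the universal constant $c_n = \big(\frac{16}{\sqrt{\pi}}\big)^n\,\Gamma\big(\frac n2 + 1\big)$.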
Proof If we could show that the square maximal function $u^*_{[0]}$ satisfies $u^*_{[0]} \ge u^*$,
then (19.16), (19.17) would immediately follow from Doob’s inequalities C19.13,
compare Example 19.14. The problem, however, is that a ball $B_r$ of radius
$r \in \big[\frac14 2^{-k-1}, \frac14 2^{-k}\big)$, $k \in \mathbb{Z}$, need not entirely fall into any single square of our
lattice $\mathcal{A}_k^{[0]}$:
[Figure: a ball of radius $r$ drawn over the square lattice of side-length $2^{-k}$.]
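But if we move our lattice by $2\cdot\frac14 2^{-k} = \frac12 2^{-k}$ in certain (combinations of)
coordinate directions, we can ‘catch’ $B_r$ inside a single cube $Q'$ of the shifted
lattice.[✓] More precisely, if
\[ e := (\epsilon_1, \ldots, \epsilon_n), \qquad \epsilon_j \in \big\{0, \tfrac12 2^{-k}\big\}, \]
then
\[ \mathcal{A}_k^{[e]} := \sigma\big(e + Q_k(z) : z \in 2^{-k}\mathbb{Z}^n\big),\ k \in \mathbb{Z}, \qquad \big(u_k^{[e]}\big)_{k\in\mathbb{Z}}, \qquad u^*_{[e]}, \]
are $2^n$ filtrations with corresponding martingales and square maximal functions.
As in Example 19.14 we find that
\[ \lambda^n\big(\big\{u^*_{[e]} \ge s\big\}\big) \le \frac1s\,\|u\|_1, \qquad s > 0. \tag{19.18} \]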
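Combining Corollary 15.15 with the translation invariance and scaling behaviour
of Lebesgue measure we see that the volume of a ball $B_r$ of radius
$\frac14 2^{-k-1} \le r < \frac14 2^{-k}$ and arbitrary centre is
\[ \lambda^n(B_r) = r^n\,\lambda^n(B_1) \overset{15.15}{=} \frac{\pi^{n/2}\,r^n}{\Gamma\big(\frac n2+1\big)} \ge \frac{\pi^{n/2}\big(\frac14 2^{-k-1}\big)^n}{\Gamma\big(\frac n2+1\big)}, \]
hence we get from $x \in B_r \subset Q'$ and $\lambda^n(Q') = (2^{-k})^n$ that
\begin{align*}
\frac{1}{\lambda^n(B_r)}\int_{B_r}|u|\,d\lambda^n &\le \frac{\lambda^n(Q')}{\lambda^n(B_r)}\,\frac{1}{\lambda^n(Q')}\int_{Q'}|u|\,d\lambda^n \\
&\le \frac{(2^{-k})^n\,\Gamma\big(\frac n2+1\big)}{\pi^{n/2}\big(\frac18 2^{-k}\big)^n}\,\frac{1}{\lambda^n(Q')}\int_{Q'}|u|\,d\lambda^n \\
&\le \underbrace{\Big(\frac{8}{\sqrt{\pi}}\Big)^n\,\Gamma\Big(\frac n2+1\Big)}_{=:\,\gamma_n}\;\max_e u^*_{[e]}(x).
\end{align*}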
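This shows that $u^* \le \gamma_n \max_e u^*_{[e]}$ and
\[ \lambda^n(\{u^* \ge s\}) \le \lambda^n\Big(\Big\{\max_e u^*_{[e]} \ge \frac{s}{\gamma_n}\Big\}\Big) \le \sum_e \lambda^n\Big(\Big\{u^*_{[e]} \ge \frac{s}{\gamma_n}\Big\}\Big) \overset{(19.18)}{\le} 2^n\,\gamma_n\,\frac1s\int |u|\,d\lambda^n. \]
A very similar argument yields $\|u^*_{[e]}\|_p \le \frac{p}{p-1}\,\|u\|_p$ for all shifts $e$, and Doob’s
inequality (19.14) applied to each $u^*_{[e]}$ finally shows
\[ \|u^*\|_p \le \gamma_n\,\Big\|\max_e u^*_{[e]}\Big\|_p \le \gamma_n \sum_e \|u^*_{[e]}\|_p \le 2^n\,\gamma_n\,\frac{p}{p-1}\,\|u\|_p. \]
All that remains to be done is to call $c_n := 2^n\,\gamma_n$.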
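To get a feeling for the size of the constants, one can evaluate $\gamma_n = (8/\sqrt{\pi})^n\,\Gamma(\frac n2 + 1)$ and $c_n = (16/\sqrt{\pi})^n\,\Gamma(\frac n2 + 1)$ for small dimensions and confirm that $c_n = 2^n\gamma_n$. The function names below are chosen for this sketch only.

```python
import math


def gamma_n(n):
    """Constant gamma_n = (8 / sqrt(pi))**n * Gamma(n/2 + 1) from the proof."""
    return (8 / math.sqrt(math.pi)) ** n * math.gamma(n / 2 + 1)


def c_n(n):
    """Constant c_n = (16 / sqrt(pi))**n * Gamma(n/2 + 1) from Theorem 19.17."""
    return (16 / math.sqrt(math.pi)) ** n * math.gamma(n / 2 + 1)


for n in (1, 2, 3):
    assert math.isclose(c_n(n), 2 ** n * gamma_n(n))
    print(n, gamma_n(n), c_n(n))
```

Since $\Gamma(\frac32) = \frac{\sqrt{\pi}}{2}$, the one-dimensional constant is $c_1 = 8$; the constants grow quickly with $n$ and are not claimed to be optimal.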
The proof of Theorem 19.17 extends with very little effort to maximal functions
of finite measures.
19.18 Definition Let $\mu$ be a locally finite² measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. The
maximal function is given by
\[ \mu^*(x) := \sup_{B \,:\, B\ni x} \frac{\mu(B)}{\lambda^n(B)}, \]
where $B \subset \mathbb{R}^n$ stands for a generic open ball of any radius.
If we replace in the proof of Theorem 19.17 the expression $\int_B |u|\,d\lambda^n$ by $\mu(B)$
and $u^*_{[e]}(x)$ by
\[ \mu^*_{[e]}(x) := \sup\bigg\{ \frac{\mu(Q)}{\lambda^n(Q)} \;:\; Q \in \bigcup_{k\in\mathbb{Z}} \mathcal{A}_k^{[e]},\ x \in Q \bigg\}, \]
we arrive at the following generalization of (19.16).
19.19 Corollary Let $\mu$ be a finite measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ with total mass $\|\mu\|$
and maximal function $\mu^*$. Then
\[ \lambda^n(\{\mu^* \ge s\}) \le \frac{c_n}{s}\,\|\mu\|, \qquad s > 0, \tag{19.19} \]
with the universal constant $c_n = \big(\frac{16}{\sqrt{\pi}}\big)^n\,\Gamma\big(\frac n2 + 1\big)$.
2 i.e. every point $x \in \mathbb{R}^n$ has a neighbourhood $U = U(x)$ such that $\mu(U) < \infty$. In $\mathbb{R}^n$ this is clearly equivalent
to saying that $\mu(B) < \infty$ for every open ball $B$.
Lebesgue’s differentiation theorem
Let us return once again to the Radon–Nikodým theorem 19.2. There we have
seen that $\nu \ll \mu$ implies $\nu = f\mu$. The proof, though, shows even more, namely
\[ \nu|_{\mathcal{A}_\alpha} = f_\alpha\,\mu|_{\mathcal{A}_\alpha} \quad\text{and}\quad \mathcal{L}^1\text{-}\lim_{\alpha\in I} f_\alpha = f \]
(notation as in T19.2). Let us consider a concrete measure space $(X, \mathcal{A}, \mu) = (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \lambda^n)$. In this case we could reduce our consideration to a countable
sequence of $\sigma$-algebras (instead of $(\mathcal{A}_\alpha)_{\alpha\in I}$) – cf. Problem 19.1 – and use Theorem 18.6
instead of 19.4. In fact, this would even allow us to get $f(x)$ as
pointwise limit. This is one way to prove Lebesgue’s differentiation theorem.
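19.20 Theorem (Lebesgue) Let $u \in \mathcal{L}^1(\lambda^n)$. Then
\[ \lim_{r\to0} \frac{1}{\lambda^n(B_r(x))}\int_{B_r(x)} |u(y) - u(x)|\,\lambda^n(dy) = 0 \tag{19.20} \]
for (Lebesgue) almost all $x \in \mathbb{R}^n$. In particular,
\[ u(x) = \lim_{r\to0} \frac{1}{\lambda^n(B_r(x))}\int_{B_r(x)} u(y)\,\lambda^n(dy). \tag{19.21} \]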
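The qualifier "almost all" in Theorem 19.20 cannot be dropped, as a one-dimensional example shows. The sketch below takes $u = \mathbf{1}_{[0,1]}$ (an assumption of this illustration), for which the ball averages in (19.21) can be computed exactly from the length of an overlap; `ball_average_indicator` is a hypothetical helper name.

```python
import math


def ball_average_indicator(x, r, a=0.0, b=1.0):
    """Average of the indicator function of [a, b] over the ball (x - r, x + r)."""
    overlap = max(0.0, min(x + r, b) - max(x - r, a))
    return overlap / (2 * r)


for r in (0.1, 0.01, 0.001):
    inside = ball_average_indicator(0.5, r)     # interior point of [0, 1]
    boundary = ball_average_indicator(0.0, r)   # left end point
    outside = ball_average_indicator(1.5, r)    # point outside [0, 1]
    print(r, inside, boundary, outside)
    assert math.isclose(inside, 1.0) and math.isclose(boundary, 0.5) and outside == 0.0
```

At interior and exterior points the averages reproduce $u(x)$, while at the end point $x = 0$ they stay at $\frac12 \ne u(0) = 1$. The exceptional set $\{0, 1\}$ is a Lebesgue null set, in agreement with the theorem.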
We will not follow the route laid out above, but use instead the Hardy–Littlewood
maximal theorem 19.17 to prove T19.20. The reason is mainly a didactic one since
this is a beautiful example of how weak-type maximal inequalities (i.e. inequalities
like (19.16) or (19.13)) can be used to get a.e. convergence. More on this theme
can be found in Krantz [25, pp. 27–30] and Garsia [16, pp. 1–4]. Our proof will
also show that the limits in (19.20) and (19.21) can be strengthened to $B \downarrow \{x\}$
where $B$ is any ball containing $x$ and, in the limit, shrinking to $\{x\}$.
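Proof (of Theorem 19.20) We know from Theorem 15.17 that the continuous
functions with compact support $C_c(\mathbb{R}^n)$ are dense in $\mathcal{L}^1(\lambda^n)$. Since $\phi \in C_c(\mathbb{R}^n)$
is uniformly continuous, we find for every $\epsilon > 0$ some $\delta > 0$ such that
\[ |\phi(x) - \phi(y)| < \epsilon \qquad \forall\, |x - y| \le r,\ r < \delta. \]
Thus
\[ \lim_{r\to0} \frac{1}{\lambda^n(B_r(x))}\int_{B_r(x)} |\phi(y) - \phi(x)|\,\lambda^n(dy) \le \epsilon \qquad \forall\, \phi \in C_c(\mathbb{R}^n), \tag{19.22} \]
and (19.20) is true for $C_c(\mathbb{R}^n)$.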
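For a general $u \in \mathcal{L}^1(\lambda^n)$ we pick a sequence $(\phi_j)_{j\in\mathbb{N}} \subset C_c(\mathbb{R}^n)$ with
$\lim_{j\to\infty}\|u - \phi_j\|_1 = 0$. Denote by
\[ w^{\star}_{\delta}(x) := \sup_{0\,\ldots}\ [\ldots] \]
\begin{align*}
[\ldots] &= \lambda^n\Big(\Big\{x : \inf_{\delta>0}\,\big(u - u(x)\big)^{\star}_{\delta}(x) > 3\epsilon\Big\}\Big) \\
&\le \lambda^n\Big(\Big\{x : \big(u - u(x)\big)^{\star}_{\delta}(x) > 3\epsilon\Big\}\Big) \\
&= \lambda^n\Big(\Big\{x : \big[(u - \phi_j) + (\phi_j - \phi_j(x)) + (\phi_j(x) - u(x))\big]^{\star}_{\delta}(x) > 3\epsilon\Big\}\Big) \\
&\le \lambda^n\big(\big\{(u - \phi_j)^* > \epsilon\big\}\big) + \lambda^n\Big(\Big\{x : \big(\phi_j - \phi_j(x)\big)^{\star}_{\delta}(x) > \epsilon\Big\}\Big) + \lambda^n\big(\big\{x : |\phi_j(x) - u(x)| > \epsilon\big\}\big) \\
&\le \frac{c_n}{\epsilon}\,\|u - \phi_j\|_1 + 0 + \frac{1}{\epsilon}\,\|\phi_j - u\|_1,
\end{align*}
where we used Theorem 19.17, resp., (19.22) with $\delta \to 0$, resp., the Markov
inequality 10.12 to deal with each of the above three terms respectively. The assertion
now follows by letting first $j \to \infty$ and then $\epsilon \to 0$.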
Let us now investigate the connection between ordinary derivatives and the
Radon–Nikodým derivative. For this the following auxiliary notation will be
useful. If $\mu$ is a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ that assigns finite volume to any ball,
we set
\[ \bar D\mu(x) := \limsup_{r\to0} \frac{\mu(B_r(x))}{\lambda^n(B_r(x))} = \lim_{k\to\infty}\ \sup_{0\,\ldots}\ [\ldots] \]
Setting
\[ \mu_1(\bullet) := \mu(K \cap \bullet\,) \quad\text{and}\quad \mu_2 := \mu - \mu_1 \]
we obtain two measures $\mu_1, \mu_2$ with $\mu = \mu_1 + \mu_2$ and $\|\mu_2\| \le \epsilon$. Since $K^c$ is open,
we conclude from the definition of the derivative that $\bar D\mu_1(x) = 0$ for all $x \in K^c$,
so that
\[ \bar D\mu(x) = \bar D\mu_1(x) + \bar D\mu_2(x) = \bar D\mu_2(x) \le \mu_2^*(x) \qquad \forall\, x \in K^c, \]
where $\mu_2^*$ denotes the maximal function for the measure $\mu_2$. This shows that
\[ \{\bar D\mu > s\} \subset K \cup \{\mu_2^* > s\} \qquad \forall\, s > 0. \]
Using that $\lambda^n(K) \le \lambda^n(N) = 0$ and the maximal inequality Corollary 19.19 we find
\[ \lambda^n(\{\bar D\mu > s\}) \le \lambda^n(\{\mu_2^* > s\}) \le \frac{c_n}{s}\,\|\mu_2\| \le \frac{c_n}{s}\,\epsilon. \]
4 See the footnote on p. 217.
Since $\epsilon > 0$ and $s > 0$ were arbitrary, we conclude that $\bar D\mu = 0$ Lebesgue a.e.
If $\mu$ is not finite, we choose an exhausting sequence of open balls $B_k(0)$, $k \in \mathbb{N}$,
and set $\mu_k(\bullet) := \mu(B_k(0) \cap \bullet\,)$. Obviously, $\bar D\mu = \bar D\mu_k$ on $B_k(0)$, and therefore
the first part of the proof shows that $\bar D\mu(x) = 0$ for Lebesgue almost all $x \in B_k(0)$.
Denoting the exceptional set by $M_k$ we see that $\bar D\mu(x) = 0$ for all $x \notin M := \bigcup_{k\in\mathbb{N}} M_k$;
the latter, however, is a $\lambda^n$-null set, and the theorem follows.
The Calderón–Zygmund lemma
Our last topic is the famous Calderón–Zygmund decomposition lemma which is
the heart of many further developments in the theory of singular integral operators.
We take the proof from Stein’s book [47, p. 17] and rephrase it a little to bring
out the martingale connection.
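19.23 Lemma (Calderón–Zygmund decomposition) Let $u \in \mathcal{L}^1_+(\lambda^n)$ and $\alpha > 0$.
Then there exists a decomposition of $\mathbb{R}^n$ such that
(i) $\mathbb{R}^n = F \cup \Omega$ and $F \cap \Omega = \emptyset$;
(ii) $u \le \alpha$ almost everywhere on $F$;
(iii) $\Omega = \bigcup_{k\in\mathbb{N}} Q_k$ with mutually disjoint half-open axis-parallel squares $Q_k$ such
that for each $Q_k$
\[ \alpha < \frac{1}{\lambda^n(Q_k)}\int_{Q_k} u\,d\lambda^n \le 2^n\,\alpha. \]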
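Proof Let $\mathcal{A}_k := \mathcal{A}_k^{[0]}$, $k \in \mathbb{Z}$, be the dyadic filtration of Example 19.14 and let
$(u_k)_{k\in\mathbb{Z}}$ be the corresponding martingale (19.15). Introduce a stopping time
\[ \tau := \inf\{k \in \mathbb{Z} : u_k > \alpha\}, \qquad \inf\emptyset := +\infty, \]
and set $F := \{\tau = +\infty\} \cup \{\tau = -\infty\}$ and $\Omega := \{-\infty < \tau < +\infty\}$. By the very
definition of the martingale $(u_k)_{k\in\mathbb{Z}}$ we see
\[ u_k(x) \le \frac{1}{\lambda^n(Q_k)}\int u\,d\lambda^n = 2^{nk}\,\|u\|_1 \xrightarrow{\;k\to-\infty\;} 0, \]
so that $\lim_{k\to-\infty} u_k(x) = 0$ and $\{\tau = -\infty\} = \emptyset$. If $x \in \{\tau = +\infty\}$, we have $u_k(x) \le \alpha$
and so $u(x) = \lim_{k\to\infty} u_k(x) \le \alpha$ a.e., as the almost everywhere pointwise limit exists
by Corollary 18.3 (note that $\int u_k\,d\lambda^n = \int u\,d\lambda^n < \infty$). This settles (i) and (ii).
Since $\tau$ is a stopping time, $\{\tau = k\} = \{\tau \le k\} \setminus \{\tau \le k-1\} \in \mathcal{A}_k$, hence $\{\tau = k\}$
as well as $\Omega = \dot{\bigcup}_{k\in\mathbb{Z}} \{\tau = k\}$ are unions of disjoint half-open squares. The estimate
in (iii) can be written as
\[ \alpha < u_\tau(x) \le 2^n\,\alpha \qquad \forall\, x \in \Omega. \]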
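From its definition, $u_\tau > \alpha$ is clear. For the upper estimate we note that every
square $Q_{k-1} \in \mathcal{A}_{k-1}$ contains $2^n$ squares $Q_k \in \mathcal{A}_k$, so that
\begin{align*}
\frac{u_k}{u_{k-1}} &= \frac{\displaystyle\sum_{z\in 2^{-k+1}\mathbb{Z}^n}\Big(\sum_{Q_k(y)\subset Q_{k-1}(z)} \frac{1}{\lambda^n(Q_k(y))}\int_{Q_k(y)} u\,d\lambda^n\;\mathbf{1}_{Q_k(y)}\Big)}{\displaystyle\sum_{z\in 2^{-k+1}\mathbb{Z}^n} \frac{1}{\lambda^n(Q_{k-1}(z))}\int_{Q_{k-1}(z)} u\,d\lambda^n\;\mathbf{1}_{Q_{k-1}(z)}} \\
&\le 2^n\,\frac{\displaystyle\sum_{z\in 2^{-k+1}\mathbb{Z}^n}\Big(\sum_{Q_k(y)\subset Q_{k-1}(z)} \frac{2^{-n}}{\lambda^n(Q_k(y))}\int_{Q_k(y)} u\,d\lambda^n\Big)\mathbf{1}_{Q_{k-1}(z)}}{\displaystyle\sum_{z\in 2^{-k+1}\mathbb{Z}^n} \frac{1}{\lambda^n(Q_{k-1}(z))}\int_{Q_{k-1}(z)} u\,d\lambda^n\;\mathbf{1}_{Q_{k-1}(z)}} = 2^n.
\end{align*}
Finally, by the definition of $\tau$,
\[ \mathbf{1}_{\{-\infty<\tau<+\infty\}}\,u_\tau = \sum_{k\in\mathbb{Z}} u_k\,\mathbf{1}_{\{\tau=k\}} \le \sum_{k\in\mathbb{Z}} 2^n\,u_{k-1}\,\mathbf{1}_{\{\tau=k\}} \le \sum_{k\in\mathbb{Z}} 2^n\,\alpha\,\mathbf{1}_{\{\tau=k\}} = 2^n\,\alpha\,\mathbf{1}_{\{-\infty<\tau<+\infty\}} \]
and the proof is complete.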
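The stopping-time construction in the proof is in effect an algorithm: descend through the dyadic levels and stop at the first square whose average exceeds $\alpha$. The following sketch carries this out in a finite one-dimensional model, which is an assumption of this illustration and not the setting of the lemma: $u$ is a list of nonnegative cell values whose length is a power of two and whose overall mean does not exceed $\alpha$ (this replaces the argument $u_k \to 0$ as $k \to -\infty$). The name `calderon_zygmund` is hypothetical.

```python
def calderon_zygmund(values, alpha):
    """Return the maximal dyadic blocks (start, length) of `values` whose mean
    exceeds alpha. Their union plays the role of Omega; all remaining cells
    belong to F and carry values <= alpha."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of cells must be a power of two")
    if sum(values) / n > alpha:
        raise ValueError("the overall mean must not exceed alpha")
    selected = []
    stack = [(0, n)]                 # blocks whose mean is known to be <= alpha
    while stack:
        start, length = stack.pop()
        half = length // 2
        if half == 0:
            continue                 # a single cell with value <= alpha lies in F
        for child in (start, start + half):
            mean = sum(values[child:child + half]) / half
            if mean > alpha:
                selected.append((child, half))   # stop here: tau is attained
            else:
                stack.append((child, half))      # keep refining
    return sorted(selected)


u = [0, 0, 9, 1, 0, 0, 0, 2] + [0] * 8
alpha = 2
for start, length in calderon_zygmund(u, alpha):
    mean = sum(u[start:start + length]) / length
    assert alpha < mean <= 2 * alpha     # property (iii) with 2**n = 2 for n = 1
    print(start, length, mean)
```

The upper bound $2\alpha$ holds because each selected block is one half of a parent block whose mean is at most $\alpha$, which is the discrete form of the estimate $u_k \le 2^n u_{k-1}$.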
Note that all results that we have proved here for $\lambda^n$ and $\mathbb{R}^n$ can be extended to
spaces of homogeneous type, i.e. metric spaces $X$ with a measure $\mu$ that is finite
and strictly positive on balls and has the following volume doubling property: for
some positive constant $\kappa > 0$ we have
\[ \mu(B_{2r}(x)) \le \kappa\,\mu(B_r(x)) \qquad \forall\, x \in X,\ r > 0, \]
see Krantz [25, §6.1, pp. 235–61].
Problems
19.1. Show that Theorem 18.6 is enough to prove the Radon–Nikodým theorem 19.2 in
the situation where $\mathcal{A}$ is countably generated, i.e. where $\mathcal{A} = \sigma((A_j)_{j\in\mathbb{N}})$.
[Hint: set $\mathcal{A}_n := \sigma(A_1, A_2, \ldots, A_n)$ and observe that the atoms of $\mathcal{A}_n$ are of the
form $C_1 \cap \ldots \cap C_n$ where $C_j \in \{A_j, A_j^c\}$, $1 \le j \le n$.]
19.2. A theorem of Doob. Let $(\mu_t)_{t\ge0}$ and $(\nu_t)_{t\ge0}$ be two families of measures on the
$\sigma$-finite measure space $(X, \mathcal{A})$ such that $\nu_t \ll \mu_t$ for all $t \ge 0$ and $t \mapsto \mu_t(A), \nu_t(A)$
are measurable for all $A \in \mathcal{A}$. Then there exists a measurable function $(t, x) \mapsto p(t, x)$,
$(t, x) \in [0, \infty) \times X$, such that $\nu_t = p(t, \bullet)\,\mu_t$ for all $t \ge 0$.
[Hint: set, as in the proof of Theorem 19.2, $p_\alpha(t, x) := \sum_{A\in\alpha} \frac{\nu_t(A)}{\mu_t(A)}\,\mathbf{1}_A(x)$, and check
that this function is jointly measurable in $t$ and $x$. Now argue as in the proof of
19.2.]
19.3. Conditional expectations. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let
$\mathcal{F} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Use the Radon–Nikodým theorem to show that for
every $u \in \mathcal{L}^1(\mathcal{A})$ there exists an – up to null sets unique – $\mathcal{F}$-measurable function
$u^{\mathcal{F}} \in \mathcal{L}^1(\mathcal{F})$ such that
\[ \int_F u^{\mathcal{F}}\,d\mu = \int_F u\,d\mu \qquad \forall\, F \in \mathcal{F}. \tag{19.23} \]
Use this result to rephrase the (sub-, super-)martingale property 17.1.
Remark. Since $u^{\mathcal{F}}$ is unique (modulo null sets) one often writes $u^{\mathcal{F}} = E^{\mathcal{F}} u$ where
$E^{\mathcal{F}}$ is an operator which is called conditional expectation. We will introduce this
operator in a different way in Chapter 22 and show in Theorem 23.9 that we could
have defined $E^{\mathcal{F}}$ by (19.23).
19.4. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $\nu$ be a further measure. Show
that $\nu \le \mu$ entails that $\nu = f\mu$ for some (a.e. uniquely determined) density function
$f$ such that $0 \le f \le 1$.
19.5. Let $\mu$, $\nu$ be two $\sigma$-finite measures on $(X, \mathcal{A})$ which have the same null sets. Show
that $\nu = f\mu$ and $\mu = g\nu$ where $0 < f < \infty$ a.e. and $g = 1/f$ a.e.
Remark. Measures having the same null sets are called equivalent.
19.6. Give an example of a measure $\mu$ and a density $f$ such that $f\mu$ is not $\sigma$-finite.
19.7. Let $\mu, \nu$ be $\sigma$-finite measures on the measurable space $(X, \mathcal{A})$. Let $(\mathcal{A}_j)_{j\in\mathbb{N}}$ be
a filtration of sub-$\sigma$-algebras of $\mathcal{A}$ such that $\mathcal{A} = \sigma\big(\bigcup_{j\in\mathbb{N}} \mathcal{A}_j\big)$ and denote by
$\mu_j := \mu|_{\mathcal{A}_j}$ and $\nu_j := \nu|_{\mathcal{A}_j}$. If $\nu_j \ll \mu_j$ for all $j \in \mathbb{N}$, then $\nu \ll \mu$. Find an expression
for the density $d\nu/d\mu$.
19.8. Let $\mu$ be Lebesgue measure on $[0, 2]$ and $\nu$ be Lebesgue measure on $[1, 3]$. Find
the Lebesgue decomposition of $\nu$ with respect to $\mu$.
19.9. Stieltjes measure (3). Let $(X, \mathcal{A}, \mu)$ be a finite measure space and denote by $F$
the left-continuous distribution function of $\mu$ as in Problem 7.9. Use Lebesgue’s
decomposition theorem 19.9 to show that we can decompose $F = F_1 + F_2 + F_3$ and,
accordingly, $\mu = \mu_1 + \mu_2 + \mu_3$ in such a way that
(1) $F_1$ is discrete, i.e. $\mu_1$ is the countable sum of weighted Dirac $\delta$-measures.
(2) $F_2$ is absolutely continuous, i.e. for every $\epsilon > 0$ there exists a $\delta > 0$ such that
$\sum_{j=1}^N |F(y_j) - F(x_j)| \le \epsilon$ for all points $x_1 < y_1 < x_2 < y_2 < \ldots < x_N < y_N$ with
$\sum_{j=1}^N |y_j - x_j| < \delta$.
(3) $F_3$ is continuous and singular, i.e. $\mu_3 \perp \lambda^1$.
[Hint: use in (19.2) the characterization of null sets of Problem 6.1.]
19.10. The devil’s staircase. Recall the construction of Cantor’s ternary set from Problem
7.10. In each step of the construction $E_k = I_k^1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} I_k^{2^k}$. Denote by $J_k^1, \ldots, J_k^{2^k-1}$
the intervals which make up $[0, 1] \setminus E_k$ arranged in increasing order of their endpoints.
We construct a sequence of functions $F_k : [0, 1] \to [0, 1]$ by
\[ F_k(x) := \begin{cases} 0, & \text{if } x = 0, \\ j\,2^{-k}, & \text{if } x \in J_k^j,\ 1 \le j \le 2^k - 1, \\ 1, & \text{if } x = 1, \end{cases} \]
and interpolate linearly between these values to get $F_k(x)$ for all other $x$.
(i) Sketch the first three functions $F_1, F_2, F_3$.
(ii) Show that the limit $F(x) := \lim_{k\to\infty} F_k(x)$ exists.
Remark. $F$ is usually called the Cantor function.
(iii) Show that $F$ is continuous and increasing.
(iv) Show that $F'$ exists a.e. and equals 0.
(v) Show that $F$ is not absolutely continuous (in the sense of Problem 19.9(2))
but singular, i.e. the corresponding measure $\mu$ with distribution function $F$ is
singular w.r.t. Lebesgue measure $\lambda^1|_{[0,1]}$.
19.11. Kolmogorov’s inequality. Let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent, identically
distributed random variables on a probability space $(\Omega, \mathcal{A}, P)$. Then we have the
following generalization of Chebyshev’s inequality, cf. Problem 10.5 (vi),
\[ P\bigg(\max_{1\le k\le n}\bigg|\sum_{j=1}^k (X_j - EX_j)\bigg| \ge t\bigg) \le \frac{1}{t^2}\sum_{j=1}^n VX_j, \]
where, in probabilistic notation, $EY = \int Y\,dP$ is the expectation or mean value and
$VY = \int (Y - EY)^2\,dP$ the variance of the random variable (i.e. measurable function)
$Y : \Omega \to \mathbb{R}$.
19.12. Let $u, w \ge 0$ be measurable functions on a $\sigma$-finite measure space $(X, \mathcal{A}, \mu)$.
(i) Show that $t\,\mu(\{u \ge t\}) \le \int_{\{u\ge t\}} w\,d\mu$ for all $t > 0$ implies that
\[ \int u^p\,d\mu \le \frac{p}{p-1}\int u^{p-1}\,w\,d\mu \qquad \forall\, p > 1. \]
(ii) Assume now that $u, w \in L^p$. Conclude from (i) that $\|u\|_p \le \frac{p}{p-1}\,\|w\|_p$ for $p > 1$.
[Hint: use the technique of the proof of Theorem 19.12; for (ii) use Hölder’s
inequality.]
19.13. Show the following slight improvement of Doob’s maximal inequality T19.12: Let
$(u_j)_{j\in\mathbb{N}}$ be a martingale or $(|u_j|^p)_{j\in\mathbb{N}}$, $1 < p < \infty$, be a submartingale on a $\sigma$-finite
filtered measure space. Then
\[ \max_{j\le N}\|u_j\|_p \le \|u_N^*\|_p \le \frac{p}{p-1}\,\|u_N\|_p \le \frac{p}{p-1}\max_{1\le j\le N}\|u_j\|_p. \]
19.14. $\mathcal{L}^p$-bounded martingales. A martingale $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ is called $\mathcal{L}^p$-bounded, if
$\sup_{j\in\mathbb{N}} \int |u_j|^p\,d\mu < \infty$ for some $p > 1$. Show that the sequence $(u_j)_{j\in\mathbb{N}}$ converges
a.e. and in $\mathcal{L}^p$-sense to a function $u \in \mathcal{L}^p$.
[Hint: compare with Problem 18.8]
19.15. Use Theorem 18.6 to show that the martingale of Example 19.14 is uniformly
integrable.
19.16. Let $u : [a, b] \to \mathbb{R}$ be a continuous function. Show that $x \mapsto \int_{[a,x]} u(t)\,dt$ is everywhere
differentiable and find its derivative. What happens if we only assume that
$u \in \mathcal{L}^1(dt)$?
[Hint: Theorem 19.20.]
19.17. Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded increasing function. Show that $f'$ exists Lebesgue
almost everywhere and that $f(b) - f(a) \ge \int_{(a,b)} f'(x)\,dx$. When do we have equality?
[Hint: assume first that $f$ is left- or right-continuous. Then you can interpret $f$
as distribution function of a Stieltjes measure $\mu$. Use Lebesgue’s decomposition
theorem 19.9 to write $\mu = \mu^{\circ} + \mu^{\perp}$ and use Corollaries 19.21 and 19.22 to find $f'$.
If $f$ is not one-sided continuous in the first place, use Lemma 13.12 to find a
version $\phi$ of $f$ which is left- or right-continuous such that $\{\phi \ne f\}$ is at most
countable, hence a Lebesgue null set.]
19.18. Fubini’s ‘other’ theorem. Let $(f_j)_{j\in\mathbb{N}}$ be a sequence of monotone increasing
functions $f_j : [a, b] \to \mathbb{R}$. If the series $s(x) := \sum_{j=1}^\infty f_j(x)$ converges, then $s'(x)$
exists a.e. and is given by $s'(x) = \sum_{j=1}^\infty f_j'(x)$ a.e.
[Hint: the partial sums $s_n(x)$ and $s(x)$ are again increasing functions and, by
Problem 19.17, $s'(x)$ and $s_n'(x)$ exist a.e.; the latter can be calculated through
term-by-term differentiation. Since the $f_j$ are increasing functions, the limits of the
difference quotients show that $0 \le s_n' \le s_{n+1}' \le s'$ a.e., hence $\sum_j f_j'$ converges a.e.
To identify this series with $s'$, show that $\sum_k (s(x) - s_{n_k}(x))$ converges on $[a, b]$ for
some suitable subsequence. The first part of the proof applied to this series implies
that $\sum_k (s'(x) - s_{n_k}'(x))$ converges, thus $s' - s_{n_k}' \to 0$.]
20
Inner product spaces
Up to now we have only considered functions with values in $\mathbb{R}$ or $\bar{\mathbb{R}}$. Often it is
necessary to admit complex-valued functions, too. In what follows $\mathbb{K}$ will stand
for $\mathbb{R}$ or $\mathbb{C}$.
Recall that a $\mathbb{K}$-vector space is a set $V$ with a vector addition ‘$+$’ $: V \times V \to V$,
$(v, w) \mapsto v + w$ and a multiplication of a vector with a scalar ‘$\cdot$’ $: \mathbb{K} \times V \to V$,
$(\alpha, v) \mapsto \alpha \cdot v$ which are defined in such a way that $(V, +)$ is an Abelian group
and that for all $\alpha, \beta \in \mathbb{K}$ and $v, w \in V$ the relations
\[ (\alpha + \beta)v = \alpha v + \beta v, \qquad \alpha(v + w) = \alpha v + \alpha w, \qquad (\alpha\beta)v = \alpha(\beta v), \qquad 1 \cdot v = v \]
hold. Typical examples of $\mathbb{R}$-vector spaces are the spaces $\mathcal{L}^p$ or $L^p$ (see
Remark 12.5) and, in particular, the sequence spaces $\ell^p$ from Example 12.12.
For the $\mathbb{C}$-versions we first need to know how to integrate complex functions.
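20.1 Scholium (integral of complex functions) It is often necessary to consider
complex-valued functions $u : X \to \mathbb{C}$ on a measurable space $(X, \mathcal{A})$. Since $\mathbb{C}$ is
a normed space, we have a natural topology on $\mathbb{C}$ and we may consider the Borel
$\sigma$-algebra $\mathcal{B}(\mathbb{C})$ on $\mathbb{C}$. Since we can (even topologically) identify the complex
plane $\mathbb{C}$ with $\mathbb{R}^2$, the Borel sets in $\mathbb{C}$ are generated by the half-open rectangles
\[ [[z, w)) := \{x + iy : \operatorname{Re} z \le x < \operatorname{Re} w,\ \operatorname{Im} z \le y < \operatorname{Im} w\}. \]
The correspondence $\mathbb{C} \leftrightarrow \mathbb{R}^2$ is accomplished by the map $\Phi : \mathbb{C} \to \mathbb{R}^2$, $z \mapsto \Phi(z) := (\operatorname{Re} z, \operatorname{Im} z) = \big(\frac12(z + \bar z), \frac{1}{2i}(z - \bar z)\big)$ which is, along with its inverse
$\Phi^{-1} : \mathbb{R}^2 \to \mathbb{C}$, $(x, y) \mapsto \Phi^{-1}(x, y) = x + iy$, continuous, hence measurable.
Consequently, we have
\[ f : X \to \mathbb{C} \text{ is } \mathcal{A}/\mathcal{B}(\mathbb{C}) \text{ measurable} \iff \operatorname{Re} f,\ \operatorname{Im} f : X \to \mathbb{R} \text{ are } \mathcal{A}/\mathcal{B}(\mathbb{R}) \text{ measurable.} \tag{20.1} \]
To see ‘$\Rightarrow$’ note that the maps $\operatorname{Re} : z \mapsto \frac12(z + \bar z)$ and $\operatorname{Im} : z \mapsto \frac{1}{2i}(z - \bar z)$ are
continuous, hence measurable, and so are by Theorem 7.4 the compositions $\operatorname{Re} \circ f$
and $\operatorname{Im} \circ f$.
Conversely, ‘$\Leftarrow$’ follows – if we write $f = u + iv$ – from the formula
\[ f^{-1}\big([[z, w))\big) = \underbrace{u^{-1}\big([\operatorname{Re} z, \operatorname{Re} w)\big)}_{\in\,\mathcal{A}} \cap \underbrace{v^{-1}\big([\operatorname{Im} z, \operatorname{Im} w)\big)}_{\in\,\mathcal{A}} \in \mathcal{A} \]
and the fact that the rectangles of the form $[[z, w))$ generate $\mathcal{B}(\mathbb{C})$.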
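This means that we can define the integral of a $\mathbb{C}$-valued measurable function
by linearity
\[ \int f\,d\mu := \int \operatorname{Re} f\,d\mu + i\int \operatorname{Im} f\,d\mu, \tag{20.2} \]
and we call $f : X \to \mathbb{C}$ integrable and write $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ if $\operatorname{Re} f, \operatorname{Im} f : X \to \mathbb{R}$
are integrable in the usual sense. The following rules for $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ are readily
checked:
\[ \operatorname{Re}\int f\,d\mu = \int \operatorname{Re} f\,d\mu, \qquad \operatorname{Im}\int f\,d\mu = \int \operatorname{Im} f\,d\mu, \qquad \overline{\int f\,d\mu} = \int \bar f\,d\mu, \tag{20.3} \]
\[ f \in \mathcal{L}^1_{\mathbb{C}}(\mu) \iff f \in \mathcal{M}(\mathcal{B}(\mathbb{C})) \text{ and } |f| \in \mathcal{L}^1_{\mathbb{R}}(\mu). \tag{20.4} \]
In (20.4) the direction ‘$\Rightarrow$’ follows since $|f| = \big((\operatorname{Re} f)^2 + (\operatorname{Im} f)^2\big)^{1/2}$ is measurable
and $|f| \le |\operatorname{Re} f| + |\operatorname{Im} f|$, while ‘$\Leftarrow$’ is implied by $|\operatorname{Re} f|, |\operatorname{Im} f| \le |f|$.
The equivalence (20.4) can be used to show that $\mathcal{L}^1_{\mathbb{C}}(\mu)$ is a $\mathbb{C}$-vector space:
for $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\alpha, \beta \in \mathbb{C}$ we have $\alpha f + \beta g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, in which case
\[ \int (\alpha f + \beta g)\,d\mu = \alpha\int f\,d\mu + \beta\int g\,d\mu; \tag{20.5} \]
moreover, we have the following standard estimate:
\[ \bigg|\int f\,d\mu\bigg| \le \int |f|\,d\mu. \tag{20.6} \]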
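Only (20.6) is not entirely straightforward. Since $\int f\,d\mu \in \mathbb{C}$, we can find some
$\theta \in [0, 2\pi)$ such that
\begin{align*}
\bigg|\int f\,d\mu\bigg| = e^{i\theta}\int f\,d\mu = \operatorname{Re}\Big(e^{i\theta}\int f\,d\mu\Big) &\overset{(20.3),(20.5)}{=} \int \operatorname{Re}\big(e^{i\theta} f\big)\,d\mu \\
&\le \int \big|e^{i\theta} f\big|\,d\mu = \int |f|\,d\mu.
\end{align*}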
p
����, 1 < p � �, are now defined by
�
p
���� �=
{
f ∈ ������� �
f
∈ �p����
}
� (20.7)
and it is obvious that all assertions from Chapter 12 remain valid. In particular,
L
p
���� stands for the set of all equivalence classes of �
p
����-functions if we
identify functions which coincide outside some �-null set. Note also that most of
our results on �-valued integrands carry over to �-valued functions by considering
real and imaginary parts separately.
As we have seen in Chapter 12, cf. Remark 12.5, the spaces �
p
����, resp.
L
p
���� are semi-normed, resp. normed vector spaces. The same and more is true
for �n and �n: here we can even define a product of two vectors which, however,
results in a scalar. It is this notion which we want to study in greater detail.
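20.2 Definition A $\mathbb{K}$-vector space $V$ is an inner product space if it supports
a scalar or inner product, i.e. a map $\langle\bullet, \bullet\rangle : V \times V \to \mathbb{K}$ with the following
properties: for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{K}$
\[ \text{(definiteness)} \qquad \langle v, v\rangle > 0 \iff v \ne 0, \tag{SP1} \]
\[ \text{(skew-symmetry)} \qquad \langle v, w\rangle = \overline{\langle w, v\rangle}, \tag{SP2} \]
\[ \langle \alpha u + \beta v, w\rangle = \alpha\langle u, w\rangle + \beta\langle v, w\rangle. \tag{SP3} \]
If $\mathbb{K} = \mathbb{R}$, (SP2) becomes symmetry and (SP2), (SP3) together show that both
$v \mapsto \langle v, w\rangle$ and $w \mapsto \langle v, w\rangle$ are $\mathbb{R}$-linear; therefore we call $(v, w) \mapsto \langle v, w\rangle$
bilinear. If $\mathbb{K} = \mathbb{C}$, (SP2), (SP3) give
\begin{align*}
\langle u, \alpha v + \beta w\rangle &\overset{\text{(SP2)}}{=} \overline{\langle \alpha v + \beta w, u\rangle} \overset{\text{(SP3)}}{=} \overline{\alpha\langle v, u\rangle + \beta\langle w, u\rangle} \\
&= \bar\alpha\,\overline{\langle v, u\rangle} + \bar\beta\,\overline{\langle w, u\rangle} \overset{\text{(SP2)}}{=} \bar\alpha\,\langle u, v\rangle + \bar\beta\,\langle u, w\rangle,
\end{align*}
i.e. $w \mapsto \langle v, w\rangle$ is skew-linear. We call $\langle\bullet, \bullet\rangle$ in this case a sesqui-linear form.
Since $\mathbb{K} = \mathbb{C}$ always includes $\mathbb{K} = \mathbb{R}$, we will restrict ourselves to $\mathbb{K} = \mathbb{C}$.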
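20.3 Lemma (Cauchy–Schwarz inequality) Let $(V, \langle\bullet, \bullet\rangle)$ be an inner product
space. Then
\[ |\langle v, w\rangle|^2 \le \langle v, v\rangle\,\langle w, w\rangle \qquad \forall\, v, w \in V. \tag{20.8} \]
Equality holds if, and only if, $v = \alpha w$ for some $\alpha \in \mathbb{K}$.
Proof If $v = 0$ or $w = 0$, there is nothing to show. For all other $v, w \in V$ and
$\alpha \in \mathbb{C}$ we have
\begin{align*}
0 \le \langle v - \alpha w, v - \alpha w\rangle &= \langle v, v\rangle - \alpha\langle w, v\rangle - \bar\alpha\langle v, w\rangle + \alpha\bar\alpha\langle w, w\rangle \\
&= \langle v, v\rangle - 2\operatorname{Re}\big(\alpha\langle w, v\rangle\big) + |\alpha|^2\langle w, w\rangle,
\end{align*}
where we used that $z + \bar z = 2\operatorname{Re} z$. If $\langle w, v\rangle = 0$, (20.8) is trivial; otherwise we set $\alpha = \langle v, v\rangle/\langle w, v\rangle$ and get
\[ 0 \le \langle v, v\rangle - 2\operatorname{Re}\langle v, v\rangle + \frac{\langle v, v\rangle^2\,\langle w, w\rangle}{|\langle w, v\rangle|^2}, \]
which implies (20.8).
Since $\langle v - \alpha w, v - \alpha w\rangle = 0$ only if $v = \alpha w$, this is necessary for equality in
(20.8), too. If, indeed, $v = \alpha w$, we see
\[ |\langle v, w\rangle|^2 = |\langle \alpha w, w\rangle|^2 = \alpha\bar\alpha\,\langle w, w\rangle\langle w, w\rangle = \langle \alpha w, \alpha w\rangle\langle w, w\rangle = \langle v, v\rangle\langle w, w\rangle, \]
showing that $v = \alpha w$ is also sufficient for equality in (20.8).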
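A numerical spot check of (20.8) and of its equality case is straightforward in $\mathbb{C}^n$. The sketch assumes the standard inner product $\langle z, w\rangle = \sum_j z_j \bar w_j$, linear in the first argument as required by (SP3); the vectors are arbitrary sample data.

```python
def inner(v, w):
    """Standard inner product on C^n: linear in v, skew-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))


v = [1 + 1j, 2 - 1j, 0.5j]
w = [3 - 2j, -1 + 0j, 1 + 1j]

lhs = abs(inner(v, w)) ** 2
rhs = (inner(v, v) * inner(w, w)).real
assert lhs <= rhs                       # Cauchy-Schwarz inequality (20.8)

# Equality case: v is a complex multiple of w.
alpha = 2 - 3j
v_par = [alpha * b for b in w]
gap = (inner(v_par, v_par) * inner(w, w)).real - abs(inner(v_par, w)) ** 2
assert abs(gap) < 1e-9
print(lhs, rhs, gap)
```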
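Lemma 20.3 is an abstract version of the Cauchy–Schwarz inequality for integrals
C12.3. Just as in Chapter 12 we will use it to show that in an inner product
space $(V, \langle\bullet, \bullet\rangle)$
\[ \|v\| := \sqrt{\langle v, v\rangle}, \qquad v \in V, \tag{20.9} \]
defines a norm, i.e. a map $\|\bullet\| : V \to [0, \infty)$ satisfying for all $v, w \in V$ and $\alpha \in \mathbb{K}$
\[ \text{(definiteness)} \qquad \|v\| > 0 \iff v \ne 0, \tag{N1} \]
\[ \text{(pos. homogeneity)} \qquad \|\alpha v\| = |\alpha| \cdot \|v\|, \tag{N2} \]
\[ \text{(triangle inequality)} \qquad \|v + w\| \le \|v\| + \|w\|. \tag{N3} \]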
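20.4 Lemma $(V, \langle\bullet, \bullet\rangle^{1/2})$ is a normed space.
Proof Because of (SP1) the map $\|\bullet\| : V \to [0, \infty)$, $\|v\| := \sqrt{\langle v, v\rangle}$, is well-defined.
All we have to do is to check the properties (N1)–(N3). Obviously (SP1) $\Leftrightarrow$ (N1),
(N2) follows from (SP2), (SP3):
\[ \|\alpha v\|^2 = \langle \alpha v, \alpha v\rangle = \alpha\bar\alpha\,\langle v, v\rangle = |\alpha|^2 \cdot \|v\|^2, \]
and the triangle inequality (N3) is a consequence of the Cauchy–Schwarz inequality
(20.8):
\begin{align*}
\|v + w\|^2 = \langle v + w, v + w\rangle &= \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle \\
&= \|v\|^2 + 2\operatorname{Re}\langle v, w\rangle + \|w\|^2 \\
&\le \|v\|^2 + 2\,|\langle v, w\rangle| + \|w\|^2 \\
&\overset{(20.8)}{\le} \|v\|^2 + 2\,\|v\| \cdot \|w\| + \|w\|^2 \\
&= (\|v\| + \|w\|)^2.
\end{align*}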
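20.5 Examples (i) The typical finite-dimensional inner product spaces are
$\mathbb{R}^n$ ($\mathbb{R}$-vector space):
\[ \langle x, y\rangle = \sum_{j=1}^n x_j y_j, \qquad \|x\| = \Big(\sum_{j=1}^n x_j^2\Big)^{1/2}; \]
$\mathbb{C}^n$ ($\mathbb{C}$-vector space):
\[ \langle z, w\rangle = \sum_{j=1}^n z_j \bar w_j, \qquad \|z\| = \Big(\sum_{j=1}^n |z_j|^2\Big)^{1/2}. \]
(ii) The typical separable¹ infinite-dimensional inner product spaces are
$\ell^2_{\mathbb{R}}(\mathbb{N})$ ($\mathbb{R}$-vector space), $x = (x_j)_{j\in\mathbb{N}}$, $y = (y_j)_{j\in\mathbb{N}}$:
\[ \langle x, y\rangle = \langle x, y\rangle_{\ell^2} = \sum_{j=1}^\infty x_j y_j, \qquad \|x\| = \|x\|_{\ell^2} = \Big(\sum_{j=1}^\infty x_j^2\Big)^{1/2}; \]
$\ell^2_{\mathbb{C}}(\mathbb{N})$ ($\mathbb{C}$-vector space), $z = (z_j)_{j\in\mathbb{N}}$, $w = (w_j)_{j\in\mathbb{N}}$:
\[ \langle z, w\rangle = \langle z, w\rangle_{\ell^2} = \sum_{j=1}^\infty z_j \bar w_j, \qquad \|z\| = \|z\|_{\ell^2} = \Big(\sum_{j=1}^\infty |z_j|^2\Big)^{1/2}. \]
(iii) Let $(X, \mathcal{A}, \mu)$ be a measure space. The typical general (finite and infinite-dimensional)
inner product spaces are
$L^2_{\mathbb{R}}(\mu)$ ($\mathbb{R}$-vector space):
\[ \langle u, v\rangle = \langle u, v\rangle_2 = \int u\,v\,d\mu, \qquad \|u\| = \|u\|_2 = \Big(\int u^2\,d\mu\Big)^{1/2}; \]
$L^2_{\mathbb{C}}(\mu)$ ($\mathbb{C}$-vector space):
\[ \langle f, g\rangle = \langle f, g\rangle_2 = \int f\,\bar g\,d\mu, \qquad \|f\| = \|f\|_2 = \Big(\int |f|^2\,d\mu\Big)^{1/2}. \]
1 Separable means that the space contains a countable dense subset, see Definition 21.14 below.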
Every inner product space becomes a normed space with norm given by (20.9),
but not every normed space is necessarily an inner product space. In fact, $L^p(\mu)$
or $\ell^p(\mathbb{N})$ are for all $1 \le p \le \infty$ normed spaces, but only for $p = 2$ inner product
spaces. The reason for this is that in $L^p(\mu)$, $p \ne 2$, the parallelogram law does
not hold.
20.6 Lemma (Parallelogram identity) Let $(V, \langle\bullet, \bullet\rangle)$ be an inner product space.
Then
\[ \Big\|\frac{v + w}{2}\Big\|^2 + \Big\|\frac{v - w}{2}\Big\|^2 = \frac12\big(\|v\|^2 + \|w\|^2\big) \qquad \forall\, v, w \in V. \tag{20.10} \]
Proof Obvious.
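That the parallelogram identity singles out $p = 2$ can already be seen in the two-dimensional space with the $\ell^p$-norm. The following sketch evaluates the difference of the two sides of (20.10) for the unit vectors $v = (1, 0)$, $w = (0, 1)$; the helper names are chosen for this illustration.

```python
def p_norm(x, p):
    """The l^p norm of a finite real vector, 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1 / p)


def parallelogram_defect(v, w, p):
    """Left-hand side minus right-hand side of (20.10) for the l^p norm."""
    half_sum = [(a + b) / 2 for a, b in zip(v, w)]
    half_diff = [(a - b) / 2 for a, b in zip(v, w)]
    lhs = p_norm(half_sum, p) ** 2 + p_norm(half_diff, p) ** 2
    rhs = 0.5 * (p_norm(v, p) ** 2 + p_norm(w, p) ** 2)
    return lhs - rhs


v, w = [1.0, 0.0], [0.0, 1.0]
for p in (1, 2, 4):
    print(p, parallelogram_defect(v, w, p))

assert abs(parallelogram_defect(v, w, 2)) < 1e-12   # identity holds for p = 2
assert parallelogram_defect(v, w, 1) > 0.5          # fails for p = 1
assert parallelogram_defect(v, w, 4) < -0.1         # fails for p = 4
```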
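Geometrically $v + w$ and $v - w$ are the diagonals of the parallelogram spanned
by $v$ and $w$. The proof of (20.10) in $\mathbb{R}^n$ would show the cosine law for the angle
$\angle(x, y)$ between the vectors $x, y \in \mathbb{R}^n$:
\[ \frac{\langle x, y\rangle}{\|x\| \cdot \|y\|} = \cos\angle(x, y). \tag{20.11} \]
In fact, inner products induce a natural geometry on $V$ which resembles in many
aspects the Euclidean geometry on $\mathbb{R}^n$ and $\mathbb{C}^n$.
20.7 Definition Let $(V, \langle\bullet, \bullet\rangle)$ be an inner product space. We call $v, w \in V$
orthogonal and write $v \perp w$ if $\langle v, w\rangle = 0$.
20.8 Remark (i) If $\|\bullet\|$ derives from a scalar product, we can recover $\langle\bullet, \bullet\rangle$ from
$\|\bullet\|$ with the help of the so-called polarization identities: if $\mathbb{K} = \mathbb{R}$,
\[ \langle v, w\rangle = \tfrac14\big(\|v + w\|^2 - \|v - w\|^2\big) = \tfrac12\big(\|v + w\|^2 - \|v\|^2 - \|w\|^2\big), \tag{20.12} \]
and if $\mathbb{K} = \mathbb{C}$, where by (SP3) the inner product is linear in its first argument,
\[ \langle v, w\rangle = \tfrac14\big(\|v + w\|^2 - \|v - w\|^2 + i\|v + iw\|^2 - i\|v - iw\|^2\big). \tag{20.13} \]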
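(ii) One can show that a norm $\|\bullet\|$ derives from a scalar product if, and only
if, $\|\bullet\|$ satisfies the parallelogram identity (20.10). For a proof we refer to Yosida
[55, p. 39], see also Problem 20.2.
(iii) Let $V = V_{\mathbb{R}}$ be an $\mathbb{R}$-inner product space with scalar product $\langle\bullet, \bullet\rangle_{\mathbb{R}}$. Then
we can turn $V$ into a $\mathbb{C}$-inner product space using the following complexification
procedure:
\[ V_{\mathbb{C}} := V_{\mathbb{R}} \oplus iV_{\mathbb{R}} = \big\{ v + iw : v, w \in V_{\mathbb{R}} \big\}, \]
with the following addition
\[ (v + iw) + (v' + iw') := (v + v') + i(w + w'), \qquad v, v', w, w' \in V_{\mathbb{R}}, \]
scalar multiplication
\[ (\alpha + i\beta)(v + iw) := (\alpha v - \beta w) + i(\beta v + \alpha w), \qquad \alpha, \beta \in \mathbb{R},\ v, w \in V_{\mathbb{R}}, \]
inner product
\[ \langle v + iw, v' + iw'\rangle_{\mathbb{C}} := \langle v, v'\rangle_{\mathbb{R}} + i\langle w, v'\rangle_{\mathbb{R}} - i\langle v, w'\rangle_{\mathbb{R}} + \langle w, w'\rangle_{\mathbb{R}}, \qquad v, v', w, w' \in V_{\mathbb{R}}, \]
and norm $\|\cdot\|_{\mathbb{C}} := \langle\bullet, \bullet\rangle_{\mathbb{C}}^{1/2}$.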
Problems
20.1. Show that the examples given in 20.5 are indeed inner product spaces.
20.2. This exercise shows the following
Theorem (Fréchet–von Neumann–Jordan). A norm $\|\bullet\|$ on the $\mathbb{R}$-vector
space $V$ derives from an inner product if, and only if, the parallelogram identity
(20.10) holds.
(i) Necessity: prove Lemma 20.6.
Assume from now on that $\|\bullet\|$ is a norm satisfying (20.10) and set
\[ (v, w) := \tfrac14\big(\|v + w\|^2 - \|v - w\|^2\big). \]
(ii) Show that $(v, w)$ satisfies the properties (SP1) and (SP2) of Definition 20.2.
(iii) Prove that $(u + v, w) = (u, w) + (v, w)$.
(iv) Use (iii) to prove that $(q\,v, w) = q\,(v, w)$ for all dyadic numbers $q = j\,2^{-k}$,
$j \in \mathbb{Z}$, $k \in \mathbb{N}_0$, and conclude that (SP3) holds for dyadic $\alpha, \beta$.
(v) Prove that the maps $t \mapsto \|tv + w\|$ and $t \mapsto \|tv - w\|$ ($t \in \mathbb{R}$, $v, w \in V$) are
continuous and conclude that $t \mapsto (tv, w)$ is continuous. Use this and (iv) to
show that (SP3) holds for all $\alpha, \beta \in \mathbb{R}$.
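20.3. (Continuation of Problem 20.2) Assume now that $W$ is a $\mathbb{C}$-vector space with norm $\|\bullet\|$ satisfying the parallelogram identity (20.10) and let
\[ (v, w)_{\mathbb{R}} := \tfrac14\big(\|v+w\|^2 - \|v-w\|^2\big). \]
Then $(v, w)_{\mathbb{C}} := (v, w)_{\mathbb{R}} + i\,(v, iw)_{\mathbb{R}}$ is a complex-valued inner product.
20.4. Does the norm $\|\bullet\|_1$ on $L^1([0,1], \mathcal{B}[0,1], \lambda^1|_{[0,1]})$ derive from an inner product?
20.5. Let $(V, \langle\bullet,\bullet\rangle)$ be a $\mathbb{C}$-inner product space, $n \in \mathbb{N}$ and set $\theta := e^{2\pi i/n}$.
(i) Show that
\[ \frac1n \sum_{j=1}^n \theta^{jk} = \begin{cases} 1 & \text{if } k = 0,\\ 0 & \text{if } 1 \le k \le n-1. \end{cases} \]
(ii) Use (i) to prove for all $n \ge 3$ the following generalization of (20.12) and (20.13):
\[ \langle v, w\rangle = \frac1n \sum_{j=1}^n \theta^j\, \|v + \theta^j w\|^2. \]
(iii) Prove the following continuous version of (ii):
\[ \langle v, w\rangle = \frac{1}{2\pi} \int_{(-\pi,\pi]} e^{i\varphi}\, \big\|v + e^{i\varphi} w\big\|^2\, d\varphi. \]
20.6. Let $V$ be a real inner product space. Show that $v \perp w$ if, and only if, Pythagoras' theorem $\|v + w\|^2 = \|v\|^2 + \|w\|^2$ holds.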
21
Hilbert space $\mathcal{H}$
Let $(V, \langle\bullet,\bullet\rangle)$ be an inner product space. As we have seen in Chapter 20, $(V, \|\bullet\| := \langle\bullet,\bullet\rangle^{1/2})$ is a normed space and the norm resembles in many aspects the Euclidean, resp. unitary norm in $\mathbb{R}^n$ and $\mathbb{C}^n$. In particular, we have a notion of convergence: a sequence $(v_j)_{j\in\mathbb{N}} \subset V$ converges to an element $v \in V$ if $(\|v - v_j\|)_{j\in\mathbb{N}}$ converges to $0$ in $\mathbb{R}_+$,
\[ \lim_{j\to\infty} v_j = v \iff \lim_{j\to\infty} \|v - v_j\| = 0. \]
But it is completeness and the study of Cauchy sequences in $V$,
\[ (v_j)_{j\in\mathbb{N}} \subset V \text{ Cauchy sequence} \iff \lim_{j,k\to\infty} \|v_j - v_k\| = 0, \]
that gets analysis really going. This leads to the very natural
21.1 Definition A Hilbert space $\mathcal{H}$ is a complete inner product space, i.e. an inner product space where every Cauchy sequence converges. We will usually write $\mathcal{H}$ for a Hilbert space.
21.2 Example The spaces $\mathbb{R}^n$, $\mathbb{C}^n$, $\ell^2_{\mathbb{C}}$ and $L^2_{\mathbb{C}}(\mu)$ over any measure space $(X, \mathcal{A}, \mu)$ are Hilbert spaces and, indeed, the 'typical' ones. This follows from Example 20.5 and the Riesz–Fischer theorem 12.7.
Since every Hilbert space is an inner product space, we have the notion of orthogonality of $g, h \in \mathcal{H}$, see Definition 20.7:
\[ g \perp h \iff \langle g, h\rangle = 0. \]
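21.3 Definition Let $\mathcal{H}$ be a Hilbert space. The orthogonal complement $M^\perp$ of a subset $M \subset \mathcal{H}$ is by definition
\[ M^\perp := \{h \in \mathcal{H} : h \perp m \ \ \forall\, m \in M\} = \{h \in \mathcal{H} : \langle h, m\rangle = 0 \ \ \forall\, m \in M\}. \tag{21.1} \]
21.4 Lemma Let $\mathcal{H}$ be a Hilbert space and $M \subset \mathcal{H}$ be any subset. The orthogonal complement $M^\perp$ is a closed linear subspace of $\mathcal{H}$ and $M \subset (M^\perp)^\perp$.
Proof If $g, h \in M^\perp$ we find for all $\alpha, \beta \in \mathbb{C}$ that
\[ \langle \alpha g + \beta h, m\rangle = \alpha\langle g, m\rangle + \beta\langle h, m\rangle = 0 \qquad \forall\, m \in M, \]
i.e. $\alpha g + \beta h \in M^\perp$ and $M^\perp$ is a linear subspace of $\mathcal{H}$. To see the closedness we take a sequence $(h_k)_{k\in\mathbb{N}} \subset M^\perp$ such that $\lim_{k\to\infty} h_k = h$. Then, for all $m \in M$,
\[ |\langle h, m\rangle| = |\langle h, m\rangle - \underbrace{\langle h_k, m\rangle}_{=0}| = |\langle h - h_k, m\rangle| \overset{20.3}{\le} \|h - h_k\| \cdot \|m\| \xrightarrow{k\to\infty} 0; \]
this shows that $M^\perp$ is closed since $h \in M^\perp$. Finally, if $m \in M$ we get
\[ 0 = \langle h, m\rangle = \langle m, h\rangle \quad \forall\, h \in M^\perp \implies m \in (M^\perp)^\perp. \]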
The next theorem is central for the study of (the geometry of) Hilbert spaces. Recall that a set $C \subset \mathcal{H}$ is convex if
\[ u, w \in C \implies tu + (1-t)w \in C \qquad \forall\, t \in (0,1). \]
21.5 Theorem (Projection theorem) Let $C \ne \emptyset$ be a closed convex subset of the Hilbert space $\mathcal{H}$. For every $h \in \mathcal{H}$ there is a unique minimizer $u \in C$ such that
\[ \|h - u\| = \inf_{w \in C} \|h - w\| =: d(h, C). \tag{21.2} \]
This element $u = P_C h$ is called (orthogonal) projection of $h$ onto $C$ and is equally characterized by the property
\[ P_C h \in C \quad\text{and}\quad \operatorname{Re}\langle h - P_C h, w - P_C h\rangle \le 0 \qquad \forall\, w \in C. \tag{21.3} \]
Proof Existence: Let $d := \inf_{w\in C} \|h - w\|$. By the very definition of the infimum, there is a sequence $(w_k)_{k\in\mathbb{N}} \subset C$ such that
\[ \lim_{k\to\infty} \|h - w_k\| = d. \]
If we can show that $(w_k)_{k\in\mathbb{N}}$ is a Cauchy sequence, we know that the limit $u := \lim_{k\to\infty} w_k$ exists because of the completeness of $\mathcal{H}$ and is in $C$ since $C$ is
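closed. Applying the parallelogram law (20.10) with $v = h - w_k$ and $w = h - w_\ell$ gives
\[ \Big\|h - \frac{w_k + w_\ell}{2}\Big\|^2 + \Big\|\frac{w_k - w_\ell}{2}\Big\|^2 = \frac12\big(\|h - w_k\|^2 + \|h - w_\ell\|^2\big). \]
Since $C$ is convex, $\tfrac12 w_k + \tfrac12 w_\ell \in C$, thus $d \le \big\|h - \tfrac12(w_k + w_\ell)\big\|$ and
\[ d^2 + \tfrac14\|w_k - w_\ell\|^2 \le \tfrac12\big(\|h - w_k\|^2 + \|h - w_\ell\|^2\big) \xrightarrow{k,\ell\to\infty} d^2. \]
This proves that $(w_k)_{k\in\mathbb{N}}$ is a Cauchy sequence. By the continuity of the norm, its limit $u$ satisfies $\|h - u\| = \lim_{k\to\infty}\|h - w_k\| = d$, i.e. (21.2).
Uniqueness: Assume that $u, \tilde u \in C$ satisfy both (21.2), i.e.
\[ \|u - h\| = d = \|\tilde u - h\|. \]
Since by convexity $\tfrac12 u + \tfrac12 \tilde u \in C$, the parallelogram law (20.10) gives
\[ d^2 \le \underbrace{\big\|h - (\tfrac12 u + \tfrac12 \tilde u)\big\|^2}_{\ge d^2} + \big\|\tfrac12(u - \tilde u)\big\|^2 = \tfrac12\big(\|h - u\|^2 + \|h - \tilde u\|^2\big) = d^2, \]
and we conclude that $\|u - \tilde u\|^2 = 0$ or $u = \tilde u$.
Equivalence of (21.2), (21.3): Assume that $u \in C$ satisfies (21.2) and let $w \in C$. By convexity, $(1-t)u + tw \in C$ for all $t \in (0,1)$ and by (21.2)
\[ \begin{aligned} \|h - u\|^2 &\le \|h - (1-t)u - tw\|^2 \\ &= \|(h - u) - t(w - u)\|^2 \\ &= \|h - u\|^2 - 2t\operatorname{Re}\langle h - u, w - u\rangle + t^2\|w - u\|^2. \end{aligned} \]
Hence, $2\operatorname{Re}\langle h - u, w - u\rangle \le t\|w - u\|^2$ and (21.3) follows as $t \to 0$.
Conversely, if (21.3) holds, we have for $u = P_C h \in C$
\[ \|h - u\|^2 - \|h - w\|^2 = 2\operatorname{Re}\langle h - u, w - u\rangle - \|u - w\|^2 \le 0 \qquad \forall\, w \in C, \]
which implies (21.2).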
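To illustrate Theorem 21.5 in the simplest setting, here is a small sketch for the closed convex box $C = [0,1]^3 \subset \mathbb{R}^3$, where the projection is known to be coordinate-wise clamping. The function `project_box`, the sample point `h` and the test grid are assumptions made for this illustration; the code checks the variational inequality (21.3) and the minimizing property (21.2) on finitely many points of $C$ only.

```python
import math

def project_box(h, lo, hi):
    """Projection of h onto the box [lo, hi]^n: clamp every coordinate."""
    return [min(max(x, lo), hi) for x in h]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dist(u, v):
    return math.sqrt(dot(sub(u, v), sub(u, v)))

h = [2.0, -0.5, 0.3]
u = project_box(h, 0.0, 1.0)

# Finite sample of points w in C = [0, 1]^3 (a coarse grid).
grid = [[a / 2, b / 2, c / 2] for a in range(3) for b in range(3) for c in range(3)]

# (21.3): <h - u, w - u> <= 0 for all sampled w in C.
max_inner = max(dot(sub(h, u), sub(w, u)) for w in grid)
# (21.2): no sampled point of C is closer to h than u.
min_dist = min(dist(h, w) for w in grid)
print(u, max_inner, dist(h, u), min_dist)
```

A finite grid cannot prove the inequalities for all of $C$; it merely makes the geometric content of (21.2) and (21.3) visible.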
We will now study the properties of the projection operator $P_C$. If $V, W \subset \mathcal{H}$ are two subspaces with $V \cap W = \{0\}$, we call $V + W = \{v + w : v \in V,\ w \in W\}$ the direct sum and write $V \oplus W$.
21.6 Corollary (i) Let $\emptyset \ne C \subset \mathcal{H}$ be a closed convex subset. The projection $P_C : \mathcal{H} \to C$ is a contraction, i.e.
\[ \|P_C g - P_C h\| \le \|g - h\| \qquad \forall\, g, h \in \mathcal{H}. \tag{21.4} \]
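(ii) If $\emptyset \ne C = F$ is a closed linear subspace of $\mathcal{H}$, $P_F$ is a linear operator and $f = P_F h$ is the unique element with
\[ f \in F \quad\text{and}\quad h - f \in F^\perp. \tag{21.5} \]
In particular, $\mathcal{H} = F \oplus F^\perp$.
(iii) If $F$ is a linear subspace which is not necessarily closed, then $\mathcal{H} = \bar F \oplus F^\perp$ or, equivalently, $\bar F = (F^\perp)^\perp$.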
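Proof (i) follows from the inequality
\[ \begin{aligned} \|P_C g - P_C h\|^2 &= \operatorname{Re}\big(\langle P_C g, P_C g - P_C h\rangle - \langle P_C h, P_C g - P_C h\rangle\big) \\ &= \operatorname{Re}\big(\langle P_C g - g, P_C g - P_C h\rangle + \langle P_C h - h, P_C h - P_C g\rangle + \langle g - h, P_C g - P_C h\rangle\big) \\ &\overset{(21.3)}{\le} \operatorname{Re}\langle g - h, P_C g - P_C h\rangle \\ &\le \|g - h\| \cdot \|P_C g - P_C h\|, \end{aligned} \]
where we used the Cauchy–Schwarz inequality (Lemma 20.3) for the last estimate.
(ii) Since $F$ is a linear subspace, $v \in F \implies \lambda v \in F$ for all $\lambda \in \mathbb{C}$ and (21.3) reads in this case
\[ \operatorname{Re}\langle h - P_F h, \lambda v - P_F h\rangle \le 0 \qquad \forall\, \lambda \in \mathbb{C},\ v \in F, \]
or, equivalently,
\[ \operatorname{Re}\big(\lambda\langle h - P_F h, v\rangle\big) \le \operatorname{Re}\langle h - P_F h, P_F h\rangle \qquad \forall\, \lambda \in \mathbb{C},\ v \in F, \]
which is only possible if $\langle h - P_F h, v\rangle = 0$ for all $v \in F$ and for $v = P_F h$, in particular, $\langle h - P_F h, P_F h\rangle = 0$; this shows (21.5).
If, on the other hand, (21.5) is true, we get for all $v \in F$
\[ 0 = \operatorname{Re}\langle h - f, v\rangle - \operatorname{Re}\langle h - f, f\rangle = \operatorname{Re}\langle h - f, v - f\rangle, \]
and $f = P_F h$ follows by the uniqueness of the projection.
The decomposition $\mathcal{H} = F \oplus F^\perp$ follows immediately as $h = P_F h + (h - P_F h)$ and $h \in F \cap F^\perp \implies \langle h, h\rangle = 0 \iff h = 0$. The decomposition also proves the linearity of $P_F$ since for all $g, h \in \mathcal{H}$, $\alpha, \beta \in \mathbb{C}$ and $f \in F$
\[ \big\langle \underbrace{\alpha(g - P_F g)}_{\in F^\perp} + \underbrace{\beta(h - P_F h)}_{\in F^\perp},\ \underbrace{f}_{\in F} \big\rangle = 0 \]
as well as
\[ \big\langle (\alpha g + \beta h) - P_F(\alpha g + \beta h),\ f \big\rangle = 0. \]
Thus both $\alpha P_F g + \beta P_F h \in F$ and $P_F(\alpha g + \beta h) \in F$ satisfy (21.5) for the element $\alpha g + \beta h$,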
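which implies, again by uniqueness of the projection, that $P_F(\alpha g + \beta h) = \alpha P_F g + \beta P_F h$.
(iii) We know from Lemma 21.4 that $F \subset (F^\perp)^\perp$ and that $(F^\perp)^\perp$ is closed; therefore, $\bar F \subset (F^\perp)^\perp$. Moreover, $F \subset \bar F$ implies $\bar F^\perp \subset F^\perp$,[✓] showing that
\[ \mathcal{H} \overset{21.6(ii)}{=} \bar F \oplus (\bar F)^\perp \subset \bar F + F^\perp \subset (F^\perp)^\perp \oplus F^\perp \overset{21.6(ii)}{=} \mathcal{H}, \]
and $\mathcal{H} = \bar F \oplus F^\perp$ or $\bar F = (F^\perp)^\perp$ follows.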
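21.7 Remarks (i) It is easy to show that the projection $P_F$ onto a closed subspace $F \subset \mathcal{H}$ is symmetric, i.e. that
\[ \langle P_F g, h\rangle = \langle g, P_F h\rangle \qquad \forall\, g, h \in \mathcal{H}, \tag{21.6} \]
and that $P_F^2 = P_F$, i.e.
\[ \langle P_F^2 g, h\rangle = \langle P_F g, P_F h\rangle = \langle P_F g, h\rangle \qquad \forall\, g, h \in \mathcal{H}. \tag{21.7} \]
In fact, (21.7) implies (21.6). Since $P_F g \in F$, $P_F(P_F g) = P_F g$ by the uniqueness of the projection and $\langle P_F^2 g, h\rangle = \langle P_F g, h\rangle$ follows. Finally,
\[ \langle P_F g, h\rangle = \langle P_F g, P_F h\rangle + \underbrace{\langle P_F g, h - P_F h\rangle}_{=0} = \langle P_F g, P_F h\rangle. \]
(ii) Pythagoras' theorem has a particularly nice form for projections:
\[ \|h\|^2 = \|P_F h\|^2 + \|h - P_F h\|^2 \qquad \forall\, h \in \mathcal{H}. \tag{21.8} \]
(iii) A very useful interpretation of Corollary 21.6(iii) is the following: a linear subspace $F \subset \mathcal{H}$ is dense in $\mathcal{H}$ if, and only if, $F^\perp = \{0\}$. In other words,
\[ F \subset \mathcal{H} \text{ is dense} \iff \big(\langle f, h\rangle = 0 \ \ \forall\, f \in F \text{ entails } h = 0\big). \]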
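The properties collected in Corollary 21.6(ii) and Remark 21.7 can be verified exactly for a one-dimensional subspace of $\mathbb{R}^3$. The sketch below assumes $F = \operatorname{span}\{a\}$ and uses the elementary formula $P_F h = \frac{\langle h, a\rangle}{\langle a, a\rangle}\, a$, which satisfies (21.5); the names `project_line`, `a`, `g`, `h` are illustrative. Rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project_line(h, a):
    """Orthogonal projection of h onto the line F = span{a} in R^n."""
    c = Fraction(dot(h, a), dot(a, a))
    return [c * x for x in a]

a = [1, 2, 2]
g = [3, 0, -1]
h = [1, 1, 4]

Pg = project_line(g, a)
Ph = project_line(h, a)
residual = [x - y for x, y in zip(h, Ph)]    # h - P_F h

print(dot(residual, a))                      # (21.5): h - P_F h is orthogonal to F
print(project_line(Ph, a) == Ph)             # P_F^2 = P_F
print(dot(Pg, h) == dot(g, Ph))              # symmetry (21.6)
print(dot(h, h) == dot(Ph, Ph) + dot(residual, residual))  # Pythagoras (21.8)
```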
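Let us briefly discuss two important consequences of the projection theorem 21.5: F. Riesz' representation theorem on the structure of continuous linear functionals on $\mathcal{H}$ and the problem of finding a basis in $\mathcal{H}$.
21.8 Definition A continuous linear functional on $\mathcal{H}$ is a map $\Phi : \mathcal{H} \to \mathbb{C}$, $h \mapsto \Phi(h)$ which is linear,
\[ \Phi(\alpha g + \beta h) = \alpha\Phi(g) + \beta\Phi(h) \qquad \forall\, \alpha, \beta \in \mathbb{C},\ \forall\, g, h \in \mathcal{H}, \]
and satisfies
\[ |\Phi(g - h)| \le c(\Phi)\, \|g - h\| \qquad \forall\, g, h \in \mathcal{H} \]
with a constant $c(\Phi) \ge 0$ independent of $g, h \in \mathcal{H}$.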
It is easy to find examples of continuous linear functionals on $\mathcal{H}$. Just fix some $g \in \mathcal{H}$ and set
\[ \Phi_g(h) := \langle h, g\rangle, \qquad h \in \mathcal{H}. \tag{21.9} \]
Linearity is clear and the Cauchy–Schwarz inequality (Lemma 20.3) shows
\[ |\Phi_g(h - \tilde h)| = |\langle h - \tilde h, g\rangle| \le \underbrace{\|g\|}_{= c(\Phi)} \cdot\, \|h - \tilde h\|. \]
That, in fact, all continuous linear functionals of $\mathcal{H}$ arise in this way is the content of the next theorem, due to F. Riesz.
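21.9 Theorem (Riesz representation theorem) Each continuous linear functional $\Phi$ on the Hilbert space $\mathcal{H}$ is of the form (21.9), i.e. there exists a unique $g \in \mathcal{H}$ such that
\[ \Phi(h) = \Phi_g(h) = \langle h, g\rangle \qquad \forall\, h \in \mathcal{H}. \]
Proof Set $F := \Phi^{-1}(\{0\})$ which is, due to the continuity and linearity of $\Phi$, a closed linear subspace of $\mathcal{H}$.[✓] If $F = \mathcal{H}$, $\Phi \equiv 0$ and $g = 0 \in \mathcal{H}$ does the job. Otherwise we can pick some $g_0 \in \mathcal{H} \setminus F$ and set
\[ g := \frac{g_0 - P_F g_0}{\|g_0 - P_F g_0\|} \overset{(21.5)}{\in} F^\perp \implies \Phi(g) \ne 0. \]
Since $\mathcal{H} = F \oplus F^\perp$, we can write every $h \in \mathcal{H}$ in the form
\[ h = \frac{\Phi(h)}{\Phi(g)}\, g + \Big(h - \frac{\Phi(h)}{\Phi(g)}\, g\Big) \in F^\perp \oplus F, \]
hence
\[ \Big\langle h - \frac{\Phi(h)}{\Phi(g)}\, g,\ \frac{\Phi(h)}{\Phi(g)}\, g\Big\rangle = 0 \iff \langle h, g\rangle = \frac{\Phi(h)}{\Phi(g)}\, \underbrace{\langle g, g\rangle}_{=1} \iff \Phi(h) = \big\langle h, \overline{\Phi(g)}\, g\big\rangle, \]
so that $\overline{\Phi(g)}\, g$ represents $\Phi$. For the uniqueness assume that $\langle h, g\rangle = \langle h, g'\rangle$ for all $h \in \mathcal{H}$; taking $h = g - g'$ gives $\|g - g'\|^2 = 0$, and the proof is finished.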
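In $\mathbb{C}^n$ the representing vector of Theorem 21.9 can be written down directly. The following sketch assumes the standard inner product (linear in the first argument), so that $g_k = \overline{\Phi(e_k)}$ for the unit vectors $e_k$; the functional `phi` and the helper `riesz_vector` are made up for this illustration.

```python
def inner(h, g):
    """Standard inner product on C^n, linear in h and conjugate-linear in g."""
    return sum(x * complex(y).conjugate() for x, y in zip(h, g))

def riesz_vector(phi, n):
    """Representing vector g of a linear functional phi on C^n: g_k = conj(phi(e_k))."""
    basis = [[1 if i == k else 0 for i in range(n)] for k in range(n)]
    return [complex(phi(e)).conjugate() for e in basis]

def phi(h):
    # A hypothetical linear functional on C^2.
    return (2 - 1j) * h[0] + 3j * h[1]

g = riesz_vector(phi, 2)
h = [1 + 1j, 4 - 2j]
print(phi(h), inner(h, g))   # both values coincide
```

The complex conjugate in `riesz_vector` mirrors the factor $\overline{\Phi(g)}$ at the end of the proof.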
We will finally see how to represent elements of a Hilbert space using an orthonormal basis (ONB, for short). We begin with a definition.
21.10 Definition Let $\mathcal{H}$ be a Hilbert space. (i) The (linear) span of a family $\{e_k : k = 1, 2, \ldots, N\} \subset \mathcal{H}$, $N \in \mathbb{N} \cup \{\infty\}$, is the set of all finite linear combinations
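of the $e_k$, i.e.
\[ \operatorname{span}\{e_1, e_2, \ldots, e_N\} = \Big\{ \sum_{k=1}^n \alpha_k e_k : \alpha_1, \ldots, \alpha_n \in \mathbb{C},\ n \in \mathbb{N},\ n \le N \Big\}. \]
(ii) A sequence $(e_k)_{k\in\mathbb{N}} \subset \mathcal{H}$ is called a (countable) orthonormal system (ONS, for short) if
\[ \langle e_k, e_\ell\rangle = \begin{cases} 0, & \text{if } k \ne \ell,\\ 1, & \text{if } k = \ell, \end{cases} \]
that is, $\|e_k\| = 1$ and $e_k \perp e_\ell$ whenever $k \ne \ell$.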
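21.11 Theorem Let $(e_k)_{k\in\mathbb{N}}$ be an ONS in the Hilbert space $\mathcal{H}$ and denote by $E = E(N) = \operatorname{span}\{e_1, \ldots, e_N\}$ the linear span of $e_1, \ldots, e_N$, $N \in \mathbb{N}$.
(i) $E = E(N)$ is a closed linear subspace, $P_E g = \sum_{k=1}^N \langle g, e_k\rangle e_k$ and
\[ \Big\|g - \sum_{k=1}^N \langle g, e_k\rangle e_k\Big\| < \|g - f\| \qquad \forall\, f \in E,\ f \ne P_E g, \]
and also
\[ \|P_E g\|^2 = \sum_{k=1}^N |\langle g, e_k\rangle|^2. \]
(ii) (Pythagoras' theorem) For $g \in \mathcal{H}$
\[ \|g\|^2 = \|g - P_E g\|^2 + \|P_E g\|^2 = \Big\|g - \sum_{k=1}^N \langle g, e_k\rangle e_k\Big\|^2 + \sum_{k=1}^N |\langle g, e_k\rangle|^2. \]
(iii) (Bessel's inequality) For $g \in \mathcal{H}$
\[ \sum_{k=1}^\infty |\langle g, e_k\rangle|^2 \le \|g\|^2. \]
(iv) (Parseval's identity) The sequence $\big(\sum_{k=1}^m c_k e_k\big)_{m\in\mathbb{N}}$, $c_k \in \mathbb{C}$, converges to an element $g \in \mathcal{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2 < \infty$. In this case, Parseval's identity holds:
\[ \sum_{k=1}^\infty |c_k|^2 = \sum_{k=1}^\infty |\langle g, e_k\rangle|^2 = \|g\|^2. \]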
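Proof (i) That $E(N)$ is a linear subspace is due to the very definition of 'span'. The closedness follows from the fact that $E(N)$ is generated by finitely many $e_k$: if $f \in E(N)$ is of the form $f = \sum_{j=1}^N c_j e_j$, $c_j \in \mathbb{C}$, then
\[ \langle f, e_k\rangle = \Big\langle \sum_{j=1}^N c_j e_j, e_k \Big\rangle = \sum_{j=1}^N c_j \langle e_j, e_k\rangle = c_k. \]
Let $(f^{(n)})_{n\in\mathbb{N}} \subset E(N)$ be a sequence with $f^{(n)} \xrightarrow{n\to\infty} f \in \mathcal{H}$. Then
\[ \begin{aligned} \Big\|f^{(n)} - \sum_{j=1}^N \langle f, e_j\rangle e_j\Big\| &= \Big\|\sum_{j=1}^N \langle f^{(n)} - f, e_j\rangle e_j\Big\| \\ &\le \sum_{j=1}^N \big|\langle f^{(n)} - f, e_j\rangle\big| \cdot \|e_j\| \\ &\le \sum_{j=1}^N \|f^{(n)} - f\| \qquad (\text{Lemma 20.3},\ \|e_j\| = 1) \\ &= N\, \|f^{(n)} - f\| \xrightarrow{n\to\infty} 0, \end{aligned} \]
which shows that $\lim_{n\to\infty} f^{(n)} = \sum_{j=1}^N \langle f, e_j\rangle e_j \in E(N)$.
If $g \in \mathcal{H}$, we observe that $g - \sum_{j=1}^N \langle g, e_j\rangle e_j \perp e_k$ for all $k = 1, 2, \ldots, N$, since for these $k$
\[ \Big\langle g - \sum_{j=1}^N \langle g, e_j\rangle e_j, e_k \Big\rangle = \langle g, e_k\rangle - \sum_{j=1}^N \langle g, e_j\rangle \langle e_j, e_k\rangle = \langle g, e_k\rangle - \langle g, e_k\rangle = 0. \]
Since $\mathcal{H} = E(N) \oplus E(N)^\perp$, we get $P_{E(N)} g = \sum_{j=1}^N \langle g, e_j\rangle e_j$, while (21.2) implies $\big\|g - \sum_{j=1}^N \langle g, e_j\rangle e_j\big\| \le \|g - f\|$ for $f \in E(N)$, with equality holding only if $f = P_{E(N)} g$ because of uniqueness of $P_{E(N)} g$. Finally,
\[ \begin{aligned} \|P_{E(N)} g\|^2 = \langle P_{E(N)} g, P_{E(N)} g\rangle &= \Big\langle \sum_{j=1}^N \langle g, e_j\rangle e_j, \sum_{k=1}^N \langle g, e_k\rangle e_k \Big\rangle \\ &= \sum_{j,k=1}^N \langle g, e_j\rangle\, \overline{\langle g, e_k\rangle}\, \langle e_j, e_k\rangle = \sum_{j=1}^N |\langle g, e_j\rangle|^2, \end{aligned} \]
where we used that $(e_j)_{j\in\mathbb{N}}$ is an ONS.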
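(ii) follows from (21.8) and (i).
(iii) From (ii) we get for all $N \in \mathbb{N}$
\[ \sum_{j=1}^N |\langle g, e_j\rangle|^2 = \|g\|^2 - \|g - P_E g\|^2 \le \|g\|^2. \]
Since the right-hand side is independent of $N \in \mathbb{N}$, we can let $N \to \infty$ and the claim follows.
(iv) Since $\mathcal{H}$ is complete, it is enough to show that $\big(\sum_{k=1}^m c_k e_k\big)_{m\in\mathbb{N}}$ is a Cauchy sequence. Because of the orthogonality of the $e_k$ we see (as in (i)) for $n > m$
\[ \Big\|\sum_{k=m+1}^n c_k e_k\Big\|^2 = \sum_{k=m+1}^n |c_k|^2\, \|e_k\|^2 = \sum_{k=m+1}^n |c_k|^2, \]
which means that $\big(\sum_{k=1}^m c_k e_k\big)_{m\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2$ converges. In the latter case, Parseval's identity follows from (ii): for $g = \sum_{k=1}^\infty c_k e_k$ we have $P_{E(N)} g = \sum_{k=1}^N c_k e_k$ and $c_k = \langle g, e_k\rangle$ by (i). Thus by (ii), and since $\|g - P_{E(N)} g\| \to 0$,
\[ \|g\|^2 = \|g - P_{E(N)} g\|^2 + \sum_{j=1}^N |\langle g, e_j\rangle|^2 \xrightarrow{N\to\infty} \sum_{j=1}^\infty |\langle g, e_j\rangle|^2 = \sum_{j=1}^\infty |c_j|^2. \]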
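A small exact computation may help to see Theorem 21.11(i)–(iii) at work. The sketch below assumes $\mathcal{H} = \mathbb{R}^4$ with the standard inner product and the two orthonormal vectors `e1`, `e2` chosen here for illustration; `project` implements $P_E g = \sum_k \langle g, e_k\rangle e_k$.

```python
from fractions import Fraction

half = Fraction(1, 2)
e1 = [half, half, half, half]
e2 = [half, half, -half, -half]   # <e1, e2> = 0 and ||e1|| = ||e2|| = 1

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(g, ons):
    """P_E g = sum_k <g, e_k> e_k for an orthonormal system `ons`."""
    result = [Fraction(0)] * len(g)
    for e in ons:
        c = dot(g, e)
        result = [r + c * x for r, x in zip(result, e)]
    return result

g = [1, 2, 3, 4]
coeffs = [dot(g, e) for e in (e1, e2)]
Pg = project(g, [e1, e2])
rest = [x - y for x, y in zip(g, Pg)]

bessel_sum = sum(c * c for c in coeffs)
print(coeffs, Pg)
print(dot(g, g), dot(rest, rest) + bessel_sum)   # Pythagoras: both sides agree
print(bessel_sum <= dot(g, g))                   # Bessel's inequality
```

Here the coefficients are $5$ and $-2$, so $\|P_E g\|^2 = 29$, $\|g - P_E g\|^2 = 1$ and $\|g\|^2 = 30$.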
Two questions remain: can we always find a countable ONS? If so, can we use it to represent all elements of $\mathcal{H}$? The answer to the first question is 'yes', while the second question has to be answered by 'no', unless we are looking at separable Hilbert spaces, see Definition 21.14 below. Here we will restrict ourselves to the latter situation but we will point towards references where the general case is treated.
21.12 Definition An ONS $(e_k)_{k\in\mathbb{N}}$ in the Hilbert space $\mathcal{H}$ is said to be maximal (also complete, total, an orthonormal basis) if for every $g \in \mathcal{H}$
\[ \langle g, e_k\rangle = 0 \quad \forall\, k \in \mathbb{N} \implies g = 0. \]
The idea behind maximality is that we can obtain $\mathcal{H}$ as limit of finite-dimensional projections, '$\mathcal{H} = \lim_N P_{\operatorname{span}\{e_1, \ldots, e_N\}}$' or '$\mathcal{H} = \bigoplus_{k\in\mathbb{N}} \mathbb{C}\, e_k$', if the limits and summations are understood in the right way. Here we see that the countability of the ONS entails that $\mathcal{H}$ can be represented as closure of the span of countably many
elements – and that this is indeed a restriction should be obvious. Let us make
all this more precise.
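21.13 Theorem Let $(e_k)_{k\in\mathbb{N}}$ be an ONS in the Hilbert space $\mathcal{H}$. Then the following assertions are equivalent.
(i) $(e_k)_{k\in\mathbb{N}}$ is maximal;
(ii) $\bigcup_{N\in\mathbb{N}} \operatorname{span}\{e_1, \ldots, e_N\}$ is dense in $\mathcal{H}$;
(iii) $g = \sum_{j=1}^\infty \langle g, e_j\rangle e_j \quad \forall\, g \in \mathcal{H}$;
(iv) $\sum_{j=1}^\infty |\langle g, e_j\rangle|^2 = \|g\|^2 \quad \forall\, g \in \mathcal{H}$;
(v) $\sum_{j=1}^\infty \langle g, e_j\rangle\, \overline{\langle h, e_j\rangle} = \langle g, h\rangle \quad \forall\, g, h \in \mathcal{H}$.
Proof (i)⇒(ii): Since $F := \bigcup_{N\in\mathbb{N}} \operatorname{span}\{e_1, \ldots, e_N\} = \operatorname{span}\{e_j : j \in \mathbb{N}\}$ is a linear subspace of $\mathcal{H}$, the assertion follows from the definition of maximality and Remark 21.7(iii).
(ii)⇒(iii): Given $g \in \mathcal{H}$ and $\epsilon > 0$, density yields some $N_0 \in \mathbb{N}$ and $f \in E(N_0)$ with $\|g - f\| < \epsilon$. As $E(N_0) \subset E(N)$ for $N \ge N_0$, Theorem 21.11(i) shows $\|g - P_{E(N)} g\| \le \|g - f\| < \epsilon$ for all $N \ge N_0$, and so
\[ \sum_{j=1}^\infty \langle g, e_j\rangle e_j = \lim_{N\to\infty} \sum_{j=1}^N \langle g, e_j\rangle e_j = \lim_{N\to\infty} P_{E(N)} g = g. \]
(iii)⇒(iv) follows from Theorem 21.11(iv).
(iv)⇒(v) follows from the polarization identity (20.13).
(v)⇒(i): If $\langle u, e_k\rangle = 0$ for some $u \in \mathcal{H}$ and all $k \in \mathbb{N}$, we get from (v) with $g = h = u$ that
\[ 0 = \sum_{j=1}^\infty \langle u, e_j\rangle\, \overline{\langle u, e_j\rangle} = \langle u, u\rangle = \|u\|^2, \]
and therefore $u = 0$.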
Theorem 21.13 solves the representation issue. To find an ONS, we recall
first what we do in a finite-dimensional vector space $V$ to get a basis. If $V = \operatorname{span}\{v_1, \ldots, v_N\}$, we remove recursively all $v'_1, \ldots, v'_k$ such that still $V = \operatorname{span}\big(\{v_1, \ldots, v_N\} \setminus \{v'_1, \ldots, v'_k\}\big)$. This procedure gives us in at most $N$ steps a minimal system $\{w_1, \ldots, w_n\} \subset \{v_1, \ldots, v_N\}$, $N = n + k$, with the property that $V = \operatorname{span}\{w_1, \ldots, w_n\}$. Note that this is, at the same time, a maximally independent system of vectors in $V$. We can now rebuild $\{w_1, \ldots, w_n\}$ into an ONS by the Gram–Schmidt orthonormalization procedure:
\[ \left. \begin{aligned} e_1 &:= \frac{w_1}{\|w_1\|}, \quad\text{and recursively} \\ \tilde e_{j+1} &:= w_{j+1} - P_{\operatorname{span}\{e_1, \ldots, e_j\}} w_{j+1} = w_{j+1} - \sum_{\ell=1}^j \langle w_{j+1}, e_\ell\rangle e_\ell, \\ e_{j+1} &:= \frac{\tilde e_{j+1}}{\|\tilde e_{j+1}\|}. \end{aligned} \right\} \tag{21.10} \]
Another interpretation of (21.10) is this: If we had unleashed the Gram–Schmidt procedure on the set $\{v_1, \ldots, v_N\}$ (discarding every step in which $\tilde e_{j+1} = 0$), we would have obtained again $n$ orthonormal vectors[✓], say, $f_1, \ldots, f_n$ (which are, in general, different from $e_1, \ldots, e_n$ constructed from $w_1, \ldots, w_n$). A close inspection of (21.10) shows that at each step $V = \operatorname{span}\{f_1, \ldots, f_j, v_{j+1}, \ldots, v_N\}$, so that (21.10) extends a partially existing basis $\{f_1, \ldots, f_j\}$ to a full ONB $\{f_1, \ldots, f_n\}$. This means that (21.10) is also a 'basis extension procedure'.
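The recursion (21.10) translates almost literally into code. The following is a minimal floating-point sketch for vectors in $\mathbb{R}^n$; the function name `gram_schmidt` and the tolerance `tol`, used to discard vectors that are (numerically) linearly dependent on the previous ones, are assumptions of this illustration and not part of the text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors` as in (21.10); dependent vectors are skipped."""
    basis = []
    for w in vectors:
        r = list(w)
        for e in basis:
            c = dot(w, e)                       # coefficient <w_{j+1}, e_l>
            r = [x - c * y for x, y in zip(r, e)]
        n = math.sqrt(dot(r, r))                # norm of the residual
        if n > tol:                             # skip if w lies in span(basis)
            basis.append([x / n for x in r])
    return basis

vs = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(vs)
gram = [[round(dot(a, b), 10) for b in es] for a in es]
print(len(es), gram)
```

The second input vector is a multiple of the first and is dropped, so three orthonormal vectors remain and the printed Gram matrix is the identity up to rounding. For badly conditioned inputs a modified Gram–Schmidt variant (projecting the running residual instead of the original vector) is usually preferred numerically.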
To get (21.10) to work in infinite dimensions we must make sure that $\mathcal{H}$ is the closure of the span of countably many vectors. This motivates the following convenient (but somewhat restrictive)
21.14 Definition A Hilbert space $\mathcal{H}$ is said to be separable if $\mathcal{H}$ contains a countable dense subset $G \subset \mathcal{H}$.
21.15 Theorem Every separable Hilbert space $\mathcal{H}$ has a maximal ONS.
Proof Let $G = \{g_j\}_{j\in\mathbb{N}}$ be an enumeration of some countable dense subset of $\mathcal{H}$ and consider the subspaces $F_k = \operatorname{span}\{g_1, \ldots, g_k\}$. Note that $F_k \subseteq F_{k+1}$, $\dim F_k \le k$ and that $\bigcup_{k\in\mathbb{N}} F_k$ is dense in $\mathcal{H}$. Now construct an ONB in the finite-dimensional space $F_k$ and extend this ONB using (21.10) to an ONB in $F_{k+1}$, etc. This produces a sequence $(e_j)_{j\in\mathbb{N}}$ of orthonormal elements such that $\operatorname{span}\{e_j : j \in \mathbb{N}\} = \bigcup_{k\in\mathbb{N}} F_k \supset G$ is dense in $\mathcal{H}$ and Theorem 21.13 completes the proof.
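21.16 Remarks (i) Assume that $\mathcal{H}$ is separable. Then we have the following 'algebraic' interpretation of the results in 21.11–21.15. Consider the maps
coordinate projection
\[ \Pi : \mathcal{H} \to \ell^2_{\mathbb{C}}(\mathbb{N}), \qquad g \mapsto (\langle g, e_j\rangle)_{j\in\mathbb{N}}, \]
(re-)construction map
\[ \coprod : \ell^2_{\mathbb{C}}(\mathbb{N}) \to \mathcal{H}, \qquad (c_j)_{j\in\mathbb{N}} \mapsto \sum_{j=1}^\infty c_j e_j. \]
Because of Theorem 21.11(iv), both $\Pi$ and $\coprod$ are well-defined maps, and Theorem 21.11 shows that Diagram 1 (below, left) commutes, i.e. $\Pi \circ \coprod = \mathrm{id}_{\ell^2_{\mathbb{C}}(\mathbb{N})}$.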
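Diagram 1: $\ell^2_{\mathbb{C}}(\mathbb{N}) \xrightarrow{\ \coprod\ } \mathcal{H} \xrightarrow{\ \Pi\ } \ell^2_{\mathbb{C}}(\mathbb{N})$, with composite $\mathrm{id}$.
Diagram 2: $\mathcal{H} \xrightarrow{\ \Pi\ } \ell^2_{\mathbb{C}}(\mathbb{N}) \xrightarrow{\ \coprod\ } \mathcal{H}$, with composite $\mathrm{id}$.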
This means that, if we start with a square-summable sequence, associate an element from $\mathcal{H}$ with it and project to the coordinates, we get the original sequence back.
The converse operation, if we start with some $h \in \mathcal{H}$, project $h$ down to its coordinates, and then try to reconstruct $h$ from the (square-summable) coordinate sequence, is much more difficult, as we have seen in Theorems 21.13 and 21.15. Nevertheless, it can be done in every separable Hilbert space, and Diagram 2 (above, right) becomes commutative, i.e. $\coprod \circ \Pi = \mathrm{id}_{\mathcal{H}}$.
This shows that every separable Hilbert space $\mathcal{H}$ can be isometrically mapped onto $\ell^2_{\mathbb{C}}(\mathbb{N})$. The isometry is given by Parseval's identity 21.11(iv):
\[ \sum_{j\in\mathbb{N}} |\langle h, e_j\rangle|^2 = \big\|\Pi h\big\|^2_{\ell^2} = \|h\|^2_{\mathcal{H}}. \]
(ii) If $\mathcal{H}$ is not separable, we can still construct an ONB but now we need transfinite induction or Zorn's lemma. A reasonably short account is given in Rudin's book [40, pp. 83–88]. The results 21.11–21.13 carry over to this case if one makes some technical (what is an uncountable sum? etc.) modifications.
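A finite-dimensional toy version of the two maps in Remark 21.16(i) may make the diagrams concrete. The sketch assumes $\mathcal{H} = \mathbb{R}^2$ with the rotated orthonormal basis `e1`, `e2` below; the function names `coordinates` (for $\Pi$) and `reconstruct` (for $\coprod$) are illustrative choices.

```python
from fractions import Fraction

e1 = [Fraction(3, 5), Fraction(4, 5)]
e2 = [Fraction(-4, 5), Fraction(3, 5)]
onb = [e1, e2]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def coordinates(h):
    """Coordinate projection: h -> (<h, e_j>)_j."""
    return [dot(h, e) for e in onb]

def reconstruct(c):
    """(Re-)construction map: (c_j)_j -> sum_j c_j e_j."""
    return [sum(cj * e[i] for cj, e in zip(c, onb)) for i in range(2)]

h = [2, 1]
c = coordinates(h)
print(c, reconstruct(c))               # coordinates and the recovered vector
print(coordinates(reconstruct([7, -3])))  # a coefficient sequence is recovered, too
print(sum(x * x for x in c), dot(h, h))   # Parseval: both equal ||h||^2
```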
Problems
21.1. Show that every convergent sequence in $\mathcal{H}$ is a Cauchy sequence.
21.2. Show that $g \mapsto \langle g, h\rangle$, $h \in \mathcal{H}$, is continuous.
21.3. Show that $|||(g, h)||| := (\|g\|^p + \|h\|^p)^{1/p}$ is for every $p \ge 1$ a norm on $\mathcal{H} \times \mathcal{H}$. For which values of $p$ does $\mathcal{H} \times \mathcal{H}$ become a Hilbert space?
21.4. Show that $(g, h) \mapsto \langle g, h\rangle$ and $(t, h) \mapsto t\,h$ are continuous on $\mathcal{H} \times \mathcal{H}$ resp. $\mathbb{C} \times \mathcal{H}$.
21.5. Show that a Hilbert space $\mathcal{H}$ is separable if, and only if, $\mathcal{H}$ contains a countable maximal orthonormal system.
21.6. Show that for $\mathcal{H} = L^2(X, \mathcal{A}, \mu)$ and $w \in L^2$ the set $M_w^\perp := \{u \in L^2 : \int u\, w\, d\mu = 0\}^\perp$ is either $\{0\}$ or a one-dimensional subspace of $\mathcal{H}$.
21.7. Let $(e_j)_{j\in\mathbb{N}} \subset \mathcal{H}$ be an orthonormal system.
(i) Show that no subsequence of $(e_j)_{j\in\mathbb{N}}$ converges. However, $\lim_{j\to\infty} \langle e_j, h\rangle = 0$ for every $h \in \mathcal{H}$.
[Hint: show that it can't be a Cauchy sequence. Use Bessel's inequality.]
(ii) The Hilbert cube $Q := \big\{ h \in \mathcal{H} : h = \sum_{j=1}^\infty c_j e_j,\ |c_j| \le \tfrac1j,\ j \in \mathbb{N} \big\}$ is closed, bounded and compact (i.e. every sequence has a convergent subsequence).
(iii) The set $R := \bigcup_{j=1}^\infty \overline{B}_{1/j}(e_j)$ is closed, bounded but not compact (cf. (ii)).
(iv) The set $S := \big\{ h \in \mathcal{H} : h = \sum_{j=1}^\infty c_j e_j,\ |c_j| \le \delta_j,\ j \in \mathbb{N} \big\}$ is closed, bounded and compact (cf. (ii)) if, and only if, $\sum_{j=1}^\infty \delta_j^2 < \infty$.
21.8. Let $\mathcal{H}$ be a real Hilbert space.
(i) Show that
\[ \|h\| = \sup_{g \ne 0} \frac{|\langle g, h\rangle|}{\|g\|} = \sup_{\|g\| \le 1} |\langle g, h\rangle| = \sup_{\|g\| = 1} |\langle g, h\rangle|. \]
(ii) Can we replace in (i) $|\langle\bullet,\bullet\rangle|$ by $\langle\bullet,\bullet\rangle$?
(iii) Is it enough to take $g$ in (i) from a dense subset rather than from $\mathcal{H}$ (resp. $B_1(0)$ or $\{k \in \mathcal{H} : \|k\| = 1\}$)?
21.9. Show that the linear span of a sequence $(e_k)_{k\in\mathbb{N}} \subset \mathcal{H}$, $\operatorname{span}\{e_k : e_k \in \mathcal{H},\ k \in \mathbb{N}\}$, is a linear subspace of $\mathcal{H}$.
21.10. A weak form of the uniform boundedness principle. Consider the real Hilbert space $\ell^2 = \ell^2_{\mathbb{R}}(\mathbb{N})$ and let $a = (a_j)_{j\in\mathbb{N}}$ and $b = (b_j)_{j\in\mathbb{N}}$ be two sequences of real numbers.
(i) Assume that $\sum_{j=1}^\infty a_j^2 = \infty$. Construct a sequence $(j_k)_{k\in\mathbb{N}}$ such that $j_1 = 0$ and $\sum_{j_k}$
(ii) Define $b_j := \gamma_k a_j$ for all $j_k < j \le j_{k+1}$, $k \in \mathbb{N}$ and show that one can determine the $\gamma_k$ in such a way that $\sum_{j=1}^\infty a_j b_j = \infty$ while $\sum_{j=1}^\infty b_j^2 < \infty$.
(iii) Conclude that if $\langle a, b\rangle < \infty$ for all $a \in \ell^2$, we have necessarily $b \in \ell^2$.
(iv) State and prove the analogue of (iii) for all separable Hilbert spaces.
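Remark. The general uniform boundedness principle states that in every Hilbert space $\mathcal{H}$ and for any $H \subset \mathcal{H}$ one has
\[ \sup_{h\in H} |\langle h, g\rangle| < \infty \quad \forall\, g \in \mathcal{H} \implies \sup_{h\in H} \|h\| < \infty. \]
Interpreting $\Phi_h : g \mapsto \langle g, h\rangle$ as linear map, this says that the boundedness of the orbits $\{\Phi_h(g) : h \in H\}$ for all $g \in \mathcal{H}$ implies that the set $H$ is bounded. This formulation persists even in Banach spaces. The proof is normally based on Baire's category theorem, cf. Rudin [40].
21.11. Let $F, G \subset \mathcal{H}$ be linear subspaces. An operator $P$ defined on $G$ is called ($\mathbb{C}$-)linear, if $P(\alpha f + \beta g) = \alpha Pf + \beta Pg$ holds for all $\alpha, \beta \in \mathbb{C}$ and $f, g \in G$.
(i) Assume that $F$ is closed and that $P : \mathcal{H} \to F$ is the orthogonal projection. Then
\[ P^2 = P \quad\text{and}\quad \langle Pg, h\rangle = \langle g, Ph\rangle \qquad \forall\, g, h \in \mathcal{H}. \tag{21.11} \]
(ii) If $P : \mathcal{H} \to \mathcal{H}$ is a map satisfying (21.11), then $P$ is linear and $P$ is the orthogonal projection onto the closed subspace $P(\mathcal{H})$.
(iii) If $P : \mathcal{H} \to \mathcal{H}$ is a linear map satisfying
\[ P^2 = P \quad\text{and}\quad \|Ph\| \le \|h\| \qquad \forall\, h \in \mathcal{H}, \]
then $P$ is the orthogonal projection onto the closed subspace $P(\mathcal{H})$.
21.12. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ be mutually disjoint sets such that $X = \dot{\bigcup}_{j\in\mathbb{N}} A_j$. Set
\[ Y_j := \Big\{ u \in L^2(\mu) : \int_{A_j^c} |u|^2\, d\mu = 0 \Big\}, \qquad j \in \mathbb{N}. \]
(i) Show that $Y_j \perp Y_k$ if $j \ne k$.
(ii) Show that $\operatorname{span}\big(\bigcup_{j\in\mathbb{N}} Y_j\big)$ (i.e. the set of all linear combinations of finitely many elements from $\bigcup_{j\in\mathbb{N}} Y_j$) is dense in $L^2(\mu)$.
(iii) Find the projection $P_j : L^2(\mu) \to Y_j$.
21.13. Let $(X, \mathcal{A}, \mu)$ be a measure space and assume that $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ is a sequence of pairwise disjoint sets such that $\dot{\bigcup}_{j\in\mathbb{N}} A_j = X$ and $0 < \mu(A_j) < \infty$. Denote by $\mathcal{A}_n := \sigma(A_1, A_2, \ldots, A_n)$ and by $\mathcal{A}_\infty := \sigma(A_j : j \in \mathbb{N})$.
(i) Show that $L^2(\mathcal{A}_n) \subset L^2(\mathcal{A})$ and that $L^2(\mathcal{A}_n)$ is a closed subspace.
(ii) Find an explicit formula for $E^{\mathcal{A}_n} u$ where $E^{\mathcal{A}_n}$ is the orthogonal projection $E^{\mathcal{A}_n} : L^2(\mathcal{A}) \to L^2(\mathcal{A}_n)$.
(iii) Determine the orthogonal complement of $L^2(\mathcal{A}_n)$.
(iv) Show that $\big(E^{\mathcal{A}_n} u\big)_{n\in\mathbb{N}\cup\{\infty\}}$ is a martingale.
(v) Show that $E^{\mathcal{A}_n} u \xrightarrow{n\to\infty} E^{\mathcal{A}_\infty} u$ a.e. and in $L^2$.
(vi) Conclude that $L^2(\mathcal{A}_\infty)$ is separable.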
22
Conditional expectations in $L^2$
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.
We have seen in Chapter 20 that $L^2_{\mathbb{C}}(\mathcal{A}) = L^2_{\mathbb{R}}(\mathcal{A}) \oplus i L^2_{\mathbb{R}}(\mathcal{A})$. By considering real and imaginary parts separately, we can reduce many assertions concerning $L^2_{\mathbb{C}}(\mathcal{A})$ to $L^2_{\mathbb{R}}(\mathcal{A})$. From Chapter 21 we know that $L^2_{\mathbb{R}}(\mathcal{A})$ is a Hilbert space with inner product, resp. norm
\[ \langle u, v\rangle = \langle u, v\rangle_2 = \int u\, v\, d\mu, \quad\text{resp.}\quad \|u\| = \|u\|_2 = \Big(\int u^2\, d\mu\Big)^{1/2}. \]
Since a function$^1$ $u \in L^2_{\mathbb{R}}(\mathcal{A})$ is only almost everywhere defined and since (square-) integrable functions with values in $\bar{\mathbb{R}}$ are almost everywhere finite, hence $\mathbb{R}$-valued, cf. Remark 12.5, we can identify $L^2_{\mathbb{R}}(\mathcal{A})$ and $L^2_{\bar{\mathbb{R}}}(\mathcal{A})$. We will do so and simply write $L^2(\mathcal{A})$.
Caution: Note that for functions $u, v \in L^2(\mathcal{A})$ expressions of the type $u = v$, $u \le v$, ... always mean $u(x) = v(x)$, $u(x) \le v(x)$, ... for all $x$ outside some $\mu$-null set.
In this chapter we are mainly interested in linear subspaces of $L^2(\mathcal{A})$ and projections onto them. One particularly important class arises in the following way: if $\mathcal{G} \subset \mathcal{A}$ is a sub-$\sigma$-algebra of $\mathcal{A}$, then any $\mathcal{G}$-measurable function is certainly $\mathcal{A}$-measurable. Since $(X, \mathcal{G}, \mu|_{\mathcal{G}})$ is also a measure space, it seems natural to interpret $L^2(\mathcal{G}, \mu|_{\mathcal{G}})$ (with norm $\|\cdot\|_{L^2(\mathcal{G})}$) as a subspace of $L^2(\mathcal{A}, \mu)$ (with norm $\|\cdot\|_{L^2(\mathcal{A})}$). This can indeed be done.
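22.1 Lemma Let $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then
\[ \imath : \mathcal{L}^2(\mathcal{G}, \mu) \to \mathcal{L}^2(\mathcal{A}, \mu) \quad\text{and}\quad j : L^2(\mathcal{G}, \mu) \to L^2(\mathcal{A}, \mu) \]
are isometric imbeddings, i.e. linear maps satisfying $\|\imath(u)\|_{\mathcal{L}^2(\mathcal{A})} = \|u\|_{\mathcal{L}^2(\mathcal{G})}$ and $\|j(w)\|_{L^2(\mathcal{A})} = \|w\|_{L^2(\mathcal{G})}$ for all $u \in \mathcal{L}^2(\mathcal{G}, \mu)$, resp. $w \in L^2(\mathcal{G}, \mu)$. In particular $\mathcal{L}^2(\mathcal{G})$ [resp. $L^2(\mathcal{G})$] is a closed linear subspace of $\mathcal{L}^2(\mathcal{A})$ [resp. $L^2(\mathcal{A})$].
1 Strictly speaking we should call it an equivalence class of functions, cf. Remark 12.5.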
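Proof Since $\mathcal{G} \subset \mathcal{A}$ and since $\mu|_{\mathcal{G}}$ and $\mu$ coincide on $\mathcal{G}$, we have $\mathcal{E}(\mathcal{G}) \subset \mathcal{E}(\mathcal{A})$ for the simple functions. The map
\[ \imath : \mathcal{E}(\mathcal{G}) \to \mathcal{E}(\mathcal{A}), \qquad f \mapsto \imath(f) := f = \sum_{j=0}^N \alpha_j 1_{G_j}, \]
where the latter is a standard representation of $f$ with $\alpha_j \in \mathbb{R}$ and $G_j \in \mathcal{G}$, clearly satisfies
\[ \|f\|^2_{\mathcal{L}^2(\mathcal{A})} = \sum_{j=0}^N \alpha_j^2\, \mu(G_j) = \sum_{j=0}^N \alpha_j^2\, \mu|_{\mathcal{G}}(G_j) = \|f\|^2_{\mathcal{L}^2(\mathcal{G})} \qquad \forall\, f \in \mathcal{E}(\mathcal{G}). \]
According to Corollary 12.11 we can find for every $u \in \mathcal{L}^2(\mathcal{G})$ a sequence $(f_k)_{k\in\mathbb{N}} \subset \mathcal{E}(\mathcal{G}) \cap \mathcal{L}^2(\mathcal{G})$ such that $\lim_{k\to\infty} \|u - f_k\|_{\mathcal{L}^2(\mathcal{G})} = 0$. Therefore,
\[ \|\imath(f_k) - \imath(f_\ell)\|_{\mathcal{L}^2(\mathcal{A})} = \|f_k - f_\ell\|_{\mathcal{L}^2(\mathcal{G})} \xrightarrow{k,\ell\to\infty} 0, \]
which shows because of the completeness of $\mathcal{L}^2(\mathcal{A})$ (cf. Theorem 12.7) that $\imath(u) := \mathcal{L}^2(\mathcal{A})\text{-}\lim_{k\to\infty} \imath(f_k)$ is a linear isometry from $\mathcal{L}^2(\mathcal{G})$ to $\mathcal{L}^2(\mathcal{A})$. Since $\mathcal{L}^2(\mathcal{G})$ is complete, $\imath(\mathcal{L}^2(\mathcal{G}))$ is a closed linear subspace of $\mathcal{L}^2(\mathcal{A})$.
Denote by $[u] \in L^2$ the equivalence class containing the function $u \in \mathcal{L}^2$. Since for any two $u, w \in \mathcal{L}^2(\mathcal{G})$ with $u = w$ a.e. we also have $\imath(u) = \imath(w)$ a.e., the map $j([u]) := [\imath(u)]$ is independent of the chosen representative for $[u]$ and defines a linear isometry $j : L^2(\mathcal{G}) \to L^2(\mathcal{A})$. As before, $j(L^2(\mathcal{G}))$ is a closed linear subspace of $L^2(\mathcal{A})$.
It is customary to identify $u \in L^2(\mathcal{G})$ with $j(u) \in L^2(\mathcal{A})$ and we will do so in the sequel. Unless we want to stress the $\sigma$-algebra, we will write $\mu$ instead of $\mu|_{\mathcal{G}}$ and $\|\cdot\|_2$ for the norm in $L^2(\mathcal{G}, \mu|_{\mathcal{G}})$ and $L^2(\mathcal{A}, \mu)$.
A key observation is that the choice of $\mathcal{G} \subset \mathcal{A}$ determines our knowledge about a function $u$.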
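22.2 Example Consider a finite measure space $(X, \mathcal{A}, \mu)$ and the sub-$\sigma$-algebra $\mathcal{G} := \{\emptyset, G, G^c, X\}$ where $G \in \mathcal{A}$ and $\mu(G) > 0$, $\mu(G^c) > 0$. Let $f \in \mathcal{E}(\mathcal{A})$ be a simple function in standard representation:
\[ f = \sum_{j=0}^m y_j 1_{A_j}, \qquad y_j \in \mathbb{R},\ A_j \in \mathcal{A}. \]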
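Then
\[ \int_G f\, d\mu = \sum_{j=0}^m y_j \int_G 1_{A_j}\, d\mu = \underbrace{\sum_{j=0}^m y_j\, \frac{\mu(A_j \cap G)}{\mu(G)}}_{=:\ \eta_1}\, \mu(G) = \eta_1 \int 1_G\, d\mu, \tag{22.1} \]
and similarly,
\[ \int_{G^c} f\, d\mu = \underbrace{\sum_{j=0}^m y_j\, \frac{\mu(A_j \cap G^c)}{\mu(G^c)}}_{=:\ \eta_2}\, \mu(G^c) = \eta_2 \int 1_{G^c}\, d\mu. \tag{22.2} \]
This indicates that we could have obtained the same results in the integrations (22.1) and (22.2) if we had not used $f \in \mathcal{E}(\mathcal{A})$ but the $\mathcal{G}$-simple function
\[ g := \eta_1 1_G + \eta_2 1_{G^c} \in \mathcal{E}(\mathcal{G}) \tag{22.3} \]
with $\eta_1, \eta_2$ from above. In other words, $f$ and $g$ are indistinguishable, if we evaluate (i.e. integrate) both of them on sets of the $\sigma$-algebra $\mathcal{G}$. Note that $g$ is much simpler than $f$, but we have lost nearly all information of what $f$ looks like on sets from $\mathcal{A}$ save $\mathcal{G}$: if we take a set from the standard representation of $f$, say, $A_{j_0} \subset G$, $A_{j_0} \in \mathcal{A}$, then, in general,
\[ \int_{A_{j_0}} f\, d\mu = y_{j_0}\, \mu(A_{j_0}) \ne \int_{A_{j_0}} g\, d\mu = \eta_1\, \mu(A_{j_0} \cap G) = \underbrace{\Big(\sum_{j=0}^m y_j\, \frac{\mu(A_j \cap G)}{\mu(G)}\Big)}_{=\ \eta_1} \cdot\, \mu(A_{j_0}), \]
i.e. we would get a weighted average of the $y_j$ rather than precisely $y_{j_0}$.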
Let us extend the process sketched in Example 22.2 to $\sigma$-finite measures and general square-integrable functions. Our starting point is the observation that, with the notation of 22.2,
\[ \int f g\, d\mu = \eta_1 \int_G f\, d\mu + \eta_2 \int_{G^c} f\, d\mu = \eta_1^2\, \mu(G) + \eta_2^2\, \mu(G^c) = \int g^2\, d\mu, \]
that is, $\langle f - g, g\rangle = 0$ or $(f - g) \perp g$ in the space $L^2(\mathcal{A})$.
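The computations of Example 22.2 can be replayed on a finite set. The sketch below assumes $X = \{0, \ldots, 5\}$ with point masses `mu`, a function `f` given by its values, and $G = \{0, 1, 2\}$; all of these concrete numbers are invented for the illustration. It builds the averaged function $g$ and checks that $f$ and $g$ have the same integrals over $G$ and $G^c$ and that $f - g \perp g$.

```python
from fractions import Fraction

# Point masses mu({x}) for x = 0, ..., 5 and the values f(x) (hypothetical data).
mu = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(1), Fraction(2), Fraction(1)]
f = [4, -1, 2, 0, 3, 5]
G = {0, 1, 2}
Gc = set(range(len(mu))) - G

def integral(u, A):
    """Integral of u over the set A with respect to mu."""
    return sum(u[x] * mu[x] for x in A)

def measure(A):
    return sum(mu[x] for x in A)

eta1 = integral(f, G) / measure(G)      # average of f over G
eta2 = integral(f, Gc) / measure(Gc)    # average of f over G^c
g = [eta1 if x in G else eta2 for x in range(len(mu))]

X = G | Gc
fg = [a * b for a, b in zip(f, g)]
gg = [b * b for b in g]
print(eta1, eta2)
print(integral(f, G) == integral(g, G), integral(f, Gc) == integral(g, Gc))
print(integral(fg, X) - integral(gg, X))     # <f - g, g> = 0
print(integral(f, {0}), integral(g, {0}))    # on a finer set the integrals differ
```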
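22.3 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. The conditional expectation of $u \in L^2(\mathcal{A})$ relative to $\mathcal{G}$ is the orthogonal projection onto the closed subspace $L^2(\mathcal{G})$
\[ E^{\mathcal{G}} : L^2(\mathcal{A}) \to L^2(\mathcal{G}), \qquad u \mapsto E^{\mathcal{G}} u. \]
Sometimes one writes $E(u \mid \mathcal{G})$ instead of $E^{\mathcal{G}} u$.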
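The terminology 'conditional expectation' comes from probability theory where this notion is widely used and where $(X, \mathcal{A}, \mu)$ is usually a probability space. In slight abuse of language we continue to call $E^{\mathcal{G}}$ conditional expectation even if $\mu$ is not a probability measure. Let us collect some properties of $E^{\mathcal{G}}$.
22.4 Theorem Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{G} \subset \mathcal{A}$ a sub-$\sigma$-algebra. The conditional expectation $E^{\mathcal{G}}$ has the following properties ($u, w \in L^2(\mathcal{A})$):
(i) $E^{\mathcal{G}} u \in L^2(\mathcal{G})$;
(ii) $\|E^{\mathcal{G}} u\|_{L^2(\mathcal{G})} \le \|u\|_{L^2(\mathcal{A})}$;
(iii) $\langle E^{\mathcal{G}} u, w\rangle = \langle u, E^{\mathcal{G}} w\rangle = \langle E^{\mathcal{G}} u, E^{\mathcal{G}} w\rangle$;
(iv) $E^{\mathcal{G}} u$ is the unique minimizer in $L^2(\mathcal{G})$ such that
\[ \|u - E^{\mathcal{G}} u\|_{L^2(\mathcal{A})} = \inf_{g \in L^2(\mathcal{G})} \|u - g\|_{L^2(\mathcal{A})}; \]
(v) $u = w \implies E^{\mathcal{G}} u = E^{\mathcal{G}} w$;
(vi) $E^{\mathcal{G}}(\alpha u + \beta w) = \alpha E^{\mathcal{G}} u + \beta E^{\mathcal{G}} w$ for all $\alpha, \beta \in \mathbb{R}$;
(vii) If $\mathcal{H} \subset \mathcal{G}$ is a further sub-$\sigma$-algebra, then $E^{\mathcal{H}} E^{\mathcal{G}} u = E^{\mathcal{H}} u$;
(viii) $E^{\mathcal{G}}(g u) = g\, E^{\mathcal{G}} u$ for all $g \in L^\infty(\mathcal{G})$;
(ix) $E^{\mathcal{G}} g = g$ for all $g \in L^2(\mathcal{G})$;
(x) $0 \le u \le 1 \implies 0 \le E^{\mathcal{G}} u \le 1$;
(xi) $u \le w \implies E^{\mathcal{G}} u \le E^{\mathcal{G}} w$;
(xii) $|E^{\mathcal{G}} u| \le E^{\mathcal{G}} |u|$;
(xiii) $E^{\{\emptyset, X\}} u = \frac{1}{\mu(X)} \int u\, d\mu$ for all $u \in L^1(\mathcal{A}) \cap L^2(\mathcal{A})$ (where $\frac{1}{\infty} := 0$).
Before we turn to the proof of the above properties let us stress again that all (in-)equalities in (i)–(xiii) are between $L^2$-functions, i.e. they hold only $\mu$-almost everywhere. In particular, $E^{\mathcal{G}} u$ is itself only determined up to a $\mu$-null set $N \in \mathcal{G}$.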
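Proof (of Theorem 22.4) Properties (i)–(vi) and (ix) follow directly from Theorem 21.5, Corollary 21.6 and Remark 21.7.
(vii): For all $u, w \in L^2(\mathcal{A})$ we find because of (iii)
\[ \langle E^{\mathcal{H}} E^{\mathcal{G}} u, w\rangle \overset{\text{(iii)}}{=} \langle E^{\mathcal{G}} u, E^{\mathcal{H}} w\rangle \overset{\text{(iii)}}{=} \langle u, E^{\mathcal{G}} \underbrace{E^{\mathcal{H}} w}_{\in L^2(\mathcal{G})}\rangle \overset{\text{(ix)}}{=} \langle u, E^{\mathcal{H}} w\rangle = \langle E^{\mathcal{H}} u, w\rangle \]
as $E^{\mathcal{H}} w \in L^2(\mathcal{H}) \subset L^2(\mathcal{G})$. Since $w \in L^2(\mathcal{A})$ was arbitrary, we conclude that $E^{\mathcal{H}} E^{\mathcal{G}} u = E^{\mathcal{H}} u$.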
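(viii): Writing $u = f + f^\perp \in L^2(\mathcal{G}) \oplus L^2(\mathcal{G})^\perp$, we get $g u = g f + g f^\perp$. Moreover, we have for any $\phi \in L^2(\mathcal{G})$ and $g \in L^\infty(\mathcal{G})$ that $g \phi \in L^2(\mathcal{G})$, thus
\[ \langle g f^\perp, \phi\rangle = \langle f^\perp, g \phi\rangle = 0, \]
and from the uniqueness of the orthogonal decomposition we infer that
\[ (g u)^\perp = g f^\perp \quad\text{or}\quad E^{\mathcal{G}}(g u) = g f = g\, E^{\mathcal{G}} u. \]
(x): Let $0 \le u \le 1$. Since $E^{\mathcal{G}} u \in L^2(\mathcal{G})$, the Markov inequality (Proposition 10.12) implies
\[ \mu\big(\{|E^{\mathcal{G}} u| > \tfrac1n\}\big) \le n^2\, \|E^{\mathcal{G}} u\|^2_{L^2(\mathcal{G})} \le n^2\, \|u\|^2_{L^2(\mathcal{A})} < \infty, \tag{22.4} \]
and so $\phi_n := 1_{\{|E^{\mathcal{G}} u| > 1/n\}} \in L^2(\mathcal{G})$. Therefore,
\[ \int E^{\mathcal{G}} u\, 1_{\{E^{\mathcal{G}} u < 0\}}\, \phi_n\, d\mu \overset{22.4\text{(iii)}}{=} \int u\, E^{\mathcal{G}}\big(1_{\{E^{\mathcal{G}} u < 0\}}\, \phi_n\big)\, d\mu \overset{22.4\text{(ix)}}{=} \int u\, 1_{\{E^{\mathcal{G}} u < 0\}}\, \phi_n\, d\mu \ge 0, \]
which is only possible if $\mu\big(\{E^{\mathcal{G}} u < 0\} \cap \{|E^{\mathcal{G}} u| > 1/n\}\big) = 0$, that is, if
\[ \mu\big(\{E^{\mathcal{G}} u < 0\}\big) = \mu\Big(\bigcup_{n\in\mathbb{N}} \{E^{\mathcal{G}} u < 0\} \cap \{|E^{\mathcal{G}} u| > 1/n\}\Big) = \sup_{n\in\mathbb{N}} \mu\big(\{E^{\mathcal{G}} u < 0\} \cap \{|E^{\mathcal{G}} u| > 1/n\}\big) = 0, \]
hence $E^{\mathcal{G}} u \ge 0$.
With very similar arguments we see that $1_{\{E^{\mathcal{G}} u > 1\}} \in L^2(\mathcal{G})$, and since $u \le 1$ we have
\[ \int E^{\mathcal{G}} u\, 1_{\{E^{\mathcal{G}} u > 1\}}\, d\mu \overset{22.4\text{(iii),(ix)}}{=} \int u\, 1_{\{E^{\mathcal{G}} u > 1\}}\, d\mu \le \mu\big(\{E^{\mathcal{G}} u > 1\}\big), \]
which entails $\mu(\{E^{\mathcal{G}} u > 1\}) = 0$ or $E^{\mathcal{G}} u \le 1$.
(xi): Using that $w - u \ge 0$, the first part of the proof of (x) shows $E^{\mathcal{G}}(w - u) \ge 0$, so that by linearity $E^{\mathcal{G}} w \ge E^{\mathcal{G}} u$.
(xii): Again by the proof of (x) we find for $|u| \pm u \ge 0$ that $E^{\mathcal{G}}(|u| \pm u) \ge 0$, and by linearity $\pm E^{\mathcal{G}} u \le E^{\mathcal{G}} |u|$. This proves $\big|E^{\mathcal{G}} u\big| \le E^{\mathcal{G}} |u|$.
(xiii): If $\mu(X) = \infty$, we have $L^2(\{\emptyset, X\}) = \{0\}$,[✓] thus $E^{\{\emptyset, X\}} u = 0$ and the formula clearly holds.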
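If $\mu(X) < \infty$, we have $L^2(\{\emptyset, X\}) \simeq \mathbb{R}$, and $E^{\{\emptyset, X\}} u = c$ is a constant. By (iv), $c = E^{\{\emptyset, X\}} u$ minimizes $\|u - c\|_{L^2(\mathcal{A})}$, and
\[ \begin{aligned} \int (u - c)^2\, d\mu &= \int u^2\, d\mu - 2c \int u\, d\mu + c^2 \int d\mu \\ &= \int u^2\, d\mu - \frac{1}{\mu(X)} \Big(\int u\, d\mu\Big)^2 + \frac{1}{\mu(X)} \Big(\int u\, d\mu - c\, \mu(X)\Big)^2 \end{aligned} \]
shows that $c = \frac{1}{\mu(X)} \int u\, d\mu$ is the unique minimizer.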
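Several items of Theorem 22.4 can be observed on a finite set, where a sub-$\sigma$-algebra generated by a partition into sets of positive measure leads to block-wise averaging. The sketch below takes this averaging formula as given (it is the kind of formula asked for in Problem 21.13(ii)); the data `mu`, `u` and the partitions `fine`, `coarse`, `trivial` are assumptions of the illustration.

```python
from fractions import Fraction

mu = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(1), Fraction(2), Fraction(1)]
u = [4, -1, 2, 0, 3, 5]

def integral(v, A):
    return sum(v[x] * mu[x] for x in A)

def measure(A):
    return sum(mu[x] for x in A)

def cond_exp(v, partition):
    """Block-wise average of v over the sets of `partition` (all of positive measure)."""
    out = [None] * len(v)
    for block in partition:
        avg = integral(v, block) / measure(block)
        for x in block:
            out[x] = avg
    return out

fine = [{0, 1}, {2}, {3, 4, 5}]      # generates G
coarse = [{0, 1, 2}, {3, 4, 5}]      # generates H, a sub-sigma-algebra of G
trivial = [set(range(6))]            # generates {emptyset, X}

X = set(range(6))
EGu = cond_exp(u, fine)
norm_sq = lambda v: integral([a * a for a in v], X)

print(cond_exp(EGu, coarse) == cond_exp(u, coarse))       # (vii) tower property
print(norm_sq(EGu) <= norm_sq(u))                         # (ii) contraction
print(cond_exp(EGu, fine) == EGu)                         # (ix) on L^2(G)
print(cond_exp(u, trivial)[0] == integral(u, X) / measure(X))  # (xiii)
```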
In Chapter 23 we will extend the operator $E^{\mathcal{G}}$ to $\bigcup_{p \ge 1} L^p(\mathcal{A})$ and add a few further properties.
On$^2$ the structure of subspaces of $L^2$
In the rest of this chapter we want to address a different question. As we have seen, $E^{\mathcal{G}} : L^2(\mathcal{A}) \to L^2(\mathcal{G})$ is a symmetric orthogonal projection onto the closed subspace $L^2(\mathcal{G})$ of the Hilbert space $L^2(\mathcal{A})$. It is natural to ask whether every orthogonal projection $\pi : L^2(\mathcal{A}) \to \mathcal{H}$ onto a closed subspace $\mathcal{H} \subset L^2(\mathcal{A})$ is a conditional expectation. Equivalently we could ask under which conditions a closed subspace $\mathcal{H}$ of $L^2(\mathcal{A})$ is of the form $L^2(\mathcal{G}) = \mathcal{H}$ for a suitable sub-$\sigma$-algebra $\mathcal{G} \subset \mathcal{A}$.
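22.5 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. For a closed linear subspace $\mathcal{H} \subset L^2(\mathcal{A})$ and its orthogonal projection $\pi = P_{\mathcal{H}} : L^2(\mathcal{A}) \to \mathcal{H}$, the following assertions are equivalent.
(i) $\mathcal{H} = L^2(\mathcal{G})$ and $\pi = E^{\mathcal{G}}$ for some sub-$\sigma$-algebra $\mathcal{G} \subset \mathcal{A}$ containing an exhausting sequence $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$ with $G_j \uparrow X$ and $\mu(G_j) < \infty$.
(ii) $\pi$ is a sub-Markovian operator, i.e. $0 \le u \le 1 \implies 0 \le \pi(u) \le 1$, $u \in L^2(\mathcal{A})$, and for some $u_0 \in L^2(\mathcal{A})$ with $u_0 > 0$ we have $\pi(u_0) > 0$.
(iii) $\mathcal{H} \cap L^\infty(\mathcal{A})$ is an algebra (i.e. it is closed under pointwise products: $f, h \in \mathcal{H} \cap L^\infty(\mathcal{A}) \implies f h \in \mathcal{H} \cap L^\infty(\mathcal{A})$) which is $L^2$-dense in $\mathcal{H}$ and contains an (everywhere) strictly positive function $h_0 > 0$.
(iv) $\mathcal{H}$ is a lattice (i.e. $f, h \in \mathcal{H} \implies f \wedge h \in \mathcal{H}$) containing an (everywhere) strictly positive function $h_0 > 0$, and for all $h \in \mathcal{H}$ also $h \wedge 1 \in \mathcal{H}$.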
Proof We show that (i)⇒(ii)⇒(iv)⇒(i)⇒(iii)⇒(iv).
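(i)⇒(ii) The sub-Markov property of $\pi = \mathbf{E}^{\mathcal G}$ follows from Theorem 22.4(x), while
$$u_0 := \sum_{j=1}^\infty \frac{2^{-j}}{\sqrt{\mu(G_j) + 1}}\, 1_{G_j} \in \mathcal{M}^+(\mathcal{G})$$
clearly satisfies $0 < u_0 \leq 1$,
$$\|u_0\|_2 = \bigg\| \sum_{j=1}^\infty \frac{2^{-j}}{\sqrt{\mu(G_j) + 1}}\, 1_{G_j} \bigg\|_2 \leq \sum_{j=1}^\infty \frac{2^{-j}}{\sqrt{\mu(G_j) + 1}}\, \|1_{G_j}\|_2 = \sum_{j=1}^\infty 2^{-j} \sqrt{\frac{\mu(G_j)}{\mu(G_j) + 1}} \leq 1,$$
so that $u_0 \in L^2(\mathcal{G})$ and, therefore, $0 < \pi(u_0) = u_0 \leq 1$.
2 This section can be left out at first reading.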
(ii)⇒(iv) Since $\pi$ preserves positivity, we find for all $u \in L^2(\mathcal{A})$ that $\pi(u^+) \geq 0$ and $\pi(u) = \pi(u^+) - \pi(u^-) \leq \pi(u^+)$, thus $\pi(u) \vee 0 \leq \pi(u^+)$.
On the other hand, $\mathcal{C} = \{h \in L^2(\mathcal{A}) : \pi(h) = h\}$ and the above calculation shows for $h \in \mathcal{C}$
$$h^+ = (\pi(h))^+ = \pi(h) \vee 0 \leq \pi(h^+). \tag{22.5}$$
Since $\pi$ is a contraction, see (21.4), we find also
$$\|h^+\|_2 \overset{(22.5)}{\leq} \|\pi(h^+)\|_2 \leq \|h^+\|_2,$$
which implies $\langle \pi(h^+), \pi(h^+) \rangle = \langle \pi(h^+), h^+ \rangle = \langle h^+, h^+ \rangle$. Because of (22.5) we get $\langle \underbrace{\pi(h^+) - h^+}_{\geq 0},\, h^+ \rangle = 0$ or $\pi(h^+) = h^+$ on the set $\{h^+ > 0\}$. But then
$$\|\pi(h^+)\|_2 = \|h^+\|_2 \implies \int_{\{h^+ = 0\}} (\pi(h^+))^2\, d\mu = \int_{\{h^+ = 0\}} (h^+)^2\, d\mu = 0,$$
which shows that $\pi(h^+) = 0$ on $\{h^+ = 0\}$ or $\mu(\{h^+ = 0\}) = 0$. In either case, $\pi(h^+) = h^+$ (almost everywhere) and $h^+ \in \mathcal{C}$.
Consequently, $f \wedge h = f - (f - h)^+ \in \mathcal{C}$. Similarly, $h \wedge 1 = h - (h - 1)^+$ and, if $h \in \mathcal{C}$, we see $\pi(h \wedge 1) \leq \pi(h) \wedge 1 = h \wedge 1$. Further,
$$\pi((h - 1)^+) = \pi(h) - \pi(h \wedge 1) \geq h - h \wedge 1 = (h - 1)^+,$$
and since $\pi$ is a contraction, the same argument which we used to get $\pi(h^+) = h^+$ yields $\pi((h - 1)^+) = (h - 1)^+$, hence $(h - 1)^+,\ h \wedge 1 \in \mathcal{C}$. Finally, $h_0 := \pi(u_0)$, $u_0$ as in (ii), satisfies $h_0 \in \mathcal{C}$ and $h_0 > 0$.
(iv)⇒(i) We set
� �= �G ∈ � � h ∧ 1G ∈
∀ h ∈
��
Let us first show that � is a �-algebra. Clearly, ∅� X ∈ �. If G ∈ �, then
h ∧ 1Gc + h ∧ 1G︸ ︷︷ ︸
∈
= h ∧ 1 + h ∧ 0 ∈
∀ h ∈
�
which means that h ∧ 1Gc ∈
and Gc ∈ �. For any two sets G� H ∈ � we see3
h ∧ 1G∪H = h ∧ �1G ∨ 1H � = �h ∧ 1G� ∨ �h ∧ 1H � ∈
∀ h ∈
�
so that G ∪ H ∈ �. Finally, let �Gj �j∈� ⊂ �; since � is ∪-stable, we may assume
that Gj ↑ G �=
⋃
j∈� Gj . Then
�h ∧ 1Gj �j∈� ⊂
and limj→��h ∧ 1Gj � = h ∧ 1G ∈ L
2��� ∀ h ∈
�
Since �h ∧ 1Gj � � h ∧ 1G ∈ L2, an application of the dominated convergence
theorem 12.9 shows that in L2��� limj→� h ∧ 1Gj = h ∧ 1G for all h ∈
. Since
is a closed subspace, we conclude that h ∧ 1G ∈
for all h ∈
, thus G ∈ �.
We will now show that L2��� =
. If f ∈
we know from our assumptions
that �±f� ∧ 0 ∈
, so that f + = −��−f� ∧ 0�, f − = −�f ∧ 0� ∈
. Thus
=
+ −
+, and since also L2��� = L2+��� − L2+��� it is clearly enough to show
that
+ = L2+���.
Assume that f ∈
+. Then for a > 0, f ∧ a = a(f
a
∧ 1) ∈
, and by monotone
convergence T11.1 and the closedness of
,
h ∧ 1�f>a� = h ∧ sup
n∈�
{
nf − n�f ∧ a�� ∧ 1} ∈
∀ h ∈
�
proving that �f > a� ∈ � for all a > 0. Moreover, �f > a� = X if a < 0 and
�f > 0� = ⋃k∈��f > 1/k�, which shows that �f > a� ∈ � for all a ∈ � and,
consequently,
+ ⊂ L2+���.
Conversely, if g ∈ L2+���, we can write g as limit of simple functions, gn =∑N�n�
j=1 y
�n�
j 1G�n�j
with disjoint sets G
�n�
1 � � � � � G
�n�
N�n�
∈ � and y�n�j > 0. For all h ∈
we find
gn ∧ h =
N�n�∑
j=1
(
y
�n�
j 1G�n�j
)∧ h =
N�n�∑
j=1
y
�n�
j
(
1
G
�n�
j
∧ h
y
�n�
j
)
∈
�
and dominated convergence T12.9 and the closedness of
imply that g ∧ h ∈
.
Choosing, in particular, h = n h0 for some a.e. strictly positive function h0 and
letting n → � gives
g = L2- lim
n→��n h0� ∧ g ∈
+�
where we again used monotone convergence T11.1 and the closedness of
. This
proves L2+��� ⊂
+.
Finally, the sets Gj �= �h0 > 1/j� ∈ � satisfy Gj ↑ X and, because of the
Markov inequality P10.12, ��Gj � = ���h0 > 1/j�� � j2
∫
h20 d� < �.
3 Use that
is a vector space and a ∨ b = −��−a� ∧ �−b��.
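(i)⇒(iii): Note that $L^2(\mathcal{G}) \cap L^\infty(\mathcal{G}) = L^2(\mathcal{G}) \cap L^\infty(\mathcal{A})$. An application of the dominated convergence theorem 12.9 shows that the sequence $f_n := (-n) \vee f \wedge n$, $n \in \mathbb{N}$, $f \in L^2(\mathcal{G})$, converges in $L^2(\mathcal{G})$ to $f$, i.e. $L^2(\mathcal{G}) \cap L^\infty(\mathcal{G})$ is a dense subset of $L^2(\mathcal{G})$. The element $h_0 > 0$ is now constructed as in the proof of (i)⇒(ii). That $L^2(\mathcal{G}) \cap L^\infty(\mathcal{G})$ is an algebra is trivial.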
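(iii)⇒(iv): Let us show, first of all, that $\mathcal{C} \cap L^\infty(\mathcal{A})$ is stable under minima. To this end we define recursively a sequence of polynomials in $\mathbb{R}$,
$$p_0(x) := 0, \qquad p_{n+1}(x) := p_n(x) + \tfrac12 \big(x^2 - p_n^2(x)\big), \quad n \in \mathbb{N}_0.$$
By induction it is easy to see that $p_n(0) = 0$ for all $n \in \mathbb{N}_0$ and that
$$0 \leq p_n(x) \leq p_{n+1}(x) \leq |x| \qquad \forall\, x \in [-1, 1].$$
For $n = 0$ there is nothing to show. Otherwise we can use the induction assumption $p_n(x) \leq p_{n+1}(x) \leq |x|$ to get
$$0 \leq p_{n+1}(x) \leq \overbrace{p_{n+1}(x) + \underbrace{\tfrac12 \big(x^2 - p_{n+1}^2(x)\big)}_{\geq 0}}^{=\, p_{n+2}(x) \text{ by definition}} = |x| - \underbrace{\big(|x| - p_{n+1}(x)\big)}_{\geq 0} \cdot \underbrace{\big(1 - \tfrac12 (|x| + p_{n+1}(x))\big)}_{\geq 0 \text{ for } x \in [-1, 1]} \leq |x|.$$
Therefore, $\lim_{n \to \infty} p_n(x) = \sup_{n \in \mathbb{N}} p_n(x) = \kappa(x)$ exists for all $|x| \leq 1$ and, according to the recursion relation, $\kappa(x) = |x|$.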
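The recursion is easy to try out numerically. The sketch below is an illustration only (the function name `p` and the sample points are choices made here, not part of the text): it evaluates $p_n(x)$ by iterating the recursion and checks monotonicity and the upper bound $|x|$ on $[-1, 1]$, up to rounding. Note that convergence is slow for small $|x|$, since the error shrinks roughly by the factor $1 - |x|$ per step.

```python
def p(n, x):
    """Evaluate p_n(x) for p_0 = 0, p_{k+1} = p_k + (x^2 - p_k^2) / 2."""
    value = 0.0
    for _ in range(n):
        value += 0.5 * (x * x - value * value)
    return value


for x in (-1.0, -0.3, 0.0, 0.5, 1.0):
    values = [p(n, x) for n in range(40)]
    # Increasing in n and bounded by |x| (small tolerance for floating point).
    assert all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
    assert values[-1] <= abs(x) + 1e-12
```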
Since
is a linear subspace which is stable under products, we get for every
h ∈
∩ L���� that pn�h/�h��� ∈
, and monotone convergence T11.1 and the
closedness of
show
sup
n∈�
pn
( h
�h��
)
= �h��h��
∈
=⇒ �h� ∈
�
As
∩ L���� is dense in
, we find for h ∈
a sequence �hk�k∈� ⊂
∩ L����
such that L2- limk→� hk = h. From above we know, however, that �hk� ∈
and
�hk�
k→�−−−→ �h� in L2���, thus �h� ∈
. This shows, in particular, that
f ∧ h = 12 �f + h − �f − h�� ∈
∀ f� h ∈
�
Since 0 < h0 � ��h0���, we get for all n � ��h0��� that
n
n − h0
=
�∑
j=0
(h0
n
)j
= sup
N ∈�
N∑
j=0
(h0
n
)j
�
and for h ∈
,
h ∧ n
n − h0
= lim
N →�
N∑
j=0
(h0
n
)j
∧ h
︸ ︷︷ ︸
∈
�
By monotone convergence T11.1 we conclude that h ∧ n
n−h0 ∈
. Finally, as
n
n−h0 ↓ 1 and h
2 ∧( n
n−h0
)2
� h2, we can use the dominated convergence theorem
12.9 and the closedness of
to see that h ∧ 1 ∈
.
Problems
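22.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{H} \subset \mathcal{G}$ be two sub-$\sigma$-algebras. Show that
$$\mathbf{E}^{\mathcal H} \mathbf{E}^{\mathcal G} u = \mathbf{E}^{\mathcal G} \mathbf{E}^{\mathcal H} u = \mathbf{E}^{\mathcal H} u \qquad \forall\, u \in L^2(\mathcal{A}).$$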
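22.2. Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra and let $\nu := f \mu$ where $f \in \mathcal{M}^+(\mathcal{A})$ is a density $f > 0$.
(i) Denote by $\mathbf{E}^{\mathcal G}_\mu$ resp. $\mathbf{E}^{\mathcal G}_\nu$ the projections in the spaces $L^2(\mathcal{A}, \mu)$, resp. $L^2(\mathcal{A}, \nu)$. Express $\mathbf{E}^{\mathcal G}_\nu$ in terms of $\mathbf{E}^{\mathcal G}_\mu$.
[Hint: $\mathbf{E}^{\mathcal G}_\nu u = \dfrac{\mathbf{E}^{\mathcal G}_\mu(f u)}{\mathbf{E}^{\mathcal G}_\mu f}\, 1_{\{\mathbf{E}^{\mathcal G}_\mu f > 0\}}$]
(ii) Under which condition do we have $\mathbf{E}^{\mathcal G}_\mu u = \mathbf{E}^{\mathcal G}_\nu u$ for all $u \in L^2(\mathcal{A}, \mu) \cap L^2(\mathcal{A}, \nu)$?
Remark. The above result allows us to study conditional expectations for finite measures $\mu$ only and to define for more general measures a conditional expectation by
$$\mathbf{E}^{\mathcal G}_\nu u := \frac{\mathbf{E}^{\mathcal G}_\mu(f u)}{\mathbf{E}^{\mathcal G}_\mu f}\, 1_{\{\mathbf{E}^{\mathcal G}_\mu f > 0\}}.$$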
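22.3. Let $(X, \mathcal{A}, \mu)$ be a finite measure space, $G_1, \ldots, G_n \in \mathcal{A}$ such that $\bigcup_{j=1}^n G_j = X$ is a disjoint union and $\mu(G_j) > 0$ for all $j = 1, 2, \ldots, n$. Then, with $\mathcal{G} := \sigma(G_1, \ldots, G_n)$,
$$\mathbf{E}^{\mathcal G} u = \sum_{j=1}^n \bigg[ \int_{G_j} u(x)\, \frac{\mu(dx)}{\mu(G_j)} \bigg] 1_{G_j}.$$
Remark. The measure $1_{G_j} \mu / \mu(G_j) = \mu(\bullet \cap G_j) / \mu(G_j)$ is often called the conditional probability given $G_j$.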
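On a finite set the formula of Problem 22.3 is just a block-wise weighted average. The following sketch assumes a six-point space with made-up weights; `cond_exp`, `u`, `mu` and `blocks` are illustrative names introduced here. It computes the block averages and checks the defining property that $u$ and its conditional expectation have the same integral over every generating set $G_j$.

```python
def cond_exp(u, mu, partition):
    """Conditional expectation w.r.t. the sigma-algebra generated by a finite partition.

    u, mu: lists indexed by the points of X (function values and point masses).
    partition: list of disjoint index lists covering X, each with positive mass.
    """
    result = [0.0] * len(u)
    for block in partition:
        mass = sum(mu[i] for i in block)                     # mu(G_j)
        average = sum(u[i] * mu[i] for i in block) / mass    # integral over G_j / mu(G_j)
        for i in block:
            result[i] = average
    return result


u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mu = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
blocks = [[0, 1], [2, 3], [4, 5]]
eu = cond_exp(u, mu, blocks)

for block in blocks:
    lhs = sum(u[i] * mu[i] for i in block)    # integral of u over G_j
    rhs = sum(eu[i] * mu[i] for i in block)   # integral of E u over G_j
    assert abs(lhs - rhs) < 1e-12
```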
23
Conditional expectations in $L^p$
Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.
Our aim is to extend the operator $\mathbf{E}^{\mathcal G}$ from $L^2(\mathcal{A})$ to a wider class of functions including the spaces $L^p(\mathcal{A})$, $1 \leq p \leq \infty$. We will use the same technique that allowed us in Chapters 9 and 10 to extend the integral from the simple functions $\mathcal{E}(\mathcal{A})$ to the positive measurable functions $\mathcal{M}^+(\mathcal{A})$ and integrable functions $\mathcal{L}^1(\mathcal{A})$.
Since we are now considering the spaces $L^p$ of (equivalence classes of) $p$th power integrable functions, it is convenient to have an analogous notion for measurable functions.
23.1 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. Two functions $u, v \in \mathcal{M}(\mathcal{A})$ are called equivalent, $u \sim v$, if $\{u \neq v\} \in \mathcal{N}_\mu$ is a $\mu$-null set. We write $M(\mathcal{A}) := \mathcal{M}(\mathcal{A})/\!\sim$ for the set of all equivalence classes of measurable functions $u \in \mathcal{M}(\mathcal{A})$.
As with $L^p$-functions, all (in-)equalities between elements from $M(\mathcal{A})$ hold pointwise almost everywhere.
23.2 Lemma Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. Then $u \in M^+(\mathcal{A})$ if, and only if, there exists an increasing sequence $(u_j)_{j \in \mathbb{N}} \subset L^2_+(\mathcal{A})$ such that $u = \sup_{j \in \mathbb{N}} u_j$.
Proof The 'if' part '⇐' is trivial since suprema of countably many measurable functions are again measurable (C8.9). For '⇒' let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be an exhausting sequence such that $A_j \uparrow X$ and $\mu(A_j) < \infty$. If $u \in M^+(\mathcal{A})$, then $u_k := (u \wedge k)\, 1_{A_k} \in L^2_+(\mathcal{A})$ and $\sup_{k \in \mathbb{N}} u_k = u$.
23.3 Remark Lemma 23.2 is no longer true if $(X, \mathcal{A}, \mu)$ is not $\sigma$-finite. In fact, if $1 \in M^+(\mathcal{A})$ can be approximated by an increasing sequence $(u_k)_{k \in \mathbb{N}} \subset L^2_+(\mathcal{A})$, $1 = \sup_{k \in \mathbb{N}} u_k$, the sets $A_k := \{u_k > 1/k\}$ would form an increasing sequence $A_k \uparrow X$ with $\mu(A_k) = \mu(\{u_k > 1/k\}) \leq k^2 \int u_k^2\, d\mu < \infty$ by the Markov inequality P10.12.
The key technical point is the following result.
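23.4 Lemma Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra, and $(u_j)_{j \in \mathbb{N}}, (w_j)_{j \in \mathbb{N}} \subset L^2(\mathcal{A})$ be two increasing sequences. Then
$$\sup_{j \in \mathbb{N}} u_j = \sup_{j \in \mathbb{N}} w_j \implies \sup_{j \in \mathbb{N}} \mathbf{E}^{\mathcal G} u_j = \sup_{j \in \mathbb{N}} \mathbf{E}^{\mathcal G} w_j. \tag{23.1}$$
If $u := \sup_{j \in \mathbb{N}} u_j$ is in $L^2(\mathcal{A})$, the following conditional monotone convergence property holds:
$$\sup_{j \in \mathbb{N}} \mathbf{E}^{\mathcal G} u_j = \mathbf{E}^{\mathcal G} \Big( \sup_{j \in \mathbb{N}} u_j \Big) = \mathbf{E}^{\mathcal G} u \tag{23.2}$$
in $L^2(\mathcal{G})$ and almost everywhere.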
Proof Let us first of all assume that uj ↑ u and u ∈ L2���. Monotone convergence
11.1 and Theorem 12.7 show that uj
j→�−−−→ u also in L2-sense. By Theorem
22.4(xi), �E�uj �j∈� is again an increasing sequence and E
�uj � E�u. From
22.4(ii), (vi) we get
∥∥E�u − E�uj
∥∥
2
=
∥∥E��u − uj �
∥∥
2
� �u − uj�2�
i.e. L2-limj→� E
�uj = E�u. For a subsequence �uj�k��k∈� ⊂ �uj �j∈� we even
have limk→� E
�uj�k� = E�u a.e., cf. Corollary 12.8. Because of the monotonicity
of the sequence �E�uj �j∈�, we get for all j > j�k�
∣∣E�u − E�uj
∣∣ = E�u − E�uj � E�u − E�uj�k��
and letting first j → � and then k → � gives E�uj ↑ E�u a.e. This finishes the
proof of (23.2).
If �uj �j∈�� �wj �j∈� ⊂ L2��� are any two increasing sequences1 such that
supj∈� uj = supj∈� wj , we can apply (23.2) to the increasing sequences uj ∧ wk ↑
uj (as k → � and for fixed j) and uj ∧ wk ↑ wk (as j → � and for fixed k). This
shows
sup
j∈�
E�uj
(23.2)= sup
j∈�
sup
k∈�
E��uj ∧ wk� = sup
k∈�
sup
j∈�
E��uj ∧ wk�
(23.2)= sup
k∈�
E�wk
A combination of Lemmata 23.2 and 23.4 allows us to define conditional
expectations for positive measurable functions in a -finite measure space.
1 We do not assume that supj∈� uj � supj∈� wj ∈ L2���.
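23.5 Definition Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Let $u \in M^+(\mathcal{A})$ and let $(u_j)_{j \in \mathbb{N}} \subset L^2_+(\mathcal{A})$ be an increasing sequence such that $u = \sup_{j \in \mathbb{N}} u_j$. Then
$$E^{\mathcal G} u := \sup_{j \in \mathbb{N}} \mathbf{E}^{\mathcal G} u_j \tag{23.3}$$
is called the conditional expectation of $u$ with respect to $\mathcal{G}$. If $u \in M(\mathcal{A})$ and $E^{\mathcal G} u^\pm \in \mathbb{R}$ almost everywhere, we define (almost everywhere)
$$E^{\mathcal G} u := E^{\mathcal G} u^+ - E^{\mathcal G} u^- = \lim_{j \to \infty} \big( \mathbf{E}^{\mathcal G} u_j^+ - \mathbf{E}^{\mathcal G} u_j^- \big), \tag{23.4}$$
where $u_j^\pm \uparrow u^\pm$ are suitable approximating sequences from $L^2_+(\mathcal{A})$.
We write $L^{\mathcal G}(\mathcal{A})$ for the set of all functions $u \in M(\mathcal{A})$ such that (almost everywhere) $E^{\mathcal G} u$ exists and is finite.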
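23.6 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. The conditional expectation $E^{\mathcal G}$ extends $\mathbf{E}^{\mathcal G}$, i.e. $L^2(\mathcal{A}) \subset L^{\mathcal G}(\mathcal{A})$ and $E^{\mathcal G} u = \mathbf{E}^{\mathcal G} u$ for all $u \in L^2(\mathcal{A})$.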
Proof Applying (23.2) to u+ and u− shows E�u± = E�u± and, in particular,
E�u± ∈ L2���. As such, E�u± is a.e. real-valued, so that (23.4) is always defined
in M���, resp. M���.
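23.7 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. Then $L^{\mathcal G}(\mathcal{A})$ is a vector space and $L^p(\mathcal{A}) \subset L^{\mathcal G}(\mathcal{A})$ for all $1 \leq p \leq \infty$.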
Proof Let 1 < p < � and take u ∈ Lp��� ∩ L2���. Since E�u ∈ L2���, the
Markov inequality P10.12 shows that ��E�u� = �� ∈ � is a �-null set and that
the sets Gn �= �n > �E�u� > 1/n� ∈ � have finite �-measure,
��Gn� � �
(
��E�u� > 1/n�) � n2
∫
�E�u�2 d� < �
Moreover,
∥∥E�u 1Gn
∥∥p
p
= ⟨E�u� �E�u�p−1 sgn�E�u� 1Gn
⟩
= ⟨u� �E�u�p−1 sgn�E�u� 1Gn
⟩
(by T22.4(iii), (ix))
� Cq �u�p�
where we used Hölder’s inequality T12.2 with p−1 + q−1 = 1 and
Cq =
( ∫
�E�u��p−1�q 1Gn d�
)1/q
=
( ∫
�E�u�p 1Gn d�
)1/q
Dividing the above inequality by Cq – if Cq = 0 there is nothing to show since
in this instance E�u = 0[�] – gives
∥∥E�u 1Gn
∥∥
p
� �u�p
As we have seen above, ���E�u = ��� = 0, so we can use Beppo Levi’s theorem
9.6 to find for all u ∈ L2��� ∩ Lp���
∥∥E�u
∥∥
p
=
∥∥E�u 1�0<�E�u�<��
∥∥
p
= sup
n∈�
∥∥E�u 1Gn
∥∥
p
� �u�p
(23.5)
For general u ∈ Lp��� ⊂ M���, 1 < p < �, we use Lemma 23.2 to find
sequences �u±j �j∈� ⊂ L2+��� such that 0 � u±j ↑ u±. Since u+j − u−j � �uj� � �u�,
T22.4(x)–(xii) and Fatou’s lemma T9.11 show
∥∥E�u
∥∥
p
(23.4)=
∥∥∥ lim
j→�
E��u+j − u−j �
∥∥∥
p
�
∥∥∥ lim inf
j→�
E��uj�
∥∥∥
p
� lim inf
j→�
∥∥E��uj�
∥∥
p
(23.5)
� lim inf
j→�
� �uj� �p � �u�p� (23.6)
which, in turn, shows that E�u is a well-defined function in Lp���.
If p = 1 and u ∈ L2��� ∩ L1���, the estimate (23.5) follows more easily from
∥∥E�u 1Gn
∥∥
1
= ⟨E�u� 1Gn
⟩ 21.4(iii),(ix)= ⟨u� 1Gn
⟩
� �u�1
for all u ∈ L2��� ∩ L1��� (notation as above), and we conclude that �E�u�1 �
�u�1 for u ∈ L1��� and that E�u is well-defined.
If p = � and u ∈ L2��� ∩ L���� we get from T22.4 and the observation that
�u�/�u�� � 1 ∣∣E��u/�u���
∣∣ � E���u�/�u��� � 1�
thus �E�u�� � �u�� for all u ∈ L2��� ∩ L����. This extends to E�u for general
u ∈ L���� as in the cases where p ∈ �1� ��.
It remains to show that L���� is a vector space. This follows immediately
from the formula E���u +
w� = � E�u +
E�w, which is easily proved from the
definition of E� via approximating sequences and the corresponding property (vi)
for E� of Theorem 22.4.
The properties of E� resemble those of E�. The theorem below is the analogue
of Theorem 22.4.
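23.8 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $\mathcal{H} \subset \mathcal{G} \subset \mathcal{A}$ be sub-$\sigma$-algebras. The conditional expectation $E^{\mathcal G}$ has the following properties ($u, w \in L^{\mathcal G}(\mathcal{A})$):
(i) $E^{\mathcal G} u \in M(\mathcal{G})$;
(ii) $u \in L^p(\mathcal{A}) \implies E^{\mathcal G} u \in L^p(\mathcal{G})$ and $\|E^{\mathcal G} u\|_p \leq \|u\|_p$ ($p \in [1, \infty]$);
(iii) $\langle E^{\mathcal G} u, w \rangle = \langle u, E^{\mathcal G} w \rangle = \langle E^{\mathcal G} u, E^{\mathcal G} w \rangle$ (for $u, w \in L^{\mathcal G}(\mathcal{A})$, $u\, E^{\mathcal G} w \in L^1(\mathcal{A})$, e.g. if $u \in L^p(\mathcal{A})$ and $w \in L^q(\mathcal{A})$ with $p^{-1} + q^{-1} = 1$);
(iv) $u = w \implies E^{\mathcal G} u = E^{\mathcal G} w$;
(v) $E^{\mathcal G}(\alpha u + \beta w) = \alpha E^{\mathcal G} u + \beta E^{\mathcal G} w$ ($\alpha, \beta \in \mathbb{R}$);
(vi) $\mathcal{G} \supset \mathcal{H} \implies E^{\mathcal H} E^{\mathcal G} u = E^{\mathcal H} u$;
(vii) $g \in M(\mathcal{G}),\ u \in L^{\mathcal G}(\mathcal{A}) \implies g u \in L^{\mathcal G}(\mathcal{A})$ and $E^{\mathcal G}(g u) = g\, E^{\mathcal G}(u)$;
(viii) $M(\mathcal{G}) \subset L^{\mathcal G}(\mathcal{A})$ and $E^{\mathcal G} g = g\, E^{\mathcal G} 1$ for all $g \in M(\mathcal{G})$;
(viii′) if $\mu|_{\mathcal G}$ is $\sigma$-finite, $1 = E^{\mathcal G} 1$ and $g = E^{\mathcal G} g$ for all $g \in M(\mathcal{G})$;
(ix) $0 \leq u \leq 1 \implies 0 \leq E^{\mathcal G} u \leq 1$;
(x) $u \leq w \implies E^{\mathcal G} u \leq E^{\mathcal G} w$;
(xi) $|E^{\mathcal G} u| \leq E^{\mathcal G} |u|$;
(xii) $E^{\{\emptyset, X\}} u = \dfrac{1}{\mu(X)} \int u\, d\mu$ for all $u \in L^1(\mathcal{A})$ (with the convention $\tfrac{1}{\infty} := 0$).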
Proof (i) is clear from the definition of E�, (ii), (v) were already proved in
Theorem 23.7. (iii), (iv), (ix), (x) follow by approximation from the corresponding
properties of E� from Theorem 22.4, and (xi) is derived from (ix) exactly as in
the L2-case.
(vi) Without loss of generality it is enough to consider the case u � 0. Pick a
sequence �uj �j∈� ⊂ L2��� such that uj ↑ u. By Lemma 23.4 and the definition of
E� we know that E�uj ↑ E�u as well as E
uj ↑ E
u and E
�E�uj � ↑ E
�E�u�.
Since E
�E�uj � = E
uj , by Theorem 22.4(vii), we are done.
(vii) Assume first that g� u � 0. Define gj �= g ∧ j ∈ L�+ ��� and let �uj �j∈� ⊂
L2+��� be an increasing sequence such that supj∈� uj = u. Then gj uj ∈ L2���,
gj uj ↑ g u and T22.4 shows that E��gj uj � = gj E�uj . Hence,
E��g u� = sup
j∈�
E��gj uj � = sup
j∈�
(
gj E
�uj
) = g E�u
If g � 0 and u ∈ L����, the conditional expectation E�u+ − E�u− is well-defined
and we find from the previous calculations
g E�u = g (E�u+ − E�u−) = E��g u+� − E��g u−� = E��g u�
Finally, if g ∈ M��� we see, using g+ g− = 0, that
�g+ − g−� E�u = E��g+ u� − E��g− u� = E��g u�
(viii) Since �X� �� �� is -finite, E�1 = supj∈� E�1Aj for some exhausting
sequence �Aj �j∈� ⊂ � with Aj ↑ X and ��Aj � < �. We can now argue as in
(vii) to get E�g = g E�1.
(viii′) If ��� is -finite, we can find an exhausting sequence �Gj �j∈� ⊂ � with
Gj ↑ X and ��Gj � < �. Since 1Gj ∈ L2���, we find from T22.4(ix) that
E�1 = sup
j∈�
E�1Gj = sup
j∈�
1Gj = 1
For g ∈ M���, we use g±j �=
(
g± ∧ j) 1Gj ∈ L2��� as approximating sequences
and finish the proof as before.
(ix) If �uj �j∈� ⊂ L2��� approximates 0 � u � 1 such that uj ↑ u �= supk∈� uk,
we still have vj �= u+j ∈ L2��� and vj ↑ u. Thus T22.4(x) implies 0 � E�u � 1.
(xii) Considering positive and negative parts separately we may assume that
u � 0. Since u ∈ L����, there is an approximating sequence �uj �j∈� ⊂ L2+���,
u = supj∈� uj , and as 0 � uj � u ∈ L1���, we have uj ∈ L1���. Theorem
22.4(xiii) gives, together with the definition of E� and Beppo Levi’s theorem 9.6,
E�∅�X�u = sup
j∈�
E�∅�X�uj = sup
j∈�
1
��X�
∫
uj d� =
1
��X�
∫
u d�
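Two of the properties in Theorem 23.8, the tower property (vi) and the $L^p$-contraction (ii), can be sanity-checked on a finite space where conditional expectations are block averages. This is a sketch under illustrative assumptions only: the six-point space, the weights and the two nested partitions `fine` and `coarse` are made up here, and `cond_exp` and `lp_norm` are hypothetical helper names.

```python
def cond_exp(u, mu, partition):
    """Block-wise weighted average: conditional expectation for a finite partition."""
    result = [0.0] * len(u)
    for block in partition:
        mass = sum(mu[i] for i in block)
        average = sum(u[i] * mu[i] for i in block) / mass
        for i in block:
            result[i] = average
    return result


def lp_norm(u, mu, p):
    """L^p norm of u with respect to the point masses mu, for finite p >= 1."""
    return sum(m * abs(x) ** p for x, m in zip(u, mu)) ** (1.0 / p)


u = [3.0, -1.0, 0.5, 2.0, -4.0, 1.0]
mu = [1.0, 2.0, 0.5, 0.5, 1.0, 3.0]
fine = [[0, 1], [2, 3], [4, 5]]      # generates the larger sigma-algebra G
coarse = [[0, 1, 2, 3], [4, 5]]      # generates the smaller sigma-algebra H

# (vi): conditioning on G first and then on H equals conditioning on H directly.
tower = cond_exp(cond_exp(u, mu, fine), mu, coarse)
direct = cond_exp(u, mu, coarse)
assert all(abs(a - b) < 1e-12 for a, b in zip(tower, direct))

# (ii): conditioning never increases the L^p norm.
for p in (1, 2, 3):
    assert lp_norm(cond_exp(u, mu, fine), mu, p) <= lp_norm(u, mu, p) + 1e-12
```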
Classical conditional expectations
From now on we will no longer distinguish between E� and its extension E� but
always write E�. In particular, we can now show that the operator E� coincides
with the traditional definition of conditional expectation for L1-functions. The
latter turns out to be a rather elegant way to rewrite the martingale property
introduced in Definition 17.1.
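23.9 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. For $u \in L^1(\mathcal{A})$ and $g \in L^1(\mathcal{G})$ the following conditions are equivalent:
(i) $E^{\mathcal G} u = g$;
(ii) $\int_G u\, d\mu = \int_G g\, d\mu \quad \forall\, G \in \mathcal{G}$;
(iii) $\int_G u\, d\mu = \int_G g\, d\mu \quad \forall\, G \in \mathcal{G},\ \mu(G) < \infty$.
If $\mathcal{G}$ is generated by a $\cap$-stable family $\mathcal{F} \subset \mathcal{P}(X)$ containing an exhausting sequence $(F_j)_{j \in \mathbb{N}}$, $F_j \uparrow X$, then (i) to (iii) are also equivalent to
(iv) $\int_F u\, d\mu = \int_F g\, d\mu \quad \forall\, F \in \mathcal{F}$.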
Proof We begin with the general remark that by Theorem 23.8(iii), (viii′) we
have for all G ∈ � and u ∈ L1���
∫
G
E�u d� = ⟨E�u� 1G
⟩ = ⟨u� E�1G
⟩ = �u� 1G� =
∫
G
u d�
(23.7)
(i)⇒(ii): Because of (23.7) we get for all G ∈ � and k ∈ �
∫
G
E�u d�
(23.7)=
∫
G
u d�
23.9(i)=
∫
G
g d�
(ii)⇒(iii) is obvious.
(iii)⇒(i): Take an exhausting sequence �Gk�k∈� ⊂ � with Gk ↑ X and ��Gk�
< �. Then we have for all G ∈ �
∫
G∩Gk
E�u d�
(23.7)=
∫
G∩Gk
u d�
23.9(iii)=
∫
G∩Gk
g d�
Since �1G∩Gk E�u� � �E�u� ∈ L1 and �1G∩Gk g� � �g� ∈ L1, we can use dominated
convergence T11.2 to let k → � and get
∫
G
E�u d� =
∫
G
g d� ∀ G ∈ ��
from which we conclude that E�u = g a.e. by Corollary 10.14(i).
Assume now, in addition, that � = �� �. In this case, (ii)⇒(iv) is obvious,
while (iv)⇒(ii) follows with the technique used in Remark 17.2(i): because of
(iv) the measures
��G� �=
∫
G
u+ + g−d� and ��G� �=
∫
G
u− + g+d�
coincide on � , and by the uniqueness theorem for measures 5.7, on �.
If we combine Theorem 23.9 with the Beppo Levi theorem 9.6 or other conver-
gence theorems we can derive all sorts of ‘conditional’ versions of these theorems.
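23.10 Corollary (Conditional Beppo Levi theorem) Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. For every increasing sequence $(u_j)_{j \in \mathbb{N}} \subset L^1_+(\mathcal{A})$ of positive functions the limit $u := \sup_{j \in \mathbb{N}} u_j$ admits a conditional expectation with values in $[0, \infty]$ and
$$\sup_{j \in \mathbb{N}} E^{\mathcal G} u_j = E^{\mathcal G} \Big( \sup_{j \in \mathbb{N}} u_j \Big) = E^{\mathcal G} u. \tag{23.8}$$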
Proof Let �Aj �j∈� ⊂ � be an exhausting sequence of sets, i.e. Aj ↑ X and
��Aj � < �. Then the functions wj �= �uj ∧ j�1Aj ∈ L�+ ��� ∩ L1+��� and, in
particular, wj ∈ L2+���.[�] Moreover, the sequence wj increases towards u. From
Definition 23.5 we get that
E�u
def= sup
j∈�
E�wj �
which is a numerical function with values in �0� ��. On the other hand, we know
from Theorem 23.9 that for all G ∈ �
∫
G
E�wj d� =
∫
G
wj d� and
∫
G
E�uj d� =
∫
G
uj d�
holds. Since supj∈� wj = u = supj∈� uj and since the sequences
(
E�uj
)
j∈� and(
E�wj
)
j∈� are positive and increasing, cf. Theorem 22.4(xi) and 23.8(x), we
conclude from Beppo Levi’s theorem 9.6 that
∫
G
sup
j∈�
E�uj d� = sup
j∈�
∫
G
E�uj d� = sup
j∈�
∫
G
uj d� =
∫
G
sup
j∈�
uj d� =
∫
G
u d�
With a similar calculation we find
∫
G
sup
j∈�
E�wj d� =
∫
G
u d��
and, consequently,
∫
G
sup
j∈�
E�uj d� =
∫
G
sup
j∈�
E�wj d� ∀ G ∈ �
By Corollary 10.14 we conclude that supj∈� E
�uj = supj∈� E�wj
def= E�u almost
everywhere.
In the same way as we deduced Fatou’s lemma T9.11 and Lebesgue’s dominated
convergence theorem 11.2 from the monotonicity property of the integral and
Beppo Levi’s theorem 9.6, we can get their conditional versions from T22.4(xi),
(xii) and C23.10. We leave the simple proofs to the reader.
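23.11 Corollary (Conditional Fatou's lemma) Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite, and $(u_j)_{j \in \mathbb{N}} \subset L^1_+(\mathcal{A})$. Then
$$E^{\mathcal G} \Big( \liminf_{j \to \infty} u_j \Big) \leq \liminf_{j \to \infty} E^{\mathcal G} u_j. \tag{23.9}$$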
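23.12 Corollary (Conditional dominated convergence theorem) Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite, and $(u_j)_{j \in \mathbb{N}} \subset L^1(\mathcal{A})$ such that $|u_j| \leq w$ for some $w \in L^1_+(\mathcal{A})$. Then
$$E^{\mathcal G} \Big( \lim_{j \to \infty} u_j \Big) = \lim_{j \to \infty} E^{\mathcal G} u_j. \tag{23.10}$$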
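23.13 Corollary (Conditional Jensen inequality) Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. Assume that $V : \mathbb{R} \to \mathbb{R}$ is a convex function with $V(0) \leq 0$ and $\Lambda : \mathbb{R} \to \mathbb{R}$ a concave function with $\Lambda(0) \geq 0$. Then
$$E^{\mathcal G} \Lambda(u) \leq \Lambda\big( E^{\mathcal G} u \big) \qquad \forall\, u \in L^{\mathcal G}(\mathcal{A}), \tag{23.11}$$
and, in particular, $\Lambda(u) \in L^{\mathcal G}(\mathcal{A})$. Moreover,
$$V\big( E^{\mathcal G} u \big) \leq E^{\mathcal G} V(u) \qquad \forall\, u \in L^{\mathcal G}(\mathcal{A}) \text{ s.t. } V(u) \in L^{\mathcal G}(\mathcal{A}). \tag{23.12}$$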
Proof The argument is very similar to the proof of Jensen's inequality T12.14. Note, however, that we do not have to require the finiteness of the reference measure, which was $w \mu$ in T12.14. Let us, for example, prove (23.12). Using Lemma 12.13 and denoting by $\sup_\ell$ the supremum over all linear functions $\ell$ such that $\ell(x) = ax + b \leq V(x)$ for all $x \in \mathbb{R}$, we get using T22.4(v), (x), (ix)
$$V\big( E^{\mathcal G} u \big) = \sup_\ell \big( a\, E^{\mathcal G} u + b \big) \leq \sup_\ell E^{\mathcal G}(a u + b) \leq E^{\mathcal G} V(u)$$
since $b \leq E^{\mathcal G} b$, where we observed that $b \leq V(0) \leq 0$.
The inequality (23.11) is proved in the same way.
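As a numerical illustration of (23.12), one can compare $V(E^{\mathcal G} u)$ with $E^{\mathcal G} V(u)$ pointwise on a finite space. The sketch below rests on assumptions made here for illustration: a five-point space with arbitrary positive weights, a two-block partition, and three convex functions with $V(0) \leq 0$; none of these choices come from the text.

```python
import math


def cond_exp(u, mu, partition):
    """Block-wise weighted average: conditional expectation for a finite partition."""
    result = [0.0] * len(u)
    for block in partition:
        mass = sum(mu[i] for i in block)
        average = sum(u[i] * mu[i] for i in block) / mass
        for i in block:
            result[i] = average
    return result


u = [-1.0, 2.0, 0.5, 3.0, -2.0]
mu = [1.0, 0.5, 2.0, 1.0, 0.25]
partition = [[0, 1], [2, 3, 4]]

# Convex functions with V(0) <= 0, as required in Corollary 23.13.
convex_functions = (lambda x: x * x, lambda x: math.exp(x) - 1.0, abs)

for V in convex_functions:
    lhs = [V(e) for e in cond_exp(u, mu, partition)]     # V(E u)
    rhs = cond_exp([V(x) for x in u], mu, partition)     # E V(u)
    assert all(l <= r + 1e-12 for l, r in zip(lhs, rhs))
```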
Because of Theorem 23.9 it is now very easy and convenient to express the
martingale property D17.1 in terms of conditional expectations. In fact,
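23.14 Corollary Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. A sequence $(u_j)_{j \in \mathbb{N}} \subset L^1(\mathcal{A})$ such that $u_j \in L^1(\mathcal{A}_j)$ is a martingale (resp. sub- or supermartingale) if, and only if, for all $j \in \mathbb{N}$
$$E^{\mathcal{A}_j} u_{j+1} = u_j \qquad \big( \text{resp. } E^{\mathcal{A}_j} u_{j+1} \geq u_j \text{ or } E^{\mathcal{A}_j} u_{j+1} \leq u_j \big).$$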
A great advantage of this way of putting things is that we can now formulate
the convergence theorem for uniformly integrable martingales T18.6 in a very
striking way:
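23.15 Theorem (Closability of martingales) Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space and $\mathcal{A}_\infty = \sigma\big( \bigcup_{j \in \mathbb{N}} \mathcal{A}_j \big)$.
(i) For every $u \in L^1(\mathcal{A})$ the sequence $\big( E^{\mathcal{A}_j} u \big)_{j \in \mathbb{N}}$ is a uniformly integrable martingale. In particular, $E^{\mathcal{A}_j} u \to E^{\mathcal{A}_\infty} u$ as $j \to \infty$ in $L^1$ and a.e.
(ii) Conversely, if $(u_j)_{j \in \mathbb{N}}$ is a uniformly integrable martingale, there exists a function $u_\infty \in L^1(\mathcal{A}_\infty)$ such that $u_j \to u_\infty$ as $j \to \infty$ in $L^1$ and a.e. such that $(u_j)_{j \in \mathbb{N} \cup \{\infty\}}$ is a martingale. In particular, $u_j = E^{\mathcal{A}_j} u_\infty$.
In this sense, $u_\infty$ closes the martingale $(u_j)_{j \in \mathbb{N}}$.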
Proof (i) That
(
E�j u
)
j∈� is a martingale follows at once from Theorem 23.8(vi).
By assumption there exists an exhausting sequence �Ak�k∈� ⊂ �0 with Ak ↑ X
and ��Ak� < �. Therefore, the function
w �=
�∑
k=1
2−k
1 + ��Ak�
1Ak
is strictly positive and integrable. Since u ∈ L1��� and ��u� � N w� ↓ ∅ as
N → �, we find by dominated convergence T11.2 that
lim
N →�
∫
��u��Nw�
�u� d� = 0
This shows that, for all � > 0, large enough N = N��� and any A ∈ �
∫
A
�u� d� =
∫
A∩�u
∫
A
w d� < � =⇒
∫
A
�u� d� < � ∀ A ∈ �
(23.13)
Since for all j ∈ � and c > 0
∫
{
�E�j u�>c w
} c w d� �
∫
�E�j u� d� �
∫
E�j �u� d� =
∫
�u� d��
we may choose c = c0 �= �−1
∫ �u� d�, which implies that for a given � > 0
∫
{
�E�j u�>c0 w
} w d� < �
(23.13)
=⇒
∫
{
�E�j u�>c0 w
} �u� d� < �
Since ��E�j u� > c0 w� ∈ �j , the martingale property implies
∫
{
�E�j u�>c0 w
} �E�j u� d�
23.8(xi)
�
∫
{
�E�j u�>c0 w
} E�j �u� d�
23.9
�
∫
{
�E�j u�>c0 w
} �u� d� � � ∀ j ∈ ��
which is but uniform integrability of the family
(
E�j u
)
j∈�.
The convergence assertions follow now from the convergence theorem for UI
submartingales T18.6.
(ii) follows directly from Theorem 18.6.
Since the conditional Jensen inequality needs fewer assumptions than the clas-
sical Jensen inequality we can improve Example 17.3(v), (vi).
23.16 Corollary Let �X� �� �j � �� be a -finite filtered measure space and
�uj �j∈� be a family of measurable functions uj ∈ L�j ��� which satisfies the
[sub-]martingale property2
uj = E�j uj+1
[
resp. uj � E�j uj+1
]
If V � → is a [monotone increasing] convex function such that V�uj � ∈ L1��j �,
then �V�uj ��j∈� is a submartingale.
Proof Since �uj �j∈� satisfies the [sub-]martingale property, we find from Jensen’s
inequality C23.13 [and the monotonicity of V ] that
V�uj � � V
(
E�j uj+1
)
� E�j V�uj+1�
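23.17 Example In Example 17.3(ix) we introduced a dyadic filtration on the measure space $\big( [0, \infty)^n,\ \mathcal{B}([0, \infty)^n),\ \lambda = \lambda^n|_{[0, \infty)^n} \big)$ given by
$$\mathcal{A}_j^{\Delta} = \sigma\big( z + [0, 2^{-j})^n : z \in 2^{-j} \mathbb{N}_0^n \big), \qquad j \in \mathbb{N}_0.$$
For $u \in L^1([0, \infty)^n, \lambda)$ and all $j \in \mathbb{N}_0$ we can now rewrite (17.4) as
$$E^{\mathcal{A}_j^{\Delta}} u(x) = \sum_{z \in 2^{-j} \mathbb{N}_0^n} \bigg\{ \int u\, \frac{1_{z + [0, 2^{-j})^n}}{\lambda\big( z + [0, 2^{-j})^n \big)}\, d\lambda \bigg\}\, 1_{z + [0, 2^{-j})^n}(x).$$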
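A discretized one-dimensional version of this formula is easy to experiment with. The sketch below makes several simplifying assumptions that are not in the text: it works on $[0, 1)$ instead of $[0, \infty)^n$, replaces $u$ by its samples on $2^m$ equal cells (so integrals become plain averages), and uses the test function $u(x) = x^2$. It averages over dyadic intervals of length $2^{-j}$ and checks the martingale (tower) property between consecutive levels as well as the decay of the $L^1$ error suggested by Theorem 23.15.

```python
def dyadic_cond_exp(values, j):
    """Average samples over dyadic intervals of length 2**-j.

    values: samples of u on 2**m equal cells of [0, 1); requires 0 <= j <= m.
    Returns a list of the same length that is constant on each dyadic interval.
    """
    n = len(values)
    block = n >> j                       # number of cells per dyadic interval
    out = []
    for start in range(0, n, block):
        average = sum(values[start:start + block]) / block
        out.extend([average] * block)
    return out


m = 10
n = 2 ** m
values = [((i + 0.5) / n) ** 2 for i in range(n)]   # u(x) = x^2 at cell midpoints

# Martingale property: conditioning level 6 down to level 5 gives level 5.
level5 = dyadic_cond_exp(values, 5)
level6_then_5 = dyadic_cond_exp(dyadic_cond_exp(values, 6), 5)
assert all(abs(a - b) < 1e-12 for a, b in zip(level5, level6_then_5))

# Discrete L^1 distance between u and its level-j conditional expectation.
errors = [
    sum(abs(a - b) for a, b in zip(dyadic_cond_exp(values, j), values)) / n
    for j in range(m + 1)
]
assert errors[0] > errors[5] > errors[m]
```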
23.18 Remark In Theorem 22.5 we found necessary and sufficient conditions
that a projection in L2 is a conditional expectation. This result has a counterpart
in the spaces Lp, p �= 2, which we want to mention here without proof. Details
can be found in the monograph by Neveu [31, pp. 12–16].
Let �X� �� �� be a finite measure space. Then
(i) Let p ∈ �1� ��. Every bounded linear operator T � Lp��� → Lp��� such that∫
Tf d� = ∫ f d�, f ∈ Lp���, and T�f Tg� = �Tf� �Tg�, f ∈ L����� g ∈
L1���, is a conditional expectation w.r.t. some sub- -algebra � ⊂ �.
(ii) Let p ∈ �1� ��, p �= 2. Every linear contraction T � Lp��� → Lp��� such that
T 2 = T and T 1 = 1 is a conditional expectation w.r.t. some sub- -algebra� ⊂�.
2 This is slightly more general than assuming that �uj �j∈� is a [sub-]martingale since [sub-]martingales are,
by definition, integrable.
Separability criteria for the spaces Lp�X� �� ��
Let �X� �� �� be a measure space. Recall that Lp��� is separable if it contains
a countable dense subset �dj �j∈� ⊂ Lp���. We have seen in Chapter 21 that
the Hilbert space L2��� is separable if we can find a countable complete ONS
�ej �j∈� ⊂ L2��� since the system �q1 e1 + · · · + qN eN � N ∈ �� qj ∈ �� is both
countable and dense. Conversely, using any countable dense subset �dj �j∈� as
input for the Gram–Schmidt orthonormalization procedure (21.10), produces a
complete countable ONS.
Here is a simple sufficient criterion for the separability of Lp.
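23.19 Lemma Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and assume that the $\sigma$-algebra $\mathcal{A}$ is countably generated, i.e. $\mathcal{A} = \sigma(A_j : j \in \mathbb{N})$, $A_j \subset X$. Then $L^p(X, \mathcal{A}, \mu)$, $1 \leq p < \infty$, is separable.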
Proof Step 1: Let us first assume that � is a finite measure. Consider the
-algebras �n �= �A1�
� An�; then
�1 ⊂ �2 ⊂
⊂ �� = ��j � j ∈ ��
is a filtration, ���j is trivially -finite for every j ∈ � and � = ��.
Set uj �= E�j u for u ∈ L1���. By Theorem 23.15 �uj � �j �j∈� is a uniformly
integrable martingale, hence uj
j→�−−−→ u in L1 and a.e.
If v ∈ Lp���, we set vj �= �E�j v�p � E�j ��v�p� (by Corollary 23.13) and
observe that �vj � �j �j∈� is a submartingale, cf. Theorem 23.15, which is uniformly
integrable. The latter follows easily from
∫
�vj >w�
vj d� �
∫
�vj >w�
E�j ��v�p� d� �
∫
{
E�j ��v�p�>w
} E�j ��v�p� d�
and the uniform integrability of the family
(
E�j ��v�p�)
j∈�, see Theorem 23.15.
From the (sub-)martingale convergence Theorem 18.6 we conclude that vj
j→�−−−→
�E�� v�p = �v�p in L1 and a.e., and Riesz’ theorem 12.10 shows �vj�1/p =∣∣E�j v
∣∣ j→�−−−→ �v� in Lp. Consequently, E�j v j→�−−−→ v in Lp.
Since the -algebra �j is generated by finitely many sets, E
�j u, resp., E�j v
are simple functions with canonical representations of the form
s =
N∑
k=1
yk 1Bk � yk �= 0� B1�
� BN ∈ �j disjoint�
as �j is kept fixed, we suppress the dependence of yk� Bk� N on j. If yk �∈ �, we
find for every � > 0 numbers y�k ∈ � such that
∣∣yk − y�k
∣∣ � �
N ��X�1/p
The triangle inequality now shows
∥∥∥∥s −
N∑
k=1
y�k 1Bk
∥∥∥∥
p
�
N∑
k=1
∣∣yk − y�k
∣∣ ��Bk�1/p � ��
which proves that the system
D �=
{ N∑
k=1
qk 1Bk � N ∈ �� qk ∈ �� Bk ∈
⋃
j∈�
�j
}
is a countable dense subset of the space Lp�X� �� ��, 1 � p < �.
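The rational approximation step in this argument can be made concrete. The sketch below is an illustration under assumptions chosen here (three disjoint sets with made-up masses, $p = 3$, and a particular way of picking rationals by rounding to a common denominator); the proof only needs that some rational within the stated tolerance exists. It confirms that coefficients chosen within $\epsilon / (N \mu(X)^{1/p})$ give an $L^p$ error of at most $\epsilon$.

```python
import math
from fractions import Fraction


def rational_coefficients(ys, masses, p, eps):
    """Pick rationals q_k with |y_k - q_k| <= eps / (N * mu(X)**(1/p))."""
    n = len(ys)
    total = sum(masses)                      # mu(X), assumed finite and positive
    tol = eps / (n * total ** (1.0 / p))
    # Rounding to a multiple of 1/denom moves y by at most 1/(2*denom) <= tol.
    denom = int(1.0 / (2.0 * tol)) + 1
    return [Fraction(round(y * denom), denom) for y in ys]


def lp_distance(ys, qs, masses, p):
    """L^p norm of sum_k (y_k - q_k) 1_{B_k} for disjoint sets B_k with given masses."""
    return sum(abs(y - float(q)) ** p * m for y, q, m in zip(ys, qs, masses)) ** (1.0 / p)


ys = [math.pi, -math.sqrt(2.0), 0.1]     # real coefficients y_k
masses = [0.5, 1.5, 2.0]                 # mu(B_k) for disjoint B_k
eps = 1e-3
qs = rational_coefficients(ys, masses, p=3, eps=eps)
assert lp_distance(ys, qs, masses, p=3) <= eps
```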
Step 2: If � is -finite but not finite, we choose an exhausting sequence
�Cj �j∈� ⊂ � such that Cj ↑ X and ��Cj � < � and consider the finite measures
�j �= �� • ∩ Cj �, j ∈ �, on Cj ∩�. Since every u ∈ Lp��j � = Lp�Cj � Cj ∩�� �j �
can be extended by 0 on the set X \ Cj and becomes an element of Lp��j+1�, we
can interpret the sets Lp��j � as a chain of increasing subspaces of each other and
of Lp���:
Lp�Cj � Cj ∩ �� �j � ⊂ Lp�Cj+1� Cj+1 ∩ �� �j+1� ⊂
⊂ Lp�X� �� ��
Applying the construction from step 1 to each of the sets Lp��j � furnishes
countable dense subsets Dj . Obviously, D �=
⋃
j∈� Dj is a countable set but it
is also dense in Lp���. To see this, fix � > 0 and u ∈ Lp���. Since X \ Cj ↓ ∅,
we find by Lebesgue’s dominated convergence theorem 12.9 some N ∈ � such
that
∫
X\Cj �u�
p d� < �p for all j � N . Since Dj is dense in Lp��j � and since
u 1Cj ∈ Lp��j �, there is some dj ∈ Dj ⊂ D with �u 1Cj − dj�Lp��j � � �, and
altogether we get for large j � N
�u − dj�p � �u 1Cj − dj�Lp��j � + �u 1X\Cj �Lp��� � 2�
If the underlying set X is a separable metric space (cf. Appendix B), the
criterion of Lemma 23.19 becomes particularly simple.
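23.20 Corollary Let $(X, \rho)$ be a separable metric space equipped with its Borel $\sigma$-algebra $\mathcal{B} = \mathcal{B}(X)$. Then $L^p(X, \mathcal{B}, \mu)$, $1 \leq p < \infty$, is separable for every $\sigma$-finite measure $\mu$ on $(X, \mathcal{B})$. If $\mu$ is not $\sigma$-finite, $L^p(X, \mathcal{B}, \mu)$ need not be separable.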
Proof Denote by D ⊂ X a countable dense subset and consider the countable
system of open balls Br �d� �= �x ∈ X � ��x� d� < r�
� �= {Br �d� � d ∈ D� r ∈ �+
} ⊂ ��X�
Since every open set U ∈ ��X� can be written as
U = ⋃
Br �d�⊂U
Br �d�∈�
Br �d�
3
which shows that ��X� ⊂ �� � ⊂ ���X�� =
�X�. Thus the Borel sets
�X� =
�� � are countably generated, and the assertion follows from Lemma 23.19.
If � is not -finite, we have the following counterexample: take X = �0� 1�
with its natural Euclidean metric ��x� y� = �x − y� and let � be the counting
measure on ��0� 1��
�0� 1��, i.e. ��B� �= #B. Obviously, � is not -finite. The
p th power �-integrable simple functions are of the form
��
� ∩ Lp��� =
{ N∑
j=1
yj 1Aj � N ∈ �� yj ∈ � Aj ∈
� #Aj < �
}
�
so that
Lp��� =
{
u � �0� 1� → � ∃ �xj �j∈� ⊂ �0� 1�� u�x� = 0 ∀ x �= xj
and
�∑
j=1
�u�xj ��p < �
}
Obviously, �1�x��x∈�0�1� ⊂ Lp���, but no single countable system can approximate
this family since
�1�x� − 1�y��pp = 0 or 2
according to whether x = y or x �= y.
With somewhat more effort we can show that the conditions of Lemma 23.19
are even necessary.
23.21 Theorem Let �X� �� �� be a -finite measure space. Then the following
assertions are equivalent.
(i) � is (almost) separable, i.e. there exists a countable family � ⊂ � such
that ��F � < � for all F ∈ � and �� � ≈ � in the sense that every set in
� has, up to a null set, a version in �� �.
3 The inclusion ‘⊂’ is obvious, for ‘⊃’ fix x ∈ U . Then there exists some r ∈ �+ with Br �x� ⊂ U . Since D
is dense, x ∈ Br/2�d� for some d ∈ D with ��d� x� < r/4, so that x ∈ Br/2�d� ⊂ U .
(ii) � is separable,4 i.e. there exists a countable family � ⊂ � such that ��F � <
� and for every A ∈ � with ��A� < � we have
∀ � > 0� ∃ F� ∈ � � ��A \ F�� + ��F� \ A� � �
(iii) Lp�X� �� �� is separable, 1 � p < �.
Proof (i)⇒(iii): The proof of Lemma 23.19 shows that Lp�X� �� �� �� is sepa-
rable. Since for each A ∈ � there is an A∗ ∈ �� � with
�
(
�A \ A∗� ∪ �A∗ \ A�) = 0 ⇐⇒
∫
�1A − 1A∗ � d� = 0�
every simple function � ∈ ���� has a version �∗ ∈ �� �� �� such that ∫ ��
−�∗� d� = 0. This proves that Lp�X� �� �� �� ⊃ Lp�X� �� �� (we have, in fact,
equality since �� � ⊂ �), and we see that Lp�X� �� �� is separable.
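(iii)⇒(ii): Denote by $(d_j)_{j\in\mathbb{N}}$ a countable dense subset of $L^p(\mathcal{A})$. Since
$\mathcal{E}(\mathcal{A}) \cap L^p(\mathcal{A})$ is dense in $L^p(\mathcal{A})$, cf. Lemma 12.11, we find for each $d_j$ a sequence
$(f_{jk})_{k\in\mathbb{N}} \subset \mathcal{E}(\mathcal{A}) \cap L^p(\mathcal{A})$ such that $L^p$-$\lim_{k\to\infty} f_{jk} = d_j$. Thus $(f_{jk})_{j,k\in\mathbb{N}}$ is also
dense in $L^p$, and the system of subsets
\[ \mathcal{F} := \Big\{ \bigcup_{\ell=1}^{N} \{f_{jk} = r_\ell\} \,:\, N \in \mathbb{N},\ j, k \in \mathbb{N},\ r_\ell \in \mathbb{R} \Big\} \]
is countable since each $f_{jk}$ attains only finitely many values.
For every $A \in \mathcal{A}$, $\mu(A) < \infty$, we have $1_A \in L^p(\mathcal{A})$, and we find a subsequence
$(f_\ell^A)_{\ell\in\mathbb{N}} \subset (f_{jk})_{j,k\in\mathbb{N}}$ with $\lim_{\ell\to\infty} \|f_\ell^A - 1_A\|_p^p = 0$.
Set $F_\ell := \{|f_\ell^A - 1_A| \le 1/2\} \cap \{|f_\ell^A| > 1/2\}$. Obviously $F_\ell \in \mathcal{F}$, and $F_\ell \subset A$
since
\[ A^c \cap F_\ell = A^c \cap \{|f_\ell^A - 1_A| \le 1/2\} \cap \{|f_\ell^A| > 1/2\} = A^c \cap \{|f_\ell^A| \le 1/2\} \cap \{|f_\ell^A| > 1/2\} = \emptyset. \]
Thus $\mu(F_\ell \setminus A) = 0$, while
\[ \mu(A \setminus F_\ell) \le \mu\big(A \cap \{|f_\ell^A - 1_A| > 1/2\}\big) + \mu\big(A \cap \{|f_\ell^A| \le 1/2\}\big). \]
Using the triangle inequality, we infer
\[ A \cap \{|f_\ell^A| \le 1/2\} \subset A \cap \{|f_\ell^A - 1| \ge 1/2\} = A \cap \{|f_\ell^A - 1_A| \ge 1/2\}, \]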
4 This notion derives from the fact that $(\mathcal{A}, \rho_\mu)$, $\rho_\mu(A, B) := \mu(A \setminus B) + \mu(B \setminus A)$, $A, B \in \mathcal{A}$, becomes a
separable pseudo-metric space in the usual sense, cf. Appendix B.
and with the above calculation and an application of Markov's inequality 10.12 we
conclude that
\[ \mu(A \setminus F_\ell) + \mu(F_\ell \setminus A) \le 2\, \mu\big(A \cap \{|f_\ell^A - 1_A| \ge 1/2\}\big) \le 2^{p+1}\, \|f_\ell^A - 1_A\|_p^p. \]
The right-hand side of the above inequality tends to 0 as $\ell \to \infty$, and (ii) follows.
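(ii)⇒(i): Fix $A \in \mathcal{A}$ with $\mu(A) < \infty$. Then we find, by assumption, sets $F_n \in \mathcal{F}$
with $\mu(A \setminus F_n) + \mu(F_n \setminus A) \le 2^{-n}$. Consider the sets
\[ F^* := \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} F_n \quad\text{and}\quad F_* := \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} F_n. \]
Then using the continuity of measures T4.4 and $\sigma$-subadditivity C4.6,
\begin{align*}
\mu(F^* \setminus A) + \mu(A \setminus F_*)
&= \mu\Big(\Big[\bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} F_n\Big] \setminus A\Big) + \mu\Big(A \cap \Big[\bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} F_n\Big]^c\Big) \\
&= \mu\Big(\bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} (F_n \setminus A)\Big) + \mu\Big(A \cap \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} F_n^c\Big) \\
&= \lim_{k\to\infty} \Big[\mu\Big(\bigcup_{n=k}^{\infty} (F_n \setminus A)\Big) + \mu\Big(\bigcup_{n=k}^{\infty} (A \setminus F_n)\Big)\Big] \\
&\le \lim_{k\to\infty} \sum_{n=k}^{\infty} \big[\mu(F_n \setminus A) + \mu(A \setminus F_n)\big] \\
&\le \lim_{k\to\infty} \sum_{n=k}^{\infty} 2^{-n} = \lim_{k\to\infty} 2^{1-k} = 0.
\end{align*}
This shows that for all $A \in \mathcal{A}$ with $\mu(A) < \infty$
\[ \exists\, F_*, F^* \in \sigma(\mathcal{F}) : \quad F_* \subset F^* \quad\text{and}\quad \mu(F^* \setminus A) + \mu(A \setminus F_*) = 0, \]
implying that $\mu(F^* \setminus A) + \mu(A \setminus F^*) = 0$ also.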
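If $\mu(A) = \infty$ we pick some exhausting sequence $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ with $A_k \uparrow X$ and
$\mu(A_k) < \infty$. Then the sets $A \cap A_k$ have finite $\mu$-measure and we can construct,
as before, sets $F_k^*$ and $F_{*,k}$. Setting $F^* := \bigcup_{k\in\mathbb{N}} F_k^*$ we find
\[ \Big(\bigcup_{k\in\mathbb{N}} F_k^*\Big) \setminus \bigcup_{j\in\mathbb{N}} (A \cap A_j) = \bigcup_{k\in\mathbb{N}} \Big(F_k^* \setminus \bigcup_{j\in\mathbb{N}} (A \cap A_j)\Big) \subset \bigcup_{k\in\mathbb{N}} \big(F_k^* \setminus (A \cap A_k)\big), \]
and so
\[ \mu(F^* \setminus A) \le \mu\Big(\bigcup_{k\in\mathbb{N}} \big(F_k^* \setminus (A \cap A_k)\big)\Big) \le \sum_{k=1}^{\infty} \underbrace{\mu\big(F_k^* \setminus (A \cap A_k)\big)}_{=\,0} = 0. \]
The expression $\mu(A \setminus F^*)$ is handled analogously.
This shows that sets from $\mathcal{A}$ and $\sigma(\mathcal{F})$ differ by at most a null set.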
Problems
23.1. Complete the proof of Theorem 23.8.
23.2. Show that $E^{\mathcal{G}} 1 = 1$ if, and only if, $\mu|_{\mathcal{G}}$ is $\sigma$-finite. Find a counterexample showing
that $E^{\mathcal{G}} 1 \le 1$ is, in general, best possible.
[Hint: use p = 2 and E� = E�.]
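23.3. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Show that $E^{\mathcal{G}} g = g$ for all $g \in L^p(\mathcal{G})$.
[Hint: observe that, a.e., $g = g\, 1_{\bigcup_j \{|g| > 1/j\}}$ and $\mu(\{|g| > 1/j\}) < \infty$. This emulates
$\sigma$-finiteness.]
23.4. Let $\mathcal{H} \subset \mathcal{G}$ be two sub-$\sigma$-algebras of $\mathcal{A}$. Show that
\[ E^{\mathcal{G}} E^{\mathcal{H}} u = E^{\mathcal{H}} E^{\mathcal{G}} u = E^{\mathcal{H}} u \]
for all $u \in L^p(\mathcal{A})$ resp. for all $u \in M(\mathcal{A})$ provided $\mu|_{\mathcal{H}}$ is $\sigma$-finite.
[Hint: if $\mu|_{\mathcal{H}}$ is not $\sigma$-finite, the set $L^p(\mathcal{H})$ can be very small ….]
23.5. Consider on the measure space $([0, \infty), \mathcal{B}[0, \infty), \lambda)$, $\lambda := \lambda^1|_{[0,\infty)}$, the filtration
$\mathcal{A}_n := \sigma\big([0, 1), [1, 2), \ldots, [n-1, n), [n, \infty)\big)$. Find $E^{\mathcal{A}_n} u$ for $u \in L^p$.
23.6. Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Show that, in
general,
\[ \int E^{\mathcal{G}} u\, d\mu \le \int u\, d\mu, \qquad u \in L^1(\mathcal{A}), \]
with equality holding only if $\mu|_{\mathcal{G}}$ is $\sigma$-finite.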
23.7. Prove Corollaries 23.11 and 23.12.
23.8. Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space and denote by $\langle u, \phi\rangle$ the
canonical dual pairing between $u \in L^p$ and $\phi \in L^q$, $p^{-1} + q^{-1} = 1$, i.e. $\langle u, \phi\rangle := \int u\, \phi\, d\mu$.
A sequence $(u_j)_{j\in\mathbb{N}} \subset L^p$ is weakly relatively compact if there exists a
subsequence $(u_{j_k})_{k\in\mathbb{N}}$ such that
\[ \langle u_{j_k} - u, \phi\rangle \xrightarrow{k\to\infty} 0 \]
holds for all $\phi \in L^q$ and some $u \in L^p$. Show that for a martingale $(u_j)_{j\in\mathbb{N}}$ and every
$p \in [1, \infty)$ the following assertions are equivalent:
(i) there exists some $u \in L^p(\mathcal{A}_\infty)$ such that $\lim_{j\to\infty} \|u_j - u\|_p = 0$;
(ii) there exists some $u_\infty \in L^p(\mathcal{A}_\infty)$ such that $u_j = E^{\mathcal{A}_j} u_\infty$;
(iii) the sequence $(u_j)_{j\in\mathbb{N}}$ is weakly relatively compact.
23.9. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_j)_{j\in\mathbb{N}} \subset L^1(\mathcal{A})$. Show that
\[ m_1 := u_1, \qquad m_{j+1} - m_j := u_{j+1} - E^{\mathcal{A}_j} u_{j+1} \]
is a martingale under the filtration $\mathcal{A}_j := \sigma(u_1, \ldots, u_j)$.
23.10. (Continuation of Problem 23.9). If
\[ \int u_1\, d\mu = 0 \quad\text{and}\quad E^{\mathcal{A}_j} u_{j+1} = 0, \]
then $(u_j)_{j\in\mathbb{N}}$ is called a martingale difference sequence. Assume that $u_j \in L^2(\mathcal{A})$
and denote by $s_k := u_1 + \cdots + u_k$ the partial sums. Show that $(s_j^2, \mathcal{A}_j)_{j\in\mathbb{N}}$ is a
submartingale satisfying
\[ \int s_k^2\, d\mu = \sum_{j=1}^{k} \int u_j^2\, d\mu. \]
23.11. Doob decomposition. Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space and
let $(s_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a submartingale. Show that there exists an a.e. unique martingale
$(m_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ and an increasing sequence of functions $(a_j)_{j\in\mathbb{N}}$ such that $a_j \in L^1(\mathcal{A}_{j-1})$
for all $j \ge 2$ and
\[ s_j = m_j + a_j, \qquad j \in \mathbb{N}. \]
[Hint: set $m_0 := u_0$, $m_{j+1} - m_j := u_{j+1} - E^{\mathcal{A}_j} u_{j+1}$ and $a_0 := 0$, $a_{j+1} - a_j := E^{\mathcal{A}_j} u_{j+1} - u_j$.
For uniqueness assume $\tilde{m}_j + \tilde{a}_j$ is a further Doob decomposition
and study the measurability properties of the martingale $M_j := m_j - \tilde{m}_j = \tilde{a}_j - a_j$.]
23.12. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent
identically distributed random variables such that $P(X_j = 0) = P(X_j = 2) = \frac12$. Set
$M_k := \prod_{j=1}^{k} X_j$. Show that there does not exist any filtration $(\mathcal{A}_j)_{j\in\mathbb{N}}$ and no random
variable $M$ such that $M_k = E^{\mathcal{A}_k} M$.
[Hint: compare with Example 17.3(xi).]
Remark. This example shows that not all martingales can be obtained as conditional
expectations of a single function.
24
Orthonormal systems and their convergence behaviour
In Chapter 21 we discussed the importance of orthonormal systems (ONSs) in
Hilbert spaces. In particular, countable complete ONSs turned out to be bases of
separable Hilbert spaces. We have also seen that a countable ONS gives rise to
a family of finite-dimensional subspaces and a sequence of orthogonal projections
onto these spaces. In the present chapter we are concerned with the following topics:
• to give concrete examples of (complete) ONSs;
• to see when the associated canonical projections are conditional expectations;
• to understand the $L^p$ ($p \ne 2$) and a.e. convergence behaviour of series expansions
with respect to certain ONSs.
The latter is, in general, not a trivial matter. Here we will see how we can use
the powerful martingale machinery of Chapters 17 and 18 to get $L^p$ ($1 \le p < \infty$)
and a.e. convergence.
Throughout this chapter we will consider the Hilbert space $L^2(I, \mathcal{B}(I), \rho\lambda)$ where
$I \subset \mathbb{R}$ is a finite or infinite interval of the real line, $\mathcal{B}(I) = I \cap \mathcal{B}(\mathbb{R})$ are the Borel
sets in $I$, $\lambda = \lambda^1|_I$ is Lebesgue measure on $I$ and $\rho(x)$ is a density function. We
will usually write $\rho(x)\, dx$ and $\int \ldots\, dx$ instead of $\rho\lambda$ and $\int \ldots\, d\lambda$.
One of the most important techniques to construct ONSs is the Gram–Schmidt
orthonormalization procedure (21.10), which we can use to turn any countable
family $(f_k)_{k\in\mathbb{N}}$ into an orthonormal sequence $(e_k)_{k\in\mathbb{N}}$. Something of a problem,
however, is to find a reasonable sequence $(f_k)_{k\in\mathbb{N}}$ which can be used as input to
the orthonormalization procedure.
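As a small illustration of the procedure, the following Python sketch orthogonalises the monomials $1, t, t^2, t^3$ in $L^2([-1,1], dx)$ with exact rational arithmetic. The function names `inner` and `gram_schmidt_monomials` are chosen here for illustration and are not from the text; the normalising step is deliberately omitted so that all coefficients stay rational (dividing each output by its norm would give an ONS).

```python
from fractions import Fraction

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]; polynomials are coefficient lists, lowest degree first."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt_monomials(n):
    """Orthogonalise 1, t, ..., t^n (orthogonal output, not normalised)."""
    basis = []
    for k in range(n + 1):
        f = [Fraction(0)] * k + [Fraction(1)]      # the monomial t^k
        for e in basis:
            c = inner(f, e) / inner(e, e)          # projection coefficient onto e
            for i, coeff in enumerate(e):
                f[i] -= c * coeff
        basis.append(f)
    return basis

ortho = gram_schmidt_monomials(3)
for poly in ortho:
    print([str(c) for c in poly])
```

The third and fourth outputs are $t^2 - \frac13$ and $t^3 - \frac35 t$, i.e. constant multiples of the Legendre polynomials $\frac12(3t^2-1)$ and $\frac12(5t^3-3t)$.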
Orthogonal polynomials
For many practical applications, such as interpolation, approximation or numerical
integration, a natural set of $f_k$ to begin with is given by the polynomials on $I$.
Usually one applies (21.10) to the sequence of monomials
\[ (1, t, t^2, t^3, \ldots) = (t^j)_{j\in\mathbb{N}_0} \]
to construct an ONS consisting of polynomials. Of course, this depends heavily
on the underlying measure space where polynomials should be square integrable.
With some (partly pretty tedious) calculations¹ one can get the following important
classes of orthogonal polynomials in $L^2(I, \mathcal{B}(I), \rho(x)\, dx)$.
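24.1 Jacobi polynomials $\big(J_k^{(\alpha,\beta)}\big)_{k\in\mathbb{N}_0}$, $\alpha, \beta > -1$ We choose
\[ I = [-1, 1], \qquad \rho(x)\, dx = (1-x)^{\alpha} (1+x)^{\beta}\, dx, \qquad \alpha, \beta > -1, \]
and we get
\[ J_k^{(\alpha,\beta)}(x) = \frac{(-1)^k}{k!\, 2^k}\; \frac{\dfrac{d^k}{dx^k}\Big((1-x)^{\alpha+k} (1+x)^{\beta+k}\Big)}{(1-x)^{\alpha} (1+x)^{\beta}} = \frac{1}{2^k} \sum_{j=0}^{k} \binom{k+\alpha}{j} \binom{k+\beta}{k-j} (x-1)^{k-j} (x+1)^{j}, \]
\[ \big\| J_k^{(\alpha,\beta)} \big\|_2^2 = \frac{2^{\alpha+\beta+1}}{2k + \alpha + \beta + 1}\; \frac{\Gamma(k+\alpha+1)\, \Gamma(k+\beta+1)}{\Gamma(k+1)\, \Gamma(k+\alpha+\beta+1)}. \]
Choosing in 24.1 particular values for $\alpha$ and $\beta$ yields other important families.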
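24.2 Chebyshev polynomials (of the first kind) $(T_k)_{k\in\mathbb{N}_0}$ We choose
\[ I = [-1, 1], \qquad \rho(x)\, dx = (1 - x^2)^{-1/2}\, dx, \]
and we get
\[ T_k(x) = J_k^{(-1/2,-1/2)}(x) = \begin{cases} \sqrt{2/\pi}\, \cos(k \arccos x) & \text{if } k \in \mathbb{N}, \\ 1/\sqrt{\pi} & \text{if } k = 0, \end{cases} \qquad \|T_k\|_2^2 = \frac12 \left( \frac{\Gamma\big(k + \frac12\big)}{\Gamma(k+1)} \right)^{2}. \]
The first few Chebyshev polynomials are
\[ 1,\quad x,\quad 2x^2 - 1,\quad 4x^3 - 3x,\quad 8x^4 - 8x^2 + 1,\quad 16x^5 - 20x^3 + 5x,\ \ldots \]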
1 The material in Sections 24.1–24.5 below is taken from Alexits [1, pp. 30–37], Gradshteyn-Ryzhik [17,
§8.9] and Kaczmarz-Steinhaus [22, §§IV.1–2, 8–9]. Another classic is the book by Szegö [51, §§1–5], and
a good modern reference is the monograph by Andrews et al. [2, §§5.1, 6.1–6.3].
and the following recursion formula holds:
\[ T_{k+1}(x) = 2x\, T_k(x) - T_{k-1}(x), \qquad k \in \mathbb{N}. \]
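The recursion can be checked mechanically. The sketch below (function names are illustrative only) builds the coefficient lists of the unnormalised polynomials $\cos(k \arccos x)$, which is the scaling used in the list of the first few Chebyshev polynomials, and compares one of them with $\cos(k\theta)$ at $x = \cos\theta$.

```python
import math

def chebyshev_coeffs(n):
    """Coefficient lists (lowest degree first) of cos(k arccos x), k = 0..n, via the recursion."""
    polys = [[1], [0, 1]]
    for k in range(1, n):
        prev, cur = polys[k - 1], polys[k]
        nxt = [0] + [2 * c for c in cur]           # 2x * T_k
        for i, c in enumerate(prev):
            nxt[i] -= c                            # - T_{k-1}
        polys.append(nxt)
    return polys[: n + 1]

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))

T = chebyshev_coeffs(5)
theta = 0.7
print(T[5])
print(evaluate(T[4], math.cos(theta)), math.cos(4 * theta))
```

The list printed for `T[5]` corresponds to $16x^5 - 20x^3 + 5x$.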
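24.3 Legendre polynomials $(P_k)_{k\in\mathbb{N}_0}$ We choose
\[ I = [-1, 1], \qquad \rho(x)\, dx = dx, \]
and we get
\[ P_k(x) = J_k^{(0,0)}(x) = \frac{(-1)^k}{k!\, 2^k}\, \frac{d^k}{dx^k}\Big((1 - x^2)^k\Big), \qquad \|P_k\|_2^2 = \frac{2}{2k+1}. \]
The first few Legendre polynomials are
\[ 1,\quad x,\quad \tfrac12 (3x^2 - 1),\quad \tfrac12 (5x^3 - 3x),\quad \tfrac18 (35x^4 - 30x^2 + 3),\quad \tfrac18 (63x^5 - 70x^3 + 15x), \]
and the following recursion formula holds:
\[ (k+1)\, P_{k+1}(x) = (2k+1)\, x\, P_k(x) - k\, P_{k-1}(x), \qquad k \in \mathbb{N}. \]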
24.4 Laguerre polynomials $(L_k)_{k\in\mathbb{N}_0}$ We choose
\[ I = [0, \infty), \qquad \rho(x)\, dx = e^{-x}\, dx, \]
and we get
\[ L_k(x) = e^{x}\, \frac{d^k}{dx^k}\big(e^{-x} x^k\big) = k! \sum_{j=0}^{k} (-1)^j \binom{k}{j} \frac{x^j}{j!}, \qquad \|L_k\|_2^2 = (k!)^2. \]
The first few Laguerre polynomials are
\[ 1,\quad 1 - x,\quad x^2 - 4x + 2,\quad -x^3 + 9x^2 - 18x + 6,\quad x^4 - 16x^3 + 72x^2 - 96x + 24,\ \ldots \]
and the following recursion formula holds:
\[ L_{k+1}(x) = (2k + 1 - x)\, L_k(x) - k^2\, L_{k-1}(x), \qquad k \in \mathbb{N}. \]
24.5 Hermite polynomials $(H_k)_{k\in\mathbb{N}_0}$ We choose
\[ I = (-\infty, \infty), \qquad \rho(x)\, dx = e^{-x^2}\, dx, \]
and we get
\[ H_k(x) = (-1)^k e^{x^2}\, \frac{d^k}{dx^k}\big(e^{-x^2}\big), \qquad \|H_k\|_2^2 = 2^k\, k!\, \sqrt{\pi}. \]
The first few Hermite polynomials are
\[ 1,\quad 2x,\quad 4x^2 - 2,\quad 8x^3 - 12x,\quad 16x^4 - 48x^2 + 12,\ \ldots \]
and the following recursion formula holds:
\[ H_{k+1}(x) = 2x\, H_k(x) - 2k\, H_{k-1}(x), \qquad k \in \mathbb{N}. \]
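The three-term recursions in 24.3, 24.4 and 24.5 all have the shape $p_{k+1} = a_k\, x\, p_k + b_k\, p_k + c_k\, p_{k-1}$, so one generic routine reproduces the listed polynomials. This is only a sketch with exact rational arithmetic; the helper names `three_term` and `combine` are made up for this illustration.

```python
from fractions import Fraction

def combine(a, p, b, q, c, r):
    """Return a*x*p(x) + b*q(x) + c*r(x) as a coefficient list (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + 1)
    for i, v in enumerate(p):
        out[i + 1] += a * v
    for i, v in enumerate(q):
        out[i] += b * v
    for i, v in enumerate(r):
        out[i] += c * v
    return out

def three_term(n, p0, p1, step):
    """Build p_0..p_n where step(k, p_k, p_{k-1}) returns p_{k+1}."""
    polys = [p0, p1]
    for k in range(1, n):
        polys.append(step(k, polys[k], polys[k - 1]))
    return polys[: n + 1]

one, x = [Fraction(1)], [Fraction(0), Fraction(1)]

# (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
legendre = three_term(5, one, x,
                      lambda k, p, q: combine(Fraction(2 * k + 1, k + 1), p, 0, p, Fraction(-k, k + 1), q))
# L_{k+1} = (2k+1-x) L_k - k^2 L_{k-1}
laguerre = three_term(4, one, [Fraction(1), Fraction(-1)],
                      lambda k, p, q: combine(-1, p, 2 * k + 1, p, -k * k, q))
# H_{k+1} = 2x H_k - 2k H_{k-1}
hermite = three_term(4, one, [Fraction(0), Fraction(2)],
                     lambda k, p, q: combine(2, p, 0, p, -2 * k, q))

print([str(c) for c in legendre[4]])
print([str(c) for c in laguerre[4]])
print([str(c) for c in hermite[4]])
```

The three printed lists are the coefficients of $\frac18(35x^4 - 30x^2 + 3)$, $x^4 - 16x^3 + 72x^2 - 96x + 24$ and $16x^4 - 48x^2 + 12$.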
In order to decide if a family of polynomials $(p_k)_{k\in\mathbb{N}_0} \subset L^2(I, \rho(x)\, dx)$ is a
complete ONS we have to show that
\[ \int u(x)\, p_k(x)\, \rho(x)\, dx = 0 \quad \forall\, k \in \mathbb{N}_0 \implies u = 0 \ \text{a.e.} \]
The key technical result is the Weierstraß approximation theorem.
24.6 Theorem (Weierstraß) Polynomials are dense in $C[0, 1]$ w.r.t. uniform
convergence.
Proof (S.N. Bernstein) Take a sequence $(X_j)_{j\in\mathbb{N}}$ of independent² measurable
functions on $([0, 1], \mathcal{B}[0, 1], dx)$ which are all Bernoulli $(p, 1-p)$-distributed,
$0 < p < 1$, i.e.
\[ \lambda(\{X_j = 1\}) = p \quad\text{and}\quad \lambda(\{X_j = 0\}) = 1 - p \qquad \forall\, j \in \mathbb{N}, \]
cf. 17.4 for the construction of such a sequence. Write $S_n := X_1 + \cdots + X_n$ for
the partial sum and observe that, due to independence,
\[ \lambda(\{S_n = k\}) = \lambda\Big( \bigsqcup_{1 \le j_1 < \cdots < j_k \le n} \big( \{X_{j_1} = 1\} \cap \cdots \cap \{X_{j_k} = 1\} \cap \{X_{j_{k+1}} = 0\} \cap \cdots \cap \{X_{j_n} = 0\} \big) \Big) = \binom{n}{k} p^k (1-p)^{n-k}, \]
which shows that
\[ \int u\Big(\frac{S_n(x)}{n}\Big)\, dx = \sum_{k=0}^{n} u\Big(\frac{k}{n}\Big) \binom{n}{k} p^k (1-p)^{n-k} =: B_n(u; p), \]
2 In the sense of Example 17.3(x).
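where $B_n(u; p)$ stands for the $n$th Bernstein polynomial.³ From 17.4 we also
know that
\[ \int \Big| \frac{S_n(x)}{n} - p \Big|^2 dx = \frac{p(1-p)}{n} \le \frac{1}{4n}, \tag{24.1} \]
since the function $p \mapsto p(1-p)$ attains its maximum at $p = 1/2$. As $u \in C[0, 1]$
is uniformly continuous, $|u(x) - u(y)| < \epsilon$ whenever $|x - y| < \delta$ is sufficiently
small. Thus
\begin{align*}
|B_n(u; p) - u(p)|
&\le \int \Big| u\Big(\frac{S_n}{n}\Big) - u(p) \Big|\, d\lambda \\
&= \int_{\{|S_n/n - p| < \delta\}} \Big| u\Big(\frac{S_n}{n}\Big) - u(p) \Big|\, d\lambda + \int_{\{|S_n/n - p| \ge \delta\}} \Big| u\Big(\frac{S_n}{n}\Big) - u(p) \Big|\, d\lambda \\
&\le \epsilon\, \lambda\Big(\Big\{\Big|\frac{S_n}{n} - p\Big| < \delta\Big\}\Big) + 2\, \|u\|_\infty\, \lambda\Big(\Big\{\Big|\frac{S_n}{n} - p\Big| \ge \delta\Big\}\Big) \\
&\le \epsilon + 2\, \|u\|_\infty\, \frac{1}{\delta^2} \int \Big| \frac{S_n}{n} - p \Big|^2 d\lambda \\
&\le \epsilon + \frac{\|u\|_\infty}{2\, n\, \delta^2},
\end{align*}
by Markov's inequality P10.12 (in the penultimate step) and (24.1) (in the last step). The above
inequality is independent of $p \in [0, 1]$, and the assertion follows by letting first
$n \to \infty$ and then $\epsilon \to 0$.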
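The Bernstein polynomials are easy to evaluate numerically. The following sketch is an illustration only: the test function $u(x) = |x - \frac12|$, the grid size and the function names are assumptions made here. It computes $B_n(u; p)$ from the defining sum and reports the maximal deviation from $u$ on a grid; it also checks the variance identity behind (24.1), namely $B_n((\bullet - p)^2; p) = p(1-p)/n$.

```python
import math

def bernstein(u, n, p):
    """Bernstein polynomial B_n(u; p) = sum_k u(k/n) C(n,k) p^k (1-p)^(n-k)."""
    return sum(u(k / n) * math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def sup_error(u, n, grid=201):
    """Maximum of |B_n(u; p) - u(p)| over an equispaced grid of p in [0, 1]."""
    return max(abs(bernstein(u, n, i / (grid - 1)) - u(i / (grid - 1))) for i in range(grid))

def u(x):
    return abs(x - 0.5)          # continuous, but not differentiable at 1/2

errors = {n: sup_error(u, n) for n in (10, 40, 160)}
variance = bernstein(lambda x: (x - 0.3) ** 2, 50, 0.3)   # should equal 0.3 * 0.7 / 50

print(errors)
print(variance, 0.3 * 0.7 / 50)
```

The reported errors shrink as $n$ grows, in line with the uniform bound $\epsilon + \|u\|_\infty / (2 n \delta^2)$ from the proof.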
24.7 Remark The key ingredient in the above proof is (24.1) which shows that
the variance of the random variable $S_n/n$ vanishes uniformly (in $p$) as $n \to \infty$.
A short calculation confirms that this is equivalent to saying that
\[ B_n\big((\bullet - p)^2; p\big) \xrightarrow{n\to\infty} 0 \quad \text{uniformly for } p \in [0, 1]. \]
With this information, the proof then yields that $B_n(u; p) \xrightarrow{n\to\infty} u$ uniformly in $p$
for all continuous $u$. This is, in fact, a special case of
Korovkin's theorem: A sequence of positive linear operators from $C[0, 1]$ to
$C[0, 1]$ converges uniformly for every $u \in C[0, 1]$ if, and only if, it converges
uniformly for each of the following three test functions: $1, x, x^2$.
(In the present case, the operators are $u \mapsto B_n(u; p)$.) More on this topic can be
found in Korovkin's monograph [24, pp. 1–30] or the expository paper [4] by
Bauer.
3 In view of the strong law of large numbers, Example 18.8, we observe that $n^{-1} S_n \xrightarrow{n\to\infty} p$ a.e., so that by
dominated convergence $B_n(u; p) \xrightarrow{n\to\infty} u(p)$ for each $p \in (0, 1)$. Since our argument includes this result as
a particular case, we leave it as a side-remark.
24.8 Corollary The monomials $(t^j)_{j\in\mathbb{N}_0}$ are complete in $L^1 = L^1([0, 1], dt)$, that
is $\int_{[0,1]} u(t)\, t^j\, dt = 0$ for all $j \in \mathbb{N}_0$ implies that $u = 0$ a.e.
Proof Assume first that $u \in C[0, 1]$ satisfies $\int_{[0,1]} u(t)\, t^j\, dt = 0$ for all $j \in \mathbb{N}_0$.
This implies, in particular, that
\[ \int_{[0,1]} u(t)\, p(t)\, dt = 0 \quad \text{for all polynomials } p(t). \]
Using Weierstraß' approximation theorem 24.6 we find a sequence of polynomials
$(p_k)_{k\in\mathbb{N}}$ which approximate $u$ uniformly on $[0, 1]$. Since $C[0, 1] \subset L^2[0, 1] \subset L^1[0, 1]$,
we see
\[ \int_{[0,1]} u^2\, dt = \int_{[0,1]} u \cdot (u - p_k)\, dt \le \|u\|_2 \cdot \|u - p_k\|_2 \le \|u\|_2 \cdot \|u - p_k\|_\infty \xrightarrow{k\to\infty} 0, \]
and conclude that $u = 0$ a.e. (even everywhere since $u$ is continuous).
Assume now that $u \in L^1([0, 1], dt) \setminus C[0, 1]$ such that $\int_{[0,1]} u(t)\, t^j\, dt = 0$ for
all $j \in \mathbb{N}_0$. The primitive
\[ U(x) := \int_{(x,1]} u(t)\, dt \]
is a continuous function, cf. Problem 11.7, and by Fubini's theorem 13.9 we see
for all $j \in \mathbb{N}$
\begin{align*}
\int_{[0,1]} U(x)\, x^{j-1}\, dx
&= \int_{[0,1]} \int_{[0,1]} 1_{(x,1]}(t)\, u(t)\, x^{j-1}\, dt\, dx \\
&= \int_{[0,1]} \Big( \int_{[0,1]} 1_{[0,t)}(x)\, x^{j-1}\, dx \Big)\, u(t)\, dt \\
&= \int_{[0,1]} \frac{t^j}{j}\, u(t)\, dt = 0.
\end{align*}
This means that $\int_{[0,1]} U(x)\, x^k\, dx = 0$ for all $k \in \mathbb{N}_0$ and, by the first part of the
proof, that $U \equiv 0$. Lebesgue's differentiation theorem 19.20 finally shows that
$u(x) = -U'(x) = 0$ a.e.
It is not hard to see that Theorem 24.6 and Corollary 24.8 also hold for the
interval $[-1, 1]$ and even for general compact intervals $[a, b]$ (cf. Problem 24.3).
This we can use to show that the Jacobi (hence, Legendre and Chebyshev)
polynomials are dense in $L^2(I, \rho(x)\, dx)$ and form a complete ONS. Note that
\[ \int_{[-1,1]} |u \rho|\, dx \le \Big( \int_{[-1,1]} \big(u \sqrt{\rho}\big)^2 dx \Big)^{1/2} \cdot \Big( \int_{[-1,1]} \big(\sqrt{\rho}\big)^2 dx \Big)^{1/2} = \Big( \int_{[-1,1]} u^2 \rho\, dx \Big)^{1/2} \cdot \Big( \int_{[-1,1]} \rho\, dx \Big)^{1/2} < \infty \]
implies that $u\rho \in L^1([-1, 1], dx)$, and from Corollary 24.8 and the fact that $\rho > 0$
we get
\[ \int_{[-1,1]} u(x)\, \rho(x)\, x^j\, dx = 0 \implies u \rho = 0 \ \text{a.e.} \implies u = 0 \ \text{a.e.} \]
This does not quite work for the Hermite and Laguerre polynomials, which are
defined on infinite intervals. For the latter we take $u \in L^2([0, \infty), e^{-x}\, dx)$, and
find for all $s \ge 1$
\[ \int_{[0,\infty)} u(x)\, e^{-sx}\, dx = \int_{[0,\infty)} u(x)\, e^{(1-s)x}\, e^{-x}\, dx = \sum_{k=0}^{\infty} \frac{(1-s)^k}{k!} \underbrace{\int_{[0,\infty)} u(x)\, x^k\, e^{-x}\, dx}_{=\,0} = 0 \]
(note that the integral and the sum can be interchanged by dominated convergence).
Using Jacobi's formula C15.8 to change coordinates according to $t = e^{-x}$,
$dt/dx = -e^{-x}$, we get
\[ 0 = \int_{[0,\infty)} u(x)\, e^{-sx}\, dx = \int_{(0,1]} u(-\ln t)\, t^{s-1}\, dt, \qquad s \ge 1, \]
and for $s \in \mathbb{N}$ the above equality reduces to the case covered by Corollary 24.8.
A very similar calculation can be used for the Hermite polynomials since
\[ \int_{\mathbb{R}} u(x)\, e^{-sx^2}\, dx = \int_{[0,\infty)} \big(u(x) + u(-x)\big)\, e^{-sx^2}\, dx = \int_{[0,\infty)} \Big( u\big(\sqrt{t}\big) + u\big(-\sqrt{t}\big) \Big)\, e^{-st}\, \frac{dt}{2\sqrt{t}}, \]
where we used the obvious substitution $x = \sqrt{t}$.
The trigonometric system and Fourier series
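We consider now $L^2 = L^2\big((-\pi, \pi), \mathcal{B}(-\pi, \pi), \lambda = \lambda^1|_{(-\pi,\pi)}\big)$. As before we use
$dx$ as a shorthand for $\lambda(dx)$. The trigonometric system consists of the functions
\[ \frac{1}{\sqrt{2\pi}},\ \frac{\cos x}{\sqrt{\pi}},\ \frac{\sin x}{\sqrt{\pi}},\ \frac{\cos 2x}{\sqrt{\pi}},\ \frac{\sin 2x}{\sqrt{\pi}},\ \ldots,\ \frac{\cos kx}{\sqrt{\pi}},\ \frac{\sin kx}{\sqrt{\pi}},\ \ldots \tag{24.2} \]
or, equivalently,
\[ \frac{1}{\sqrt{2\pi}}\, e^{ikx}, \qquad k \in \mathbb{Z},\quad i = \sqrt{-1}. \tag{24.3} \]
Since $e^{ix} = \cos x + i \sin x$, we can see that (24.2) and (24.3) are equivalent, and
from now on we will only consider (24.2). Orthogonality of the functions in
(24.2) follows easily from the classical result that
\[ \int_{(-\pi,\pi)} \cos kx\, \cos \ell x\, dx = \begin{cases} 0, & \text{if } k \ne \ell, \\ \pi, & \text{if } k = \ell \ge 1, \\ 2\pi, & \text{if } k = \ell = 0, \end{cases} \tag{24.4} \]
and its analogues for the products $\sin kx\, \sin \ell x$ and $\cos kx\, \sin \ell x$,
which we leave as an exercise for the reader, see Problem 24.4.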
24.9 Definition A trigonometric polynomial (of order $n$) is an expression of the
form
\[ T(x) = \alpha_0 + \sum_{j=1}^{n} \big( \alpha_j \cos jx + \beta_j \sin jx \big), \tag{24.5} \]
where $n \in \mathbb{N}_0$, $\alpha_j, \beta_j \in \mathbb{R}$ and $\alpha_n^2 + \beta_n^2 > 0$.
It is not hard to see that the representation (24.5) of $T(x)$ is equivalent to
\[ T(x) = \sum_{j,k=0}^{n} \gamma_{j,k} \cos^j x\, \sin^k x \]
with coefficients $\gamma_{j,k} \in \mathbb{R}$, cf. Problem 24.5. It is this way of writing $T(x)$ that
justifies the name trigonometric polynomial.
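24.10 Theorem The trigonometric system (24.2) is a complete ONS in $L^2 = L^2((-\pi, \pi), dx)$.
Proof We have to show that
\[ \left. \begin{aligned} \int_{(-\pi,\pi)} u(x) \cos kx\, dx &= 0 \quad \forall\, k \in \mathbb{N}_0 \\ \int_{(-\pi,\pi)} u(x) \sin \ell x\, dx &= 0 \quad \forall\, \ell \in \mathbb{N} \end{aligned} \right\} \implies u = 0 \ \text{a.e.} \tag{24.6} \]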
Assume first that $u$ is continuous and that, contrary to (24.6), $u(x_0) = c \ne 0$ for
some $x_0 \in (-\pi, \pi)$. Without loss of generality we may assume that $c > 0$. Since
the trigonometric functions are $2\pi$-periodic, we can extend $u$ periodically onto
the whole real line. Then $w(x) := c^{-1} u(x + x_0)$ is continuous around $x = 0$,
orthogonal on $[-\pi, \pi)$ to any of the functions in (24.2),[✓] and satisfies $w(0) = 1$.
As $w$ is continuous, there is some $0 < \delta < \pi$ such that
\[ w(x) > \tfrac12 \qquad \forall\, x \in (-\delta, \delta). \]
Consider the trigonometric polynomial
\[ t(x) = 1 - \cos \delta + \cos x. \]
[Figure: graph of $t(x) = 1 - \cos\delta + \cos x$ on $[-\pi, \pi]$, with marks at $\pm\delta$ and $\pm\pi$ and the levels $1$ and $2 - \cos\delta$.]
Obviously, $t(x)$ and all powers $t^N(x)$ are polynomials in $\cos x$. From
de Moivre's formula
\[ e^{ikx} = (\cos x + i \sin x)^k \]
it is easy to see that $\cos^k x = \sum_{j=0}^{k} c_j \cos jx$,[✓] see also Gradshteyn and Ryzhik
[17, 1.32]. We can thus write $t^N(x)$ as linear combination of $\cos kx$, $k = 0, 1, \ldots, N$.
By assumption, $w$ is orthogonal to all of them, and so
\[ 0 = \int_{(-\pi,\pi)} w(x)\, t^N(x)\, dx = \Big( \int_{(-\pi,-\delta]} + \int_{(-\delta,\delta)} + \int_{[\delta,\pi)} \Big)\, w(x)\, t^N(x)\, dx. \tag{24.7} \]
On $(-\delta, \delta)$ we have $w(x) > \frac12$ as well as $t(x) > 1$, hence
\[ \int_{(-\delta,\delta)} w(x)\, t^N(x)\, dx \ge \frac12 \int_{(-\delta,\delta)} t^N(x)\, dx \xrightarrow{N\to\infty} \infty \]
by monotone convergence T9.6. On the other hand, $|t(x)| \le 1$ for $x \in (-\pi, -\delta] \cup [\delta, \pi)$ and
\[ \Big| \int_{[\delta,\pi)} w(x)\, t^N(x)\, dx \Big| \le (\pi - \delta)\, \|w\|_\infty < \infty \qquad \forall\, N \in \mathbb{N}, \]
with the same bound for the integral over $(-\pi, -\delta]$,
which means that (24.7) is impossible, i.e. $w \equiv 0$ and $u \equiv 0$.
An arbitrary function $u \in L^2((-\pi, \pi), dx)$ is, due to the finiteness of the
measure, integrable[✓], and we may consider the primitive
\[ U(x) := \int_{(-\pi,x)} u(t)\, dt, \]
which is a continuous function, cf. Problem 11.7. Moreover,
\[ U(-\pi) = 0 = \int_{(-\pi,\pi)} u(t)\, dt = U(\pi), \]
because of the assumption that $u$ is orthogonal to every function from (24.2) and,
in particular, to $t \mapsto 1/\sqrt{2\pi}$. By Fubini's theorem 13.9 we get
\begin{align*}
\int_{(-\pi,\pi)} U(x) \cos kx\, dx
&= \int_{(-\pi,\pi)} \int_{(-\pi,\pi)} 1_{(-\pi,x)}(t)\, u(t) \cos kx\, dt\, dx \\
&= \int_{(-\pi,\pi)} \Big( \int_{(-\pi,\pi)} 1_{(t,\pi)}(x) \cos kx\, dx \Big)\, u(t)\, dt \\
&= -\int_{(-\pi,\pi)} u(t)\, \frac{\sin kt}{k}\, dt = 0,
\end{align*}
and we conclude from the first part of the proof that $U \equiv 0$. Lebesgue's differentiation
theorem 19.20 finally shows that $u(x) = U'(x) = 0$ a.e.
Since the trigonometric system is one of the most important ONSs, we provide
a further proof of the completeness theorem which gives some more insight into
Fourier series and yields even an independent proof of Weierstraß’ approximation
theorem 24.6 for trigonometric polynomials, cf. Corollary 24.12 below.
We begin with an elementary but fundamental consideration which goes back
to Fejér. If $u \in L^2((-\pi, \pi), dt)$, we write
\[ a_j := \frac{1}{\pi} \int_{(-\pi,\pi)} u(t) \cos jt\, dt, \qquad b_k := \frac{1}{\pi} \int_{(-\pi,\pi)} u(t) \sin kt\, dt, \tag{24.8} \]
($j \in \mathbb{N}_0$, $k \in \mathbb{N}$) for the Fourier cosine and sine coefficients of $u$ and set
\begin{align*}
s_N(u; x) &:= \frac{a_0}{2} + \sum_{j=1}^{N} \big( a_j \cos jx + b_j \sin jx \big) \\
&= \frac{1}{\pi} \int_{(-\pi,\pi)} \Big( \frac12 + \sum_{j=1}^{N} \big( \cos jt \cos jx + \sin jt \sin jx \big) \Big)\, u(t)\, dt \\
&= \frac{1}{\pi} \int_{(-\pi,\pi)} u(t)\, \underbrace{\Big( \frac12 + \sum_{j=1}^{N} \cos j(t - x) \Big)}_{=:\ D_N(t-x)}\, dt, \tag{24.9}
\end{align*}
where we used the trigonometric formula
\[ \cos a \cos b + \sin a \sin b = \cos(a - b). \tag{24.10} \]
The function $D_N(\bullet)$ is called the Dirichlet kernel. In Problem 24.6 we will see
that $D_N(\bullet)$ has the following closed-form expression:
\[ D_N(x) = \frac{\sin\big(N + \frac12\big) x}{2 \sin \frac{x}{2}}, \tag{24.11} \]
but we do not need this formula in the sequel.
Now we introduce the Cesàro C-1 mean
\[ \sigma_N(u; x) := \frac{1}{N+1} \big( s_0(u; x) + s_1(u; x) + \cdots + s_N(u; x) \big), \tag{24.12} \]
and in view of (24.9) we want to compute what is known as the Fejér kernel
\[ K_N(x) := \frac{1}{N+1} \big( D_0(x) + D_1(x) + \cdots + D_N(x) \big). \]
Using again (24.10) and observing that the cosine is even, we find for every
$k = 0, 1, \ldots, N$
\begin{align*}
(1 - \cos x)\, D_k(x) &= \frac12 (1 - \cos x) \sum_{j=-k}^{k} \cos jx \\
&= \frac12 \sum_{j=-k}^{k} \big( \cos jx - \cos(j-1)x + \sin x \sin jx \big) \\
&= \frac12 \big( \cos kx - \cos(k+1)x \big),
\end{align*}
where the second equality uses (24.10) and the last one holds because the first two terms telescope and
$\sin jx = -\sin(-jx)$ is an odd function of $j$ which cancels if we sum over
$-k \le j \le k$. Summing over all values of $k = 0, 1, \ldots, N$ shows
\[ K_N(x) = \frac{1}{N+1} \big( D_0(x) + D_1(x) + \cdots + D_N(x) \big) = \frac{1}{2(N+1)}\, \frac{1 - \cos(N+1)x}{1 - \cos x}. \tag{24.13} \]
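Both closed forms can be sanity-checked numerically. The sketch below (function names are illustrative) evaluates $D_N$ and $K_N$ once from their defining sums and once from (24.11) and (24.13), at a point $x$ that is not a multiple of $2\pi$.

```python
import math

def dirichlet(N, x):
    """D_N(x) = 1/2 + sum_{j=1}^N cos(jx), as in (24.9)."""
    return 0.5 + sum(math.cos(j * x) for j in range(1, N + 1))

def fejer(N, x):
    """K_N(x) as the arithmetic mean of D_0, ..., D_N."""
    return sum(dirichlet(k, x) for k in range(N + 1)) / (N + 1)

def dirichlet_closed(N, x):
    """Closed form (24.11); requires sin(x/2) != 0."""
    return math.sin((N + 0.5) * x) / (2 * math.sin(x / 2))

def fejer_closed(N, x):
    """Closed form (24.13); requires cos(x) != 1."""
    return (1 - math.cos((N + 1) * x)) / (2 * (N + 1) * (1 - math.cos(x)))

x, N = 0.9, 7
print(dirichlet(N, x), dirichlet_closed(N, x))
print(fejer(N, x), fejer_closed(N, x))
```

Unlike the Dirichlet kernel, the closed form (24.13) makes it evident that $K_N \ge 0$, which is what the proof of Lemma 24.11 exploits.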
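24.11 Lemma (Fejér) If $u \in C[-\pi, \pi]$, then $\lim_{N\to\infty} \|\sigma_N(u) - u\|_p = 0$ for all
$1 \le p \le \infty$.
Proof From (24.9), (24.12) and (24.13) we get after a change of variables in the
integrals
\[ \sigma_N(u; x) = \frac{1}{\pi} \int_{(-\pi,\pi)} u(t)\, K_N(x - t)\, dt = \frac{1}{2\pi(N+1)} \int_{(-\pi,\pi)} u(x - t)\, \frac{1 - \cos(N+1)t}{1 - \cos t}\, dt. \]
Since $\frac{1}{\pi} \int_{(-\pi,\pi)} K_N(t)\, dt = 1$[✓], we see for all $\epsilon > 0$ and sufficiently small $\delta > 0$
\begin{align*}
\|\sigma_N(u) - u\|_p
&= \Big\| \frac{1}{2\pi(N+1)} \int_{(-\pi,\pi)} \frac{1 - \cos(N+1)t}{1 - \cos t}\, \big( u(\bullet - t) - u \big)\, dt \Big\|_p \\
&\le \frac{1}{2\pi(N+1)} \int_{(-\pi,\pi)} \frac{1 - \cos(N+1)t}{1 - \cos t}\, \big\| u(\bullet - t) - u \big\|_p\, dt \\
&\le \frac{1}{2\pi(N+1)} \int_{(-\delta,\delta)} \frac{1 - \cos(N+1)t}{1 - \cos t}\, \big\| u(\bullet - t) - u \big\|_p\, dt \\
&\qquad + \frac{\|u\|_p}{\pi(N+1)} \int_{(-\pi,-\delta] \cup [\delta,\pi)} \frac{1 - \cos(N+1)t}{1 - \cos t}\, dt \\
&\le \epsilon + \frac{\|u\|_p}{\pi(N+1)} \cdot \frac{4\pi}{1 - \cos \delta},
\end{align*}
where we used Jensen's inequality 12.14 (in the first estimate) and the fact that $\lim_{t\to 0} \|u(\bullet - t) - u\|_p = 0$ by
dominated convergence ($p < \infty$), resp. uniform continuity ($p = \infty$). Letting first
$N \to \infty$ and then $\epsilon \to 0$ finishes the proof.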
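To see the lemma at work, take $u(t) = |t|$ on $(-\pi, \pi)$, whose coefficients (24.8) are $a_0 = \pi$, $a_j = 2((-1)^j - 1)/(\pi j^2)$ and $b_j = 0$; this choice of $u$ and all names in the sketch are assumptions made for the illustration. Averaging the partial sums as in (24.12) amounts to damping the $j$th coefficient by the factor $1 - j/(N+1)$.

```python
import math

def a(j):
    """Fourier cosine coefficient (24.8) of u(t) = |t| on (-pi, pi); the sine coefficients vanish."""
    return math.pi if j == 0 else 2 * ((-1) ** j - 1) / (math.pi * j * j)

def partial_sum(N, x):
    """s_N(u; x) for u(t) = |t|."""
    return a(0) / 2 + sum(a(j) * math.cos(j * x) for j in range(1, N + 1))

def cesaro_mean(N, x):
    """sigma_N = (s_0 + ... + s_N)/(N+1), written with the weights 1 - j/(N+1)."""
    return a(0) / 2 + sum((1 - j / (N + 1)) * a(j) * math.cos(j * x) for j in range(1, N + 1))

def sup_error(N, points=400):
    """Maximum of |sigma_N(u; x) - |x|| over an equispaced grid in [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * i / points for i in range(points + 1)]
    return max(abs(cesaro_mean(N, x) - abs(x)) for x in xs)

for N in (8, 64, 512):
    print(N, sup_error(N))
```

The printed maximal errors decrease with $N$, illustrating the case $p = \infty$ of the lemma for this particular $u$.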
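24.12 Corollary (Weierstraß) The trigonometric polynomials are dense in
$C[-\pi, \pi]$ under $\|\bullet\|_\infty$ and dense in $L^p((-\pi, \pi), dt)$ w.r.t. $\|\bullet\|_p$, $1 \le p < \infty$.
Proof From (24.9), (24.12) it is obvious that $\sigma_N(u; \bullet)$ is a trigonometric polynomial.
The density of the trigonometric polynomials in $C[-\pi, \pi]$ is just Lemma
24.11. Since $C[-\pi, \pi]$ is dense in $L^p((-\pi, \pi), dt)$, cf. Theorem 15.17, we can
find for every $\epsilon > 0$ and $u \in L^p(-\pi, \pi)$ some $g_\epsilon \in C[-\pi, \pi]$ with $\|u - g_\epsilon\|_p \le \epsilon$
and a trigonometric polynomial $t_\epsilon$ such that $\|g_\epsilon - t_\epsilon\|_\infty \le (2\pi)^{-1/p}\, \epsilon$. This shows
\[ \|u - t_\epsilon\|_p \le \|u - g_\epsilon\|_p + \|g_\epsilon - t_\epsilon\|_p \le \epsilon + (2\pi)^{1/p}\, \|g_\epsilon - t_\epsilon\|_\infty \le 2\epsilon. \]
For the last estimate we also used that $\|w\|_p \le (2\pi)^{1/p}\, \|w\|_\infty$.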
24.13 Corollary The trigonometric system (24.2) is a complete ONS in $L^2 = L^2((-\pi, \pi), dt)$.
Proof (of C24.13 and, again, of T24.10) Let $u \in L^2(-\pi, \pi)$ and pick a trigonometric
polynomial $t_\epsilon$ such that $\|u - t_\epsilon\|_2 \le \epsilon$, cf. Corollary 24.12. Let $n = \operatorname{degree}(t_\epsilon)$.
As in the proof of Theorem 24.10 we use de Moivre's formula to see
that $\cos^k x$ and $\sin^k x$ can be represented as linear combinations of $1, \cos x, \ldots, \cos kx$
and $\sin x, \ldots, \sin kx$.[✓]
Recall that the partial sum $s_n(u; x) = a_0/2 + \sum_{j=1}^{n} (a_j \cos jx + b_j \sin jx)$ is the
projection of $u$ onto $\operatorname{span}\{1, \cos x, \sin x, \ldots, \cos nx, \sin nx\}$. Therefore, Theorem
21.11(i) applies and
\[ \|u - s_n(u)\|_2 \le \|u - t_\epsilon\|_2 \le \epsilon \]
proves completeness.
The above proof of the completeness of the trigonometric system has a further
advantage as it allows a glimpse into other modes of convergence of Fourier
series. We have
24.14 Corollary (M. Riesz' theorem) Let $u \in L^p((-\pi, \pi), dt)$ and $1 \le p < \infty$. Then
\[ \lim_{n\to\infty} \|u - s_n(u)\|_p = 0 \iff \|s_n(u)\|_p \le C_p\, \|u\|_p \quad \forall\, n \in \mathbb{N} \tag{24.14} \]
with an absolute constant $C_p$ not depending on $u$ or $n \in \mathbb{N}$.
Proof The 'only if' part is a consequence of the uniform boundedness principle
(Banach–Steinhaus theorem) from functional analysis, see e.g. Rudin [40, §5.8]
or Problem 21.10.
The 'if' part follows from the observation that every trigonometric polynomial $T$
of degree $\le n$ satisfies $s_n(T) = T$.[✓] Choosing for $u \in L^p$ the polynomial $T = t_\epsilon$
with $\|u - t_\epsilon\|_p \le \epsilon$, cf. Corollary 24.12, we infer that for sufficiently large $n > \operatorname{degree}(t_\epsilon)$
\[ \|u - s_n(u)\|_p \le \|u - t_\epsilon\|_p + \underbrace{\|t_\epsilon - s_n(t_\epsilon)\|_p}_{=\,0} + \|s_n(t_\epsilon) - s_n(u)\|_p \le (1 + C_p)\, \|u - t_\epsilon\|_p. \]
Establishing the estimate $\|s_n(u)\|_p \le C_p \|u\|_p$ is an altogether different matter
and so is the whole $L^p$- and pointwise convergence theory for Fourier series.
Here we want to mention only a few facts:
• $L^p$-convergence ($1 \le p < \infty$) of the Cesàro means $\sigma_n(u)$ follows immediately
from Lemma 24.11.
This is in stark contrast to…
• $L^p$-convergence ($1 \le p < \infty$, $p \ne 2$) of the partial sums $s_n(u)$ requires the
estimate (24.14); see Corollary 24.14 and, for more details, Wheeden and
Zygmund [53, §12.88].
• Pointwise a.e. convergence of the partial sums $s_n(u) \xrightarrow{n\to\infty} u$ when $u \in L^2$
or $u \in L^p$, $1 < p < \infty$, which had been an open problem until 1966. A.N.
Kolmogorov constructed in 1922/23 a function $u \in L^1$ whose Fourier series
diverges a.e. In his famous 1966 paper L. Carleson proved that a.e. convergence
holds for $u \in L^2$, and R.A. Hunt extended this result in 1968 to $u \in L^p$,
$1 < p < \infty$.
All these deep results depend on estimates of the type (24.14) and, more importantly,
on estimates for $\max_{0 \le j \le n} s_j(u)$ which resemble the maximal martingale
estimates which we have encountered in Chapter 19, e.g. T19.12. But there is a
catch.
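24.15 Lemma The subspace $\Sigma_n = \operatorname{span}\{1, \cos x, \sin x, \ldots, \cos nx, \sin nx\}$ of
$L^2((-\pi, \pi), dx)$ is not of the form $L^2(\mathcal{A}_n)$ where $\mathcal{A}_n$ is a sub-$\sigma$-algebra of
the Borel sets $\mathcal{B}(-\pi, \pi)$.
Proof The space $L^2(\mathcal{A}_n)$ is a lattice, i.e. if $f \in L^2(\mathcal{A}_n)$, then $|f| \in L^2(\mathcal{A}_n)$. Take
$f(x) = \sin x$. Unfortunately,
\[ |\sin x| = \frac{2}{\pi} - \frac{4}{\pi} \Big( \frac{\cos 2x}{1 \cdot 3} + \frac{\cos 4x}{3 \cdot 5} + \frac{\cos 6x}{5 \cdot 7} + \cdots \Big), \]
so that $\sin(\bullet) \in \Sigma_n$ but $|\sin(\bullet)| \notin \Sigma_n$. (You might also want to have a look at
Theorem 22.5 for a more systematic treatment.)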
This means that martingale methods are not (immediately) applicable to Fourier
series.
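The expansion of $|\sin x|$ used in the proof of Lemma 24.15 can be checked numerically. In the sketch below (names, step count and sample points are assumptions of this illustration) one function sums the series and another approximates the cosine coefficient $\frac{1}{\pi}\int_{(-\pi,\pi)} |\sin t| \cos kt\, dt$ by the midpoint rule; for even $k = 2m$ this should be close to $-4/(\pi(2m-1)(2m+1))$, which never vanishes.

```python
import math

def abs_sin_partial(M, x):
    """Partial sum 2/pi - (4/pi) * sum_{m=1}^M cos(2mx)/((2m-1)(2m+1))."""
    return 2 / math.pi - (4 / math.pi) * sum(
        math.cos(2 * m * x) / ((2 * m - 1) * (2 * m + 1)) for m in range(1, M + 1)
    )

def cos_coefficient(k, steps=20000):
    """Midpoint-rule approximation of (1/pi) * int_{-pi}^{pi} |sin t| cos(kt) dt."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = -math.pi + (i + 0.5) * h
        total += abs(math.sin(t)) * math.cos(k * t)
    return total * h / math.pi

print(abs_sin_partial(200, 1.0), abs(math.sin(1.0)))
print(cos_coefficient(6), -4 / (math.pi * 5 * 7))
```

Since infinitely many coefficients are non-zero, no finite span $\Sigma_n$ can contain $|\sin(\bullet)|$.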
The Haar system
In contrast to Fourier series, the Haar system allows a complete martingale
treatment. Throughout this section we consider $L^2 = L^2([0, 1), \mathcal{B}[0, 1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$.
24.16 Definition The Haar system consists of the functions
\[ \left. \begin{aligned} \chi_{0,0}(x) &:= 1_{[0,1)}(x), \\ \chi_{j,k}(x) &:= 2^{k/2} \Big( 1_{\left[\frac{2j-2}{2^{k+1}}, \frac{2j-1}{2^{k+1}}\right)}(x) - 1_{\left[\frac{2j-1}{2^{k+1}}, \frac{2j}{2^{k+1}}\right)}(x) \Big), \qquad 1 \le j \le 2^k,\ k \in \mathbb{N}_0. \end{aligned} \right\} \tag{24.15} \]
Obviously, each Haar function is normalized to give $\|\chi_{j,k}\|_2 = 1$. The first few
Haar functions are
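[Figure: graphs of the first eight Haar functions $\chi_{0,0}$, $\chi_{1,0}$, $\chi_{1,1}$, $\chi_{2,1}$, $\chi_{1,2}$, $\chi_{2,2}$, $\chi_{3,2}$, $\chi_{4,2}$ on $[0, 1)$; horizontal ticks at $\frac14, \frac12, \frac34, 1$, vertical levels $1$, $\sqrt{2}$, $2$.]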
It is often more convenient to arrange the double sequence (24.15) in lexicographical
order: $\chi_{0,0}$; $\chi_{1,0}$; $\chi_{1,1}, \chi_{2,1}$; $\chi_{1,2}, \chi_{2,2}, \chi_{3,2}, \chi_{4,2}$; … and to relabel them in the
following way
\[ H_0 := \chi_{0,0}, \qquad H_n = H_{2^k + \ell} := \chi_{\ell+1,k}, \quad 0 \le \ell \le 2^k - 1, \tag{24.16} \]
(note that the representation $n = 2^k + \ell$, $0 \le \ell \le 2^k - 1$ is unique). We can now
associate with the sequence $(H_n)_{n\in\mathbb{N}_0}$ a canonical filtration
\[ \mathcal{A}_n^H := \sigma(H_0, H_1, \ldots, H_n), \qquad n \in \mathbb{N}_0, \]
which is the smallest $\sigma$-algebra that makes all functions $H_0, \ldots, H_n$ measurable,
cf. Definition 7.5.
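The definition (24.15) and the relabelling (24.16) translate directly into code. The sketch below uses illustrative function names and relies on the observation that $H_n$, $n < 2^m$, is constant on dyadic cells of length $2^{-m}$, so that sampling at cell midpoints computes the $L^2$ inner products exactly up to rounding. It evaluates the Haar functions and their Gram matrix.

```python
def chi(j, k, x):
    """Haar function chi_{j,k} from (24.15), for 1 <= j <= 2**k and x in [0, 1)."""
    left = (2 * j - 2) / 2 ** (k + 1)
    mid = (2 * j - 1) / 2 ** (k + 1)
    right = (2 * j) / 2 ** (k + 1)
    if left <= x < mid:
        return 2 ** (k / 2)
    if mid <= x < right:
        return -(2 ** (k / 2))
    return 0.0

def H(n, x):
    """Relabelled Haar function from (24.16): H_0 = chi_{0,0}, H_{2^k + l} = chi_{l+1, k}."""
    if n == 0:
        return 1.0 if 0 <= x < 1 else 0.0
    k = n.bit_length() - 1            # the unique k with 2^k <= n < 2^(k+1)
    l = n - 2 ** k
    return chi(l + 1, k, x)

def inner(m, n, level=6):
    """L^2 inner product of H_m and H_n for m, n < 2**level, via midpoints of dyadic cells."""
    cells = 2 ** level
    return sum(H(m, (i + 0.5) / cells) * H(n, (i + 0.5) / cells) for i in range(cells)) / cells

gram = [[inner(m, n) for n in range(8)] for m in range(8)]
for row in gram:
    print([round(v, 12) for v in row])
```

The printed Gram matrix of $H_0, \ldots, H_7$ is the identity, in agreement with the orthonormality shown in Step 1 of the proof of Theorem 24.17.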
24.17 Theorem The Haar functions are a complete ONS in $L^2([0, 1), dx)$.
Moreover,
\[ M_N := \sum_{n=0}^{N} a_n H_n, \qquad N \in \mathbb{N}_0,\ a_n \in \mathbb{R}, \]
is a martingale w.r.t. the filtration $(\mathcal{A}_N^H)_{N\in\mathbb{N}_0}$, and for every $u \in L^p([0, 1), dx)$,
$1 \le p < \infty$, the Haar–Fourier series
\[ s_N(u; x) := \sum_{n=0}^{N} \langle u, H_n \rangle H_n \]
converges to $u$ in $L^p$ and almost everywhere, and the maximal inequality
\[ \Big\| \sup_{n\in\mathbb{N}} s_n(u) \Big\|_p \le \frac{p}{p-1}\, \|u\|_p \]
holds for all $u \in L^p$ and $1 < p < \infty$.
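Proof Step 1. Orthonormality: That $\|\chi_{j,k}\|_2 = 1$ is obvious. If the functions
$\chi_{j,k} \ne \chi_{\ell,m}$ satisfy $\{\chi_{j,k} \ne 0\} \cap \{\chi_{\ell,m} \ne 0\} = \emptyset$, it is clear that $\int \chi_{j,k}\, \chi_{\ell,m}\, d\lambda = 0$.
Otherwise, we can assume that $k < m$, so that
\[ \text{either}\quad \{\chi_{\ell,m} \ne 0\} \subset \{\chi_{j,k} = +2^{k/2}\} \quad\text{or}\quad \{\chi_{\ell,m} \ne 0\} \subset \{\chi_{j,k} = -2^{k/2}\} \]
obtains. In either case,
\[ \int \chi_{\ell,m}\, \chi_{j,k}\, d\lambda = \pm 2^{k/2} \int \chi_{\ell,m}\, d\lambda = 0. \]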
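Step 2. Martingale property: Let $n = 2^k + \ell$. Then, for all $n \in \mathbb{N}$,
\begin{align*}
\mathcal{A}_n^H &= \sigma\big( \chi_{0,0}, \chi_{1,0}, \chi_{1,1}, \ldots, \chi_{2^{k-1},k-1}, \chi_{1,k}, \chi_{2,k}, \ldots, \chi_{\ell+1,k} \big) \\
&= \sigma\Big( \underbrace{\Big[0, \frac{1}{2^{k+1}}\Big), \ldots, \Big[\frac{2\ell+1}{2^{k+1}}, \frac{2\ell+2}{2^{k+1}}\Big)}_{=:\ \mathcal{I}_n},\ \underbrace{\Big[\frac{\ell+1}{2^k}, \frac{\ell+2}{2^k}\Big), \ldots, \Big[\frac{2^k-1}{2^k}, 1\Big)}_{=:\ \mathcal{J}_n} \Big),
\end{align*}
where we used that the dyadic intervals are nested and refine. Assume, for
simplicity, that $\ell < 2^k - 1$. Then $\{H_{n+1} \ne 0\} \in \mathcal{J}_n$, and so
\[ \int_J H_{n+1}(x)\, dx = 0 \qquad \forall\, J \in \mathcal{I}_n \ \text{or}\ J \in \mathcal{J}_n. \]
(If $\ell = 2^k - 1$ we get an analogous conclusion with a rollover as $\mathcal{A}_n^H$ is just the
dyadic $\sigma$-algebra generated by all disjoint half-open intervals of length $2^{-k-1}$ in
$[0, 1)$.) By Theorem 23.5 we have $E^{\mathcal{A}_n^H} H_{n+1} = 0$, and by Theorem 23.8
\[ E^{\mathcal{A}_N^H} M_{N+1} = E^{\mathcal{A}_N^H} (M_N + a_{N+1} H_{N+1}) = M_N + a_{N+1}\, E^{\mathcal{A}_N^H} H_{N+1} = M_N. \]
This shows that $\big(M_N, \mathcal{A}_N^H\big)_{N\in\mathbb{N}}$ is indeed a martingale, cf. Corollary 23.14.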
Step 3. Convergence in $L^1$ and a.e. if $u \in L^1 \cap L^\infty$: Set $a_k := \langle u, H_k \rangle$, so that
$M_N = s_N(u)$ becomes the Haar–Fourier partial sum. Using Bessel's inequality
(Theorem 21.11) we see
\[ \|s_N(u)\|_2^2 = \sum_{k=0}^{N} |\langle u, H_k \rangle|^2 \le \|u\|_2^2, \tag{24.17} \]
where the right-hand side is finite since $L^1 \cap L^\infty \subset L^2$,[✓] and from the Cauchy–Schwarz
C12.3 and Markov P10.12 inequalities we get for all $R > 0$
\[ \int_{\{|s_N(u)| > R\}} |s_N(u)|\, d\lambda \le \|s_N(u)\|_2\, \big( \lambda\{|s_N(u)| > R\} \big)^{1/2} \le \frac{1}{R}\, \|s_N(u)\|_2^2 \le \frac{1}{R}\, \|u\|_2^2. \]
Since the constant function $R$ is in $L^2([0, 1), dx)$, the martingale $(s_N(u))_{N\in\mathbb{N}}$
is uniformly integrable in the sense of Definition 16.1, and we conclude from
Theorems 18.6 and 23.15 that
\[ s_N(u) \xrightarrow{N\to\infty} u_\infty \quad \text{in } L^1 \text{ and almost everywhere.} \]
Since $(\mathcal{A}_n^H)_{n\in\mathbb{N}}$ contains the sequence $(\mathcal{A}_k^\Delta)_{k\in\mathbb{N}}$ of dyadic $\sigma$-algebras (we have
indeed $\mathcal{A}_n^\Delta = \mathcal{A}_{2^n-1}^H$), we know that $\mathcal{A}_\infty^H := \sigma(\mathcal{A}_n^H : n \in \mathbb{N}) = \mathcal{B}[0, 1)$. Just as in
Example 23.17 we see that
\[ E^{\mathcal{A}_{2^n-1}^H} u = E^{\mathcal{A}_n^\Delta} u = s_{2^n-1}(u), \]
and in view of Theorem 23.15 we conclude that $u = u_\infty$ a.e.
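The identity $s_{2^n-1}(u) = E^{\mathcal{A}_n^\Delta} u$ says that the Haar–Fourier partial sum of order $2^n - 1$ is the average of $u$ over the dyadic intervals of length $2^{-n}$. The sketch below tests this on a discretised $u$ that is constant on $2^6$ cells; the test function, grid size and function names are assumptions made for the illustration.

```python
import math

def haar(n, x):
    """H_n from (24.16), evaluated at x in [0, 1)."""
    if n == 0:
        return 1.0
    k = n.bit_length() - 1                   # n = 2^k + l with 0 <= l < 2^k
    j = n - 2 ** k + 1
    pos = x * 2 ** (k + 1) - (2 * j - 2)     # position inside the support, in half-cells
    if 0 <= pos < 1:
        return 2 ** (k / 2)
    if 1 <= pos < 2:
        return -(2 ** (k / 2))
    return 0.0

def haar_partial_sum(u_cells, N):
    """s_N(u) on the grid, for u constant on the cells [i/2^m, (i+1)/2^m); needs N < 2^m."""
    cells = len(u_cells)
    mids = [(i + 0.5) / cells for i in range(cells)]
    s = [0.0] * cells
    for n in range(N + 1):
        h = [haar(n, x) for x in mids]
        coeff = sum(a * b for a, b in zip(u_cells, h)) / cells   # <u, H_n>
        s = [a + coeff * b for a, b in zip(s, h)]
    return s

def dyadic_average(u_cells, n):
    """Averages of u over the dyadic intervals of length 2^-n, repeated on the fine grid."""
    block = len(u_cells) // 2 ** n
    out = []
    for b in range(2 ** n):
        avg = sum(u_cells[b * block:(b + 1) * block]) / block
        out.extend([avg] * block)
    return out

cells = 2 ** 6
u_cells = [math.sin(7 * (i + 0.5) / cells) + ((i + 0.5) / cells) ** 2 for i in range(cells)]
n = 3
gap = max(abs(a - b) for a, b in zip(haar_partial_sum(u_cells, 2 ** n - 1), dyadic_average(u_cells, n)))
print(gap)
```

The printed gap is at the level of rounding errors, and it stays there for every $n \le 6$ on this grid.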
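Step 4. Convergence in $L^p$ if $u \in L^1 \cap L^\infty$: Observe that $L^1 \cap L^\infty \subset L^p$ for all
$1 < p < \infty$.[✓] Applying the inequality
\begin{align*}
|a|^p - |b|^p \le \big| |a|^p - |b|^p \big| = \Big| p \int_{|a|}^{|b|} t^{p-1}\, dt \Big|
&\le p\, \big| |a| - |b| \big|\, \max\{|a|^{p-1}, |b|^{p-1}\} \\
&\le p\, |a - b|\, \max\{|a|^{p-1}, |b|^{p-1}\},
\end{align*}
$a, b \in \mathbb{R}$, $1 < p < \infty$, to the martingale $E^{\mathcal{A}_N^H} u = s_N(u)$, we get after integrating
over $[0, 1)$
\[ \pm \int \big( |s_N(u)|^p - |u|^p \big)\, d\lambda \le p\, \|s_N(u) - u\|_1\, \|u\|_\infty^{p-1}, \qquad p > 1, \]
where we also used that $|s_N(u)| = \big| E^{\mathcal{A}_N^H} u \big| \le \|u\|_\infty$ as $|u| \le \|u\|_\infty < \infty$, cf.
Theorem 23.8(ix). From Riesz' convergence theorem T12.10 we conclude that
$s_N(u) \xrightarrow{N\to\infty} u$ in $L^p$ for all $1 < p < \infty$ and all $u \in L^1 \cap L^\infty$.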
Step 5. Convergence in $L^p$ if $u \in L^p$: If $u \in L^p$, $1 \le p < \infty$, is not bounded,
we set $u_k := (-k) \vee u \wedge k$. Since we have a finite measure space, $u_k \in L^p \cap L^\infty \subset L^1 \cap L^\infty$,
and we see from the triangle inequality and Theorem 23.8(v),(ii)
\begin{align*}
\|s_N(u) - u\|_p &\le \|s_N(u) - s_N(u_k)\|_p + \|s_N(u_k) - u_k\|_p + \|u_k - u\|_p \\
&\le \|s_N(u_k) - u_k\|_p + 2\, \|u_k - u\|_p.
\end{align*}
The claim follows as $N \to \infty$ and then $k \to \infty$.
Step 6. A.e. convergence if $u \in L^p$: Since $s_N(u^\pm) = E^{\mathcal{A}_N^H}(u^\pm) \ge 0$, we know
from Corollary 23.16 that $\big( |s_N(u^\pm)|^p \big)_{N\in\mathbb{N}}$ are submartingales which satisfy, by
Theorem 23.8(ii),
\[ \int |s_N(u^\pm)|^p\, d\lambda \le \int \big( E^{\mathcal{A}_N^H}(u^\pm) \big)^p\, d\lambda = \big\| E^{\mathcal{A}_N^H}(u^\pm) \big\|_p^p \le \|u^\pm\|_p^p. \]
Therefore, the submartingale convergence theorem 18.2 applies and shows that
$\lim_{N\to\infty} |s_N(u^\pm; x)|^p$ exists a.e., hence, $\lim_{N\to\infty} s_N(u; x)$ exists a.e. Since step 5
and Corollary 12.8 already imply $\lim_{j\to\infty} s_{N_j}(u; x) = u(x)$ a.e. for some subsequence,
we can identify the limit and get $\lim_{N\to\infty} s_N(u; x) = u(x)$ a.e.
Step 7. Completeness follows from $\lim_{N\to\infty} \|s_N(u) - u\|_2 = 0$ and T21.13.
Step 8. The maximal inequality is just Doob's maximal $L^p$-inequality for
martingales T19.12 since $(s_n(u))_{n\in\mathbb{N}}$ is a uniformly integrable martingale which
is, by step 5 and Theorem 23.15, closed by $s_\infty(u) = u$.
24.18 Remark As a matter of fact, ordering the Haar functions in a sequence
like $(H_n)_{n\in\mathbb{N}_0}$ does play a rôle. If $p = 1$, we can find (after some elementary but
very tedious calculations) that
\[ \Big\| \chi_{0,0} + \chi_{1,0} + \sum_{k=1}^{2n} 2^{k/2}\, \chi_{1,k} \Big\|_1 \le \sqrt{2}, \]
while the lacunary series satisfies
\[ \Big\| \chi_{1,0} + \sum_{k=1}^{n} 2^{k}\, \chi_{1,2k} \Big\|_1 \ge c\, n \]
for some absolute constant $c > 0$. Therefore, we can rearrange $\sum_{n=0}^{\infty} a_n H_n$ in such
a way that it becomes a divergent series $\sum_{n=0}^{\infty} a_{\sigma(n)} H_{\sigma(n)}$ for some necessarily
infinite permutation $\sigma : \mathbb{N}_0 \to \mathbb{N}_0$.
This phenomenon does not happen if $1 < p < \infty$. In fact, $(H_n)_{n\in\mathbb{N}_0}$ is what
one calls an unconditional basis of $L^p$, $1 < p < \infty$, which means that every
rearrangement of the series $\sum_{n=0}^{\infty} a_n H_n$ converges in $L^p$ and leads to the same
limit. The Haar system is even the litmus test for the existence of unconditional
bases: every Banach space $B$ where $(H_n)_{n\in\mathbb{N}_0}$ is a basis has an unconditional
basis if, and only if, the basis $(H_n)_{n\in\mathbb{N}_0}$ is unconditional, cf. Olevskiĭ [32, p. 73,
Corollary] or Lindenstrauss and Tzafriri [27, vol. II, p. 161, Corollary 2.c.11].
Since the unconditionality of �Hn�n∈�0 rests on a martingale argument, we
include a sketch of its proof. First we need the following Burkholder–Davis–
Gundy inequalities for a martingale �uj �j∈�0 on a probability space �X� �� ��:
�p
∥∥∥ sup
0�j�N
�uj�
∥∥∥
p
�
∥∥∥
√
�u•� u• N
∥∥∥
p
� Kp
∥∥∥ sup
0�j�N
�uj�
∥∥∥
p
(BDG)
for all N ∈ �0, all 0 < p < � and some absolute constants Kp � �p > 0. The
expression �u•� u• N stands for the quadratic variation of the martingale
�u•� u• N �= �u0�2 +
N −1∑
j=0
�uj+1 − uj�2�
A proof of (BDG) can be found in Rogers and Williams [38, vol. 2, pp. 94–6].
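If we combine (BDG) with Doob’s maximal $L^p$-inequality 19.12 we get
$$\kappa_p \|u_N\|_p \le \Big\| \sqrt{\langle u_\bullet, u_\bullet\rangle_N} \Big\|_p \le \frac{p\,K_p}{p-1}\, \|u_N\|_p \tag{BDG$'$}$$
for all $N \in \mathbb{N}_0$ and $1 < p < \infty$ – mind the different range for $p$ in (BDG′) compared
to (BDG). Obviously,
$$u_N := \sum_{k=0}^{N} \langle u, H_k\rangle H_k \quad\text{and}\quad w_N := \sum_{k=0}^{N} \epsilon_k \langle u, H_k\rangle H_k,$$
$\epsilon_k \in \{-1, +1\}$, are uniformly integrable martingales (use the argument of the proof
of Theorem 24.17) and their quadratic variations $\langle u_\bullet, u_\bullet\rangle_N = \langle w_\bullet, w_\bullet\rangle_N$ coincide.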
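That sign changes do not affect the quadratic variation can be seen in a small numerical sketch. This is only an illustration under simplifying assumptions: the array `d` is arbitrary sample data standing in for the differences $\langle u, H_k\rangle H_k$ evaluated at one point, and the helper name `quadratic_variation` is hypothetical.

```python
import numpy as np

def quadratic_variation(path):
    """Return <u,u>_N = |u_0|^2 + sum_{j<N} |u_{j+1} - u_j|^2 for N = 0, ..., len(path)-1."""
    path = np.asarray(path, dtype=float)
    increments = np.diff(path)
    return path[0] ** 2 + np.concatenate(([0.0], np.cumsum(increments ** 2)))

rng = np.random.default_rng(0)          # fixed seed, purely illustrative data
d = rng.normal(size=12)                 # stand-in for the differences <u,H_k> H_k(x)
eps = rng.choice([-1.0, 1.0], size=12)  # an arbitrary sign sequence eps_k

u = np.cumsum(d)          # u_N = sum_{k<=N} d_k
w = np.cumsum(eps * d)    # w_N = sum_{k<=N} eps_k d_k

qv_u = quadratic_variation(u)
qv_w = quadratic_variation(w)
print(np.allclose(qv_u, qv_w))
```

Only the squares of the increments enter, so the two arrays agree for every choice of signs.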
Therefore, (BDG′) shows that the martingales $(u_N - u_n)_{N\ge n}$ and $(w_N - w_n)_{N\ge n}$
satisfy
$$\|u_N - u_n\|_p \sim \big\| \langle u_\bullet - u_n, u_\bullet - u_n\rangle_N^{1/2} \big\|_p = \big\| \langle w_\bullet - w_n, w_\bullet - w_n\rangle_N^{1/2} \big\|_p \sim \|w_N - w_n\|_p,$$
where $a \sim b$ means that $\kappa\, a \le b \le K a$ for some absolute constants $\kappa, K > 0$, so
that either both sequences converge or diverge. Let us assume that $(u_N)_{N\in\mathbb{N}_0}$
converges. Then every lacunary series
$$\sum_{j=1}^{\infty} \langle u, H_{k_j}\rangle H_{k_j} \quad\text{converges} \tag{24.18}$$
since we can produce its partial sums by adding and subtracting $u_N$ and $w_N$
with suitable ±1-sequences $(\epsilon_k)_{k\in\mathbb{N}}$. This entails that for every fixed permutation
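$\sigma : \mathbb{N}_0 \to \mathbb{N}_0$
$$\Big\| \sum_{k=n}^{N} \langle u, H_{\sigma(k)}\rangle H_{\sigma(k)} \Big\|_p \le \epsilon, \qquad N > n \text{ sufficiently large}.$$
Otherwise, we could find finite sets $\Lambda_0, \Lambda_1, \Lambda_2, \ldots \subset \mathbb{N}_0$ with $(k_j)_{j\in\mathbb{N}} = \bigcup_{n\in\mathbb{N}} \Lambda_n$
and
$$\Big\| \sum_{k\in\Lambda_n} \langle u, H_k\rangle H_k \Big\|_p > \epsilon \qquad \forall\, n \in \mathbb{N},$$
contradicting (24.18). For more on this topic we refer to Lindenstrauss and
Tzafriri [27].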
The Haar wavelet
Let us now consider a Haar system on the whole real line, i.e. in $L^2 = L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), dx)$. We begin with the remark that the functions $\chi_{0,0} = 1_{[0,1)}$ and
$\chi_{1,0} = 1_{[0,1/2)} - 1_{[1/2,1)}$ are the two basic Haar functions, since we can reconstruct
all Haar functions $\chi_{j,k}$ from them by scaling and shifting:
$$\chi_{j,k}(x) = 2^{k/2}\,\chi_{1,0}(2^k x - j + 1), \qquad k \in \mathbb{N}_0,\ j = 1, 2, \ldots, 2^k. \tag{24.19}$$
The advantage of (24.19) over the definition (24.15) is that (24.19) easily extends
to all pairs $(j, k) \in \mathbb{Z}^2$ and, thus, to a system of functions on $\mathbb{R}$.
24.19 Definition The Haar wavelets are the system $(\psi_{j,k})_{j,k\in\mathbb{Z}}$ where the mother
wavelet is $\psi(x) := 1_{[0,1/2)}(x) - 1_{[1/2,1)}(x)$ and
$$\psi_{j,k}(x) := 2^{k/2}\,\psi(2^k x - j) = 2^{k/2}\Big( 1_{\left[\frac{2j}{2^{k+1}},\, \frac{2j+1}{2^{k+1}}\right)}(x) - 1_{\left[\frac{2j+1}{2^{k+1}},\, \frac{2j+2}{2^{k+1}}\right)}(x) \Big)$$
for all $j, k \in \mathbb{Z}$.
Note that $\psi = \psi_{0,0} = \chi_{1,0}$, $\psi_{j-1,k} = \chi_{j,k}$ for all $j = 1, 2, \ldots, 2^k$ and $k \in \mathbb{N}_0$ while
$\psi_{0,-1}(x) = 2^{-1/2}\chi_{0,0}(x)$ for $0 \le x < 1$.
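A short numerical sketch may help to get a feeling for Definition 24.19. It evaluates $\psi_{j,k}(x) = 2^{k/2}\psi(2^k x - j)$ and approximates a few scalar products by a midpoint rule on a dyadic grid. The function names, the integration window $[-4, 4)$ and the sample index pairs are assumptions made only for this illustration.

```python
import numpy as np

def mother_wavelet(x):
    """psi = 1_[0,1/2) - 1_[1/2,1)."""
    x = np.asarray(x, dtype=float)
    left = ((0.0 <= x) & (x < 0.5)).astype(float)
    right = ((0.5 <= x) & (x < 1.0)).astype(float)
    return left - right

def haar_wavelet(j, k, x):
    """psi_{j,k}(x) = 2^{k/2} psi(2^k x - j) for integers j, k."""
    x = np.asarray(x, dtype=float)
    return 2.0 ** (k / 2) * mother_wavelet(2.0 ** k * x - j)

def inner_product(f, g, a=-4.0, b=4.0, n=2 ** 16):
    """Midpoint rule on [a, b); exact (up to rounding) for dyadic step functions."""
    h = (b - a) / n
    x = a + h * (np.arange(n) + 0.5)
    return float(np.sum(f(x) * g(x)) * h)

# a few (j, k) pairs, including a negative shift and a negative scale
pairs = [(0, 0), (1, 0), (0, 1), (-3, 2), (0, -2)]
gram = np.array([[inner_product(lambda x: haar_wavelet(j1, k1, x),
                                lambda x: haar_wavelet(j2, k2, x))
                  for (j2, k2) in pairs] for (j1, k1) in pairs])
print(np.round(gram, 6))
```

For these sample pairs the Gram matrix is the identity, as expected for an orthonormal system.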
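The Haar wavelets can be treated by martingale methods. To do so, we
introduce the two-sided dyadic filtration
$$\mathcal{A}^\Delta_{n+1} = \sigma\Big( \big[ \tfrac{j}{2^{n+1}}, \tfrac{j+1}{2^{n+1}} \big) : j \in \mathbb{Z} \Big) = \sigma\big( \psi_{j,n} : j \in \mathbb{Z} \big), \qquad n \in \mathbb{Z},$$
$$\mathcal{A}^\Delta_{-\infty} = \bigcap_{n\in\mathbb{Z}} \mathcal{A}^\Delta_n = \{\emptyset, \mathbb{R}\}, \qquad \mathcal{A}^\Delta_{\infty} = \sigma\Big( \bigcup_{n\in\mathbb{Z}} \mathcal{A}^\Delta_n \Big) = \mathcal{B}(\mathbb{R}). \tag{24.20}$$
The last assertion follows from the fact that $D = \{ j 2^{-n-1} : j \in \mathbb{Z},\ n \in \mathbb{Z} \}$ is a
dense subset of $\mathbb{R}$ and that $\mathcal{B}(\mathbb{R})$ is generated by all intervals of the form $[a, b)$
where $a, b \in D$ (or, indeed, any other dense subset).[✓]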
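In what follows we have to consider double summations. To keep notation
simple, we write
$$\sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} a_{j,k} \quad\text{as a shorthand for}\quad \sum_{k=-\infty}^{\infty} \Big[ \sum_{j=-\infty}^{\infty} a_{j,k} \Big]$$
and call $\sum_{k \ge \text{const.}} \sum_{j=-\infty}^{\infty}$ the right tail and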
∑
−�
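$$E^{\mathcal{A}^\Delta_{-M}} u_R = 2^{-M} \int_{[-R,0)} u(x)\,dx\; 1_{[-2^M,0)} + 2^{-M} \int_{[0,R)} u(x)\,dx\; 1_{[0,2^M)},$$
where we used that $E^{\mathcal{A}^\Delta_{-M}}$ projects onto the intervals $[j2^M, (j+1)2^M)$, and we
find from the Hölder inequality T12.2 with $p^{-1} + q^{-1} = 1$ that
$$\big| E^{\mathcal{A}^\Delta_{-M}} u_R(x) \big| \le 2^{-M} R^{1/q}\, \|u\|_p\, 1_{[-2^M, 2^M)}(x),$$
which implies
$$\big\| E^{\mathcal{A}^\Delta_{-M}} u_R \big\|_p \le 2^{-M} R^{1/q}\, \|u\|_p\, (2 \cdot 2^M)^{1/p} = c_R\, 2^{-M(1-1/p)}\, \|u\|_p.$$
Finally, by Theorem 23.8(v),(ii),
$$\big\| E^{\mathcal{A}^\Delta_{-M}} u \big\|_p \le \big\| E^{\mathcal{A}^\Delta_{-M}} (u - u_R) \big\|_p + \big\| E^{\mathcal{A}^\Delta_{-M}} u_R \big\|_p \le \|u - u_R\|_p + c_R\, 2^{-M(1-1/p)}\, \|u\|_p,$$
and we get $\lim_{M\to\infty} \| E^{\mathcal{A}^\Delta_{-M}} u \|_p = 0$ for all $u \in L^p$, $1 < p < \infty$, letting first $M \to \infty$
and then $R \to \infty$.
This shows that $u_{-M,N} \xrightarrow{M,N\to\infty} u$ in $L^p$, $1 < p < \infty$, and the proof of the
convergence of (24.21) in $L^p$, $1 < p < \infty$, is complete.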
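Step 6. Completeness of the Haar wavelets in $L^2$ follows if we apply (24.21)
in the case $p = 2$, cf. Theorem 21.13.
Step 7. A.e. convergence of the left tail of (24.21): Observe that
$$A := \big\{ \big| E^{\mathcal{A}^\Delta_{-M}} u \big| > \epsilon \ \text{for infinitely many } M \in \mathbb{N} \big\} = \bigcap_{M=1}^{\infty} \underbrace{\bigcup_{j=M}^{\infty} \big\{ \big| E^{\mathcal{A}^\Delta_{-j}} u \big| > \epsilon \big\}}_{\in\, \mathcal{A}^\Delta_{-M}} \in \mathcal{A}^\Delta_{-\infty}.$$
By the martingale maximal inequality, Lemma 19.11, for the reversed martingale
$\big( E^{\mathcal{A}^\Delta_{-j}} u \big)_{j\in\mathbb{N}}$ and Theorem 23.8(ii) we see
$$\lambda(A) \le \lambda\Big( \bigcup_{j=M}^{\infty} \big\{ \big| E^{\mathcal{A}^\Delta_{-j}} u \big| > \epsilon \big\} \Big) \le \lambda\Big( \Big\{ \sup_{j\in\mathbb{N}} \big| E^{\mathcal{A}^\Delta_{-j}} u \big| > \epsilon \Big\} \Big) \le \frac{1}{\epsilon^p} \big\| E^{\mathcal{A}^\Delta_{-1}} u \big\|_p^p \le \frac{1}{\epsilon^p} \|u\|_p^p.$$
This shows that $\lambda(A) < \infty$. Since $\mathcal{A}^\Delta_{-\infty} = \{\emptyset, \mathbb{R}\}$ is the trivial $\sigma$-algebra, we
conclude that $\lambda(A) = 0$ or $A = \emptyset$. Therefore, $E^{\mathcal{A}^\Delta_{-M}} u \xrightarrow{M\to\infty} 0$ almost everywhere
and so $u_{N,-M} \xrightarrow{M,N\to\infty} u$ almost everywhere.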
Step 8: The maximal inequality (24.22): From step 2 we know that
$$\Big\| \sup_{N,M\in\mathbb{N}} u_{N,-M} \Big\|_p = \Big\| \sup_{N,M\in\mathbb{N}} \sum_{k=-M}^{N} \sum_{j=-\infty}^{\infty} \langle u, \psi_{j,k}\rangle \psi_{j,k} \Big\|_p = \Big\| \sup_{N\in\mathbb{N}} E^{\mathcal{A}^\Delta_{N+1}} u - \inf_{M\in\mathbb{N}} E^{\mathcal{A}^\Delta_{-M}} u \Big\|_p \le \frac{p}{p-1}\, \|u\|_p + \big\| E^{\mathcal{A}^\Delta_{-1}} u \big\|_p.$$
The last estimate follows from a combination of Minkowski’s inequality, Doob’s
maximal $L^p$-inequality for martingales T19.12 applied to the closed (by $u$) martingale
$\big( E^{\mathcal{A}^\Delta_{N}} u \big)_{N\in\mathbb{N}\cup\{\infty\}}$, cf. step 3 and Theorem 23.15, and the fact that
$\big( \big| E^{\mathcal{A}^\Delta_{-M}} u \big|^p \big)_{M\in\mathbb{N}}$
is a reversed submartingale, cf. Example 17.3(vi) or Corollary 23.16, which
entails $\| E^{\mathcal{A}^\Delta_{-M}} u \|_p \le \| E^{\mathcal{A}^\Delta_{-1}} u \|_p$. Since by T23.8(ii) conditional expectations are
contractions on $L^p$, we have $\| E^{\mathcal{A}^\Delta_{-1}} u \|_p \le \|u\|_p$, and the proof is completed.
A nice introduction to the Haar and other wavelets is Pinsky [35].
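The Rademacher functions
Let $L^2 = L^2([0,1), \mathcal{B}[0,1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$. The Rademacher functions $(R_k)_{k\in\mathbb{N}_0}$
are functions in $L^2$ defined by
$$R_0 := 1_{[0,1)}, \quad R_1 := 1_{[0,\frac12)} - 1_{[\frac12,1)}, \quad R_2 := 1_{[0,\frac14)} - 1_{[\frac14,\frac12)} + 1_{[\frac12,\frac34)} - 1_{[\frac34,1)}, \ \ldots$$
The graphs of the first four Rademacher functions are
[Figure: graphs of $R_0$, $R_1$, $R_2$, $R_3$ on $[0,1)$ with axis ticks at $\frac14$, $\frac12$, $\frac34$, $1$]
In terms of Haar functions we have
$$R_0 = \chi_{0,0}, \qquad R_{k+1} = \frac{1}{2^{k/2}} \sum_{j=1}^{2^k} \chi_{j,k}, \quad k \in \mathbb{N}_0. \tag{24.25}$$
Another equivalent definition of the Rademacher system is the following: expand
each $x \in [0,1)$ as binary series, $x = \sum_{j=1}^{\infty} \epsilon_j 2^{-j}$ with $\epsilon_j \in \{0,1\}$ – we exclude
expansions terminating with a string of 1s to enforce uniqueness – and set
$$R_0(x) := 1_{[0,1)}(x), \qquad R_k(x) := 1 - 2\epsilon_k, \quad k \in \mathbb{N}.$$
Yet another way to think of the functions $R_k$ is as right-continuous versions of
sign changes: $R_k(x) \approx \operatorname{sgn} \sin(2^k \pi x)$, $k \in \mathbb{N}_0$.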
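The three descriptions of the Rademacher functions can be compared numerically. The following is a minimal sketch, assuming evaluation at midpoints of a dyadic grid so that no point hits a jump; the function names are hypothetical, and the sign convention is the one fixed by $R_1 = 1_{[0,1/2)} - 1_{[1/2,1)}$.

```python
import numpy as np

def rademacher_intervals(k, x):
    """R_k as alternating +1/-1 on the dyadic intervals [i 2^-k, (i+1) 2^-k) of [0,1)."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.ones_like(x)
    i = np.floor(x * 2 ** k).astype(int)   # index of the dyadic interval containing x
    return np.where(i % 2 == 0, 1.0, -1.0)

def rademacher_digits(k, x):
    """R_k(x) = 1 - 2*eps_k, where eps_k is the k-th binary digit of x in [0,1)."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.ones_like(x)
    y = x.copy()
    for _ in range(k):                     # peel off the binary digits one at a time
        y = 2.0 * y
        eps = np.floor(y)
        y = y - eps
    return 1.0 - 2.0 * eps

def rademacher_sine(k, x):
    """sgn sin(2^k pi x); agrees with R_k away from the dyadic jump points."""
    x = np.asarray(x, dtype=float)
    return np.sign(np.sin(2 ** k * np.pi * x))

x = (np.arange(1024) + 0.5) / 1024         # grid midpoints, never a jump point for k <= 9
agree = all(np.array_equal(rademacher_intervals(k, x), rademacher_digits(k, x)) and
            np.array_equal(rademacher_intervals(k, x), rademacher_sine(k, x))
            for k in range(6))
print(agree)
```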
24.21 Lemma The system of Rademacher functions $(R_k)_{k\in\mathbb{N}}$ is an ONS of independent$^4$
functions in $L^2([0,1), dx)$ which is not complete.
$^4$ In the sense of Example 17.3(x) and Scholium 17.4.
Proof Orthonormality follows since $\int_{\{R_k = \pm 1\}} R_\ell \, d\lambda = 0$ for all $k < \ell$, thus
$\int R_k R_\ell \, d\lambda = 0$ while $\int R_k^2 \, d\lambda = 1$ is obvious.
In very much the same way we deduce that $\int R_k R_1 R_2 \, d\lambda = 0$ for all $k \in \mathbb{N}_0$
which shows that the system $(R_k)_{k\in\mathbb{N}_0}$ is not complete.
Independence is a special case of Scholium 17.4 with $p = q = 1/2$.
Although $(R_k)_{k\in\mathbb{N}_0}$ is not complete in $L^2$, it still has good a.e. convergence
properties. The reason for this is formula (24.25) and independence.
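Orthonormality and incompleteness of the Rademacher system can be observed on a dyadic grid, where all integrals of the step functions involved reduce to exact grid means. This is only a sketch: the number `K` of functions, the grid size and the sample coefficients `c` are assumptions chosen for illustration.

```python
import numpy as np

def rademacher(k, x):
    """R_0 = 1 and, for k >= 1, R_k = +1/-1 alternating on dyadic intervals of length 2^-k."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.ones_like(x)
    return np.where(np.floor(x * 2 ** k).astype(int) % 2 == 0, 1.0, -1.0)

K = 8
x = (np.arange(2 ** 10) + 0.5) / 2 ** 10     # midpoints of a dyadic grid on [0,1)
R = np.array([rademacher(k, x) for k in range(K)])

gram = R @ R.T / x.size                      # integrals of R_j R_k over [0,1)
u = R[1] * R[2]                              # the function R_1 R_2
coeffs_u = R @ u / x.size                    # <R_1 R_2, R_k> for k = 0, ..., K-1

c = 1.0 / (1.0 + np.arange(K))               # sample coefficients (illustrative)
series = c @ R                               # sum_k c_k R_k on the grid
norm_sq = float(np.mean(series ** 2))        # squared L^2 norm of the partial sum

print(np.allclose(gram, np.eye(K)), np.allclose(coeffs_u, 0.0))
```

The product $R_1 R_2$ has norm one but vanishing coefficients with respect to $R_0, \ldots, R_{K-1}$, while the squared norm of $\sum_k c_k R_k$ equals $\sum_k c_k^2$.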
24.22 Theorem The Rademacher series $\sum_{k=1}^{\infty} c_k R_k$, $c_k \in \mathbb{R}$, converges almost
everywhere if, and only if, $\sum_{k=0}^{\infty} c_k^2 < \infty$.
Proof Assume first that $\sum_{k=0}^{\infty} c_k^2 < \infty$. In view of (24.25) we set $c_{j,k} := 2^{-k/2} c_k$
and rearrange the absolutely convergent series as
$$\sum_{k=0}^{\infty} c_k^2 = \sum_{k=0}^{\infty} \sum_{j=1}^{2^k} c_{j,k}^2 < \infty.$$
We can now interpret the double sequence $(c_{j,k} : 1 \le j \le 2^k,\ k \in \mathbb{N}_0)$ as coefficients
of the complete (!) Haar ONS $(\chi_{j,k} : 1 \le j \le 2^k,\ k \in \mathbb{N}_0)$. From Parseval’s
identity T21.11(iv) we then conclude that the series
$$\sum_{k=0}^{\infty} c_k R_k = \sum_{k=0}^{\infty} \sum_{j=1}^{2^k} c_{j,k}\, \chi_{j,k}$$
converges almost everywhere and in $L^2$ to some element $u \in L^2$.
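Conversely, assume that the series $\sum_{k=0}^{\infty} c_k R_k$ converges to a finite limit $s(x)$
for all $x \in E \in \mathcal{B}[0,1)$ such that $\lambda(E) > 0$. Writing $s_N$ for the $N$th partial sum of
this series, we see that
$$A(N) := \bigcup_{j=N}^{\infty} \big\{ x \in E : |s_j(x) - s(x)| > \tfrac12 \big\} \quad\text{and}\quad \bigcap_{N\in\mathbb{N}} A(N) = \emptyset.$$
By the continuity of measures T4.4 we find for every $\epsilon > 0$ some $N = N_\epsilon \in \mathbb{N}$
such that
$$\lambda(A(N)) < \epsilon < \tfrac12 \lambda(E) \quad\text{and}\quad \lambda(E \setminus A(N)) > 0.$$
In particular, if $E^* := E \setminus A(N)$,
$$|s_j(x) - s_k(x)| \le |s_j(x) - s(x)| + |s(x) - s_k(x)| \le 1 \qquad \forall\, j, k > N,\ x \in E^*,$$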
and an application of the Cauchy–Schwarz inequality for (double) series, cf.
(12.13), shows
��E∗� �
∫
E∗
( N∑
k=M+1
ck Rk
)2
︸ ︷︷ ︸
�1
d�
= ��E∗�
N∑
k=M+1
c2k + 2
∑
M
�Rj RkR�=±1� Rm d� = 0, we see that
∫
�Rj Rk� �R�Rm� d� = 0 if �j� k� �= ��� m��
This shows that �Rj Rk�0�j
c2k − 2
( ∑
k>M
c2k
)
��E∗�
4
= ��E
∗�
2
∑
k>M
c2k�
which implies $\sum_{k>M} c_k^2 \le 2$, i.e. $\sum_{k=0}^{\infty} c_k^2 < \infty$, and we are done.
It is possible to extend the Rademacher system explicitly to a complete ONS.
This can be achieved by the following construction:
$$w_0 := R_0, \qquad w_n := R_{j_1+1} \cdot R_{j_2+1} \cdot \ldots \cdot R_{j_k+1}, \quad n \in \mathbb{N}, \tag{24.27}$$
where $n = 2^{j_1} + 2^{j_2} + \cdots + 2^{j_k}$ is the unique dyadic representation of $n \in \mathbb{N}$ where
$0 \le j_1 < j_2 < \ldots < j_k$. A similar argument to the one used in the second part
of the proof of Theorem 24.22 shows that $(w_n)_{n\in\mathbb{N}_0}$ is indeed an ONS. Note that
$R_{k+1} = w_{2^k}$, so that $(R_k)_{k\in\mathbb{N}_0} \subset (w_n)_{n\in\mathbb{N}_0}$.
24.23 Definition The system (24.27) is called the Walsh orthonormal system (in
Paley’s ordering).
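The construction (24.27) translates directly into a loop over the binary digits of $n$. The sketch below is illustrative only: the function names and the grid are assumptions, and orthonormality is checked just for the first 16 Walsh functions on a dyadic grid, where the integrals are exact grid means.

```python
import numpy as np

def rademacher(k, x):
    """R_0 = 1 and, for k >= 1, R_k = +1/-1 alternating on dyadic intervals of length 2^-k."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.ones_like(x)
    return np.where(np.floor(x * 2 ** k).astype(int) % 2 == 0, 1.0, -1.0)

def walsh(n, x):
    """w_0 = R_0 and w_n = product of R_{j+1} over all j with 2^j in the dyadic representation of n."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)            # w_0 = R_0
    j = 0
    while n > 0:
        if n & 1:                  # 2^j occurs in the dyadic representation of n
            w = w * rademacher(j + 1, x)
        n >>= 1
        j += 1
    return w

x = (np.arange(64) + 0.5) / 64     # midpoints of a dyadic grid on [0,1)
W = np.array([walsh(n, x) for n in range(16)])
gram = W @ W.T / x.size            # integrals of w_m w_n over [0,1)
print(np.allclose(gram, np.eye(16)))
```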
The Walsh system is a complete ONS, cf. Alexits [1, pp. 61–3] or Schipp et al.
[41], and it is susceptible to a complete martingale treatment, cf. [41]. Again one
considers the filtration of dyadic $\sigma$-algebras $(\mathcal{A}^\Delta_n)_{n\in\mathbb{N}_0}$ on $[0,1)$ and the special
partial sums
$$s_{2^n-1}(u) := \sum_{j=0}^{2^n-1} \langle u, w_j\rangle w_j.$$
Then $s_{2^n-1}(u) = E^{\mathcal{A}^\Delta_n} u$ and we have the full martingale toolkit at our disposal. With
the methods used so far it is possible to show that $s_{2^n-1}(u) \xrightarrow{n\to\infty} u$ a.e. and in
$L^p$, $1 \le p < \infty$. The case of general partial sums $s_n(u)$ is somewhat harder to
handle but it is still doable with some variations of the techniques presented here;
see Schipp et al. [41, Chapters 4, 6].
Well-behaved orthonormal systems
For the Haar system and the Haar wavelet we could use martingale methods.
A close inspection of our proofs reveals that the crucial input for getting martingales
is that the ONS $(e_j)_{j\in\mathbb{N}_0}$ satisfies
$$E^{\mathcal{A}_n} e_{n+1} = 0, \qquad n \in \mathbb{N}_0, \tag{24.28}$$
where $\mathcal{A}_n = \sigma(e_0, e_1, \ldots, e_n)$. This condition implies immediately that the partial
sum $\sum_{j=0}^{n} c_j e_j$ is a martingale w.r.t. the filtration $(\mathcal{A}_n)_{n\in\mathbb{N}_0}$ generated by the ONS
$(e_j)_{j\in\mathbb{N}_0}$.
24.24 Definition Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $1 \le p < \infty$.
A family of functions $(e_j)_{j\in\mathbb{N}_0} \subset L^p(X, \mathcal{A}, \mu)$ satisfying (24.28) is called a system
of martingale differences.
For a system of martingale differences no orthogonality is required. The archetype
of martingale differences are sequences of independent$^5$ functions $(f_k)_{k\in\mathbb{N}_0} \subset L^2$
($\subset L^1$, since $\mu$ is a probability measure) which are normalized such that
$\int f_k \, d\mu = 0$ and $\int f_k^2 \, d\mu = 1$. Our methods used in connection with the Haar
system and Haar wavelets still apply and yield
$^5$ In the sense of Example 17.3(x).
24.25 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $(e_j)_{j\in\mathbb{N}_0}$ be
an ONS of martingale differences in $L^2(X, \mathcal{A}, \mu)$. Then
$$s_n(u; x) := \sum_{j=0}^{n} \langle u, e_j\rangle e_j(x), \qquad n \in \mathbb{N},\ u \in L^1 \cap L^2,$$
is a martingale w.r.t. the filtration $\mathcal{A}_n := \sigma(e_0, e_1, \ldots, e_n)$. For every $u \in L^2$ the
sequence $(s_n(u))_{n\in\mathbb{N}}$ converges a.e. and satisfies the following maximal inequality:
$$\Big\| \sup_{n\in\mathbb{N}} s_n(u) \Big\|_2 \le 2\, \|u\|_2, \qquad u \in L^2.$$
Proof That the sequence of partial sums satisfies $E^{\mathcal{A}_n} s_{n+1}(u) = s_n(u)$ for $u \in L^2$
and is, for $u \in L^1 \cap L^2$, a martingale is clear. Therefore, Corollary 23.16
shows that $((s_n(u^\pm))^2)_{n\in\mathbb{N}}$ are submartingales, and from Bessel’s inequality, cf.
Theorem 21.11,
$$\sup_{n\in\mathbb{N}} \|s_n(u^\pm)\|_2 \le \|u^\pm\|_2, \qquad u \in L^2,$$
we conclude that $((s_n(u^\pm))^2)_{n\in\mathbb{N}}$ satisfy the conditions of the submartingale convergence
theorem 18.2. Thus $\lim_{n\to\infty} (s_n(u^\pm))^2$ exists a.e. in $[0, \infty)$, and, since
$s_n(u^\pm) \ge 0$, so does $\lim_{n\to\infty} s_n(u)$.
From Doob’s maximal inequality T19.12 and Bessel’s inequality T21.11 we
get
$$\Big\| \sup_{n\le N} s_n(u) \Big\|_2 \le 2\, \|s_N(u)\|_2 \le 2\, \|u\|_2, \qquad N \in \mathbb{N},$$
and the usual monotone convergence argument proves the maximal inequality as
$N \to \infty$.
In the situation of Theorem 24.25 we cannot say much more about the limit
$\lim_{n\to\infty} s_n(u; x)$ apart from its mere existence. In particular, the partial sums
$s_n(u)$ can converge to something completely different from $u$! Consider,
for example, the system of Rademacher functions $(R_n)_{n\in\mathbb{N}_0}$, which is clearly a
system of martingale differences. If $u = R_1 R_2$ we get
$$\langle u, R_j\rangle = \langle R_1 R_2, R_j\rangle = \int_{[0,1)} R_1(x) R_2(x) R_j(x)\, dx = 0$$
for all $j \in \mathbb{N}$. Thus $s_n(R_1 R_2) \equiv 0$ is convergent, but $\lim_{n\to\infty} s_n(R_1 R_2) \equiv 0 \ne R_1 R_2$.
The reason is that the Rademacher functions are not complete in $L^2$. This also
means that we cannot hope to get $L^p$-convergence in Theorem 24.25.
24.26 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $(e_j)_{j\in\mathbb{N}_0} \subset L^2(\mu)$
be an ONS of martingale differences. Denote by $s_n(u)$ the partial sum
$$s_n(u; x) := \sum_{j=0}^{n} \langle u, e_j\rangle e_j(x), \qquad u \in L^2,$$
and by $\mathcal{A}_n := \sigma(e_0, e_1, \ldots, e_n)$ the associated canonical filtration. Then the following
assertions are equivalent:
(i) $(e_j)_{j\in\mathbb{N}_0}$ is a complete ONS.
(ii) $\int_A s_n(u)\, d\mu = \int_A u\, d\mu$ for all $A \in \mathcal{A}_n$, $\mu(A) < \infty$, and $u \in L^2(\mu)$.
(iii) $E^{\mathcal{A}_n} u = s_n(u)$ for all $u \in L^2(\mu)$.
(iv) $\lim_{n\to\infty} \|s_n(u) - u\|_p = 0$ for all $u \in L^p(\mu)$ and all $1 \le p < \infty$.
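Condition (iii) can be observed numerically for the Haar system on $[0,1)$ at the indices $n = 2^k - 1$, where the conditional expectation is the average over dyadic intervals of length $2^{-k}$. The sketch below works with a discretised version only: scalar products are replaced by means over a dyadic grid, the sample function $u(x) = x^2$ and all names are assumptions, and the ordering $H_{2^k+j}(x) = 2^{k/2} H_1(2^k x - j)$ is the lexicographic one.

```python
import numpy as np

def haar(n, x):
    """H_0 = 1 and H_{2^k + j}(x) = 2^{k/2} H_1(2^k x - j), 0 <= j < 2^k, on [0,1)."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return np.ones_like(x)
    k = n.bit_length() - 1                 # n = 2^k + j
    j = n - (1 << k)
    y = 2.0 ** k * x - j
    plus = ((0.0 <= y) & (y < 0.5)).astype(float)
    minus = ((0.5 <= y) & (y < 1.0)).astype(float)
    return 2.0 ** (k / 2) * (plus - minus)

def partial_sum(u_vals, x, n):
    """s_n(u) = sum_{j<=n} <u,H_j> H_j with <.,.> approximated by the grid mean."""
    s = np.zeros_like(x)
    for j in range(n + 1):
        h = haar(j, x)
        s += np.mean(u_vals * h) * h
    return s

m, k = 8, 3
x = (np.arange(2 ** m) + 0.5) / 2 ** m     # midpoints of a dyadic grid on [0,1)
u_vals = x ** 2                            # sample function (illustrative)
s = partial_sum(u_vals, x, 2 ** k - 1)

# averages of u over the 2^k dyadic blocks, repeated across each block
block_means = np.repeat(u_vals.reshape(2 ** k, -1).mean(axis=1), 2 ** (m - k))
print(np.allclose(s, block_means))
```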
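Proof (i)⇒(ii): Since $(e_j)_{j\in\mathbb{N}_0}$ is complete, we know from Theorem 21.13 that
$\lim_{n\to\infty} \|s_n(u) - u\|_2 = 0$ for all $u \in L^2$. Using the Cauchy–Schwarz inequality
12.3 we see for every $A \in \mathcal{A}_n$ with $\mu(A) < \infty$
$$\int_A |s_n(u) - u|\, d\mu \le \|s_n(u) - u\|_2 \cdot \|1_A\|_2 \xrightarrow{n\to\infty} 0.$$
Thus $\lim_{n\to\infty} \int_A s_n(u)\, d\mu = \int_A u\, d\mu$. Since $(e_j)_{j\in\mathbb{N}_0}$ is a system of martingale
differences, we know that $E^{\mathcal{A}_n} e_{n+k} = 0$, $k \in \mathbb{N}$,[✓] and by Theorem 23.9, applied
to the function $1_A e_{n+k} \in L^1$ and $A \in \mathcal{A}_n$,
$$\int_A e_{n+k}\, d\mu = \int_A 1_A e_{n+k}\, d\mu = 0.$$
Therefore $\int_A u\, d\mu = \lim_{j\to\infty} \int_A s_j(u)\, d\mu = \int_A s_n(u)\, d\mu$ holds for all $n \in \mathbb{N}$ and
$A \in \mathcal{A}_n$ with $\mu(A) < \infty$.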
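(ii)⇒(iii) Since $1_A u \in L^1$ for all $A \in \mathcal{A}_n$ with $\mu(A) < \infty$ and $u \in L^2$, Theorem
23.9 and 23.8(vii) show
$$\int_A u\, d\mu = \int_A 1_A u\, d\mu = \int_A E^{\mathcal{A}_n}(1_A u)\, d\mu = \int_A 1_A E^{\mathcal{A}_n}(u)\, d\mu = \int_A E^{\mathcal{A}_n}(u)\, d\mu.$$
Together with the assumption this gives
$$\int_A E^{\mathcal{A}_n} u\, d\mu = \int_A s_n(u)\, d\mu \qquad \forall\, A \in \mathcal{A}_n,\ \mu(A) < \infty.$$
Choose, in particular, for every $k \in \mathbb{N}$ the set $\{ s_n(u) > \frac1k + E^{\mathcal{A}_n} u \} \in \mathcal{A}_n$. By
Markov’s inequality P10.12 we see
$$\mu\big( \big\{ s_n(u) - E^{\mathcal{A}_n} u > \tfrac1k \big\} \big) \le k^2\, \big\| s_n(u) - E^{\mathcal{A}_n} u \big\|_2^2 < \infty,$$
so that the above equality becomes
$$0 = \int_{\{ s_n(u) > \frac1k + E^{\mathcal{A}_n} u \}} \big( s_n(u) - E^{\mathcal{A}_n} u \big)\, d\mu \ge \frac1k\, \mu\big( \big\{ s_n(u) > \tfrac1k + E^{\mathcal{A}_n} u \big\} \big).$$
This is only possible if $\mu\big( \{ s_n(u) > \frac1k + E^{\mathcal{A}_n} u \} \big) = 0$. A similar argument for the
set $\{ s_n(u) < -\frac1k + E^{\mathcal{A}_n} u \}$ finally shows
$$\mu\big( \{ s_n(u) \ne E^{\mathcal{A}_n} u \} \big) = \mu\Big( \bigcup_{k\in\mathbb{N}} \big\{ |s_n(u) - E^{\mathcal{A}_n} u| > \tfrac1k \big\} \Big) \le \sum_{k\in\mathbb{N}} \mu\big( \big\{ |s_n(u) - E^{\mathcal{A}_n} u| > \tfrac1k \big\} \big) = 0.$$
Therefore, $s_n(u) = E^{\mathcal{A}_n} u$ a.e.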
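(iii)⇒(iv) For $u \in L^1 \cap L^\infty$ Theorem 23.15 shows that $(E^{\mathcal{A}_n} u)_{n\in\mathbb{N}}$ is a uniformly
integrable martingale and that $E^{\mathcal{A}_n} u \xrightarrow{n\to\infty} u$ in $L^1$ and a.e. As in step 4 of the
proof of Theorem 24.17, we use the inequality
$$|a|^p - |b|^p \le p\, |a - b| \max\{ |a|^{p-1}, |b|^{p-1} \}, \qquad a, b \in \mathbb{R},\ p > 1,$$
to deduce that
$$\pm \int \big( |E^{\mathcal{A}_n} u|^p - |u|^p \big)\, d\mu \le p\, \big\| E^{\mathcal{A}_n} u - u \big\|_1 \cdot \|u\|_\infty^{p-1},$$
and, by Riesz’ convergence theorem 12.10, that $E^{\mathcal{A}_n} u \xrightarrow{n\to\infty} u$ in $L^p$.
If $u \in L^p$ is not bounded, we take an exhausting sequence $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ with
$A_k \uparrow X$ and $\mu(A_k) < \infty$ and set $u_k := \big( (-k) \vee u \wedge k \big) 1_{A_k}$. Clearly, $u_k \in L^1 \cap L^\infty$,
and we see using Theorem 23.8(v),(ii),
$$\big\| E^{\mathcal{A}_n} u - u \big\|_p \le \big\| E^{\mathcal{A}_n} u - E^{\mathcal{A}_n} u_k \big\|_p + \big\| E^{\mathcal{A}_n} u_k - u_k \big\|_p + \|u_k - u\|_p \le \big\| E^{\mathcal{A}_n} u_k - u_k \big\|_p + 2\, \|u_k - u\|_p.$$
The claim follows if we let first $n \to \infty$ and then $k \to \infty$.
(iv)⇒(i) is just $p = 2$ combined with Theorem 21.13.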
If we know that the elements of the ONS are independent, we obtain the
following necessary and sufficient conditions for pointwise convergence which
generalize Theorem 24.22.
24.27 Theorem Let $(X, \mathcal{A}, P)$ be a probability space and $(e_j)_{j\in\mathbb{N}_0} \subset L^2(P)$ be
independent random variables such that
$$\int e_j\, dP = 0 \quad\text{and}\quad \int e_j^2\, dP = 1$$
and let $(c_j)_{j\in\mathbb{N}_0} \subset \mathbb{R}$ be a sequence of real numbers. Then
(i) The family $(e_j)_{j\in\mathbb{N}_0}$ is an ONS of martingale differences;
(ii) If $\sum_{j=0}^{\infty} c_j^2 < \infty$, then $\sum_{j=0}^{\infty} c_j e_j$ converges in $L^2(P)$ and a.e.;
(iii) If $\sup_{j\in\mathbb{N}_0} \|e_j\|_\infty \le \kappa < \infty$ and if $\sum_{j=0}^{\infty} c_j e_j$ converges almost everywhere,
then $\sum_{j=0}^{\infty} c_j^2 < \infty$.
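Proof (i) We set
$$\mathcal{A}_n := \sigma(e_0, e_1, \ldots, e_n), \quad\text{and}\quad u_n := \sum_{j=0}^{n} c_j e_j.$$
Since $P$ is a probability measure, $u_j \in L^2(P) \subset L^1(P)$ and under our assumptions
it is clear that $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}_0}$ is a martingale.[✓]
By independence we have
$$\int e_j e_k\, dP = \begin{cases} \int e_j\, dP \cdot \int e_k\, dP = 0 & \text{if } j \ne k, \\[2pt] \int e_j^2\, dP = 1 & \text{if } j = k, \end{cases}$$
which entails
$$\int u_n^2\, dP = \sum_{j,k=0}^{n} c_j c_k \int e_j e_k\, dP = \sum_{j=0}^{n} c_j^2 \tag{24.29}$$
and also
$$\int u_n u_{n+k}\, dP = \int E^{\mathcal{A}_n}(u_n u_{n+k})\, dP = \int u_n\, E^{\mathcal{A}_n} u_{n+k}\, dP = \sum_{j=0}^{n} c_j^2. \tag{24.30}$$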
(ii) Because of (24.29) we see that
$$\|u_n\|_1^2 \le \|u_n\|_2^2 = \sum_{j=0}^{n} c_j^2 \le \sum_{j=0}^{\infty} c_j^2 < \infty,$$
and the martingale convergence theorem C18.3 shows that $u_n \xrightarrow{n\to\infty} u_\infty$ a.e. Using
(24.30) we conclude that
$$\int (u_{n+k} - u_n)^2\, dP = \int \big( u_{n+k}^2 - 2 u_n u_{n+k} + u_n^2 \big)\, dP = \int u_{n+k}^2\, dP - \int u_n^2\, dP = \sum_{j=n+1}^{n+k} c_j^2 \le \sum_{j=n+1}^{\infty} c_j^2.$$
Thus, by Fatou’s lemma 9.11,
$$\int (u_\infty - u_n)^2\, dP \le \liminf_{k\to\infty} \int (u_{n+k} - u_n)^2\, dP \le \sum_{j=n+1}^{\infty} c_j^2 \xrightarrow{n\to\infty} 0,$$
and $u_n \xrightarrow{n\to\infty} u_\infty$ follows in the $L^2$-sense.
(iii) Since $e_j$ and $\mathcal{A}_{j-1}$ are independent, we find for all $A \in \mathcal{A}_{j-1}$
$$\int_A (u_j - u_{j-1})^2\, dP = \int_A c_j^2 e_j^2\, dP \overset{(17.6)}{=} c_j^2\, P(A) = c_j^2 \int_A dP. \tag{24.31}$$
Essentially the same calculation that was used in (24.30) also yields
$$\int_A (u_j - u_{j-1})^2\, dP = \int (1_A u_j - 1_A u_{j-1})^2\, dP = \int (1_A u_j)^2\, dP - \int (1_A u_{j-1})^2\, dP = \int_A (u_j^2 - u_{j-1}^2)\, dP,$$
which can be combined with (24.31) to give
$$\int_A \Big( u_n^2 - \sum_{j=0}^{n} c_j^2 \Big)\, dP = \int_A \Big( u_{n-1}^2 - \sum_{j=0}^{n-1} c_j^2 \Big)\, dP \qquad \forall\, A \in \mathcal{A}_{n-1}.$$
This means, however, that $w_n := u_n^2 - \sum_{j=0}^{n} c_j^2$ is a martingale.
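Consider the stopping time $\tau = \tau_\gamma := \inf\{ n \in \mathbb{N}_0 : |u_n| > \gamma \}$, $\inf \emptyset = \infty$. Since
the series $\sum_{j=0}^{\infty} c_j e_j$ converges a.e., we can choose $\gamma > 0$ in such a way that
$$\kappa^2 P(\{\tau < \infty\}) < \tfrac12 P(\{\tau = \infty\}).$$
Without loss of generality we may also take $\gamma^2 > \big| \int w_0\, dP \big| + \big| \int u_0^2\, dP \big|$.
The optional sampling theorem 17.8 proves that $(w_{n\wedge\tau})_{n\in\mathbb{N}}$ is again a martingale
and, therefore,
$$\int w_0\, dP = \int w_{n\wedge\tau}\, dP = \int u_{n\wedge\tau}^2\, dP - \int \sum_{j=0}^{n\wedge\tau} c_j^2\, dP. \tag{24.32}$$
Taking into account the very definition of $\tau$ we find furthermore
$$\begin{aligned} \int u_{n\wedge\tau}^2\, dP &= \int_{\{\tau > n\}} u_{n\wedge\tau}^2\, dP + \int_{\{1 \le \tau \le n\}} u_{n\wedge\tau}^2\, dP + \int_{\{\tau = 0\}} u_{n\wedge\tau}^2\, dP \\ &\le 2\gamma^2 + \int_{\{1 \le \tau \le n\}} u_\tau^2\, dP \\ &= 2\gamma^2 + \int_{\{1 \le \tau \le n\}} (c_\tau e_\tau + u_{\tau-1})^2\, dP \\ &\le 2\gamma^2 + 2 \int_{\{1 \le \tau \le n\}} \big( c_\tau^2 e_\tau^2 + u_{\tau-1}^2 \big)\, dP, \end{aligned}$$
where we used the elementary inequality $(a + b)^2 \le 2a^2 + 2b^2$ in the last line.
Since the $e_j$ are uniformly bounded by $\kappa$ and since $|u_{\tau-1}| \le \gamma$, we get
$$\begin{aligned} \int u_{n\wedge\tau}^2\, dP &\le 4\gamma^2 + \kappa^2 \int_{\{\tau \le n\}} c_\tau^2\, dP \\ &\le 4\gamma^2 + \kappa^2 P(\{\tau \le n\}) \sum_{j=0}^{n} c_j^2 \\ &\le 4\gamma^2 + \tfrac12 P(\{\tau = \infty\}) \sum_{j=0}^{n} c_j^2, \end{aligned} \tag{24.33}$$
since, by construction, $\kappa^2 P(\{\tau \le n\}) \le \kappa^2 P(\{\tau < \infty\}) < \frac12 P(\{\tau = \infty\})$.
Rearranging (24.32) and combining this with the above estimates we obtain
$$\begin{aligned} P(\{\tau = \infty\}) \sum_{j=0}^{n} c_j^2 &= \int_{\{\tau = \infty\}} \sum_{j=0}^{n\wedge\tau} c_j^2\, dP \le \int \sum_{j=0}^{n\wedge\tau} c_j^2\, dP \\ &\overset{(24.32)}{=} \int u_{n\wedge\tau}^2\, dP - \int w_0\, dP \\ &\overset{(24.33)}{\le} 4\gamma^2 + \tfrac12 P(\{\tau = \infty\}) \sum_{j=0}^{n} c_j^2 + \gamma^2, \end{aligned}$$
uniformly for all $n \in \mathbb{N}$. Since, by assumption, $P(\{\tau = \infty\}) > 0$ for sufficiently
large $\gamma$, we conclude that $\sum_{j=0}^{\infty} c_j^2 < \infty$.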
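Theorem 24.27 has an astonishing corollary if we apply the Burkholder–Davis–
Gundy inequalities (BDG) from Remark 24.18 (p. 294) to the martingale
$$w_n := u_{n+k} - u_k = \sum_{j=k+1}^{n+k} c_j e_j$$
w.r.t. the filtration $\mathcal{F}_n := \mathcal{A}_{n+k} := \sigma(e_0, e_1, \ldots, e_{n+k})$. The part of the inequalities
which is important for our purposes reads
$$\kappa_p\, \|w_n\|_p \le \kappa_p \Big\| \sup_{0\le j\le n} |w_j| \Big\|_p \le \Big\| \sqrt{\langle w_\bullet, w_\bullet\rangle_n} \Big\|_p, \tag{24.34}$$
where $n \in \mathbb{N}_0$, $0 < p < \infty$, and the quadratic variation is given by
$$\langle w_\bullet, w_\bullet\rangle_n = \langle u_{\bullet+k} - u_k, u_{\bullet+k} - u_k\rangle_n = \sum_{j=0}^{n-1} (u_{j+k+1} - u_{j+k})^2 = \sum_{j=k+1}^{n+k} c_j^2 e_j^2.$$
If we happen to know that $\sup_{j\in\mathbb{N}} \|e_j\|_\infty \le \kappa < \infty$, we even find
$$\Big\| \sqrt{\langle w_\bullet, w_\bullet\rangle_n} \Big\|_\infty \le \kappa \Big( \sum_{j=k+1}^{n+k} c_j^2 \Big)^{1/2}$$
and we conclude from (24.34) that for all $n, k \in \mathbb{N}$ and $0 < p < \infty$
$$\kappa_p\, \|u_{n+k} - u_k\|_p \le \big\| \langle u_{\bullet+k} - u_k, u_{\bullet+k} - u_k\rangle_n^{1/2} \big\|_p \le \big\| \langle u_{\bullet+k} - u_k, u_{\bullet+k} - u_k\rangle_n^{1/2} \big\|_\infty \le \kappa \Big( \sum_{j=k+1}^{n+k} c_j^2 \Big)^{1/2}$$
holds. This proves immediately the following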
24.28 Corollary Let $(X, \mathcal{A}, P)$ be a probability space and let $(e_j)_{j\in\mathbb{N}_0}$ be a
sequence of independent random variables such that
$$\sup_{j\in\mathbb{N}_0} \|e_j\|_\infty < \infty, \qquad \int e_j\, dP = 0 \quad\text{and}\quad \int e_j^2\, dP = 1.$$
Then $u_n := \sum_{j=0}^{n} c_j e_j$ converges in $L^2$ and a.e. to some $u \in L^2$ if, and only if,
$\sum_{j=0}^{\infty} c_j^2 < \infty$.
If the latter is the case, $u \in L^p$ and the convergence takes place in $L^p$-sense
for all $0 < p < \infty$.
Unfortunately, many ONSs of martingale differences are incomplete and seem
to behave more often like Rademacher functions than Haar functions. More on
this topic can be found in the paper by Gundy [18] and the book by Garsia [16].
24.29 Epilogue The combination of martingale methods and orthogonal expansions
opens up a whole new world. Let us illustrate this by a rapid construction
of one of the most prominent stochastic processes: the Wiener process or Brownian
motion.
Choose in Theorem 24.27 $(X, \mathcal{A}, P) = ([0,1], \mathcal{B}[0,1], \lambda)$ where $\lambda$ is one-dimensional
Lebesgue measure on $[0,1]$; denoting points in $[0,1]$ by $\omega$, we
will often write $d\omega$ instead of $\lambda(d\omega)$. Assume that the independent, identically
distributed random variables $e_j$ are all standard normal Gaussian random variables,
i.e.
$$P(e_j \in B) = \frac{1}{\sqrt{2\pi}} \int_B e^{-x^2/2}\, dx, \qquad B \in \mathcal{B}(\mathbb{R}),$$
and consider the series expansion
$$W_t(\omega) := \sum_{n=0}^{\infty} e_n(\omega)\, \langle 1_{[0,t]}, H_n\rangle, \qquad \omega \in [0,1].$$
Here $t \in [0,1]$ is a parameter, $\langle u, v\rangle = \int_0^1 u(x) v(x)\, dx$, and $H_n$, $n = 2^k + j$,
$0 \le j < 2^k$, denote the lexicographically ordered Haar functions (24.16). A short
calculation confirms for $n \ge 1$
$$\langle 1_{[0,t]}, H_n\rangle = \int_0^t H_n(x)\, dx = 2^{k/2} \int_0^t H_1(2^k x - j)\, dx = \tfrac12\, 2^{-k/2}\, F_n(t),$$
where $F_1(t) = 2 \int_0^t H_1(x)\, dx\; 1_{[0,1]}(t) = 2t\, 1_{[0,\frac12)}(t) - (2t - 2)\, 1_{[\frac12,1]}(t)$ is a tent-function
and $F_n(t) := F_1(2^k t - j)$. Since $0 \le F_n \le 1$ and since, for fixed $t$ and $k$, at most one of the
functions $F_{2^k+j}$, $0 \le j < 2^k$, does not vanish at $t$, we see
$$\sum_{n=1}^{\infty} \langle 1_{[0,t]}, H_n\rangle^2 \le \frac14 \sum_{k=0}^{\infty} 2^{-k} = \frac12,$$
and Theorem 24.27(ii) guarantees that $W_t(\omega)$ exists, for each $t \in [0,1]$, both in
$L^2(d\omega)$-sense and $\lambda(d\omega)$-almost everywhere.
More is true. Since the $e_n$ are independent Gaussian random variables, so are
their finite linear combinations (e.g. Bauer [5, §24]) and, in particular, the partial
sums
$$S_N(t, \omega) := \sum_{n=0}^{N} e_n(\omega)\, \langle 1_{[0,t]}, H_n\rangle.$$
Gaussianity is preserved under $L^2$-limits;$^6$ we conclude that $W_t(\omega)$ has a Gaussian
distribution for each $t$. The mean is given by
$$\int_0^1 W_t(\omega)\, d\omega = \sum_{n=0}^{\infty} \int_0^1 e_n(\omega)\, d\omega\; \langle 1_{[0,t]}, H_n\rangle = 0$$
(to change integration and summation use that $L^2(d\omega)$-convergence entails
$L^1(d\omega)$-convergence on a finite measure space).
$^6$ (cf. [5, §§23, 24]) if $X_n$ is normal distributed with mean $0$ and variance $\sigma_n^2$, its Fourier transform is
$\int e^{i\xi X_n}\, dP = e^{-\sigma_n^2 \xi^2/2}$. If $X_n \xrightarrow{n\to\infty} X$ in $L^2$-sense, we have $\sigma_n^2 \to \sigma^2$ and, by dominated convergence,
$\int e^{i\xi X}\, dP = \lim_n \int e^{i\xi X_n}\, dP = \lim_n e^{-\sigma_n^2 \xi^2/2} = e^{-\sigma^2 \xi^2/2}$; the claim follows from the uniqueness of the Fourier
transform.
Since $\int e_n e_m\, d\omega = 0$ or $1$
according to $n \ne m$ or $n = m$, we can calculate for $0 \le s < t \le 1$ the variance by
$$\begin{aligned} \int_0^1 (W_t(\omega) - W_s(\omega))^2\, d\omega &= \sum_{n,m=0}^{\infty} \int_0^1 e_n(\omega) e_m(\omega)\, d\omega\; \langle 1_{[0,t]} - 1_{[0,s]}, H_n\rangle \langle 1_{[0,t]} - 1_{[0,s]}, H_m\rangle \\ &= \sum_{n=0}^{\infty} \langle 1_{(s,t]}, H_n\rangle^2 \overset{24.17,\,21.13}{=} \langle 1_{(s,t]}, 1_{(s,t]}\rangle = t - s. \end{aligned}$$
In particular, the increment $W_t - W_s$ has the same probability distribution as $W_{t-s}$.
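The coefficients $\langle 1_{[0,t]}, H_n\rangle$ and the variance identity can be explored numerically. The following is a minimal sketch, not a definitive implementation: it uses the tent function $F_1(s) = 1 - |2s - 1|$ on $[0,1]$ (zero elsewhere), truncates the series after $2^{10}$ terms, and draws the Gaussian variables from a seeded NumPy generator; all function names and the truncation level are assumptions.

```python
import numpy as np

def tent(s):
    """F_1(s) = 2s on [0,1/2), 2 - 2s on [1/2,1], and 0 outside [0,1]."""
    s = np.asarray(s, dtype=float)
    return np.clip(1.0 - np.abs(2.0 * s - 1.0), 0.0, None)

def schauder_coeff(n, t):
    """<1_[0,t], H_n> for the lexicographically ordered Haar functions on [0,1]."""
    t = np.asarray(t, dtype=float)
    if n == 0:
        return t.copy()                    # H_0 = 1
    k = n.bit_length() - 1                 # n = 2^k + j with 0 <= j < 2^k
    j = n - (1 << k)
    return 0.5 * 2.0 ** (-k / 2) * tent(2.0 ** k * t - j)

def partial_sum_path(t, gaussians):
    """S_N(t) = sum_{n=0}^{N} e_n <1_[0,t], H_n> for given values e_0, ..., e_N."""
    t = np.asarray(t, dtype=float)
    return sum(e * schauder_coeff(n, t) for n, e in enumerate(gaussians))

# Parseval: the truncated sum of squared coefficients should be close to t
t = np.array([0.0, 0.25, 0.3, 0.5, 1.0])
parseval = sum(schauder_coeff(n, t) ** 2 for n in range(2 ** 10))

rng = np.random.default_rng(1)             # fixed seed, illustrative only
e = rng.standard_normal(2 ** 10)
grid = np.linspace(0.0, 1.0, 257)
path = partial_sum_path(grid, e)           # one approximate sample path t -> W_t
print(np.round(parseval, 4))
```

At $t = 1$ only the term $n = 0$ contributes, so the truncated path ends at the value `e[0]`.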
In the same vein we find for $0 \le s < t \le u < v \le 1$ that
$$\int_0^1 (W_t(\omega) - W_s(\omega))(W_v(\omega) - W_u(\omega))\, d\omega = \langle 1_{(s,t]}, 1_{(u,v]}\rangle = 0.$$
Since $W_t - W_s$ is Gaussian, this proves already the independence of the two
increments $W_t - W_s$ and $W_v - W_u$, cf. [5, §24]. By induction, we conclude that
$$W_{t_n} - W_{t_{n-1}},\ \ldots,\ W_{t_1} - W_{t_0}$$
are independent for all $0 \le t_0 \le \cdots \le t_n \le 1$.
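Let us finally turn to the dependence of $W_t(\omega)$ on $t$. Note that for $M < N$
$$\begin{aligned} \int_0^1 \sup_{t\in[0,1]} \big| S_N(t, \omega) - S_M(t, \omega) \big|\, d\omega &= \int_0^1 \sup_{t\in[0,1]} \Big| \sum_{n=M+1}^{N} e_n(\omega)\, \langle 1_{[0,t]}, H_n\rangle \Big|\, d\omega \\ &\le \sum_{n=M+1}^{N} \underbrace{\int_0^1 |e_n(\omega)|\, d\omega}_{=\ \text{const.}}\ \sup_{t\in[0,1]} \big| \langle 1_{[0,t]}, H_n\rangle \big| \\ &\le C \sum_{n=M+1}^{N} \tfrac12\, 2^{-k/2} < \infty, \end{aligned}$$
which means that the partial sums $S_N(t, \omega)$ of $W_t(\omega)$ converge in $L^1(d\omega)$ uniformly
for all $t \in [0,1]$. By C12.8 we can extract a subsequence, which converges
(uniformly in $t$) for $\lambda(d\omega)$-almost all $\omega$ to $W_t(\omega)$; since for fixed $\omega$ the partial
sums $t \mapsto S_N(t, \omega)$ are continuous functions of $t$, this property is inherited by the
a.e. limit $W_t(\omega)$.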
The above construction is a variation of a theme by Lévy [26, Chap. I.1,
pp. 15–20] and Ciesielski [10]. In one or another form it can be found in many
probability textbooks, e.g. Bass [3, pp. 11–13] or Steele [45, pp. 35–39]. A related
construction of Wiener, see Paley and Wiener [34, Chapter XI], using random
Fourier series, is discussed in Kahane [23, §16.1–3].
Problems
24.1. Prove the orthogonality relation for the Jacobi polynomials 24.1.
24.2. Use the Gram–Schmidt orthonormalization procedure to verify the formulae for
the first few Chebyshev, Legendre, Laguerre and Hermite polynomials given in
24.1–24.5.
24.3. State and prove Theorem 24.6 and Corollary 24.8 for an arbitrary compact interval
$[a, b]$.
24.4. Prove the orthogonality relations (24.4) for the trigonometric system.
[Hint: observe that $\operatorname{Im}\big( e^{i(x+y)} + e^{i(x-y)} \big) = 2 \sin x \cos y$.]
24.5. (i) Show that for suitable constants $c_j, s_j \in \mathbb{R}$ and all $k \in \mathbb{N}_0$
$$\cos^k x = \sum_{j=0}^{k} c_j \cos jx \quad\text{and}\quad \sin^{k+1} x = \sum_{j=1}^{k+1} s_j \sin jx.$$
(ii) Show that for suitable constants $a_j, b_j \in \mathbb{R}$ and all $k \in \mathbb{N}$
$$\cos kx = \sum_{j=0}^{k} a_j \cos^{k-j} x\, \sin^j x \quad\text{and}\quad \sin kx = \sum_{j=1}^{k} b_j \cos^{k-j} x\, \sin^j x.$$
(iii) Deduce that every trigonometric polynomial $T_n(x)$ of order $n$ can be written
in the form
$$U_n(x) = \sum_{j,k=0}^{n} \gamma_{j,k} \cos^j x\, \sin^k x$$
and vice versa.
24.6. Use the formula $\sin a - \sin b = 2 \cos\frac{a+b}{2} \sin\frac{a-b}{2}$ to show that $D_N(x) \sin\frac{x}{2} = \frac12 \sin\big( N + \frac12 \big) x$. This proves (24.11).
24.7. Find the Fourier series expansion for the function $|\sin x|$.
24.8. Let $u(x) = 1_{[0,1)}(x)$. Show that the Haar–Fourier series for $u$ converges for all
$1 \le p < \infty$ in $L^p$-sense to $u$. Is this also true for the Haar wavelet expansion?
24.9. Show that the Haar–Fourier series for $u \in C_c(\mathbb{R})$ converges uniformly for every
$x \in \mathbb{R}$ to $u(x)$. Show that this remains true for functions $u \in C_\infty(\mathbb{R})$, i.e. the set
of continuous functions such that $\lim_{|x|\to\infty} u(x) = 0$.
[Hint: use the fact that $u \in C_c$ is uniformly continuous. For $u \in C_\infty$ observe that
$C_\infty = \overline{C_c}^{\,\|\bullet\|_\infty}$ (closure in sup-norm) and check that $|s_N(u; x)| \le \|u\|_\infty$.]
24.10. Extend Problem 24.9 to the Haar wavelet expansion.
[Hint: use Problem 24.9 and show that $\big\| E^{\mathcal{A}^\Delta_{-N}} u \big\|_\infty \xrightarrow{N\to\infty} 0$ for all $u \in C_c(\mathbb{R})$.]
24.11. Let $u(x) = 1_{[0,1/3)}(x)$. Prove that the Haar–Fourier series diverges at $x = \frac13$.
[Hint: verify $\liminf_{N\to\infty} s_N(u; \frac13) < \limsup_{N\to\infty} s_N(u; \frac13)$.]
Appendix A
lim inf and lim sup
For a sequence of real numbers $(a_j)_{j\in\mathbb{N}} \subset \mathbb{R}$ the limes inferior or lower limit is
defined as
$$\liminf_{j\to\infty} a_j := \sup_{k\in\mathbb{N}} \inf_{j\ge k} a_j, \tag{A.1}$$
and the limes superior or upper limit is defined as
$$\limsup_{j\to\infty} a_j := \inf_{k\in\mathbb{N}} \sup_{j\ge k} a_j. \tag{A.2}$$
Lower and upper limits of a sequence are always defined as numbers in
$[-\infty, +\infty]$. This is due to the fact that the
sequences $\big( \inf_{j\ge k} a_j \big)_{k\in\mathbb{N}} \subset [-\infty, +\infty)$ and $\big( \sup_{j\ge k} a_j \big)_{k\in\mathbb{N}} \subset (-\infty, +\infty]$ are
in- resp. decreasing, so that the $\sup_{k\in\mathbb{N}}$ and $\inf_{k\in\mathbb{N}}$ in (A.1) and (A.2) are
actually (improper) limits $\lim_{k\to\infty}$.
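The monotonicity of the tail infima and suprema is easy to watch on a concrete sequence. The sketch below uses $a_j = (-1)^j (1 + 1/j)$, whose lower and upper limits are $-1$ and $+1$. It only works inside a finite window of the sequence, so the printed values are approximations to (A.1) and (A.2); the helper names are hypothetical.

```python
from itertools import accumulate

def tail_infima(a):
    """inf_{j>=k} a_j for every start index k, computed inside the finite list a."""
    return list(accumulate(reversed(a), min))[::-1]

def tail_suprema(a):
    """sup_{j>=k} a_j for every start index k, computed inside the finite list a."""
    return list(accumulate(reversed(a), max))[::-1]

# a_j = (-1)^j (1 + 1/j), j = 1, ..., 2000
a = [(-1) ** j * (1 + 1 / j) for j in range(1, 2001)]
infs = tail_infima(a)
sups = tail_suprema(a)

# list index 999 corresponds to the tail starting at j = 1000
print(infs[999], sups[999])
```

The tail infima increase towards $-1$ and the tail suprema decrease towards $+1$ as the start index grows.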
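Let us collect a few simple properties of lim inf and lim sup.
A.1 Properties (of lim inf and lim sup). Let $(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$ be sequences
of real numbers.
(i) $\liminf_{j\to\infty} a_j = \lim_{k\to\infty} \inf_{j\ge k} a_j$ and $\limsup_{j\to\infty} a_j = \lim_{k\to\infty} \sup_{j\ge k} a_j$.
(ii) $\liminf_{j\to\infty} a_j = -\limsup_{j\to\infty} (-a_j)$.
(iii) $\liminf_{j\to\infty} a_j \le \limsup_{j\to\infty} a_j$.
(iv) $\liminf_{j\to\infty} a_j$ and $\limsup_{j\to\infty} a_j$ are limits of subsequences of $(a_j)_{j\in\mathbb{N}}$ and all other
limits $L$ of subsequences of $(a_j)_{j\in\mathbb{N}}$ satisfy
$$\liminf_{j\to\infty} a_j \le L \le \limsup_{j\to\infty} a_j.$$
(v) $\lim_{j\to\infty} a_j \in \mathbb{R}$ exists $\iff$ $-\infty < \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j < +\infty$.
In this case $\lim_{j\to\infty} a_j = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$.
(vi) $\liminf_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \le \liminf_{j\to\infty} (a_j + b_j)$,
$\limsup_{j\to\infty} (a_j + b_j) \le \limsup_{j\to\infty} a_j + \limsup_{j\to\infty} b_j$.
(vii) If $a_j, b_j \ge 0$ for all $j \in \mathbb{N}$, then
$$\liminf_{j\to\infty} a_j\, \liminf_{j\to\infty} b_j \le \liminf_{j\to\infty} a_j b_j, \qquad \limsup_{j\to\infty} a_j b_j \le \limsup_{j\to\infty} a_j\, \limsup_{j\to\infty} b_j.$$
(viii) $\liminf_{j\to\infty} (a_j + b_j) \le \liminf_{j\to\infty} a_j + \limsup_{j\to\infty} b_j \le \limsup_{j\to\infty} (a_j + b_j)$.
(ix) If, for all $j \in \mathbb{N}$, $a_j, b_j \ge 0$, then
$$\liminf_{j\to\infty} a_j b_j \le \liminf_{j\to\infty} a_j\, \limsup_{j\to\infty} b_j \le \limsup_{j\to\infty} a_j b_j.$$
(x) If the limit $\lim_{j\to\infty} a_j$ exists, then
$$\liminf_{j\to\infty} (a_j + b_j) = \lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j, \qquad \limsup_{j\to\infty} (a_j + b_j) = \lim_{j\to\infty} a_j + \limsup_{j\to\infty} b_j.$$
(xi) If $a_j, b_j \ge 0$ for all $j \in \mathbb{N}$ and if $\lim_{j\to\infty} a_j$ exists, then
$$\liminf_{j\to\infty} a_j b_j = \lim_{j\to\infty} a_j\, \liminf_{j\to\infty} b_j, \qquad \limsup_{j\to\infty} a_j b_j = \lim_{j\to\infty} a_j\, \limsup_{j\to\infty} b_j.$$
(xii) $\limsup_{j\to\infty} |a_j| = 0 \implies \lim_{j\to\infty} a_j = 0$.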
Proof (i) follows from the remark preceding A.1, (ii) is clear since
$$\inf_j a_j = -\sup_j (-a_j),$$
and (iii) follows from the inequality $\inf_{j\ge k} a_j \le \sup_{j\ge k} a_j$ where we can pass to
the limit $k \to \infty$ on both sides.
Notice that (ii) reduces any statement about lim sup to a dual statement for
lim inf. This means that we need to show (iv)–(xi) for the lower limit only.
(iv): Let $(a_{n(j)})_{j\in\mathbb{N}} \subset (a_j)_{j\in\mathbb{N}}$ be some subsequence with (improper) limit $L = \lim_{j\to\infty} a_{n(j)}$. Then
$$\inf_{j\ge k} a_j \le \inf_{j\ge k} a_{n(j)} \le L \implies \lim_{k\to\infty} \inf_{j\ge k} a_j \le L,$$
i.e. $\liminf_{j\to\infty} a_j$ is smaller than any limit of any subsequence. Let us now
construct a subsequence which has $L_* := \liminf_{j\to\infty} a_j > -\infty$ as its limit. By
the very definition of $L_*$ and the infimum we find for all $\epsilon > 0$ some $N_\epsilon \in \mathbb{N}$ such
that
$$\big| L_* - \inf_{j\ge k} a_j \big| \le \epsilon \qquad \forall\, k \ge N_\epsilon.$$
Since then $\inf_{j\ge k} a_j > -\infty$, we find by the definition of the infimum some
$\ell \ge k \ge N_\epsilon$, $\ell = \ell_{\epsilon,k}$, and $a_\ell$ with
$$\big| a_\ell - \inf_{j\ge k} a_j \big| \le \epsilon.$$
Specializing $\epsilon = \frac1n$, $n \in \mathbb{N}$, we obtain an infinite family of $a_{\ell(n)}$ from which we
can extract a subsequence with limit $L_*$.
If $L_* = -\infty$, the sequence $(a_j)_{j\in\mathbb{N}}$ is unbounded from below and it is obvious
that there must exist a subsequence tending to $-\infty$.
(v): If $\lim_{j\to\infty} a_j$ exists, then all subsequences converge and have the same limit,
thus $\liminf_{j\to\infty} a_j = \lim_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$ by (iv).
Conversely, if $L = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$, we get for all $k \in \mathbb{N}$
$$0 \le a_k - \inf_{j\ge k} a_j \le \sup_{j\ge k} a_j - \inf_{j\ge k} a_j \xrightarrow{k\to\infty} 0,$$
and $\lim_{k\to\infty} a_k = \lim_{k\to\infty} \inf_{j\ge k} a_j = L$ follows from a sandwiching argument.
(vi) follows immediately from
$$\inf_{j\ge k} a_j + \inf_{j\ge k} b_j \le a_\ell + b_\ell \quad \forall\, \ell \ge k \implies \inf_{j\ge k} a_j + \inf_{j\ge k} b_j \le \inf_{\ell\ge k} (a_\ell + b_\ell)$$
if we pass to the limit $k \to \infty$ on both sides.
(vii): We have $0 \le \inf_{j\ge k} b_j \le b_\ell$ for all $\ell \ge k$ and multiplying this inequality
with $0 \le \inf_{j\ge k} a_j \le a_\ell$, $\ell \ge k$, gives
$$\inf_{j\ge k} a_j\, \inf_{j\ge k} b_j \le a_\ell b_\ell \quad \forall\, \ell \ge k \implies \inf_{j\ge k} a_j\, \inf_{j\ge k} b_j \le \inf_{\ell\ge k} a_\ell b_\ell.$$
The assertion follows as we go to the limit $k \to \infty$ on both sides.
(viii): We have
$$\inf_{j\ge k} (a_j + b_j) \le a_\ell + b_\ell \le a_\ell + \sup_{j\ge k} b_j \qquad \forall\, \ell \ge k,$$
so that $\inf_{j\ge k} (a_j + b_j) \le \inf_{j\ge k} a_j + \sup_{j\ge k} b_j$, and the assertion follows as we go
to the limit $k \to \infty$ on both sides.
(ix) is similar to (viii) taking into account the precautions set out in (vii).
(x): If $\lim_{j\to\infty} a_j$ exists, we know from (v) that $\lim_{j\to\infty} a_j = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$. Thus
$$\begin{aligned} \lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j &\overset{A.1(v)}{=} \liminf_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \overset{A.1(vi)}{\le} \liminf_{j\to\infty} (a_j + b_j) \\ &\overset{A.1(viii)}{\le} \limsup_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \overset{A.1(v)}{=} \lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j. \end{aligned}$$
(xi) is similar to (x) using (v),(vii) and (ix).
(xii): since $|a_j| \ge 0$,
$$0 \le \liminf_{j\to\infty} |a_j| \overset{A.1(iii)}{\le} \limsup_{j\to\infty} |a_j| = 0,$$
and we conclude from (v) that
$$\lim_{j\to\infty} |a_j| = \liminf_{j\to\infty} |a_j| = \limsup_{j\to\infty} |a_j| = 0.$$
Thus $\lim_{j\to\infty} a_j = 0$.
∗ ∗ ∗
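Sometimes the following definitions for upper and lower limits of a sequence of
sets $(A_j)_{j\in\mathbb{N}}$, $A_j \subset X$, are used:
$$\liminf_{j\to\infty} A_j := \bigcup_{k\in\mathbb{N}} \bigcap_{j\ge k} A_j \quad\text{and}\quad \limsup_{j\to\infty} A_j := \bigcap_{k\in\mathbb{N}} \bigcup_{j\ge k} A_j. \tag{A.3}$$
The connection between set-theoretic and numerical upper and lower limits is
given by
A.2 Lemma For all $x \in X$ we have
$$\liminf_{j\to\infty} 1_{A_j}(x) = 1_{\liminf_{j\to\infty} A_j}(x), \tag{A.4}$$
$$\limsup_{j\to\infty} 1_{A_j}(x) = 1_{\limsup_{j\to\infty} A_j}(x). \tag{A.5}$$
Proof Note that
$$1_{\bigcap_{k\in\mathbb{N}} B_k} = \inf_{k\in\mathbb{N}} 1_{B_k} \quad\text{and}\quad 1_{\bigcup_{k\in\mathbb{N}} B_k} = \sup_{k\in\mathbb{N}} 1_{B_k}$$
which follows from
$$\begin{aligned} 1_{\bigcap_{k\in\mathbb{N}} B_k}(x) = 1 &\iff x \in \bigcap_{k\in\mathbb{N}} B_k \\ &\iff \forall\, k \in \mathbb{N} : x \in B_k \\ &\iff \forall\, k \in \mathbb{N} : 1_{B_k}(x) = 1 \\ &\iff \inf_{k\in\mathbb{N}} 1_{B_k}(x) = 1. \end{aligned}$$
A similar argument proves the assertion for $\sup_{k\in\mathbb{N}} 1_{B_k}$. Hence,
$$1_{\liminf_{j\to\infty} A_j} = 1_{\bigcup_{k\in\mathbb{N}} \bigcap_{j\ge k} A_j} = \sup_{k\in\mathbb{N}} 1_{\bigcap_{j\ge k} A_j} = \sup_{k\in\mathbb{N}} \inf_{j\ge k} 1_{A_j} = \liminf_{j\to\infty} 1_{A_j},$$
and (A.5) follows analogously.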
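Lemma A.2 can be checked on a small example. The sketch below is only an illustration under a strong assumption: the sequence of sets is eventually periodic and the finite list passed in is long enough that every tail starting in its first half already shows the periodic behaviour, so the finite computation agrees with (A.3). The function names and the sample sets are hypothetical.

```python
def set_liminf(sets):
    """Union over k of the intersections of the tails (A_j)_{j>=k}, k in the first half."""
    half = len(sets) // 2
    return set().union(*(set.intersection(*map(set, sets[k:])) for k in range(half)))

def set_limsup(sets):
    """Intersection over k of the unions of the tails (A_j)_{j>=k}, k in the first half."""
    half = len(sets) // 2
    return set.intersection(*(set().union(*sets[k:]) for k in range(half)))

def num_liminf(values):
    """sup_k inf_{j>=k} of a finite list, with k restricted to the first half."""
    half = len(values) // 2
    return max(min(values[k:]) for k in range(half))

def num_limsup(values):
    """inf_k sup_{j>=k} of a finite list, with k restricted to the first half."""
    half = len(values) // 2
    return min(max(values[k:]) for k in range(half))

# A_1 = {9}, afterwards the sets alternate between {1,2} and {2,3}
sets = [{9}] + [{1, 2}, {2, 3}] * 10
lower, upper = set_liminf(sets), set_limsup(sets)
print(lower, upper)

# (A.4) and (A.5): compare with lim inf / lim sup of the indicator sequences
checks = all(
    num_liminf([int(x in A) for A in sets]) == int(x in lower) and
    num_limsup([int(x in A) for A in sets]) == int(x in upper)
    for x in (1, 2, 3, 9)
)
```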
Appendix B
Some facts from point-set topology
The following diagram gives a survey of various types of abstract spaces used in this book. The arrows '→' indicate how the spaces are connected. In brackets we mention the key concepts that define the notion of convergence in these spaces.
[Diagram: $\mathbb{R}^n$; Hilbert space (scalar product, complete); Banach space (norm, complete); inner product space (scalar product); normed space (norm); metric space (distance); topological space (open set).]
Note that due to the Riesz–Fischer theorem 12.7 the space $(L^2, \langle\bullet,\bullet\rangle)$ is a Hilbert space and all $(L^p, \|\bullet\|_p)$, $1 \le p < \infty$, are Banach spaces.
The material below can be found in many introductory texts on general topology
and real analysis. For this compilation we used the books by Willard [54], Steen
and Seebach [46] and Rudin [39]. Complete proofs are given in [54] and in the
first few chapters of [39].
Topological spaces
Topological spaces are characterized by the notion of openness of sets.
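B.1 Definition A topological space $(X, \mathcal{O})$ consists of a set $X$ and a system $\mathcal{O} = \mathcal{O}(X)$ of subsets of $X$, called a topology, which satisfies the following properties:
\[ \emptyset,\, X \in \mathcal{O}, \tag{$\mathcal{O}_1$} \]
\[ U, V \in \mathcal{O} \implies U \cap V \in \mathcal{O}, \tag{$\mathcal{O}_2$} \]
\[ U_i \in \mathcal{O},\ i \in I \text{ (arbitrary)} \implies \bigcup_{i\in I} U_i \in \mathcal{O}. \tag{$\mathcal{O}_3$} \]
A set $U \in \mathcal{O}$ is called an open set. A set $F \subset X$ is closed, if its complement $F^c$ is open. We write $\mathcal{C} = \mathcal{C}(X)$ for the family of closed sets in $X$.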
From de Morgan’s identities (2.2) it is not hard to see that
• X and ∅ are closed sets,
• unions of finitely many closed sets are again closed,
• intersections of arbitrarily many closed sets are again closed.
B.2 Examples Let $X$ be an arbitrary set.
(i) $\{\emptyset, X\}$ is a topology on $X$.
(ii) The power set $\mathcal{P}(X)$ is a topology on $X$.
(iii) Let $U$ be a 'classical' open set in $\mathbb{R}^n$, i.e. for every $x \in U$ one can find some $\epsilon > 0$ such that $B_\epsilon(x) \subset U$. The classical open sets $\mathcal{O}(\mathbb{R}^n)$ are a topology in $\mathbb{R}^n$. Unless otherwise stated, we will always consider this natural topology on $\mathbb{R}^n$.
(iv) (Trace topology) Let $(X, \mathcal{O}(X))$ be a topological space and $A \subset X$ be any subset. Then the relatively open subsets of $A$,
\[ \mathcal{O}(A) = A \cap \mathcal{O}(X) = \{A \cap U : U \in \mathcal{O}(X)\}, \]
turn $(A, \mathcal{O}(A))$ into a topological space.
(v) (Product topology) Let $(X, \mathcal{O}(X))$ and $(Y, \mathcal{O}(Y))$ be topological spaces. Then $X \times Y$ becomes a topological space under the product topology $\mathcal{O}(X \times Y)$: by definition, a set $W \in \mathcal{O}(X \times Y)$ if $W \subset X \times Y$ and if for each $w = (x, y) \in W$ there exist $U \in \mathcal{O}(X)$ and $V \in \mathcal{O}(Y)$ such that
\[ w = (x, y) \in U \times V \subset W. \]
This makes $\mathcal{O}(X \times Y)$ the smallest topology containing $\mathcal{O}(X) \times \mathcal{O}(Y)$.
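On a finite set the three topology axioms can be verified mechanically. The sketch below is an illustration only: the function names `is_topology` and `trace_topology` are hypothetical, and the check relies on the assumption that the family of sets is finite, so that closure under pairwise unions and intersections already gives closure under all unions and finite intersections.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the axioms of a topology for a finite family of subsets of a finite set X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    # The empty set and X itself must be open.
    if frozenset() not in opens or X not in opens:
        return False
    # For a finite family, pairwise checks give all finite intersections and all unions.
    for U, V in combinations(opens, 2):
        if (U & V) not in opens or (U | V) not in opens:
            return False
    return True


def trace_topology(opens, A):
    """Relatively open subsets of A, as in the trace topology example."""
    A = frozenset(A)
    return {A & frozenset(U) for U in opens}


X = {1, 2, 3}
chain = [set(), {1}, {1, 2}, {1, 2, 3}]      # a topology on X
not_closed = [set(), {1}, {2}, {1, 2, 3}]    # {1} | {2} is missing

print(is_topology(X, chain))                              # True
print(is_topology(X, not_closed))                         # False
print(is_topology({2, 3}, trace_topology(chain, {2, 3}))) # True
```

The last call illustrates the trace topology: intersecting every open set of the chain with $A = \{2, 3\}$ gives $\{\emptyset, \{2\}, \{2, 3\}\}$, which is again a topology on $A$.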
B.3 Definition Let $(X, \mathcal{O})$ be a topological space.
(i) An open neighbourhood of a point $x \in X$ is an open set $U = U(x)$ containing $x$. A neighbourhood of $x$ is any set containing an open neighbourhood of $x$.
(ii) The space $X$ is called separated or a Hausdorff space if any two different points $x, y \in X$ have disjoint neighbourhoods.
(iii) Let $A \subset X$. The closure of $A$, denoted by $\bar{A}$, is the smallest closed set containing $A$, i.e. $\bar{A} = \bigcap_{F \in \mathcal{C},\, F \supset A} F$.
(iv) Let $A \subset X$. The (open) interior of $A$, denoted by $A^\circ$, is the largest open set inside $A$, i.e. $A^\circ = \bigcup_{U \in \mathcal{O},\, U \subset A} U$.
(v) A set $A \subset X$ is dense in $X$, if $\bar{A} = X$.
(vi) The space $X$ is separable if it contains a countable dense subset.
B.4 Examples (i) The space $(\mathbb{R}^n, \mathcal{O}(\mathbb{R}^n))$ is a Hausdorff space.
(ii) The space $(X, \mathcal{P}(X))$ and all spaces mentioned in the diagram at the beginning of the section are Hausdorff spaces.
(iii) The space $(X, \{\emptyset, X\})$ is not separated.
(iv) A set $U$ in a topological space $(X, \mathcal{O})$ is open if, and only if, it is a neighbourhood of each of its points.
(v) The open ball $B_r(x) = \{y \in \mathbb{R}^n : |x - y| < r\}$ in $\mathbb{R}^n$ is an open neighbourhood of $x$. The closed ball $K_r(x) = \{y \in \mathbb{R}^n : |x - y| \le r\}$ is the closure of $B_r(x)$, thus $\overline{B_r(x)} = K_r(x)$.
(vi) The set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}$. Therefore $\mathbb{R}$ is separable. The same is true for $\mathbb{R}^n$ when we consider the countable dense set $\mathbb{Q}^n$.
Density assertions are often expressed through approximation theorems such as Corollary 12.11, Theorem 24.6 or Corollary 24.12.
B.5 Definition A subset $K$ of a Hausdorff space $(X, \mathcal{O})$ is called compact, if every cover of $K$ by open sets, $K \subset \bigcup_{i\in I} U_i$, $U_i \in \mathcal{O}$, $I$ an arbitrary index set, admits a finite sub-cover, i.e. if there are finitely many $U_{i_1}, \dots, U_{i_n}$ such that $K \subset U_{i_1} \cup \dots \cup U_{i_n}$. A set $L$ is relatively compact if $\bar{L}$ is compact.
B.6 Proposition Let $(X, \mathcal{O})$ be a Hausdorff space.
(i) Every compact set $K$ is closed.
(ii) Closed subsets of compact sets are compact.
(iii) A family $(K_i)_{i\in I}$ of compact sets (indexed by an arbitrary set $I$) has non-empty intersection $\bigcap_{i\in I} K_i \ne \emptyset$ if, and only if, every finite subcollection $(K_{i_j})_{j=1}^n$ has non-empty intersection $K_{i_1} \cap K_{i_2} \cap \dots \cap K_{i_n} \ne \emptyset$.
B.7 Example A set $K \subset \mathbb{R}^n$ is compact if, and only if, it is closed and bounded. This is also equivalent to saying that every sequence $(x_j)_{j\in\mathbb{N}} \subset K$ has a convergent subsequence with limit in $K$. Such a simple characterization of compactness fails in infinite-dimensional spaces, notably in the Hilbert space $L^2$ or the Banach spaces $L^p$, $1 \le p < \infty$, see Theorem B.22 and B.27.
Proposition B.6(iii) is an abstract version of the well-known interval principle in $\mathbb{R}$: a sequence of nested closed intervals $[a_j, b_j] \subset \mathbb{R}$, $j \in \mathbb{N}$, has non-empty intersection $\bigcap_{j\in\mathbb{N}} [a_j, b_j] \ne \emptyset$. If, in addition, $\lim_{j\to\infty} (b_j - a_j) = 0$ then $\bigcap_{j\in\mathbb{N}} [a_j, b_j] = \{L\}$ where $L = \lim_{j\to\infty} a_j = \lim_{j\to\infty} b_j$.
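The interval principle is what makes the bisection method work. As a sketch only (the function name `bisect_intervals` and the choice $f(x) = x^2 - 2$ on $[1, 2]$ are assumptions made for this illustration), the following Python code produces nested closed intervals $[a_j, b_j]$ whose lengths halve in every step, so that their intersection is the single point $L = \sqrt{2}$.

```python
import math


def bisect_intervals(f, a, b, steps):
    """Return nested closed intervals [a_j, b_j] with f(a_j) <= 0 <= f(b_j).

    Assumes f(a) < 0 < f(b) for the initial interval.
    """
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m          # sign change in the left half: keep [a, m]
        else:
            a = m          # otherwise keep the right half [m, b]
        intervals.append((a, b))
    return intervals


intervals = bisect_intervals(lambda x: x * x - 2, 1, 2, 40)
a_last, b_last = intervals[-1]
print(a_last, b_last, math.sqrt(2))
```

Each interval is contained in the previous one, the left end points increase, the right end points decrease, and both approach the common limit $L$.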
B.8 Definition Let $(X, \mathcal{O}(X))$ and $(Y, \mathcal{O}(Y))$ be two topological spaces. A map $f : X \to Y$ is called continuous at $x \in X$, if for every neighbourhood $V = V(f(x))$ of $f(x)$ we can find a neighbourhood $U = U(x)$ of $x$ such that $f(U) \subset V$. If $f$ is continuous at every $x \in X$, we call $f$ continuous.
B.9 Example Definition B.8 coincides on Euclidean spaces with the classical notion of continuity, i.e. a map $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $x \in \mathbb{R}^n$ if, and only if, for every convergent sequence $x_j \xrightarrow{j\to\infty} x$ we have $f(x_j) \xrightarrow{j\to\infty} f(x)$, cf. Theorem B.19.
B.10 Definition Let $(X, \mathcal{O})$ be a topological space. A set $A \subset X$ is called connected, if $A$ cannot be written in the form $A = U \cup V$ where $U, V \in \mathcal{O}(A)$ are non-empty relatively open subsets of $A$ with $U \cap V = \emptyset$.
The set $A$ is called pathwise connected, if for any two points $x, y \in A$ there is a continuous curve or path $\gamma : [0, 1] \to A$ such that $\gamma(0) = x$ and $\gamma(1) = y$.
B.11 Examples (i) The only connected sets in $\mathbb{R}$ are finite or infinite intervals. The set $(a, b) \cup (c, d)$ where $a < b < c < d$ is not connected.
(ii) Pathwise connected sets are connected; the converse is, in general, false: the set $V = \{(x, 0) \in \mathbb{R}^2 : x \le 0\} \cup \{(x, \sin\frac{1}{x}) \in \mathbb{R}^2 : x > 0\}$ is connected, but no path can be found from $(0, 0)$ to any point $(x, \sin\frac{1}{x})$.
B.12 Theorem Let $f : X \to Y$ be a map between the topological spaces $X$ and $Y$.
(i) The map $f$ is continuous if, and only if, for all open $V \in \mathcal{O}(Y)$ the pre-image $f^{-1}(V) \in \mathcal{O}(X)$ is open.
(ii) Let $f$ be continuous. The image $f(K) \subset Y$ of a compact [connected, pathwise connected] set $K \subset X$ is again compact [connected, pathwise connected].
(iii) Let $K \subset X$ be a compact set. A continuous map¹ $g : K \to \mathbb{R}$ attains its maximum and minimum.
(iv) Let $K \subset X$ be a compact set and $f : K \to \mathbb{R}$ be an injective and continuous map. Then the inverse map $f^{-1} : f(K) \to K$ exists and is continuous.
¹ We consider here the trace topology $\mathcal{O}(K)$, cf. Example B.2.
Since for our purposes the characterization of continuity by open sets is of central importance, cf. Example 7.3, we include the short
Proof (of Theorem B.12(i)) '⇐' Assume first that $f^{-1}(\mathcal{O}(Y)) \subset \mathcal{O}(X)$. Every neighbourhood $\tilde{V}$ of $f(x)$ contains by definition an open set $V \subset \tilde{V}$ with $f(x) \in V$. By assumption, $U = f^{-1}(V)$ is open, and since $x \in U$, $U$ is an (open) neighbourhood of $x$ with $f(U) = f(f^{-1}(V)) \subset V \subset \tilde{V}$.
'⇒' Assume now that $f$ is continuous. Take any open set $B \subset Y$, set $A = f^{-1}(B)$ and fix some $x \in A$. Since $B$ is open, there is some open neighbourhood $V = V(f(x)) \subset B$ and by continuity we find some neighbourhood $U = U(x) \subset X$ of $x$ with $f(U) \subset V$. Thus
\[ U \subset f^{-1}(f(U)) \subset f^{-1}(V) \subset f^{-1}(B) \overset{\text{def}}{=} A, \]
which shows that $A$ contains for each of its points a whole neighbourhood. This is to say that $A$ is open.
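The pre-image criterion of Theorem B.12(i) is easy to test on finite topological spaces. The following Python sketch is an illustration under assumptions: maps are stored as dictionaries, the helper names `preimage` and `is_continuous` are hypothetical, and the two-point space with topology $\{\emptyset, \{a\}, X\}$ is chosen only as a small example.

```python
def preimage(f, domain, V):
    """The pre-image f^{-1}(V) of a set V under the map f (a dict on `domain`)."""
    return frozenset(x for x in domain if f[x] in V)


def is_continuous(f, X, opens_X, opens_Y):
    """f is continuous iff the pre-image of every open set of Y is open in X."""
    opens_X = {frozenset(U) for U in opens_X}
    return all(preimage(f, X, V) in opens_X for V in opens_Y)


X = {"a", "b"}
topology = [frozenset(), frozenset({"a"}), frozenset(X)]

identity = {"a": "a", "b": "b"}
swap = {"a": "b", "b": "a"}
constant = {"a": "a", "b": "a"}

print(is_continuous(identity, X, topology, topology))  # True
print(is_continuous(swap, X, topology, topology))      # False
print(is_continuous(constant, X, topology, topology))  # True
```

The map `swap` fails because the pre-image of the open set $\{a\}$ is $\{b\}$, which is not open in this topology.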
B.13 Example Let $g : [a, b] \to \mathbb{R}$ be a continuous function. Since $[a, b]$ is compact, $g$ attains its maximum $M = \sup g([a, b]) = g(x_{\max})$ and minimum $m = \inf g([a, b]) = g(x_{\min})$ at some points $x_{\max}, x_{\min} \in [a, b]$. Since $[a, b]$ is compact and pathwise connected, so is $g([a, b])$, hence it is of the form $[m, M]$. In particular, we have recovered the intermediate value theorem for functions of a real variable.
B.14 Definition Let $(x_j)_{j\in\mathbb{N}} \subset X$ be a sequence in the topological space $(X, \mathcal{O})$. We say that $x_j$ converges to $x \in X$ and write $\lim_{j\to\infty} x_j = x$ or $x_j \xrightarrow{j\to\infty} x$ if for every open neighbourhood $U = U(x)$ there is some $N = N_U \in \mathbb{N}$ such that $x_j \in U$ for all $j \ge N_U$.
This is also the 'usual' convergence in the spaces $\mathbb{R}$ and $\mathbb{R}^n$. Note that limits are only unique if $X$ is a Hausdorff space. Sometimes we can use limits of sequences to give an equivalent description of the topology. This is always the case if every point $x \in X$ has a countable system of open neighbourhoods $(U_n)_{n\in\mathbb{N}}$ with the property that for every neighbourhood $V = V(x)$ of $x$ there is at least one $U_{n_0} \subset V$; this is always true in metric spaces, cf. B.19.
Metric spaces
In metric spaces we have a notion of distance between any two points.
B.15 Definition A metric space $(X, d)$ is a set $X$ with a distance function or metric $d : X \times X \to [0, \infty)$ such that for all $x, y, z \in X$
\[ \text{(definiteness)} \quad d(x, y) = 0 \iff x = y, \tag{$d_1$} \]
\[ \text{(symmetry)} \quad d(x, y) = d(y, x), \tag{$d_2$} \]
\[ \text{(triangle inequality)} \quad d(x, y) \le d(x, z) + d(z, y). \tag{$d_3$} \]
B.16 Examples (i) Let $(X, d)$ be a metric space. Then $(A, d_A)$ with $d_A = d|_{A \times A}$ is again a metric space for all $A \subset X$.
(ii) The real line $\mathbb{R}$ is a metric space with $d(x, y) = |x - y|$. The space $\mathbb{R}^n$ becomes a metric space with each of the following metrics:
\[ d_p(x, y) = \begin{cases} \Big( \sum_{j=1}^n |x_j - y_j|^p \Big)^{1/p} & \text{if } 1 \le p < \infty, \\ \max_{1 \le j \le n} |x_j - y_j| & \text{if } p = \infty. \end{cases} \]
(iii) The topological space $(X, \mathcal{P}(X))$ is a metric space with metric
\[ d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y. \end{cases} \]
(iv) Let $(X_j, d_j)$, $j = 1, 2$, be two metric spaces. Then $X_1 \times X_2$ becomes a metric space for any of the following metrics ($x_j, y_j \in X_j$, $1 \le p < \infty$):
\[ \rho_p((x_1, x_2), (y_1, y_2)) = \big( d_1^p(x_1, y_1) + d_2^p(x_2, y_2) \big)^{1/p} \]
or
\[ \rho_\infty((x_1, x_2), (y_1, y_2)) = \max_{j=1,2} d_j(x_j, y_j). \]
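The metrics $d_p$ on $\mathbb{R}^n$ are simple to compute. The following Python sketch is only an illustration: the function names `d_p` and `triangle_holds`, the random sample points and the numerical tolerance are assumptions of this example, and a check on finitely many points is of course no proof of the triangle inequality.

```python
import math
import random


def d_p(x, y, p):
    """The metric d_p on R^n for 1 <= p < infinity, or the maximum metric for p = inf."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def triangle_holds(points, p, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(z, y) for all triples from a finite sample."""
    return all(
        d_p(x, y, p) <= d_p(x, z, p) + d_p(z, y, p) + tol
        for x in points for y in points for z in points
    )


random.seed(0)
sample = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(8)]

print(d_p((0, 0), (3, 4), 1))         # 7.0
print(d_p((0, 0), (3, 4), 2))         # 5.0
print(d_p((0, 0), (3, 4), math.inf))  # 4
print(all(triangle_holds(sample, p) for p in (1, 2, 3, math.inf)))
```

The three printed distances between $(0,0)$ and $(3,4)$ show how the same pair of points is measured differently by $d_1$, $d_2$ and $d_\infty$.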
B.17 Definition Let $(X, d)$ be a metric space. We call
\[ B_r(x) = \{y \in X : d(x, y) < r\} \quad\text{resp.}\quad K_r(x) = \{y \in X : d(x, y) \le r\} \]
an open resp. closed ball with centre $x$ and radius $r > 0$. An open set is a set $U \subset X$ such that for every $x \in U$ there is some $\epsilon > 0$ with $B_\epsilon(x) \subset U$. Closed sets arise as complements of open sets.
Using the triangle inequality it is easy to see that open balls in $X$ are also open sets and that closed balls are closed sets. Mind, however, that in general $\overline{B_r(x)} \subsetneq K_r(x)$.
B.18 Lemma The family of open sets $\mathcal{O}$ of a metric space $X$ is a topology in the sense of Definition B.1. $(X, \mathcal{O})$ is a separated topological space.
The converse of Lemma B.18 is false: the topology $\{\emptyset, \{a\}, X\}$ of the space $X = \{a, b\}$ cannot be generated by any metric.
The topology of metric spaces can be described by sequences.
B.19 Theorem Let $(X, d)$, $(Y, \rho)$ be metric spaces.
(i) A sequence $(x_j)_{j\in\mathbb{N}} \subset X$ converges to $x$, $x_j \xrightarrow{j\to\infty} x$, if, and only if, $d(x_j, x) \xrightarrow{j\to\infty} 0$. Moreover, the limit $x$ is unique.
(ii) A set $F \subset X$ is closed if, and only if, every convergent sequence $(x_j)_{j\in\mathbb{N}} \subset F$ has its limit $\lim_{j\to\infty} x_j \in F$.
(iii) A set $K \subset X$ is compact if, and only if, every sequence $(x_j)_{j\in\mathbb{N}} \subset K$ has a convergent subsequence whose limit is in $K$.
(iv) A set $A \subset X$ is dense if, and only if, for every $x \in X$ there is a sequence $(a_j)_{j\in\mathbb{N}} \subset A$ with $d(a_j, x) \xrightarrow{j\to\infty} 0$.
(v) A function $f : X \to Y$ is continuous at $x \in X$ if, and only if, for every sequence $x_j \xrightarrow{j\to\infty} x$ we have $f(x_j) \xrightarrow{j\to\infty} f(x)$.
Since for our purposes the characterization of continuity is of central importance, cf. Example 7.3, we include the short
Proof (of Theorem B.19(v)) We begin with the observation that every neighbourhood $\tilde{U} = \tilde{U}(x)$ of a point $x \in X$ contains some open set $U \subset \tilde{U}$ with $x \in U$. Since $U$ is open, we find by definition some $\delta > 0$ such that $B_\delta(x) \subset U$. This shows that we can restate the definition of continuity B.8 at a point $x$ in the following form:
\[ \forall\, \epsilon > 0 \quad \exists\, \delta > 0 : \quad f(B_\delta(x)) \subset B_\epsilon(f(x)) \]
(mind that the balls are taken in $X$ and $Y$, respectively).
'⇒': If $x_j \xrightarrow{j\to\infty} x$, we know from the definition of convergence that for every $\delta > 0$ there is some $N = N_\delta$ such that $x_j \in B_\delta(x)$ for all $j \ge N_\delta$. Since $f$ is continuous at $x$, we can choose for every $\epsilon > 0$ some $\delta = \delta_\epsilon > 0$ such that $f(B_\delta(x)) \subset B_\epsilon(f(x))$. Thus
\[ f(x_k) \in f(\{x_j : j \ge N_\delta\}) \subset f(B_\delta(x)) \subset B_\epsilon(f(x)) \qquad \forall\, k \ge N_\delta, \]
which shows that $f(x_k) \xrightarrow{k\to\infty} f(x)$.
'⇐': Assume that $x_j \xrightarrow{j\to\infty} x$ implies $f(x_j) \xrightarrow{j\to\infty} f(x)$ but that $f$ is not continuous at $x$. Thus there is some $\epsilon > 0$, such that for all $n \in \mathbb{N}$ the set $f(B_{1/n}(x))$ is not (entirely) contained in $B_\epsilon(f(x))$. Thus we can pick for each $n \in \mathbb{N}$ some $x_n \in B_{1/n}(x)$, such that $f(x_n) \notin B_\epsilon(f(x))$. This means, however, that $x_n \xrightarrow{n\to\infty} x$ while $d(f(x_n), f(x)) \ge \epsilon > 0$ for all $n \in \mathbb{N}$, contradicting that $f(x_n)$ converges to $f(x)$.
B.20 Definition Let $(X, d)$ be a metric space. A sequence $(x_j)_{j\in\mathbb{N}}$ is a Cauchy sequence, if
\[ \forall\, \epsilon > 0 \quad \exists\, N = N_\epsilon \in \mathbb{N} \quad \forall\, j, k \ge N_\epsilon : \quad d(x_j, x_k) \le \epsilon. \]
A metric space is complete if every Cauchy sequence converges.
An isometry is a surjective map $j : X \to Y$ between two metric spaces $(X, d)$ and $(Y, \rho)$ which satisfies $d(x, x') = \rho(j(x), j(x'))$.
B.21 Theorem (Completion) For every metric space $(X, d)$ there exists a complete metric space $(\hat{X}, \hat{d})$ such that $\hat{d}|_{X \times X} = d$ and $X \subset \hat{X}$ is a dense subset. Any two completions of $X$ are, up to isometries, identical.
By covering a compact set $K$ with the open sets $\{B_1(x)\}_{x\in K}$ and extracting a finite subcover we can easily see that $K$ has finite diameter $\operatorname{diam}(K) = \sup_{x, y \in K} d(x, y)$ and is, therefore, bounded. Thus compact sets are closed and bounded. The converse is, in general, not true; however,
B.22 Theorem (Heine–Borel) A subset of $\mathbb{R}^n$ is compact if, and only if, it is closed and bounded. Moreover, all metrics on $\mathbb{R}^n$ which are induced by a norm are equivalent in the sense that for any two such metrics $d$ and $\rho$ there are absolute constants $c, C > 0$ such that
\[ c\, d(x, y) \le \rho(x, y) \le C\, d(x, y) \qquad \forall\, x, y \in \mathbb{R}^n. \]
Normed spaces
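B.23 Definition A normed space $(X, \|\bullet\|)$ is a $\mathbb{K}$-vector space² $X$ with a norm $\|\bullet\|$, i.e. a map $\|\bullet\| : X \to [0, \infty)$ which satisfies for $x, y \in X$ and $\alpha \in \mathbb{K}$ the following properties:
\[ \text{(definiteness)} \quad \|x\| > 0 \iff x \ne 0, \tag{$N_1$} \]
\[ \text{(pos. homogeneity)} \quad \|\alpha x\| = |\alpha| \cdot \|x\|, \tag{$N_2$} \]
\[ \text{(triangle inequality)} \quad \|x + y\| \le \|x\| + \|y\|. \tag{$N_3$} \]
If we drop the definiteness $(N_1)$, $\|\bullet\|$ is called a semi-norm and $(X, \|\bullet\|)$ is a semi-normed space.
² $\mathbb{K}$ stands for either $\mathbb{R}$ or $\mathbb{C}$.
B.24 Examples (i) The spaces $\mathbb{R}$, $\mathbb{C}$, $\mathbb{R}^n$, $\mathbb{C}^n$ equipped with
\[ \|x\| = \Big( \sum_{j=1}^n |x_j|^p \Big)^{1/p} \quad\text{or}\quad \|x\| = \max_{1 \le j \le n} |x_j| \]
($1 \le p < \infty$) are normed spaces.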
(ii) Let $(X_j, \|\bullet\|_j)$, $j = 1, 2$, be two normed spaces. Then $X_1 \times X_2$ becomes a normed space under any of the following norms ($x_j \in X_j$, $1 \le p < \infty$):
\[ \|(x_1, x_2)\|_{(p)} = \big( \|x_1\|_1^p + \|x_2\|_2^p \big)^{1/p} \quad\text{or}\quad \|(x_1, x_2)\|_{(\infty)} = \max_{j=1,2} \|x_j\|_j. \]
(iii) Every normed space is a metric space with metric given by $d(x, y) = \|x - y\|$. Therefore, all notions and results for metric spaces carry over to normed spaces.
In particular, open and closed balls are given by
\[ B_r(x) = \{y \in X : \|x - y\| < r\} \quad\text{and}\quad K_r(x) = \{y \in X : \|x - y\| \le r\}. \]
Since $X$ is a vector space, we have now $\overline{B_r(x)} = K_r(x)$.
However, not every metric space arises from a normed space, e.g. the metric $d(x, y) = 1$ or $0$ according to $x \ne y$ or $x = y$ on $\mathbb{R}^n$ cannot be realized by any norm.
B.25 Lemma Let $X$ be a normed space. Then the following maps are continuous:
\[ X \ni x \mapsto \|x\|, \qquad X \times X \ni (x, y) \mapsto x + y, \qquad \mathbb{K} \times X \ni (\alpha, x) \mapsto \alpha x. \]
B.26 Definition A Banach space is a complete normed space.
The following result, due to F. Riesz, says that the Heine–Borel theorem B.22
holds if, and only if, the underlying space is finite-dimensional.
B.27 Theorem (Riesz). In a normed space V closed and bounded sets are
compact if, and only if, V is finite-dimensional.
Let $\sim$ be an equivalence relation on the normed space $X$. We write $[x] = \{y \in X : x \sim y\}$ for the equivalence class with representative $x$. The quotient space $X/\!\sim$ consists of all equivalence classes. It is not hard to see that $X/\!\sim$ is again a vector space and that
\[ [\alpha x + \beta y] = \alpha [x] + \beta [y] \qquad \forall\, \alpha, \beta \in \mathbb{K},\ x, y \in X. \]
B.28 Theorem Let $(X, \|\bullet\|)$ be a (complete) normed space. Then $X/\!\sim$ is a (complete) normed space under the quotient norm given by
\[ \|[x]\|_\sim = \inf\{\|y\| : y \in [x]\}. \]
Essentially the same procedure allows us to turn any semi-normed space $(X, \|\bullet\|)$ into a normed space. We use the following equivalence relation for $x, y \in X$:
\[ x \approx y \iff \|x - y\| = 0, \]
and observe that
\[ \inf\{\|y\| : y \in [x]\} = \|x\|. \]
B.29 Corollary Let $(X, \|\bullet\|)$ be a (complete) semi-normed space. Then $X/\!\approx$ is a (complete) normed space with norm given by $\|[x]\|_\approx = \|x\|$.
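B.30 Example Denote by $\mathcal{L}^p(X, \mathcal{A}, \mu)$, $1 \le p < \infty$, the $p$th power integrable functions of the measure space $(X, \mathcal{A}, \mu)$. Then
\[ \|u\|_p = \Big( \int |u|^p \, d\mu \Big)^{1/p} \]
is a semi-norm on $\mathcal{L}^p(X, \mathcal{A}, \mu)$, and $L^p(X, \mathcal{A}, \mu) = \mathcal{L}^p(X, \mathcal{A}, \mu)/\!\sim$ is a Banach space if we identify $u, w \in \mathcal{L}^p(X, \mathcal{A}, \mu)$ whenever $\|u - w\|_p = 0$.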
Appendix C
The volume of a parallelepiped
In this appendix we give a simple derivation for the volume of the parallelepiped
\[ A\big([0, 1)^n\big) := \{Ax \in \mathbb{R}^n : x \in [0, 1)^n\}, \qquad A \in GL(n, \mathbb{R}), \]
for a non-degenerate $n \times n$ matrix $A \in \mathbb{R}^{n \times n}$.
C.1 Theorem $\lambda^n\big[A([0, 1)^n)\big] = |\det A|$ for all $A \in GL(n, \mathbb{R})$.
The proof of Theorem C.1 requires two auxiliary results.
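C.2 Lemma If $D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$, $\lambda_j > 0$, is a diagonal $n \times n$ matrix, then $\lambda^n(D(B)) = \det D \, \lambda^n(B)$ for all Borel sets $B \in \mathcal{B}(\mathbb{R}^n)$.
Proof Since both $D$ and $D^{-1}$ are continuous maps, $D(B)$ is a Borel set if $B \in \mathcal{B}(\mathbb{R}^n)$, cf. Example 7.3. In view of the uniqueness theorem 5.7 for measures it is enough to prove the lemma for half-open rectangles $[a, b) = [a_1, b_1) \times \dots \times [a_n, b_n)$, $a, b \in \mathbb{R}^n$. Obviously,
\[ D\big([a, b)\big) = \mathop{\times}_{j=1}^n [\lambda_j a_j, \lambda_j b_j) \]
and
\[ \lambda^n\big( D([a, b)) \big) = \prod_{j=1}^n (\lambda_j b_j - \lambda_j a_j) = \lambda_1 \cdot \ldots \cdot \lambda_n \prod_{j=1}^n (b_j - a_j) = \det D \, \lambda^n\big([a, b)\big). \]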
C.3 Lemma Every $A \in GL(n, \mathbb{R})$ can be written as $A = SDT$, where $S, T \in O(n)$ are orthogonal $n \times n$ matrices and $D = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$ is a diagonal matrix with positive entries $\lambda_j > 0$.
Proof The matrix ${}^t\!AA$ is symmetric and so we can find some orthogonal matrix $U \in O(n)$ such that
\[ {}^t U ({}^t\!AA) U = \tilde{D} = \operatorname{diag}[\mu_1, \dots, \mu_n]. \]
Since for $e_j := (0, \dots, 0, 1, 0, \dots, 0)$, with the entry $1$ in position $j$, and the Euclidean norm $\|\bullet\|$
\[ \mu_j = {}^t e_j \tilde{D} e_j = ({}^t e_j\, {}^t U\, {}^t\!A)(A U e_j) = \|A U e_j\|^2 > 0, \]
we can define $D := \sqrt{\tilde{D}} = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$ where $\lambda_j := \sqrt{\mu_j}$. Thus
\[ D^{-1}\, {}^t U\, {}^t\!A A U D^{-1} = \mathrm{id}_n, \]
and this proves that $S := A U D^{-1} \in O(n)$. Since $T := {}^t U \in O(n)$, we easily see that
\[ SDT = (A U D^{-1}) D\, {}^t U = A. \]
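Proof (of Theorem C.1) We have for $A \in GL(n, \mathbb{R})$
\begin{align*}
\lambda^n\big[A([0, 1)^n)\big]
&\overset{\text{C.3}}{=} \lambda^n\big[SDT([0, 1)^n)\big] \\
&\overset{\text{7.9}}{=} \lambda^n\big[DT([0, 1)^n)\big] \\
&\overset{\text{C.2}}{=} \det D \, \lambda^n\big[T([0, 1)^n)\big] \\
&\overset{\text{C.3}}{=} \det D \, \lambda^n\big([0, 1)^n\big).
\end{align*}
Since $S, T \in O(n)$, their determinants are either $+1$ or $-1$, and we conclude that $|\det A| = |\det(SDT)| = |\det S| \cdot |\det D| \cdot |\det T| = \det D$.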
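In dimension $n = 2$ the statement of Theorem C.1 can be checked directly, since the image of the unit square is a parallelogram whose area is given by the elementary shoelace formula. The Python sketch below is an illustration only; the helper names and the sample matrices are assumptions of this example, and it uses the closed unit square, which differs from $[0, 1)^2$ only by a Lebesgue null set.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    return a * d - b * c


def image_of_unit_square(A):
    """Vertices of the parallelogram A([0,1]^2) in boundary order: 0, Ae1, A(e1+e2), Ae2."""
    (a, b), (c, d) = A
    return [(0, 0), (a, c), (a + b, c + d), (b, d)]


def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in boundary order."""
    twice_area = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


for A in (((2, 1), (1, 3)), ((1, 2), (3, 4))):
    print(det2(A), shoelace_area(image_of_unit_square(A)))
```

The second matrix has negative determinant: the orientation of the parallelogram is reversed, but its area is still $|\det A|$.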
Appendix D
Non-measurable sets
Let $(X, \mathcal{A}, \mu)$ be a measure space and denote by $(X, \mathcal{A}^*, \bar{\mu})$ its completion, cf. Problem 4.13 for the definition and Problems 6.2, 10.11, 10.12, 13.11 and 15.3 for various properties. Here we only need that
\[ \mathcal{A}^* = \{A \cup N : A \in \mathcal{A},\ N \text{ is a subset of some } \mathcal{A}\text{-measurable } \mu\text{-null set}\} \]
is the completion of $\mathcal{A}$ with respect to the measure $\mu$. It is a natural question to ask how big $\mathcal{A}$ and $\mathcal{A}^*$ are and whether $\mathcal{A} \subset \mathcal{A}^* \subset \mathcal{P}(X)$ are proper inclusions. Sometimes, see Problems 6.10 or 6.11, these questions are easy to answer. For the Borel $\sigma$-algebra $\mathcal{A} = \mathcal{B}(\mathbb{R}^n)$ and Lebesgue measure $\mu = \lambda^n$ this is more difficult. The following definition helps to distinguish between sets in $\mathcal{B}(\mathbb{R}^n)$ and the completion $\mathcal{B}^*(\mathbb{R}^n)$ w.r.t. Lebesgue measure.
D.1 Definition The Lebesgue $\sigma$-algebra is the completion $\mathcal{B}^*(\mathbb{R}^n)$ of the Borel $\sigma$-algebra w.r.t. Lebesgue measure $\lambda^n$. A set $B \in \mathcal{B}^*(\mathbb{R}^n)$ is called Lebesgue measurable.
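The next theorem shows that there are 'as many' Lebesgue measurable sets as there are subsets of $\mathbb{R}^n$.
D.2 Theorem We have $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ for all $n \in \mathbb{N}$.
Proof Since $\mathcal{B}^*(\mathbb{R}^n) \subset \mathcal{P}(\mathbb{R}^n)$ we have that $\#\mathcal{B}^*(\mathbb{R}^n) \le \#\mathcal{P}(\mathbb{R}^n)$. On the other hand, we have seen in Problem 7.10 that the Cantor ternary set $C$ is an uncountable Borel measurable $\lambda^1$-null set of cardinality $\#\mathbb{R} = \mathfrak{c}$. Consequently, $\mathbb{R}^{n-1} \times C$ is a $\lambda^n$-null set. By definition of the Lebesgue $\sigma$-algebra, all sets in $\mathcal{P}(\mathbb{R}^{n-1} \times C)$ are Lebesgue measurable (null) sets, i.e. $\mathcal{P}(\mathbb{R}^{n-1} \times C) \subset \mathcal{B}^*(\mathbb{R}^n)$, and therefore $\#\mathcal{P}(\mathbb{R}^{n-1} \times C) \le \#\mathcal{B}^*(\mathbb{R}^n)$. Using the fact that there is a bijection between $C$ and $\mathbb{R}$ we also get $\#\mathcal{P}(\mathbb{R}^n) \le \#\mathcal{P}(\mathbb{R}^{n-1} \times C) \le \#\mathcal{B}^*(\mathbb{R}^n)$, and the Cantor–Bernstein theorem 2.7 proves that $\#\mathcal{P}(\mathbb{R}^n) = \#\mathcal{B}^*(\mathbb{R}^n)$.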
Unfortunately, we cannot use Theorem D.2 to decide whether there are sets
which are not Lebesgue measurable. To answer this question we need the axiom
of choice.
D.3 Axiom of choice (AC) Let $\{M_i : i \in I\}$ be a collection of non-empty and mutually disjoint subsets of $X$. Then there exists a set $L \subset \bigcup_{i\in I} M_i$ which contains exactly one element from each set $M_i$, $i \in I$.
Note that AC only asserts the existence of the set L but does not tell us how
or if the set L can be constructed at all. (This problem is at the heart of the
controversy over whether one should or should not accept AC.)
D.4 Theorem Assuming the axiom of choice, there exist non-Lebesgue measurable sets in $\mathbb{R}^n$.
Proof Assume first that $n = 1$. We will construct a non-Lebesgue measurable subset of $[0, 1)$. We call any two $x, y \in [0, 1)$ equivalent if
\[ x \sim y \iff x - y \in \mathbb{Q}. \]
The equivalence class containing $x$ is given by $[x] = \{y \in [0, 1) : x - y \in \mathbb{Q}\} = (x + \mathbb{Q}) \cap [0, 1)$. By construction, $[0, 1)$ is partitioned by a family of mutually disjoint equivalence classes $[x_j]$, $j \in J$.
By the axiom of choice¹ there exists a set $L$ which contains exactly one element, say $m_j$, from each of the classes $[x_j]$, $j \in J$. We will show that $L$ cannot be Lebesgue measurable.
Assume $L$ were Lebesgue measurable. Since for every $x \in [0, 1)$ we have $[x] \cap L = \{m_{j_0}\}$, $j_0 = j_0(x) \in J$, we can find some $q \in \mathbb{Q}$ such that $x = m_{j_0} + q$. Obviously, $-1 < q < 1$. Thus
\[ [0, 1) \subset L + \big(\mathbb{Q} \cap (-1, 1)\big) \subset [0, 1) + (-1, 1) = (-1, 2), \]
which we can rewrite as
\[ [0, 1) \subset \bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (q + L) \subset (-1, 2). \]
Moreover, $(r + L) \cap (q + L) = \emptyset$ for all $r \ne q$, $r, q \in \mathbb{Q}$. Otherwise $r + x = q + y$ for some $x, y \in L$ with $x \ne y$, so that $x \sim y$, which is impossible since $L$ contains only one representative of each equivalence class. Therefore we can use the $\sigma$-additivity of the measure $\bar{\lambda}^1$ to find
\[ 1 = \bar{\lambda}^1\big([0, 1)\big) \le \sum_{q \in \mathbb{Q} \cap (-1, 1)} \bar{\lambda}^1(q + L) \le \bar{\lambda}^1\big((-1, 2)\big) = 3. \]
Since $\bar{\lambda}^1$ is invariant under translations, we get $\bar{\lambda}^1(q + L) = \bar{\lambda}^1(L)$ for all $q \in \mathbb{Q} \cap (-1, 1)$. We conclude that
\[ 1 \le \sum_{q \in \mathbb{Q} \cap (-1, 1)} \bar{\lambda}^1(L) \le 3, \]
which is not possible: the sum has infinitely many identical terms, so it is either $0$ (if $\bar{\lambda}^1(L) = 0$) or $+\infty$ (if $\bar{\lambda}^1(L) > 0$). This proves that $L$ cannot be Lebesgue measurable.
If $n > 1$, a similar argument shows that $[0, 1)^{n-1} \times L$ is not Lebesgue measurable.
¹ We have to use the axiom of choice since $J$ is uncountable. This follows from the observation that the uncountable set $[0, 1)$ is the disjoint union of the countable sets $[x_j] = (x_j + \mathbb{Q}) \cap [0, 1)$, $j \in J$. It is known that all proofs for Theorem D.4 must use the axiom of choice or some equivalent statement, cf. Solovay [44].
The question whether there are Lebesgue measurable sets which are not Borel
measurable can be answered constructively. Since this is quite tedious, we content
ourselves with the fact that there are ‘fewer’ Borel sets than there are Lebesgue
measurable sets.
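D.5 Theorem We have $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$.
D.6 Corollary There are Lebesgue measurable sets which are not Borel measurable.
Proof (of D.6) We know from Theorem D.2 that $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ and from Theorem D.5 that $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$. Since by Theorem 2.9 and Problem 2.17 $\#\mathcal{P}(\mathbb{R}^n) > \#\mathbb{R}^n = \mathfrak{c}$, we conclude that $\mathcal{B}(\mathbb{R}^n) \subsetneq \mathcal{B}^*(\mathbb{R}^n)$.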
To prove Theorem D.5 we show that the Borel sets are contained in a family of sets which has cardinality $\mathfrak{c}$. Let $\mathcal{S} := \bigcup_{k=1}^\infty \mathbb{N}^k$ be the set of all finite sequences of natural numbers and write $\mathcal{C}$ for the family of open balls $B_r(x) \subset \mathbb{R}^n$ with radius $r \in \mathbb{Q}^+$ and centre $x \in \mathbb{Q}^n$. We have seen in Problems 2.19 and 2.9 that
\[ \#\mathcal{S} = \#\mathbb{N} \quad\text{and}\quad \#\mathcal{C} = \#(\mathbb{Q}^+ \times \mathbb{Q}^n) = \#\mathbb{N}. \]
Therefore, the collection of all Souslin schemes
\[ s : \mathcal{S} \to \mathcal{C}, \qquad (i_1, i_2, \dots, i_k) \mapsto C_{i_1 i_2 \dots i_k} \]
has cardinality $\#\mathcal{C}^{\mathcal{S}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$, cf. Problem 2.18. With each Souslin scheme $s$ we can associate a set $A \subset \mathbb{R}^n$ in the following way: take any sequence $(i_j)_{j\in\mathbb{N}}$ of natural numbers and consider the sequence of finite tuples $(i_1), (i_1, i_2), (i_1, i_2, i_3), \dots, (i_1, i_2, \dots, i_k), \dots$ formed by the first $1, 2, \dots, k, \dots$ members of the sequence $(i_j)_{j\in\mathbb{N}}$. Using the Souslin scheme $s$ we pick for each tuple $(i_1, i_2, \dots, i_k)$ the corresponding set $C_{i_1 i_2 \dots i_k} \in \mathcal{C}$ to get a sequence of sets $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \dots, C_{i_1 i_2 \dots i_k}, \dots$ from $\mathcal{C}$. Finally, we form the intersection of all these sets $C_{i_1} \cap C_{i_1 i_2} \cap C_{i_1 i_2 i_3} \cap \dots \cap C_{i_1 i_2 \dots i_k} \cap \dots$ and consider the union over all possible sequences $(i_j)_{j\in\mathbb{N}}$ of natural numbers:
\[ A := A(s) := \bigcup_{(i_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^\infty C_{i_1 i_2 \dots i_k}. \]
Note that this union is uncountable, so that $A$ is not necessarily a Borel set.
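It is often helpful to visualize this construction as a tree:
[Figure: Souslin scheme $s$. The first generation consists of $C_1, C_2, C_3, \dots$; below each $C_{i_1}$ hang its children $C_{i_1 1}, C_{i_1 2}, C_{i_1 3}, \dots$ (e.g. $C_{21}, C_{22}, C_{23}, \dots$ below $C_2$), and below each child its own children (e.g. $C_{211}, C_{212}, C_{213}, \dots$ below $C_{21}$).]
where the $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \dots \in \mathcal{C}$ are the sets of the 1st, 2nd, 3rd, etc. generation. We will also call $C_{i_1 i_2}$ or $C_{i_1 i_2 i_3}$ children or grandchildren of $C_{i_1}$.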
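D.7 Definition (Souslin) Let $\mathcal{S}$, $\mathcal{C}$, $s$ and $A(s)$ be as above. The sets in $\alpha(\mathcal{C}) := \{A(s) : s \in \mathcal{C}^{\mathcal{S}}\}$ are called analytic or Souslin sets (generated by $\mathcal{C}$).
D.8 Lemma Let $\mathcal{S}$, $\mathcal{C}$ and $\alpha(\mathcal{C})$ be as before.
(i) $\alpha(\mathcal{C})$ is stable under countable unions and countable intersections;
(ii) $\alpha(\mathcal{C})$ contains all open and all closed subsets of $\mathbb{R}^n$;
(iii) $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{C}) \subset \alpha(\mathcal{C})$;
(iv) $\#\alpha(\mathcal{C}) \le \mathfrak{c}$.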
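Proof (i) Let $A^\ell \in \alpha(\mathcal{C})$, $\ell \in \mathbb{N}$, be a sequence of analytic sets
\[ A^\ell = \bigcup_{(i_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k}. \]
Since
\[ A := \bigcup_{\ell \in \mathbb{N}} A^\ell = \bigcup_{\ell \in \mathbb{N}} \ \bigcup_{(i_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k} \]
it is obvious that $A$ can be obtained from a Souslin scheme $s$ which arises by the juxtaposition of the Souslin schemes belonging to the $A^\ell$: arrange the double sequence $C^\ell_{i_1}$, $(i_1, \ell) \in \mathbb{N} \times \mathbb{N}$, in one sequence – e.g. using the counting scheme of Example 2.5(iv) – to get the first generation of sets while all other generations follow suit in genealogical order. Thus $A \in \alpha(\mathcal{C})$.
For the countable intersection of the $A^\ell$ we observe first that
\[ B := \bigcap_{\ell \in \mathbb{N}} A^\ell = \bigcap_{\ell \in \mathbb{N}} \ \bigcup_{(i_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k} \overset{[\checkmark]}{=} \bigcup_{\substack{(i^m_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}} \\ m = 1, 2, 3, \dots}} \ \bigcap_{\ell=1}^\infty \ \bigcap_{k=1}^\infty C^\ell_{i^\ell_1 i^\ell_2 \dots i^\ell_k} \]
and then we merge the two infinite intersections indexed by $(\ell, k) \in \mathbb{N} \times \mathbb{N}$ into a single infinite intersection. Once again this can be achieved through the counting scheme of Example 2.5(iv):
\[ \underbrace{C^1_{i^1_1}}_{(1,1)} \cap \underbrace{C^1_{i^1_1 i^1_2}}_{(1,2)} \cap \underbrace{C^2_{i^2_1}}_{(2,1)} \cap \underbrace{C^1_{i^1_1 i^1_2 i^1_3}}_{(1,3)} \cap \underbrace{C^2_{i^2_1 i^2_2}}_{(2,2)} \cap \underbrace{C^3_{i^3_1}}_{(3,1)} \cap \dots \]
i.e. the pairs $(\ell, k)$ are visited in the order $(1,1) \to (1,2) \to (2,1) \to (1,3) \to (2,2) \to (3,1) \to \dots$, and so
\[ B = \bigcup_{\substack{(i^m_j : j \in \mathbb{N}) \in \mathbb{N}^{\mathbb{N}} \\ m = 1, 2, 3, \dots}} \Big( C^1_{i^1_1} \cap C^1_{i^1_1 i^1_2} \cap C^2_{i^2_1} \cap C^1_{i^1_1 i^1_2 i^1_3} \cap C^2_{i^2_1 i^2_2} \cap C^3_{i^3_1} \cap \dots \Big). \]
We will now construct a Souslin scheme which produces $B$ by arranging the sets $C^j_{k, m, \dots}$ in a tree:
• The first generation are the sets $C^1_{i^1_1}$, $i^1_1 \in \mathbb{N}$.
• The second generation are the sets $C^1_{i^1_1 i^1_2}$, $i^1_2 \in \mathbb{N}$, such that they are for fixed $i^1_1$ the children of $C^1_{i^1_1}$.
• Each $C^1_{i^1_1 i^1_2}$ has the same offspring, namely the sets $C^2_{i^2_1}$, $i^2_1 \in \mathbb{N}$, which form jointly the third generation.
• The fourth generation are the sets $C^1_{i^1_1 i^1_2 i^1_3}$, $i^1_3 \in \mathbb{N}$, such that they are for fixed $i^1_1, i^1_2$ the grandchildren of $C^1_{i^1_1 i^1_2}$.
• The fifth generation are the sets $C^2_{i^2_1 i^2_2}$, $i^2_2 \in \mathbb{N}$, such that they are for fixed $i^2_1$ the grandchildren of $C^2_{i^2_1}$.
• Each $C^2_{i^2_1 i^2_2}$ has the same offspring, namely the sets $C^3_{i^3_1}$, $i^3_1 \in \mathbb{N}$, which form jointly the sixth generation.
• $\dots$
This shows that $B \in \alpha(\mathcal{C})$.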
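The order $(1,1) \to (1,2) \to (2,1) \to (1,3) \to (2,2) \to (3,1) \to \dots$ used above runs through $\mathbb{N} \times \mathbb{N}$ along anti-diagonals. The Python sketch below reproduces this order; it is only an illustration, and it assumes that the counting scheme of Example 2.5(iv) is this diagonal enumeration, as the displayed order suggests. The names `diagonal_pairs` and `position` are hypothetical.

```python
from itertools import islice


def diagonal_pairs():
    """Enumerate N x N along anti-diagonals: (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ..."""
    total = 2                      # l + k is constant on each anti-diagonal
    while True:
        for l in range(1, total):
            yield (l, total - l)
        total += 1


def position(l, k):
    """1-based position of the pair (l, k) in the enumeration above."""
    total = l + k
    return (total - 2) * (total - 1) // 2 + l


first_six = list(islice(diagonal_pairs(), 6))
print(first_six)
```

Every pair $(\ell, k)$ is reached after finitely many steps, which is what allows the double intersection over $(\ell, k)$ to be rewritten as a single sequence of generations.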
(ii) Every open set $U$ can be written as a countable union of $\mathcal{C}$-sets
\[ U = \bigcup_{\substack{B_r(x) \subset U \\ B_r(x) \in \mathcal{C}}} B_r(x). \]
Indeed, the inclusion '⊃' is obvious, for '⊂' fix $x \in U$. Then there exists some $r \in \mathbb{Q}^+$ with $B_r(x) \subset U$. Since $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, $x \in B_{r/2}(y)$ for some $y \in \mathbb{Q}^n$ with $|x - y| < r/4$, so that $x \in B_{r/2}(y) \subset B_r(x) \subset U$. Since there are only countably many sets in $\mathcal{C}$, the union is a fortiori countable. By part (i) we then get that $U \in \alpha(\mathcal{C})$, i.e. $\alpha(\mathcal{C})$ contains all open sets.
For a closed set $F$ we know that
\[ F = \bigcap_{j \in \mathbb{N}} U_j \quad\text{where}\quad U_j = F + B_{1/j}(0) = \Big\{ y : x \in F,\ |x - y| < \tfrac{1}{j} \Big\} \]
is a countable intersection of open [✓] sets $U_j$. Since open sets are analytic, part (i) implies that $F \in \alpha(\mathcal{C})$.
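(iii) Consider the system $\Sigma := \{A \in \alpha(\mathcal{C}) : A^c \in \alpha(\mathcal{C})\}$. We claim that $\Sigma$ is a $\sigma$-algebra. Obviously, $\Sigma$ satisfies conditions $(\Sigma_1)$, $(\Sigma_2)$ – i.e. contains $\mathbb{R}^n$ and is stable under complementation. To see $(\Sigma_3)$ we take a sequence $(A_j)_{j\in\mathbb{N}} \subset \Sigma$ and observe that, by part (i),
\[ \bigcup_{j \in \mathbb{N}} A_j \in \alpha(\mathcal{C}) \quad\text{and}\quad \Big( \bigcup_{j \in \mathbb{N}} A_j \Big)^c = \bigcap_{j \in \mathbb{N}} \underbrace{A_j^c}_{\in\, \alpha(\mathcal{C})} \in \alpha(\mathcal{C}), \]
so that $\bigcup_j A_j \in \Sigma$.
Because of (ii) we have $\mathcal{C} \subset \Sigma \subset \alpha(\mathcal{C})$ and this implies that $\sigma(\mathcal{C}) \subset \alpha(\mathcal{C})$. Since, by (ii), all open sets are countable unions of sets from $\mathcal{C}$, we get $\mathcal{C} \subset \mathcal{O} \subset \sigma(\mathcal{C})$ ($\mathcal{O}$ denotes the family of open sets) or $\sigma(\mathcal{C}) = \sigma(\mathcal{O}) \overset{\text{def}}{=} \mathcal{B}(\mathbb{R}^n)$.
(iv) follows immediately from the fact that there are $\#\mathcal{C}^{\mathcal{S}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$ Souslin schemes, cf. Definition D.7.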
The proof of Theorem D.5 is now easy: By Lemma D.8 there are at most $\mathfrak{c}$ analytic sets. Since each singleton $\{x\}$, $x \in \mathbb{R}^n$, is a Borel set, there are at least $\mathfrak{c}$ Borel sets (use Problem 2.17 to see $\#\mathbb{R}^n = \mathfrak{c}$). So,
\[ \mathfrak{c} \le \#\mathcal{B}(\mathbb{R}^n) \le \#\alpha(\mathcal{C}) \le \mathfrak{c} \]
and an application of Theorem 2.7 finishes the proof.
D.9 Remark Our approach to analytic sets follows the original construction of Souslin [42], which makes it easy to determine the cardinality of $\alpha(\mathcal{C})$. This, however, comes at a price: if one wants to work with this definition, things become messy, as we have seen in the proof of Lemma D.8(i). Nowadays analytic sets are often introduced by one of the following characterizations. A set $A \subset \mathbb{R}^n$ is analytic if, and only if, one of the following equivalent conditions holds:
(i) $A = f(\mathbb{R})$ for some left-continuous function $f : \mathbb{R} \to \mathbb{R}^n$;
(ii) $A = g(\mathbb{N}^{\mathbb{N}})$ for some Borel measurable function $g : \mathbb{N}^{\mathbb{N}} \to \mathbb{R}^n$;
(iii) $A = h(B)$ for some Borel set $B \in \mathcal{B}(X)$, some Polish space² $X$ and some Borel measurable function $h : B \to \mathbb{R}^n$;
(iv) $A = \pi_2(B)$ where $\pi_2 : Y \times \mathbb{R}^n \to \mathbb{R}^n$ is the coordinate projection onto $\mathbb{R}^n$, $Y$ is a compact Hausdorff space³ and $B \subset Y \times \mathbb{R}^n$ is a $\mathcal{K}_{\sigma\delta}$-set, i.e. $B$ can be written as countable intersection ('$\delta$') of countable unions ('$\sigma$') of compact subsets ('$\mathcal{K}$') of $Y \times \mathbb{R}^n$.
For a proof we refer to Srivastava [43] which is also our main reference for analytic sets. The Souslin operation $\alpha$ can be applied to other systems of sets than $\mathcal{C}$. Without proof we mention the following facts:
\[ \alpha(\mathcal{C}) = \alpha(\{\text{open sets}\}) = \alpha(\{\text{closed sets}\}) = \alpha(\{\text{compact sets}\}) \]
and also
\[ \alpha(\mathcal{C}) = \alpha(\alpha(\mathcal{C})) \quad\text{and}\quad \mathcal{B}(\mathbb{R}^n) \subsetneq \alpha(\mathcal{C}) \subsetneq \mathcal{B}^*(\mathbb{R}^n). \]
Most constructions of sets which are not Borel but still Lebesgue measurable are actually constructions of non-Borel analytic sets, cf. Dudley [14, §13.2].
² i.e. a space $X$ which can be endowed with a metric for which $X$ is complete and separable.
³ cf. Appendix B, Definition B.3.
Appendix E
A summary of the Riemann integral
In this appendix we give a brief outline of the Riemann integral on the real line.
The notion of integration was well known for a long time and ever since the creation of differential calculus by Newton and Leibniz, integration was perceived as anti-derivative. Several attempts to make this precise were made, but the problem with these approaches was partly that the notion of integral was implicit – i.e. axiomatically given rather than constructively – partly that the choice of possible integrands was rather limited and partly that some fundamental points were unclear.
Out of the need to overcome these insufficiencies and to have a sound foundation, Bernhard Riemann asked in his Habilitationsschrift Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe¹ the question Also zuerst: Was hat man unter $\int_a^b f(x)\,dx$ zu verstehen?² (p. 239) and proposed a general way to define an integral which is constructive, which is (at least for continuous integrands) the anti-derivative, and which can deal with a wider range of integrands than all its predecessors.
We do not follow Riemann’s original approach but use the Darboux technique
of upper and lower integrals. Riemann’s original definition will be recovered in
Theorem E.5(iv).
The (proper) Riemann integral
Riemann integrals are defined only for bounded functions on compact intervals
$[a,b] \subset \mathbb{R}$; this avoids all sorts of complications arising when either the domain
or the range of the integrand is infinite. Both cases can be dealt with by various
extensions of the Riemann integral, one of which (the so-called improper
Riemann integral) we will discuss later on.
¹ On the representability of a function by a trigonometric series.
² First of all: what is the meaning of $\int_a^b f(x)\,dx$?
A partition $\pi$ of the interval $[a,b]$ consists of finitely many points satisfying
$$\pi = \{a = t_0 < t_1 < \dots < t_{k-1} < t_k = b\}, \qquad k = k(\pi).$$
We call $\operatorname{mesh}\pi := \max_{1\le j\le k(\pi)} (t_j - t_{j-1})$ the mesh or fineness of the partition.
Given a partition $\pi$ and a bounded function $u : [a,b] \to \mathbb{R}$ we define
$$m_j := \inf_{x\in[t_{j-1},t_j]} u(x) \quad\text{and}\quad M_j := \sup_{x\in[t_{j-1},t_j]} u(x)$$
for all $j = 1, 2, \dots, k(\pi)$, and introduce the lower, resp. upper Darboux sums
$$S_\pi[u] := \sum_{j=1}^{k(\pi)} m_j\,(t_j - t_{j-1}) \quad\text{resp.}\quad S^\pi[u] := \sum_{j=1}^{k(\pi)} M_j\,(t_j - t_{j-1}).$$
Obviously, $S^\pi[\bullet]$ is subadditive, $S_\pi[\bullet]$ is superadditive, both are positively
homogeneous, and if $|u(x)| \le M$, they satisfy
$$|S_\pi[u]| \le S^\pi[|u|] \le M\,(b-a), \qquad |S^\pi[u]| \le S^\pi[|u|] \le M\,(b-a). \tag{E.1}$$
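The definitions above can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper names `darboux_sums` and `mesh` are our own, and the infimum and supremum on each subinterval are only approximated by sampling (this is exact for monotone functions, and an assumption otherwise).

```python
from fractions import Fraction


def mesh(partition):
    """Mesh (fineness) of a partition given as an increasing list of points."""
    return max(right - left for left, right in zip(partition, partition[1:]))


def darboux_sums(u, partition, samples_per_interval=200):
    """Return (lower, upper) Darboux sums of u for the given partition.

    inf and sup on each [t_{j-1}, t_j] are approximated by the min and max
    over equally spaced sample points, endpoints included. This is exact for
    monotone u and only an approximation in general.
    """
    lower = 0
    upper = 0
    for left, right in zip(partition, partition[1:]):
        width = right - left
        values = [u(left + width * i / samples_per_interval)
                  for i in range(samples_per_interval + 1)]
        lower += min(values) * width
        upper += max(values) * width
    return lower, upper


# u(x) = x on [0, 1] with the equidistant partition into 4 intervals
pi = [Fraction(j, 4) for j in range(5)]
lower, upper = darboux_sums(lambda x: x, pi)
print(mesh(pi), lower, upper)
```

Using `Fraction` keeps the arithmetic exact, so the lower and upper sums can be compared without rounding noise.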
E.1 Lemma Let $\pi$ be a partition of $[a,b]$ and $\pi' \supset \pi$ be a refinement of $\pi$.
Then
$$S_\pi[u] \le S_{\pi'}[u] \le S^{\pi'}[u] \le S^\pi[u]$$
holds for all bounded functions $u : [a,b] \to \mathbb{R}$.
Proof Since $S_\pi[u] = -S^\pi[-u]$ and since $S_{\pi'}[u] \le S^{\pi'}[u]$ is trivially fulfilled,
it is enough to show $S_\pi[u] \le S_{\pi'}[u]$. The partitions $\pi, \pi'$ contain only finitely
many points and we may assume that $\pi' = \pi \cup \{\tau\}$ where $t_{j_0-1} < \tau < t_{j_0}$ for some
index $1 \le j_0 \le k(\pi)$. The rest follows by iteration. Clearly,
$$\begin{aligned}
S_\pi[u] &= \sum_{j\neq j_0} m_j\,(t_j - t_{j-1}) + m_{j_0}(t_{j_0} - \tau) + m_{j_0}(\tau - t_{j_0-1}) \\
&\le \sum_{j\neq j_0} m_j\,(t_j - t_{j-1}) + \inf_{x\in[\tau,t_{j_0}]} u(x)\,(t_{j_0} - \tau) + \inf_{x\in[t_{j_0-1},\tau]} u(x)\,(\tau - t_{j_0-1}) = S_{\pi'}[u].
\end{aligned}$$
Lemma E.1 shows that the following definition makes sense.
E.2 Definition Let $u : [a,b] \to \mathbb{R}$ be a bounded function. The lower and upper
integrals of $u$ are given by
$$\int_{*a}^{b} u := \sup_\pi S_\pi[u] \quad\text{and}\quad \int_{a}^{*b} u := \inf_\pi S^\pi[u]$$
where $\sup_\pi$ and $\inf_\pi$ range over all finite partitions of $[a,b]$.
E.3 Lemma $\int_{*a}^{b} u \le \int_{a}^{*b} u$ and $\int_{*a}^{b} u = -\int_{a}^{*b} (-u)$.
E.4 Definition A bounded function $u : [a,b] \to \mathbb{R}$ is said to be (Riemann) integrable,
if the upper and lower integrals coincide. Their common value is denoted by
$$\int_a^b u(x)\,dx := \int_{*a}^{b} u = \int_{a}^{*b} u$$
and is called the (Riemann) integral of $u$. The collection of all Riemann integrable
functions in $[a,b]$ is denoted by $\mathcal{R}[a,b]$.
E.5 Theorem (Characterization of $\mathcal{R}[a,b]$) Let $u : [a,b] \to \mathbb{R}$ be a bounded
function. Then the following assertions are equivalent
(i) $u \in \mathcal{R}[a,b]$.
(ii) For every $\epsilon > 0$ there is some partition $\pi$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$.
(iii) For every $\epsilon > 0$ there is some $\delta > 0$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$ for all
partitions $\pi$ with $\operatorname{mesh}\pi < \delta$.
(iv) The limit $I = \lim_{\operatorname{mesh}\pi\to 0} \sum_{j\,:\,t_j\in\pi} u(\xi_j)(t_j - t_{j-1})$ exists for every choice of
intermediate values $t_{j-1} \le \xi_j \le t_j$; this means that for all $\epsilon > 0$ there exists a
$\delta > 0$ such that for all partitions $\pi$ with $\operatorname{mesh}\pi < \delta$
$$\Big| I - \sum_{j\,:\,t_j\in\pi} u(\xi_j)(t_j - t_{j-1}) \Big| \le \epsilon$$
independently of the intermediate points.
If the limit exists, $I = \int_{a}^{*b} u = \int_{*a}^{b} u$.
Proof We show the implications (i)⇒(ii)⇒(iii)⇒(iv)⇒(i).
(i)⇒(ii): By the very definition of the lower and upper integrals in terms of
sup and inf, we find for every $\epsilon > 0$ partitions $\pi'$ and $\pi''$ such that
$$\int_{*a}^{b} u - S_{\pi'}[u] \le \frac{\epsilon}{2} \quad\text{and}\quad S^{\pi''}[u] - \int_{a}^{*b} u \le \frac{\epsilon}{2}.$$
Using the common refinement $\pi = \pi' \cup \pi''$ we get from Lemma E.1 and the
integrability of $u$
$$S^\pi[u] - S_\pi[u] \le S^{\pi''}[u] - S_{\pi'}[u] = \Big( S^{\pi''}[u] - \int_{a}^{*b} u \Big) + \Big( \int_{*a}^{b} u - S_{\pi'}[u] \Big) \le \epsilon.$$
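(ii)⇒(iii): This is the most intricate step in the proof. Fix $\epsilon > 0$ and denote by
$\pi_\epsilon := \{a = t^\epsilon_0 < t^\epsilon_1 < \dots < t^\epsilon_k = b\}$ the partition in (ii). We choose $\delta > 0$ in such
a way that
$$\delta < \frac{1}{2} \min_{1\le j\le k} \big( t^\epsilon_j - t^\epsilon_{j-1} \big) \quad\text{and}\quad \delta < \frac{\epsilon}{4k\,\|u\|_\infty}.$$
If $\pi := \{a = t_0 < t_1 < \dots < t_N = b\}$ is any partition with $\operatorname{mesh}\pi < \delta$ we find
$$S^\pi[u] - S_\pi[u] = \sum_{j\,:\,\pi_\epsilon\cap[t_{j-1},t_j]\neq\emptyset} (M^\pi_j - m^\pi_j)(t_j - t_{j-1}) + \sum_{j\,:\,\pi_\epsilon\cap[t_{j-1},t_j]=\emptyset} (M^\pi_j - m^\pi_j)(t_j - t_{j-1}), \tag{E.2}$$
where $M^\pi_j, m^\pi_j$ indicates that the supremum resp. infimum is taken w.r.t. intervals
defined by the partition $\pi$. The first sum has at most $2k$ terms since $\pi_\epsilon \cap [a,t_1] = \{a\}$,
$\pi_\epsilon \cap [t_{N-1},b] = \{b\}$ and since all other $t^\epsilon_j$, $1 \le j \le k-1$, appear in exactly
one or two intervals defined by $\pi$. Thus
$$\sum_{j\,:\,\pi_\epsilon\cap[t_{j-1},t_j]\neq\emptyset} (M^\pi_j - m^\pi_j)(t_j - t_{j-1}) \le 2k \cdot 2\|u\|_\infty \cdot \delta \le \epsilon. \tag{E.3}$$
The second sum in (E.2) can be written as a double sum
$$\begin{aligned}
\sum_{j\,:\,\pi_\epsilon\cap[t_{j-1},t_j]=\emptyset} (M^\pi_j - m^\pi_j)(t_j - t_{j-1})
&= \sum_{j=1}^{k} \Big[ \sum_{\ell\,:\,[t_{\ell-1},t_\ell]\subset(t^\epsilon_{j-1},t^\epsilon_j)} (M^\pi_\ell - m^\pi_\ell)(t_\ell - t_{\ell-1}) \Big] \\
&\le \sum_{j=1}^{k} \Big[ \sum_{\ell\,:\,[t_{\ell-1},t_\ell]\subset(t^\epsilon_{j-1},t^\epsilon_j)} (M^{\pi_\epsilon}_j - m^{\pi_\epsilon}_j)(t_\ell - t_{\ell-1}) \Big] \\
&\le \sum_{j=1}^{k} (M^{\pi_\epsilon}_j - m^{\pi_\epsilon}_j)(t^\epsilon_j - t^\epsilon_{j-1}) \\
&= S^{\pi_\epsilon}[u] - S_{\pi_\epsilon}[u] \le \epsilon.
\end{aligned} \tag{E.4}$$
Together (E.2)–(E.4) show $S^\pi[u] - S_\pi[u] \le 2\epsilon$ for any partition $\pi$ with $\operatorname{mesh}\pi < \delta$.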
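(iii)⇒(iv): Fix $\epsilon > 0$ and choose $\delta > 0$ as in (iii). Then we have for any partition
$\pi = \{a = t_0 < \dots < t_{k(\pi)} = b\}$ with $\operatorname{mesh}\pi < \delta$ and any choice of intermediate
points $\xi_j \in [t_{j-1}, t_j]$,
$$S^\pi[u] - \epsilon \le S_\pi[u] \le \sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \le S^\pi[u] \le S_\pi[u] + \epsilon.$$
Since $S_\pi[u] \le \int_{*a}^{b} u \le \int_{a}^{*b} u \le S^\pi[u]$, this implies
$$\int_{a}^{*b} u - \epsilon \le \sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \le \int_{a}^{*b} u + \epsilon$$
and
$$\int_{*a}^{b} u - \epsilon \le \sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \le \int_{*a}^{b} u + \epsilon,$$
which means that
$$\sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \xrightarrow{\operatorname{mesh}\pi\to 0} I = \int_{*a}^{b} u = \int_{a}^{*b} u.$$
(iv)⇒(i): Assume that $\sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \xrightarrow{\operatorname{mesh}\pi\to 0} I$ exists for any choice
of intermediate values. We have to show that $I = \int_{a}^{*b} u = \int_{*a}^{b} u$. By definition of
the limit, there is for every $\epsilon > 0$ some $\delta > 0$ such that every partition $\pi$ with $\operatorname{mesh}\pi < \delta$ satisfies
$$I - \epsilon \le \sum_{j=1}^{k(\pi)} u(\xi_j)(t_j - t_{j-1}) \le I + \epsilon.$$
Since this must hold uniformly for any choice of intermediate values, we can pass
to the infimum and supremum of these values and get
$$I - \epsilon \le \sum_{j=1}^{k(\pi)} \inf_{\xi\in[t_{j-1},t_j]} u(\xi)\,(t_j - t_{j-1}) \le \sum_{j=1}^{k(\pi)} \sup_{\xi\in[t_{j-1},t_j]} u(\xi)\,(t_j - t_{j-1}) \le I + \epsilon.$$
Thus $I - \epsilon \le S_\pi[u] \le S^\pi[u] \le I + \epsilon$, and
$$I - \epsilon \le S_\pi[u] \le \int_{*a}^{b} u \le \int_{a}^{*b} u \le S^\pi[u] \le I + \epsilon.$$
Since $\epsilon > 0$ was arbitrary, the lower and upper integrals coincide and are equal to $I$.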
Once we know that u is Riemann integrable, we can work out the value of the
integral by particular Riemann sums:
E.6 Corollary If $u : [a,b] \to \mathbb{R}$ is Riemann integrable, then the integral is the
limit of Riemann sums
$$\lim_{n\to\infty} \sum_{j=1}^{k_n} u\big(\xi^{(n)}_j\big)\big(t^{(n)}_j - t^{(n)}_{j-1}\big)$$
where $\pi_n = \{a = t^{(n)}_0 < t^{(n)}_1 < \dots < t^{(n)}_{k_n} = b\}$ is any sequence of partitions with
$\operatorname{mesh}\pi_n \to 0$ as $n \to \infty$ and where $\xi^{(n)}_j \in [t^{(n)}_{j-1}, t^{(n)}_j]$ are some intermediate points.
The existence of the limit of Riemann sums for some particular sequence of
partitions does not guarantee integrability.
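To illustrate Corollary E.6, here is a small numerical sketch (our own illustration, not from the text). It assumes equidistant partitions and a hypothetical helper `riemann_sum` whose parameter `tag` selects the intermediate point in each subinterval; for the integrable function $u(x) = x^2$ on $[0,1]$ all choices approach the same value $1/3$.

```python
def riemann_sum(u, a, b, k, tag):
    """Riemann sum of u over the equidistant partition of [a, b] into k parts.

    tag in [0, 1] picks the intermediate point
    xi_j = t_{j-1} + tag * (t_j - t_{j-1}) in every subinterval.
    """
    h = (b - a) / k
    return sum(u(a + (j + tag) * h) * h for j in range(k))


def square(x):
    return x * x


for k in (10, 100, 1000):
    left = riemann_sum(square, 0.0, 1.0, k, 0.0)    # left endpoints
    mid = riemann_sum(square, 0.0, 1.0, k, 0.5)     # midpoints
    right = riemann_sum(square, 0.0, 1.0, k, 1.0)   # right endpoints
    print(k, left, mid, right)
```

The three columns differ for every fixed k, but the differences shrink with the mesh, as the corollary predicts for integrable functions.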
E.7 Example The Dirichlet jump function $u(x) := 1_{[0,1]\cap\mathbb{Q}}(x)$ on $[0,1]$ is not
Riemann integrable, since for each partition $\pi$ of $[0,1]$ we have $M_j = 1$ and
$m_j = 0$, so that
$$\int_{*0}^{1} u = S_\pi[u] = 0 \quad\text{while}\quad \int_{0}^{*1} u = S^\pi[u] = 1.$$
On the other hand, the equidistant Riemann sum
$$\sum_{j=1}^{k} u(\xi_j)\Big( \frac{j}{k} - \frac{j-1}{k} \Big) = \frac{1}{k} \sum_{j=1}^{k} u(\xi_j)$$
takes the value $\frac{n}{k}$, $0 \le n \le k$, if we choose $\xi_1, \dots, \xi_n$ rational and $\xi_{n+1}, \dots, \xi_k$
irrational. This allows us to construct sequences of Riemann sums which converge
to any value in $[0,1]$.
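The construction in Example E.7 can be mimicked symbolically. The sketch below is only an illustration under an explicit modelling assumption: floating-point numbers cannot distinguish rational from irrational points, so each intermediate point is represented by a boolean flag ("rational" or not) instead of a number. The function names are hypothetical.

```python
from fractions import Fraction


def dirichlet_riemann_sum(k, n):
    """Equidistant Riemann sum of the Dirichlet function on [0, 1].

    The partition has k intervals; the first n intermediate points are taken
    to be rational (u = 1 there) and the remaining k - n irrational (u = 0).
    Tags are modelled by flags, not by actual real numbers.
    """
    if not 0 <= n <= k:
        raise ValueError("need 0 <= n <= k")
    tag_is_rational = [j < n for j in range(k)]
    return sum(Fraction(1, k) * (1 if flag else 0) for flag in tag_is_rational)


def sums_converging_to(target, ks=(10, 100, 1000)):
    """Riemann sums n/k with n = round(target * k), approaching target."""
    return [dirichlet_riemann_sum(k, round(target * k)) for k in ks]


print(sums_converging_to(Fraction(1, 3)))
```

Each sum equals $n/k$, so choosing $n \approx c\,k$ produces Riemann sums converging to any prescribed $c \in [0,1]$.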
Let us now find concrete functions which are Riemann integrable. A step
function on $[a,b]$ is a function $f : [a,b] \to \mathbb{R}$ of the form
$$f(x) = \sum_{j=1}^{N} y_j\, 1_{I_j}(x)$$
where $N \in \mathbb{N}$, $y_j \in \mathbb{R}$ and $I_j$ are (open, half-open, closed, even degenerate) adjacent
intervals such that $I_1 \cup \dots \cup I_N = [a,b]$ and $I_j \cap I_k$, $j \neq k$, intersect in at most one
point. We denote by $\mathcal{T}[a,b]$ the family of all step functions on $[a,b]$.
E.8 Theorem Continuous functions, monotone functions, and step functions on
$[a,b]$ are Riemann integrable.
Proof Notice that the functions from all three classes are bounded on $[a,b]$.
Continuous functions: Let $u : [a,b] \to \mathbb{R}$ be continuous. Since $[a,b]$ is compact,
$u$ is uniformly continuous and we find for all $\epsilon > 0$ some $\delta > 0$ such that
$$|u(x) - u(y)| \le \epsilon \qquad \forall\, x, y \in [a,b],\ |x - y| < \delta.$$
If $\pi$ is a partition of $[a,b]$ with $\operatorname{mesh}\pi < \delta$ we find
$$S^\pi[u] - S_\pi[u] = \sum_{t_j\in\pi} (M_j - m_j)(t_j - t_{j-1}) \le \epsilon \sum_{t_j\in\pi} (t_j - t_{j-1}) = \epsilon\,(b-a)$$
since, by uniform continuity,
$$M_j - m_j = \sup u([t_{j-1},t_j]) - \inf u([t_{j-1},t_j]) = \sup_{\xi,\eta\in[t_{j-1},t_j]} \big( u(\xi) - u(\eta) \big) \le \epsilon.$$
Thus $u \in \mathcal{R}[a,b]$ by Theorem E.5(iii).
Monotone functions: We can safely assume that $u : [a,b] \to \mathbb{R}$ is monotone
increasing, otherwise we would consider $-u$. For the equidistant partition $\pi_k$
with points $t_j = a + j\,\frac{b-a}{k}$, $0 \le j \le k$, we get
$$\begin{aligned}
S^{\pi_k}[u] - S_{\pi_k}[u] &= \sum_{j=1}^{k} \big(u(t_j) - u(t_{j-1})\big)(t_j - t_{j-1}) \\
&= \frac{b-a}{k} \sum_{j=1}^{k} \big(u(t_j) - u(t_{j-1})\big) = \frac{b-a}{k}\,\big(u(b) - u(a)\big),
\end{aligned}$$
where we used that $\sup u([t_{j-1},t_j]) = u(t_j)$ and $\inf u([t_{j-1},t_j]) = u(t_{j-1})$ because
of monotonicity. Since $\frac{b-a}{k}\,(u(b) - u(a))$ can be made arbitrarily small, $u \in \mathcal{R}[a,b]$
by Theorem E.5(ii).
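The telescoping identity used for monotone functions is easy to check in exact arithmetic. The following sketch is our own illustration (the function name `darboux_gap_monotone` is hypothetical); it assumes an increasing integrand and computes the gap between the upper and lower Darboux sums on the equidistant partition.

```python
from fractions import Fraction


def darboux_gap_monotone(u, a, b, k):
    """Upper minus lower Darboux sum of an increasing u on the equidistant
    partition of [a, b] with k intervals.

    For increasing u the sup on [t_{j-1}, t_j] is u(t_j) and the inf is
    u(t_{j-1}), so no sampling is needed.
    """
    a, b = Fraction(a), Fraction(b)
    t = [a + j * (b - a) / k for j in range(k + 1)]
    lower = sum(u(t[j - 1]) * (t[j] - t[j - 1]) for j in range(1, k + 1))
    upper = sum(u(t[j]) * (t[j] - t[j - 1]) for j in range(1, k + 1))
    return upper - lower


def cube(x):
    return x ** 3


gap = darboux_gap_monotone(cube, 0, 2, 50)
print(gap)  # compare with (b - a) / k * (u(b) - u(a)) = 2/50 * 8
```

The gap equals $(b-a)(u(b)-u(a))/k$ exactly, so it tends to zero like $1/k$.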
Step functions: Let $u$ be a step function which has value $y_j$ on the interval $I_j$,
$j = 1, \dots, k$. The endpoints of the non-degenerate intervals form a partition of
$[a,b]$, $\pi = \{a = t_0 < t_1 < \dots < t_N = b\}$, $N \le k$, and we set for every $\epsilon > 0$
$$\pi_\epsilon := \{ a = s'_0 < s_1 < s'_1 < s_2 < \dots < s_{N-1} < s'_{N-1} < s_N = b \}$$
where $s_j < t_j < s'_j$, $1 \le j \le N-1$, and $s'_j - s_j < \epsilon / (2N\,\|u\|_\infty)$. Since $u$ is constant
with value $y_j$ on each interval $[s'_{j-1}, s_j]$, we find
$$\begin{aligned}
S^{\pi_\epsilon}[u] - S_{\pi_\epsilon}[u]
&= \sum_{j=1}^{N} (y_j - y_j)(s_j - s'_{j-1}) + \sum_{j=1}^{N-1} \big[ \sup u([s_j, s'_j]) - \inf u([s_j, s'_j]) \big] (s'_j - s_j) \\
&\le \sum_{j=1}^{N-1} 2\,\|u\|_\infty\, \frac{\epsilon}{2N\,\|u\|_\infty} \le \epsilon.
\end{aligned}$$
Therefore Theorem E.5(ii) proves that $u \in \mathcal{R}[a,b]$.
With somewhat more effort one can prove the following general theorem.
E.9 Theorem Any bounded function $u : [a,b] \to \mathbb{R}$ with at most countably many
points of discontinuity is Riemann integrable.
An elementary proof of this based on a compactness argument can be found in
Strichartz [48, §6.2.3], but since Theorem 11.8 supersedes this result anyway, we
do not include a proof here.
A combination of Theorems E.8 and E.5 yields the following quite useful
criterion for integrability.
E.10 Corollary $u \in \mathcal{R}[a,b]$ if, and only if, for every $\epsilon > 0$ there are $f, g \in \mathcal{T}[a,b]$
such that $f \le u \le g$ and $\int_a^b (g - f)\,dt \le \epsilon$.
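E.11 Theorem The Riemann integral is a positive linear form on the vector
lattice $\mathcal{R}[a,b]$, that is, for all $\alpha, \beta \in \mathbb{R}$ and $u, w \in \mathcal{R}[a,b]$ one has
(i) $\alpha u + \beta w \in \mathcal{R}[a,b]$ and $\int_a^b (\alpha u + \beta w)\,dt = \alpha \int_a^b u\,dt + \beta \int_a^b w\,dt$;
(ii) $u \le w \implies \int_a^b u\,dt \le \int_a^b w\,dt$;
(iii) $u \vee w,\ u \wedge w,\ u^+,\ u^-,\ |u| \in \mathcal{R}[a,b]$ and $\big| \int_a^b u\,dt \big| \le \int_a^b |u|\,dt$;
(iv) $|u|^p,\ u\,w \in \mathcal{R}[a,b]$, $1 \le p < \infty$.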
Proof (i) follows immediately from the linearity of the limit criterion in Theorem
E.5(iv).
(ii): In view of (i) it is enough to show that $v := w - u \ge 0$ entails $\int_a^b v\,dt \ge 0$.
This, however, is clear since $v \in \mathcal{R}[a,b]$ and
$$0 \le \int_{*a}^{b} v = \int_a^b v\,dt.$$
(iii): Since $u \vee w = -((-u) \wedge (-w))$, $u^+ = u \vee 0$, $u^- = (-u) \vee 0$ and $|u| = u^+ + u^-$,
it is enough to prove that $u \wedge w \in \mathcal{R}[a,b]$. By Corollary E.10 there are
for every $\epsilon > 0$ step functions $f, g, \phi, \psi \in \mathcal{T}[a,b]$ such that $f \le u \le g$, $\phi \le w \le \psi$
and $\int_a^b (g - f)\,dt + \int_a^b (\psi - \phi)\,dt \le \epsilon$. Obviously, $f \wedge \phi$, $g \wedge \psi$ are again step
functions with $f \wedge \phi \le u \wedge w \le g \wedge \psi$ and
$$\int_a^b \big[ g \wedge \psi - f \wedge \phi \big]\,dt \le \int_a^b \big[ (g - f) + (\psi - \phi) \big]\,dt \le \epsilon,$$
where we used (ii) and the elementary inequality for $a, b, A, B \in \mathbb{R}$
$$a \wedge A - b \wedge B \le \max\{|a - b|, |A - B|\} \le |a - b| + |A - B|.$$
Finally, since $\pm u \le |u|$ we find by parts (i),(ii) that $\pm\int_a^b u\,dt \le \int_a^b |u|\,dt$ which
implies $\big| \int_a^b u\,dt \big| \le \int_a^b |u|\,dt$.
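(iv): By (iii), $|u| \in \mathcal{R}[a,b]$ and, by Corollary E.10, we find for each $\epsilon > 0$
step functions $f \le |u| \le g$ such that $\int_a^b (g - f)\,dt \le \epsilon$. Without loss of generality,
we may assume that $f \ge 0$ and $g \le \|u\|_\infty$; otherwise we could consider $f^+$ and
$g \wedge \|u\|_\infty$ and note that $f^+,\ g \wedge \|u\|_\infty \in \mathcal{T}[a,b]$, $f^+ \le |u| \le g \wedge \|u\|_\infty$ and
$$\int_a^b \big( g \wedge \|u\|_\infty - f^+ \big)\,dt \le \int_a^b (g - f)\,dt \le \epsilon.$$
Thus $f^p \le |u|^p \le g^p$, where $f^p, g^p \in \mathcal{T}[a,b]$. By the mean value theorem of
differential calculus we get
$$g^p - f^p \le p\,\|g\|_\infty^{p-1}\,(g - f) \le p\,\|u\|_\infty^{p-1}\,(g - f).$$
Thus, by (ii),
$$\int_a^b \big( g^p - f^p \big)\,dt \le p\,\|u\|_\infty^{p-1} \int_a^b (g - f)\,dt \le p\,\|u\|_\infty^{p-1}\,\epsilon,$$
so that $|u|^p \in \mathcal{R}[a,b]$ by Corollary E.10.
Since $uw = \frac{1}{4}\big((u + w)^2 - (u - w)^2\big)$, we conclude that $uw \in \mathcal{R}[a,b]$ from (i)
and the fact that $u^2 = |u|^2 \in \mathcal{R}[a,b]$.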
Note that Theorem E.11(iii) has no converse: $|u| \in \mathcal{R}[a,b]$ does not imply that
$u \in \mathcal{R}[a,b]$ (in contrast to the Lebesgue integral, cf. T10.3). This can be
seen by the modified Dirichlet jump function $u := 1_{[0,1]\cap\mathbb{Q}} - 1_{[0,1]\setminus\mathbb{Q}}$ which is not
Riemann integrable but whose modulus $|u| = 1_{[0,1]}$ is Riemann integrable.
E.12 Corollary (Mean value theorem for integrals) Let $u \in \mathcal{R}[a,b]$ be either
positive or negative and let $v \in C[a,b]$. Then there exists some $\xi \in [a,b]$ such that
$$\int_a^b u(t)\,v(t)\,dt = v(\xi) \int_a^b u(t)\,dt. \tag{E.5}$$
Proof The case $u \le 0$ being similar, we may assume that $u \ge 0$. By Theorem
E.8 and E.11(iv), $uv$ is integrable and because of E.11(ii) we have
$$\inf v([a,b]) \int_a^b u(t)\,dt \le \int_a^b u(t)\,v(t)\,dt \le \sup v([a,b]) \int_a^b u(t)\,dt.$$
Since $v$ is continuous on $[a,b]$, the intermediate value theorem guarantees the
existence of some $\xi \in [a,b]$ such that (E.5) holds.
E.13 Theorem Let $[c,d] \subset [a,b]$. Then $\mathcal{R}[a,b] \subset \mathcal{R}[c,d]$ in the sense that
$u \in \mathcal{R}[a,b]$ satisfies $u|_{[c,d]} \in \mathcal{R}[c,d]$. Moreover, for any $u \in \mathcal{R}[a,b]$
$$\int_a^b u\,dt = \int_a^c u\,dt + \int_c^b u\,dt.$$
Proof By Theorem E.8 and E.11 we find that $1_{[c,d]}\,u \in \mathcal{R}[a,b]$. Since we can
always add the points $c$ and $d$ to any of the partitions appearing in one of the
criteria of Theorem E.5, we see that $u|_{[c,d]} = (1_{[c,d]}\,u)|_{[c,d]} \in \mathcal{R}[c,d]$ and
$$\int_a^b 1_{[c,d]}\,u\,dt = \int_c^d u\,dt.$$
Considering $u = 1_{[a,c)}\,u + 1_{[c,b]}\,u$ proves also the formula in the statement of the
theorem.
The fundamental theorem of integral calculus
Since by Theorem E.13 $\mathcal{R}[a,b] \subset \mathcal{R}[a,x]$, we can treat $\int_a^x u(t)\,dt$, $u \in \mathcal{R}[a,b]$,
as a function of its upper limit $x \in [a,b]$.
E.14 Lemma For every $u \in \mathcal{R}[a,b]$ the function $U(x) := \int_a^x u(t)\,dt$ is continuous
for all $x \in [a,b]$.
Proof Since $u$ is bounded, $M := \sup_{x\in[a,b]} |u(x)| < \infty$. For all $x, y \in [a,b]$, $x < y$,
we have by Theorem E.13 and E.11
$$\begin{aligned}
|U(y) - U(x)| &= \Big| \int_a^y u(t)\,dt - \int_a^x u(t)\,dt \Big| \\
&= \Big| \int_x^y u(t)\,dt \Big| \le \int_x^y |u(t)|\,dt \le M\,(y - x) \xrightarrow{x-y\to 0} 0,
\end{aligned}$$
showing even uniform continuity.
We can now discuss the connection between differentiation and integration.
Let us begin with a few examples.
E.15 Example (i) Let $[0,1] \subsetneq [a,b]$. Then $u(x) = 1_{[0,1]}(x)$ is an integrable function
and
$$U(x) := \int_a^x u(t)\,dt = \begin{cases} 0, & \text{if } x \le 0, \\ x, & \text{if } 0 < x < 1, \\ 1, & \text{if } x \ge 1, \end{cases} \;=\; x^+ \wedge 1.$$
Note that $U'(x)$ does not exist at $x = 0$ or $x = 1$, so that $u(x)$ cannot be the
derivative of any function (at every point).
(ii) Let $[a,b] = [0,1]$ and take an enumeration $(q_j)_{j\in\mathbb{N}}$ of $[0,1] \cap \mathbb{Q}$. Then the
function
$$u(x) := \sum_{j\,:\,q_j\le x} 2^{-j} = \sum_{j=1}^{\infty} 2^{-j}\, 1_{[q_j,1]}(x), \qquad x \in [0,1],$$
is increasing, satisfies $0 \le u \le 1$ and its discontinuities are jumps at the
points $q_j$ of height $u(q_j+) - u(q_j-) = 2^{-j}$; this is as bad as it can get for
a monotone function, cf. Lemma 13.12. By Theorem E.8 $u$ is integrable,
and since $(q_j)_{j\in\mathbb{N}}$ is dense, there is no interval $[c,d] \subset [0,1]$ such that
$U'(x) = u(x)$ for all $x \in [c,d]$ for any function $U(x)$.
(iii) Consider on $[-1,1]$ the function
$$u(x) := \begin{cases} x^2 \sin\frac{1}{x^2}, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}$$
It is an elementary exercise to show that $u'(x)$ exists on $[-1,1]$ and
$$u'(x) = \begin{cases} 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2}, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}$$
Thus $u'$ exists everywhere, but it is not Riemann integrable in any neighbourhood
of $x = 0$ since $u'$ is unbounded.
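(iv) Let $(q_n)_{n\in\mathbb{N}}$ be an enumeration of $(0,1) \cap \mathbb{Q}$. The function
$$u(x) := \begin{cases} 2^{-n}, & \text{if } x = q_n,\ n \in \mathbb{N}, \\ 0, & \text{if } x \in ([0,1] \setminus \mathbb{Q}) \cup \{0, 1\}, \end{cases}$$
is discontinuous for every $x \in (0,1) \cap \mathbb{Q}$ and continuous otherwise. Moreover,
$u \in \mathcal{R}[0,1]$ which follows from Theorem E.9 or directly from the
following argument: fix $\epsilon > 0$ and $n \in \mathbb{N}$ such that $2^{-n} < \epsilon$. Choose a partition
$\pi = \{0 = t_0 < t_1 < \dots < t_N = 1\}$ with $\operatorname{mesh}\pi = \delta < \frac{\epsilon}{n}$ in such a way
that each $q_k$ from $Q_n := \{q_1, q_2, \dots, q_n\}$ is the midpoint of some $[t_{j-1}, t_j]$,
$j = 1, 2, \dots, N$. Therefore, if $M_j$ denotes $\sup u([t_{j-1}, t_j])$,
$$\begin{aligned}
0 \le S_\pi[u] \le S^\pi[u] &= \sum_{j=1}^{N} M_j\,(t_j - t_{j-1}) \\
&= \sum_{j\,:\,[t_{j-1},t_j]\cap Q_n\neq\emptyset} M_j\,(t_j - t_{j-1}) + \sum_{j\,:\,[t_{j-1},t_j]\cap Q_n=\emptyset} M_j\,(t_j - t_{j-1}) \\
&\le n\,\frac{\epsilon}{n} + 2^{-n} \sum_{j\,:\,[t_{j-1},t_j]\cap Q_n=\emptyset} (t_j - t_{j-1}) \\
&\le \epsilon + \epsilon \sum_{j=1}^{N} (t_j - t_{j-1}) = 2\epsilon.
\end{aligned}$$
This proves $u \in \mathcal{R}[0,1]$ and $0 \le \int_0^x u(t)\,dt \le \int_0^1 u(t)\,dt = 0$. Thus $U(x) := \int_0^x u(t)\,dt$
satisfies $U'(x) = 0 \neq u(x)$ for all $x$ from a dense subset, namely the rationals in $(0,1)$.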
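The unboundedness claimed in Example E.15(iii) can be observed numerically. The sketch below is our own illustration: it evaluates the derivative formula from E.15(iii) at the points $x_n = 1/\sqrt{2\pi n}$ (a choice of ours, not from the text), where $\sin(1/x_n^2) = 0$ and $\cos(1/x_n^2) = 1$, so that $u'(x_n) = -2\sqrt{2\pi n}$.

```python
import math


def u_prime(x):
    """Derivative of u(x) = x^2 sin(1/x^2), with u'(0) = 0."""
    if x == 0.0:
        return 0.0
    return 2 * x * math.sin(1 / x ** 2) - (2 / x) * math.cos(1 / x ** 2)


def spike_point(n):
    """Point x_n = 1/sqrt(2*pi*n), chosen so that 1/x_n^2 = 2*pi*n."""
    return 1 / math.sqrt(2 * math.pi * n)


for n in (1, 100, 10000):
    x = spike_point(n)
    print(n, x, u_prime(x))
```

As $n$ grows the points $x_n$ tend to $0$ while $u'(x_n)$ tends to $-\infty$, so $u'$ is unbounded on every neighbourhood of $0$.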
The above examples show that the Riemann integral is not always the antideriva-
tive, nor is the antiderivative an extension of the Riemann integral. The two
concepts, however, coincide on a large class of functions.
E.16 Definition Let $u : [a,b] \to \mathbb{R}$ be a bounded function. Every function
$U \in C[a,b]$ such that $U'(x) = u(x)$ for all but possibly finitely many $x \in [a,b]$
is called a primitive of $u$.
Obviously, primitives are only unique up to constants: for every constant $c$,
$U + c$ is again a primitive of $u$. On the other hand, if $U, W$ are two primitives of
$u$, we have $U' - W' = 0$ at all but finitely many points $a = x_0 < x_1 < \dots < x_n = b$.
Thus the mean value theorem of differential calculus shows $U = W + \text{const.}$ (cf.
Rudin [39, Thm. 5.11]), first on each interval $(x_{j-1}, x_j)$, $j = 1, 2, \dots, n$, and then,
by continuity, on the whole interval $[a,b]$.
E.17 Proposition Every $u \in C[a,b]$ has $U(x) := \int_a^x u(t)\,dt$ as a primitive. Moreover,
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$
Proof Since continuous functions are integrable, $U(x)$ is well-defined by Theorem
E.13 and continuous by Lemma E.14. Fix $\epsilon > 0$. For $a < x < x + h < b$ and sufficiently
small $h$ we find
$$\begin{aligned}
\big| U(x+h) - U(x) - h\,u(x) \big| &= \Big| \int_a^{x+h} u(t)\,dt - \int_a^x u(t)\,dt - \int_x^{x+h} u(x)\,dt \Big| \\
&= \Big| \int_x^{x+h} \big( u(t) - u(x) \big)\,dt \Big| \\
&\le \int_x^{x+h} \big| u(t) - u(x) \big|\,dt \\
&\le \int_x^{x+h} \epsilon\,dt = \epsilon\,h,
\end{aligned}$$
where we used that $u(t)$ is continuous at $t = x$. With a similar calculation we get
$$\big| U(x) - U(x-h) - h\,u(x) \big| \le \epsilon\,h,$$
and a combination of both inequalities shows that
$$\lim_{y\to x} \frac{U(y) - U(x)}{y - x} = u(x).$$
The formula $U(b) - U(a) = \int_a^b u(t)\,dt$ follows from the fact that $U(a) = 0$.
E.18 Theorem (Fundamental theorem of calculus) Assume that $U$ is a primitive
of $u \in \mathcal{R}[a,b]$. Then
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$
Proof Let $C$ be some finite set such that $U'(x) = u(x)$ if $x \in [a,b] \setminus C$. Fix
$\epsilon > 0$. Since $u$ is integrable, we find by E.5(ii) a partition $\pi$ of $[a,b]$ such
that $S^\pi[u] - S_\pi[u] \le \epsilon$. Because of Lemma E.1 this inequality still holds for the
partition $\pi' := \pi \cup C$ whose points we denote by $a = t_0 < t_1 < \dots < t_k = b$. Since
$$U(b) - U(a) = \sum_{j=1}^{k} \big( U(t_j) - U(t_{j-1}) \big)$$
and since $U$ is differentiable in each segment $(t_{j-1}, t_j)$ and continuous on $[a,b]$,
we can use the mean value theorem of differential calculus to find points $\xi_j \in (t_{j-1}, t_j)$ with
$$U(t_j) - U(t_{j-1}) = U'(\xi_j)(t_j - t_{j-1}) = u(\xi_j)(t_j - t_{j-1}), \qquad 1 \le j \le k.$$
Using $m_j = \inf u([t_{j-1},t_j]) \le u(\xi_j) \le \sup u([t_{j-1},t_j]) = M_j$ we can sum the above
equality over $j = 1, \dots, k$ and get
$$S^{\pi'}[u] - \epsilon \le S_{\pi'}[u] \le U(b) - U(a) \le S^{\pi'}[u] \le S_{\pi'}[u] + \epsilon.$$
By integrability, $S_{\pi'}[u] \le \int_a^b u\,dt \le S^{\pi'}[u]$, and this shows
$$\int_a^b u\,dt - \epsilon \le U(b) - U(a) \le \int_a^b u\,dt + \epsilon \qquad \forall\, \epsilon > 0,$$
which proves our claim.
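A quick numerical sanity check of Theorem E.18, as a sketch of our own: we take the sign function $u$ on $[-1,2]$ with primitive $U(x) = |x|$ (so $U' = u$ everywhere except at the single point $x = 0$, which Definition E.16 allows) and compare $U(b) - U(a)$ with a midpoint Riemann sum. The helper `midpoint_sum` is hypothetical and only approximates the integral.

```python
def midpoint_sum(u, a, b, k):
    """Midpoint Riemann sum of u over the equidistant partition of [a, b]."""
    h = (b - a) / k
    return sum(u(a + (j + 0.5) * h) for j in range(k)) * h


def sign(x):
    """Sign function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


a, b = -1.0, 2.0
approx = midpoint_sum(sign, a, b, 1000)
exact = abs(b) - abs(a)   # U(b) - U(a) with the primitive U(x) = |x|
print(approx, exact)
```

The two numbers agree up to the width of the one subinterval that contains the exceptional point $x = 0$.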
E.19 Remark There is not much room to improve the fundamental theorem E.18.
On one hand, Example E.15(ii) shows that an integrable function need not have
a primitive and E.15(iv) gives an example where
∫ x
a
u dt exists, but is not a
primitive in any interval; on the other hand, E.15(iii) provides an example of a
function u′ which has a primitive u but which is itself not Riemann integrable
since it is unbounded. Volterra even constructed an example of a bounded but
not Riemann integrable function with a primitive, see Sz.-Nagy [30, pp. 155–7].
To overcome this phenomenon was one of the motivations for Lebesgue when
he introduced the Lebesgue integral. And, in fact, every bounded function $f$
on the interval $[a,b]$ with a primitive $F$ is Lebesgue integrable: indeed, since
$F$ is continuous, it is measurable in the sense of Chapter 8 and so is the limit
$f(x) = \lim_{n\to\infty} \big(F(x + \frac{1}{n}) - F(x)\big) / \frac{1}{n}$, cf. Corollary 8.9; the finitely many points
where the limit does not exist are a Lebesgue null set and pose no problem. Since
$|f|$ is dominated by the (Lebesgue) integrable function $M\,1_{[a,b]}$, $M := \sup_{x\in[a,b]} |f(x)|$,
we conclude that $f \in \mathcal{L}^1[a,b]$.
Immediate consequences of the integral as antiderivative are the following
integration formulae which are easily proved by 'integrating up' the corresponding
differentiation rules.
E.20 Theorem (Integration by parts) Let $u'$ and $v'$ be integrable functions on
$[a,b]$ with primitives $u$ and $v$. Then $uv$ is a primitive of $u'v + uv'$ and, in particular,
$$\int_a^b u'(t)\,v(t)\,dt = u(b)\,v(b) - u(a)\,v(a) - \int_a^b u(t)\,v'(t)\,dt.$$
E.21 Theorem (Integration by substitution) Let $u \in \mathcal{R}[a,b]$ and assume that
$\varphi : [c,d] \to [a,b]$ is a strictly increasing differentiable function such that $\varphi(c) = a$
and $\varphi(d) = b$. If $u \circ \varphi,\ \varphi' \in \mathcal{R}[c,d]$ and if $u$ has a primitive $U$, then $U \circ \varphi$ is a
primitive of $u \circ \varphi \cdot \varphi'$ as well as
$$\int_a^b u(t)\,dt = \int_c^d u(\varphi(s))\,\varphi'(s)\,ds = \int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} u(\varphi(s))\,\varphi'(s)\,ds.$$
E.22 Corollary (Bonnet's mean value theorem³) Let $u, v \in \mathcal{R}[a,b]$ have primitives
$U$ and $V$. If $u \le 0$ [resp. $u \ge 0$] and $U \ge 0$, then there exists some $\xi \in [a,b]$
such that
$$\int_a^b U(t)\,v(t)\,dt = U(a) \int_a^\xi v(t)\,dt \tag{E.6}$$
$$\Big[ \text{resp. } \int_a^b U(t)\,v(t)\,dt = U(b) \int_\xi^b v(t)\,dt \Big]. \tag{E.6′}$$
Proof By subtracting a suitable constant from $V$ we may assume that $V(a) = 0$ and,
by the fundamental theorem E.18, $V(x) = \int_a^x v(t)\,dt$. Integration by parts now shows
$$\int_a^b U(t)\,v(t)\,dt = U(b)\,V(b) - \int_a^b u(t)\,V(t)\,dt.$$
Since $u \le 0$ we get
$$\begin{aligned}
\int_a^b U(t)\,v(t)\,dt &\le U(b)\,V(b) - \sup V([a,b]) \int_a^b u(t)\,dt \\
&= U(b)\,V(b) - \sup V([a,b])\,\big(U(b) - U(a)\big) \\
&= U(b)\,\big(V(b) - \sup V([a,b])\big) + \sup V([a,b])\,U(a) \\
&\le \sup V([a,b])\,U(a),
\end{aligned}$$
and a similar calculation yields the other inequality below:
$$\inf V([a,b])\,U(a) \le \int_a^b U(t)\,v(t)\,dt \le \sup V([a,b])\,U(a).$$
Applying the intermediate value theorem to the continuous function $V$ furnishes
some $\xi \in [a,b]$ such that (E.6) holds.
³ Also known as the second mean value theorem of integral calculus.
Integrals and limits
One of the strengths of Lebesgue integration is the fact that we have fairly general
theorems that allow interchanging pointwise limits and Lebesgue integrals.
Similar results for the Riemann integral regularly require uniform convergence.
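Recall that a sequence of functions $(u_n(\bullet))_{n\in\mathbb{N}}$ on $[a,b]$ converges uniformly (in $x$)
to $u$, if
$$\forall\, \epsilon > 0 \quad \exists\, N_\epsilon \in \mathbb{N} \ :\ \forall\, x \in [a,b],\ \forall\, n \ge N_\epsilon \ :\ |u_n(x) - u(x)| \le \epsilon.$$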
The basic convergence result for the Riemann integral is the following.
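E.23 Theorem Let $(u_n)_{n\in\mathbb{N}} \subset \mathcal{R}[a,b]$ be a sequence which converges uniformly
to a function $u$. Then $u \in \mathcal{R}[a,b]$ and
$$\lim_{n\to\infty} \int_a^b u_n\,dt = \int_a^b \lim_{n\to\infty} u_n\,dt = \int_a^b u\,dt.$$
Proof Let $\pi$ be a partition of $[a,b]$ and let $\epsilon > 0$ be given. Since $u_n \to u$
uniformly as $n \to \infty$, we can find some $N_\epsilon \in \mathbb{N}$ such that $|u(x) - u_n(x)| \le \epsilon/(b-a)$
uniformly in $x \in [a,b]$ for all $n \ge N_\epsilon$. Because of (E.1), and since the upper Darboux sum is
subadditive and the lower Darboux sum is superadditive, we find for all $n \ge N_\epsilon$
$$\begin{aligned}
S^\pi[u] - S_\pi[u] &\le S^\pi[u - u_n] + S^\pi[u_n] - S_\pi[u_n] - S_\pi[u - u_n] \\
&\le 2\epsilon + S^\pi[u_n] - S_\pi[u_n],
\end{aligned}$$
thus
$$\int_{a}^{*b} u - \int_{*a}^{b} u \le 2\epsilon + S^\pi[u_n] - S_\pi[u_n] \qquad \forall\, n \ge N_\epsilon.$$
Fixing some $n_0 \ge N_\epsilon$ we can use that $u_{n_0}$ is integrable and choose $\pi$ in such a way
that $S^\pi[u_{n_0}] - S_\pi[u_{n_0}] \le \epsilon$. This shows that $\int_{a}^{*b} u - \int_{*a}^{b} u \le 3\epsilon$ and $u \in \mathcal{R}[a,b]$.
Once $u$ is known to be integrable, we get for all $n \ge N_\epsilon$
$$\Big| \int_a^b (u - u_n)\,dt \Big| \le \int_a^b |u - u_n|\,dt \le \frac{\epsilon}{b-a}\,(b-a) = \epsilon \xrightarrow{\epsilon\to 0} 0.$$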
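That uniform convergence cannot simply be replaced by pointwise convergence can be seen from a standard example, which is our own addition and not taken from the text: $u_n(x) = n\,x\,(1 - x^2)^n$ on $[0,1]$ converges to $0$ at every point, yet $\int_0^1 u_n\,dt = \frac{n}{2(n+1)} \to \frac12$ (substitute $s = 1 - x^2$). The sketch below compares the closed form with a midpoint sum; all function names are hypothetical.

```python
def u_n(n, x):
    """u_n(x) = n * x * (1 - x^2)^n, which tends to 0 for every x in [0, 1]."""
    return n * x * (1 - x * x) ** n


def integral_u_n(n):
    """Closed form of the integral of u_n over [0, 1] (substitution s = 1 - x^2)."""
    return n / (2 * (n + 1))


def midpoint_sum(u, a, b, k):
    """Midpoint Riemann sum of u over the equidistant partition of [a, b]."""
    h = (b - a) / k
    return sum(u(a + (j + 0.5) * h) for j in range(k)) * h


for n in (1, 10, 100, 1000):
    # value at the fixed point x = 0.5 shrinks, the integral approaches 1/2
    print(n, u_n(n, 0.5), integral_u_n(n))
```

The pointwise limit has integral $0$, while the integrals of $u_n$ approach $\frac12$; the convergence is not uniform because $\sup u_n$ grows without bound.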
We can now consider Riemann integrals which depend on a parameter.
E.24 Theorem (Continuity theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous
function. Then
$$w(y) := \int_a^b u(t,y)\,dt$$
is continuous for all $y \in \mathbb{R}$.
Proof Since $u(\bullet, y)$ is continuous, the above Riemann integral exists. Fix $y \in \mathbb{R}$
and consider any sequence $(y_n)_{n\in\mathbb{N}}$ with limit $y$. Without loss of generality we
can assume that $(y_n)_{n\in\mathbb{N}} \subset I := [y-1, y+1]$. Since $[a,b] \times I$ is compact, $u|_{[a,b]\times I}$
is uniformly continuous, and we can find for all $\epsilon > 0$ some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (y - \eta)^2} < \delta \implies |u(t,y) - u(\tau,\eta)| < \epsilon.$$
As $y_n \to y$ for $n \to \infty$, there is some $N_\epsilon \in \mathbb{N}$ with
$$|u(t,y_n) - u(t,y)| < \epsilon \qquad \forall\, t \in [a,b],\ \forall\, n \ge N_\epsilon,$$
i.e. $u(t,y_n) \to u(t,y)$ uniformly in $t \in [a,b]$ as $n \to \infty$. Theorem E.23 and the continuity
of $u(t,\bullet)$ therefore show
$$\lim_{n\to\infty} w(y_n) = \lim_{n\to\infty} \int_a^b u(t,y_n)\,dt = \int_a^b \lim_{n\to\infty} u(t,y_n)\,dt = \int_a^b u(t,y)\,dt = w(y)$$
which is but the continuity of $w$ at $y$.
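E.25 Theorem (Differentiation theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous
function with continuous partial derivative $\frac{\partial}{\partial y} u(t,y)$. Then
$$w(y) := \int_a^b u(t,y)\,dt$$
is continuously differentiable and
$$w'(y) = \frac{d}{dy} \int_a^b u(t,y)\,dt = \int_a^b \frac{\partial}{\partial y} u(t,y)\,dt.$$
Proof Since $u(\bullet, y)$ and $\frac{\partial}{\partial y} u(\bullet, y)$ are continuous, the above integrals exist. Fix
$y \in \mathbb{R}$ and consider any sequence $(y_n)_{n\in\mathbb{N}}$ with limit $y$. Without loss of generality
we can assume that $(y_n)_{n\in\mathbb{N}} \subset I := [y-1, y+1]$.
We introduce the following auxiliary function
$$h(t,z) := u(t,z) - u(t,y) - \frac{\partial}{\partial y} u(t,y)\,(z - y).$$
Clearly, $h(t,y) = 0$ and $\frac{\partial}{\partial z} h(t,z) = \frac{\partial}{\partial z} u(t,z) - \frac{\partial}{\partial y} u(t,y)$ is continuous and
uniformly continuous on $[a,b] \times I$, i.e. for all $\epsilon > 0$ there is some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (z - \zeta)^2} < \delta \implies \Big| \frac{\partial}{\partial z} h(t,z) - \frac{\partial}{\partial \zeta} h(\tau,\zeta) \Big| < \epsilon.$$
From the mean value theorem of differential calculus we infer that for some $\zeta$
between $z$ and $y$
$$\begin{aligned}
|h(t,z)| = |h(t,z) - h(t,y)| &= \Big| \frac{\partial}{\partial \zeta} h(t,\zeta) \Big| \cdot |z - y| \\
&= \Big| \frac{\partial}{\partial \zeta} h(t,\zeta) - \frac{\partial}{\partial y} h(t,y) \Big| \cdot |z - y| \\
&\le \epsilon\,|z - y|
\end{aligned}$$
whenever $z, y \in I$ and $|z - y| < \delta$; here we used that $\frac{\partial}{\partial z} h(t,z)$ vanishes at $z = y$.
This shows that for some $N_\epsilon \in \mathbb{N}$
$$\Big| u(t,y_n) - u(t,y) - \frac{\partial}{\partial y} u(t,y)\,(y_n - y) \Big| \le \epsilon\,|y_n - y| \qquad \forall\, t \in [a,b],\ \forall\, n \ge N_\epsilon.$$
Theorem E.23 now shows that
$$\begin{aligned}
w'(y) = \lim_{n\to\infty} \frac{w(y_n) - w(y)}{y_n - y} &= \lim_{n\to\infty} \int_a^b \frac{u(t,y_n) - u(t,y)}{y_n - y}\,dt \\
&= \int_a^b \lim_{n\to\infty} \frac{u(t,y_n) - u(t,y)}{y_n - y}\,dt = \int_a^b \frac{\partial}{\partial y} u(t,y)\,dt.
\end{aligned}$$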
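The differentiation theorem can be checked numerically on a concrete integrand. The following sketch is an illustration of ours: for $u(t,y) = \sin(ty)$ on $[0,1]$ it compares a central difference quotient of $w$ with the integral of $\frac{\partial}{\partial y}u(t,y) = t\cos(ty)$, and with the closed form $w(y) = (1 - \cos y)/y$. All helper names are hypothetical, and the integrals are only approximated by midpoint sums.

```python
import math


def midpoint_integral(f, a, b, k=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / k
    return sum(f(a + (j + 0.5) * h) for j in range(k)) * h


def w(y):
    """w(y) = integral over [0, 1] of sin(t*y) dt (approximated)."""
    return midpoint_integral(lambda t: math.sin(t * y), 0.0, 1.0)


def w_prime_by_formula(y):
    """Integral over [0, 1] of the partial derivative t*cos(t*y) (approximated)."""
    return midpoint_integral(lambda t: t * math.cos(t * y), 0.0, 1.0)


def w_prime_by_difference(y, h=1e-5):
    """Central difference quotient of w at y."""
    return (w(y + h) - w(y - h)) / (2 * h)


def w_prime_closed_form(y):
    """Derivative of the closed form w(y) = (1 - cos y) / y, for y != 0."""
    return (y * math.sin(y) - (1 - math.cos(y))) / y ** 2


y0 = 1.3
print(w_prime_by_formula(y0), w_prime_by_difference(y0), w_prime_closed_form(y0))
```

All three values agree to several digits, which is what interchanging $\frac{d}{dy}$ and $\int_a^b$ predicts.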
Improper Riemann integrals
Let us finally have a glance at various extensions of the Riemann integral to
unbounded intervals and /or unbounded integrands. The following cases can
occur:
A. the interval of integration is $[a,+\infty)$ or $(-\infty,b]$;
B. the interval of integration is $[a,b)$ or $(a,b]$, and the integrand $u(t)$ is unbounded
as $t \uparrow b$ resp. $t \downarrow a$;
C. the interval of integration is $(a,b)$ with $-\infty \le a < b \le +\infty$ and the integrand
may or may not be unbounded.
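A. Improper Riemann integrals of the type $\int_a^\infty u\,dt$ or $\int_{-\infty}^b u\,dt$
E.26 Definition If $u \in \mathcal{R}[a,b]$ for all $b \in (a,\infty)$ [resp. $a \in (-\infty,b)$] and if the
limit
$$\lim_{b\to\infty} \int_a^b u\,dt \qquad \Big[ \text{resp. } \lim_{a\to-\infty} \int_a^b u\,dt \Big]$$
exists and is finite, we call $u$ improperly Riemann integrable and write
$u \in \mathcal{R}[a,\infty)$ [resp. $u \in \mathcal{R}(-\infty,b]$]. The value of the above limit is called the
(improper Riemann) integral and denoted by $\int_a^\infty u\,dt$ [resp. $\int_{-\infty}^b u\,dt$].
The typical examples of improper integrals of this kind are expressions of the
type $\int_1^\infty t^\lambda\,dt$ if $\lambda < 0$. In fact, if $\lambda \neq -1$,
$$\int_1^\infty t^\lambda\,dt = \lim_{b\to\infty} \int_1^b t^\lambda\,dt = \lim_{b\to\infty} \frac{1}{\lambda+1}\,\big(b^{\lambda+1} - 1\big) = \begin{cases} \frac{-1}{\lambda+1}, & \text{if } \lambda < -1, \\ \infty, & \text{if } \lambda > -1, \end{cases}$$
and a similar calculation confirms that $\int_1^\infty t^{-1}\,dt = \infty$. Thus $t^\lambda \in \mathcal{R}[1,\infty)$ if, and
only if, $\lambda < -1$.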
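The dichotomy for $t^\lambda$ on $[1,\infty)$ is easy to observe from the closed form of the proper integrals $\int_1^b t^\lambda\,dt$. The sketch below is our own illustration; `integral_power` is a hypothetical helper that merely evaluates the antiderivative used in the computation above.

```python
import math


def integral_power(lam, b):
    """Closed form of the integral of t**lam over [1, b], for b >= 1."""
    if lam == -1:
        return math.log(b)
    return (b ** (lam + 1) - 1) / (lam + 1)


for b in (1e2, 1e4, 1e6):
    # lam = -2 stabilises near -1/(lam+1) = 1; lam = -1 and lam = -0.5 keep growing
    print(b, integral_power(-2, b), integral_power(-1, b), integral_power(-0.5, b))
```

For $\lambda = -2$ the values approach $1$, whereas for $\lambda = -1$ and $\lambda = -\frac12$ they grow without bound (logarithmically resp. like $\sqrt{b}$).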
From now on we will only consider integrals of the type $\int_a^\infty u\,dt$; the case
of a finite upper and infinite lower limit is very similar. The following Cauchy
criterion for improper integrals is quite useful.
E.27 Lemma $u \in \mathcal{R}[a,\infty)$ if, and only if, $u \in \mathcal{R}[a,b]$ for all $b \in (a,\infty)$ and
$\lim_{x,y\to\infty} \int_x^y u\,dt = 0$ ($x, y \to \infty$ simultaneously).
Proof This is just Cauchy's convergence criterion for $U(z) = \int_a^z u(t)\,dt$ as
$z \to \infty$.
It is not hard to see that Lemma E.27 implies, in particular, that
• $\mathcal{R}[a,\infty)$ is a vector space, i.e. for all $\alpha, \beta \in \mathbb{R}$ and $u, w \in \mathcal{R}[a,\infty)$,
$$\int_a^\infty (\alpha u + \beta w)\,dt = \alpha \int_a^\infty u\,dt + \beta \int_a^\infty w\,dt;$$
• $u \in \mathcal{R}[a,\infty)$ if, and only if, $\int_b^\infty u\,dt$ exists for all $b > a$.
E.28 Corollary Let $u, w : [a,\infty) \to \mathbb{R}$ be two functions such that $|u| \le w$. If
$w \in \mathcal{R}[a,\infty)$, and if $u \in \mathcal{R}[a,b]$ for all $b > a$, then $u, |u| \in \mathcal{R}[a,\infty)$. In particular,
$|u| \in \mathcal{R}[a,\infty)$ implies that $u \in \mathcal{R}[a,\infty)$.
Proof For all $y > x > a$ we find using Theorem E.11 and Lemma E.27 that
$$\Big| \int_x^y u\,dt \Big| \le \int_x^y |u|\,dt \le \int_x^y w\,dt \xrightarrow{x,y\to\infty} 0$$
which shows, again by E.27, that $u, |u| \in \mathcal{R}[a,\infty)$.
Note that, unlike Lebesgue integrals, improper Riemann integrals are not absolute
integrals since improper integrability of $u$ does NOT imply improper integrability
of $|u|$, see e.g. Remark 11.11 where $\int_0^\infty \sin t / t\,dt$ is discussed. This means
that the following convergence theorems for improper Riemann integrals are not
necessarily covered by Lebesgue's theory.
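E.29 Theorem Let $(u_n)_{n\in\mathbb{N}} \subset \mathcal{R}[a,\infty)$. If for some $u : [a,\infty) \to \mathbb{R}$
• $u_n(t) \to u(t)$ as $n \to \infty$ uniformly in $t \in [a,b]$ and for every $b > a$,
• $\lim_{b\to\infty} \int_a^b u_n\,dt$ exists uniformly for all $n \in \mathbb{N}$, i.e. for every $\epsilon > 0$ there is
some $N_\epsilon \in \mathbb{N}$ such that
$$\sup_{n\in\mathbb{N}} \Big| \int_x^y u_n\,dt \Big| < \epsilon \qquad \forall\, y > x > N_\epsilon,$$
then $u \in \mathcal{R}[a,\infty)$ and
$$\lim_{n\to\infty} \int_a^\infty u_n\,dt = \int_a^\infty \lim_{n\to\infty} u_n\,dt = \int_a^\infty u\,dt.$$
Proof That $u \in \mathcal{R}[a,b]$ for all $b > a$ follows from Theorem E.23. Fix $\epsilon > 0$ and
choose $N_\epsilon$ as in the above statement. For all $y > x > N_\epsilon$
$$\Big| \int_x^y u\,dt \Big| \le \Big| \int_x^y (u - u_n)\,dt \Big| + \Big| \int_x^y u_n\,dt \Big| \le (y - x) \sup_{t\in[x,y]} |u(t) - u_n(t)| + \epsilon,$$
and as $n \to \infty$ we find $\big| \int_x^y u\,dt \big| \le \epsilon$ for all $y > x > N_\epsilon$, hence $u \in \mathcal{R}[a,\infty)$ by
Lemma E.27.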
In pretty much the same way as we derived Theorems E.24, E.25 from the
basic convergence result E.23 we get now from E.29 the following continuity and
differentiability theorems for improper integrals.
E.30 Theorem Let $I \subset \mathbb{R}$ be an open interval and $u : [a,\infty) \times I \to \mathbb{R}$ be continuous
such that $u(\bullet, y) \in \mathcal{R}[a,\infty)$ for all $y \in I$ and
$$\lim_{b\to\infty} \int_a^b u(t,y)\,dt \quad \text{exists uniformly for all } y \in [c,d] \subset I.$$
Then $U(y) := \int_a^\infty u(t,y)\,dt$ is continuous for all $y \in (c,d)$.
Proof (sketch) Fix $y \in (c,d)$ and choose any sequence $(y_n)_{n\in\mathbb{N}} \subset (c,d)$ with limit
$y$. By the assumptions $u_n(t) := u(t,y_n) \to u(t,y)$ as $n \to \infty$ uniformly for all $t \in [a,b]$.
Now the basic convergence theorem for improper integrals E.29 applies and shows
$U(y_n) \to U(y)$ as $n \to \infty$.
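E.31 Theorem Let $I \subset \mathbb{R}$ be an open interval and $u : [a,\infty) \times I \to \mathbb{R}$ be continuous
with continuous partial derivative $\frac{\partial}{\partial y} u(t,y)$. If $u(\bullet, y),\ \frac{\partial}{\partial y} u(\bullet, y) \in \mathcal{R}[a,\infty)$
for all $y \in I$, and if
$$\lim_{b\to\infty} \int_a^b u(t,y)\,dt \quad\text{and}\quad \lim_{b\to\infty} \int_a^b \frac{\partial}{\partial y} u(t,y)\,dt$$
exist uniformly for all $y \in [c,d] \subset I$, then $W(y) := \int_a^\infty u(t,y)\,dt$ exists and is
differentiable on $(c,d)$ with derivative
$$W'(y) = \frac{d}{dy} \int_a^\infty u(t,y)\,dt = \int_a^\infty \frac{\partial}{\partial y} u(t,y)\,dt.$$
Proof (sketch) Set $U(x,y) := \int_a^x u(t,y)\,dt$. By Theorem E.25 $\frac{\partial}{\partial y} U(x,y)$ exists and
equals $\int_a^x \frac{\partial}{\partial y} u(t,y)\,dt$. By assumption,
$$U(x,y) \xrightarrow{x\to\infty} \int_a^\infty u(t,y)\,dt \quad \text{pointwise for all } y \in [c,d],$$
$$\frac{\partial}{\partial y} U(x,y) \xrightarrow{x\to\infty} \int_a^\infty \frac{\partial}{\partial y} u(t,y)\,dt \quad \text{uniformly for all } y \in [c,d].$$
By a standard theorem on uniform convergence and differentiability, cf. Rudin
[39, Theorem 7.17], we now conclude
$$\frac{d}{dy} \int_a^\infty u(t,y)\,dt = \int_a^\infty \frac{\partial}{\partial y} u(t,y)\,dt.$$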
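E.32 Theorem Let $u, w \in \mathcal{R}[a,b]$ for all $b \in (a,\infty)$ and assume that $u, w \ge 0$
and that $\lim_{x\to\infty} u(x)/w(x) = A > 0$ exists. Then $u \in \mathcal{R}[a,\infty)$ if, and only if,
$w \in \mathcal{R}[a,\infty)$.
Proof By assumption we find for every $\epsilon \in (0, A)$ some $N_\epsilon \in \mathbb{N}$ such that
$$0 < A - \epsilon \le \frac{u(x)}{w(x)} \le A + \epsilon \qquad \forall\, x \ge N_\epsilon\ (> a).$$
Thus $(A - \epsilon)\,w(x) \le u(x) \le (A + \epsilon)\,w(x)$ for all $x \ge N_\epsilon$. Thus, if $w \in \mathcal{R}[a,\infty)$, we
get $(A + \epsilon)\,w \in \mathcal{R}[a,\infty)$ (cf. the remark following Lemma E.27) and, by Corollary
E.28, $u \in \mathcal{R}[a,\infty)$.
Similarly, if $u \in \mathcal{R}[a,\infty)$, we have $u/(A - \epsilon) \in \mathcal{R}[a,\infty)$ and, again by E.28,
$w \in \mathcal{R}[a,\infty)$.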
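We will finally study the interplay of series and improper integrals.
E.33 Theorem Let $a = b_0 < b_1 < b_2 < \dots$ be a strictly increasing sequence with
$b_k \to \infty$.
(i) If $u \in \mathcal{R}[a,\infty)$, then $\sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$ converges.
(ii) If $u \ge 0$ and $u \in \mathcal{R}[b_{k-1}, b_k]$ for all $k \in \mathbb{N}$, then the convergence of
$\sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$ implies $u \in \mathcal{R}[a,\infty)$.
Proof (i): Since $u \in \mathcal{R}[a,\infty)$,
$$\int_a^\infty u\,dt = \lim_{n\to\infty} \int_a^{b_n} u\,dt = \lim_{n\to\infty} \sum_{k=1}^{n} \int_{b_{k-1}}^{b_k} u\,dt = \sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt.$$
(ii): Define $S := \sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$. Since $b_k$ increases to $\infty$, we find for all $b > a$
some $N \in \mathbb{N}$ such that $b_N > b$. Consequently,
$$\int_a^b u\,dt \le \int_a^{b_N} u\,dt = \sum_{k=1}^{N} \int_{b_{k-1}}^{b_k} u\,dt \le S$$
which shows that the limit $\lim_{b\to\infty} \int_a^b u\,dt = \sup_{b>a} \int_a^b u\,dt \le S$ exists.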
E.34 Theorem (Integral test for series) Let $u \in C[0,\infty)$, $u \ge 0$, be a decreasing
function. Then
$$\int_0^\infty u\,dt \quad\text{and}\quad \sum_{k=0}^{\infty} u(k)$$
either both converge or diverge.
Proof Note that by Theorem E.8 $u \in \mathcal{R}[0,b]$ for all $b > 0$, so that the improper
integral can be defined. Since $u$ is decreasing,
$$u(k+1) \le \int_k^{k+1} u(t)\,dt \le u(k),$$
cf. Theorem E.11, and summing these inequalities over $k = 0, 1, \dots, N$ yields
$$\sum_{k=1}^{N+1} u(k) = \sum_{k=0}^{N} u(k+1) \le \int_0^{N+1} u(t)\,dt \le \sum_{k=0}^{N} u(k).$$
Since $u$ is positive and since the series has only positive terms, it is obvious that
$\int_0^\infty u\,dt$ converges if, and only if, the series $\sum_{k=0}^{\infty} u(k)$ is finite.
358 R.L. Schilling
B. Improper Riemann integrals with unbounded integrands
E.35 Definition If $u \in \mathcal{R}[a,c]$ [resp. $u \in \mathcal{R}[c,b]$] for all $c \in (a,b)$ and if the limit
\[
\lim_{c \uparrow b} \int_a^c u\,dt \qquad \Bigl[\text{resp. } \lim_{c \downarrow a} \int_c^b u\,dt \Bigr]
\]
exists and is finite, we call $u$ improperly Riemann integrable and write $u \in \mathcal{R}[a,b)$ [resp. $u \in \mathcal{R}(a,b]$]. The value of the limit is called the (improper Riemann) integral and denoted by $\int_a^b u\,dt$.
Notice that the function $u$ in E.35 need not be bounded in $(a,b)$. If it is, the improper integral coincides with the ordinary Riemann integral.
E.36 Lemma If the function $u \in \mathcal{R}[a,b)$ [or $u \in \mathcal{R}(a,b]$] has an extension to $[a,b]$ which is bounded, then the extension is Riemann integrable over $[a,b]$, and proper and improper Riemann integrals coincide.
Proof We consider only $[a,b)$, since the other case is similar. Denote, for notational simplicity, the extension of $u$ again by $u$.
Let $M := \sup |u|([a,b])$ (we may assume $M > 0$), fix $\epsilon > 0$ and pick $c < b$ with $b - c \leq \epsilon/M$. Since $u \in \mathcal{R}[a,c]$, we can find a partition $\pi$ of $[a,c]$ such that $S^{\pi}[u] - S_{\pi}[u] \leq \epsilon$. For the partition $\pi' := \pi \cup \{b\}$ of $[a,b]$ we get
\[
\bigl| S^{\pi'}[u] - S^{\pi}[u] \bigr| = \bigl| \sup u([c,b]) \bigr| \, (b - c) \leq M \, \frac{\epsilon}{M} = \epsilon
\]
and
\[
\bigl| S_{\pi'}[u] - S_{\pi}[u] \bigr| = \bigl| \inf u([c,b]) \bigr| \, (b - c) \leq M \, \frac{\epsilon}{M} = \epsilon,
\]
which implies that $S^{\pi'}[u] - S_{\pi'}[u] \leq 3\epsilon$ and $u \in \mathcal{R}[a,b]$ by Theorem E.5. The claim now follows from Lemma E.14.
Many of the results for improper integrals of the form $\int_a^\infty u\,dt$ resp. $\int_{-\infty}^b u\,dt$ carry over with minor notational changes to the case of half-open bounded intervals. Note, however, that in the convergence theorems some assertions involving uniform convergence are senseless in the presence of unbounded integrands. We leave the details to the reader.
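The typical examples of improper integrals of this kind are expressions of the type $\int_0^1 t^\lambda\,dt$ if $\lambda < 0$. In fact, if $\lambda \neq -1$,
\[
\int_0^1 t^\lambda\,dt
= \lim_{\epsilon \to 0} \int_\epsilon^1 t^\lambda\,dt
= \lim_{\epsilon \to 0} \frac{1}{\lambda + 1} \bigl( 1 - \epsilon^{\lambda + 1} \bigr)
= \begin{cases} \dfrac{1}{\lambda + 1} & \text{if } \lambda > -1, \\[1ex] \infty & \text{if } \lambda < -1, \end{cases}
\]
and a similar calculation confirms that $\int_0^1 t^{-1}\,dt = \infty$. Thus $t^\lambda \in \mathcal{R}(0,1]$ if, and only if, $\lambda > -1$.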
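The dichotomy for power functions near $0$ can also be seen numerically. The following is a minimal sketch that evaluates the closed form of the truncated integral over $[\epsilon, 1]$ for shrinking $\epsilon$; the function name `truncated_integral` and the parameter names `lam` (the exponent) and `eps` (the lower cut-off) are hypothetical and chosen for this illustration only.

```python
import math


def truncated_integral(lam: float, eps: float) -> float:
    """Proper Riemann integral of t**lam over [eps, 1], for 0 < eps < 1.

    Uses the antiderivative t**(lam + 1) / (lam + 1) when lam != -1
    and log(t) when lam == -1.
    """
    if lam == -1.0:
        return -math.log(eps)
    return (1.0 - eps ** (lam + 1.0)) / (lam + 1.0)


if __name__ == "__main__":
    for lam in (-0.5, -1.0, -2.0):
        # Shrink the cut-off and watch whether the values settle down.
        values = [truncated_integral(lam, 10.0 ** -k) for k in (2, 4, 8)]
        print(lam, values)
```

For the exponent $-0.5$ the values approach $2$, whereas for $-1$ and $-2$ they grow without bound, in line with the criterion that the exponent must exceed $-1$.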
C. Improper Riemann integrals where both limits are critical
Assume now that the integration interval is $(a,b)$ and that both endpoints $a$ and $b$, $-\infty \leq a < b \leq +\infty$, are critical, i.e. that the integrand is unbounded at one or both endpoints and/or that one or both endpoints are infinite.
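Let $u \in \mathcal{R}(a,c] \cap \mathcal{R}[c,b)$ for some point $a < c < b$ and suppose that $d$ satisfies $c < d < b$. By the remark following Lemma E.27 and Theorem E.13 we find
\[
\begin{aligned}
\int_a^c u\,dt + \int_c^b u\,dt
&= \lim_{x \downarrow a} \int_x^c u\,dt + \lim_{y \uparrow b} \int_c^y u\,dt \\
&= \lim_{x \downarrow a} \int_x^c u\,dt + \int_c^d u\,dt + \lim_{y \uparrow b} \int_d^y u\,dt \\
&= \lim_{x \downarrow a} \int_x^d u\,dt + \lim_{y \uparrow b} \int_d^y u\,dt \\
&= \int_a^d u\,dt + \int_d^b u\,dt,
\end{aligned}
\]
which shows that $u \in \mathcal{R}(a,d] \cap \mathcal{R}[d,b)$. Therefore, the following definition makes sense.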
E.37 Definition Let $-\infty \leq a < b \leq +\infty$ and let $(a,b) \subset \mathbb{R}$ be a bounded or unbounded open interval. Then $u : (a,b) \to \mathbb{R}$ is said to be improperly integrable if for some (hence, all) $c \in (a,b)$ the function $u$ is improperly integrable both over $(a,c]$ and $[c,b)$, i.e. we define $\mathcal{R}(a,b) := \mathcal{R}(a,c] \cap \mathcal{R}[c,b)$. The (improper Riemann) integral is then given by
\[
\int_a^b u\,dt := \int_a^c u\,dt + \int_c^b u\,dt = \lim_{x \downarrow a} \int_x^c u\,dt + \lim_{y \uparrow b} \int_c^y u\,dt.
\]
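To see Definition E.37 at work, and in particular that the value does not depend on the splitting point, here is a hedged worked example. The integrand is chosen for this sketch and does not appear in the text; it is unbounded at $0$ and lives on the unbounded interval $(0,\infty)$, so both endpoints are critical.

```latex
% Illustrative integrand (an assumption of this sketch):
u(t) = \frac{1}{\sqrt{t}\,(1 + t)}, \qquad t \in (0,\infty),
\qquad \text{with antiderivative } 2\arctan\sqrt{t}.

% Split at an arbitrary point c \in (0,\infty):
\lim_{x \downarrow 0} \int_x^c u\,dt
  = \lim_{x \downarrow 0} \bigl( 2\arctan\sqrt{c} - 2\arctan\sqrt{x} \bigr)
  = 2\arctan\sqrt{c},
\qquad
\lim_{y \uparrow \infty} \int_c^y u\,dt
  = \lim_{y \uparrow \infty} \bigl( 2\arctan\sqrt{y} - 2\arctan\sqrt{c} \bigr)
  = \pi - 2\arctan\sqrt{c}.

% The sum is the same for every c:
\int_0^\infty u\,dt = 2\arctan\sqrt{c} + \bigl( \pi - 2\arctan\sqrt{c} \bigr) = \pi.
```

Both one-sided limits exist and are finite, and the dependence on $c$ cancels in the sum.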
The typical example of an improper integral of this kind is Euler’s Gamma function
\[
\Gamma(x) := \int_0^\infty t^{x-1} e^{-t}\,dt, \qquad x > 0,
\]
which is treated in Example 10.14 in the framework of Lebesgue theory, but the arguments are essentially similar. The Gamma function is a two-sided improper integral only for $0 < x < 1$, since for $x \geq 1$ it can be interpreted as a one-sided improper integral over $[0,\infty)$, cf. Lemma E.36.
Further reading
Measure theory is used in many mathematical disciplines. We have touched on a few of them
in this book, and the purpose of this section is to point towards literature which treats
these subjects in depth. The choice of books and topics is certainly not comprehensive.
On the contrary, it is very personal, limited by my knowledge of the literature and, of
course, my own mathematical taste. I decided to include only books which are written in
English and which I thought would be accessible to readers of the present text.
Real analysis (in particular measure and integration theory for analysts)
Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer 1995.
Dudley, R. M., Real Analysis and Probability (2nd edn), Cambridge: Cambridge University
Press, Studies in Adv. Math. vol. 74, 2002.
Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer, Grad.
Texts in Math. vol. 25, 1975.
Kolmogorov, A. N. and F. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover,
1975.
Lieb, E. H. and M. Loss, Analysis (2nd edn), Am. Mathematical Society, Grad. Studies
in Math. vol. 14, Providence (RI) 2001.
Rudin, W., Real and Complex Analysis (3rd edn), McGraw-Hill, New York 1987.
Saks, S., Theory of the Integral (2nd revised edn), Hafner, Monografie Matematyczne
Tom VII, New York 1937. [Reprinted by Dover, 1964. Free online edition in the
Wirtualna Biblioteka Nauki: http://matwbn.icm.edu.pl/kstresc.php?tom=7&wyd=10]
Stroock, D., A Concise Introduction to the Theory of Integration (3rd edn), Birkhäuser,
Boston 1999.
Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University
Press, Univ. Texts in the Math. Sci., New York 1965.
Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real Analysis,
Marcel Dekker, Pure Appl. Math. vol. 43, New York 1977.
Functional analysis
Bollobas, B., Linear Analysis. An Introductory Course (2nd edn), Cambridge University
Press, Cambridge 1999.
Hirsch, F. and G. Lacombe, Elements of Functional Analysis, Springer, Grad. Texts in
Math. vol. 192, New York 1999.
Kolmogorov, A. N. and F. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover,
1975.
Yosida, K., Functional Analysis (6th edn), Springer, Grundlehren math. Wiss. Bd. 123,
Berlin 1980.
Zaanen, A. C., Integration (completely revised edn. of An Introduction to the Theory of
Integration), North-Holland, Amsterdam 1967.
Fourier series, harmonic analysis, orthonormal systems, wavelets
Alexits, G., Convergence Problems of Orthogonal Series, Pergamon, Int. Ser. Monogr.
Pure Appl. Math. vol. 20, Oxford 1961.
Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge University Press,
Encycl. Math. Appl. vol. 71, Cambridge 1999.
Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago
1970.
Helson, H., Harmonic Analysis, Addison-Wesley, London, 1983.
Kahane, J.-P., Some Random Series of Functions (2nd edn), Cambridge University Press,
Stud. Adv. Math. vol. 5, Cambridge 1985.
Krantz, S. G., A Panorama of Harmonic Analysis, Mathematical Association of America,
Carus Math. Monogr. vol. 27, Washington 1999.
Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Brooks/Cole, Ser. Adv.
Math., Pacific Grove (CA) 2002.
Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic Harmonic
Analysis, Adam Hilger, Bristol 1990.
Stein, E. M., Singular Integrals and Differentiability Properties of Functions, Princeton
University Press, Math. Ser. vol. 30, Princeton (NJ) 1970.
Stein, E. M. and R. Shakarchi, Fourier Analysis: An Introduction, Princeton University
Press, Princeton (NJ) 2003.
Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University
Press, Univ. Texts in the Math. Sci., New York 1965.
Wojtaszczyk, P., A Mathematical Introduction to Wavelets, Cambridge University Press,
London Math. Society Student Texts vol. 37, Cambridge 1997.
Zygmund, A., Trigonometric Series (2nd edn), Cambridge University Press, Cambridge
1959. [Almost unaltered softcover editions: Cambridge: Cambridge University Press,
1969, 1988 and 2003.]
Geometric measure theory, Hausdorff measure, fine properties of functions
Evans, L. C. and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC
Press, Boca Raton (FL) 1992.
Mattila, P., Geometry of Sets and Measures in Euclidean Spaces, Cambridge University
Press, Studies in Adv. Math. vol. 44, Cambridge 1995.
Morgan, F., Geometric Measure Theory: A Beginner’s Guide (3rd edn), Academic Press,
San Diego, 2000.
Rogers, C. A., Hausdorff Measures, Cambridge University Press, Cambridge Math.
Library, Cambridge 1970.
Ziemer, W. P., Weakly Differentiable Functions, Springer, Grad. Texts in Math. vol. 120,
New York 1989.
Topological measure theory, functional analytic aspects of integration and measure
Bauer, H., Measure and Integration Theory, de Gruyter, Studies in Math. vol. 26, Berlin
2001.
Choquet, G., Lectures on Analysis. vol. 1: Integration and Topological Vector Spaces,
W. A. Benjamin, New York 1969.
Dieudonné, J., Treatise on Analysis, vol. II, Academic Press, Pure Appl. Math. vol. 10-II,
New York 1969.
Hewitt, E. and K. A. Ross, Abstract Harmonic Analysis, vol. 1, Springer, Grundlehren
math. Wiss. Bd. 115, Berlin 1963.
Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York
1995.
Oxtoby, J. C., Measure and Category (2nd edn), Springer, Grad. Texts Math. vol. 2, New
York 1980.
Weir, A. J., General Integration and Measure, Cambridge University Press, Cambridge
1974.
Borel and analytic sets
Rogers, C. A. et al., Analytic Sets, Academic Press, London 1980.
Srivastava, S. M., A Course on Borel Sets, Springer, Grad. Texts Math. vol. 180,
New York 1998.
Probability theory (in particular probabilistic measure theory)
Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic
Press, San Diego (CA) 2000.
Billingsley, P., Probability and Measure (3rd edn), Wiley, Ser. Probab. Math. Stat., New
York 1995.
Chow, Y. S. and H. Teicher, Probability Theory. Independence, Interchangeability,
Martingales (3rd edn), Springer, Texts in Stat., New York 1997.
Durrett, R., Probability: Theory and Examples (3rd edn), Thomson Brooks/Cole, Duxbury
Adv. Studies, Belmont (CA) 2004.
Kallenberg, O., Foundations of Modern Probability, Springer, New York 2001.
Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York
1995.
Neveu, J., Mathematical Foundations of the Calculus of Probability, Holden Day,
San Francisco (CA) 1965.
Stromberg, K., Probability for Analysts, Chapman and Hall, Probab. Ser., New York
1994.
Martingales and their applications
Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic
Press, San Diego (CA) 2000.
Chow, Y. S. and H. Teicher, Probability Theory. Independence, Interchangeability,
Martingales (3rd edn), Springer, Texts in Stat., New York 1997.
Dellacherie, C. and P. A. Meyer, Probabilities and Potential Pt. B: Theory of Martingales,
North Holland, Math. Studies, Amsterdam 1982. [Note that Probabilities and Potential
Pt. A, Amsterdam 1979, by the same authors is a prerequisite for this text.]
Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago 1970.
Meyer, P. A., Probabilities and Potentials, Blaisdell, London 1966.
Neveu, J., Discrete-parameter Martingales, North Holland, Math. Libr. vol. 10, Amsterdam
1975.
Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales (2 vols.,
2nd edn), Cambridge Math. Library, Cambridge 2000.
References
[1] Alexits, G., Convergence Problems of Orthogonal Series, Oxford: Pergamon, Int.
Ser. Monogr. Pure Appl. Math. vol. 20, 1961.
[2] Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge: Cambridge
University Press, Encycl. Math. Appl. vol. 71, 1999.
[3] Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer, 1995.
[4] Bauer, H., Approximation and abstract boundaries, Am. Math. Monthly 85 (1978),
632–647. Also in: H. Bauer, Selecta, Berlin: de Gruyter, 2003, 436–451.
[5] Bauer, H., Probability Theory, Berlin: de Gruyter, Studies in Math. vol. 23, 1996.
[6] Bauer, H., Measure and Integration Theory, Berlin: de Gruyter, Studies in Math.
vol. 26, 2001.
[7] Benyamini, Y. and J. Lindenstrauss, Geometric Nonlinear Functional Analysis, vol.
1, Providence (RI): Am. Math. Soc., Coll. Publ. vol. 48, 2000.
[8] Boas, R. P., A Primer of Real Functions, Math. Association of America, Carus
Math. Monogr. vol. 13, 1960.
[9] Carathéodory, C., Über das lineare Maß von Punktmengen – eine Verallgemeinerung
des Längenbegriffs, Nachr. Kgl. Ges. Wiss. Göttingen Math.-Phys. Kl. (1914),
404–426. Also in: C. Carathéodory, Gesammelte mathematische Schriften (5 Bde.),
München: C.H. Beck, 1954-57, Bd. 4, 249–275.
[10] Ciesielski, Z., Hölder condition for realizations of Gaussian processes, Trans. Am.
Math. Soc. 99 (1961), 403–413.
[11] Diestel, J. and J. J. Uhl Jr., Vector Measures, Providence (RI): American
Mathematical Society, Math. Surveys no. 15, 1977.
[12] Dieudonné, J., Sur un théorème de Jessen, Fundam. Math. 37 (1950), 242–248.
Also in: J. Dieudonné, Choix d’œuvres mathématiques (2 tomes), Paris: Hermann,
1981, t. 1, 369–275.
[13] Doob, J. L., Stochastic Processes, New York: Wiley, Ser. Probab. Math. Stat., 1953.
[14] Dudley, R. M., Real Analysis and Probability, Pacific Grove (CA): Wadsworth &
Brooks/Cole, Math. Ser., 1989.
[15] Dunford, N. and J. T. Schwartz, Linear Operators I , New York: Pure Appl. Math.
vol. 7, Interscience, 1957.
[16] Garsia, A. M., Topics in Almost Everywhere Convergence, Chicago: Markham,
1970.
[17] Gradshteyn, I. and I. Ryzhik, Tables of Integrals, Series, and Products (4th corrected
and enlarged edn), San Diego (CA): Academic Press, 1992.
[18] Gundy, R. F., Martingale theory and pointwise convergence of certain orthogonal
series, Trans. Am. Math. Soc. 124 (1966), 228–248.
[19] Hausdorff, F., Grundzüge der Mengenlehre, Leipzig: Veit & Comp., 1914 (1st edn).
Reprint of the original edn, New York: Chelsea, 1949.
[20] Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer,
Grad. Texts Math. vol. 25, 1975.
[21] Hunt, G. A., Martingales et processus de Markov, Paris: Dunod, Monogr. Soc.
Math. France t. 1, 1966.
[22] Kaczmarz, S. and H. Steinhaus, Theorie der Orthogonalreihen (2nd corr. reprint),
New York: Chelsea, 1951. First edition appeared under the same title with PWN,
Warsaw: Monogr. Mat. Warszawa vol. VI, 1935.
[23] Kahane, J.-P., Some Random Series of Functions, (2nd edn) Cambridge: Cambridge
University Press, Stud. Adv. Math. vol. 5, 1985.
[24] Korovkin, P. P., Linear Operators and Approximation Theory, Delhi: Hindustan
Publ. Corp., 1960.
[25] Krantz, S. G., A Panorama of Harmonic Analysis, Washington: Mathematical
Association of America, Carus Math. Monogr. vol. 27, 1999.
[26] Lévy, P., Processus stochastiques et mouvement Brownien, Paris: Gauthier-Villars,
Monographies des Probabilités Fasc. VI, 1948.
[27] Lindenstrauss, J. and Tzafriri, L., Classical Banach Spaces I, II, Berlin: Springer,
Ergeb. Math. Grenzgeb. 2. Ser. Bde. 92, 97, 1977–79.
[28] Marcinkiewicz, J. and A. Zygmund, Sur les fonctions indépendantes, Fundam. Math.
29 (1937), 309–335. Also in: J. Marcinkiewicz, Collected Papers, Warsaw: PWN,
1964, 233–259.
[29] Métivier, M., Semimartingales. A Course on Stochastic Processes, Berlin:
de Gruyter, Stud. Math. vol. 2, 1982.
[30] Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions,
New York: Oxford University Press, Univ. Texts in the Math. Sci., 1965.
[31] Neveu, J., Discrete-parameter Martingales, Amsterdam: North Holland, Math. Libr.
vol. 10, 1975. Slightly updated version of the French original: Martingales à temps
discrèt, Paris: Masson, 1972.
[32] Olevskiı̆, A. M., Fourier Series with Respect to General Orthogonal Systems, Berlin:
Springer, Ergeb. Math. Grenzgeb. Bd. 2. Ser. 86, 1975.
[33] Oxtoby, J. C., Measure and Category, (2nd edn), New York: Springer, Grad. Texts
Math. vol. 2, 1980.
[34] Paley, R. E. A. C. and N. Wiener, Fourier Transforms in the Complex Domain,
Providence (RI): American Mathematical Society, Coll. Publ. vol. 19, 1934.
[35] Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Pacific Grove (CA):
Brooks/Cole, Ser. Adv. Math., 2002.
[36] Pratt, J. W., On interchanging limits and integrals, Ann. Math. Stat. 31 (1960),
74–77. [Acknowledgement of Priority, Ann. Math. Stat. 37 (1966), 1407.]
[37] Riemann, B., Über die Darstellbarkeit einer Function durch eine trigonometrische
Reihe, Nachr. Kgl. Ges. Wiss. Göttingen 13 (1867), 227–271. Also in: Bernhard
Riemann, Collected Papers, Berlin: Springer, 1990, 259–303.
[38] Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales
(2 vols., 2nd edn), Cambridge: Cambridge Mathematical Library, 2000.
[39] Rudin, W., Principles of Mathematical Analysis (3rd edn), New York: McGraw-Hill,
1976.
[40] Rudin, W., Real and Complex Analysis (3rd edn), New York: McGraw-Hill, 1987.
[41] Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic
Harmonic Analysis, Bristol: Adam Hilger, 1990.
[42] Souslin, M. Y., Sur une définition des ensembles mesurables B sans nombres
transfinis, C. R. Acad. Sci. Paris 164 (1917), 88–91.
[43] Srivastava, S. M., A Course on Borel Sets, New York: Springer, Grad. Texts Math.
vol. 180, 1998.
[44] Solovay, R. M., A model of set theory in which every set of reals is Lebesgue
measurable, Ann. Math. 92 (1970), 1–56.
[45] Steele, J. M., Stochastic Calculus and Financial Applications, New York: Springer,
Appl. Math. vol. 45, 2000.
[46] Steen, L. A. and J. A. Seebach, Counterexamples in Topology, New York: Dover,
1995.
[47] Stein, E. M., Singular Integrals and Differentiability Properties of Functions,
Princeton (NJ): Princeton University Press, Math. Ser. vol. 30, 1970.
[48] Strichartz, R. S., The Way of Analysis (rev. edn), Sudbury (MA): Jones and Bartlett,
2000.
[49] Stromberg, K., The Banach–Tarski paradox, Am. Math. Monthly 86 (1979), 151–
161.
[50] Stroock, D. W., A Concise Introduction to the Theory of Integration (3rd edn),
Boston: Birkhäuser, 1999.
[51] Szegö, G., Orthogonal Polynomials, Providence (RI): Am. Math. Soc., Coll. Publ.
vol. 23, 1939.
[52] Wagon, S., The Banach–Tarski Paradox, Cambridge: Cambridge University Press,
Encycl. Math. Appl. vol. 24, 1985.
[53] Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real
Analysis, New York: Marcel Dekker, Pure Appl. Math. vol. 43, 1977.
[54] Willard, S., General Topology, Reading (MA): Addison-Wesley, 1970.
[55] Yosida, K., Functional Analysis (6th edn), Berlin: Springer, Grundlehren Math.
Wiss. Bd. 123, 1980.
[56] Young, W. H., On semi-integrals and oscillating successions of functions, Proc.
London Math. Soc. (2) 9 (1910/11), 286–324.
Notation index
This is intended to aid cross-referencing, so notation that is specific to a single section
is generally not listed. Some symbols are used locally, without ambiguity, in
senses other than those given below. Numbers following entries are page numbers
with the occasional (Pr m.n) referring to Problem m.n on the respective
page.
Unless otherwise stated, binary operations between functions such as $f \pm g$, $f \cdot g$,
$f \wedge g$, $f \vee g$, comparisons $f \leq g$, $f < g$ or limiting relations $f_j \xrightarrow{j \to \infty} f$, $\lim_j f_j$, $\liminf_j f_j$,
$\limsup_j f_j$, $\sup_i f_i$ or $\inf_i f_i$ are always understood pointwise.
Alternatives are indicated by square brackets, i.e., ‘if A [B] … then P [Q]’ should be
read as ‘if A … then P’ and ‘if B … then Q’.
Abbreviations and shorthand notation
a.a. almost all, 80
a.e. almost every(where), 80
ONB orthonormal basis, 239
ONS orthonormal system, 239
UI uniformly integrable, 163, 194
w.r.t. with respect to
negative always in the sense $\leq 0$
positive always in the sense $\geq 0$
∪-stable stable under finite unions
∩-stable stable under finite intersections, 32
end of proof, x
[�] indicates that a small
intermediate step is
required, x
(in the margin) caution, x
Special labels, defining properties
$(\Delta_1), (\Delta_2), (\Delta_3)$ Dynkin system, 31
$(M_1), (M_2)$ measure, 22
$(S_1), (S_2), (S_3)$ semi-ring, 37
$(\Sigma_1), (\Sigma_2), (\Sigma_3)$ $\sigma$-algebra, 15
Mathematical symbols
Sub- and superscripts
+ positive part,
positive elements
⊥ orthogonal complement, 235
b bounded
c compact support
Symbols, binary operations
∀ for all, for every
∃ there exists, there is
# cardinality, 7
−→ converges to
−→ convergence in measure,
163
↑ increases to
↓ decreases to
= defining equality
def= equal by definition
≡ identically equal
∨, [f ∨ g] maximum [of f and g], 64
∧, [f ∧ g] minimum [of f and g], 64
� absolutely continuous, 202
⊥ - measures: singular, 209
- Hilbert space: orthogonal,
231, 235
� convolution, 137
⊕ direct sum, 236
× - Cartesian product of sets;
- Cartesian product of
�-algebras, 121;
- product of measures, 125
⊗ product of �-algebras, 121;
Set operations
∅ empty set
A ∪ B union 5
A∪· B union of disjoint sets, 3, 5
A ∩ B intersection, 5
A \ B set-theoretic difference, 5
Ac complement of A, 5
A � B symmetric difference, 13
(Pr 2.2)
A ⊂ B subset, 5
A � B proper subset, 5
Ā closure of A, 320
A� open interior of A, 320
Aj ↑ A 24
Aj ↓ A 24
A × B Cartesian product
An n-fold Cartesian product
A� infinite sequences with
values in A
#A cardinality of A, 7
#A � #B, #A < #B 7
t · A �=
ta
a ∈ A��, 36 (Pr 5.8)
x + A �=
x + a
a ∈ A��, 28
E ∩ � �=
E ∩ A
A ∈ ���, 16
�a� b�� �a� b�� �a� b�� �a� b� open, closed,
half-open intervals
��a� b��, ((a,b)) rectangles in �n, 18
u � v��
u = v��
u ∈ B� etc. 57
Functions, norms, measures & integrals
f�A� �=
f�x�
x ∈ A��, 6
f −1���
[=
f −1�B�
B ∈ ��], 16
f � g composition:
f � g�x� = f�g�x��
f + �= f ∨ 0� positive part, 61
f − �= −�f ∧ 0�� negative part,
61
1A indicator function of A
$1_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A \end{cases}$
sgn sign function
$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$
�•�� maximum-norm in �n and
�n×n, 142
�•�p Lp-norm, 105, 108
�•�� L�-norm, 116
�•� •� scalar product, 228
̄ completion of the measure
, 29 (Pr 4.13)
�� ,
∣∣
�
restriction of the measure
to the family of sets �
�X,
∣∣
X
restriction of the measure
to the canonical �-algebra
on X
� T −1 image measure, 51
- limj→� convergence in measure,
163
T� � image measure, 51
u · measure with density, 79–80
$\int u\,d\mu$, $\int u(x)\,\mu(dx)$, $\int u(x)\,d\mu(x)$ 69, 76
$\int_A u\,d\mu$ [$= \int 1_A u\,d\mu$], 79
$\int u\,dx$, $\int u(x)\,dx$ 77
$\int u\,dT(\mu)$ [$= \int u \circ T\,d\mu$], 134
$\int_a^b u(x)\,dx$, $(R)\int_a^b u(x)\,dx$ Riemann integral, 93, 339
Other notation in alphabetical order
ℵ0 cardinality of �, 7
�∗ completion, 29 (Pr 4.13)
��j �j , ����� filtration, 176
�� �= ���i
i ∈ I��, 177, 203
�−� �=
⋂
�∈−� ���, 193
�� , �� 185
Br �x� open ball with radius r and
centre x, 17, 323
��A� Borel sets in A, 20 (Pr 3.10)
���� Borel sets in �, 226
���n�� �n Borel sets in �n, 17
�∗��n� completion of the Borel sets,
132 (Pr 13.11), 144, 330
���̄�� �̄ Borel sets in �̄, 58
� cardinality of �0� 1�, 11
$\mathbb{C}$ complex numbers
C�U� continuous functions
f
U → �
Cc�U� continuous functions
f
U → � with compact
support
C��U� functions f
U → �
differentiable arbitrarily
often
�� � Dynkin system generated by
, 31
�x unit mass at x, Dirac
measure at x, 26
det determinant (of a matrix)
D��x� Jacobian, 147
d�
d
Radon-Nikodým derivative,
203
E , E�• � � conditional expectation,
250, 263
E
[ = E ] conditional
expectation, 260, 263
simple functions, 60
GL�n� �� invertible �n×n-matrices
id identity map or matrix
� � � n� � ��n� half-open rectangles in �n,
18
�rat� �
n
rat …with rational endpoints,
18
� o� � o�n� � o��n� open rectangles in �n, 18
� orat� �
o�n
rat …with rational endpoints,
18
� � or �
�� �n Lebesgue measure in �n, 27
�1��� 79
�p��� 113
1,
1
�̄
76
1� 227
1-lim�∈I 203
Lp 108
p
�, L
p
� 228
p 105
p-limj→� 109
�, L� 116
L 260
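$\liminf_j a_j$ [$= \sup_k \inf_{j \geq k} a_j$], 313
$\limsup_j a_j$ [$= \inf_k \sup_{j \geq k} a_j$], 313
$\liminf_j A_j$ [$= \bigcup_{k \in \mathbb{N}} \bigcap_{j \geq k} A_j$], 316
$\limsup_j A_j$ [$= \bigcap_{k \in \mathbb{N}} \bigcup_{j \geq k} A_j$], 316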
�, ��̄ 59
M 258
$\mathbb{N}$ natural numbers: 1, 2, 3, …
$\mathbb{N}_0$ positive integers: 0, 1, 2, …
� -null sets, 29 (Pr 4.10), 80
�n volume of the unit ball in
�n, 156
��X� topology, open sets, 17
�n, ���n� topology, open sets in �n,
17
PC , PF (orthogonal) projection, 235
��X� all subsets of X, 12
$\mathbb{Q}$ rational numbers
$\mathbb{R}$ real numbers
$\bar{\mathbb{R}}$ extended real line $[-\infty, +\infty]$, 58
$\mathbb{R}^n$ Euclidean $n$-space
�nx , �
n
y 147
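$\mathbb{R}^{n \times n}$ real $n \times n$-matrices
$\mathcal{R}[a,b]$ 339
$\mathcal{R}[a,\infty)$, $\mathcal{R}(-\infty,b]$ 354
$\mathcal{R}[a,b)$, $\mathcal{R}(a,b]$ 358
$\mathcal{R}(a,b)$, $\mathcal{R}(-\infty,\infty)$ 359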
�, � stopping times, 185
�� � �-algebra generated by ,
16
��T�� ��Ti
i ∈ I� �-algebra generated by
the map(s) T , resp., Ti, 51
span
� � �� all finite linear combinations
of the elements in
� � ��, 239
supp f
[ =
f �= 0�] support of f
�, � stopping times, 185
�x shift �x�y� = y − x, 49
�X� �� measurable space, 22
�X� �� � measure space, 22
�X� �� �j � �, �X� �� ��� � filtered
measure space, 176, 203
$\mathbb{Z}$ integers: 0, ±1, ±2, …
Name and subject index
This should be used in conjunction with the Bibliography and the Index of Notation.
Numbers following entries are page numbers which, if accompanied by (Pr n.m), refer to
Problem n.m on that page; a number with a trailing ‘n’ indicates that a footnote is being
referenced. Unless otherwise stated, ‘integral’, ‘integrability’ etc. always refer to the
(abstract) Lebesgue integral. Within the index we use ‘L-…’ and ‘R-…’ as a shorthand
for ‘(abstract) Lebesgue-…’ and ‘Riemann-…’.
ℵ0 aleph null, 7
absolutely continuous, 202
uniformly absolutely continuous, 169
Alexits, György, 277n, 302
almost all (a.a.), 80
almost everywhere (a.e.), 80
Analytic set, 333
Andrews, George, 277n
arc-length, 160–161 (Pr 15.6)
Askey, Richard, 277n
atom, 20 (Pr 3.5), 46 (Pr 6.5)
axiom of choice, 331
Banach, Stephan, 43
Banach space, 326
Banach–Tarski paradox, 43
basic convergence result
for improper R-integrals, 355
for R-integrals, 351
basis, 242
unconditional basis, 293–295
Bass, Richard, 311
Bauer, Heinz, 159, 281, 310
Benyamini, Yoav, 210
Bernoulli distribution, 183
Bernstein, Sergeı̆ N., 279
Bernstein polynomials, 280
bijective map, 6
Boas, Ralph, 114
Borel, Emile
Borel measurable, 17
Borel set, 17
Borel $\sigma$-algebra, 17
alternative definition, 21 (Pr 3.12)
cardinality, 332
completion, 330
generator of, 18, 19
in a subset, 20 (Pr 3.10)
in $\bar{\mathbb{R}}$, 58
Brownian motion, 309–311
� continuum, 11
Calderón, Alberto
Calderón–Zygmund decomposition, 221
Cantor, Georg, 11
Cantor’s diagonal method, 11
Cantor discontinuum, 55 (Pr 7.10),
223–224 (Pr 19.10)
Cantor function, 224 (Pr 19.10)
Cantor (ternary) set, 55 (Pr 7.10),
223–224 (Pr 19.10)
Carathéodory, Constantin, 37
cardinality, 7
of the Borel $\sigma$-algebra, 332
of the Lebesgue $\sigma$-algebra, 330
Carleson, Lennart, 289
Cartesian product
rules for Cartesian Products, 121
Cauchy sequence
in �p, 109
in metric spaces, 325
in normed spaces, 234
Cavalieri’s principle, 120
Cesàro mean, 286
change of variable formula
for Lebesgue integrals, 151
for Riemann integrals, 350
for Stieltjes integrals, 133 (Pr 13.13)
Chebyshev, Pafnuti L., 85 (Pr 10.5)
Chebyshev polynomials (first kind), 277
Ciesielski, Z., 311
closed ball, 323
compactness (weak sequential), 169
in $\mathcal{L}^1$, 168
in $\mathcal{L}^p$, 168, 274 (Pr 23.8)
and uniform integrability, 169
completeness
of $\mathcal{L}^p$, $1 \leq p < \infty$, 110
of $\mathcal{L}^\infty$, 116
in normed spaces, 234
completion, 29 (Pr 4.13)
and Hölder maps, 146
and inner measure, 86 (Pr 10.12)
and inner/outer regularity, 160 (Pr 15.6)
integration w.r.t. complete measures,
86 (Pr 10.11)
of metric spaces, 325
and outer measure, 46 (Pr 6.2),
86 (Pr 10.12)
and product measures, 132 (Pr 13.11)
and submartingales, 187 (Pr 16.3)
complexification, 232
conditional
conditional Beppo Levi Theorem, 264
conditional dominated convergence
Theorem, 266
conditional Fatou’s Lemma, 265
conditional Jensen inequality, 266
conditional monotone convergence
property, 259
conditional probability, 257 (Pr 22.3)
conditional expectation
in Lp and L�, 260
in L1, 263–264
in L2, 250
properties (in L2), 251
properties (in Lp), 261–262
via Radon-Nikodým Theorem,
223 (Pr 19.3)
conjugate numbers (also conj. indices), 105
conjugate Young functions, 117 (Pr 12.5)
continuity
implies measurability, 50
of measures at ∅, 24
of measures from above, 24
of measures from below, 24
in metric spaces, 324
in topological spaces, 321
continuous function
is measurable, 50
is Riemann integrable, 342
continuous linear functional
in Hilbert space, 238
representation of continuous linear
functionals, 239
convergence
along an upwards filtering set, 203
criteria for a.e. convergence,
173 (Pr 16.1,16.2)
in $\mathcal{L}^p$, 109
in $\mathcal{L}^p$ implies in measure, 164
in measure, 163
criterion, 174 (Pr 16.10)
is metrizable, 174 (Pr 16.8)
no unique limit, 173 (Pr 16.6)
weaker than pointwise, 164
in metric spaces, 323–324
in normed spaces, 234
pointwise implies in measure, 164
pointwise vs. $\mathcal{L}^p$, 109
in probability, 163n
of series of random variables,
201 (Pr 16.9)
in topological spaces, 322
uniform convergence, 351
convex function, 114–115, 172n
convex set, 235
convolution
formula for integrals, 138
of a function and a measure, 137
of functions, 137
as image measure, 137
of measures, 137
cosine law, 231
countable set, 7
counting measure, 27
Darboux, Gaston, 337
Darboux sum, 93, 338
de la Vallée Poussin, Charles
de la Vallée Poussin’s condition, 169
de Morgan’s identities, 5, 6
dense subset, 320
of C, 279, 287
in Hilbert space, 238
of $\mathcal{L}^p$, 157, 159, 139
of L2, 282
density (function), 80, 202
derivative
of a measure, 152, 219
of a measure singular to $\lambda^n$, 220
of a monotone function, 225 (Pr 19.17)
Radon-Nikodým derivative, 203
of a series of monotone functions,
225 (Pr 19.18)
diagonal method, 11
Diestel, Joseph, 210
Dieudonné, Jean, 205
Dieudonné’s condition, 169
diffeomorphism, 147
diffuse measure, 46 (Pr 6.5)
Dirac measure, 26
direct sum, 236
Dirichlet (Lejeune-D.), Gustav
Dirichlet’s jump function, 88
not Riemann integrable, 342
Dirichlet kernel, 286
disjoint union, 3, 5
distribution
distribution function, 128
of a random variable, 52
Doob, Joseph, 176, 213
Doob decomposition, 275 (Pr 23.11)
Doob’s upcrossing estimate, 191
Dudley, Richard, 336
Dunford, Nelson, 168
Dunford–Pettis condition, 169
dyadic interval, 179
dyadic square, 179
Dynkin system, 31
conditions to be $\sigma$-algebra, 32
generated by a family, 31
minimal Dynkin system, 31
not $\sigma$-algebra, 36 (Pr 5.2)
enumeration, 7
equi-integrable, see uniformly integrable
exhausting sequence, 22
factorization lemma, 56 (Pr 7.11), 64
Faltung, see convolution
Fatou, Pierre, 73
Fejér, Lipót, 285, 286
Fejér kernel, 286
filtration, 176, 203
dyadic filtration, 213, 221, 268,
295, 302
finite additivity, 23
Fischer, Ernst, 110
Fourier, Jean Baptiste
Fourier coefficients, 285
Fourier series, a.e.-convergence, 288
Fourier series, Kolmogorov’s example
of a nowhere convergent Fourier
series, 288
Fourier series, Lp-convergence, 288
Fréchet, Maurice, 232 (Pr 20.2)
Fresnel integral, 103 (Pr 11.19)
Friedrichs mollifier, 141 (Pr 14.10)
Frullani integral, 103 (Pr 11.20)
$F_\sigma$ set, 142, 159 (Pr 15.1)
Fubini, Guido, 125
function
absolutely continuous function,
223 (Pr 19.9)
concave function, 114–115
convex function, 114–115, 172n
distribution function, see distribution
function
independent function, see independent
functions
indicator function, see indicator function
integrable function, 76
measurable function, see measurable
function(s)
moment generating function,
102 (Pr 11.15)
monotone function, see monotone
function
negative part of, 61
numerical function, 58
positive part of, 61
Riemann integrable function, 339
simple function, see simple functions
$\Gamma$ (Gamma) function, 99, 161 (Pr 15.8)
Garsia, Adriano, 218, 309
Gaussian distribution, 152, 310
$G_\delta$ set, 142
generator
of the Borel $\sigma$-algebra, 18, 19
of a Dynkin system, 31
of a $\sigma$-algebra, 16
Gradshteyn, Izrail S., 277n, 284
Gram-Schmidt orthonormalization,
243–244
Gundy, Richard, 309
Haar, Alfréd
Haar–Fourier series, 290
a.e.-convergence, 291
Lp-convergence, 291
Haar functions, 289, 310
Haar system, 289, 310
complete ONS, 290
Haar wavelet, 295
a.e.-convergence, 296
complete ONS, 296
Lp-convergence, 296
Hausdorff, Felix, 43
Hausdorff space, 320
Hermite polynomials, 278
Hewitt, Edwin, 11, 19n, 43
Hilbert cube, 246 (Pr 21.7)
Hilbert space, 234
isomorphic to $\ell^2(\mathbb{N})$, 244–245
separable Hilbert space, 230, 244–245,
245 (Pr 21.5)
Hölder continuity, 143
Hunt, G., 163
Hunt, R., 289
image measure, 51, 134
integral w.r.t. image measure, 134
of measure with density, 140 (Pr 14.1)
independence
and integrability, 85 (Pr 10.10)
of σ-algebras, 36 (Pr 5.10)
independent functions, 180–184, 279,
299, 302
existence of independent functions, 183
independent random variables,
188 (Pr 16.8, 16.9), 196,
201 (Pr 18.9), 224 (Pr 19.11),
275 (Pr 23.12), 306
convergence of independent random
variables, 309
indicator function, 13 (Pr 2.5), 59
measurability, 59
rules for indicator functions,
74 (Pr 9.9), 316
inequality
Bessel inequality, 240
Burkholder–Davis–Gundy inequality,
294
Cauchy–Schwarz inequality, 107, 229
Chebyshev inequality, 85 (Pr 10.5)
conditional Jensen inequality, 266
Doob’s maximal Lp inequality, 211,
224 (Pr 19.13)
generalized Hölder inequality,
117 (Pr 12.4)
Hardy–Littlewood maximal inequality,
215
Hölder inequality, 106
for series, 113
for 0 < p < 1, 118 (Pr 12.18)
Jensen inequality, 115
Kolmogorov’s inequality,
224 (Pr 19.11)
Markov inequality, 82
variants thereof, 84–85 (Pr 10.5)
Minkowski’s inequality, 107
for integrals, 130
for series, 113
for 0 < p < 1, 118 (Pr 12.18)
moment inequality, 118 (Pr 12.19)
strong-type inequality, 212
weak-type maximal inequality, 212
Young inequality, 105, 117 (Pr 12.5),
138, 141 (Pr 14.9)
injective map, 6
inner product, 228
inner product space, 228
integrability
comparison test, 85 (Pr 10.9)
of complex functions, 227
integrability criterion
for improper R-integrals, 354, 356, 357
for L-integrals, 77
for L-integrals of image measures,
134, 135
for R-integrals, 94, 339, 344
of exponentials, 98, 102 (Pr 11.9)
w.r.t. image measures, 135
of measurable functions, 76
of positive functions, 77
of (fractional) powers, 98, 155
Riemann integrability, 93, 339
integrable function, see also ℒ1, ℒp etc.
improperly R-, not L-integrable
function, 97
is a.e. ℝ-valued, 83
Riemann integrable function, 93, 339
integral, see also Lebesgue integral,
Riemann integral, Stieltjes integral
and alternating series, 101 (Pr 11.5)
of complex functions, 226–228
examples, 72–73, 79
generalizing series, 113
w.r.t. image measures, 134
and infinite series, 101 (Pr 11.4)
iterated vs. double,
130–131 (Pr 13.3–13.5)
lattice property, 78
of measurable functions, 76
over a null set, 81
of positive functions, 69
examples, 72–73
properties, 71–72
is positive linear functional, 79
properties, 78–79
of rotationally invariant functions, 155
of simple functions, 68
sine integral, 131 (Pr 13.6)
over a subset, 79
integral test for series, 357
integration by parts
for Riemann integrals, 350
for Stieltjes integrals, 133 (Pr 13.13)
integration by substitution, 350 see also
change of variable formula
isometry, 245, 325
Jacobi polynomials, 277
Jacobian, 147
Jordan, Pascual, 232 (Pr 20.2)
Kac, Mark, 176
Kaczmarz, Stefan, 277n
Kahane, Jean-Pierre, 311
kernel, 74 (Pr 9.11)
Dirichlet kernel, 286
Fejér kernel, 286
Kolmogorov, Andrei N., 196, 288
Kolmogorov’s law of large numbers,
196–200
Korovkin, Pavel P., 281
Krantz, Steven, 218, 222
ℓ1 (summable sequences), 79
ℓ2(ℕ) being isomorphic to separable
Hilbert spaces, 245
ℒ1, ℒ1_ℝ̄ (integrable functions), 76
ℒ1_ℂ, 227
ℓp, 113
ℒp, 105
dense subset of ℒp, 140, 157, 159
ℒp_ℂ, Lp_ℂ, 228
Lp, 108
being not separable, 271
Lp_ℝ = Lp_ℝ̄, 108
separability criterion, 269, 271, 272
Laguerre polynomials, 278
lattice, 253
law of large numbers, 196–200
ℒ∞, L∞, 116
Lebesgue, Henri, 77, 77n, 349
Lebesgue integrable, 77
Lebesgue measurable set, 330
Lebesgue pre-measure, 45
Lebesgue σ-algebra, 330
cardinality, 330
Lebesgue integral, 77
abstract Lebesgue integral, 77n
invariant under reflections, 136
invariant under translations, 136
transformation formula, 151
and differentiation, 152
Lebesgue measure, 27
change of variable formula, 53, 148
and differentiation, 152
characterized by translation
invariance, 34
is diffuse, 46 (Pr 6.5)
dilations, 36 (Pr 5.8)
existence, 28, 45
and Hölder maps, 143
is inner/outer regular, 158
invariant under motions, 54
invariant under orthogonal maps, 52
invariant under translations, 34
null sets, 29 (Pr 4.11), 47 (Pr 6.8)
under Hölder maps, 146
as product measure, 125
properties of Lebesgue measure, 28,
46 (Pr 6.3)
transformation formula, 148
and differentiation, 152
uniqueness, 28
Legendre polynomials, 278
lemma
Borel-Cantelli lemma, 48 (Pr 6.9), 198
Calderón-Zygmund lemma, 221
conditional Fatou’s lemma, 265
continuity lemma (L-integral), 91
differentiability lemma (L-integral), 91
Doob’s upcrossing lemma, 191
factorization lemma, 56 (Pr 7.11), 64
Fatou’s lemma, 73
Fatou’s lemma for measures, 74 (Pr 9.9)
generalized Fatou’s lemma, 85 (Pr 10.8)
Pratt’s lemma, 101 (Pr 11.3)
reverse Fatou lemma, 74 (Pr 9.8)
Urysohn’s lemma, 156
Lévy, Paul, 311
lim inf , lim sup (limit inferior/superior)
of a numerical sequence, 63, 313–314
of a sequence of sets, 74 (Pr 9.9), 316
Lindenstrauss, Joram, 210, 294, 295
linear span, 240, 246 (Pr 21.9)
lower integral, 338
map
bijective map, 6
continuity in metric spaces, 324
continuous map, 321
Hölder continuous map, 143
and completion, 146
injective map, 6
measurable map, 49, 54 (Pr 7.5)
surjective map, 6
Marcinkiewicz, Jozef, 176
martingale, 177 see also submartingale
backwards martingale, 193
characterization of martingales, 186
closure of a martingale, 266–267
and conditional expectation, 266
and convex functions, 178, 268
martingale difference sequence,
188 (Pr 17.9), 275 (Pr 23.10), 302
a.e.-convergence, 303, 304, 306
L2-convergence, 306
Lp-convergence, 304
of independent functions, 306
ONS, 303
with directed index set, 203
ℒ1-convergence, 203
example of non-closable martingale,
275 (Pr 23.12)
martingale inequality, 210–213, 294
L1-convergence, 267
ℒ2-bounded martingale, 201 (Pr 18.8)
ℒp-bounded martingale, 225 (Pr 19.14)
quadratic variation, 294
reverse martingale, see backwards
martingale
martingale transform, 188 (Pr 16.7)
uniformly integrable (UI) martingale,
194, 267
maximal function
Hardy–Littlewood maximal function,
214
of a measure, 217
square maximal function, 214
measurability
of continuous maps, 50
of coordinate functions, 54 (Pr 7.5)
of indicator functions, 59
∗-measurability, 43
measurable function(s), 57
complex valued measurable function,
227
stable under limits, 62
vector lattice, 63
measurable map, 49, 54 (Pr 7.5)
measurable set, 15, 17
measurable space, 22
measure, 22, see also Lebesgue measure,
Stieltjes measure
complete measure, 29 (Pr 4.13)
continuous at ∅, 24
continuous from above, 24
continuous from below, 24
counting measure, 27
δ-measure, 26
diffuse measure, 46 (Pr 6.5)
Dirac measure, 26
discrete probability measure, 27
equivalent measures, 223 (Pr 19.5)
examples of measures, 26–28
finite measure, 22
inner regular measure, 158
invariant measure, 36 (Pr 5.9)
locally finite measure, 217n, 218
non-atomic measure, 46 (Pr 6.5)
outer measure, 38n
outer regular measure, 158
pre-measure, 22, 24, 45
probability measure, 22
product measure, 122–123
properties of measures, 23
separable measure, 272
σ-additivity, 22
σ-finite measure, 22, 30 (Pr 4.15)
σ-subadditivity, 26
singular measure, 209
on Sn−1, 153–156
strong additivity, 23
subadditivity, 23
surface measure, 153–156, 161 (Pr 15.6)
uniqueness, 33
measure with density, 80, 202
measure space, 22
complete measure space, 29 (Pr 4.13)
finite measure space, 22
probability space, 22
σ-finite measure space, 22, 30 (Pr 4.15)
σ-finite filtered measure space, 176,
203
mesh, 338
Métivier, Michel, 210
metric (distance function), 322
metric space, 322
monotone class, 21 (Pr 3.11)
monotone function
discontinuities of monotone functions,
129
is Lebesgue a.e. continuous, 129
is Lebesgue a.e. differentiable,
225 (Pr 19.17)
is Riemann integrable, 342
monotonicity
monotonicity of the integral, 78, 344
monotonicity of measures, 23
neighbourhood, 319
open neighbourhood, 319
von Neumann, John, 232 (Pr 20.2)
Neveu, Jacques, 268
non-measurable set, 48 (Pr 6.10),
48 (Pr 6.11)
for the Borel σ-algebra, 332
for the Lebesgue σ-algebra, 331
norm, 105, 325
normed space, 325
and inner products, 229
Lp, 108
ℒp, 105
ℒ∞, L∞, 116
quotient norm, 326
null set, 29 (Pr 4.10), 47 (Pr 6.8), 80
subsets of a null set, 29 (Pr 4.13)
under Hölder map, 146
Olevskiı̆, A.M., 294
open ball, 323
optional sampling, 187
orthogonal
orthogonal complement, 235
orthogonal elements of a Hilbert space,
234
orthogonal projection, 235,
246–247 (Pr 21.11)
as conditional expectation, 253
orthogonal vectors, 231
orthogonal polynomials, 277–279
Chebyshev polynomials, 277
complete ONS, 282
dense in L2, 282
Hermite polynomials, 278
Jacobi polynomials, 277
Laguerre polynomials, 278
Legendre polynomials, 278
orthonormal basis (ONB), 239, 242
characterization of, 242
orthonormal system (ONS), 240
complete orthogonal system, 242
maximal orthogonal system, 242
total orthogonal system, 242
orthonormalization procedure, 243–244
Oxtoby, John, 43
Paley, Raymond, 176, 302, 311
parallelogram identity, 231
parameter-dependent
improper R-integrals, 103 (Pr 11.20),
355–356
L-integrals, 92, 99
R-integrals, 352–353
Parseval’s identity, 240
partial order, 9
partition, 338
Pettis, B.J., 169
Pinsky, Mark, 299
polar coordinates
3-dimensional, 162 (Pr 15.9)
n-dimensional, 153
planar, 152
polarization identity, 231
generalized polarization identity,
233 (Pr 20.5)
Polish space, 336, 336n
power set, 12
Pratt, John, 101 (Pr 11.3)
pre-measure, 22, 24, 45
extension of a pre-measure, 37
σ-subadditivity, 26
primitive, 346, 348
bounded functions with primitive are
L-integrable, 349
of a continuous function,
225 (Pr 19.16)
differentiability of a primitive,
225 (Pr 19.16)
probability space, 22
product
of measurable spaces, 121
product measure space, 125
product measures, 122–123
product σ-algebra, 121, 127
projection, 235, 246–247 (Pr 21.11)
orthogonal projection, 238
Pythagoras’ theorem, 233 (Pr 20.6),
238, 240
quadratic variation (of a martingale),
294
quotient norm, 326
quotient space, 326
Rademacher, Hans
Rademacher functions, 299
are an incomplete ONS, 299
completion of, 301–302
Rademacher series,
a.e.-convergence, 300
Radon–Nikodým derivative, 203
random variable, 49 see also independent
random variables
distribution of a random variable, 52
ℝ̄ (extended real line), 58
arithmetic of ℝ̄, 58
rearrangement
decreasing rearrangement,
133 (Pr 13.14)
rearrangement invariant, 133 (Pr 13.14)
rectangle, 18
Riemann, Bernhard, 337
Riemann integrability, 339
criteria for Riemann integrability, 94,
339, 344
Riemann sum, 342
Riemann integral, 93, 339
vs. antiderivative, 348
coincides with Lebesgue integral, 94
and completed Borel σ-algebra, 97
function of upper limit, 346
improper Riemann integral, 96, 129,
353–359
improper Riemann integral and infinite
series, 357
properties of Riemann integral, 344
Riesz, Frigyes, 110, 111, 238, 239, 326
Riesz, Marcel, 288
ring of sets, 24n, 40n
Rogers, Chris (L.C.G.), 294
Roy, Ranjan, 277n
Rudin, Walter, 245, 246 (Pr 21.10), 288,
318, 348, 356
Ryzhik, Iosif, 277n, 284
scalar product, see inner product
Schipp, Ferenc, 302
Schwartz, Jacob, 168
Seebach, J. Arthur, 318
semi-norm, 325
in �p, 108
semi-ring (of sets), 37
𝒥n is semi-ring, 44
separable
separable Hilbert space, 230,
244–245, 245 (Pr 21.5)
separable Lp-space, 272
separable measure, 272
separable σ-algebra, 272
separable space, 320
sesqui-linear form, 229
set
analytic set, 333
Borel set, 17
cardinality of, 7
closed set, 319
closed in metric spaces, 323
closed in ℝn, 17
closure, 320
compact set, 320, 324–326, see also
compactness
connected set, 321
convex set, 235
countable set, 7
dense set, 320, see also dense subset
Fσ set, 142, 159 (Pr 15.1)
Gδ set, 142
(open) interior, 320
measurable set, 15
∗-measurable set, 43
non-measurable set, see non-measurable
set
nowhere dense set, 55 (Pr 7.10)
open set, 319
open in metric spaces, 323
open in ℝn, 17, 319
pathwise connected set, 321
relatively compact set, 320
relatively open set, 319
Souslin set, 333
uncountable set, 7
upwards filtering index set, 203
σ-additivity, 3, 22
σ-algebra, 15
Borel set, 17
examples, 15–16
generated by a family of maps, 51
generated by a family of sets, 16
generated by a map, 51
generator of, 16
inverse image, 16, 49
minimal σ-algebra, 16, 51
product σ-algebra, 121, 127
properties, 15, 20 (Pr 3.1)
separable σ-algebra, 272
topological σ-algebra, 17
trace σ-algebra, 16, 20 (Pr 3.10)
σ-finite filtered measure space, 176
σ-finite measure, 22
σ-finite measure space, 22
σ-subadditivity, 26
Simon, Peter, 302
simple functions, 60
dense in ℒp, 1 ≤ p < ∞, 112
dense in ℳ, 61
integral of simple functions, 68
not dense in ℒ∞, 116
standard representation, 60
uniformly dense in ℳb, 65 (Pr 8.7)
singleton, 46 (Pr 6.5)
Souslin, Michel, 333, 336
Souslin operation, 336
Souslin scheme, 332
Souslin set, 333
span, 240, 246 (Pr 21.9)
spherical coordinates, 153
Srivastava, Sashi, 336
standard representation, 60
Steele, Michael, 311
Steen, Lynn, 318
Stein, Elias, 221
Steinhaus, Hugo, 176, 277n
step function, 342, see also simple
functions
is Riemann integrable, 342
Stieltjes, Thomas
Stieltjes function, 55 (Pr 7.9)
Stieltjes integral, 132 (Pr 13.13)
change of variable, 133 (Pr 13.13)
integration by parts, 133 (Pr 13.13)
Stieltjes measure, 54 (Pr 7.9),
132 (Pr 13.13)
Lebesgue decomposition of Stieltjes
measure, 223 (Pr 19.9)
stopping time, 184
characterization of, 189 (Pr 17.9)
Strichartz, Robert, 344
Stromberg, Karl, 11, 19n, 43
strong additivity, 23
Stroock, Daniel, 161 (Pr 15.7)
subadditivity, 23
submartingale, 177
backwards submartingale, 193
convergence theorem, 193
ℒ1-convergence, 195
change of filtration, 187 (Pr 16.2)
characterization of submartingales, 186
w.r.t. completed filtration, 187 (Pr 16.3)
and conditional expectation, 266
and convex functions, 178, 268
Doob decomposition, 275 (Pr 23.11)
Doob’s maximal inequality, 211,
224 (Pr 19.13)
examples, 178–181, 200 (Pr 18.6)
inequalities for, 210–213
ℒ1-convergence, 194
pointwise convergence, 191
reversed submartingale, see backwards
martingale
uniformly integrable (UI) martingale,
193, 194
upcrossing estimate, 191
supermartingale, 177 see also
submartingale
surface measure, 153–156, 161 (Pr 15.6)
surjective map, 6
symmetric difference, 13 (Pr 2.2)
Sz.-Nagy, Béla, 349
Szegö, Gabor, 277n
Tarski, Alfréd, 43
theorem, see also lemma or inequality
backwards convergence theorem, 193
Beppo Levi theorem, 70
Bonnet’s mean value theorem, 350
bounded convergence theorem,
174 (Pr 16.7)
Cantor–Bernstein theorem, 9
Carathéodory’s existence theorem, 37
completion of metric spaces, 325
conditional Beppo Levi theorem, 264
conditional dominated convergence
theorem, 266
continuity theorem (improper
R-integral), 355
continuity theorem (R-integral), 352
continuity lemma (L-integral), 91
convergence of UI submartingales, 194
differentiability lemma (L-integral), 91
differentiation theorem (improper
R-integral), 356
differentiation theorem (R-integral), 352
dominated convergence theorem, 89
ℒp-version, 100 (Pr 11.1), 111
Doob’s theorem, 222 (Pr 19.2)
existence of product measures, 123
extension of measures, 37
Fréchet-v. Neumann-Jordan theorem,
232 (Pr 20.2)
Fubini’s theorem, 126
Fubini’s theorem on series,
225 (Pr 19.18)
fundamental theorem of calculus, 349
general transformation theorem, 151
Hardy–Littlewood maximal
inequalities, 215
Heine–Borel theorem, 325, 326
integral test for series, 357
integration by parts, 350
integration by substitution, 350
Jacobi’s transformation theorem, 148
Korovkin’s theorem, 280
Lebesgue’s convergence theorem, 89
ℒp-version, 100 (Pr 11.1), 111
Lebesgue decomposition, 209
Lebesgue’s differentiation theorem, 218
mean value theorem for integrals, 345
monotone convergence theorem, 88
optional sampling theorem, 187
projection theorem, 235
Pythagoras’ theorem, 233 (Pr 20.6),
238, 240
Radon-Nikodým theorem, 202
Riesz representation theorem, 239
Riesz’ convergence theorem, 112
Riesz–Fischer theorem, 110
M. Riesz’ theorem, 288
second mean value theorem for
integrals, 350n
submartingale convergence
theorem, 191
Tonelli’s theorem, 125
uniqueness of measures, 33
alternative statement, 36 (Pr 5.6)
uniqueness of product measures, 122
Vitali’s convergence theorem, 165
non-σ-finite case, 167
Weierstrass approximation theorem,
279, 287
tightness (of measures), 169
Tonelli, Leonida, 125
topological σ-algebra, 17
topological space, 17, 319
topology, 17, 319
examples, 319
trace σ-algebra, 16, 20 (Pr 3.10)
transformation formula
for Lebesgue integrals, 151
for Lebesgue measure, 53, 148
and differentiation, 152
trigonometric polynomial, 283
trigonometric polynomials are dense in
C[−π, π], 287
trigonometric system, 283
complete in L2, 283, 287
Tzafriri, Lior, 294, 295
Uhl, John, 210
unconditional basis, 293–295
uncountable, 7
uniform boundedness principle,
246 (Pr 21.10)
uniformly integrable, 163, 175 (Pr 16.11)
vs. compactness, 169
equivalent conditions, 168
uniformly σ-additive, 169
unit mass, 26
upcrossing, 190
upcrossing estimate, 191
upper integral, 338
upwards filtering, upwards directed, 203
vector space, 226
Volterra, Vito, 349
volume of unit ball, 155–156
Wade, William, 302
Wagon, Stan, 43
Walsh system, 302
wavelet, see Haar wavelet
Weierstrass, Karl, 279
Wheeden, Richard, 288
Wiener, Norbert, 176, 311
Wiener process, 309, 311
Willard, Stephen, 318
Williams, David, 294
Yosida, Kôsaku, 232
Young, William, 101 (Pr 11.3),
117 (Pr 12.5), 138
Young function, 117 (Pr 12.5)
Young’s inequality, 105, 117 (Pr 12.5),
138, 141 (Pr 14.9)
Zygmund, Antoni, 176, 221, 288