Advanced Macroeconomics
Problem Set 1 – Consumption
Due 11:59 PM Sunday 24th February 2013
Problem 1
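Solve the following two-period model for $c_t$, $c_{t+1}$ and $s_t$:
$$\max\; U(c_t, c_{t+1}) = -\frac{1}{\alpha} e^{-\alpha c_t} - \frac{\beta}{\alpha} e^{-\alpha c_{t+1}}$$
subject to:
$$c_t + \left(\frac{1}{1+r}\right) c_{t+1} = Y_t + \left(\frac{1}{1+r}\right) Y_{t+1}$$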
Problem 2
Solve the following problem:
$$\max_{C_t,\,C_{t+1}} U = \left(C_t - aC_t^2\right) + \beta\left(C_{t+1} - aC_{t+1}^2\right)$$
subject to:
i) $C_t + b_{t+1} + K_{t+1} = Y_t$
ii) $C_{t+1} = Y_{t+1} + (1+r)\,b_{t+1} + (1-\delta)\,K_{t+1}$
iii) $Y_{t+1} = A \ln(1 + K_{t+1})$
where $A > r + \delta$.
a) Find the optimal values of $K_{t+1}$, $Y_{t+1}$, $C_t$, $C_{t+1}$, and $b_{t+1}$;
b) Given the optimal value of $K_{t+1}$, find $\partial K_{t+1}/\partial r$. Does this make sense? Why?
c) For $Y_t = 1$, $\beta(1+r) = 1$, and $A = 2(r+\delta)$, find $\partial b_{t+1}/\partial r$. Does the income or substitution effect dominate?
Problem 3
Solve the following optimization problem for an individual who lives for three periods:
$$\max_{C_t,\,C_{t+1},\,C_{t+2}} U = \ln C_t + \beta \ln C_{t+1} + \beta^2 \ln C_{t+2}$$
subject to:
i) $C_t + b_{t+1} = Y_t$
ii) $C_{t+1} + b_{t+2} = Y_{t+1} + (1+r)\,b_{t+1}$
iii) $C_{t+2} = Y_{t+2} + (1+r)\,b_{t+2}$
a) Find the optimal values of $C_t^*$, $C_{t+1}^*$ and $C_{t+2}^*$;
b) Find the value function $V = U(C_t^*, C_{t+1}^*, C_{t+2}^*)$, i.e. evaluate the total utility function at the optimal values. What form does $V$ have compared to $U$?
Advanced Macroeconomics
Lecture 1. Quick Review of Undergraduate Macroeconomics:
Simple two-period models of consumption
Andrzej Cieślik
Spring 2013
Main assumptions:
– 2 periods of time: t, t+1
– Utility function additively separable: $U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
– constant discount factor: $\beta = \frac{1}{1+\theta}$, where $0 \le \beta \le 1$
– constant discount rate: $\theta$, where $\theta \ge 0$
$$\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$$
s.t.: o) no storage (benchmark)
i) physical storage, no financial market, no production
ii) financial market
iii) production
iv) financial market and production
CASE 0. (Benchmark): No Physical Storage
Problem: $\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
s.t. (1) $C_t \le Y_t$
(2) $C_{t+1} \le Y_{t+1}$
CASE 0. (Benchmark): No Physical Storage
Budget Constraint = Endowment Point
(1) $C_t \le Y_t$
(2) $C_{t+1} \le Y_{t+1}$
[Figure: endowment point E at $(Y_t, Y_{t+1})$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]
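Equilibrium with binding constraints
$C_t^* = Y_t$, $C_{t+1}^* = Y_{t+1}$
[Figure: indifference curve U through the consumption point C = E; horizontal axis $C_t$, vertical axis $C_{t+1}$.]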
CASE 1. Physical Storage
Problem: $\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
s.t. (1) $C_t + S_t = Y_t$
(2) $C_{t+1} = Y_{t+1} + S_t$
(3) $S_t \ge 0$
Combine (1) & (2) into the intertemporal budget constraint:
$\Rightarrow C_t + C_{t+1} = Y_t + Y_{t+1}$
$\Rightarrow dC_t + dC_{t+1} = dY_t + dY_{t+1}$, with $dY_t = 0$, $dY_{t+1} = 0$
$\Rightarrow \frac{dC_{t+1}}{dC_t} = -1$ (slope)
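CASE 1. Physical Storage
Kinked Budget Constraint
[Figure: budget constraint with a kink at the endowment point E = $(Y_t, Y_{t+1})$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]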
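CASE 1. Physical Storage
Equilibrium with non-binding saving constraint
[Figure: optimal consumption point C = $(C_t^*, C_{t+1}^*)$ and endowment point E = $(Y_t, Y_{t+1})$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]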
CASE 1. Physical Storage
Equilibrium with non-binding saving constraint
Equate the slope of the indifference curve to the slope of the budget constraint:
$$dU(C_t, C_{t+1}) = U'(C_t)\,dC_t + \beta U'(C_{t+1})\,dC_{t+1} = 0$$
$$\Rightarrow \frac{dC_{t+1}}{dC_t} = -\frac{U'(C_t)}{\beta U'(C_{t+1})} = -1$$
$$\Rightarrow U'(C_t) = \beta U'(C_{t+1})$$
With $U(C) = \ln C$:
$$\Rightarrow \frac{1}{C_t} = \beta\,\frac{1}{C_{t+1}}$$
$$\Rightarrow C_{t+1}^* = f(C_t^*) = \beta C_t^*$$
Substituting into $C_t + C_{t+1} = Y_t + Y_{t+1}$:
$$C_t^* = \left(\frac{1}{1+\beta}\right)(Y_t + Y_{t+1})$$
$$C_{t+1}^* = \left(\frac{\beta}{1+\beta}\right)(Y_t + Y_{t+1})$$
$$S_t^* = Y_t - C_t^* = \left(\frac{\beta}{1+\beta}\right)Y_t - \left(\frac{1}{1+\beta}\right)Y_{t+1} > 0$$
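CASE 1. Physical storage
Equilibrium with binding saving constraint
$C_t^* = Y_t$, $C_{t+1}^* = Y_{t+1}$
[Figure: consumption point C coincides with the endowment point E; horizontal axis $C_t$, vertical axis $C_{t+1}$.]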
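The two physical-storage outcomes (non-binding and binding saving constraint) can be checked numerically. The sketch below assumes log utility, as in the derivation for this case; the function name `storage_equilibrium` and the sample endowments are illustrative choices, not part of the lecture.

```python
def storage_equilibrium(y_t, y_next, beta):
    """Two-period consumer with log utility who can store goods but cannot borrow.

    Returns (C_t, C_{t+1}, S_t). Assumes U(C) = ln C and the constraint S_t >= 0.
    """
    # Unconstrained optimum: Euler equation C_{t+1} = beta * C_t combined with
    # the budget constraint C_t + C_{t+1} = Y_t + Y_{t+1}.
    c_t = (y_t + y_next) / (1.0 + beta)
    s_t = y_t - c_t
    if s_t < 0.0:
        # The saving constraint binds: consumption equals the endowment (C = E).
        return y_t, y_next, 0.0
    return c_t, beta * c_t, s_t


# High income today: the consumer stores part of the endowment.
print(storage_equilibrium(10.0, 2.0, 0.9))
# High income tomorrow: storage cannot be negative, so C = E.
print(storage_equilibrium(2.0, 10.0, 0.9))
```

The only branch in the code is the kink of the budget constraint: whenever the unconstrained solution would require negative storage, the consumer stays at the endowment point.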
CASE 2. Financial Market
Problem: $\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
s.t. (1) $C_t + S_t = Y_t$
(2) $C_{t+1} = Y_{t+1} + (1+r)\,S_t$
Combine (1) & (2) into the intertemporal budget constraint:
$$\Rightarrow S_t = \frac{1}{1+r}\left(C_{t+1} - Y_{t+1}\right)$$
$$\Rightarrow C_t + \frac{1}{1+r}\,C_{t+1} = Y_t + \frac{1}{1+r}\,Y_{t+1}$$
PDV of consumption = PDV of income
$$\Rightarrow \frac{dC_{t+1}}{dC_t} = -(1+r) = -\frac{U'(C_t)}{\beta U'(C_{t+1})}$$
CASE 2. Financial Market
Numerical Example
Two-period utility function (additively separable): $U = \ln c_t + \beta \ln c_{t+1}$
Logarithmic utility function: $U(c) = \ln c$
Intertemporal (between periods) budget constraint:
$$C_t + \frac{1}{1+r}\,C_{t+1} = Y_t + \frac{1}{1+r}\,Y_{t+1}$$
CASE 2. Financial Market
Numerical Example
Slope of the indifference curve:
$$\frac{U'(c_t)}{\beta U'(c_{t+1})} = \frac{c_{t+1}}{\beta c_t} = 1 + r$$
Consumption policy function:
$$C_{t+1} = f(C_t) = \beta(1+r)\,C_t$$
CASE 2. Financial Market
Numerical Example
We have to solve a simple two-period consumption = income equality:
$$C_t + \left(\frac{1}{1+r}\right)C_{t+1} = Y_t + \left(\frac{1}{1+r}\right)Y_{t+1}$$
$$C_t + \left(\frac{1}{1+r}\right)\beta(1+r)\,C_t = Y_t + \left(\frac{1}{1+r}\right)Y_{t+1}$$
$$(1+\beta)\,C_t = Y_t + \left(\frac{1}{1+r}\right)Y_{t+1}$$
Optimal amount of consumption in period $t$:
$$C_t^* = \left(\frac{1}{1+\beta}\right)\left(Y_t + \frac{1}{1+r}\,Y_{t+1}\right)$$
Optimal amount of consumption in period $t+1$, from $C_{t+1}^* = \beta(1+r)\,C_t^*$:
$$C_{t+1}^* = \beta(1+r)\left(\frac{1}{1+\beta}\right)\left(Y_t + \frac{1}{1+r}\,Y_{t+1}\right)$$
$$C_{t+1}^* = \left\{\frac{\beta(1+r)}{1+\beta}\right\}Y_t + \left\{\frac{\beta}{1+\beta}\right\}Y_{t+1}$$
CASE 2. Financial Market
Optimal Savings
$$S_t^* = Y_t - C_t^* = Y_t - \frac{1}{1+\beta}\,Y_t - \frac{1}{(1+\beta)(1+r)}\,Y_{t+1}$$
$$S_t^* = \left(\frac{1+\beta}{1+\beta} - \frac{1}{1+\beta}\right)Y_t - \frac{1}{(1+\beta)(1+r)}\,Y_{t+1}$$
Optimal amount of savings in period $t$:
$$S_t^* = \frac{\beta}{1+\beta}\,Y_t - \frac{1}{(1+\beta)(1+r)}\,Y_{t+1}$$
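The closed-form solution for the financial-market case with log utility is easy to verify numerically. The following is a minimal sketch; the function name `financial_market_equilibrium` and the parameter values in the usage lines are assumptions made for illustration only.

```python
def financial_market_equilibrium(y_t, y_next, beta, r):
    """Optimal consumption and savings with log utility and a financial market.

    Returns (C_t, C_{t+1}, S_t), where S_t > 0 means lending and S_t < 0 borrowing.
    """
    # Present discounted value of lifetime income.
    pdv_income = y_t + y_next / (1.0 + r)
    # Consume the share 1/(1+beta) of lifetime resources in period t.
    c_t = pdv_income / (1.0 + beta)
    # Consumption policy function: C_{t+1} = beta * (1 + r) * C_t.
    c_next = beta * (1.0 + r) * c_t
    s_t = y_t - c_t
    return c_t, c_next, s_t


# Income concentrated in period t: the consumer lends (S_t > 0).
print(financial_market_equilibrium(10.0, 0.0, 0.95, 0.05))
# Income concentrated in period t+1: the consumer borrows (S_t < 0).
print(financial_market_equilibrium(0.0, 10.0, 0.95, 0.05))
```

Unlike the physical-storage case, there is no lower bound on savings here, so the same formula covers both the lending and the borrowing equilibrium.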
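CASE 2. Financial Market
Equilibrium with positive savings (lending)
[Figure: endowment point E = $(Y_t, Y_{t+1})$ and optimal consumption point C = $(C_t^*, C_{t+1}^*)$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]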
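CASE 2. Financial Market
Equilibrium with negative savings (borrowing)
[Figure: endowment point E = $(Y_t, Y_{t+1})$ and optimal consumption point C = $(C_t^*, C_{t+1}^*)$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]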
CASE 3. Production
Problem: $\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
s.t.: (1) $C_t + K_{t+1} = Y_t + (1-\delta)K_t$
(2) $C_{t+1} + K_{t+2} = Y_{t+1} + (1-\delta)K_{t+1}$
(3) $Y_{t+1} = F(K_{t+1})$
Solution: (assume $K_{t+2} = 0$)
Substitute (3) into (2):
$\Rightarrow C_{t+1} = F(K_{t+1}) + (1-\delta)K_{t+1}$
(1) $\Rightarrow K_{t+1} = (Y_t - C_t) + (1-\delta)K_t$
CASE 3. Production
Intertemporal budget constraint
$$\Rightarrow C_{t+1} = F\left[(Y_t - C_t) + (1-\delta)K_t\right] + (1-\delta)\left[(Y_t - C_t) + (1-\delta)K_t\right]$$
Equate slopes of indifference curve and intertemporal budget constraint:
$$-\frac{U'(C_t)}{\beta U'(C_{t+1})} = \frac{dC_{t+1}}{dC_t} = \frac{\partial F\left[(Y_t - C_t) + (1-\delta)K_t\right]}{\partial K_{t+1}}\,\frac{\partial K_{t+1}}{\partial C_t} + (1-\delta)(-1) = -\left[F_K + (1-\delta)\right]$$
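CASE 3. Production
Equilibrium with positive investment
[Figure: endowment point E = $(Y_t, Y_{t+1})$ and optimal consumption point C = $(C_t^*, C_{t+1}^*)$; horizontal axis $C_t$, vertical axis $C_{t+1}$.]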
CASE 3. Production
Numerical Example:
$$Y_{t+1} = F(K_{t+1}) = (K_{t+1})^{1/2}$$
$$F_K = \frac{1}{2}(K_{t+1})^{-1/2}$$
$$\delta = 1$$
$$C_{t+1} = Y_{t+1} = (K_{t+1})^{1/2}$$
From the budget constraint we know that
$$C_{t+1} = \left(Y_t + (1-\delta)K_t - C_t\right)^{1/2}$$
CASE 3. Production
Numerical Example:
From the utility maximization we know that
$$\frac{C_{t+1}}{\beta C_t} = \frac{1}{2\,(Y_t - C_t)^{1/2}}$$
Equating, we have our solutions:
$$C_t^* = \frac{2}{2+\beta}\,Y_t$$
$$K_{t+1}^* = Y_t - \frac{2}{2+\beta}\,Y_t = \frac{\beta}{2+\beta}\,Y_t$$
$$C_{t+1}^* = \left(\frac{\beta}{2+\beta}\,Y_t\right)^{1/2}$$
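The production example can be cross-checked by brute force. The sketch below assumes log utility (which the first-order condition $C_{t+1}/(\beta C_t) = F_K$ in this example implies), $F(K) = K^{1/2}$ and $\delta = 1$; the function names and the grid search are illustrative and not part of the lecture.

```python
import math


def production_solution(y_t, beta):
    """Closed-form solution for F(K) = K**0.5, delta = 1 and log utility."""
    c_t = 2.0 * y_t / (2.0 + beta)   # consumption in period t
    k_next = y_t - c_t               # everything not consumed is invested
    c_next = math.sqrt(k_next)       # period t+1 consumption equals output
    return c_t, k_next, c_next


def lifetime_utility(c_t, y_t, beta):
    """ln C_t + beta * ln C_{t+1}, with C_{t+1} = (Y_t - C_t)**0.5."""
    return math.log(c_t) + beta * math.log(math.sqrt(y_t - c_t))


def grid_argmax(y_t, beta, n=20_000):
    """Search an evenly spaced grid on (0, Y_t) for the utility-maximizing C_t."""
    candidates = (y_t * i / n for i in range(1, n))
    return max(candidates, key=lambda c: lifetime_utility(c, y_t, beta))


c_t, k_next, c_next = production_solution(4.0, 0.9)
print(c_t, k_next, c_next)
print(grid_argmax(4.0, 0.9))  # should be close to c_t
```

The grid search knows nothing about the first-order condition, so agreement between the two numbers is an independent check on the algebra.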
CASE 4. Financial Market and Production
Problem: $\max\; U(C_t, C_{t+1}) = U(C_t) + \beta U(C_{t+1})$
s.t.: (1) $C_t + B_{t+1} + K_{t+1} = Y_t + (1-\delta)K_t + (1+r)B_t$, $[K_t = 0,\ B_t = 0]$
(2) $C_{t+1} = Y_{t+1} + (1-\delta)K_{t+1} + (1+r)B_{t+1}$
(3) $Y_{t+1} = F(K_{t+1})$, $K_{t+1} \ge 0$
Note that S splits into B & K
CASE 4. Financial Market and Production
Solution:
(1) $\Rightarrow B_{t+1} = Y_t - C_t - K_{t+1}$
$$C_t + \frac{1}{1+r}\,C_{t+1} = Y_t + \frac{1}{1+r}\left[F(K_{t+1}) + (1-\delta)K_{t+1} - (1+r)K_{t+1}\right]$$
$$\frac{dPDV}{dK_{t+1}} = F'(K_{t+1}) + (1-\delta) - (1+r) = 0$$
$$F'(K_{t+1}) + (1-\delta) = (1+r)$$
CASE 4. Financial Market and Production
Numerical Example:
$$\max\; U = \ln C_t + \beta \ln C_{t+1}$$
$$C_t + B_{t+1} + K_{t+1} = Y_t$$
$$C_{t+1} = Y_{t+1} + (1-\delta)K_{t+1} + (1+r)B_{t+1}$$
$$Y_{t+1} = F(K_{t+1}) = AK_{t+1}^{\alpha}$$
CASE 4. Financial Market and Production
Numerical Example:
$$C_t + \frac{1}{1+r}\,C_{t+1} = Y_t + \frac{1}{1+r}\left[AK_{t+1}^{\alpha} + (1-\delta)K_{t+1} - (1+r)K_{t+1}\right]$$
Maximize the value of resources $AK_{t+1}^{\alpha} + (1-\delta)K_{t+1} - (1+r)K_{t+1}$:
$$\frac{dPDV}{dK_{t+1}} = F_K + (1-\delta) - (1+r) = 0$$
$$\Rightarrow \underbrace{\alpha AK_{t+1}^{\alpha-1} + (1-\delta)}_{\text{gross rate of return on capital}} = \underbrace{(1+r)}_{\text{gross rate of return in financial market}}$$
CASE 4. Financial Market and Production
Numerical Example:
$$\underbrace{\alpha AK_{t+1}^{\alpha-1} - \delta}_{\text{net rate of return on capital}} = \underbrace{r}_{\text{net rate of return in financial markets}}$$
Optimal capital stock:
$$K_{t+1}^* = \left(\frac{\alpha A}{r+\delta}\right)^{\frac{1}{1-\alpha}}$$
$\frac{dK_{t+1}^*}{dr} < 0$: a higher opportunity cost of holding capital decreases the capital stock.
$\frac{dK_{t+1}^*}{dA} > 0$: improvements in technology increase the capital stock.
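The optimal capital stock and its comparative statics can be illustrated numerically. The sketch below uses the production function $F(K) = AK^{\alpha}$ from this example; the function names and the parameter values ($A = 1$, $\alpha = 0.5$, $r = 0.05$, $\delta = 0.1$) are hypothetical and chosen only for illustration.

```python
def optimal_capital(A, alpha, r, delta):
    """Capital stock at which the net return on capital equals r, for F(K) = A * K**alpha."""
    return (alpha * A / (r + delta)) ** (1.0 / (1.0 - alpha))


def net_return_on_capital(K, A, alpha, delta):
    """Marginal product of capital net of depreciation: alpha * A * K**(alpha - 1) - delta."""
    return alpha * A * K ** (alpha - 1.0) - delta


# Hypothetical parameter values, not taken from the lecture.
A, alpha, r, delta = 1.0, 0.5, 0.05, 0.1
k_star = optimal_capital(A, alpha, r, delta)
print(k_star, net_return_on_capital(k_star, A, alpha, delta))

# Comparative statics: a higher interest rate lowers K*, better technology raises it.
print(optimal_capital(A, alpha, r + 0.01, delta) < k_star)
print(optimal_capital(A * 1.1, alpha, r, delta) > k_star)
```

At the computed capital stock the net return on capital equals the interest rate, which is the arbitrage condition between holding capital and holding bonds.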
CASE 4. Financial Market and Production
Numerical Example:
Intertemporal budget constraint:
$$C_t + \frac{1}{1+r}\,C_{t+1} = Y_t + \frac{1}{1+r}\left[A\left(\frac{\alpha A}{r+\delta}\right)^{\frac{\alpha}{1-\alpha}} + (1-\delta)\left(\frac{\alpha A}{r+\delta}\right)^{\frac{1}{1-\alpha}} - (1+r)\left(\frac{\alpha A}{r+\delta}\right)^{\frac{1}{1-\alpha}}\right]$$
The term in square brackets is the maximum resources that we can have in the next period.
Maximum resources available in period $t+1$:
$$A\left(\frac{\alpha A}{r+\delta}\right)^{\frac{\alpha}{1-\alpha}} + (1-\delta)\left(\frac{\alpha A}{r+\delta}\right)^{\frac{1}{1-\alpha}} - (1+r)\left(\frac{\alpha A}{r+\delta}\right)^{\frac{1}{1-\alpha}} = X_{t+1}$$
CASE 4. Financial Market and Production
Numerical Example:
Optimality condition (utility maximization):
$$\underbrace{\frac{C_{t+1}}{\beta C_t}}_{\text{slope of the indifference curve}} = \underbrace{1 + r}_{\text{slope of the budget constraint}}$$
Consumption policy function:
$$C_{t+1}^* = \beta(1+r)\,C_t$$
CASE 4. Financial Market and Production
Numerical Example:
$$C_t^* = \frac{1}{1+\beta}\,Y_t + \frac{1}{(1+\beta)(1+r)}\,X_{t+1}$$
$$B_{t+1}^* = Y_t - C_t^* - K_{t+1}^*$$
CASE 4. Financial Market and Production
Numerical Example:
$$A = 1,\quad \beta = 1,\quad r = 0,\quad \delta = 0.25,\quad \alpha = 0.5$$
$$B_{t+1}^* = 0.5\,Y_t - 0.5\,(2-1) - 2 = 0.5\,(Y_t - 5), \qquad B_{t+1}^* > 0 \ \text{if}\ Y_t > 5$$
Dynamic Economics:
Quantitative Methods and Applications
Jérôme Adda and Russell Cooper
October 25, 2002
To Lance Armstrong, master of us all
Contents
1 OVERVIEW
I Theory
2 Theory of Dynamic Programming
2.1 Overview
2.2 Indirect Utility
2.2.1 Consumers
2.2.2 Firms
2.3 Dynamic Optimization: A Cake Eating Example
2.3.1 Direct Attack
2.3.2 Dynamic Programming Approach
2.4 Some Extensions of the Cake Eating Problem
2.4.1 Infinite Horizon
2.4.2 Taste Shocks
2.4.3 Discrete Choice
2.5 General Formulation
2.5.1 Non-Stochastic Case
2.5.2 Stochastic Dynamic Programming
2.6 Conclusion
3 Numerical Analysis
3.1 Overview
3.2 Stochastic Cake Eating Problem
3.2.1 Value Function Iterations
3.2.2 Policy Function Iterations
3.2.3 Projection Methods
3.3 Stochastic Discrete Cake Eating Problem
3.3.1 Value Function Iterations
3.4 Extensions and Conclusion
3.4.1 Larger State Spaces
3.A Additional Numerical Tools
3.A.1 Interpolation Methods
3.A.2 Numerical Integration
3.A.3 How to Simulate the Model
4 Econometrics
4.1 Overview
4.2 Some Illustrative Examples
4.2.1 Coin Flipping
4.2.2 Supply and Demand Revisited
4.3 Estimation Methods and Asymptotic Properties
4.3.1 Generalized Method of Moments
4.3.2 Maximum Likelihood
4.3.3 Simulation Based Methods
4.4 Conclusion
II Applications
5 Stochastic Growth
5.1 Overview
5.2 Non-Stochastic Growth Model
5.2.1 An Example
5.2.2 Numerical Analysis
5.3 Stochastic Growth Model
5.3.1 Environment
5.3.2 Bellman's Equation
5.3.3 Solution Methods
5.3.4 Decentralization
5.4 A Stochastic Growth Model with Endogenous Labor Supply
5.4.1 Planner's Dynamic Programming Problem
5.4.2 Numerical Analysis
5.5 Confronting the Data
5.5.1 Moments
5.5.2 GMM
5.5.3 Indirect Inference
5.5.4 Maximum Likelihood Estimation
5.6 Some Extensions
5.6.1 Technological Complementarities
5.6.2 Multiple Sectors
5.6.3 Taste Shocks
5.6.4 Taxes
5.7 Conclusions
6 Consumption
6.1 Overview and Motivation
6.2 Two-Period Problem
6.2.1 Basic Problem
6.2.2 Stochastic Income
6.2.3 Portfolio Choice
6.2.4 Borrowing Restrictions
6.3 Infinite Horizon Formulation: Theory and Empirical Evidence
6.3.1 Bellman's Equation for the Infinite Horizon Problem
6.3.2 Stochastic Income
6.3.3 Stochastic Returns: Portfolio Choice
6.3.4 Endogenous Labor Supply
6.3.5 Borrowing Constraints
6.3.6 Consumption Over the Life Cycle
6.4 Conclusion
7 Durable Consumption
7.1 Motivation
7.2 Permanent Income Hypothesis Model of Durable Expenditures
7.2.1 Theory
7.2.2 Estimation of a Quadratic Utility Specification
7.2.3 Quadratic Adjustment Costs
7.3 Non Convex Adjustment Costs
7.3.1 General Setting
7.3.2 Irreversibility and Durable Purchases
7.3.3 A Dynamic Discrete Choice Model
8 Investment
8.1 Overview/Motivation
8.2 General Problem
8.3 No Adjustment Costs
8.4 Convex Adjustment Costs
8.4.1 Q Theory: Models
8.4.2 Q Theory: Evidence
8.4.3 Euler Equation Estimation
8.4.4 Borrowing Restrictions
8.5 Non-Convex Adjustment: Theory
8.5.1 Non-convex Adjustment Costs
8.5.2 Irreversibility
8.6 Estimation of a Rich Model of Adjustment Costs
8.6.1 General Model
8.6.2 Maximum Likelihood Estimation
8.7 Conclusions
9 Dynamics of Employment Adjustment
9.1 Motivation
9.2 General Model of Dynamic Labor Demand
9.3 Quadratic Adjustment Costs
9.4 Richer Models of Adjustment
9.4.1 Piecewise Linear Adjustment Costs
9.4.2 Non-Convex Adjustment Costs
9.4.3 Asymmetries
9.5 The Gap Approach
9.5.1 Partial Adjustment Model
9.5.2 Measuring the Target and the Gap
9.6 Estimation of a Rich Model of Adjustment Costs
9.7 Conclusions
10 Future Developments
10.1 Overview/Motivation
10.2 Price Setting
10.2.1 Optimization Problem
10.2.2 Evidence on Magazine Prices
10.2.3 Aggregate Implications
10.3 Optimal Inventory Policy
10.3.1 Inventories and the Production Smoothing Model
10.3.2 Prices and Inventory Adjustment
10.4 Capital and Labor
10.5 Technological Complementarities: Equilibrium Analysis
10.6 Search Models
10.6.1 A Simple Labor Search Model
10.6.2 Estimation of the Labor Search Model
10.6.3 Extensions
10.7 Conclusions
Chapter 1
OVERVIEW
This book studies a rich set of applied problems in economics, emphasizing the
dynamic aspects of economic decisions. While we are ultimately interested in appli-
cations, it is necessary to acquire some basic techniques before tackling the details
of specific dynamic optimization problems. Thus the book presents and integrates
tools, such as dynamic programming, numerical techniques and simulation-based
econometric methods. We then use these tools to study a variety of applications
in both macroeconomics and microeconomics.
The approach we pursue to studying economic dynamics is structural. As re-
searchers, we are frequently interested in inferring underlying parameters that rep-
resent tastes, technology and other primitives from observations of individual house-
holds and firms as well as from economic aggregates. If this inference is successful,
then we can test competing hypotheses about economic behavior and evaluate the
effects of policy experiments. In the end, our approach allows us to characterize the
mapping from primitives to observed behavior.
To appreciate what is at stake, consider the following policy experiment. In recent
years, a number of European governments have instituted policies of subsidizing
the scrapping of old cars and the subsequent purchase of a new car. What are the
expected effects of these policies on the car industry and on government revenues?
At some level this question seems easy if a researcher "knows" the demand function
for cars. But of course that demand function is, at best, elusive. Further, the demand
function estimated in one policy regime is unlikely to be very informative for
a novel policy experiment, such as the car scrapping subsidies.
An alternative approach is to build and estimate a dynamic model of household
choice over car ownership. Once the parameters of this model are estimated, then
various policy experiments can be evaluated.1 This seems considerably more difficult
than just estimating a demand function and indeed that is the case. The approach
requires the specification and solution of a dynamic optimization problem and then
the estimation of the parameters. But, as we argue here, this methodology is both
feasible and exciting.
It is the integration of the solution of dynamic optimization problems with the
estimation of parameters that is at the heart of the approach to the study of dynamic
economies. There are three key steps in our development of this topic. These are
reflected in the organization of the chapters.
The first step is to review the formal theory of dynamic optimization. This tool is
used in many areas of economics including macroeconomics, industrial organization,
labor economics, international economics and so forth. As in previous contributions
to the study of dynamic optimization, such as Sargent (1987) and Stokey and Lucas
(1989), our presentation starts with the formal theory of dynamic programming.
Given the large number of other contributions in this area, our presentation will rely
on existing theorems concerning the existence of solutions to a variety of dynamic
programming problems.
A second step is to present the numerical tools and the econometric techniques
necessary to conduct a structural estimation of the theoretical dynamic models.
These numerical tools serve two purposes: (i) to complement the theory in learning
about dynamic programming and (ii) to enable a researcher to evaluate the
quantitative implications of the theory. From our experience, the process of writing
computer code to solve dynamic programming problems is an excellent device for
teaching basic concepts of this approach.2
The econometric techniques provide the final link between the dynamic program-
ming problem and data. Our emphasis will be on the mapping from parameters of
the dynamic programming problem to observations. For example, a vector of pa-
rameters is used to numerically solve a dynamic programming problem which is
then simulated to create moments. An optimization routine then selects a vector of
parameters to bring these simulated moments close to the actual moments observed
in the data.
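The loop described above (choose parameters, simulate, compare simulated moments with data moments) can be sketched with a deliberately tiny stand-in model. Everything in the sketch is an assumption made for illustration: the "model" is simply consumption growth equal to a discount factor $\beta$ times a uniform shock, the single moment is mean consumption growth, and the "optimization routine" is a grid search. No dynamic programming problem is actually solved here.

```python
import random


def simulate_growth_moment(beta, n_periods=200, seed=0):
    """Simulate noisy consumption growth beta * shock and return its sample mean."""
    rng = random.Random(seed)  # fixed seed: the same shocks are reused for every beta
    ratios = [beta * rng.uniform(0.9, 1.1) for _ in range(n_periods)]
    return sum(ratios) / n_periods


def estimate_beta(observed_moment, grid_size=1000):
    """Pick the beta on a grid whose simulated moment is closest to the data moment."""
    candidates = [i / grid_size for i in range(1, grid_size + 1)]
    return min(
        candidates,
        key=lambda b: (simulate_growth_moment(b) - observed_moment) ** 2,
    )


# Stand-in "data": generated from the same toy model with beta = 0.95
# and a different seed, so the shocks differ from the simulated ones.
data_moment = simulate_growth_moment(0.95, seed=1)
print(estimate_beta(data_moment))
```

Reusing the same random draws for every candidate parameter keeps the objective smooth in $\beta$, which is why the seed is fixed inside the simulation function.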
The complete presentation of these two steps will comprise the first three chap-
ters. To distinguish this material, which is more theoretical, we call this Part I of
the book.
The final step of the presentation comprises Part II of the book which is devoted
to the application of dynamic programming to specific areas of applied economics
such as the study of business cycles, consumption, investment behavior, etc. Each
of the applied sections of the text will contain four elements: presentation of the
specific optimization problem as a dynamic programming problem, characterization
of the optimal policy functions, estimation of the parameters and using models for
policy evaluation.
While the specific applications might be labelled "macroeconomics", the material
is of value in other areas of economics for a couple of reasons. First, the presentation
of these applications utilizes material from all parts of economics. So, for example,
the discussion of the stochastic growth model includes material on taxation and the
work on factor adjustment at the plant-level is of interest to economists in labor
and industrial organization. Second, these techniques are useful in any application
where the researcher is interested in taking a dynamic optimization problem to data.
The presentation contains references to various applications of these techniques.
The novel element of this book is our presentation of an integrated approach to
the empirical implementation of dynamic optimization models. Previous texts have
provided the mathematical basis for dynamic programming but those presentations
generally do not contain any quantitative applications. Other texts present the un-
derlying econometric theory but generally without specific economic applications.
This approach does both and thus provides a useable link between theory and ap-
plication as illustrated in the chapters of Part II.
Our motivation for writing this book is thus clear. From the perspective of
understanding dynamic programming, explicit empirical applications complement the
underlying theory of optimization. From the perspective of applied macroeconomics,
explicit dynamic optimization problems, posed as dynamic programming problems,
provide needed structure for estimation and policy evaluation.
Since the book is intended to teach empirical applications of dynamic program-
ming problems, we plan to create a web-site for the presentation of code (MATLAB
and GAUSS) as well as data sets that will be useful for applications. The site will be
vital to readers wishing to supplement the presentation in Part II and also provide
a forum for further development of code.
The development of the material in this book has certainly benefited from the
joint work with Joao Ejarque, John Haltiwanger, Alok Johri and Jonathan Willis
that underlies some of the material. We thank these co-authors for their generous
sharing of ideas and computer code as well as their comments on the draft. Thanks
also to Victor Aguirregabiria, Yan Bai, Dean Corbae, Simon Gilchrist, Hang Kang,
Valérie Lechene, Nicola Pavoni and Marcos Vera for comments on various parts of the
book. Finally, we are grateful to numerous MA and PhD students at Tel Aviv
University, University of Texas at Austin, the IDEI at the Université de Toulouse,
the NAKE PhD program in Holland, the University of Haifa and University College
London for their numerous comments and suggestions during the preparation of
this material.
Part I
Theory
Chapter 2
Theory of Dynamic Programming
2.1 Overview
The mathematical theory of dynamic programming as a means of solving dynamic
optimization problems dates to the early contributions of Bellman (1957) and
Bertsekas (1976). For economists, the contributions of Sargent (1987) and Stokey and
Lucas (1989) provide a valuable bridge to this literature.
2.2 Indirect Utility
Intuitively, the approach of dynamic programming can be understood by recalling
the theme of indirect utility from basic static consumer theory or a reduced form
profit function generated by the optimization of a firm. These reduced form
representations of payoffs summarize information about the optimized value of the choice
problems faced by households and firms. As we shall see, the theory of dynamic
programming uses this insight in a dynamic context.
2.2.1 Consumers
Consumer choice theory focuses on households who solve:
$$V(I, p) = \max_{c}\; u(c) \quad \text{subject to: } pc = I$$
where $c$ is a vector of consumption goods, $p$ is a vector of prices and $I$ is income.3
The first order condition is given by
$$u_j(c)/p_j = \lambda \quad \text{for } j = 1, 2, \ldots, J,$$
where $\lambda$ is the multiplier on the budget constraint and $u_j(c)$ is the marginal utility
from good $j$.
Here V (I, p) is an indirect utility function. It is the maximized level of utility
from the current state (I, p). So if someone is in this state, you can predict that they
will attain this level of utility. You do not need to know what they will do with their
income; it is enough to know that they will act optimally. This is very powerful logic
and underlies the idea behind the dynamic programming models studied below.
To illustrate, what happens if we give the consumer a bit more income? Welfare
goes up by $V_I(I, p) > 0$. Can the researcher predict what will happen with a little
more income? Not really since the optimizing consumer is indifferent with respect
to how this is spent:
$$u_j(c)/p_j = V_I(I, p) \quad \text{for all } j.$$
It is in this sense that the indirect utility function summarizes the value of the
household's optimization problem and allows us to determine the marginal value of
income without knowing further details about consumption functions.
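A small numerical check may help make the equality between the marginal utility per dollar and the marginal value of income concrete. The sketch below assumes a specific utility function that is not in the text, $u(c) = \sum_j a_j \ln c_j$ with $\sum_j a_j = 1$, for which demands are $c_j = a_j I / p_j$; the function names and numbers are illustrative.

```python
import math


def demands(income, prices, shares):
    """Optimal consumption c_j = a_j * I / p_j under the assumed log utility."""
    return [a * income / p for a, p in zip(shares, prices)]


def indirect_utility(income, prices, shares):
    """V(I, p): utility evaluated at the optimal consumption bundle."""
    bundle = demands(income, prices, shares)
    return sum(a * math.log(c) for a, c in zip(shares, bundle))


def marginal_utility_per_dollar(income, prices, shares):
    """u_j(c) / p_j at the optimum, one entry per good."""
    bundle = demands(income, prices, shares)
    return [a / (c * p) for a, c, p in zip(shares, bundle, prices)]


income, prices, shares = 100.0, [2.0, 5.0], [0.3, 0.7]
h = 1e-4
# Central finite difference approximating V_I(I, p).
v_income = (indirect_utility(income + h, prices, shares)
            - indirect_utility(income - h, prices, shares)) / (2 * h)
print(v_income)
print(marginal_utility_per_dollar(income, prices, shares))
```

Both printed objects should agree (up to numerical error) with each other across goods, illustrating that the value of an extra unit of income does not depend on which good it is spent on.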
Is this all we need to know about household behavior? No, this theory is static
and thus ignores savings, spending on durable goods as well as uncertainty over the
future. These are all important elements in the household optimization problem.
We will return to these in later chapters on the dynamic behavior of households.
The point here was simply to recall a key object from optimization theory: the
indirect utility function.
2.2.2 Firms
Suppose that a firm chooses how many workers to hire at a wage of w given its stock
of capital, k, and product price, p. Thus the firm solves:
$$\Pi(w, p, k) = \max_{l}\; pf(l, k) - wl.$$
This will yield a labor demand function which depends on (w, p, k). As with V (I, p),
Π(w, p, k) summarizes the value of the firm given factor prices, the product price,
p, and the stock of capital, k. Both the flexible and fixed factors could be vectors.
Think of Π(w, p, k) as an indirect profit function. It completely summarizes the
value of the optimization problem of the firm given (w, p, k).
As with the household's problem, given $\Pi(w, p, k)$, we can directly compute the
marginal value of giving the firm some additional capital as $\Pi_k(w, p, k) = pf_k(l, k)$
without knowing how the firm will adjust its labor input in response to the additional
capital.
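The envelope result $\Pi_k = pf_k$ can be verified numerically for a concrete technology. The sketch below assumes $f(l, k) = l^a k^b$ with $0 < a < 1$, a functional form chosen only for illustration; the function names and parameter values are likewise hypothetical.

```python
def labor_demand(w, p, k, a, b):
    """Labor choice solving the first order condition p * a * l**(a-1) * k**b = w."""
    return (p * a * k ** b / w) ** (1.0 / (1.0 - a))


def indirect_profit(w, p, k, a, b):
    """Pi(w, p, k): profit evaluated at the optimal labor input."""
    l = labor_demand(w, p, k, a, b)
    return p * l ** a * k ** b - w * l


def marginal_value_of_capital(w, p, k, a, b):
    """p * f_k(l, k) evaluated at the optimal labor input."""
    l = labor_demand(w, p, k, a, b)
    return p * b * l ** a * k ** (b - 1.0)


w, p, k, a, b = 1.0, 2.0, 4.0, 0.5, 0.3
h = 1e-5
# Central finite difference of the indirect profit function with respect to k.
# Labor re-optimizes inside indirect_profit, yet the derivative matches p * f_k.
numeric = (indirect_profit(w, p, k + h, a, b)
           - indirect_profit(w, p, k - h, a, b)) / (2 * h)
print(numeric, marginal_value_of_capital(w, p, k, a, b))
```

The finite difference lets labor adjust to the extra capital, while the analytical expression holds labor fixed; the two agree because the labor adjustment has no first-order effect on profit at the optimum.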
But, is this all there is to know about the firm’s behavior? Surely not as we
have not specified where k comes from. So the firm’s problem is essentially dynamic
though the demand for some of its inputs can be taken as a static optimization
problem. These are important themes in the theory of factor demand and we will
return to them in our firm applications.
2.3 Dynamic Optimization: A Cake Eating Example
Here we will look at a very simple dynamic optimization problem. We begin with a
finite horizon and then discuss extensions to the infinite horizon.4
Suppose that you have a cake of size $W_1$. At each point of time, $t = 1, 2, 3, \ldots, T$,
you can consume some of the cake and thus save the remainder. Let $c_t$ be your
consumption in period $t$ and let $u(c_t)$ represent the flow of utility from this
consumption. The utility function is not indexed by time: preferences are stationary.
Assume $u(\cdot)$ is real-valued, differentiable, strictly increasing and strictly concave.
Assume that $\lim_{c \to 0} u'(c) = \infty$. Represent lifetime utility by
$$\sum_{t=1}^{T} \beta^{(t-1)} u(c_t)$$
where $0 \le \beta \le 1$ and $\beta$ is called the discount factor.
For now, assume that the cake does not depreciate (melt) or grow. Hence, the
evolution of the cake over time is governed by:
$$W_{t+1} = W_t - c_t \qquad (2.1)$$
for $t = 1, 2, \ldots, T$. How would you find the optimal path of consumption, $\{c_t\}_1^T$?5
2.3.1 Direct Attack
One approach is to solve the constrained optimization problem directly. This is
called the sequence problem by Stokey and Lucas (1989). Consider the problem
of:
$$\max_{\{c_t\}_1^T,\ \{W_t\}_2^{T+1}}\ \sum_{t=1}^{T} \beta^{(t-1)} u(c_t) \qquad (2.2)$$
subject to the transition equation (2.1), which holds for $t = 1, 2, 3, \ldots, T$. Also, there
are non-negativity constraints on consumption and the cake given by: $c_t \ge 0$ and
$W_t \ge 0$. For this problem, $W_1$ is given.
Alternatively, the flow constraints imposed by (2.1) for each t could be combined
yielding:
$$\sum_{t=1}^{T} c_t + W_{T+1} = W_1. \tag{2.3}$$
The non-negativity constraints are simpler: ct ≥ 0 for t = 1, 2, …, T and WT+1 ≥ 0.
For now, we will work with the single resource constraint. This is a well-behaved
problem as the objective is concave and continuous and the constraint set is compact.
So there is a solution to this problem.6
Letting λ be the multiplier on (2.3), the first order conditions are given by:
$$\beta^{t-1} u'(c_t) = \lambda$$
for t = 1, 2, …, T and
$$\lambda = \phi$$
where φ is the multiplier on the non-negativity constraint on WT+1. The non-
negativity constraints ct ≥ 0 are ignored as we assumed that the marginal utility
of consumption becomes infinite as consumption approaches zero within any period.
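These conditions can be read off a Lagrangian. The following is a sketch in our own notation (the symbol \mathcal{L} does not appear in the text), with λ the multiplier on (2.3) and φ the multiplier on the constraint WT+1 ≥ 0:

```latex
\begin{aligned}
\mathcal{L} &= \sum_{t=1}^{T} \beta^{t-1} u(c_t)
   + \lambda \Big[ W_1 - \sum_{t=1}^{T} c_t - W_{T+1} \Big] + \phi\, W_{T+1}, \\
\frac{\partial \mathcal{L}}{\partial c_t} &= \beta^{t-1} u'(c_t) - \lambda = 0,
   \qquad t = 1, 2, \ldots, T, \\
\frac{\partial \mathcal{L}}{\partial W_{T+1}} &= -\lambda + \phi = 0,
   \qquad \phi \ge 0, \quad \phi\, W_{T+1} = 0 .
\end{aligned}
```

Because the same λ appears in the condition for every period, equating the conditions for periods t and t + 1 eliminates the multiplier.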
Combining equations, we obtain an expression that links consumption across
any two periods:
$$u'(c_t) = \beta u'(c_{t+1}). \tag{2.4}$$
This is a necessary condition for optimality for any t: if it was violated, the agent
could do better by adjusting ct and ct+1. Frequently, (2.4) is referred to as an Euler
equation.
To understand this condition, suppose that you have a proposed (candidate)
solution for this problem given by $\{c_t^*\}_{t=1}^{T}$, $\{W_t^*\}_{t=2}^{T+1}$. Essentially, the Euler equation
says that the marginal utility cost of reducing consumption by ε in period t equals
the marginal utility gain from consuming the extra ε of cake in the next period,
which is discounted by β. If the Euler equation holds, then it is impossible to
increase utility by moving consumption across adjacent periods given a candidate
solution.
It should be clear though that this condition may not be sufficient: it does
not cover deviations that last more than one period. For example, could utility be
increased by reducing consumption by ε in period t, saving the "cake" for two periods,
and then increasing consumption in period t + 2? Clearly this is not covered by a
single Euler equation. However, by combining the Euler equation that holds across
periods t and t + 1 with that which holds for periods t + 1 and t + 2, we can see that
such a deviation will not increase utility. This is simply because the combination of
Euler equations implies:
$$u'(c_t) = \beta^{2} u'(c_{t+2})$$
so that the two-period deviation from the candidate solution will not increase utility.
As long as the problem is finite, the fact that the Euler equation holds across all
adjacent periods implies that any finite deviations from a candidate solution that
satisfies the Euler equations will not increase utility.
Is this enough? Not quite. Imagine a candidate solution that satisfies all of the
Euler equations but has the property that WT > cT so that there is cake left over.
This is clearly an inefficient plan: having the Euler equations holding is necessary
but not sufficient. Hence the optimal solution will satisfy the Euler equation for
each period and the agent will consume the entire cake!
Formally, this involves showing the non-negativity constraint on WT +1 must
bind. In fact, this constraint is binding in the above solution: λ = φ > 0. This
non-negativity constraint serves two important purposes. First, in the absence of
a constraint that WT +1 ≥ 0, the agent would clearly want to set WT +1 = −∞ and
thus die with outstanding obligations. This is clearly not feasible. Second, the fact
that the constraint is binding in the optimal solution guarantees that cake is not
being thrown away after period T .
So, in effect, the problem is pinned down by an initial condition (W1 is given)
and by a terminal condition (WT +1 = 0). The set of (T − 1) Euler equations and
(2.3) then determine the time path of consumption.
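To make this concrete, here is a minimal sketch of how the Euler equations and the terminal condition pin down the path. It assumes a specific utility function that the text has not imposed at this point, namely u′(c) = c^(−γ) (with γ = 1 corresponding to log utility); the function name and parameter values are hypothetical.

```python
def cake_path(W1, beta, T, gamma=1.0):
    """Consumption path c_1..c_T for the T-period cake problem.

    Assumption (not from the text): u'(c) = c**(-gamma), so the Euler
    equation u'(c_t) = beta * u'(c_{t+1}) gives c_{t+1} = beta**(1/gamma) * c_t.
    """
    g = beta ** (1.0 / gamma)               # growth factor of consumption
    weights = [g ** t for t in range(T)]    # c_t / c_1 for t = 1..T
    c1 = W1 / sum(weights)                  # terminal condition: W_{T+1} = 0
    return [c1 * w for w in weights]


path = cake_path(W1=1.0, beta=0.9, T=5, gamma=2.0)
print(path, sum(path))
```

The T − 1 Euler equations fix every consumption level relative to c1, and the resource constraint with WT+1 = 0 then fixes c1 itself.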
Let the solution to this problem be denoted by VT (W1) where T is the horizon of
the problem and W1 is the initial size of the cake. VT (W1) represents the maximal
utility flow from a T period problem given a size W1 cake. From now on, we call this
a value function. This is completely analogous to the indirect utility functions
expressed for the household and the firm.
As in those problems, a slight increase in the size of the cake leads to an increase
in lifetime utility equal to the marginal utility in any period. That is,
$$V_T'(W_1) = \lambda = \beta^{t-1} u'(c_t), \quad t = 1, 2, \ldots, T.$$
It doesn’t matter when the extra cake is eaten given that the consumer is acting
optimally. This is analogous to the point raised above about the effect on utility of
an increase in income in the consumer choice problem with multiple goods.
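This envelope property can be checked numerically. The sketch below assumes log utility and 0 < β < 1 (assumptions made only for this illustration), computes the optimal path in closed form, and compares a finite-difference derivative of the value function with the discounted marginal utility in each period. All names are hypothetical.

```python
import math


def log_cake_path(W1, beta, T):
    """Optimal path under u(c) = ln(c): c_{t+1} = beta * c_t and sum(c) = W1."""
    c1 = W1 * (1 - beta) / (1 - beta ** T)
    return [c1 * beta ** t for t in range(T)]


def value(W1, beta, T):
    """Lifetime utility evaluated along the optimal path."""
    return sum(beta ** t * math.log(c)
               for t, c in enumerate(log_cake_path(W1, beta, T)))


W1, beta, T, h = 2.0, 0.95, 6, 1e-6
# Central finite difference approximating V_T'(W1)
dV = (value(W1 + h, beta, T) - value(W1 - h, beta, T)) / (2 * h)
# beta^(t-1) * u'(c_t) for t = 1..T (index t below starts at 0)
discounted_mu = [beta ** t / c for t, c in enumerate(log_cake_path(W1, beta, T))]
print(dV, discounted_mu)
```

Every entry of `discounted_mu` should agree with `dV` up to finite-difference error, which is the sense in which it does not matter when the extra cake is eaten.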
2.3.2 Dynamic Programming Approach
Suppose that we change the above problem slightly: we add a period 0 and give an
initial cake of size W0. One approach to determining the optimal solution of this
augmented problem is to go back to the sequence problem and re-solve it using this
longer horizon and new constraint. But, having done all of the hard work with the
T period problem, it would be nice not to have to do it again!
Finite Horizon Problem
The dynamic programming approach provides a means of doing so. It essentially
converts an (arbitrary) T period problem into a 2 period problem with the appropriate
rewriting of the objective function. In doing so, it uses the value function obtained
from solving a shorter horizon problem.
So, when we consider adding a period 0 to our original problem, we can take
advantage of the information provided in VT (W1), the solution of the T period
problem given W1 from (2.2). Given W0, consider the problem of
$$\max_{c_0} \ u(c_0) + \beta V_T(W_1) \tag{2.5}$$
where
W1 = W0 − c0; W0 given.
In this formulation, the choice of consumption in period 0 determines the size of
the cake that will be available starting in period 1, W1. So instead of choosing
a sequence of consumption levels, we are just choosing c0. Once c0 and thus W1
are determined, the value of the problem from then on is given by VT (W1). This
function completely summarizes optimal behavior from period 1 onwards. For the
purposes of the dynamic programming problem, it doesn’t matter how the cake will
be consumed after the initial period. All that is important is that the agent will be
acting optimally and thus generating utility given by VT (W1). This is the principle
of optimality, due to Richard Bellman, at work. With this knowledge, an optimal
decision can be made regarding consumption in period 0.
Note that the first order condition (assuming that VT (W1) is differentiable) is
given by:
$$u'(c_0) = \beta V_T'(W_1)$$
so that the marginal gain from reducing consumption a little in period 0 is summa-
rized by the derivative of the value function. As noted in the earlier discussion of
the T period sequence problem,
$$V_T'(W_1) = u'(c_1) = \beta^{t} u'(c_{t+1})$$
for t = 1, 2, …, T − 1. Using these two conditions together yields
$$u'(c_t) = \beta u'(c_{t+1}),$$
for t = 0, 1, 2, …, T − 1, a familiar necessary condition for an optimal solution.
Since the Euler conditions for the other periods underlie the creation of the
value function, one might suspect that the solution to the T + 1 problem using
this dynamic programming approach is identical to that from using the sequence
approach.7 This is clearly true for this problem: the set of first order conditions
for the two problems are identical and thus, given the strict concavity of the u(c)
functions, the solutions will be identical as well.
The apparent ease of this approach though is a bit misleading. We were able
to make the problem look simple by pretending that we actually knew VT (W1). Of
course, we had to solve for this either by tackling a sequence problem directly or by
building it recursively starting from an initial single period problem.
On this latter approach, we could start with the single period problem implying
V1(W1). We could then solve (2.5) to build V2(W1). Given this function, we could
move to a solution of the T = 3 problem and proceed iteratively, using (2.5) to build
VT (W1) for any T .
Example
We illustrate the construction of the value function in a specific example. Assume
u(c) = ln(c). Suppose that T = 1. Then V1(W1) = ln(W1).
For T = 2, the first order condition from (2.2) is
1/c1 = β/c2
and the resource constraint is
W1 = c1 + c2.
Working with these two conditions:
c1 = W1/(1 + β) and c2 = βW1/(1 + β).
From this, we can solve for the value of the 2-period problem:
V2(W1) = ln(c1) + β ln(c2) = A2 + B2 ln(W1) (2.6)
where A2 and B2 are constants associated with the two period problem. These
constants are given by:
$$A_2 = \ln\!\left(\frac{1}{1+\beta}\right) + \beta \ln\!\left(\frac{\beta}{1+\beta}\right), \qquad B_2 = 1 + \beta.$$
Importantly, (2.6) does not include the max operator as we are substituting the
optimal decisions in the construction of the value function, V2(W1).
Using this function, the T = 3 problem can then be written as:
$$V_3(W_1) = \max_{W_2} \ \ln(W_1 - W_2) + \beta V_2(W_2)$$
where the choice variable is the state in the subsequent period. The first order
condition is:
$$\frac{1}{c_1} = \beta V_2'(W_2).$$
Using (2.6) evaluated at a cake of size W2, we can solve for $V_2'(W_2)$, implying:
$$\frac{1}{c_1} = \beta \frac{B_2}{W_2} = \frac{\beta}{c_2}.$$
Here c2 is the consumption level in the second period of the three-period problem and
thus is the same as the level of consumption in the first period of the two-period
problem. Further, we know from the 2-period problem that
1/c2 = β/c3.
This plus the resource constraint allows us to construct the solution of the 3-period
problem:
$$c_1 = \frac{W_1}{1+\beta+\beta^2}, \qquad c_2 = \frac{\beta W_1}{1+\beta+\beta^2}, \qquad c_3 = \frac{\beta^2 W_1}{1+\beta+\beta^2}.$$
Substituting into V3(W1) yields
V3(W1) = A3 + B3 ln(W1)
where
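$$A_3 = \ln\!\left(\frac{1}{1+\beta+\beta^2}\right) + \beta \ln\!\left(\frac{\beta}{1+\beta+\beta^2}\right) + \beta^2 \ln\!\left(\frac{\beta^2}{1+\beta+\beta^2}\right), \qquad B_3 = 1+\beta+\beta^2.$$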
This solution can be verified from a direct attack on the 3 period problem using
(2.2) and (2.3).
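The recursive construction in this example can be automated. The sketch below assumes log utility, as in the example, and uses the fact that if V_T(W) = A_T + B_T ln(W), then the first order condition of the (T + 1)-period problem gives W′/W = βB_T/(1 + βB_T). The function names are hypothetical, and the update rule for A is our own algebra rather than a formula quoted from the text.

```python
import math


def next_coeffs(A, B, beta):
    """Map (A_T, B_T) to (A_{T+1}, B_{T+1}) when V_T(W) = A_T + B_T * ln(W)."""
    save_share = beta * B / (1 + beta * B)     # W'/W from the first order condition
    A_next = (math.log(1 - save_share)         # ln of the share consumed today
              + beta * A
              + beta * B * math.log(save_share))
    B_next = 1 + beta * B
    return A_next, B_next


def coeffs(T, beta):
    """Build (A_T, B_T) by iterating from the one-period problem."""
    A, B = 0.0, 1.0                            # T = 1: V_1(W) = ln(W)
    for _ in range(T - 1):
        A, B = next_coeffs(A, B, beta)
    return A, B


print(coeffs(2, 0.9), coeffs(3, 0.9))
```

For T = 2 and T = 3 the output can be compared with the closed-form constants (A2, B2) and (A3, B3) given above.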
2.4 Some Extensions of the Cake Eating Problem
Here we go beyond the T period problem to illustrate some ways to use the dynamic
programming framework. This is intended as an overview and the details of the
assertions and so forth will be provided below.
2.4.1 Infinite Horizon
Basic Structure
Suppose that we consider the above problem and allow the horizon to go to infinity.
As before, one can consider solving the infinite horizon sequence problem given by:
$$\max_{\{c_t\}_{t=1}^{\infty},\ \{W_t\}_{t=2}^{\infty}} \ \sum_{t=1}^{\infty} \beta^{t} u(c_t)$$
along with the transition equation of
Wt+1 = Wt − ct
for t = 1, 2, ….
Specifying this as a dynamic programming problem,
$$V(W) = \max_{c \in [0, W]} \ u(c) + \beta V(W - c)$$
for all W . Here u(c) is again the utility from consuming c units in the current
period. V (W ) is the value of the infinite horizon problem starting with a cake of
size W . So in the given period, the agent chooses current consumption and thus
reduces the size of the cake to W ′ = W − c, as in the transition equation. We use
variables with primes to denote future values. The value of starting the next period
with a cake of that size is then given by V (W −c) which is discounted at rate β < 1.
For this problem, the state variable is the size of the cake (W ) that is given
at the start of any period. The state completely summarizes all information from
the past that is needed for the forward looking optimization problem. The control
variable is the variable that is being chosen. In this case, it is the level of consump-
tion in the current period, c. Note that c lies in a compact set. The dependence of
the state tomorrow on the state today and the control today, given by
W ′ = W − c
is called the transition equation.
Alternatively, we can specify the problem so that instead of choosing today’s
consumption we choose tomorrow's state:
$$V(W) = \max_{W' \in [0, W]} \ u(W - W') + \beta V(W') \tag{2.7}$$
for all W . Either specification yields the same result. But choosing tomorrow’s
state often makes the algebra a bit easier so we will work with (2.7).
This expression is known as a functional equation and is often called a Bellman
equation after Richard Bellman, one of the originators of dynamic programming.
Note that the unknown in the Bellman equation is the value function itself: the
idea is to find a function V (W ) that satisfies this condition for all W . Unlike
the finite horizon problem, there is no terminal period to use to derive the value
function. In effect, the fixed point restriction of having V (W ) on both sides of (2.7)
will provide us with a means of solving the functional equation.
Note too that time itself does not enter into Bellman’s equation: we can express
all relations without an indication of time. This is the essence of stationarity.8
In fact, we will ultimately use the stationarity of the problem to make arguments
about the existence of a value function satisfying the functional equation.
A final very important property of this problem is that all information about the
past that bears on current and future decisions is summarized by W , the size of the
cake at the start of the period. Whether the cake is of this size because we initially
had a large cake and ate a lot or a small cake and were frugal is not relevant. All
that matters is that we have a cake of a given size. This property partly reflects the
fact that the preferences of the agent do not depend on past consumption. But, in
fact, if they did, we could amend the problem to allow for this possibility.
The next part of this chapter addresses the question of whether there exists a
value function that satisfies (2.7). For now, we assume that a solution exists and
explore its properties.
The first order condition for the optimization problem in (2.7) can be written as
u′(c) = βV ′(W ′).
This looks simple but what is the derivative of the value function? This seems
particularly hard to answer since we do not know V (W ). However, we can use the
fact that V (W ) satisfies (2.7) for all W to calculate V ′. Assuming that this value
function is differentiable,
V ′(W ) = u′(c),
a result we have seen before. Since this holds for all W , it will hold in the following
period yielding:
V ′(W ′) = u′(c′).
Substitution leads to the familiar Euler equation:
u′(c) = βu′(c′).
The solution to the cake eating problem will satisfy this necessary condition for all
W .
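The step V′(W) = u′(c) used above is an envelope argument. A sketch, assuming in addition that the policy function ϕ(W) giving tomorrow's cake is differentiable (an assumption made here only to keep the algebra short):

```latex
\begin{aligned}
V(W)  &= u\big(W - \varphi(W)\big) + \beta V\big(\varphi(W)\big), \\
V'(W) &= u'(c)\,\big(1 - \varphi'(W)\big) + \beta V'(W')\,\varphi'(W) \\
      &= u'(c) + \varphi'(W)\,\big[\beta V'(W') - u'(c)\big] \\
      &= u'(c),
\end{aligned}
```

where c = W − ϕ(W), W′ = ϕ(W), and the bracketed term is zero by the first order condition.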
The link from the level of consumption and next period’s cake (the controls from
the different formulations) to the size of the cake (the state) is given by the policy
function:
c = φ(W ), W ′ = ϕ(W ) ≡ W − φ(W ).
Using these in the Euler equation reduces the problem to these policy functions
alone:
u′(φ(W )) = βu′(φ(W − φ(W )))
for all W .
These policy functions are very important for applied research since they provide
the mapping from the state to actions. When elements of the state as well as the
action are observable, then these policy functions will provide the foundation for
estimation of the underlying parameters.
An Example
In general, actually finding closed form solutions for the value function and the
resulting policy functions is not possible. In those cases, we try to characterize
certain properties of the solution and, for some exercises, we solve these problems
numerically.
However, as suggested by the analysis of the finite horizon examples, there are
some versions of the problem we can solve completely. Suppose then, as above, that
u(c) = ln(c). Given the results for the T-period problem, we might conjecture that
the solution to the functional equation takes the form of:
V (W ) = A + B ln(W )
for all W . With this guess we have reduced the dimensionality of the unknown
function V (W ) to two parameters, A and B. But can we find values for A and B
such that V (W ) will satisfy the functional equation?
Taking this guess as given and using the special preferences, the functional equa-
tion becomes:
$$A + B \ln(W) = \max_{W'} \ \ln(W - W') + \beta\,\big(A + B \ln(W')\big) \tag{2.8}$$
for all W . After some algebra, the first-order condition implies:
$$W' = \varphi(W) = \frac{\beta B}{1 + \beta B}\, W.$$
Using this in (2.8) implies:
$$A + B \ln(W) = \ln\!\left(\frac{W}{1+\beta B}\right) + \beta\left(A + B \ln\!\left(\frac{\beta B W}{1+\beta B}\right)\right)$$
for all W . Collecting terms into a constant and terms that multiply ln(W ) and then
imposing the requirement that the functional equation must hold for all W , we find
that
B = 1/(1 − β)
is required for a solution. Given this, there is a complicated expression that can be
used to find A. To be clear then we have indeed guessed a solution to the functional
equation. We know that because we can solve for (A, B) such that the functional
equation holds for all W using the optimal consumption and savings decision rules.
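The algebra behind B, and one way of writing the expression for A, can be sketched as follows. This is our own working from (2.8) and the first order condition, assuming 0 < β < 1, and not a formula quoted from the text:

```latex
\begin{aligned}
A + B\ln W &= \Big[\beta A - \ln(1+\beta B)
              + \beta B \ln\tfrac{\beta B}{1+\beta B}\Big]
              + (1 + \beta B)\ln W, \\
B &= 1 + \beta B \;\Longrightarrow\; B = \frac{1}{1-\beta}, \\
A &= \beta A + \ln(1-\beta) + \frac{\beta}{1-\beta}\ln\beta
  \;\Longrightarrow\;
  A = \frac{1}{1-\beta}\Big[\ln(1-\beta) + \frac{\beta}{1-\beta}\ln\beta\Big].
\end{aligned}
```

The first line collects the constant and the coefficient on ln W; the second matches coefficients on ln W; the third matches constants after substituting B = 1/(1 − β), which gives 1 + βB = 1/(1 − β) and βB/(1 + βB) = β.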
With this solution, we know that
c = W (1 − β), W ′ = βW.
Evidently, the optimal policy is to save a constant fraction of the cake and eat the
remaining fraction.
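The guess can also be verified numerically. The sketch below fixes β = 0.9 (a hypothetical value), sets B = 1/(1 − β), and uses a constant A obtained by matching the constant terms in (2.8); that expression for A is our own algebra and should be treated as an assumption of the sketch. It then maximizes the right side of (2.8) over a fine grid of W′ and compares the result with A + B ln(W).

```python
import numpy as np

beta = 0.9                               # hypothetical discount factor
B = 1.0 / (1.0 - beta)
# Constant from matching constant terms in (2.8); our own algebra.
A = (np.log(1 - beta) + beta / (1 - beta) * np.log(beta)) / (1 - beta)


def V(W):
    """Conjectured value function A + B ln(W)."""
    return A + B * np.log(W)


def bellman_rhs(W, n=200_001):
    """Maximize ln(W - W') + beta * V(W') over an interior grid of W'."""
    Wp = np.linspace(0.0, W, n)[1:-1]    # drop endpoints, where ln is -inf
    values = np.log(W - Wp) + beta * V(Wp)
    i = np.argmax(values)
    return values[i], Wp[i]


results = {W: bellman_rhs(W) for W in (0.5, 1.0, 3.0)}
for W, (val, Wp) in results.items():
    print(W, val, V(W), Wp, beta * W)
```

For each W the maximized right side should match V(W) and the maximizer should be close to βW, up to grid error.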
Interestingly, the solution to B could be guessed from the solution to the T-
horizon problems where
$$B_T = \sum_{t=1}^{T} \beta^{t-1}.$$
Evidently, $B = \lim_{T \to \infty} B_T$. In fact, we will be exploiting the theme that the value
function which solves the infinite horizon problem is related to the limit of the finite
solutions in much of our numerical analysis.
Here are some exercises that add some interesting elements to this basic struc-
ture. Both begin with finite horizon formulations and then progress to the infinite
horizon problem.
Exercise 2.1
Suppose that utility in period t was given by u(ct, ct−1). How would you solve the
T period problem with these preferences? Interpret the first order conditions. How
would you formulate the Bellman equation for the infinite horizon version of this
problem?
Exercise 2.2
Suppose that the transition equation was modified so that
Wt+1 = ρWt − ct
where ρ > 0 represents a return from the holding of cake inventories. How would
you solve the T period problem with this storage technology? Interpret the first order
conditions. How would you formulate the Bellman equation for the infinite horizon
version of this problem? Does the size of ρ matter in this discussion? Explain.
2.4.2 Taste Shocks
One of the convenient features of the dynamic programming problem is the simplicity
with which one can introduce uncertainty.9 For the cake eating problem, the natural
source of uncertainty has to do with the agent’s tastes. In other settings we will
focus on other sources of uncertainty having to do with the productivity of labor or
the endowment of households.
To allow for variations in tastes, suppose that utility over consumption is given
by:
εu(c)
where ε is a random variable whose properties we will describe below. The function
u(c) is again assumed to be strictly increasing and strictly concave. Otherwise, the
problem is the original cake eating problem with an initial cake of size W .
In problems with stochastic elements, it is critical to be precise about the timing
of events. Does the optimizing agent know the current shocks when making a
decision? For this analysis, assume that the agent knows the value of the taste
shock when making current decisions but does not know future values. Thus the
agent must use expectations of future values of ε when deciding how much cake to
eat today: it may be optimal to consume less today (save more) in anticipation of
a high realization of ε in the future.
For simplicity, assume that the taste shock takes on only two values: ε ∈ {εh, εl}
with εh > εl > 0. Further, we assume that the taste shock follows a first-order
Markov process,10 which means that the probability a particular realization of ε
occurs in the current period depends only on the value of ε attained in the previous
period.11 For notation, let πij denote the probability that the value of ε goes from
state i in the current period to state j in the next period. For example, πlh is defined
by:
πlh ≡ Prob(ε′ = εh|ε = εl)
where ε′ refers to the future value of ε. Clearly πih + πil = 1 for i = h, l. Let Π be a
2×2 matrix with a typical element πij which summarizes the information about the
probability of moving across states. This matrix is naturally called a transition
matrix.
Given this notation and structure, we can turn to the cake eating problem. It
is critical to carefully define the state of the system for the optimizing agent. In
the nonstochastic problem, the state was simply the size of the cake. This provided
all the information the agent needed to make a choice. When taste shocks are
introduced, the agent needs to take this into account as well. In fact, the taste
shocks provide information about current payoffs and, through the Π matrix, are
informative about the future value of the taste shock as well.12
Formally, the Bellman equation is:
$$V(W, \varepsilon) = \max_{W'} \ \varepsilon u(W - W') + \beta E_{\varepsilon' \mid \varepsilon} V(W', \varepsilon')$$
for all (W, ε) where W ′ = W − c as usual. Note that the conditional expectation is
denoted here by Eε′|εV (W ′, ε′) which, given Π, is something we can compute.13
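With two taste states, the conditional expectation is a matrix-vector product. The sketch below uses made-up transition probabilities and made-up continuation values purely for illustration; rows of Π index the current state (h, l) and columns index next period's state.

```python
import numpy as np

# Hypothetical numbers; rows are today's state (h, l), columns tomorrow's.
Pi = np.array([[0.8, 0.2],     # pi_hh, pi_hl
               [0.3, 0.7]])    # pi_lh, pi_ll
assert np.allclose(Pi.sum(axis=1), 1.0)   # pi_ih + pi_il = 1 for i = h, l


def conditional_expectation(Pi, V_next):
    """E[V(W', eps') | eps] for each current eps, where V_next[j] = V(W', eps_j)."""
    return Pi @ V_next


V_next = np.array([2.0, 1.0])   # made-up values of V(W', eps_h) and V(W', eps_l)
EV = conditional_expectation(Pi, V_next)
print(EV)
```

The first entry of `EV` is the expectation conditional on ε = εh today, the second conditional on ε = εl.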
The first order condition for this problem is given by:
$$\varepsilon u'(W - W') = \beta E_{\varepsilon' \mid \varepsilon} V_1(W', \varepsilon')$$
for all (W, ε). Using the functional equation to solve for the marginal value of cake,
we find:
$$\varepsilon u'(W - W') = \beta E_{\varepsilon' \mid \varepsilon}\left[\varepsilon' u'(W' - W'')\right] \tag{2.9}$$
which, of course, is the stochastic Euler equation for this problem.
The optimal policy function is given by
W ′ = ϕ(W, ε)
The Euler equation can be rewritten in these terms as:
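$$\varepsilon u'\big(W - \varphi(W, \varepsilon)\big) = \beta E_{\varepsilon' \mid \varepsilon}\Big[\varepsilon' u'\big(\varphi(W, \varepsilon) - \varphi(\varphi(W, \varepsilon), \varepsilon')\big)\Big]$$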
The properties of this policy function can then be deduced from this condition.
Clearly both ε′ and c′ depend on the realized value of ε′ so that the expectation on
the right side of (2.9) cannot be split into two separate pieces.
2.4.3 Discrete Choice
To illustrate some of the flexibility of the dynamic programming approach, we build
on this stochastic problem. Suppose the cake must be eaten in one period. Perhaps
we should think of this as the wine drinking problem recognizing that once a good
bottle of wine is opened, it should be consumed! Further, we modify the transition
equation to allow the cake to grow (depreciate) at rate ρ.
The problem is then an example of a dynamic, stochastic discrete choice problem.
This is an example of a family of problems called optimal stopping problems .14
The common element in all of these problems is the emphasis on timing of a single
event: when to eat the cake; when to take a job; when to stop school, when to stop
revising a chapter, etc. In fact, for many of these problems, these choices are not
once in a lifetime events and so we will be looking at problems even richer than the
optimal stopping variety.
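Let $V^E(W, \varepsilon)$ and $V^N(W, \varepsilon)$ be the value of eating the size W cake now (E) and
waiting (N), respectively, given the current taste shock, ε ∈ {εh, εl}. Then,
$$V^E(W, \varepsilon) = \varepsilon u(W)$$
and
$$V^N(W, \varepsilon) = \beta E_{\varepsilon' \mid \varepsilon} V(\rho W, \varepsilon'),$$
where
$$V(W, \varepsilon) = \max\big(V^E(W, \varepsilon),\ V^N(W, \varepsilon)\big)$$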
for all (W, ε). To understand these terms, εu(W ) is the direct utility flow from
eating the cake. Once the cake is eaten the problem has ended. So $V^E(W, \varepsilon)$ is just
a one-period return. If the agent waits, then there is no cake consumption in the
current period and next period the cake is of size (ρW ). As tastes are stochastic,
the agent choosing to wait must take expectations of the future taste shock, ε′. The
agent has an option next period of eating the cake or waiting further. Hence the
value of having the cake in any state is given by V (W, ε), which is the value attained
by maximizing over the two options of eating or waiting. The cost of delaying the
choice is determined by the discount factor β while the gains to delay are associated
with the growth of the cake, parameterized by ρ. Further, the realized value of ε
will surely influence the relative value of consuming the cake immediately.
If ρ ≤ 1, then the cake doesn't grow. In this case, there is no gain from delay
when ε = εh. If the agent delays, then utility in the next period will have to be
lower due to discounting and, with probability πhl, the taste shock will switch from
high to low. So, waiting to eat the cake in the future will not be desirable. Hence,
$$V(W, \varepsilon_h) = V^E(W, \varepsilon_h) = \varepsilon_h u(W)$$
for all W .
In the low ε state, matters are more complex. If β and ρ are sufficiently close
to 1 then there is not a large cost to delay. Further, if πlh is sufficiently close to 1,
then it is likely that tastes will switch from low to high. Thus it will be optimal not
to eat the cake in state (W, εl).15
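The eat-or-wait comparison can be illustrated with a small fixed-point computation. The sketch below assumes ρ = 1, so the cake size never changes and u(W) is a fixed number, taken here to be positive; all parameter values and names are hypothetical and chosen only to show a case in which waiting is optimal in the low state.

```python
import numpy as np


def stopping_values(u_W, eps, Pi, beta, tol=1e-12, max_iter=10_000):
    """Solve V = max(eps * u(W), beta * Pi @ V) for a fixed cake size (rho = 1)."""
    eat = eps * u_W                              # V^E in each taste state
    V = eat.copy()
    for _ in range(max_iter):
        V_new = np.maximum(eat, beta * Pi @ V)   # best of eating and waiting
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, eat, beta * Pi @ V                 # V, V^E, V^N


eps = np.array([1.5, 0.5])                       # (eps_h, eps_l), made-up values
Pi = np.array([[0.8, 0.2],                       # pi_hh, pi_hl
               [0.9, 0.1]])                      # pi_lh, pi_ll
V, eat, wait = stopping_values(u_W=1.0, eps=eps, Pi=Pi, beta=0.95)
print("high state:", "eat" if eat[0] >= wait[0] else "wait")
print("low state: ", "eat" if eat[1] >= wait[1] else "wait")
```

With these numbers the agent eats in the high state and waits in the low state, consistent with the discussion above for β and πlh close to 1.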
Here are some additional exercises.
Exercise 2.3
Suppose that ρ = 1. For a given β, show that there exists a critical level of
πlh, denoted by π̄lh, such that if πlh > π̄lh, then the optimal solution is for the agent
to wait when ε = εl and to eat the cake when εh is realized.
Exercise 2.4
When ρ > 1, the problem is more difficult. Suppose that there are no variations
in tastes: εh = εl = 1. In this case, there is a trade-off between the value of waiting
(as the cake grows) and the cost of delay from discounting.
Suppose that ρ > 1 and $u(c) = \frac{c^{1-\gamma}}{1-\gamma}$. What is the solution to the optimal stopping
problem when $\beta\rho^{1-\gamma} < 1$? What happens if $\beta\rho^{1-\gamma} > 1$? What happens when
uncertainty is added?
2.5 General Formulation
Building on the intuition gained from this discussion of the cake eating problem,
we now consider a more formal abstract treatment of the dynamic programming
approach.16 We begin with a presentation of the non-stochastic problem and then
add uncertainty to the formulation.
2.5.1 Non-Stochastic Case
Consider the infinite horizon optimization problem of an agent with a payoff function
for period t given by σ̃(st, ct). The first argument of the payoff function is termed the
state vector, (st). As noted above, this represents a set of variables that influences
the agent’s return within the period but, by assumption, these variables are outside
of the agent’s control within period t. The state variables evolve over time in a
manner that may be influenced by the control vector (ct), the second argument of
the payoff function. The connection between the state variables over time is given
by the transition equation:
st+1 = τ (st, ct).
So, given the current state and the current control, the state vector for the subse-
quent period is determined.
Note that the state vector has a very important property: it completely summa-
rizes all of the information from the past that is needed to make a forward-looking
decision. While preferences and the transition equation are certainly dependent on
the past, this dependence is represented by st: other variables from the past do
not affect current payoffs or constraints and thus cannot influence current decisions.
This may seem restrictive but it is not: the vector st may include many variables
so that the dependence of current choices on the past can be quite rich.
While the state vector is effectively determined by preferences and the transition
equation, the researcher has some latitude in choosing the control vector. That
is, there may be multiple ways of representing the same problem with alternative
specifications of the control variables.
We assume that c ∈ C and s ∈ S. In some cases, the control is restricted to be
in a subset of C which depends on the state vector: c ∈ C(s). Finally, assume that
σ̃(s, c) is bounded for (s, c) ∈ S × C. 17
For the cake eating problem described above, the state of the system was the
size of the current cake (Wt) and the control variable was the level of consumption
in period t, (ct). The transition equation describing the evolution of the cake was
given by
Wt+1 = Wt − ct.
Clearly the evolution of the cake is governed by the amount of current consumption.
An equivalent representation, as expressed in (2.7), is to consider the future size of
the cake as the control variable and then to simply write current consumption as
Wt − Wt+1.
There are two final properties of the agent’s dynamic optimization problem worth
specifying: stationarity and discounting. Note that neither the payoff nor the
transition equations depend explicitly on time. True, the problem is dynamic, but
time per se is not of the essence. In a given state, the optimal choice of the agent
will be the same regardless of “when” he optimizes. Stationarity is important both
for the analysis of the optimization problem and for empirical implementation of
infinite horizon problems. In fact, because of stationarity we can dispense with time
subscripts as the problem is completely summarized by the current values of the
state variables.
The agent’s preferences are also dependent on the rate at which the future is
discounted. Let β denote the discount factor and assume that 0 < β < 1. Then we
can represent the agent’s payoffs over the infinite horizon as
$$\sum_{t=0}^{\infty} \beta^{t} \tilde{\sigma}(s_t, c_t) \tag{2.10}$$
One approach to optimization is then to maximize (2.10) through the choice of
{ct} for t = 0, 1, 2, ... given s0 and subject to the transition equation. Let V (s0) be
the optimized value of this problem given the initial state.
Alternatively, one can adopt the dynamic programming approach and consider the
following equation, called Bellman’s equation:
$$V(s) = \max_{c \in C(s)} \ \tilde{\sigma}(s, c) + \beta V(s') \tag{2.11}$$
for all s ∈ S, where s′ = τ (s, c). Here time subscripts are eliminated, reflecting the
stationarity of the problem. Instead, current variables are unprimed while future
ones are denoted by a prime (′).
As in Stokey and Lucas (1989), the problem can be formulated as
$$V(s) = \max_{s' \in \Gamma(s)} \ \sigma(s, s') + \beta V(s') \tag{2.12}$$
for all s ∈ S. This is a more compact formulation and we will use it for our
presentation.18 Nonetheless, the presentations in Bertsekas (1976) and Sargent
(1987) follow (2.11). Assume that S is a convex subset of $\mathbb{R}^k$.
Let the policy function that determines the optimal value of the control (the
future state) given the state be given by s′ = φ(s). Our interest is ultimately in the
policy function since we generally observe the actions of agents rather than their
levels of utility. Still, to determine φ(s) we need to "solve" (2.12). That is, we
need to find the value function that satisfies (2.12). It is important to realize that
while the payoff and transition equations are primitive objects that models specify
a priori, the value function is derived as the solution of the functional equation,
(2.12).
There are many results in the lengthy literature on dynamic programming prob-
lems on the existence of a solution to the functional equation. Here, we present
one set of sufficient conditions. The reader is referred to Bertsekas (1976), Sar-
gent (1987) and Stokey and Lucas (1989) for additional theorems under alternative
assumptions about the payoff and transition functions.19
Theorem 1 Assume σ(s, s′) is real-valued, continuous and bounded, 0 < β < 1 and
the constraint set, Γ(s), is non-empty, compact-valued and continuous. Then there
exists a unique value function V (s) that solves (2.12).
Proof: See Stokey and Lucas (1989), Theorem 4.6.
Instead of a formal proof, we give an intuitive sketch. The key component in the
analysis is the definition of an operator, commonly denoted as T, defined by:
$$T(W)(s) = \max_{s' \in \Gamma(s)} \ \sigma(s, s') + \beta W(s') \quad \text{for all } s \in S.$$ 20
So, this mapping takes a guess on the value function and, working through the
maximization for all s, produces another value function, T (W )(s). Clearly, any V (s)
such that V (s) = T (V )(s) for all s ∈ S is a solution to (2.12). So, we can reduce
the analysis to determining the fixed points of T (W ).
The fixed point argument proceeds by showing that T (W ) is a contraction using
a pair of sufficient conditions from Blackwell (1965). These conditions are: (i)
monotonicity and (ii) discounting of the mapping T (V ). Monotonicity means that
if W (s) ≥ Q(s) for all s ∈ S, then T (W )(s) ≥ T (Q)(s) for all s ∈ S. This property
can be directly verified from the fact that T (V ) is generated by a maximization
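problem. To see this, let φQ(s) denote the choice obtained from
$$\max_{s' \in \Gamma(s)} \ \sigma(s, s') + \beta Q(s') \quad \text{for all } s \in S.$$
If this choice is adopted when the proposed value function is W (s), then:
$$T(W)(s) = \max_{s' \in \Gamma(s)} \ \sigma(s, s') + \beta W(s') \ \ge\ \sigma(s, \phi_Q(s)) + \beta W(\phi_Q(s)) \ \ge\ \sigma(s, \phi_Q(s)) + \beta Q(\phi_Q(s)) \equiv T(Q)(s)$$
for all s ∈ S.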
Discounting means that adding a constant to W leads T (W ) to increase by
less than this constant. That is, for any constant k, T (W + k)(s) ≤ T (W )(s) + βk
for all s ∈ S where β ∈ [0, 1). The term discounting reflects the fact that β must be
less than 1. This property is easy to verify in the dynamic programming problem:
$$T(W + k)(s) = \max_{s' \in \Gamma(s)} \ \sigma(s, s') + \beta\,[W(s') + k] = T(W)(s) + \beta k \quad \text{for all } s \in S$$
since we assume that the discount factor is less than 1.
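Why monotonicity and discounting together deliver a contraction can be sketched in a few lines. Here ‖·‖ denotes the sup norm over S (notation introduced only for this sketch) and W, Q are bounded functions on S:

```latex
\begin{aligned}
W(s) &\le Q(s) + \lVert W - Q \rVert \quad \text{for all } s \in S, \\
T(W)(s) &\le T\big(Q + \lVert W - Q \rVert\big)(s) && \text{(monotonicity)} \\
        &\le T(Q)(s) + \beta \lVert W - Q \rVert && \text{(discounting)} .
\end{aligned}
```

Reversing the roles of W and Q gives T(Q)(s) ≤ T(W)(s) + β‖W − Q‖, so ‖T(W) − T(Q)‖ ≤ β‖W − Q‖ with β < 1.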
The fact that T (W ) is a contraction allows us to take advantage of the contrac-
tion mapping theorem.21 This theorem implies that: (i) there is a unique fixed point
and (ii) this fixed point can be reached by an iteration process using an arbitrary
initial condition. The first property is reflected in the theorem given above.
The second property is used extensively as a means of finding the solution to
(2.12). To better understand this, let V0(s) for all s ∈ S be an initial guess of the
solution to (2.12). Consider V1 = T (V0). If V1 = V0 for all s ∈ S, then we have the
solution. Else, consider V2 = T (V1) and continue iterating until T (V ) = V so that
the functional equation is satisfied. Of course, in general, there is no reason to think
that this iterative process will converge. However, if T (V ) is a contraction, as it is
for our dynamic programming framework, then the V (s) that satisfies (2.12) can be
found from the iteration of T (V0(s)) for any initial guess, V0(s). This procedure is
called value function iteration and will be a valuable tool for applied analysis of
dynamic programming problems.
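The iteration described above can be sketched on a small deterministic cake eating problem. This is a minimal illustration, not the code accompanying the book: the integer grid of cake sizes, the square-root payoff and the value β = 0.9 are all assumptions chosen for the example. Two very different initial guesses are iterated to (numerically) the same fixed point.

```python
import math

BETA = 0.9   # discount factor (illustrative value)
N = 20       # largest cake size; the states are s = 0, 1, ..., N (assumed grid)


def payoff(s, s_next):
    """One-period payoff sigma(s, s'): utility from eating s - s' units."""
    return math.sqrt(s - s_next)


def bellman(W):
    """Apply T to a guess W: T(W)(s) = max over s' in {0..s} of sigma(s, s') + beta W(s')."""
    return [max(payoff(s, sn) + BETA * W[sn] for sn in range(s + 1))
            for s in range(N + 1)]


def sup_dist(W, Q):
    """Sup-norm distance between two candidate value functions."""
    return max(abs(w - q) for w, q in zip(W, Q))


def iterate(V0, tol=1e-10, max_iter=10_000):
    """Iterate V_{n+1} = T(V_n) until successive guesses are within tol."""
    V = list(V0)
    for n in range(1, max_iter + 1):
        V_new = bellman(V)
        if sup_dist(V_new, V) < tol:
            return V_new, n
        V = V_new
    raise RuntimeError("value function iteration did not converge")


# Two arbitrary initial guesses
V_a, n_a = iterate([0.0] * (N + 1))
V_b, n_b = iterate([100.0 - s for s in range(N + 1)])
gap = sup_dist(V_a, V_b)   # distance between the two limits
```

Printing `gap`, `n_a` and `n_b` shows that both starting points lead to the same function, with the poorer guess simply needing more iterations.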
The value function which satisfies (2.12) may inherit some properties from the
more primitive functions that are the inputs into the dynamic programming problem:
the payoff and transition equations. As we shall see, the property of strict concavity
is useful for various applications.22 The result is given formally by:
Theorem 2 Assume that σ(s, s′) is real-valued, continuous, concave and bounded, that
0 < β < 1, that S is a convex subset of $\mathbb{R}^k$, and that the constraint set is non-empty,
compact-valued, convex and continuous. Then the unique solution to (2.12) is strictly concave.
Further, φ(s) is a continuous, single-valued function.
Proof: See Theorem 4.8 in Stokey and Lucas (1989).
The proof of the theorem relies on showing that strict concavity is preserved
by T (V ): i.e. if V (s) is strictly concave, then so is T (V (s)). Given that σ(s, c) is
concave, let our initial guess of the value function be the solution to the one-period
problem
$$V_0(s) \equiv \max_{s' \in \Gamma(s)} \sigma(s, s').$$
V0(s) will be strictly concave. Since T (V ) preserves this property, the solution to
(2.12) will be strictly concave.
As noted earlier, our interest is in the policy function. Note that from this
theorem, there is a stationary policy function which depends only on the state
vector. This result is important for econometric application since stationarity is
often assumed in characterizing the properties of various estimators.
The cake eating example relied on the Euler equation to determine some proper-
ties of the optimal solution. However, the first-order condition from (2.12) combined
with the strict concavity of the value function is useful in determining properties
of the policy function. Benveniste and Scheinkman (1979) provide conditions such
that V (s) is differentiable (Stokey and Lucas (1989), Theorem 4.11). In our dis-
cussion of applications, we will see arguments that use the concavity of the value
function to characterize the policy function.
2.5.2 Stochastic Dynamic Programming
While the non-stochastic problem is perhaps a natural starting point, in terms of
applications it is necessary to consider stochastic elements. Clearly the stochas-
tic growth model, consumption/savings decisions by households, factor demand by
firms, pricing decisions by sellers, search decisions all involve the specification of
dynamic stochastic environments.
Further, empirical applications rest upon shocks that are not observed by the
econometrician. In many applications, the researcher appends a shock to an equation
prior to estimation without being explicit about the source of the error term. This
is not consistent with the approach of stochastic dynamic programming: shocks are
part of the state vector of the agent. Of course, the researcher may not observe all
of the variables that influence the agent and/or there may be measurement error.
Nonetheless, being explicit about the source of error in empirical applications is part
of the strength of this approach.
While stochastic elements can be added in many ways to dynamic programming
problems, we consider the following formulation which is used in our applications.
Let ε represent the current value of a vector of "shocks", i.e. random variables
that are partially determined by nature. Let ε ∈ Ψ, which is assumed to be a finite
set.23 Then, using the notation developed above, the functional equation becomes:
$$V(s, \varepsilon) = \max_{s' \in \Gamma(s, \varepsilon)} \sigma(s, s', \varepsilon) + \beta E_{\varepsilon'|\varepsilon} V(s', \varepsilon') \tag{2.13}$$
for all (s, ε).
Further, we have assumed that the stochastic process itself is purely exogenous
as the distribution of ε′ depends on ε but is independent of the current state and
control. Note too that the distribution of ε′ depends on only the realized value of
ε : i.e. ε follows a first-order Markov process. This is not restrictive in the sense
that if values of shocks from previous periods were relevant for the distribution of
ε′, then they could simply be added to the state vector.
Finally, note that the distribution of ε′ conditional on ε, written as ε′|ε, is time
invariant. This is analogous to the stationarity properties of the payoff and transition
equations. In this case, the conditional probabilities of ε′|ε are characterized by a
transition matrix, Π. The element $\pi_{ij}$ of this matrix is defined as:
$$\pi_{ij} \equiv \text{Prob}(\varepsilon' = \varepsilon_j \mid \varepsilon = \varepsilon_i),$$
which is just the likelihood that $\varepsilon_j$ occurs in the next period, given that $\varepsilon_i$ occurs
today. Thus this transition matrix is used to compute the transition probabilities
in (2.13). Throughout we assume that $\pi_{ij} \in (0, 1)$ and $\sum_j \pi_{ij} = 1$ for each $i$. With
this structure:
Theorem 3 If σ(s, s′, ε) is real-valued, continuous, concave and bounded, 0 < β < 1
and the constraint set is compact and convex, then:
1. there exists a unique value function V (s, ε) that solves (2.13)
2. there exists a stationary policy function, φ(s, ε).
Proof: As in the proof of Theorem 2, this is a direct application of Blackwell’s
Theorem. That is, with β < 1, discounting holds. Likewise, monotonicity is imme-
diate as in the discussion above. See also the proof of Proposition 2 in Bertsekas
(1976), Chp. 6.
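With a finite set of shocks, the conditional expectation in (2.13) is a weighted sum using one row of the transition matrix. The sketch below illustrates this for a fixed s′; the two-state matrix and the continuation values are made-up numbers used only for illustration.

```python
# PI[i][j] = Prob(eps' = eps_j | eps = eps_i); hypothetical two-state example
PI = [[0.8, 0.2],
      [0.3, 0.7]]


def conditional_expectation(values_next, pi_row):
    """E[V(s', eps') | eps_i] = sum_j pi_ij * V(s', eps_j) for one current state i."""
    return sum(p * v for p, v in zip(pi_row, values_next))


def expected_values(values_next, transition):
    """Conditional expectation for every current shock: the product PI * V."""
    return [conditional_expectation(values_next, row) for row in transition]


# values_next[j] stands for V(s', eps_j) at some fixed s' (illustrative numbers)
values_next = [1.0, 2.0]
expectations = expected_values(values_next, PI)
```

Each entry of `expectations` is the term multiplying β in (2.13) for the corresponding current shock.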
The first-order condition for (2.13) is given by:
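$$\sigma_{s'}(s, s', \varepsilon) + \beta E_{\varepsilon'|\varepsilon} V_{s'}(s', \varepsilon') = 0. \tag{2.14}$$
Using (2.13) to determine $V_{s'}(s', \varepsilon')$ yields an Euler equation:
$$\sigma_{s'}(s, s', \varepsilon) + \beta E_{\varepsilon'|\varepsilon} \sigma_{s'}(s', s'', \varepsilon') = 0. \tag{2.15}$$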
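The step from the first-order condition to the Euler equation can be spelled out as follows. This is a sketch under the assumptions of an interior solution, a differentiable value function and a differentiable policy function $s' = \phi(s, \varepsilon)$; none of these is established here.

```latex
\begin{aligned}
V(s, \varepsilon) &= \sigma\big(s, \phi(s,\varepsilon), \varepsilon\big)
  + \beta E_{\varepsilon'|\varepsilon} V\big(\phi(s,\varepsilon), \varepsilon'\big) \\
V_s(s, \varepsilon) &= \sigma_s(s, s', \varepsilon)
  + \underbrace{\big[\sigma_{s'}(s, s', \varepsilon)
  + \beta E_{\varepsilon'|\varepsilon} V_{s'}(s', \varepsilon')\big]}_{=\,0 \text{ by the first-order condition}}
  \,\phi_s(s, \varepsilon)
  = \sigma_s(s, s', \varepsilon)
\end{aligned}
```

Moving this envelope condition forward one period gives $V_{s'}(s', \varepsilon') = \sigma_{s'}(s', s'', \varepsilon')$, the derivative of next period's payoff with respect to its first argument. Substituting this into the first-order condition yields the Euler equation.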
This Euler equation has the usual interpretation. The expected sum of the effects
of a marginal variation in the control in the current period (s′) must be zero. So,
if there is a marginal gain in the current period, this, in expectation, is offset by a
marginal loss in the next period.
Put differently, if a policy is optimal, there should be no variation in the value of
the current control that will, in expectation, make the agent better off. Of course,
ex post (after the realization of ε′), there may have been better decisions for the
agent and, from the vantage point of hindsight, mistakes were made. That is,
$$\sigma_{s'}(s, s', \varepsilon) + \beta \sigma_{s'}(s', s'', \varepsilon') = 0 \tag{2.16}$$
will surely not hold for all realizations of ε′. Yet, from the ex ante optimization we
know that these ex post errors were not predictable given the information available
to the agent.
As we shall see, this is a powerful insight that underlies the estimation of models
based upon a stochastic Euler equation such as (2.15). Yet, as illustrated in many
applications, the researcher may be unable to summarize conditions for optimality
through an Euler equation. In these cases, characterizing the policy function directly
is required.
2.6 Conclusion
The theory of dynamic programming is a cornerstone of this book. The point of
this chapter is to introduce researchers to some of the insights of this vast literature
and some of the results we will find useful in our applications. As mentioned earlier,
this chapter has been specifically directed to provide theoretical structure for the
dynamic optimization problems we will confront in this book. Of course, versions
of these results hold in much more general circumstances. Again the reader is urged
to study Bertsekas (1976), Sargent (1987) and Stokey and Lucas (1989) for a more
complete treatment of this topic.
Chapter 3
Numerical Analysis
3.1 Overview
This chapter reviews numerical methods used to solve dynamic programming prob-
lems. This discussion provides a key link between the basic theory of dynamic
programming and the empirical analysis of dynamic optimization problems. The
need for numerical tools arises from the fact that generally dynamic programming
problems do not possess tractable closed form solutions. Hence, techniques must
be used to approximate the solutions of these problems. We present a variety of
techniques in this chapter which are subsequently used in the macroeconomic ap-
plications studied in Part II of this book.
The presentation starts by solving a stochastic cake eating problem using a
procedure called value function iteration. This same example is then used to
illustrate alternative methods that operate on the policy function rather than the
value function. Finally, a version of this problem is studied to illustrate the solution
to dynamic, discrete choice problems.
The appendix and the web page for this book contain the programs used in this
chapter. The applied researcher may find these useful templates for solving other
problems. In section 3.A in the appendix, we present several numerical tools such
as numerical integration or interpolation techniques, which are useful when using
numerical methods.
A number of articles and books have been devoted to numerical programming.
For a more complete description, we refer the reader to Judd (1998), Amman et al.
(1996), Press et al. (1986) or Taylor and Uhlig (1990).
3.2 Stochastic Cake Eating Problem
We start with the stochastic cake eating problem defined by:
$$\begin{aligned}
V(W, y) &= \max_{0 \le c \le W + y} u(c) + \beta E_{y'|y} V(W', y') \quad \text{for all } (W, y) \\
\text{with } W' &= R(W - c + y)
\end{aligned} \tag{3.1}$$
Here there are two state variables: W , the size of the cake brought into the current
period, and y, the stochastic endowment of additional cake. This is an example of
a stochastic dynamic programming problem from the framework in section 2.5.2.
We begin by analyzing the simple case where the endowment is iid: the shock
today does not give any information on the shock tomorrow. In this case, the
consumer only cares about the total amount which can be potentially eaten, X =
W + y, and not the particular origin of any piece of cake. In this problem, there is
only one state variable X. We can rewrite the problem as:
$$\begin{aligned}
V(X) &= \max_{0 \le c \le X} u(c) + \beta E_{y'} V(X') \quad \text{for all } X \\
\text{with } X' &= R(X - c) + y'
\end{aligned} \tag{3.2}$$
If the endowment is serially correlated, then the agent has to keep track of
any variables which allow him to forecast future endowment. The state space will
include X but also current and maybe past realizations of endowments. We present
such a case in section 3.3 where we study a discrete cake eating problem. Chapter 6.1
also presents the continuous cake eating problem with serially correlated shocks.
The control variable is c, the level of current consumption. The size of the cake
evolves from one period to the next according to the transition equation. The goal
is to evaluate the value V (X) as well as the policy function for consumption, c(X).
3.2.1 Value Function Iterations
This method works from the Bellman equation to compute the value function by
backward iterations on an initial guess. While sometimes slower than competing
methods, it is trustworthy in that it reflects the result, stated in Chapter 2, that
(under certain conditions) the solution of the Bellman equation can be reached by
iterating the value function starting from an arbitrary initial value. We illustrate
this approach here in solving (3.2).24
In order to program value function iteration, there are several important steps:
1. choosing a functional form for the utility function.
2. discretizing the state and control variable.
3. building a computer code to perform value function iteration
4. evaluating the value and the policy function.
We discuss each step in turn. These steps are indicated in the code for the stochastic
cake eating problem.
Functional Form and Parameterization
We need to specify the utility function. This is the only known primitive function
in (3.2): recall that the value function is what we are solving for! The choice of
this function depends on the problem and the data. The consumption literature has
often worked with a constant relative risk aversion (CRRA) function:
$$u(c) = \frac{c^{1-\gamma}}{1 - \gamma}$$
The vector θ will represent the parameters. For the cake eating problem (γ, β)
are both included in θ. To solve for the value function, we need to assign particular
values to these parameters as well as the exogenous return R. For now, we assume
that βR = 1 so that the growth in the cake is exactly offset by the consumer's
discounting of the future. The specification of the functional form and its parame-
terization are given in Part I of the accompanying Matlab code for the cake eating
problem.
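As a rough Python counterpart to this parameterization step (the book's own code is in Matlab and is not reproduced here), the CRRA function and the restriction βR = 1 could be written as below. The numerical values of γ and β are placeholders, and treating γ = 1 as log utility is the usual convention rather than something stated in the text.

```python
import math


def crra_utility(c, gamma):
    """CRRA utility u(c) = c^(1 - gamma) / (1 - gamma), with log utility at gamma = 1."""
    if c < 0:
        return -math.inf                      # negative consumption is not admissible
    if c == 0:
        return 0.0 if gamma < 1 else -math.inf
    if abs(gamma - 1.0) < 1e-12:
        return math.log(c)                    # conventional limiting case
    return c ** (1.0 - gamma) / (1.0 - gamma)


# Parameter vector theta = (gamma, beta); the values are illustrative only
theta = {"gamma": 2.0, "beta": 0.95}
R = 1.0 / theta["beta"]                       # imposes beta * R = 1
```

For instance, `crra_utility(1.0, theta["gamma"])` evaluates the utility of one unit of cake under these placeholder values.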
State and Control Space
We have to define the space spanned by the state and the control variables as well
as the space for the endowment shocks. For each problem, specification of the state
space is important. The computer cannot literally handle a continuous state space,
so we have to approximate this continuous space by a discrete one. While the
approximation is clearly better if the state space is very fine (i.e. has many points),
this can be costly in terms of computation time. Thus there is a trade-off involved.
For the cake eating problem, suppose that the cake endowment can take two
values, low ($y_L$) and high ($y_H$). As the endowment is assumed to follow an iid
process, denote the probability of a shock $y_i$ by $\pi_i$, for $i = L, H$. The transition
probabilities can be stacked in a transition matrix:
$$\pi = \begin{bmatrix} \pi_L & \pi_H \\ \pi_L & \pi_H \end{bmatrix} \quad \text{with } \pi_L + \pi_H = 1.$$
In this discrete setting, the expectation in (3.2) is just a weighted sum, so that
the Bellman equation can be simply rewritten:
$$V(X) = \max_{0 \le c \le X} u(c) + \beta \sum_{i=L,H} \pi_i V(R(X - c) + y_i) \quad \text{for all } X.$$
For this problem, it turns out that the natural state space is given by $[\bar{X}_L, \bar{X}_H]$.
This choice of the state space is based upon the economics of the problem, which
will be understood more completely after studying household consumption choices.
Imagine though that endowment was constant at a level yi for i = L, H. Then, given
the assumption βR = 1, the cake level of the household will (trust us) eventually
settle down to X̄i, for i = L, H. Since the endowment is stochastic and not constant,
consumption and the size of the future cake will vary with realizations of the state
variable, X, but it turns out that X will never leave this interval.
The fineness of the grid is simply a matter of choice too. In the program, let ns
be the number of elements in the state space. The program simply partitions the
interval [X̄L, X̄H ] into ns elements. In practice, the grid is usually uniform, with the
distance between two consecutive elements being constant. 25
Call the state space $\Psi_S$ and let $i_s$ be an index:
$$\Psi_S = \{X^{i_s}\}_{i_s=1}^{n_s} \quad \text{with } X^1 = \bar{X}_L, \; X^{n_s} = \bar{X}_H.$$
The control variable, $c$, takes values in $[\bar{X}_L, \bar{X}_H]$. These are the extreme levels of
consumption given the state space for $X$. We discretize this space into a grid of size $n_c$,
and call the control space $\Psi_C = \{c^{i_c}\}_{i_c=1}^{n_c}$.
Value Function Iteration and Policy Function
Here we must have a loop for the mapping T (v(X)) defined as
$$T(v(X)) = \max_c \; u(c) + \beta \sum_{i=L,H} \pi_i v(R(X - c) + y_i). \tag{3.3}$$
Here v(X) represents a candidate value function, that is a proposed solution to
(3.2). If T (v(X)) = v(X), then indeed v(X) is the unique solution to (3.2). Thus
the solution to the dynamic programming problem is reduced to finding a fixed point
of the mapping T (v(X)).
Starting with an initial guess $v_0(X)$, we compute a sequence of value functions $v_j(X)$:
$$v_{j+1}(X) = T(v_j(X)) = \max_c \; u(c) + \beta \sum_{i=L,H} \pi_i v_j(R(X - c) + y_i).$$
The iterations are stopped when $|v_{j+1}(X^{i_s}) - v_j(X^{i_s})| < \epsilon$ for all $i_s$, where $\epsilon$ is a small
number. As T (.) is a contraction mapping (see chapter 2), the initial guess v0(X)
does not have any influence on the convergence to the fixed point, so that one can
choose for instance v0(X) = 0. However, finding a good guess for v0(X) helps to
decrease the computing time. Using the contraction mapping property, it can be
shown that the convergence rate is geometric, parameterized by the discount factor β.
We now review in more detail how the iteration is done in practice. At each
iteration, the values $v_j(X)$ are stored in an $n_s \times 1$ matrix:
$$\mathbf{V} = \begin{bmatrix} v_j(X^1) \\ \vdots \\ v_j(X^{i_s}) \\ \vdots \\ v_j(X^{n_s}) \end{bmatrix}$$
To compute $v_{j+1}$, we start by choosing a particular size for the total amount of
cake at the start of the period, $X^{i_s}$. We then search among all the points in the
control space $\Psi_C$ for the one that maximizes $u(c) + \beta E v_j(X')$. Let's denote it $c^{i_c^*}$.
This involves finding next period's value, $v_j(R(X^{i_s} - c^{i_c^*}) + y_i)$, $i = L, H$. With the
assumption of a finite state space, we look for the value $v_j(.)$ at the point nearest
to $R(X^{i_s} - c^{i_c^*}) + y_i$. Once we have calculated the new value for $v_{j+1}(X^{i_s})$, we can
proceed to compute similarly the value $v_{j+1}(.)$ for other sizes of the cake and other
endowment at the start of the period. These new values are then stacked in V.
Figure 3.1 gives a detailed example of how this can be programmed on a computer.
(Note that the code is not written in a particular computer language, so one has to
adapt the code to the appropriate syntax. The code for the value function iteration
piece is Part III of the Matlab code. )
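The procedure just described can be sketched in Python as follows. This is not the code of Figure 3.1 or the Matlab program: the parameter values, the endowment levels, the bounds `X_LO` and `X_HI` standing in for the state space, and the control grid are all assumptions made for the example. Next period's value is read at the nearest grid point, as in the text.

```python
import math

BETA, GAMMA = 0.95, 2.0            # illustrative parameter values
R = 1.0 / BETA                     # imposes beta * R = 1
Y, PROB = (0.8, 1.2), (0.5, 0.5)   # hypothetical iid endowments and probabilities
NS, NC = 40, 40                    # number of grid points for state and control
X_LO, X_HI = 1.0, 6.0              # hypothetical stand-ins for the state bounds

x_grid = [X_LO + i * (X_HI - X_LO) / (NS - 1) for i in range(NS)]
c_grid = [X_HI * (k + 1) / NC for k in range(NC)]   # increasing, strictly positive


def u(c):
    return c ** (1.0 - GAMMA) / (1.0 - GAMMA)


def nearest_index(x):
    """Index of the grid point closest to x (clamped to the grid)."""
    i = round((x - X_LO) / (X_HI - X_LO) * (NS - 1))
    return min(max(i, 0), NS - 1)


def bellman_step(v):
    """One application of T: returns the new values and the maximizing consumption."""
    v_new, policy = [], []
    for x in x_grid:
        best_val, best_c = -math.inf, None
        for c in c_grid:
            if c > x:              # consumption cannot exceed the cake
                break
            cont = sum(p * v[nearest_index(R * (x - c) + y)]
                       for p, y in zip(PROB, Y))
            val = u(c) + BETA * cont
            if val > best_val:
                best_val, best_c = val, c
        v_new.append(best_val)
        policy.append(best_c)
    return v_new, policy


def solve(tol=1e-6, max_iter=2000):
    v = [0.0] * NS                 # initial guess v_0(X) = 0
    for it in range(1, max_iter + 1):
        v_new, policy = bellman_step(v)
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new, policy, it
        v = v_new
    raise RuntimeError("value function iteration did not converge")


value, policy, n_iter = solve()
```

The list `policy` holds the maximizing consumption level at each grid point, so the policy function comes out of the same loop as the value function.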
[Figure 3.1 approximately here]
Once the value function iteration piece of the program is completed, the value
function can be used to find the policy function, c = c(X). This is done by collecting
all the optimal consumption values, $c^{i_c^*}$, for each value of $X^{i_s}$. Here again, we only
know the function c(X) at the points of the grid. We can use interpolating methods
to evaluate the policy function at other points.
The value function and the policy function are displayed in Figures 3.2 and 3.3
for particular values of the parameters.
[Figure 3.2 approximately here]
[Figure 3.3 approximately here]
As discussed above, approximating the value function and the policy rules by
a finite state space requires a large number of points on this space (ns has to be
big). This is often very time consuming in terms of numerical calculations. One can
reduce the number of points on the grid, while keeping a satisfactory accuracy by
using interpolations on this grid. When we evaluated the function $v_j(R(X^{i_s} - c^{i_c^*}) + y_i)$,
$i = L, H$, we used the nearest value on the grid to approximate $R(X^{i_s} - c^{i_c^*}) + y_i$.
With a small number of points on the grid, this can be a very crude approximation.
The accuracy of the computation can be increased by interpolating the function
vj(.) (see section 3.A.1 for more details). The interpolation is based on the values
in V.
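The gain from interpolating can be seen in a small example. The sketch below compares the nearest-grid-point rule with linear interpolation on a coarse grid, using ln(x) as a stand-in for a concave value function; the grid, the stand-in function and the evaluation point are assumptions chosen for illustration.

```python
import math
from bisect import bisect_right


def nearest_value(x, grid, values):
    """Value stored at the grid point closest to x."""
    i = min(range(len(grid)), key=lambda k: abs(grid[k] - x))
    return values[i]


def linear_interp(x, grid, values):
    """Piecewise-linear interpolation of the stored values (flat outside the grid)."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    j = bisect_right(grid, x)                 # grid[j - 1] <= x < grid[j]
    w = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
    return (1.0 - w) * values[j - 1] + w * values[j]


grid = [1.0, 2.0, 3.0, 4.0, 5.0]              # deliberately coarse grid
values = [math.log(x) for x in grid]          # stand-in for the values stored in V
x_off = 2.4                                   # an off-grid point
err_nearest = abs(nearest_value(x_off, grid, values) - math.log(x_off))
err_linear = abs(linear_interp(x_off, grid, values) - math.log(x_off))
```

Comparing `err_nearest` and `err_linear` shows how much accuracy the nearest-point rule gives up on a grid this coarse.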
3.2.2 Policy Function Iterations
The value function iteration method can be rather slow, as it converges at a rate β.
Researchers have devised other methods which can be faster to compute the solution
to the Bellman equation in an infinite horizon. The policy function iteration, also
known as Howard’s improvement algorithm, is one of these. We refer the reader to
Judd (1998) or Ljungqvist and Sargent (2000) for further details.
This method starts with a guess for the policy function, in our case c0(X). This
policy function is then used to evaluate the value of using this rule forever:
$$V_0(X) = u(c_0(X)) + \beta \sum_{i=L,H} \pi_i V_0(R(X - c_0(X)) + y_i) \quad \text{for all } X.$$
This "policy evaluation step" requires solving a system of linear equations, given
that we have approximated $R(X - c_0(X)) + y_i$ by an $X$ on our grid. Next, we do a
"policy improvement step" to compute $c_1(X)$ as:
$$c_1(X) = \arg\max_c \left[ u(c) + \beta \sum_{i=L,H} \pi_i V_0(R(X - c) + y_i) \right] \quad \text{for all } X.$$
Given this new rule, the iterations are continued to find $V_1(\cdot)$, $c_2(\cdot)$, ..., $c_{j+1}(\cdot)$ until
$|c_{j+1}(X) - c_j(X)|$ is small enough. The convergence rate is much faster than that of the
value function iteration method. However, solving the "policy evaluation step" can
in some cases be very time consuming, especially when the state space is large. Once
again, the computation time is much reduced if the initial guess $c_0(X)$ is close to
the true policy rule $c(X)$.
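The two steps can be sketched in Python as below. All parameter values, the endowment levels, the grid bounds and the initial guess are assumptions made for the example, and the small Gaussian elimination routine is only a stand-in for whatever linear solver one would normally use. The stopping rule adds a check on the change in values as a safeguard against ties between grid points.

```python
BETA, GAMMA = 0.95, 2.0            # illustrative parameter values
R = 1.0 / BETA                     # imposes beta * R = 1
Y, PROB = (0.8, 1.2), (0.5, 0.5)   # hypothetical iid endowments and probabilities
NS, NC = 30, 30
X_LO, X_HI = 1.0, 6.0              # hypothetical stand-ins for the state bounds

x_grid = [X_LO + i * (X_HI - X_LO) / (NS - 1) for i in range(NS)]
c_grid = [X_HI * (k + 1) / NC for k in range(NC)]


def u(c):
    return c ** (1.0 - GAMMA) / (1.0 - GAMMA)


def nearest_index(x):
    i = round((x - X_LO) / (X_HI - X_LO) * (NS - 1))
    return min(max(i, 0), NS - 1)


def rhs(x, c, v):
    """u(c) + beta * sum_i pi_i v(R(X - c) + y_i), read at the nearest grid point."""
    return u(c) + BETA * sum(p * v[nearest_index(R * (x - c) + y)]
                             for p, y in zip(PROB, Y))


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for A z = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z


def evaluate_policy(policy):
    """Policy evaluation step: solve (I - beta P) V = u(c(X)) on the grid."""
    A = [[1.0 if i == j else 0.0 for j in range(NS)] for i in range(NS)]
    for i, (x, c) in enumerate(zip(x_grid, policy)):
        for p, y in zip(PROB, Y):
            A[i][nearest_index(R * (x - c) + y)] -= BETA * p
    return solve_linear(A, [u(c) for c in policy])


def improve_policy(v):
    """Policy improvement step: best feasible grid consumption for each X."""
    return [max((c for c in c_grid if c <= x), key=lambda c: rhs(x, c, v))
            for x in x_grid]


def policy_iteration(tol=1e-9, max_iter=200):
    policy = [c_grid[0]] * NS      # initial guess c_0(X): smallest grid value
    v_old = None
    for it in range(1, max_iter + 1):
        v = evaluate_policy(policy)
        new_policy = improve_policy(v)
        small_gain = v_old is not None and max(
            abs(a - b) for a, b in zip(v, v_old)) < tol
        if new_policy == policy or small_gain:
            return v, policy, it
        policy, v_old = new_policy, v
    raise RuntimeError("policy iteration did not converge")


value_pfi, policy_pfi, n_pfi = policy_iteration()
```

The value `n_pfi` gives the number of evaluation and improvement rounds needed, which can be compared with the iteration count of a value function iteration on the same grid.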
3.2.3 Projection Methods
These methods compute directly the policy function without calculating the value
functions. They use the first order conditions (Euler equation) to back out the
policy rules. The continuous cake problem satisfies the first order Euler equation:
$$u'(c_t) = E_t u'(c_{t+1})$$
if the desired consumption level is less than the total resources $X = W + y$. If there
is a corner solution, then the optimal consumption level is $c(X) = X$. Taking into
account the corner solution, we can rewrite the Euler equation as:
$$u'(c_t) = \max[u'(X_t), \; E_t u'(c_{t+1})]$$
We know that, under the iid assumption, the problem has only one state variable,
X, so that the consumption function can be written c = c(X). As we consider the
stationary solution, we drop the subscript t in what follows. The Euler equation can
be reformulated as:
$$u'(c(X)) - \max\left[ u'(X), \; E_{y'} u'\big( c( R(X - c(X)) + y' ) \big) \right] = 0 \tag{3.4}$$
or
$$F(c(X)) = 0. \tag{3.5}$$
The goal is to find an approximation ĉ(X) of c(X), for which (3.5) is approximately
satisfied. The problem is thus reduced to finding the zero of F , where F is an operator
over function spaces. This can be done with a minimizing algorithm. There are two
issues to resolve. First, we need to find a good approximation of c(X). Second, we
have to define a metric to evaluate the fit of the approximation.
Solving for the Policy Rule
[Figure 3.4 approximately here]
Let $\{p_i(X)\}$ be a base of the space of continuous functions and let $\Psi = \{\psi_i\}$ be
a set of parameters. We can approximate $c(X)$ by
$$\hat{c}(X, \Psi) = \sum_{i=1}^{n} \psi_i p_i(X).$$
There is an infinite number of bases to choose from. A simple one is to consider
polynomials in $X$, so that $\hat{c}(X, \Psi) = \psi_0 + \psi_1 X + \psi_2 X^2 + \dots$. Although this is an
intuitive choice, this is not usually the best one. In the function space, this base is
not an orthogonal base, which means that some elements tend to be collinear.
Orthogonal bases will yield more efficient and precise results. 26 The chosen base
should be computationally simple. Its elements should "look like" the function to
approximate, so that the function c(X) can be approximated with a small number
of base functions. Any knowledge of the shape of the policy function will be of
great help. If, for instance, this policy function has a kink, a method based only
on a series of polynomials will have a hard time fitting it. It would require a large
number of powers of the state variable to come somewhere close to the solution.
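The point about collinearity can be checked numerically. The sketch below compares two monomials with two Chebyshev polynomials $\cos(i \arccos x)$ (introduced later in this section) on $[-1, 1]$, using an inner product weighted by $1/\sqrt{1 - x^2}$ and computed by Gauss-Chebyshev quadrature. The choice of weighting function, of the pairs compared and of the number of quadrature nodes are assumptions made for the illustration.

```python
import math


def cheb(i, x):
    """Chebyshev polynomial of order i in its trigonometric form."""
    return math.cos(i * math.acos(x))


def weighted_inner(f1, f2, m=200):
    """Gauss-Chebyshev approximation of the integral of f1*f2/sqrt(1 - x^2) on [-1, 1]."""
    nodes = [math.cos((2 * k - 1) * math.pi / (2 * m)) for k in range(1, m + 1)]
    return (math.pi / m) * sum(f1(x) * f2(x) for x in nodes)


def cosine(f1, f2):
    """Normalized inner product: close to 1 means nearly collinear, 0 means orthogonal."""
    return weighted_inner(f1, f2) / math.sqrt(
        weighted_inner(f1, f1) * weighted_inner(f2, f2))


mono_cos = cosine(lambda x: x ** 2, lambda x: x ** 4)            # two monomials
cheb_cos = cosine(lambda x: cheb(2, x), lambda x: cheb(4, x))    # two Chebyshev terms
```

Under these assumptions `mono_cos` is close to one while `cheb_cos` is zero up to rounding error, which is the sense in which the monomial base is nearly collinear and the Chebyshev base is not.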
Having chosen a method to approximate the policy rule, we now have to be more
precise about what "bringing F (ĉ(X, Ψ)) close to zero" means.
To be more specific, we need to define some operators on the space of continuous
functions. For any weighting function g(x), the inner product of two integrable
functions f1 and f2 on a space A is defined as:
$$\langle f_1, f_2 \rangle = \int_A f_1(x) f_2(x) g(x) \, dx \tag{3.6}$$
Two functions f1 and f2 are said to be orthogonal, conditional on a weighting
function g(x), if 〈f1, f2〉 = 0. The weighting function indicates where the researcher
wants the approximation to be good. We are using the operator 〈., .〉 and the
weighting function to construct a metric to evaluate how close F (ĉ(X, Ψ)) is to
zero. This will be done by solving for Ψ such that
$$\langle F(\hat{c}(X, \Psi)), f(X) \rangle = 0$$
where $f(X)$ is some known function. We next review three methods which differ
in their choice for this function $f(X)$.
First, a simple choice for $f(X)$ is $F(\hat{c}(X, \Psi))$ itself. This defines the least
square metric as:
$$\min_{\Psi} \langle F(\hat{c}(X, \Psi)), F(\hat{c}(X, \Psi)) \rangle.$$
The collocation method detailed in section 3.2.3 chooses Ψ as
$$\min_{\Psi} \langle F(\hat{c}(X, \Psi)), \delta(X - X_i) \rangle \quad i = 1, \dots, n$$
where $\delta(X - X_i)$ is the mass point function at point $X_i$, i.e. $\delta(X - X_i) = 1$ if $X = X_i$
and $\delta(X - X_i) = 0$ elsewhere. Another possibility is to define
$$\min_{\Psi} \langle F(\hat{c}(X, \Psi)), p_i(X) \rangle \quad i = 1, \dots, n$$
where $p_i(X)$ is a base of the function space. This is called the Galerkin method.
An application of this method can be seen in section 3.2.3, where the base is taken
to be "tent" functions.
Figure 3.4 displays some elements of a computer code which calculates the residual
function F (ĉ(X, Ψ)) when the consumption rule is approximated by a second order
polynomial. This can then be used in one of the proposed methods.
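A Python sketch of such a residual calculation is given below. It is not the code of Figure 3.4: the CRRA marginal utility, the parameter values, the endowment levels and the clamping of consumption to the interval (0, X] are assumptions made so that the example is self-contained.

```python
GAMMA, BETA = 2.0, 0.95            # illustrative parameter values
R = 1.0 / BETA                     # imposes beta * R = 1
Y, PROB = (0.8, 1.2), (0.5, 0.5)   # hypothetical iid endowments and probabilities


def marginal_utility(c):
    """u'(c) for CRRA utility."""
    return c ** (-GAMMA)


def c_hat(x, psi):
    """Second order polynomial rule psi0 + psi1 X + psi2 X^2, kept inside (0, X]."""
    c = psi[0] + psi[1] * x + psi[2] * x ** 2
    return min(max(c, 1e-8), x)    # clamping is an assumption of this sketch


def residual(x, psi):
    """F(c_hat)(X) = u'(c_hat(X)) - max[u'(X), E u'(c_hat(R(X - c_hat(X)) + y'))]."""
    c = c_hat(x, psi)
    expected = sum(p * marginal_utility(c_hat(R * (x - c) + y, psi))
                   for p, y in zip(PROB, Y))
    return marginal_utility(c) - max(marginal_utility(x), expected)


def sum_squared_residuals(psi, xs):
    """Least squares style criterion evaluated on a set of points xs."""
    return sum(residual(x, psi) ** 2 for x in xs)


# Example: the rule "eat everything", c(X) = X
psi_all = (0.0, 1.0, 0.0)
res_small_cake = residual(0.5, psi_all)   # corner solution region
res_large_cake = residual(2.0, psi_all)
```

A minimization routine would then search over `psi` to bring `sum_squared_residuals` close to zero on a chosen set of points.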
Collocation Methods
Judd (1992) presents this method in more detail, applied to the growth model. The
function $c(X)$ is approximated using Chebyshev polynomials. These polynomials
are defined on the interval $[-1, 1]$ and take the form:
$$p_i(X) = \cos(i \arccos(X)) \quad X \in [-1, 1], \; i = 0, 1, 2, \dots$$
For $i = 0$, this polynomial is a constant. For $i = 1$, the polynomial is equal to $X$.
As these polynomials are only defined on the $[-1, 1]$ interval, one can usually scale
the state variables appropriately. 27 The policy function can then be expressed as:
$$\hat{c}(X, \Psi) = \sum_{i=1}^{n} \psi_i p_i(X).$$
Next, the method finds Ψ which minimizes
$$\langle F(\hat{c}(X, \Psi)), \delta(X - X_i) \rangle \quad i = 1, \dots, n$$
where δ() is the mass point function. Hence, the method requires that $F(\hat{c}(X, \Psi))$
is zero at some particular points $X_i$ and not over the whole range $[\bar{X}_L, \bar{X}_H]$. The
method is more efficient if these points are chosen to be the zeros of the basis
elements $p_i(X)$, here $X_i = \cos(\pi/(2i))$. In this case the method is referred to as an
orthogonal collocation method. Ψ is the solution to a system of nonlinear equations:
$$F(\hat{c}(X_i, \Psi)) = 0 \quad i = 1, \dots, n.$$
This method is good at approximating policy functions which are relatively smooth.
A drawback with this method is that the Chebyshev polynomials tend to display
oscillations at higher orders. The resulting policy function c(X) will also tend to
display wiggles. There is no particular rule for choosing n, the highest order of the
Chebyshev polynomial. Obviously, the higher n is the better the approximation,
but this comes at an increased cost of computation.
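The mechanics of collocating at Chebyshev zeros can be sketched on a deliberately simple stand-in problem. Instead of the Euler residual, the residual used below is F(ĉ)(X) = ĉ(X) − g(X) for a known function g, so that the system F(ĉ(X_i, Ψ)) = 0 is linear and can be solved in closed form. The target g, the interval [1, 6], the order n = 8, the use of the n zeros of the polynomial of order n as collocation points and the indexing of terms from 0 to n − 1 are all assumptions of the example.

```python
import math


def cheb(i, z):
    """Chebyshev polynomial of order i on [-1, 1]."""
    return math.cos(i * math.acos(z))


def cheb_zeros(n):
    """The n zeros of the Chebyshev polynomial of order n."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]


def to_unit(x, lo, hi):
    """Scale the state variable from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def from_unit(z, lo, hi):
    return lo + (z + 1.0) * (hi - lo) / 2.0


def fit_at_zeros(g, n, lo, hi):
    """Coefficients psi_0..psi_{n-1} that set c_hat - g to zero at the n scaled zeros."""
    zs = cheb_zeros(n)
    gs = [g(from_unit(z, lo, hi)) for z in zs]
    psi = []
    for j in range(n):
        s = sum(gv * cheb(j, z) for gv, z in zip(gs, zs))
        psi.append(s / n if j == 0 else 2.0 * s / n)   # discrete orthogonality
    return psi


def c_hat(x, psi, lo, hi):
    z = min(1.0, max(-1.0, to_unit(x, lo, hi)))
    return sum(p * cheb(i, z) for i, p in enumerate(psi))


def target(x):
    return math.sqrt(x)            # hypothetical smooth "policy function"


LO, HI, N = 1.0, 6.0, 8
psi = fit_at_zeros(target, N, LO, HI)
nodes = [from_unit(z, LO, HI) for z in cheb_zeros(N)]
max_node_residual = max(abs(c_hat(x, psi, LO, HI) - target(x)) for x in nodes)
off_node_error = abs(c_hat(3.0, psi, LO, HI) - target(3.0))
```

With the Euler residual (3.4) in place of this stand-in, the same collocation points would be used but the coefficients would have to come from a nonlinear equation solver.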
Finite Element Methods
McGrattan (1996) illustrates the finite element method with the stochastic growth
model (see also Reddy (1993) for a more in-depth discussion on finite elements).
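To start, the state variable $X$ is discretized over a grid $\{X^{i_s}\}_{i_s=1}^{n_s}$. The finite
element method is based on the following functions:
$$p_{i_s}(X) = \begin{cases}
\dfrac{X - X^{i_s - 1}}{X^{i_s} - X^{i_s - 1}} & \text{if } X \in [X^{i_s - 1}, X^{i_s}] \\[2ex]
\dfrac{X^{i_s + 1} - X}{X^{i_s + 1} - X^{i_s}} & \text{if } X \in [X^{i_s}, X^{i_s + 1}] \\[2ex]
0 & \text{elsewhere}
\end{cases}$$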
The function $p_{i_s}(X)$ is a very simple function which takes values in [0,1], as illustrated in
Figure 3.5. This is in fact a simple linear interpolation (and an order two spline,
see section 3.A.1 for more details on these techniques). On the interval $[X^{i_s}, X^{i_s + 1}]$,
the function $\hat{c}(X)$ is equal to the weighted sum of $p_{i_s}(X)$ and $p_{i_s + 1}(X)$. Here the
residual function satisfies
$$\langle F(\hat{c}(X, \Psi)), p_i(X) \rangle = 0 \quad i = 1, \dots, n$$
or equivalently, choosing a constant weighting function:
$$\int_0^{\bar{X}} p_{i_s}(X) F(\hat{c}(X)) \, dX = 0 \quad i_s = 1, \dots, n_s.$$
This gives a system with $n_s$ equations and $n_s$ unknowns, $\{\psi_{i_s}\}_{i_s=1}^{n_s}$. This non-linear
system can be solved to find the weights $\{\psi_{i_s}\}$. To solve the system, the integral can
be computed using numerical integration techniques; see Appendix 3.A.2 for more
details. As in the collocation method, the choice of ns is the result of a trade-off
between increased precision and higher computational burden.
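The "tent" functions and the resulting approximation can be sketched as follows. The grid and the weights are made-up numbers used only to show the mechanics; the sketch covers the basis functions and the weighted sum, not the solution of the non-linear system.

```python
def tent(i, x, grid):
    """Tent function centered on grid[i]: 1 at grid[i], 0 at neighboring nodes."""
    if i > 0 and grid[i - 1] <= x <= grid[i]:
        return (x - grid[i - 1]) / (grid[i] - grid[i - 1])
    if i < len(grid) - 1 and grid[i] <= x <= grid[i + 1]:
        return (grid[i + 1] - x) / (grid[i + 1] - grid[i])
    return 0.0


def c_hat(x, psi, grid):
    """Approximated policy: weighted sum of tent functions."""
    return sum(w * tent(i, x, grid) for i, w in enumerate(psi))


grid = [1.0, 2.0, 3.0, 4.0]        # hypothetical grid for X
psi = [0.9, 1.4, 1.8, 2.1]         # hypothetical weights, one per grid point
c_mid = c_hat(2.5, psi, grid)      # halfway between the second and third node
```

At a grid point the approximation equals the corresponding weight, and between two grid points it is their linear interpolation, which is why only two tent functions are active on any interval.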
[Figure 3.5 approximately here]
3.3 Stochastic Discrete Cake Eating Problem
We present here another example of a dynamic programming model. It differs from
the one presented in section 3.2 in two ways. First, the decision of the agent is not
continuous (how much to eat) but discrete (eat or wait). Second, the problem has
two state variables as the exogenous shock is serially correlated.
The agent is endowed with a cake of size W . At each period, the agent has to
decide whether to eat the cake entirely or not. If not eaten, the cake shrinks by
a factor ρ each period. The agent also experiences taste shocks, which are possibly serially
correlated and follow an autoregressive process of order one. The agent
observes the current taste shock at the beginning of the period, before the decision
to eat the cake is taken. However, the future shocks are unobserved by the agent,
introducing a stochastic element into the problem. Although the cake is shrinking,
the agent might decide to postpone the consumption decision until a period with a
better realization of the taste shock. The program of the agent can be written in
the form:
$$V(W, \varepsilon) = \max[\varepsilon u(W), \; \beta E_{\varepsilon'|\varepsilon} V(\rho W, \varepsilon')] \tag{3.7}$$
where V (W, ε) is the intertemporal value of a cake of size W conditional on the
realization ε of the taste shock. Here $E_{\varepsilon'|\varepsilon}$ denotes the expectation with respect to
the future shock ε′, conditional on the value of ε. The policy function is a function
d(W, ε) which takes a value of zero if the agent decides to wait or one if the cake is
eaten. We can also define a threshold ε∗(W ) such that:
d(W, ε) = 1 if ε > ε∗(W )
d(W, ε) = 0 otherwise
As in section 3.2, the problem can be solved by value function iterations. How-
ever, as the problem is discrete we cannot use the projection technique as the decision
rule is not a smooth function, but a step function.
3.3.1 Value Function Iterations
As before, we have to define first the functional form for the utility function and we
need to discretize the state space. If we consider ρ < 1, the cake shrinks with time
and W is naturally bounded between $\bar{W}$, the initial size, and 0. In this case, the size
of the cake takes only values equal to $\rho^t \bar{W}$, $t \ge 0$. Hence, $\Psi_S = \{\rho^i \bar{W}\}$ is a judicious
choice for the state space. Contrary to an equally spaced grid, this choice ensures
that we do not need to interpolate the value function outside of the grid points.
Next, we need to discretize the second state variable, ε. The shock is supposed
to come from a continuous distribution and follows an autoregressive process of
order one. We discretize ε in $I$ points $\{\varepsilon_i\}_{i=1}^{I}$ following a technique presented by
Tauchen (1986) and summarized in appendix 3.A.2. In fact, we approximate an
autoregressive process by a Markov chain. The method determines the optimal
discrete points $\{\varepsilon_i\}$ and the transition matrix $\pi_{ij} = \text{Prob}(\varepsilon_t = \varepsilon_j \mid \varepsilon_{t-1} = \varepsilon_i)$ such
that the Markov chain mimics the AR(1) process. Of course, the approximation is
only good if $I$ is big enough.
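A discretization of this kind can be sketched as below. The code follows the common equally spaced variant of Tauchen's method applied to a mean-zero AR(1) process with normal innovations; whether this matches the variant summarized in the appendix is not verified here, and the grid width `m` (in unconditional standard deviations) is an assumed tuning parameter. The numerical inputs are those quoted later in this section for the taste shock (autocorrelation 0.5, unconditional variance 0.2, 4 grid points).

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tauchen(n, rho, sigma_uncond, m=3.0):
    """Grid and transition matrix for x' = rho * x + e, with e normal.

    P[i][j] approximates Prob(x' = grid[j] | x = grid[i]).
    """
    sigma_e = sigma_uncond * math.sqrt(1.0 - rho ** 2)   # innovation std deviation
    hi = m * sigma_uncond
    step = 2.0 * hi / (n - 1)
    grid = [-hi + i * step for i in range(n)]
    P = []
    for xi in grid:
        row = []
        for j, xj in enumerate(grid):
            upper = (xj + step / 2.0 - rho * xi) / sigma_e
            lower = (xj - step / 2.0 - rho * xi) / sigma_e
            if j == 0:
                row.append(norm_cdf(upper))              # everything below the first cutoff
            elif j == n - 1:
                row.append(1.0 - norm_cdf(lower))        # everything above the last cutoff
            else:
                row.append(norm_cdf(upper) - norm_cdf(lower))
        P.append(row)
    return grid, P


# Discretize ln(eps) and map back to levels
log_eps_grid, PI = tauchen(4, 0.5, math.sqrt(0.2))
eps_grid = [math.exp(x) for x in log_eps_grid]
```

Each row of `PI` sums to one, and with positive autocorrelation a low shock today makes a low shock tomorrow more likely than it would be after a high shock.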
In the case where $I = 2$, we have to determine two grid points $\varepsilon_L$ and $\varepsilon_H$. The
probability that a shock $\varepsilon_L$ is followed by a shock $\varepsilon_H$ is denoted by $\pi_{LH}$. The
transition probabilities can be stacked in a transition matrix:
$$\pi = \begin{bmatrix} \pi_{LL} & \pi_{LH} \\ \pi_{HL} & \pi_{HH} \end{bmatrix}$$
with the constraints that the probability of reaching either a low or a high state next
period is equal to one: $\pi_{LL} + \pi_{LH} = 1$ and $\pi_{HL} + \pi_{HH} = 1$. For a given size of the
cake $W^{i_s} = \rho^{i_s} \bar{W}$ and a given shock $\varepsilon_j$, $j = L$ or $H$, it is easy to compute the first
term $\varepsilon_j u(\rho^{i_s} \bar{W})$. To compute the second term we need to calculate the expected
value of tomorrow's cake. Given a guess for the value function of next period, $v(., .)$,
the expected value is:
$$E_{\varepsilon'|\varepsilon_j} v(\rho^{i_s + 1} \bar{W}, \varepsilon') = \pi_{jL} v(\rho^{i_s + 1} \bar{W}, \varepsilon_L) + \pi_{jH} v(\rho^{i_s + 1} \bar{W}, \varepsilon_H).$$
The recursion is started backward with an initial guess for $V(., .)$. For a given
state of the cake $W^{i_s}$ and a given shock $\varepsilon_j$, the new value function is calculated from
equation (3.7). The iterations are stopped when two successive value functions are
close enough. In terms of numerical computing, the value function is stored as a
matrix V of size $n_W \times n_\varepsilon$, where $n_W$ and $n_\varepsilon$ are the number of points on the grid for
W and ε. At each iteration, the matrix is updated with the new guess for the value
function. Figure 3.6 displays an example of a computer code which computes the
value function vj+1(W, ε) given the value vj(W, ε).
[Figure 3.6 approximately here]
Given the way we have computed the grid, the next period value is simple to
compute as it is given by V[is − 1, .]. This rule is valid if is > 1. Computing V[1, .]
will be more of a problem. One can use an extrapolation method to approximate
the values, given the knowledge of V[is, .], is > 1.
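The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of the code in Figure 3.6: the function name `solve_discrete_cake`, the parameter values, the log utility and the choice to store the grid in ascending order (so that waiting moves one row down) are all hypothetical. At the smallest grid point the sketch simply assumes the cake must be eaten, which sidesteps the extrapolation issue mentioned in the text.

```python
import math

def solve_discrete_cake(W_bar=100.0, rho=0.95, beta=0.95, n_W=40,
                        eps=(0.8, 1.2), P=((0.7, 0.3), (0.3, 0.7)),
                        u=math.log, tol=1e-10, max_iter=1000):
    """Iterate on V(W, e) = max[e * u(W), beta * E V(rho * W, e')]."""
    n_e = len(eps)
    # Ascending grid: W[k - 1] = rho * W[k], so waiting moves one row down.
    W = [W_bar * rho ** (n_W - 1 - k) for k in range(n_W)]
    eat = [[eps[j] * u(W[k]) for j in range(n_e)] for k in range(n_W)]
    V = [row[:] for row in eat]            # initial guess: eat immediately
    for _ in range(max_iter):
        V_new = []
        for k in range(n_W):
            row = []
            for j in range(n_e):
                if k == 0:
                    # Assumption: the smallest cake is always eaten, so no
                    # value is needed below the grid.
                    row.append(eat[k][j])
                else:
                    # P[j][l] is the probability of moving from shock j to l.
                    wait = beta * sum(P[j][l] * V[k - 1][l] for l in range(n_e))
                    row.append(max(eat[k][j], wait))
            V_new.append(row)
        gap = max(abs(V_new[k][j] - V[k][j])
                  for k in range(n_W) for j in range(n_e))
        V = V_new
        if gap < tol:                      # successive value functions are close
            break
    return W, V, eat

W, V, eat = solve_discrete_cake()
```

Comparing `V[k][j]` with `eat[k][j]` then gives the decision rule: the agent waits wherever the value strictly exceeds the value of eating.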
Figure 3.7 displays the value function for particular parameters. The utility
function was taken to be u(c, ε) = ln(εc) and ln(ε) is supposed to follow an AR(1)
process with mean zero, autocorrelation ρε = 0.5 and with an unconditional variance
of 0.2. We have discretized ε into 4 grid points.
[Figure 3.7 approximately here]
Figure 3.8 displays the decision rule, and the function ε∗(W). This threshold
was computed as the solution of:
$$u(W, \varepsilon^*(W)) = \beta E_{\varepsilon' \mid \varepsilon} V(\rho W, \varepsilon')$$
which is the value of the taste shock which makes the agent indifferent between
waiting and eating, given the size of the cake W.
[Figure 3.8 approximately here]
We return later in this book to examples of discrete choice models. In particular,
we refer the readers to the models presented in section 8.5 and 7.3.3.
3.4 Extensions and Conclusion
This chapter has reviewed common techniques to solve dynamic programming prob-
lems as seen in chapter 2. We have applied these techniques to both deterministic
and stochastic problems, to continuous and discrete choice models. In principle,
these methods can be applied to solve more complicated problems.
3.4.1 Larger State Spaces
Both examples we have studied in sections 3.2 and 3.3 have small state spaces. In
empirical applications, the state space often needs to be much larger if the model
has to be confronted with real data. For instance, the endowment shocks might be
serially correlated or the interest rate, R, might also be a stochastic and persistent
process.
For the value function iteration method, this means that the successive value
functions have to be stacked in a multidimensional matrix. Also, the value function
has to be interpolated in several dimensions. The techniques in section 3.A.1 can be
extended to deal with this problem. However, the value function iteration method
runs quickly into the "curse of dimensionality". If each of the N state variables is discretized
into $n_s$ grid points, the value function has to be evaluated at $n_s^N$ points.
This demands an increasing computer memory
and slows down the computation. A solution to this problem is to evaluate the
value function for a subset of the points in the state space and then interpolate the
value function elsewhere. This solution has been implemented by Keane and Wolpin
(1994).
Projection methods are better at handling larger state spaces. Suppose the
problem is characterized by N state variables {X1, . . . , XN }. The approximated
policy function can be written as:
$$\hat{c}(X_1, \ldots, X_N) = \sum_{j=1}^{N} \sum_{i_j=1}^{n_j} \psi^{j}_{i_j}\, p_{i_j}(X_j)$$
The problem is then characterized by auxiliary parameters $\{\psi^j_i\}$.
Exercise 3.1
Suppose $u(c) = c^{1-\gamma}/(1-\gamma)$. Construct the code to solve for the stochastic cake
eating problem, using the value function iteration method. Plot the policy function as
a function of the size of the cake and the stochastic endowment, for γ = {0.5, 1, 2}.
Compare the level and slope of the policy functions for different values of γ. How
do you interpret the results?
Exercise 3.2
Consider the example of the discrete cake eating problem in section 3.3. Con-
struct the code to solve for this problem, with i.i.d. taste shocks, using u(c) = ln(c),
εL = 0.8, εH = 1.2, πL = 0.3 and πH = 0.7. Map the decision rule as a function of
the size of the cake.
Exercise 3.3
Consider an extension of the discrete cake eating problem seen in section 3.3.
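The agent now has the choice between three actions: eat the cake, store it in fridge
1 or in fridge 2. In fridge 1, the cake shrinks by a factor ρ: W′ = ρW. In fridge
2, the cake diminishes by a fixed amount: W′ = W − κ. The program of the agent is
characterized as:
$$V(W, \varepsilon) = \max\left[V^{\text{Eat}}(W, \varepsilon),\; V^{\text{Fridge 1}}(W, \varepsilon),\; V^{\text{Fridge 2}}(W, \varepsilon)\right]$$
with
$$V^{\text{Eat}}(W, \varepsilon) = \varepsilon u(W)$$
$$V^{\text{Fridge 1}}(W, \varepsilon) = \beta E_{\varepsilon'} V(\rho W, \varepsilon')$$
$$V^{\text{Fridge 2}}(W, \varepsilon) = \beta E_{\varepsilon'} V(W - \kappa, \varepsilon')$$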
Construct the code to solve for this problem, using u(c) = ln(c), εL = 0.8, εH = 1.2,
πL = 0.5 and πH = 0.5. When will the agent switch from one fridge to the other?
Exercise 3.4
Consider the stochastic cake eating problem. Suppose that the discount factor β
is a function of the amount of cake consumed: β = Φ(β1 + β2c), where β1 and
β2 are known parameters and Φ() is the normal cumulative distribution function.
Construct the code to solve for this new problem using value function iterations.
Suppose γ = 2, β1 = 1.65, πL = πH = 0.5, yL = 0.8, yH = 1.2 and β2 = −1.
Plot the policy rule c = c(X). Compare with the case where the discount factor is
independent of the quantity consumed. How would you interpret the fact that the
discount factor depends on the amount of cake consumed?
3.A Additional Numerical Tools
This appendix provides some useful numerical tools which are often used when
solving dynamic problems. We present interpolation methods, numerical integration
methods as well as a method to approximate serially correlated processes by a
markov process. The last section is devoted to simulations.
3.A.1 Interpolation Methods
We briefly review three simple interpolation methods. For further readings, see for
instance Press et al. (1986) or Judd (1996).
When solving the value function or the policy function, we often have to calculate
the value of these functions outside of the points of the grid. This requires
interpolating the function. Using a good interpolation method is also
helpful as one can save computer time and space by using fewer grid points to
approximate the functions. Denote by f(x) the function to approximate. We assume
that we know this function at a number of grid points $x_i$, i = 1, . . . , I. Denote by
$f_i = f(x_i)$ the values of the function at these grid points. We are interested in finding
an approximate function $\hat{f}(x)$ such that $\hat{f}(x) \approx f(x)$, based on the observations
$\{x_i, f_i\}$. We present three different methods and use as an example the function
$f(x) = x \sin(x)$. Figure 3.9 displays the results for all the methods.
[Figure 3.9 approximately here]
Least Squares Interpolation
A natural way to approximate f() is to use an econometric technique, such as OLS,
to "estimate" the function $\hat{f}(.)$. The first step is to assume a functional form for $\hat{f}$.
For instance, we can approximate f with a polynomial in x such as:
$$\hat{f}(x) = \alpha_0 + \alpha_1 x + \ldots + \alpha_N x^N, \qquad N < I$$
By regressing $f_i$ on $x_i$ we can easily recover the parameters $\alpha_n$. In practice, this
method is often not very good, unless the function f is well behaved. Higher order
polynomials tend to fluctuate and can occasionally give an extremely poor fit. This
is particularly true when the function is extrapolated outside of the grid points, i.e.
when $x > x_I$ or $x < x_1$. The least squares method is a global approximation method.
As such, the fit can be on average satisfactory but mediocre almost everywhere.
This can be seen in the example in Figure 3.9.
Linear Interpolation
This method fits the function f with piecewise linear functions on the intervals
[xi−1, xi]. For any value of x in [xi−1, xi], an approximation f̂ (x) of f (x) can be
found as:
$$\hat{f}(x) = f_{i-1} + \frac{f_i - f_{i-1}}{x_i - x_{i-1}}\,(x - x_{i-1})$$
A finer grid will give a better approximation of f (x). When x is greater than
xI , using this rule can lead to numerical problems as the above expression may
not be accurate. Note that the approximation function f̂ is continuous, but not
differentiable at the grid points. This can be an undesirable feature as this non
differentiability can be translated to the value function or the policy function.
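As a small illustration of the piecewise linear rule, the following Python sketch applies it to the example function f(x) = x sin(x). The function name `linear_interp` and the grid spacing of 0.5 are assumptions made for this example. The sketch refuses to extrapolate, in line with the warning above about values beyond the last grid point.

```python
from bisect import bisect_left
import math

def linear_interp(x, grid, values):
    """Piecewise linear interpolation on an increasing grid."""
    if not grid[0] <= x <= grid[-1]:
        raise ValueError("x lies outside the grid; extrapolation is not handled")
    # Index i such that grid[i - 1] <= x <= grid[i].
    i = max(1, bisect_left(grid, x))
    slope = (values[i] - values[i - 1]) / (grid[i] - grid[i - 1])
    return values[i - 1] + slope * (x - grid[i - 1])

grid = [0.5 * k for k in range(13)]           # 0, 0.5, ..., 6
values = [x * math.sin(x) for x in grid]      # f(x) = x sin(x)
approx = linear_interp(2.2, grid, values)
exact = 2.2 * math.sin(2.2)
```

Halving the grid spacing should shrink the gap between `approx` and `exact`, which is the sense in which a finer grid gives a better approximation.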
This method can be extended to multivariate functions. For instance, we can
approximate the function f(x, y) given data on $\{x_i, y_j, f_{ij}\}$. Denote $d_x = (x - x_i)/(x_{i-1} - x_i)$
and $d_y = (y - y_j)/(y_{j-1} - y_j)$. The approximation can be written as:
$$\hat{f}(x, y) = d_x d_y f_{i-1,j-1} + (1 - d_x) d_y f_{i,j-1} + d_x (1 - d_y) f_{i-1,j} + (1 - d_x)(1 - d_y) f_{i,j}$$
The formula can be extended to higher dimensions as well.
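The bivariate formula can be checked on a single grid cell. In the sketch below the function name `bilinear_interp` and its argument layout are hypothetical: `x0` and `x1` play the roles of $x_{i-1}$ and $x_i$, `y0` and `y1` those of $y_{j-1}$ and $y_j$, and `f00`, `f10`, `f01`, `f11` are the function values at the four corners.

```python
def bilinear_interp(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation on the cell [x0, x1] x [y0, y1].

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    """
    dx = (x - x1) / (x0 - x1)   # equals 1 at x0 and 0 at x1
    dy = (y - y1) / (y0 - y1)   # equals 1 at y0 and 0 at y1
    return (dx * dy * f00 + (1 - dx) * dy * f10
            + dx * (1 - dy) * f01 + (1 - dx) * (1 - dy) * f11)

# f(x, y) = 1 + 2x + 3y + 4xy is itself bilinear, so the rule is exact here.
value = bilinear_interp(0.25, 0.75, 0.0, 1.0, 0.0, 1.0,
                        f00=1.0, f10=3.0, f01=4.0, f11=10.0)
```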
Spline Methods
This method extends the linear interpolation by fitting piecewise polynomials while
ensuring that the resulting approximate function $\hat{f}$ is both continuous and differentiable
at the grid points $x_i$. We restrict ourselves to cubic splines for simplicity,
but the literature on splines is very large (see for instance De Boor (1978)). The
approximate function is expressed as:
$$\hat{f}_i(x) = f_{i-1} + a_i (x - x_{i-1}) + b_i (x - x_{i-1})^2 + c_i (x - x_{i-1})^3, \qquad x \in [x_{i-1}, x_i]$$
Here for each point on the grid, we have to determine three parameters $\{a_i, b_i, c_i\}$,
so in total there are 3I parameters to compute. However, imposing the continuity
of the function and of its derivatives up to the second order reduces the number of
coefficients:
$$\hat{f}_i(x_i) = \hat{f}_{i+1}(x_i)$$
$$\hat{f}'_i(x_i) = \hat{f}'_{i+1}(x_i)$$
$$\hat{f}''_i(x_i) = \hat{f}''_{i+1}(x_i)$$
It is also common practice to apply $\hat{f}''_1(x_1) = \hat{f}''_I(x_I) = 0$. With these constraints,
the number of coefficients to compute is down to I. Some algebra gives:
$$a_i = \frac{f_i - f_{i-1}}{x_i - x_{i-1}} - b_i (x_i - x_{i-1}) - c_i (x_i - x_{i-1})^2, \qquad i = 1, \ldots, I$$
$$c_i = \frac{b_{i+1} - b_i}{3 (x_i - x_{i-1})}, \qquad i = 1, \ldots, I - 1$$
$$c_I = -\frac{b_I}{3 (x_I - x_{I-1})}$$
$$a_i + 2 b_i (x_i - x_{i-1}) + 3 c_i (x_i - x_{i-1})^2 = a_{i+1}$$
Solving this system of equations leads to expressions for the coefficients $\{a_i, b_i, c_i\}$.
Figure 3.9 shows that the cubic spline is a very good approximation to the function f .
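A natural cubic spline of this kind can be built with a short tridiagonal recursion. The sketch below is one possible implementation, not the one used for Figure 3.9: the function name `natural_cubic_spline` is hypothetical, indices are zero-based (piece k covers `[xs[k], xs[k + 1]]`), and the recursion is the standard textbook algorithm for the zero-second-derivative end conditions. The arrays `a`, `b` and `c` correspond to the linear, quadratic and cubic coefficients in the text.

```python
from bisect import bisect_right
import math

def natural_cubic_spline(xs, fs):
    """Return a function evaluating the natural cubic spline through (xs, fs)."""
    n = len(xs) - 1                                   # number of pieces
    h = [xs[k + 1] - xs[k] for k in range(n)]
    alpha = [0.0] * (n + 1)
    for k in range(1, n):
        alpha[k] = (3 * (fs[k + 1] - fs[k]) / h[k]
                    - 3 * (fs[k] - fs[k - 1]) / h[k - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for k in range(1, n):
        l[k] = 2 * (xs[k + 1] - xs[k - 1]) - h[k - 1] * mu[k - 1]
        mu[k] = h[k] / l[k]
        z[k] = (alpha[k] - h[k - 1] * z[k - 1]) / l[k]
    # Back substitution; b[0] = b[n] = 0 gives zero curvature at both ends.
    b = [0.0] * (n + 1)
    a, c = [0.0] * n, [0.0] * n
    for k in range(n - 1, -1, -1):
        b[k] = z[k] - mu[k] * b[k + 1]
        c[k] = (b[k + 1] - b[k]) / (3 * h[k])
        a[k] = (fs[k + 1] - fs[k]) / h[k] - b[k] * h[k] - c[k] * h[k] ** 2

    def spline(x):
        k = min(max(bisect_right(xs, x) - 1, 0), n - 1)   # piece containing x
        d = x - xs[k]
        return fs[k] + a[k] * d + b[k] * d ** 2 + c[k] * d ** 3

    return spline

xs = [0.5 * k for k in range(13)]                     # 0, 0.5, ..., 6
fs = [x * math.sin(x) for x in xs]                    # f(x) = x sin(x)
spline = natural_cubic_spline(xs, fs)
```

Evaluating `spline(2.2)` against `2.2 * math.sin(2.2)` gives a rough sense of the accuracy on this grid.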
3.A.2 Numerical Integration
Numerical integration is often required in dynamic programming problems to solve
for the expected value function or to "integrate out" an unobserved state variable.
For instance, solving the Bellman equation (3.3) requires calculating
$Ev(X') = \int v(X')\, dF(X')$, where F(.) is the cumulative distribution function of the next period
cash-on-hand X′. In econometric applications, some important state variables might not be
observed. If this is the case, then one needs to compute the decision rule unconditional
on this state variable. In the case of the stochastic cake eating problem seen in
section 3.2, if X is not observed, one could compute $\bar{c} = \int c(X)\, dF(X)$, which is the
unconditional mean of consumption, and match it with observed consumption. We
present three methods which can be useful when numerical integration is needed.
Quadrature Methods
There are a number of quadrature methods. We briefly detail the Gauss-Legendre
method (much more detailed information can be found in Press et al. (1986)). The
integral of a function f is approximated as:
$$\int_{-1}^{1} f(x)\, dx \approx w_1 f(x_1) + \ldots + w_n f(x_n) \qquad (3.8)$$
where $w_i$ and $x_i$ are n weights and nodes to be determined. Integration over a
different domain can be easily handled by operating a change of the integration
variable. The weights and the nodes are computed such that (3.8) is exactly satisfied
for polynomials of degree 2n − 1 or less. For instance, if n = 2, denote $f_i(x) = x^{i-1}$, i = 1, . . . , 4.
The weights and nodes satisfy:
$$w_1 f_1(x_1) + w_2 f_1(x_2) = \int_{-1}^{1} f_1(x)\, dx$$
$$w_1 f_2(x_1) + w_2 f_2(x_2) = \int_{-1}^{1} f_2(x)\, dx$$
$$w_1 f_3(x_1) + w_2 f_3(x_2) = \int_{-1}^{1} f_3(x)\, dx$$
$$w_1 f_4(x_1) + w_2 f_4(x_2) = \int_{-1}^{1} f_4(x)\, dx$$
This is a system of four equations with four unknowns. The solution is $w_1 = w_2 = 1$
and $x_2 = -x_1 = 1/\sqrt{3} \approx 0.577$. For larger values of n, the computation is similar. By
increasing the number of nodes n, the precision increases. Note that the nodes are
not necessarily equally spaced. The weights and the value of the nodes are published
in the literature for commonly used values of n.
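The two-node case can be verified directly. The following sketch, with the hypothetical function name `gauss_legendre_2`, uses the nodes ±1/√3 and unit weights, and applies the change of variable needed for an arbitrary interval [a, b].

```python
import math

def gauss_legendre_2(f, a=-1.0, b=1.0):
    """Two-node Gauss-Legendre approximation of the integral of f over [a, b]."""
    node = 1.0 / math.sqrt(3.0)            # about 0.577 on the interval [-1, 1]
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    # Change of variable x = mid + half * t maps [-1, 1] onto [a, b];
    # both weights equal one on the reference interval.
    return half * (f(mid - half * node) + f(mid + half * node))

cubic = gauss_legendre_2(lambda x: x ** 3 + 1.0, 0.0, 2.0)   # exact value is 6
quartic = gauss_legendre_2(lambda x: x ** 4)                 # exact value is 0.4
```

The rule reproduces the cubic integral up to rounding, while the quartic one is off, which is the 2n − 1 exactness property with n = 2.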
Approximating an Autoregressive Process with a Markov Chain
In this section we follow Tauchen (1986) and Tauchen and Hussey (1991) and show
how to approximate an autoregressive process of order one by a first order markov
process. This is useful to simplify the computation of expected values in the value
function iteration framework.
For instance, to solve the value function in the cake eating problem, we need to
calculate the expected value given ε:
$$V(W, \varepsilon) = \max\left[\varepsilon u(W),\; E_{\varepsilon' \mid \varepsilon} V(\rho W, \varepsilon')\right]$$
This involves the calculation of an integral at each iteration, which is cumbersome.
If we discretize the process $\varepsilon_t$ into N points $\varepsilon^i$, i = 1, . . . , N, we can replace the
expected value by:
$$V(W, \varepsilon^i) = \max\left[\varepsilon^i u(W),\; \sum_{j=1}^{N} \pi_{i,j} V(\rho W, \varepsilon^j)\right], \qquad i = 1, \ldots, N$$
As in the quadrature method, the method involves finding nodes $\varepsilon^j$ and weights $\pi_{i,j}$.
As we shall see below, the $\varepsilon^i$ and the $\pi_{i,j}$ can be computed prior to the iterations.
Suppose that $\varepsilon_t$ follows an AR(1) process, with an unconditional mean µ and an
autocorrelation ρ:
$$\varepsilon_t = \mu(1 - \rho) + \rho \varepsilon_{t-1} + u_t \qquad (3.9)$$
where $u_t$ is a normally distributed shock with variance $\sigma^2$. To discretize this process,
we need to determine three different objects. First, we need to discretize the process
$\varepsilon_t$ into N intervals. Second, we need to compute the conditional mean of $\varepsilon_t$ within
each interval, which we denote by $z^i$, i = 1, . . . , N. Third, we need to compute the
probability of transition between any of these intervals, $\pi_{i,j}$. Figure 3.10 graphs the
distribution of ε and shows the cut-off points $\varepsilon^i$ as well as the conditional means $z^i$.
[Figure 3.10 approximately here]
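The first step is to discretize the real line into N intervals, defined by the limits
$\varepsilon^1, \ldots, \varepsilon^{N+1}$. As the process $\varepsilon_t$ is unbounded, $\varepsilon^1 = -\infty$ and $\varepsilon^{N+1} = +\infty$. The
intervals are constructed such that $\varepsilon_t$ has an equal probability of 1/N of falling into
them. Given the normality assumption, the cut-off points $\{\varepsilon^i\}_{i=1}^{N+1}$ are defined as
$$\Phi\left(\frac{\varepsilon^{i+1} - \mu}{\sigma_\varepsilon}\right) - \Phi\left(\frac{\varepsilon^{i} - \mu}{\sigma_\varepsilon}\right) = \frac{1}{N}, \qquad i = 1, \ldots, N \qquad (3.10)$$
where Φ() is the cumulative distribution function of the standard normal and $\sigma_\varepsilon$ is the
standard deviation of ε, equal to $\sigma/\sqrt{1 - \rho^2}$. Working recursively we get:
$$\varepsilon^i = \sigma_\varepsilon \Phi^{-1}\left(\frac{i - 1}{N}\right) + \mu$$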
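Now that we have defined the intervals, what is the average value of ε within a
given interval? We denote this value by $z^i$, which is computed as the mean of $\varepsilon_t$
conditional on $\varepsilon_t \in [\varepsilon^i, \varepsilon^{i+1}]$:
$$z^i = E(\varepsilon_t \mid \varepsilon_t \in [\varepsilon^i, \varepsilon^{i+1}]) = \sigma_\varepsilon\, \frac{\phi\left(\frac{\varepsilon^{i} - \mu}{\sigma_\varepsilon}\right) - \phi\left(\frac{\varepsilon^{i+1} - \mu}{\sigma_\varepsilon}\right)}{\Phi\left(\frac{\varepsilon^{i+1} - \mu}{\sigma_\varepsilon}\right) - \Phi\left(\frac{\varepsilon^{i} - \mu}{\sigma_\varepsilon}\right)} + \mu$$
Using (3.10), the expression simplifies to:
$$z^i = N \sigma_\varepsilon \left( \phi\left(\frac{\varepsilon^{i} - \mu}{\sigma_\varepsilon}\right) - \phi\left(\frac{\varepsilon^{i+1} - \mu}{\sigma_\varepsilon}\right) \right) + \mu$$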
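Next, we define the transition probability as
$$\pi_{i,j} = P(\varepsilon_t \in [\varepsilon^j, \varepsilon^{j+1}] \mid \varepsilon_{t-1} \in [\varepsilon^i, \varepsilon^{i+1}])$$
$$\pi_{i,j} = \frac{N}{\sqrt{2\pi}\, \sigma_\varepsilon} \int_{\varepsilon^i}^{\varepsilon^{i+1}} e^{-\frac{(u - \mu)^2}{2 \sigma_\varepsilon^2}} \left[ \Phi\left(\frac{\varepsilon^{j+1} - \mu(1 - \rho) - \rho u}{\sigma}\right) - \Phi\left(\frac{\varepsilon^{j} - \mu(1 - \rho) - \rho u}{\sigma}\right) \right] du$$
The factor N is the inverse of the probability, 1/N, that $\varepsilon_{t-1}$ falls into interval i.
The computation of $\pi_{i,j}$ requires the computation of a non trivial integral. This
can be done numerically. Note that if ρ = 0, i.e. ε is an i.i.d. process, the above
expression is simply:
$$\pi_{i,j} = 1/N$$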
We can now define a Markov process $z_t$ which will mimic an autoregressive
process of order one, as defined in (3.9). $z_t$ takes its values in $\{z^i\}_{i=1}^{N}$ and the
transition between period t and t + 1 is defined as:
$$P(z_t = z^j \mid z_{t-1} = z^i) = \pi_{i,j}$$
By increasing N, the discretization becomes finer and the Markov process gets
closer to the real autoregressive process.
Example: For N = 3, ρ = 0.5, µ = 0 and σ = 1, we have:
$$z^1 = -1.26, \qquad z^2 = 0, \qquad z^3 = 1.26$$
and
$$\pi = \begin{bmatrix} 0.55 & 0.31 & 0.14 \\ 0.31 & 0.38 & 0.31 \\ 0.14 & 0.31 & 0.55 \end{bmatrix}$$
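The three steps (cut-off points, conditional means, transition probabilities) can be sketched with the Python standard library. This is an illustrative implementation under stated assumptions: the function name `discretize_ar1` is hypothetical, the integral is evaluated with Simpson's rule, and the two unbounded intervals are truncated at eight unconditional standard deviations from the mean.

```python
from statistics import NormalDist
import math

std = NormalDist()   # standard normal: cdf, pdf and inv_cdf

def discretize_ar1(N, rho, mu, sigma, n_steps=2000, cap=8.0):
    """Equal-probability discretization of eps_t = mu(1-rho) + rho*eps_{t-1} + u_t."""
    sigma_eps = sigma / math.sqrt(1.0 - rho ** 2)
    # Cut-off points, with the first and last at minus and plus infinity.
    cuts = ([-math.inf]
            + [mu + sigma_eps * std.inv_cdf(i / N) for i in range(1, N)]
            + [math.inf])

    def pdf(c):
        return 0.0 if math.isinf(c) else std.pdf((c - mu) / sigma_eps)

    # Conditional mean of eps within each interval.
    z = [mu + N * sigma_eps * (pdf(cuts[i]) - pdf(cuts[i + 1])) for i in range(N)]

    def cdf_next(c, u):
        """P(eps_t <= c | eps_{t-1} = u)."""
        if c == math.inf:
            return 1.0
        if c == -math.inf:
            return 0.0
        return std.cdf((c - mu * (1.0 - rho) - rho * u) / sigma)

    P = [[0.0] * N for _ in range(N)]
    for i in range(N):
        lo = max(cuts[i], mu - cap * sigma_eps)       # truncate infinite limits
        hi = min(cuts[i + 1], mu + cap * sigma_eps)
        h = (hi - lo) / n_steps                       # n_steps must be even
        for j in range(N):
            total = 0.0
            for k in range(n_steps + 1):
                u = lo + k * h
                w = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
                dens = std.pdf((u - mu) / sigma_eps) / sigma_eps
                total += w * dens * (cdf_next(cuts[j + 1], u) - cdf_next(cuts[j], u))
            P[i][j] = N * total * h / 3.0             # Simpson's rule times N
    return z, P

z, P = discretize_ar1(N=3, rho=0.5, mu=0.0, sigma=1.0)
```

With these inputs the conditional means and the transition matrix should come out close to the values reported in the example above.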
3.A.3 How to Simulate the Model
Once the value function is computed, the estimation or the evaluation of the model
often requires the simulation of the behavior of the agent through time.
If the model is stochastic, the first step is to generate a series for the shocks,
for t = 1, . . . , T . Then, we go from period to period and use the policy function to
find out the optimal choice for this period. We also update the state variable and
proceed to next period.
How to Program a Markov Process
The Markov process is characterized by grid points $\{z^i\}$ and by a transition matrix
π, with elements $\pi_{ij} = \mathrm{Prob}(z_t = z^j \mid z_{t-1} = z^i)$.
We start in period 1. The process $z_t$ is initialized at, say, $z^i$. Next, we have to
assign a value for $z_2$. To this end, using the random generator of the computer, we
draw a uniform variable, u, in [0, 1]. The state in period 2, j, is defined as:
$$\sum_{l=1}^{j-1} \pi_{i,l} < u \le \sum_{l=1}^{j} \pi_{i,l}$$
so that j = 1 if $u \le \pi_{i,1}$. The values for the periods ahead are constructed in a similar
way. Figure 3.11 presents a computer code which will construct iteratively the values
for T periods.
[Figure 3.11 approximately here]
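A minimal Python sketch of this procedure is given below. It is not the code of Figure 3.11: the function name `simulate_markov`, the zero-based state indices and the fixed seed are assumptions made for the illustration. The next state is the first index whose cumulative transition probability reaches the uniform draw.

```python
import random

def simulate_markov(P, T, start=0, seed=0):
    """Simulate T periods of a Markov chain; returns zero-based state indices."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(T - 1):
        u = rng.random()                 # uniform draw in [0, 1)
        row = P[path[-1]]                # transition probabilities from the current state
        cum, nxt = 0.0, len(row) - 1     # default guards against rounding in the row sum
        for j, p in enumerate(row):
            cum += p
            if u <= cum:                 # first j whose cumulative sum reaches u
                nxt = j
                break
        path.append(nxt)
    return path

P = [[0.55, 0.31, 0.14],
     [0.31, 0.38, 0.31],
     [0.14, 0.31, 0.55]]
z = [-1.26, 0.0, 1.26]
states = simulate_markov(P, T=100, start=1)
series = [z[s] for s in states]          # simulated values of the process
```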
How to Simulate the Model
For this, we need to initialize all stochastic processes, which are the exogenous shock
and the state variables. The state variables can be initialized to their long run values
or to some other value. Often, the model is simulated over a long number of periods
and the first periods are discarded to get rid of initial condition problems.
The value of the state variables and the shock in period 1 are used to determine
the choice variable in period 1. In the case of the continuous stochastic cake eating
problem in section 3.2, we would construct c1 = c(X1). Next, we can generate
the values of the state variable in period 2, X2 = R(X1 − c1) + y2 where y2 is
calculated using the method described in section 3.A.3 above. This procedure would
be repeated over T periods to successively construct all the values for the choice
variables and the state variables.
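The simulation loop itself is short. In the sketch below the policy function c(X) = 0.5X is purely a placeholder for a solved policy rule, the income draws are i.i.d. stand-ins for the Markov draws described earlier, and the names `simulate_model`, `R`, `X0` and `burn_in` are hypothetical.

```python
import random

def simulate_model(policy, incomes, R=1.02, X0=1.0, burn_in=0):
    """Simulate (X_t, c_t) pairs with X_{t+1} = R * (X_t - c_t) + y_{t+1}."""
    X, path = X0, []
    for y_next in incomes:
        c = policy(X)                    # choice implied by the current state
        path.append((X, c))
        X = R * (X - c) + y_next         # update the state variable
    return path[burn_in:]                # discard the first periods

rng = random.Random(0)
incomes = [rng.choice([0.8, 1.2]) for _ in range(200)]   # stand-in income shocks
full = simulate_model(lambda X: 0.5 * X, incomes)
sim = simulate_model(lambda X: 0.5 * X, incomes, burn_in=50)
```

Dropping the first 50 periods is one way to limit the influence of the arbitrary starting value `X0`.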
Chapter 4
Econometrics
4.1 Overview
This chapter reviews techniques to estimate parameters of models based on dynamic
programming. This chapter is organized in two parts. In section 4.2, we present two
simple examples to illustrate the different estimation methodologies. We analyze a
simple coin flipping experiment and the classic problem of supply and demand. We
review standard techniques such as maximum likelihood and the method of moments
as well as simulated estimation techniques. The reader who is already familiar
with econometric techniques could go to section 4.3 which gives more details on
these techniques and studies the asymptotic properties of the estimators. A more
elaborate dynamic programming model of cake eating is used to illustrate these
different techniques.
4.2 Some Illustrative Examples
4.2.1 Coin Flipping
We consider here a simple coin flipping example. The coin is not necessarily fair
and the outcome of the draw is either heads with a probability P1 or tails with a
probability P2 = 1 − P1, with {P1, P2} ∈ [0, 1] × [0, 1]. We are interested in estimating
the probability of each outcome. We observe a series of T draws from the coin.
Denote the realization of the tth draw by xt, which is equal either to 1 (if heads) or
2 (if tails). The data set at hand is thus a series of observations {x1, x2, . . . , xT }.
This section will describe a number of methods to uncover the probabilities {P1, P2}
from observed data.
This simple example can be extended in two directions. First, we can try to
imagine a coin with more than two sides (a die). We are then able to consider
more than two outcomes per draw. In this case, we denote P = {P1, . . . , PI} a
vector with I elements where Pi = P (xt = i) is the probability of outcome i. We are
interested in estimating the probabilities {Pi}i=1,...,I . For simplicity, we sometimes
state results for the case where I = 2, but the generalization to a larger number of
outcomes is straightforward.
Second, it may be possible that the draws are serially correlated. The probability
of obtaining a head might depend on the outcome of the previous draw. In this case
we want to estimate P (xt = j|xt−1 = i). We also consider this generalized example
below.
Of course, the researcher may not be interested in these probabilities alone but
rather, as in many economic examples, the parameters that underlie P . To be
more specific, suppose that one had a model parameterized by $\theta \in \Theta \subset \mathbb{R}^{\kappa}$ that
determines P. That is, associated with each θ is a vector of probabilities P. Denote
by M(θ) the mapping from parameters to probabilities: $M : \Theta \to [0, 1]^I$.
In the case where I = 2, we could consider a fair coin, in which case θ = (1/2, 1/2)
and P = (P1, P2) = (1/2, 1/2). Alternatively we could consider a coin which is
biased towards heads, with θ = (2/3, 1/3) and P = (P1, P2) = (2/3, 1/3). In these
examples, the model M is the identity, M (θ) = θ. In practice, we would have to
impose that θ ∈ [0, 1] in the estimation algorithm. Another way of specifying the
model is to choose a function M(.) which is naturally bounded between 0 and 1. In
this case, we can let θ belong to R. For instance, the cumulative distribution of
the normal density, noted Φ(.) satisfies this condition. In the fair coin example, we
could have θ = (0, 0) and P = (Φ(0), Φ(0)) = (1/2, 1/2). With the biased coin, we
would have θ = (0.43, −0.43), as Φ(0.43) = 2/3 and Φ(−0.43) = 1/3.
Maximum Likelihood
IID case: We start with the case where the draws from the coin are identically and
independently distributed. The likelihood of observing the sample {x1, x2, . . . , xT }
is given by:
$$£(x, P) = \prod_{i=1}^{I} P_i^{\#_i}$$
where #i is the number of observations for which event i occurs. Thus £ represents
the probability of observing {xt}Tt=1 given P . The maximum likelihood estimator of
P is given by:
P = arg max £. (4.1)
By deriving the first order condition for a maximum of £(x, P ), the maximum
likelihood estimate of $P_i$, i = 1, 2, ..., I is given by:
$$P_i^* = \frac{\#_i}{\sum_i \#_i}. \qquad (4.2)$$
In words, the maximum likelihood estimator of Pi is the fraction of occurrences of
event i.
Suppose that one had a model M (.) for the probabilities, parameterized by θ.
So, indirectly, the likelihood of the sample depends on this vector of parameters,
denote it £̃(x, θ) = £(x, M (θ)). In that case, the maximum likelihood estimator of
the parameter vector (θ∗) is given by:
$$\theta^* = \arg\max_{\theta}\; \tilde{£}(x, \theta).$$
In effect, by a judicious choice of θ, we choose the elements of P to maximize the
likelihood of observing the sample. In fact, by maximizing this function we would
end up at the same set of first-order conditions, (4.2), that we obtained from solving
(4.1).
Example 4.1
Suppose I=2 and that M (θ) = Φ(θ), where Φ(.) is the cumulative distribution func-
tion of the standardized normal density. 28 In this case, p1 = P (xt = 1) = Φ(θ) and
p2 = 1 − Φ(θ). The parameter is estimated by maximizing the likelihood of observing
the data:
$$\theta^* = \arg\max_{\theta}\; \Phi(\theta)^{\#_1} (1 - \Phi(\theta))^{\#_2}$$
where #1 and #2 are the number of observations that fall into category 1 and 2.
Straightforward derivation gives:
$$\theta^* = \Phi^{-1}\left(\frac{\#_1}{\#_1 + \#_2}\right)$$
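The closed form of Example 4.1 can be checked numerically. The sketch below is only an illustration: the sample of 60 heads and 40 tails is made up, the names `log_likelihood` and `mle_theta` are hypothetical, and the grid search is a crude stand-in for a proper optimizer. The closed form requires both outcomes to appear in the sample, since the inverse normal distribution function is undefined at 0 and 1.

```python
from statistics import NormalDist
import math

std = NormalDist()

def log_likelihood(theta, n1, n2):
    """Log of Phi(theta)^n1 * (1 - Phi(theta))^n2."""
    p = std.cdf(theta)
    return n1 * math.log(p) + n2 * math.log(1.0 - p)

def mle_theta(draws):
    """Closed-form estimator: inverse normal cdf of the fraction of heads."""
    n1 = sum(1 for x in draws if x == 1)
    n2 = len(draws) - n1
    return std.inv_cdf(n1 / (n1 + n2))

draws = [1] * 60 + [2] * 40                       # hypothetical sample
theta_hat = mle_theta(draws)
grid = [k / 1000 for k in range(-2000, 2001)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, 60, 40))
```

The grid maximizer `theta_grid` should agree with `theta_hat` up to the grid spacing.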
Markov Structure: The same issues arise in a model which exhibits more dy-
namics, as is the case when the outcomes are serially correlated. Let Pij denote the
probability of observing event j in period t + 1 conditional on observing event i in
period t:
Pij = Prob (xt+1 = j|xt = i).
These conditional probabilities satisfy: $P_{ij} \in (0, 1)$ and $\sum_j P_{ij} = 1$ for i = 1, 2, ..., I.
Intuitively, the former condition says that given the current state is i, in period t + 1
all j ∈ I will occur with positive probability and the latter condition requires that
these probabilities sum to one. The probability of observing the sample of data is:
$$£(x, P) = P(x_1, \ldots, x_T) = \prod_{l=2}^{T} P(x_l \mid x_{l-1})\; P(x_1)$$
Let #ij denote the number of observations in which state j occurred in the period
following state i. Then the likelihood function in this case is:
$$£(x, P) = \Big( \prod_{i} \prod_{j} P_{ij}^{\#_{ij}} \Big) \cdot P(x_1)$$
We can express the probability of the first observation as a function of the Pij
probabilities.
$$P(x_1) = \sum_{j=1}^{I} P(x_1 \mid x_0 = j) = \sum_{j=1}^{I} P_{j1}$$
As before, the conditional probabilities and this initial probability can, in principle,
depend on θ. Thus the maximum likelihood estimator of θ would be the one that
maximizes £(x, P ). Note that there are now a large number of probabilities that are
estimated through maximum likelihood: I(I − 1). Thus a richer set of parameters
can be estimated with this structure.
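When the transition probabilities are estimated freely and the term for the first observation is ignored, the likelihood is maximized by the row-normalized transition counts. That result is standard but is not derived in the text, so the sketch below should be read as an assumption-laden illustration; the function name `transition_mle` and the sample series are hypothetical.

```python
def transition_mle(x, I):
    """Count transitions in a series with values 1..I and normalize each row."""
    counts = [[0] * I for _ in range(I)]
    for prev, cur in zip(x, x[1:]):
        counts[prev - 1][cur - 1] += 1          # one more i -> j transition
    P = []
    for row in counts:
        total = sum(row)
        # A state that is never left has no estimable row; flag it with NaN.
        P.append([c / total if total else float("nan") for c in row])
    return counts, P

x = [1, 1, 2, 1, 2, 2, 1, 1]                    # hypothetical sequence of draws
counts, P_hat = transition_mle(x, I=2)
```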
Method of Moments
Continuing with our examples, we consider an alternative way to estimate the pa-
rameters. Consider again the iid case and suppose there are only two possible
outcomes, I = 2, so that we have a repeated Bernoulli trial. Given a sample of ob-
servations, let µ denote a moment computed from the data. For example, µ might
simply be the fraction of times event i = 1 occurred in the sample. In this case,
µ = P1.
Let µ(θ) denote the same moment calculated from the model when the data
generating process (the model M ) is parameterized by θ. For now, assume that the
number of parameters, κ, is equal to one so that the number of parameters is equal
to the number of moments (the problem is then said to be just identified). Consider
the following optimization problem:
$$\min_{\theta}\; (\mu(\theta) - \mu)^2.$$
Here we are choosing the parameters to bring the moment from the model as close
as possible to that from the actual data. The θ that emerges from this optimization
is a method of moments estimator, denote this estimate by θ̂.
Example 4.2
Suppose we chose as a moment the fraction of times event i = 1 occurs in the sample.
From our model of coin flipping, this fraction is equal to Φ(θ). The parameter is
estimated by minimizing the distance between the fraction predicted by the model and
the observed one:
$$\theta^* = \arg\min_{\theta} \left( \Phi(\theta) - \frac{\#_1}{\#_1 + \#_2} \right)^2$$
Solving the minimization problem gives:
$$\theta^* = \Phi^{-1}\left(\frac{\#_1}{\#_1 + \#_2}\right)$$
Hence, with this choice of moment, the method of moments estimator is the same as
the maximum likelihood one, seen in example 4.1.
In example 4.2 we chose a particular moment which was the fraction of heads in
the sample. Often, in a data set, there is a large set of moments to choose from.
The method of moments does not guide us in the choice of a particular moment.
So which moment should we consider? Econometric theory has not come up
with a clear indication of "optimal" moments. However, the moments should be
informative of the parameters to estimate. This means that the moments under
consideration should depend on the parameters in such a way that slight variations
in their values result in different values for the moments.
With a choice of moment different from the one in example 4.2, the method of
moments estimator would have been different from the maximum likelihood estimator.
However, asymptotically, when the size of the data set increases, both estimators
converge to the true value.
More generally, let µ be an m × 1 column vector of moments from the data. If
κ < m the model is said to be over identified, as there are more moments than
parameters to estimate. If κ = m, the model is said to be just identified and if
κ > m, the model is under identified. In the latter case, estimation cannot be
achieved as there are too many unknown parameters.
So if κ ≤ m, the estimator of θ comes from:
$$\min_{\theta}\; (\mu(\theta) - \mu)'\, W^{-1} (\mu(\theta) - \mu).$$
In this quadratic form, W is a weighting matrix. As explained below, the choice of W
is important for obtaining an efficient estimator of θ when the model is overidentified.
Using Simulations
In many applications, the procedures outlined above are difficult to implement,
either because the likelihood of observing the data or the moments are difficult
to compute analytically or because it involves solving too many integrals. Put
differently, the researcher does not have an analytic representation of M (θ). If this
is the case, then estimation can still be carried out numerically using simulations.
Consider again the iid case, where I = 2. The simulation approach proceeds
in the following way. First, we fix θ, the parameter of M (θ). Second, using the
random number generator of a computer, we generate S draws {us} from a uniform
distribution over [0, 1]. We classify each draw as heads (denoted i = 1) if $u_s < M(\theta)$
or tails (denoted i = 2) otherwise. The fractions of the two events in the
simulated data are used to approximate $P_i^S(\theta)$ by counting the number of simulated
observations that take value i, denoted by $\tilde{\#}_i$. So, $P_i^S(\theta) = \tilde{\#}_i / S$. The simulated
maximum likelihood estimator is defined as:
$$\theta_S^* = \arg\max_{\theta} \prod_{i} P_i^S(\theta)^{\#_i}$$
where, as before, $\#_i$ refers to the number of observations in which i occurs. The
estimator is indexed by S, the number of simulations. Obviously, a larger number
of simulation draws will yield more precise estimates. Figure 4.1 displays the log-
likelihood for the coin flipping example, based on two series of simulation with
respectively 50 and 5000 draws. The observed data set was a series of 100 draws.
The log-likelihood has a maximum at the true value of the parameter, although the
likelihood is very flat around the true value when the number of simulations is small.
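The simulated likelihood for the coin example can be sketched as follows, taking M(θ) = Φ(θ) as in Example 4.1. The sample counts (60 heads, 40 tails), the number of draws, the grid and the function name `simulated_loglik` are all assumptions made for the illustration. The uniform draws are generated once and reused for every θ, so the objective changes only because θ changes, and the simulated frequencies are clipped away from 0 and 1 to keep the logarithms finite.

```python
from statistics import NormalDist
import math
import random

std = NormalDist()

def simulated_loglik(theta, n1, n2, draws):
    """Log-likelihood with P_1 replaced by its simulated frequency."""
    p_model = std.cdf(theta)                          # M(theta) = Phi(theta)
    heads = sum(1 for u in draws if u < p_model)      # simulated count of heads
    p1 = min(max(heads / len(draws), 1e-12), 1.0 - 1e-12)
    return n1 * math.log(p1) + n2 * math.log(1.0 - p1)

rng = random.Random(0)
draws = [rng.random() for _ in range(5000)]           # S = 5000, fixed once
grid = [k / 100 for k in range(-200, 201)]
theta_S = max(grid, key=lambda t: simulated_loglik(t, 60, 40, draws))
```

Because the simulated frequency is a step function of θ, the objective is flat between jumps; a larger S makes the steps finer.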
Exercise 4.1
Build a computer program which computes the likelihood function, using simula-
tions, of a sample of T draws for the case where I = 3.
[Figure 4.1 approximately here]
For the method of moments estimator, the procedure is the same. Once an artificial
data set has been generated, we can compute moments both on the artificial
data and on the observed data. Denote by $\mu^S(\theta)$ a moment derived from the simulated
data. For instance, µ and $\mu^S(\theta)$ could be the fraction of heads in the observed
sample and in the simulated one. The simulated method of moments estimator
is defined as:
$$\theta_S^* = \arg\min_{\theta}\; (\mu^S(\theta) - \mu)'\, W^{-1} (\mu^S(\theta) - \mu)$$
Figure 4.2 displays the objective function for the simulated method of moments.
The function has a minimum at the true value of the parameter. Once again, using
more simulation draws gives a smoother function, which will be easier to minimize.
Exercise 4.2
Build a computer program which computes the objective function, using simula-
tions, of a sample of T draws for the case where I = 3.
[Figure 4.2 approximately here]
In both methods, the estimation requires two steps. First, given a value of θ, one
needs to simulate artificial data and compute either a likelihood or a moment from
this data set. Second, using these objects, the likelihood or the objective function
has to be evaluated and a new value for the parameters, one that improves the criterion,
found. These two steps are repeated until the criterion converges.
To compute the simulated data, we need to draw random shocks using the ran-
dom number generator of a computer. It is to be noted that the random draws have
to be computed once and for all at the start of the estimation process. If the draws
change between iterations, it would be unclear whether the change in the criterion
function comes from a change in the parameter or from a change in the random
draws.
The ability to simulate data opens the way to yet another estimation method:
indirect inference. This method uses an auxiliary model chosen by the researcher.
This model should be easy to estimate by standard techniques and should capture
enough of the interesting variation in the data. We denote it by M̃ (ψ), where ψ
is a vector of auxiliary parameters describing this new model. Given a guess for
the vector of structural parameters θ, the true model can be simulated to create a
new data set. The auxiliary model is estimated both on the real data and on the
simulated one, providing two sets of auxiliary parameters. The vector θ is chosen
such that the two sets of auxiliary parameters are close to each other.
Note that the vector of auxiliary parameters ψ is of no particular interest per se,
as it describes a misspecified model (M̃ ). Within the context of the original model,
it has no clear interpretation. However, it serves as a means to identify and estimate
the structural parameters θ.
Example 4.3
For instance, if M (θ) = Φ(θ), the model has no closed-form solution as the cu-
mulative of the normal density has no analytical form. Instead of approximating it
numerically, we can use the indirect inference method to estimate parameters of in-
terest without computing this function. We might turn to an auxiliary model which
is easier. For instance, the logit model has closed forms for the probabilities. Denote
by ψ the auxiliary parameter parameterizing the logit model. With such a model, the
probability of observing xt = 1 is equal to:
$$P(x_t = 1) = \frac{\exp(\psi)}{1 + \exp(\psi)}$$
Denote by #1 and #2 the number of cases that fall into category 1 and 2. The
log-likelihood of observing some data is:
$$£ = \#_1 \ln \frac{\exp(\psi)}{1 + \exp(\psi)} + \#_2 \ln \frac{1}{1 + \exp(\psi)} = \#_1 \psi - (\#_1 + \#_2) \ln(1 + \exp(\psi))$$
Maximization of this log likelihood and some rearranging gives a simple formula
for the ML estimator of the auxiliary parameter: $\psi = \ln(\#_1/\#_2)$. We can compute this
estimator of the auxiliary parameter both for our observed data and for the simulated
data by observing in each case the empirical frequencies. Denote the former by $\hat\psi$
and the latter by $\hat\psi^S(\theta)$. The indirect inference estimator is then:
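\[ \theta^*_S = \arg\min_\theta \left( \hat\psi^S(\theta) - \hat\psi \right)^2 = \arg\min_\theta \left( \ln \frac{\#_1^S(\theta)}{\#_2^S(\theta)} - \ln \frac{\#_1}{\#_2} \right)^2 \]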
In this example, as the probit model is difficult to estimate by maximum likelihood
directly, we have instead replaced it with a logit model which is easier to estimate.
Although we are not interested in ψ per se, this parameter is a means to estimate
the parameter of importance, θ.
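The steps of Example 4.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the structural model is taken to be "outcome 1 occurs when a standard normal shock falls below θ", so that P(x = 1) = Φ(θ); the value of `theta_true`, the sample sizes, the grid and all function names are hypothetical choices, and a grid search stands in for a proper optimizer.

```python
import math
import random

def logit_aux(n1, n2):
    """ML estimate of the logit auxiliary parameter: psi = ln(#1 / #2)."""
    return math.log(n1 / n2)

def simulate_counts(theta, shocks):
    """Assumed structural model: outcome 1 occurs when the shock is below theta,
    so P(x = 1) = Phi(theta). Returns the counts (#1, #2)."""
    n1 = sum(1 for u in shocks if u < theta)
    return n1, len(shocks) - n1

def indirect_inference(n1_obs, n2_obs, shocks, grid):
    """Grid search for the theta whose simulated psi is closest to the observed psi."""
    psi_hat = logit_aux(n1_obs, n2_obs)
    best_theta, best_loss = None, float("inf")
    for theta in grid:
        s1, s2 = simulate_counts(theta, shocks)
        if s1 == 0 or s2 == 0:
            continue  # psi is undefined when one category is empty
        loss = (logit_aux(s1, s2) - psi_hat) ** 2
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta

rng = random.Random(0)
theta_true = 0.5  # hypothetical "true" parameter used to create the observed data
observed = [rng.gauss(0.0, 1.0) for _ in range(5000)]
n1_obs, n2_obs = simulate_counts(theta_true, observed)
shocks = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # drawn once, reused for every theta
grid = [i / 50 for i in range(-100, 101)]
theta_hat = indirect_inference(n1_obs, n2_obs, shocks, grid)
print("indirect inference estimate of theta:", theta_hat)
```

Note that Φ is never evaluated: only frequencies in the observed and simulated samples are compared, and the simulated shocks are held fixed across candidate values of θ.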
So far, we have not discussed the size of the simulated data set. Obviously,
one expects that the estimation will be more efficient if S is large, as either the
moments, the likelihood or the auxiliary model will be pinned down with greater
accuracy. Using simulations instead of analytical forms introduces randomness into
the estimation method. For short samples, this randomness can lead to biased
estimates. For instance, with the simulated maximum likelihood, we need the number
of simulation draws to go to infinity to get rid of the bias. This is not the case for
the simulated method of moments or the indirect inference, although the results are
more precise for a large S. We discuss this issue later on in this chapter.
Identification Issues
We conclude this section on coin flipping with an informal discussion of identification
issues. Up to here, we implicitly assumed that the problem was identified, i.e. the
estimation method and the data set allowed us to get a unique estimate of the true
vector of parameters θ.
A key issue is the dimensionality of the parameter space, κ, relative to I, the
dimensionality of P . First, suppose that κ = I − 1, so that the dimensionality of
θ is the same as the number of free elements of P .29 Second, assume that M (θ) is
one to one. This means that M is a function and for every P there exists only one
value of θ such that P = M (θ). In this case, we effectively estimate θ from P ∗ by
using the inverse of the model: θ∗ = M −1(P ∗).
This is the most favorable case of identification and we would say the parameters
of the model are just identified. It is illustrated in Figure 4.3 for the case of I = 2
and κ = 1. There is a unique value of the parameter, θ∗, for which the probability
predicted by the model, M (θ∗), is equal to the true probability.
[Figure 4.3 approximately here]
[Figure 4.4 approximately here]
[Figure 4.5 approximately here]
A number of problems can arise, even for the special case of κ = I − 1. First,
it might be that the model, M (θ), is not invertible. Thus, for a given maximum
likelihood estimate of P ∗ , there could be multiple values of θ that generate this
vector of probabilities. In this case, the model is not identified. This is shown
in Figure 4.4. Example 4.4 shows an example based on the method of moments
estimation where a particular choice of moment leads to non-identification.
Example 4.4
Suppose we label heads as 1 and tails as 2. Suppose that instead of focusing on the
mean of the sample (i.e. the fraction of heads) we chose the variance of the sample.
The variance can be expressed as:
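\[
\begin{aligned}
V(x) &= E x^2 - (E x)^2 \\
     &= \frac{\#_1}{\#_1 + \#_2} + 4 \frac{\#_2}{\#_1 + \#_2} - \left( \frac{\#_1}{\#_1 + \#_2} + 2 \frac{\#_2}{\#_1 + \#_2} \right)^2 \\
     &= \frac{\#_1}{\#_1 + \#_2} \left( 1 - \frac{\#_1}{\#_1 + \#_2} \right)
\end{aligned}
\]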
So the theoretical and the empirical moments are:
\[ \mu(\theta) = \Phi(\theta)(1 - \Phi(\theta)), \qquad \mu = \frac{\#_1}{\#_1 + \#_2} \left( 1 - \frac{\#_1}{\#_1 + \#_2} \right) \]
This might appear as a perfectly valid choice of moment, but in fact it is not. The
reason is that the function Φ(θ)(1 − Φ(θ)) is not a monotone function but a hump-
shaped one and thus not invertible. For both low and high values of θ, the function
is close to 0. The variance is maximal when the probability of obtaining a head is
equal to that of obtaining a tail. If either tails or heads are very likely, the variance
is going to be low. So a low variance indicates that either heads or tails are more
frequent, but does not tell us which occurrence is more likely. Hence, in this case,
the variance is not a valid moment to consider, for identification reasons.
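The lack of invertibility is easy to check numerically. The short sketch below (function names are ours, not the text's) evaluates the variance moment Φ(θ)(1 − Φ(θ)) and the mean moment Φ(θ) at θ and −θ: the variance takes the same value at both points, while the mean separates them.

```python
from statistics import NormalDist

def mean_moment(theta):
    """Probability of heads implied by the model: Phi(theta)."""
    return NormalDist().cdf(theta)

def variance_moment(theta):
    """Variance moment from Example 4.4: Phi(theta) * (1 - Phi(theta))."""
    p = NormalDist().cdf(theta)
    return p * (1.0 - p)

for theta in (0.3, 1.0, 2.0):
    # The variance is identical at theta and -theta; the mean is not.
    print(theta,
          variance_moment(theta), variance_moment(-theta),
          mean_moment(theta), mean_moment(-theta))
```

Since Φ(−θ) = 1 − Φ(θ), any estimation routine based on the variance alone cannot distinguish θ from −θ.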
Second, it might be that for a given value of P ∗, there does not exist a value
of θ such that M (θ) = P ∗. In this case, the model is simply not rich enough to
fit the data. This is a situation of misspecification. Put differently, there is a
zero-likelihood problem here as the model, however parameterized, is unable to
match the observations. This is illustrated in Figure 4.5.
So, returning to the simple coin flipping example, if there is a single parameter
characterizing the probability of a head occurring and the mapping from this pa-
rameter to the likelihood of heads is one-to-one, then this parameter can be directly
estimated from the fraction of heads. But, it might be that there are multiple values
of this parameter which would generate the same fraction of heads in a sample. In
this case, the researcher needs to bring additional information to the problem. Or,
there may be no value of this parameter that can generate the observed frequency
of heads. In this case, the model needs to be re-specified.
Instead of κ = I − 1, we may have more dimensions to θ than information
in P : κ > I − 1. In this case, we have a situation where the model is again
underidentified. Given the maximum likelihood estimate of P ∗, there are multiple
combinations of the parameters that, through the model, can generate P ∗. In
this case, the researcher needs to bring additional information to the problem to
overcome the indeterminacy of the parameters. So in the coin-flipping example, a
physical theory that involved more than a single parameter would be impossible to
estimate from data that yields a single probability of heads.
Alternatively, if κ < I −1, then the parameters are overidentified. In this case,
there may not be any θ that is consistent with all the components of P. In many
applications, such as those studied in this book, this situation allows the researcher
a more powerful test of a model. If a model is just identified, then essentially
there exists a θ such that P ∗ can be generated by the model. But when a model
is overidentified, then matching the model to the data is a much more demanding
task. Thus a model that succeeds in matching the data, characterized by P ∗, when
the parameters are overidentified is viewed as more compelling.
4.2.2 Supply and Demand Revisited
Let us consider the classic problem of supply and demand. This model will serve
as an illustration for the previous estimation methods and to discuss the problem
of identification. The supply depends on prices, p and the weather, z. The demand
side depends on prices and income, y:
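\[
\begin{aligned}
q^S &= \alpha_p p + \alpha_z z + \varepsilon_S && \text{(Supply)} \\
q^D &= \beta_p p + \beta_y y + \varepsilon_D && \text{(Demand)}
\end{aligned}
\tag{4.3}
\]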
Both the demand and supply shocks are iid, normally distributed, with mean
zero, variances $\sigma_S^2$ and $\sigma_D^2$, and covariance $\rho_{SD}$. In total, this model has seven
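parameters. We solve for the reduced form by expressing the equilibrium variables
as functions of the exogenous variables y and z:
\[
\begin{aligned}
p^* &= \frac{\beta_y}{\alpha_p - \beta_p}\, y - \frac{\alpha_z}{\alpha_p - \beta_p}\, z + \frac{\varepsilon_D - \varepsilon_S}{\alpha_p - \beta_p} = A_1 y + A_2 z + U_1 \\
q^* &= \frac{\alpha_p \beta_y}{\alpha_p - \beta_p}\, y - \frac{\alpha_z \beta_p}{\alpha_p - \beta_p}\, z + \frac{\alpha_p \varepsilon_D - \beta_p \varepsilon_S}{\alpha_p - \beta_p} = B_1 y + B_2 z + U_2
\end{aligned}
\tag{4.4}
\]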
where A1, A2, B1 and B2 are the reduced form parameters. These parameters can
be consistently estimated from regressions using the reduced form. If the system
is identified, we are able to recover all the structural parameters from the reduced
form coefficients using:
\[
\begin{aligned}
\alpha_p &= B_1/A_1 & \beta_p &= B_2/A_2 \\
\beta_y &= A_1 (B_1/A_1 - B_2/A_2) & \alpha_z &= -A_2 (B_1/A_1 - B_2/A_2)
\end{aligned}
\tag{4.5}
\]
From these four parameters, it is straightforward to back out the variance of the
demand and supply shocks. We can compute εS = q − αp p − αz z and calculate the
empirical variance. The same procedure can be applied to recover εD.
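The two-step procedure can be sketched as follows: simulate equilibrium prices and quantities from (4.4), estimate the reduced form by OLS, then apply (4.5). The parameter values, the sample size, the independence of the two shocks and the absence of an intercept are all assumptions made for this illustration, and the function names are hypothetical.

```python
import random

def ols2(y, x1, x2):
    """OLS of y on x1 and x2 without intercept, via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_market(alpha_p, alpha_z, beta_p, beta_y, y, z, rng):
    """Equilibrium price and quantity from the reduced form (4.4).
    Assumption: unit-variance, uncorrelated supply and demand shocks."""
    d = alpha_p - beta_p
    p, q = [], []
    for yt, zt in zip(y, z):
        e_s, e_d = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p.append((beta_y * yt - alpha_z * zt + e_d - e_s) / d)
        q.append((alpha_p * beta_y * yt - alpha_z * beta_p * zt
                  + alpha_p * e_d - beta_p * e_s) / d)
    return p, q

def structural_from_reduced(A1, A2, B1, B2):
    """Mapping (4.5) from reduced form coefficients to structural parameters."""
    alpha_p = B1 / A1
    beta_p = B2 / A2
    beta_y = A1 * (alpha_p - beta_p)
    alpha_z = -A2 * (alpha_p - beta_p)
    return alpha_p, alpha_z, beta_p, beta_y

rng = random.Random(1)
T = 20000
y = [rng.gauss(0.0, 2.0) for _ in range(T)]  # income
z = [rng.gauss(0.0, 2.0) for _ in range(T)]  # weather
true_theta = (1.0, 0.5, -1.0, 0.8)           # hypothetical (alpha_p, alpha_z, beta_p, beta_y)
p, q = simulate_market(*true_theta, y, z, rng)

A1, A2 = ols2(p, y, z)  # reduced form regression for prices
B1, B2 = ols2(q, y, z)  # reduced form regression for quantities
estimate = structural_from_reduced(A1, A2, B1, B2)
print(estimate)
```

With both y and z varying, the mapping is one to one and the estimates should be close to the values used in the simulation.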
The estimation in two steps is essentially an instrumental variable estimation
where y and z are used as instruments for the endogenous variables p and q. Instead
of using a two-step OLS method, we can use a number of alternative methods
including method of moments, maximum likelihood and indirect inference. We
review these methods in turn.
Method of Moments
Denote by θ the vector of parameters describing the model:
θ = (αp, αz, βp, βy)
For simplicity, we assume that σD, σS and ρSD are known to the researcher. From
the data, we are able to compute a list of empirical moments which consists, for
example, of the variance of prices and quantities and the covariance between prices,
quantities, income and the weather. Denote µ = {µ1, µ2, µ3, µ4}′ a 4x1 vector of
empirical moments with 30
\[
\begin{aligned}
\mu_1 &= cov(p, y)/V(y) & \mu_3 &= cov(p, z)/V(z) \\
\mu_2 &= cov(q, y)/V(y) & \mu_4 &= cov(q, z)/V(z)
\end{aligned}
\tag{4.6}
\]
These moments can be computed directly from the data. For instance, µ1 can
be expressed as:
\[ \mu_1 = \frac{\sum_{t=1}^{T} (p_t - \bar{p})(y_t - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} \]
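From the model, we can derive the theoretical counterpart of these moments,
expressed as functions of the structural parameters. We denote these theoretical
moments µ(θ) = {µ1(θ), µ2(θ), µ3(θ), µ4(θ)}. Starting with the expressions in (4.4),
some straightforward algebra gives:
\[
\begin{aligned}
\mu_1(\theta) &= \frac{\beta_y}{\alpha_p - \beta_p} & \mu_3(\theta) &= -\frac{\alpha_z}{\alpha_p - \beta_p} \\
\mu_2(\theta) &= \frac{\alpha_p \beta_y}{\alpha_p - \beta_p} & \mu_4(\theta) &= -\frac{\alpha_z \beta_p}{\alpha_p - \beta_p}
\end{aligned}
\tag{4.7}
\]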
The basis of the method of moments estimation is that at the true value of the vector
of parameters,
\[ E(\mu_i(\theta) - \mu_i) = 0, \quad i = 1, \dots, 4 \]
This is called an orthogonality condition. In practical terms, we can bring the
moments from the model as close as possible to the empirical ones by solving:
\[ \theta^* = \arg\min_\theta L(\theta) = \arg\min_\theta\, (\mu - \mu(\theta))' \Omega (\mu - \mu(\theta)) \tag{4.8} \]
The ergodicity condition on the sample is the assumption used to make the empirical
and the theoretical moments the same as the sample size goes to infinity. Note that
this assumption is easily violated in many macroeconomic samples, as the data
are non-stationary. In practice, most macro data are first made stationary by
removing trends.
How do the results of (4.8) compare to the results in (4.5)? Note that with
our choice of moments, µ1(θ) = A1, µ2(θ) = B1, µ3(θ) = A2 and µ4(θ) = B2. At
the optimal value of the parameters, we are left with solving the same problem as
in (4.4). This would lead to exactly the same values for the parameters as in (4.5).
The method of moments approach collapses the two steps of the previous section into
a single one. The estimation of the reduced form and solving the non linear system
of equations is done within a single procedure.
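The equivalence between minimizing (4.8) and solving (4.5) can be checked with a small sketch. The empirical moment values below are made up for illustration, Ω is taken to be the identity matrix (an assumption), and the function names are ours.

```python
def theoretical_moments(theta):
    """Model moments (4.7) as functions of theta = (alpha_p, alpha_z, beta_p, beta_y)."""
    alpha_p, alpha_z, beta_p, beta_y = theta
    d = alpha_p - beta_p
    return [beta_y / d, alpha_p * beta_y / d, -alpha_z / d, -alpha_z * beta_p / d]

def mm_objective(theta, mu_hat):
    """Criterion (4.8) with Omega set to the identity matrix (an assumption)."""
    return sum((m - mt) ** 2 for m, mt in zip(mu_hat, theoretical_moments(theta)))

def invert_moments(mu_hat):
    """Closed-form solution, using mu1 = A1, mu2 = B1, mu3 = A2, mu4 = B2 in (4.5)."""
    m1, m2, m3, m4 = mu_hat
    alpha_p = m2 / m1
    beta_p = m4 / m3
    return (alpha_p, -m3 * (alpha_p - beta_p), beta_p, m1 * (alpha_p - beta_p))

mu_hat = [0.41, 0.39, -0.26, 0.24]  # hypothetical empirical moments
theta_star = invert_moments(mu_hat)
print(theta_star, mm_objective(theta_star, mu_hat))

# Moving away from theta_star increases the criterion.
perturbed = tuple(v + 0.1 for v in theta_star)
print(mm_objective(perturbed, mu_hat))
```

Because there are four moments and four parameters, the criterion reaches zero at the closed-form solution, whatever the weighting matrix.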
Could we choose other moments to estimate the structural parameters? As in
Example 4.4, the answer is both yes and no. The moments must be informative of
the parameters of the model.
For instance, if we choose µ1 = E(z), the average value of weather, this moment
is independent of the parameterization of the model, as z is an exogenous variable.
Hence, we are in fact left to estimate four parameters with only three identifying
equations. Any moment involving an endogenous variable (p or q in our example)
can be used in the estimation and would asymptotically produce the same results.
With a finite number of observations, higher order moments are not very precisely
computed, so an estimation based on $cov(p^4, y)$, say, would not be very efficient.
Finally, note that when computing the moments in (4.7), we have not used the
assumption that the error terms εD and εS are normally distributed. Whatever their
joint distribution, (4.8) would give a consistent estimate of the four parameters
of interest. The next section presents the maximum likelihood estimation which
assumes the normality of the residuals.
Maximum Likelihood
The likelihood of observing jointly a given price p and a quantity q, conditional on
income and weather can be derived from the reduced form (4.4) as f (p − A1y −
A2z, q − B1y − B2z) where f (., .) is the joint density of the disturbances U1 and U2
and where A1, A2, B1, B2 are defined as in (4.4).
The likelihood of the entire sample is thus:
\[ \mathcal{L}(\theta) = \prod_{t=1}^{T} f(p_t - A_1 y_t - A_2 z_t,\; q_t - B_1 y_t - B_2 z_t) \tag{4.9} \]
We assume here that εD and εS are normally distributed, so U1 and U2 are also
normally distributed with zero mean. 31 The maximization of the likelihood function
with respect to the reduced form coefficients is a straightforward exercise. It will
give asymptotically consistent estimates of A1, A2, B1 and B2. Given that there is
a one to one mapping between the reduced form and the structural parameters, the
estimation will also provide consistent estimates of the parameters αp, βp, αz and
βy as in the method of moments case.
Indirect Inference
For a given value of the parameters, we are able to draw supply and demand
shocks from their distribution and to simulate artificial data for prices and
quantities, conditional on observed weather and income. This is done using
expression (4.4). Denote the observed data as $\{q_t, p_t, y_t, z_t\}_{t=1}^{T}$. Denote the simulated data
as $\{q_t^s, p_t^s\}_{t=1,\dots,T,\; s=1,\dots,S}$. Denote the set of parameters of the structural system (4.3)
as θ = {αp, αz, βp, βy}. For simplicity, we assume that the parameters σD, σS and
ρSD are known.
Next, we need an auxiliary model which is simple to estimate. We could use the
system (4.3) as this auxiliary model. For both the observed and the simulated data,
we can regress the quantities on the prices and the income or the weather. Denote
the first set of auxiliary estimates $\hat\psi_T$ and the second one $\tilde\psi_T^s$, $s = 1, \dots, S$. These
vectors contain an estimate for the effect of prices on quantities and the effect of
weather and income on quantity from both the supply and the demand equations.
These estimates will undoubtedly be biased given the simultaneous nature of the
system. However, we are interested in these auxiliary parameters only as a means
to get to the structural ones (θ). The next step is to find θ which brings the vector
$\tilde\psi_T^S = \frac{1}{S} \sum_{s=1}^{S} \tilde\psi_T^s(\theta)$ as close as possible to $\hat\psi_T$. Econometric theory tells us that
this will produce a consistent estimate of the parameters of interest, αp, αz, βp, βy.
Again, we rely here on the assumption of ergodicity. As will become apparent in
section 4.3.3, the estimator will be less efficient than maximum likelihood or the
method of moments, unless one relies on a very large number of simulations.
Non Identification
If the weather has no influence on supply, i.e. αz = 0, then the reduced form
equations express p∗ and q∗ as a function of income and shocks only. In
this case, the system is under-identified. We can only recover part of the original
parameters:
\[ \alpha_p = B_1/A_1, \qquad \sigma_S^2 = V\left( q - \frac{B_1}{A_1}\, p \right) \]
Further manipulations give:
\[ \beta_y = B_1 - A_1 \beta_p \tag{4.10} \]
There is an infinity of pairs {βy, βp} that satisfy the above equality. Hence, we
cannot recover the true values for these two parameters. From (4.10), it is easy to
visualize that there is an identification problem.
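A quick numerical check makes the point concrete. In the sketch below (parameter values are hypothetical), two different demand-side pairs that both satisfy (4.10) produce exactly the same reduced form coefficients A1 and B1 when αz = 0, so no amount of data on p, q and y can tell them apart.

```python
def reduced_form_income(alpha_p, beta_p, beta_y):
    """Reduced form coefficients on income, (A1, B1), from (4.4) when alpha_z = 0."""
    d = alpha_p - beta_p
    return beta_y / d, alpha_p * beta_y / d

alpha_p = 1.0                       # hypothetical supply slope
A1, B1 = reduced_form_income(alpha_p, beta_p=-1.0, beta_y=0.8)

# Pick another beta_p and set beta_y from (4.10): beta_y = B1 - A1 * beta_p.
beta_p_alt = -3.0
beta_y_alt = B1 - A1 * beta_p_alt
A1_alt, B1_alt = reduced_form_income(alpha_p, beta_p_alt, beta_y_alt)

print((A1, B1), (A1_alt, B1_alt))   # identical reduced forms, different structures
```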
When the estimation involves moment matching or maximization of a likelihood
function, non-identification is not always straightforward to spot. Some estimation
routines will provide an estimate for the parameters whether the system is identified
or not. There is no reason that these estimates coincide with the true values, as
many sets of parameter values will satisfy the first order conditions of (4.8). If
the estimation routine is based on a gradient calculation, finding the minimum of
a function requires calculating and inverting the Hessian of the criterion function
L(θ). If αz = 0 the Hessian will not be of full rank, as the cross derivatives of L with
respect to αz and the other parameters will be zero. Hence one should be suspicious
about the results when numerical problems occur, such as invertibility problems. As
the Hessian matrix enters the calculation of the standard errors, a common sign is
also abnormally imprecise coefficients. If the estimation routine is not based on
gradients (the simplex algorithm for instance), the problem will be more difficult
to spot, as the estimation routine will come up with an estimate. However, these
results will usually look strange, with some coefficients taking absurdly large values.
Moreover, the estimation results will be sensitive to the choice of initial values.
Exercise 4.3
Build a computer program which creates a data set of prices and quantities
using (4.4), given values for z and y. Use this program to create a data set of size
T , the "true data set", and then to construct a simulated data set of size S. Next,
construct the objective function for the indirect inference case as suggested in
section 4.3.3. What happens when you set αz to zero?
4.3 Estimation Methods and Asymptotic Properties
This section presents in detail the methods discussed in the previous section. The
asymptotic properties of each estimator are presented. We review the generalized
method of moments, which encompasses most of the classic estimation methods such
as maximum likelihood or non linear least squares. We then present methods using
simulations. All the methods are illustrated using simple Dynamic Programming
models, such as the cake eating problem which has been seen in chapters 2 and 3.
In the following subsections, we assume that there is a "true" model, x(ut, θ),
parameterized by a vector θ of dimension κ. ut is a shock which makes the model
probabilistic. For instance, the shock ut can be a taste shock, a productivity shock
or a measurement error. We observe a sequence of data generated by this model at
the "true" value of the parameters, which we denote by θ0 and at the "true" value
of the shocks $u_t^0$. Let $\{x(u_t^0, \theta_0)\}_{t=1}^{T}$ be the observed data, which we also denote as
$\{x_t\}_{t=1}^{T}$ for simplicity.32 We are interested in recovering an estimate of θ0 from the
observed data and making statistical inferences.
4.3.1 Generalized Method of Moments
The method of moments presented briefly in Section 4.2 minimized the distance
between an empirical moment and the predicted one. This exploits the fact that on
average, the difference between the predicted and the observed series (or a function
of these series) should be close to zero at the true value of the parameter θ0. Denote
this difference as h(θ, xt), so:
\[ E(h(\theta_0, x_t)) = 0 \tag{4.11} \]
This identifying equality is called an orthogonality restriction. Denote the sample
average of h(θ, xt):
\[ g(\theta) = \frac{1}{T} \sum_{t=1}^{T} h(\theta, x_t) \]
An estimate of θ can be found as:
\[ \hat\theta = \arg\min_\theta Q(\theta) = \arg\min_\theta\, g(\theta)' W_T^{-1} g(\theta) \]
$W_T^{-1}$ is a weighting matrix, which might depend on the data, hence the T subscript.
If g(θ) is of size q×1, then $W_T^{-1}$ is of size q×q.
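For instance, if we want to match the first two moments of the process {xt}, the
function h() can be written:
\[ h(\theta, x_t) = \begin{pmatrix} x_t(\theta) - x_t \\ x_t(\theta)^2 - x_t^2 \end{pmatrix} \]
Averaging this vector over the sample will yield $g(\theta) = \left( \bar{x}(\theta) - \bar{x},\; \overline{x(\theta)^2} - \overline{x^2} \right)$.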
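As a minimal sketch of this construction, suppose (purely as an assumption for illustration) that the model predicts a first moment m and a second moment m² + s², with θ = (m, s²), and that the weighting matrix is the identity. The functions `g` and `Q` below are our own names for the sample moment vector and the criterion.

```python
import random

def g(theta, data):
    """Sample average of h(theta, x_t): predicted minus observed first and second moments.
    Assumed model: E[x] = m and E[x^2] = m^2 + s2, with theta = (m, s2)."""
    m, s2 = theta
    T = len(data)
    g1 = sum(m - x for x in data) / T
    g2 = sum((m * m + s2) - x * x for x in data) / T
    return g1, g2

def Q(theta, data):
    """GMM criterion g' W^{-1} g with W set to the identity (an assumption)."""
    g1, g2 = g(theta, data)
    return g1 * g1 + g2 * g2

rng = random.Random(3)
data = [rng.gauss(1.0, 2.0) for _ in range(2000)]  # hypothetical observed series

# With two moments and two parameters the model is just identified,
# so the minimizer sets both sample moments exactly to zero.
x_bar = sum(data) / len(data)
x2_bar = sum(x * x for x in data) / len(data)
theta_hat = (x_bar, x2_bar - x_bar ** 2)
print(theta_hat, Q(theta_hat, data), Q((0.0, 1.0), data))
```

In richer models the minimizer has no closed form and Q(θ) is passed to a numerical optimizer instead.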
Economic theory often provides more restrictions which can be used in the es-
timation method. They often take the form of first order conditions, such as Euler
equations, which can be used as an orthogonality restriction as in (4.11). This is
the intuition that guided the Hansen and Singleton (1982) study of consumption
that we discuss in detail in Chapter 6, section 6.3.3. Here we summarize that with
an example.
Example 4.5
In a standard intertemporal model of consumption with stochastic income and no
borrowing constraints, the first order condition gives:
\[ u'(c_t) = \beta R E_t u'(c_{t+1}) \]
One can use this restriction to form $h(\theta, c_t, c_{t+1}) = [u'(c_t) - \beta R u'(c_{t+1})]$, where θ
parameterizes the utility function. On average, $h(\theta, c_t, c_{t+1})$ should be close to zero
at the true value of the parameter. The Euler equation above actually brings more
information than we have used so far. Not only should the difference between the
marginal utility in periods t and t + 1 be close to zero, but it should also be orthogonal
to information dated t. Suppose zt is a variable which belongs to the information set
at date t. Then the first order condition also implies that, on average,
$h(\theta, c_t, c_{t+1}, z_t) = z_t \cdot [u'(c_t) - \beta R u'(c_{t+1})]$ should be close to zero at the true value of the parameter.
If we have more than one zt variable, then we can exploit as many orthogonality
restrictions.
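The sketch below illustrates these orthogonality restrictions on artificial data. Several choices are assumptions made for the illustration and are not taken from the text: utility is CRRA with u'(c) = c^(−γ), the residual is divided by u'(c_t) (which is known at date t, so orthogonality is preserved), consumption growth is iid lognormal with a mean chosen so that the Euler equation holds at γ = 2, and the instrument is lagged consumption growth.

```python
import math
import random

def euler_moments(gamma, beta, R, growth):
    """Sample averages of the normalized Euler residual 1 - beta*R*(c_{t+1}/c_t)^(-gamma),
    alone and multiplied by an instrument dated t (lagged log consumption growth)."""
    resid = [1.0 - beta * R * gr ** (-gamma) for gr in growth]
    T = len(resid) - 1
    g_const = sum(resid[1:]) / T
    g_instr = sum(math.log(growth[t - 1]) * resid[t] for t in range(1, T + 1)) / T
    return g_const, g_instr

# Hypothetical data generating process for consumption growth.
beta, R, gamma0, s = 0.95, 1.05, 2.0, 0.05
m = (math.log(beta * R) + 0.5 * gamma0 ** 2 * s ** 2) / gamma0  # makes the Euler equation hold
rng = random.Random(4)
growth = [math.exp(rng.gauss(m, s)) for _ in range(20000)]

g_true = euler_moments(gamma0, beta, R, growth)  # moments at the true curvature
g_wrong = euler_moments(6.0, beta, R, growth)    # moments at a wrong curvature
print(g_true, g_wrong)
```

At the true γ both sample moments should be close to zero, while a wrong γ leaves a systematic gap; a GMM routine would search for the γ that makes the stacked moments as small as possible.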
For further examples, we refer the reader to section 8.4.3.
Asymptotic Distribution:
Let θ̂T be the GMM estimate, i.e. the solution to (4.3.1). Under regularity conditions
(see Hansen (1982)):
• θ̂T is a consistent estimator of the true value θ0.
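• The GMM estimator is asymptotically normal:
\[ \sqrt{T} (\hat\theta_T - \theta_0) \xrightarrow{d} N(0, \Sigma) \]
where $\Sigma = (D W_\infty^{-1} D')^{-1}$ and where
\[ D' = \operatorname*{plim}_{T} \left\{ \left. \frac{\partial g(\theta, Y_T)}{\partial \theta'} \right|_{\theta = \theta_0} \right\} \]
The empirical counterpart of D is:
\[ \hat{D}_T' = \left. \frac{\partial g(\theta, Y_T)}{\partial \theta'} \right|_{\theta = \hat\theta_T} \]
This means that asymptotically, one can treat the GMM estimate θ̂T as a normal
variable with mean θ0 and variance Σ̂/T :
\[ \hat\theta_T \sim N(\theta_0, \hat\Sigma / T) \]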
Note that the asymptotic properties of the GMM estimator are independent of
the distribution of the error term in the model. In particular, one does not have to
assume normality.
Optimal Weighting Matrix
We have not discussed the choice of the weighting matrix $W_T^{-1}$ so far. The choice
of the weighting matrix does not have any bearing on the convergence of the GMM
estimator to the true value. However, a judiciously chosen weighting matrix can
minimize the asymptotic variance of the estimator. It can be shown that the optimal
weighting matrix $W_T^*$ produces the estimator with the smallest variance. It is defined
as:
\[ W_\infty^* = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{l=-\infty}^{\infty} h(\theta_0, y_t)\, h(\theta_0, y_{t-l})' \]
Empirically, one can replace $W_\infty^*$ by a consistent estimator of this matrix, $\hat{W}_T^*$:
\[ \hat{W}_T^* = \Gamma_{0,T} + \sum_{\nu=1}^{q} \left( 1 - \frac{\nu}{q+1} \right) (\Gamma_{\nu,T} + \Gamma_{\nu,T}') \]
with
\[ \Gamma_{\nu,T} = \frac{1}{T} \sum_{t=\nu+1}^{T} h(\hat\theta, y_t)\, h(\hat\theta, y_{t-\nu})' \]
which is the Newey-West estimator (see Newey and West (1987) for a more detailed
exposition).
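The two formulas above translate directly into code. The sketch below is a plain-Python transcription (no matrix library), where `h` is a list of T moment vectors already evaluated at the estimated parameter and `q` is the number of lags; the function names and the tiny input series are ours.

```python
def autocov(h, nu):
    """Gamma_{nu,T} = (1/T) * sum_{t=nu+1}^{T} h_t h_{t-nu}' for a list of moment vectors h."""
    T, k = len(h), len(h[0])
    G = [[0.0] * k for _ in range(k)]
    for t in range(nu, T):
        for i in range(k):
            for j in range(k):
                G[i][j] += h[t][i] * h[t - nu][j] / T
    return G

def newey_west(h, q):
    """W = Gamma_0 + sum_{nu=1}^{q} (1 - nu/(q+1)) * (Gamma_nu + Gamma_nu')."""
    k = len(h[0])
    W = autocov(h, 0)
    for nu in range(1, q + 1):
        weight = 1.0 - nu / (q + 1)
        G = autocov(h, nu)
        for i in range(k):
            for j in range(k):
                W[i][j] += weight * (G[i][j] + G[j][i])
    return W

# Tiny scalar example: a perfectly alternating moment series.
h = [[1.0], [-1.0], [1.0], [-1.0]]
print(newey_west(h, q=1))

# A two-dimensional example; the result is symmetric by construction.
h2 = [[1.0, 0.5], [0.2, -0.3], [-0.7, 0.1], [0.4, 0.9]]
W2 = newey_west(h2, q=2)
print(W2)
```

The declining weights 1 − ν/(q + 1) guarantee that the estimated matrix is positive semi-definite, which is needed before inverting it to form the weighting matrix.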
Overidentifying Restrictions
If the number of moments q is larger than the number of parameters to estimate κ,
then the system is overidentified. One would only need κ restrictions to estimate θ.
The remaining restrictions can be used to evaluate the model. Under the null that
the model is the true one, these additional moments should be empirically close to
zero at the true value of the parameters. This forms the basis of a specification test:
\[ T\, g(\hat\theta_T)' \hat{W}_T^{-1} g(\hat\theta_T) \xrightarrow{L} \chi^2(q - \kappa) \]
In practice, this test is easy to compute, as one has to compare T times the criterion
function evaluated at the estimated parameter vector to a chi-square critical value.
Link with Other Estimation Methods
The generalized method of moments is quite a general estimation method. It actually
encompasses most estimation methods, such as OLS, non linear least squares, instrumental
variables or maximum likelihood, by choosing an adequate moment restriction. For
instance, the OLS estimator is defined such that the right hand side variables are not
correlated with the error term, which provides a set of orthogonality restrictions that
can be used in a GMM framework. In a linear model, the GMM estimator defined
this way is also the OLS estimator. The instrumental variable method exploits the
fact that an instrument is orthogonal to the residual.
4.3.2 Maximum Likelihood
In contrast to the GMM approach, the maximum likelihood strategy requires an
assumption on the distribution of the random variables. Denote by f (xt, θ) the
probability of observing xt given a parameter θ. The estimation method tries to
maximize the likelihood of observing a sequence of data X = {x1, . . . , xT }. Assum-
ing iid shocks, the likelihood for the entire sample is:
\[ L(X, \theta) = \prod_{t=1}^{T} f(x_t, \theta) \]
It is easier to maximize the log of the likelihood
\[ l(X, \theta) = \sum_{t=1}^{T} \log f(x_t, \theta) \]
Example 4.6
Consider the cake eating problem, defined by the Bellman equation below, where W
is the size of the cake, ρ is a shrink factor and ε is an iid shock to preferences:
V (W, ε) = max [εu(W ), EV (ρW, ε′)]
V (.) represents the value of having a cake of size W , given the realization of the
taste shock ε. The equation above states that the individual is indifferent between
consuming the cake and waiting if the shock is ε∗(W, θ) = EV (ρW, ε′)/u(W ), where
θ is a vector of parameters describing preferences, the distribution of ε and the shrink
factor ρ. If ε > ε∗(W, θ), then the individual will consume the cake. ε∗(W, θ) has
no analytical expression, but can be solved numerically with the tools developed in
Chapter 3. The probability of not consuming a cake of size W in a given period is
then:
P (ε < ε∗(W, θ)) = F (ε∗(W, θ))
where F is the cumulative distribution function of the shock ε. The likelihood of observing an
individual i consuming a cake after t periods is then:
\[ l_i(\theta) = \left( 1 - F(\varepsilon^*(\rho^t W_1, \theta)) \right) \prod_{l=1}^{t-1} F(\varepsilon^*(\rho^l W_1, \theta)) \]
Suppose we observe the stopping time for N individuals. Then the likelihood of the
sample is:
\[ L(\theta) = \prod_{i=1}^{N} l_i(\theta) \]
The maximization of the likelihood with respect to θ gives the estimate, θ̂.
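The structure of this likelihood can be sketched in code once a threshold function is available. In the sketch below the threshold ε∗(W, θ) is NOT obtained by solving the Bellman equation: `toy_threshold` is a hypothetical stand-in used only to show how the likelihood is assembled, and the normal distribution assumed for ε, the initial cake size and the durations are likewise illustrative.

```python
import math
from statistics import NormalDist

def stopping_likelihood(t, W1, rho, threshold, F):
    """l_i: probability of waiting in periods 1..t-1 and eating the cake in period t."""
    like = 1.0 - F(threshold(rho ** t * W1))
    for l in range(1, t):
        like *= F(threshold(rho ** l * W1))
    return like

def sample_log_likelihood(durations, W1, rho, threshold, F):
    """Log of L(theta): sum of the log individual likelihoods over observed stopping times."""
    return sum(math.log(stopping_likelihood(t, W1, rho, threshold, F))
               for t in durations)

# Hypothetical stand-ins: in an application the threshold would come from the
# numerical solution of the dynamic programming problem for each value of theta.
def toy_threshold(W):
    return 1.0 / W

F = NormalDist(mu=1.0, sigma=0.5).cdf  # assumed distribution of the taste shock
durations = [1, 2, 2, 3, 1, 4]         # hypothetical observed stopping times

for rho in (0.80, 0.90, 0.95):
    print(rho, sample_log_likelihood(durations, W1=1.0, rho=rho,
                                     threshold=toy_threshold, F=F))
```

An estimation routine would wrap `sample_log_likelihood` in an optimizer over θ, re-solving for the threshold at each candidate value.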
For additional examples, we refer the reader to the second part of the book, and in
particular, section 5.5.4.
Exercise 4.4
Use the stochastic cake eating problem to simulate some data. Construct the
likelihood of the sample and plot it against different possible values for ρ.
Asymptotic Properties
To derive the asymptotic properties of the maximum likelihood estimator, it is
convenient to notice that the maximum likelihood can be seen as a GMM procedure.
The first order condition for the maximum of the log likelihood function is:
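\[ \sum_{t=1}^{T} \frac{\partial \log f(x_t, \theta)}{\partial \theta} = 0 \]
This orthogonality condition can be used as a basis for a GMM estimation, where
$h(\theta, x_t) = \partial \log f(x_t, \theta)/\partial \theta$. The first derivative of the log likelihood function is also
called the score function.
Using the GMM formula, the covariance matrix is $(\hat{D}_T \hat{S}_T^{-1} \hat{D}_T')^{-1}$, with
\[ \hat{D}_T' = \left. \frac{\partial g(\theta)}{\partial \theta'} \right|_{\theta = \hat\theta_T} = \frac{1}{T} \sum_{t=1}^{T} \frac{\partial^2 \log f(x_t, \theta)}{\partial \theta\, \partial \theta'} = -I \]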
where I is also known as the information matrix, i.e. minus the second derivative
of the log likelihood function.
\[ \hat{S}_T = \frac{1}{T} \sum_{t=1}^{T} h(x_t, \hat\theta_T)\, h(x_t, \hat\theta_T)' = I \]
So, we get:
\[ \sqrt{T} (\hat\theta_T - \theta_0) \xrightarrow{L} N(0, I^{-1}) \]
The maximum likelihood estimator is asymptotically normal, with mean θ0 and
a variance equal to $I^{-1}/T$.
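The coin-flipping model gives a simple check of these formulas. In the sketch below the parameter is the probability of heads p itself (a simplifying assumption, as are the sample size and the value of `p_true`); the score and the second derivative of the log likelihood are written out by hand, and the outer product of the score and minus the average Hessian are both compared with the information 1/(p(1 − p)).

```python
import random

def score(x, p):
    """Derivative of log f(x, p) = x*log(p) + (1 - x)*log(1 - p) with respect to p."""
    return x / p - (1 - x) / (1 - p)

def hessian(x, p):
    """Second derivative of log f(x, p) with respect to p."""
    return -x / p ** 2 - (1 - x) / (1 - p) ** 2

rng = random.Random(2)
p_true = 0.3  # hypothetical probability of heads
data = [1 if rng.random() < p_true else 0 for _ in range(4000)]
T = len(data)

p_hat = sum(data) / T                               # ML estimate: fraction of heads
S_hat = sum(score(x, p_hat) ** 2 for x in data) / T # outer product of the score
D_hat = sum(hessian(x, p_hat) for x in data) / T    # average Hessian, equals -I
info = 1.0 / (p_hat * (1.0 - p_hat))                # information for one observation
std_err = (1.0 / (info * T)) ** 0.5                 # square root of I^{-1}/T

print(p_hat, S_hat, -D_hat, info, std_err)
```

In this example the information matrix equality holds exactly in the sample, so the standard error reduces to the familiar sqrt(p̂(1 − p̂)/T).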
4.3.3 Simulation Based Methods
We review here estimation methods based on simulation. This field is a growing one
and we will concentrate on only a few methods. For a more in depth presentation
of these methods, we refer the reader to Gourieroux and Monfort (1996) and Pakes
and Pollard (1989), McFadden (1989), Laroque and Salanié (1989) or McFadden
and Ruud (1994) (see also Lerman and Manski (1981) for an early reference).
These methods are often used because the moments are too difficult to compute
(e.g. multiple integrals in multinomial probits as in McFadden
(1989) or Hajivassiliou and Ruud (1994)), or because the model includes a latent
(unobserved) variable as in Laroque and Salanié (1993). Or, it might be that the
model M (θ) has no simple analytic representation so that the mapping from the
parameters to moments must be simulated.
Example 4.7
Consider the cake eating problem studied in section 4.3.2, but where the taste shocks
ε are serially correlated. The Bellman equation is expressed as:
\[ V(W, \varepsilon) = \max \left[ \varepsilon u(W),\; E_{\varepsilon'|\varepsilon} V(\rho W, \varepsilon') \right] \]
Here the expectations operator indicates that the expectation of next period’s shock
depends on the realization of the current shock. We can still define the threshold
shock $\varepsilon^*(W) = E_{\varepsilon'|\varepsilon^*} V(\rho W, \varepsilon')/u(W)$, for which the individual is indifferent between
eating and waiting. The probability of waiting t periods to consume the cake can be
written as:
\[ P_t = P\left( \varepsilon_1 < \varepsilon^*(W_1),\; \varepsilon_2 < \varepsilon^*(\rho W_1),\; \dots,\; \varepsilon_t > \varepsilon^*(\rho^t W_1) \right) \]
In section 4.3.2, the shocks were iid, and this probability could easily be decomposed
into a product of t terms. If ε is serially correlated, then this probability is extremely
difficult to write as εt is correlated with all the previous shocks.33 For t periods, we
have to solve a multiple integral of order t, which conventional numerical methods
of integration cannot handle. In this section, we will show how simulated methods
can overcome this problem to provide an estimate of θ.
The different methods can be classified into two groups. The first group of
methods compares a function of the observed data to a function of the simulated
data. Here the average is taken both on the simulated draws and on all observations
in the original data set at once. This approach is called moment calibration. It
includes the simulated method of moments and indirect inference.
The second set of methods compares the observed data, observation by observation,
to an average of the simulated predicted data, where the average is taken over
the simulated shocks. This is called path calibration. Simulated non linear least
squares or maximum likelihood fall into this category.
The general result is that path calibration methods require the number of sim-
ulations to go to infinity to achieve consistency. In contrast, moment calibration
methods are consistent for a fixed number of simulations.
Simulated Method of Moments
Definition: This method was first developed by McFadden (1989), Lee and Ingram
(1991) and Duffie and Singleton (1993). Let $\{x(u_t, \theta_0)\}_{t=1}^{T}$ be a sequence of
observed data. Let $\{x(u_t^s, \theta)\}$, $t = 1, \dots, T$, $s = 1, \dots, S$, or $x_t^s(\theta)$ for short, be a set
of S series of simulated data, each of length T , conditional on a vector of parameters
θ. The simulations are done by fixing θ and by using the T S draws of the shocks
$u_t^s$ (drawn once and for all). Denote by µ(xt) a vector of functions of the observed
data.34 The estimator for the SMM is defined as:
\[ \hat\theta_{S,T}(W) = \arg\min_\theta \left[ \sum_{t=1}^{T} \left( \mu(x_t) - \frac{1}{S} \sum_{s=1}^{S} \mu(x(u_t^s, \theta)) \right) \right]' W_T^{-1} \left[ \sum_{t=1}^{T} \left( \mu(x_t) - \frac{1}{S} \sum_{s=1}^{S} \mu(x(u_t^s, \theta)) \right) \right] \]
This criterion is similar to the one presented for the method of moments in
section 4.2.1. The difference is that we can avoid the calculation of the theoretical
moments µ(xt(θ)) directly. Instead, we are approximating them numerically with
simulations.
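A compact sketch of the SMM criterion follows. Everything specific in it is an assumption chosen for illustration: the model x(u, θ) = max(0, θ + u) with standard normal shocks, the two matched moments (mean and mean of squares), the identity weighting matrix, averages in place of sums (which only rescales the criterion), and a grid search instead of a numerical optimizer.

```python
import random

def model(u, theta):
    """Hypothetical model x(u, theta): a censored outcome."""
    return max(0.0, theta + u)

def moments(xs):
    """mu(x): first and second moments of a series."""
    n = len(xs)
    return [sum(xs) / n, sum(x * x for x in xs) / n]

def smm_objective(theta, mu_data, shocks):
    """Distance between data moments and moments averaged over the S simulated series,
    with an identity weighting matrix (an assumption)."""
    S = len(shocks)
    mu_sim = [0.0, 0.0]
    for series in shocks:
        m = moments([model(u, theta) for u in series])
        mu_sim = [a + b / S for a, b in zip(mu_sim, m)]
    return sum((d - s) ** 2 for d, s in zip(mu_data, mu_sim))

rng = random.Random(5)
theta0, T, S = 0.5, 2000, 5
data = [model(rng.gauss(0.0, 1.0), theta0) for _ in range(T)]
mu_data = moments(data)

# The T*S shocks are drawn once and for all and reused for every candidate theta.
shocks = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(S)]

grid = [i / 50 for i in range(-50, 101)]
theta_hat = min(grid, key=lambda th: smm_objective(th, mu_data, shocks))
print("SMM estimate of theta:", theta_hat)
```

Holding the shocks fixed across values of θ keeps the criterion a deterministic function of θ, which is what makes its minimization well behaved.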
Example 4.8
We use here the cake example with serially correlated shocks. Suppose we have
a data set of T cake eaters for which we observe the duration of their cake Dt,
t = 1, . . . , T .
Given a vector of parameters θ which describes preferences and the process of ε,
we can solve the model numerically and compute the thresholds ε∗(W ). Next, we
can simulate a series of shocks and determine the duration for this particular draw
of the shocks. We can repeat this step in order to construct S data sets, each
containing T simulated durations.
To identify the parameters of the model, we can for instance use the mean du-
ration and the variance of the duration. Both of these moments would be calculated
from the observed data set and the artificial ones. If we want to identify more than
two parameters, we can try to characterize the distribution of the duration better and
include the fraction of cakes eaten at the end of the first, second and third period for
instance.
For further examples, we refer the reader to the second part of the book, and in
particular to section 6.3.6 and section 7.3.3.
Exercise 4.5
Construct a computer program to implement the approach outlined in Exam-
ple 4.8. First, use as moments the mean and the variance of the duration. Increase
then the number of moments using also the fraction of cakes eaten after the first and
second period. As the model is overidentified, test the overidentification restrictions.
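Properties: When the number of simulations S is fixed and T −→ ∞,
• $\hat\theta_{ST}(W)$ is consistent.
• $\sqrt{T} (\hat\theta_{ST} - \theta_0) \longrightarrow N(0, Q_S(W))$
where
\[ Q_S(W) = \left( 1 + \frac{1}{S} \right) \left[ E_0 \frac{\partial \mu'}{\partial \theta} W_T^{-1} \frac{\partial \mu}{\partial \theta'} \right]^{-1} E_0 \frac{\partial \mu'}{\partial \theta} W_T^{-1} \Sigma(\theta_0) W_T^{-1} \frac{\partial \mu}{\partial \theta'} \left[ E_0 \frac{\partial \mu'}{\partial \theta} W_T^{-1} \frac{\partial \mu}{\partial \theta'} \right]^{-1} \]
where Σ(θ0) is the covariance matrix of $\frac{1}{\sqrt{T}} \left( \frac{1}{T} \sum_{t=1}^{T} \left( \mu(x_t) - E_0 \mu(x_t^s(\theta)) \right) \right)$.
The optimal SMM is obtained when $\hat{W}_T = \hat\Sigma_T$. In this case,
\[ Q_S(W^*) = \left( 1 + \frac{1}{S} \right) \left[ E_0 \frac{\partial \mu'}{\partial \theta} W^{*-1} \frac{\partial \mu}{\partial \theta'} \right]^{-1} \]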
When S tends to infinity, the covariance
matrix of the estimator converges to the covariance matrix of the standard GMM
estimator.
In practice, the optimal weighting matrix can be estimated by:
Ŵ ∗T =
1
T
T∑
t=1
[
µ(xt) −
1
S
S∑
s=1
µ(xst (θ̂ST ))
]
.
[
µ(xt) −
1
S
S∑
s=1
µ(xst (θ̂ST ))
]′
+
1
S
1
T
T∑
t=1
[
µ(xst (θ̂ST )) −
1
L
L∑
l=1
µ(xlt(θ̂ST ))
]
.
[
µ(xst (θ̂ST )) −
1
L
L∑
l=1
µ(xlt(θ̂ST ))
]′
where $x_t^s(\theta)$ and $x_t^l(\theta)$ are simulations generated by independent draws from the
density of the underlying shock. $\hat{W}_T^*$ is a consistent estimate of $W_\infty^*$ for $T \to \infty$
and $L \to \infty$. Note that the SMM requires a large number of simulations to
compute the standard errors of the estimator, even though the estimator is consistent for
a fixed number of simulations.
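The simulated method of moments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cake eating model of Example 4.8: the structural model is a hypothetical exponential duration model with hazard theta, the two moments are the mean and the variance of the duration, the weighting matrix is the identity, and the minimization is a plain grid search. All function names are made up for the example.

```python
import random

def moments(x):
    """Return the mean and the variance of a list of durations."""
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / len(x)
    return m, v

def simulate_durations(theta, unit_draws):
    # Hypothetical model: exponential durations with hazard theta,
    # obtained by rescaling unit-exponential shocks.
    return [e / theta for e in unit_draws]

def smm_objective(theta, data_moments, draws_by_sim):
    # Average the simulated moments over the S artificial data sets.
    sims = [moments(simulate_durations(theta, d)) for d in draws_by_sim]
    S = len(sims)
    sim_mean = sum(m for m, _ in sims) / S
    sim_var = sum(v for _, v in sims) / S
    g_mean = data_moments[0] - sim_mean
    g_var = data_moments[1] - sim_var
    return g_mean ** 2 + g_var ** 2  # identity weighting matrix (assumption)

def smm_estimate(data, S=5, seed=0):
    rng = random.Random(seed)
    # Draw the shocks once and reuse them for every candidate theta.
    draws = [[rng.expovariate(1.0) for _ in data] for _ in range(S)]
    data_moments = moments(data)
    grid = [0.5 + 0.05 * i for i in range(71)]  # theta in [0.5, 4.0]
    return min(grid, key=lambda th: smm_objective(th, data_moments, draws))
```

Holding the simulated shocks fixed across candidate values of theta keeps the objective smooth in theta. With two moments and one parameter the toy model is overidentified, so the identity matrix could be replaced by an estimate of the optimal weighting matrix.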
Simulated Non Linear Least Squares
Definition: We could estimate the parameters $\theta$ by matching, at each period, the
observation $x_t$ with the prediction of the model $x(u_t^s, \theta)$, where $u_t^s$ is a particular
draw for the shock. There are two reasons why the predicted data would not match
the observed one. First, we might evaluate the model at an incorrect parameter
point (i.e. $\theta \neq \theta_0$). Second, the "true" shock $u_t^0$ is unobserved, so replacing it
with a random draw $u_t^s$ would lead to a discrepancy. In trying to minimize the
distance between these two objects, we would not know whether to change $\theta$ or $u_t^s$.
To alleviate the problem, we could use $S$ simulated shocks and compare $x_t$ with
$\bar{x}_t^S(\theta) = \frac{1}{S} \sum_{s=1}^{S} x(u_t^s, \theta)$. A natural method of estimation would be to minimize
the distance between the observed data and the average predicted variable:
$$\min_{\theta} \frac{1}{T} \sum_{t=1}^{T} \left(x_t - \bar{x}_t^S(\theta)\right)^2$$
Unfortunately, this criterion does not provide a consistent estimator of θ for a fixed
number of simulations S, as the sample size T increases to infinity. 35
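Laffont et al. (1995) propose to correct the non linear least squares objective
function by minimizing the following criterion:
$$\min_{\theta} \frac{1}{T} \sum_{t=1}^{T} \left[ \left(x_t - \bar{x}_t^S(\theta)\right)^2 - \frac{1}{S(S-1)} \sum_{s=1}^{S} \left(x(u_t^s, \theta) - \bar{x}_t^S(\theta)\right)^2 \right] \tag{4.12}$$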
The first term is the same as the one discussed above, the distance between the
observed variable and the average predicted one. The second term is a second order
correction term which takes into account the bias introduced by the simulation for
a fixed S.
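A small numerical sketch may help to see what the correction term does. The model below is hypothetical: the outcome is theta times a unit-exponential shock, so that the variance of the simulated prediction depends on theta. In that toy case the uncorrected criterion is minimized near theta0 * S / (S + 1) rather than theta0, while the corrected criterion (4.12) is centered on theta0. The function names and the grid search are assumptions made for the illustration.

```python
import random

def snlls_criterion(theta, x_obs, shocks, model, corrected=True):
    """Criterion (4.12); shocks[t] holds the S draws used for period t."""
    total = 0.0
    for x_t, u_t in zip(x_obs, shocks):
        S = len(u_t)
        sims = [model(u, theta) for u in u_t]
        x_bar = sum(sims) / S
        term = (x_t - x_bar) ** 2
        if corrected:
            # Second order correction for the simulation noise in x_bar.
            term -= sum((x - x_bar) ** 2 for x in sims) / (S * (S - 1))
        total += term
    return total / len(x_obs)

def toy_model(u, theta):
    # Hypothetical model: outcome proportional to the shock.
    return theta * u

def grid_argmin(f, lo, hi, n):
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=f)

def estimate(x_obs, shocks, corrected):
    objective = lambda th: snlls_criterion(th, x_obs, shocks, toy_model, corrected)
    return grid_argmin(objective, 0.5, 3.5, 61)
```

With S = 2 and a true value of 2, calling `estimate(x_obs, shocks, corrected=False)` on a long sample should land near 4/3, while the corrected version should land near 2, up to sampling noise.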
Example 4.9
Consider the continuous cake eating problem defined as:
$$V(W, \varepsilon) = \max_{c} \; \varepsilon u(c) + \beta E_{\varepsilon'|\varepsilon} V(W - c, \varepsilon')$$
where W is the size of the cake, c is the amount consumed and ε is a taste shock.
The optimal policy rule for this program is of the form $c = c(W, \varepsilon)$. Suppose we
observe an individual through time and we observe both the consumption level and
the size of the cake, $\{\hat{c}_t, \hat{W}_t\}_{t=1,\ldots,T}$. The taste shock is unobserved to the researcher.
To estimate the vector of parameters $\theta$ which describes preferences, we can use the
simulated non linear least squares method. We simulate $S$ paths for the taste shock,
$\{\varepsilon_t^s\}_{t=1,\ldots,T,\; s=1,\ldots,S}$, which are used to construct simulated predictions for the model,
$\{x(W_t, \varepsilon_t^s)\}_{t=1,\ldots,T,\; s=1,\ldots,S}$. At each period, we construct the average consumption
conditional on the observed size of the cake, $\bar{c}(\hat{W}_t)$, by averaging out over the $S$ simulated
taste shocks. This average is then compared with the observed consumption
level $\hat{c}_t$, using formula (4.12).
For further examples on the simulated non linear least square method, we refer the
reader to section 7.3.3.
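Asymptotic Properties: For any fixed number of simulations $S$,
• $\hat{\theta}_{ST}$ is consistent.
• $\sqrt{T}(\hat{\theta}_{ST} - \theta_0) \overset{d}{\longrightarrow} N(0, \Sigma_{S,T})$
A consistent estimate of the covariance matrix $\Sigma_{S,T}$ can be obtained by computing:
$$\hat{\Sigma}_{S,T} = \hat{A}_{S,T}^{-1} \hat{B}_{S,T} \hat{A}_{S,T}^{-1}$$
where $\hat{A}_{S,T}$ and $\hat{B}_{S,T}$ are defined below. To this end, denote $\nabla x_t^s = \partial x(u_t^s, \theta)/\partial \theta$,
the gradient of the variable with respect to the vector of parameters, and
$\nabla x_t = \frac{1}{S} \sum_{s=1}^{S} \nabla x_t^s$, its average across all simulations.
$$\hat{A}_{S,T} = \frac{1}{T} \sum_{t=1}^{T} \left[ \nabla x_t \nabla x_t' - \frac{1}{S(S-1)} \sum_{s=1}^{S} \left( \nabla x_t^s - \nabla x_t \right) \left( \nabla x_t^s - \nabla x_t \right)' \right]$$
$$\hat{B}_{S,T} = \frac{1}{T} \sum_{t=1}^{T} d_{S,t}(\theta) \, d_{S,t}(\theta)'$$
with $d_{S,t}$ a $k$ dimensional vector:
$$d_{S,t}(\theta) = \left(x_t - \bar{x}_t(\theta)\right) \nabla x_t(\theta) + \frac{1}{S(S-1)} \sum_{s=1}^{S} \left[ x(u_t^s, \theta) - \bar{x}_t(\theta) \right] \nabla x_t^s(\theta)$$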
Simulated Maximum Likelihood
Definition: The model provides us with a prediction x(ut, θ), where θ is a vector
of parameters and ut is an unobserved error. The distribution of ut implies a dis-
tribution for x(ut, θ), call it φ(xt, θ). This can be used to evaluate the likelihood
of observing a particular realization xt. In many cases, the exact distribution of
x(ut, θ) is not easily determined, as the model can be non linear or might not even
have an explicit analytical form. In this case, we can evaluate the likelihood using
simulations.
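The Simulated Maximum Likelihood (SML) method approximates this likelihood
by using simulations. Let $\tilde{\phi}(x_t, u, \theta)$ be an unbiased simulator of $\phi(x_t, \theta)$:
$$E_u \tilde{\phi}(x_t, u, \theta) = \lim_{S \to \infty} \frac{1}{S} \sum_{s=1}^{S} \tilde{\phi}(x_t, u^s, \theta) = \phi(x_t, \theta)$$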
The SML estimator is defined as:
$$\hat{\theta}_{ST} = \arg\max_{\theta} \sum_{t=1}^{T} \log \left[ \frac{1}{S} \sum_{s=1}^{S} \tilde{\phi}(x_t, u_t^s; \theta) \right]$$
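As an illustration of the SML objective, consider a hypothetical model in which the observation is x = theta + u + e, with u and e independent standard normal shocks. Conditional on a simulated draw of u, the density of x is a standard normal density evaluated at x - theta - u, which is an unbiased simulator of the true density of x. The sketch below, whose names and grid search are assumptions for the example, averages this simulator over S draws and maximizes the simulated log likelihood.

```python
import math
import random

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simulated_loglik(theta, x_obs, draws):
    """draws[t][s] is the s-th simulated shock u for observation t."""
    ll = 0.0
    for x_t, u_t in zip(x_obs, draws):
        # Unbiased simulator: density of e evaluated at x_t - theta - u.
        avg = sum(normal_pdf(x_t - theta - u) for u in u_t) / len(u_t)
        ll += math.log(avg)
    return ll

def sml_estimate(x_obs, S=20, seed=0):
    rng = random.Random(seed)
    # Shocks are drawn once and held fixed across candidate values of theta.
    draws = [[rng.gauss(0.0, 1.0) for _ in range(S)] for _ in x_obs]
    grid = [-1.0 + 0.05 * i for i in range(81)]  # theta in [-1, 3]
    return max(grid, key=lambda th: simulated_loglik(th, x_obs, draws))
```

Because the logarithm is applied to the simulated average, the simulation error does not average out across observations for fixed S, which is why the consistency results for SML require S to grow with T.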
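Asymptotic Properties:
• The SML estimator is consistent if $T$ and $S$ tend to infinity. When both $T$
and $S$ go to infinity and $\sqrt{T}/S \to 0$, then
$$\sqrt{T}(\hat{\theta}_{ST} - \theta_0) \overset{d}{\longrightarrow} N(0, I^{-1}(\theta_0))$$
The matrix $I(\theta_0)$ can be approximated by:
$$-\frac{1}{T} \sum_{t=1}^{T} \frac{\partial^2 \log \left( \frac{1}{S} \sum_{s=1}^{S} \tilde{\phi}(x_t, u_t^s, \theta) \right)}{\partial \theta \, \partial \theta'}$$
• It is inconsistent if $S$ is fixed.
The bias is then:
$$E\hat{\theta}_{ST} - \theta_0 \sim \frac{1}{S} I^{-1}(\theta_0) \, E a(x_t, \theta)$$
where
$$a(x_t, \theta) = \frac{E_u \frac{\partial \tilde{\phi}}{\partial \theta} \, V_u \tilde{\phi}}{(E_u \tilde{\phi})^3} - \frac{\mathrm{cov}_u\left(\frac{\partial \tilde{\phi}}{\partial \theta}, \tilde{\phi}\right)}{(E_u \tilde{\phi})^2}$$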
The bias decreases in the number of simulations and with the precision of the esti-
mated parameters, as captured by the information matrix. The bias also depends on
the choice of the simulator, through the function a. Gourieroux and Monfort (1996)
propose a first order correction for the bias. Fermanian and Salanié (2001) extend
these results and propose a non parametric estimator of the unknown likelihood
function, based on simulations.
Indirect Inference
When the model is complex, the likelihood is sometimes intractable. The indirect
inference method works around it by using a simpler auxiliary model, which is esti-
mated instead. This auxiliary model is estimated both on the observed data, and on
simulated data. The indirect inference method tries to find the vector of structural
parameters which brings the auxiliary parameters from the simulated data as close
as possible to the one obtained on observed data. A complete description can be
found in Gourieroux et al. (1993) (see also Smith (1993)).
Consider the likelihood of the auxiliary model $\tilde{\phi}(x_t, \beta)$, where $\beta$ is a vector of
auxiliary parameters. The estimator $\hat{\beta}_T$, computed from the observed data, is defined
by:
$$\hat{\beta}_T = \arg\max_{\beta} \prod_{t=1}^{T} \tilde{\phi}(x_t, \beta)$$
Under the null, the observed data are generated by the model at the true value
of the parameter θ0. There is thus a link between the auxiliary parameter β0 (the
true value of the auxiliary parameter) and the structural parameters θ. Follow-
ing Gourieroux et al. (1993) we denote this relationship by the binding function
b(θ). Were this function known, we could invert it to directly compute θ from the
value of the auxiliary parameter. Unfortunately, this function usually has no known
analytical form, so the method relies on simulations to characterize it.
The model is then simulated, by taking independent draws for the shock $u_t^s$,
which gives $S$ artificial data sets of length $T$: $\{x_1^s(\theta), \ldots, x_T^s(\theta)\}$, $s = 1, \ldots, S$. The
auxiliary model is then estimated on the simulated data, to get $\hat{\beta}_{sT}$:
$$\hat{\beta}_{sT}(\theta) = \arg\max_{\beta} \prod_{t=1}^{T} \tilde{\phi}(x_t^s(\theta), \beta)$$
Define $\hat{\beta}_{ST}$ as the average value of the auxiliary parameters over all simulations:
$$\hat{\beta}_{ST} = \frac{1}{S} \sum_{s=1}^{S} \hat{\beta}_{sT}(\theta)$$
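The indirect inference estimator $\hat{\theta}_{ST}$ is the solution to:
$$\hat{\theta}_{ST} = \arg\min_{\theta} \; [\hat{\beta}_T - \hat{\beta}_{ST}(\theta)]' \, \Omega_T \, [\hat{\beta}_T - \hat{\beta}_{ST}(\theta)]$$
where $\Omega_T$ is a positive definite weight matrix which converges to a deterministic
positive definite matrix $\Omega$.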
Example 4.10
Consider the cake problem with serially correlated shocks. The likelihood of the
structural model is intractable, but we can find an auxiliary model which is easier
to estimate. As the data set consists of durations, a natural auxiliary model is
a standard duration model. Suppose we choose an exponential model, which is a
simple and standard model of duration characterized by a constant hazard equal to
$\beta$. The probability of observing a particular duration is $\beta e^{-\beta D_t}$. The log likelihood
of observing a set of durations $D_t$, $t = 1, \ldots, T$ is:
$$\ln L = \sum_{t=1}^{T} \ln \left( \beta e^{-\beta D_t} \right)$$
This likelihood can be maximized with respect to $\beta$. Straightforward maximization
gives $\hat{\beta}_T = T / \sum_{t=1}^{T} D_t$. In this case, the auxiliary parameter is estimated as the
inverse of the average duration in the data set. Given a value for the structural parameters of
our model of interest θ, we can construct by simulation S data sets containing T
observations. For each artificial data set s, we can estimate the auxiliary duration
model to obtain $\hat{\beta}_{sT}$. Using the procedure above, we are then able to obtain an
estimate of θ, such that the auxiliary parameters on the observed and the simulated
data are as close as possible. Note that with the simple auxiliary model we use, it
turns out that the indirect inference procedure is the same as a simulated method of
moments, as we are matching the average duration.
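The mechanics of this example can be sketched as follows. The structural model used here is a stand-in, not the cake eating model: durations are assumed to be exp(theta + u) with u a standard normal shock. The auxiliary model is the exponential duration model, whose maximum likelihood estimate of the constant hazard is T divided by the sum of the durations, that is, the inverse of the average duration. Function names and the grid search are assumptions made for the illustration.

```python
import math
import random

def exp_hazard_mle(durations):
    # MLE of a constant hazard: T / sum(D_t), the inverse of the mean duration.
    return len(durations) / sum(durations)

def simulate(theta, shocks):
    # Hypothetical structural model generating durations.
    return [math.exp(theta + u) for u in shocks]

def ii_objective(theta, beta_data, shocks_by_sim):
    # Estimate the auxiliary model on each simulated data set and average.
    betas = [exp_hazard_mle(simulate(theta, sh)) for sh in shocks_by_sim]
    beta_sim = sum(betas) / len(betas)
    return (beta_data - beta_sim) ** 2  # scalar case, so the weight plays no role

def ii_estimate(data, S=5, seed=0):
    rng = random.Random(seed)
    shocks = [[rng.gauss(0.0, 1.0) for _ in data] for _ in range(S)]
    beta_data = exp_hazard_mle(data)
    grid = [-1.0 + 0.02 * i for i in range(151)]  # theta in [-1, 2]
    return min(grid, key=lambda th: ii_objective(th, beta_data, shocks))
```

Since the auxiliary estimate is a one-to-one function of the mean duration, matching it is equivalent to matching the average duration, in line with the remark above.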
We have used the exponential duration model for the simplicity of the exposition.
This model is only parameterized by one parameter, so we can identify at best only
one structural parameter. To identify more parameters, we could estimate a duration
model with a more flexible hazard.
For more examples on the indirect inference method, we refer the reader to the
second part of the book, in particular section 5.5.3 and 8.6.1.
Gallant and Tauchen (1996) develop an Efficient Method of Moments based on
the use of an auxiliary model. Instead of matching on a set of auxiliary parameters,
they propose to minimize the score of the auxiliary model, i.e. the first derivative
of the likelihood of the auxiliary model:
$$m(\theta, \hat{\beta}_T) = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{T} \sum_{t=1}^{T} \frac{\partial}{\partial \beta} \ln \tilde{\phi}(x_t^s(\theta), \hat{\beta}_T)$$
The structural parameters are obtained from:
$$\theta^* = \arg\min_{\theta} \; m(\theta, \hat{\beta}_T)' \, \Omega \, m(\theta, \hat{\beta}_T)$$
where Ω is a weighting matrix. Gourieroux et al. (1993) show that the EMM and
the indirect inference estimators are asymptotically equivalent.
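Properties: For a fixed number of simulations $S$, when $T$ goes to infinity the
indirect inference estimator is consistent and normally distributed.
$$\sqrt{T}(\hat{\theta}_{ST} - \theta_0) \longrightarrow N(0, Q_S(\Omega))$$
where
$$Q_S(\Omega) = \left(1 + \frac{1}{S}\right) \left[ \frac{\partial b'(\theta_0)}{\partial \theta} \Omega \frac{\partial b(\theta_0)}{\partial \theta'} \right]^{-1} \frac{\partial b'(\theta_0)}{\partial \theta} \Omega J_0^{-1} (I_0 - K_0) J_0^{-1} \Omega \frac{\partial b(\theta_0)}{\partial \theta'} \left[ \frac{\partial b'(\theta_0)}{\partial \theta} \Omega \frac{\partial b(\theta_0)}{\partial \theta'} \right]^{-1}$$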
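Denote $\psi_T(\theta, \beta) = \sum_{t=1}^{T} \log \tilde{\phi}(x_t^s(\theta), \beta)$. The matrices $I_0$, $J_0$ and $K_0$ are defined
as:
$$J_0 = \mathrm{plim}_T \; -\frac{\partial^2 \psi_T(\theta, \beta)}{\partial \beta \, \partial \beta'}$$
$$I_0 = \lim_T V \left[ \sqrt{T} \, \frac{\partial \psi_T(\theta, \beta)}{\partial \beta} \right]$$
$$K_0 = \lim_T V \left[ E \left( \sqrt{T} \, \frac{\partial}{\partial \beta'} \sum_{t=1}^{T} \tilde{\phi}(x_t, \beta) \right) \right]$$
$$\frac{\partial b'(\theta_0)}{\partial \theta} = J_0^{-1} \lim_T \frac{\partial^2 \psi_T(\theta_0, b(\theta_0))}{\partial \beta \, \partial \theta'}$$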
The latter formula is useful to compute the asymptotic covariance matrix without
calculating directly the binding function. As in the GMM case, there exists an
optimal weighting matrix such that the variance of the estimator is minimized. The
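optimal choice, denoted $\Omega^*$, is:
$$\Omega^* = J_0 (I_0 - K_0)^{-1} J_0$$
In this case, the variance of the estimator simplifies to:
$$Q_S(\Omega^*) = \left(1 + \frac{1}{S}\right) \left( \frac{\partial b'(\theta_0)}{\partial \theta} J_0 (I_0 - K_0)^{-1} J_0 \frac{\partial b(\theta_0)}{\partial \theta'} \right)^{-1}$$
or equivalently
$$Q_S(\Omega^*) = \left(1 + \frac{1}{S}\right) \left( \frac{\partial^2 \psi_\infty(\theta_0, b(\theta_0))}{\partial \theta \, \partial \beta'} (I_0 - K_0)^{-1} \frac{\partial^2 \psi_\infty(\theta_0, b(\theta_0))}{\partial \beta \, \partial \theta'} \right)^{-1}$$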
The latter formula does not require computing the binding function explicitly. Note
that the choice of the auxiliary model matters for the efficiency of the estimator.
Clearly, one would want an auxiliary model such that ∂b′(θ)/∂θ is large in absolute
values. If not, the model would poorly identify the structural parameters.
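In practice, $b(\theta_0)$ can be approximated by $\hat{\beta}_{ST}(\hat{\theta}_{ST})$. A consistent estimator of
$I_0 - K_0$ can be obtained by computing:
$$\widehat{(I_0 - K_0)} = \frac{T}{S} \sum_{s=1}^{S} (W_s - \bar{W})(W_s - \bar{W})'$$
with
$$W_s = \frac{\partial \psi_T(\hat{\theta}, \hat{\beta})}{\partial \beta}, \qquad \bar{W} = \frac{1}{S} \sum_{s=1}^{S} W_s$$
see Gourieroux et al. (1993), appendix 2.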
Note that if the number of parameters to estimate in the structural model is equal
to the number of parameters in the auxiliary model, the weighting matrix $\Omega$
plays no role, and the variance $Q_S(\Omega)$ simplifies to:
$$Q_S(\Omega) = \left(1 + \frac{1}{S}\right) \left[ \frac{\partial b'(\theta_0)}{\partial \theta} \Omega^* \frac{\partial b(\theta_0)}{\partial \theta'} \right]^{-1}$$
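Specification Tests: A global specification test can be carried out using the
minimized criterion. The statistic
$$\zeta_T = \frac{TS}{1 + S} \min_{\theta} \; [\hat{\beta}_T - \hat{\beta}_{ST}(\theta)]' \, \Omega_T \, [\hat{\beta}_T - \hat{\beta}_{ST}(\theta)]$$
follows asymptotically a chi-square distribution with $q - p$ degrees of freedom, where
$q$ is the number of auxiliary parameters and $p$ the number of structural parameters.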
4.4 Conclusion
This chapter presents methods to estimate the parameters of a model. We have re-
viewed both classic methods such as maximum likelihood or the generalized method
of moments and simulation based methods. In general, when dealing with dynamic
programming models, the likelihood function or the analytical form of the moments
are difficult to write out. If this is the case, simulated methods are of great use.
However, they come at a cost, as simulated methods are very time consuming. The
computation of the value function and the optimal policy rules often requires the
use of numerical techniques. If on top of that simulation estimation methods are
used, the estimation of a full fledged structural model can take hours (or even days).
The choice of a particular method depends on the problem and the data set.
Path calibration methods such as non linear least squares or maximum likelihood
use all the information available in the data, as each particular observation is used
in the estimation procedure. The drawback is that one has to specify the entire
model, up to the distribution of the unobserved shock. To have tractable likelihood
functions, one often imposes a normal distribution for the shocks, and this might
impose too much structure on the problem. On the other hand, moment calibration
methods such as the method of moments use only part of the information provided
by the data. These methods concentrate on particular functions of the data, such as the
mean or the variance. In contrast to maximum likelihood, the method
does not necessarily require the specification of the whole model.
Both approaches can be justified. The researcher might be interested in only
a subset of the parameters, such as the intertemporal elasticity of consumption. As in
example 4.5, the GMM method allows this parameter to be estimated without specifying
the distribution of the income shock. However, calibration methods require the
choice of moments that identify the parameters of the model. When the model is
simple, this is not very difficult. When the models are more complex, for instance
when unobserved heterogeneity is present, it is not that straightforward to find
informative moments. In such cases, maximum likelihood can be more desirable.
Finally, if the data is subject to measurement errors, taking moments of the data
can reduce the problem. When using simulation methods, calibration methods
also present the advantage of requiring only a fixed number of simulations to get
consistent estimates, so the computation time is lower.
Overview of Methodology
The first three chapters have presented theoretical tools to model, solve and estimate
economic models. Ideally, to investigate a particular economic topic, a research
agenda would include all three parts, building on economic theory and confronting
it with the data to assess its validity.
Figure 4.6 summarizes this approach and points to the relevant chapters. The
figure starts with an economic model, described by a set of parameters and some
choice structure. It is important at this stage to characterize the properties of that
model and to characterize the first order conditions or to write it as a recursive
problem. The model under consideration might be difficult to solve analytically.
In this case, it is sometimes necessary to use numerical methods as developed in
Chapter 3. One can then derive the optimal policy rules, i.e. the optimal behavior
given a number of predetermined variables.
Given the policy rules 36 the parameters can be estimated. This is usually done
by comparing some statistics built both from the observed data and from the model.
The estimated parameters are produced by minimizing the distance between the
observed and the predicted outcome of the model. Once the optimal parameters are
found, the econometric task is not over. One has to evaluate the fit of the model.
There are various ways of doing this. First, even though the models are often non
linear, one can construct a measure such as the R2, to evaluate the percentage of the
variance explained by the model. A higher value is seen as a better fit. However,
the model can be very good at reproducing some aspects of the data but can fail
miserably in other important dimensions. For instance, in the discrete cake eating
problem, the fit of the model could be considerably increased in the first T periods
if one were to construct time dependent utility functions, with T dummy variables
for each time period. Such a model would generate a perfect fit when it comes to
predicting the fraction of cakes eaten in the first periods. However, the model could
do very poorly for the remaining periods. A second way to evaluate the estimated
model is to use over identification restrictions if the model is overidentified. Finally,
one can also perform out of sample forecasts.
Once one is confident that the estimated model is a convincing representation of
reality, the model can be used to evaluate different scenarios.
The next chapters present examples of this strategy using a number of relevant
topics.
[Figure 4.6 approximately here]
Part II
Applications
Chapter 5
Stochastic Growth
5.1 Overview
To begin our exploration of applications of dynamic programming problems in
macroeconomics, a natural starting point is the stochastic growth model. Starting
with Kydland and Prescott (1982), this framework has been used for understanding
fluctuations in the aggregate economy. To do so, the researcher must understand the
mapping from the parameters of preferences and technology to observations, per-
haps summarized by pertinent moments of the data. Further, the model provides
an analytic structure for policy evaluation.37
The stochastic growth model provides our first opportunity to review the tech-
niques of dynamic programming, numerical methods and estimation methodology.
We begin with the non-stochastic model to get some basic concepts straight and
then enrich the model to include shocks and other relevant features.
5.2 Non-Stochastic Growth Model
Consider the dynamic optimization problem of a very special household. This house-
hold is endowed with one unit of leisure each period and supplies this inelastically
to a production process. The household consumes an amount ct each period which
it evaluates using a utility function, u(ct). Assume that u(·) is strictly increasing
and strictly concave. The household’s lifetime utility is given by
$$\sum_{t=1}^{\infty} \beta^{t-1} u(c_t) \tag{5.1}$$
The household has access to a technology that produces output (y) from capital
(k), given its inelastically supplied labor services. Let y = f (k) be the production
function. Assume that f (k) is strictly increasing and strictly concave.
The capital input into the production process is accumulated from forgone con-
sumption. That is, the household faces a resource constraint that decomposes output
into consumption and investment (it):
yt = ct + it.
The capital stock accumulates according to:
kt+1 = kt(1 − δ) + it
where δ ∈ (0, 1) is the rate of physical depreciation.
Essentially the household’s problem is to determine an optimal savings plan by
splitting output between these two competing uses. Note that we have assumed
the household produces using a concave production function rather than simply
renting labor and capital in a market for factors of production. In this way, the
model of the household is very special and often this is referred to as a Robinson
Crusoe economy as the household is entirely self-sufficient. Nonetheless the model is
informative about market economies as one can argue (see below) that the resulting
allocation can be decentralized as a competitive equilibrium. For now, our focus is
on solving for this allocation as the solution of a dynamic optimization problem.
To do so, we use the dynamic programming approach and consider the following
functional equation:
$$V(k) = \max_{k'} \; u(f(k) + (1 - \delta)k - k') + \beta V(k') \tag{5.2}$$
for all k. Here the state variable is the stock of capital at the start of the period
and the control variable is the capital stock for the next period.38
With f (k) strictly concave, there will exist a maximal level of capital achievable
by this economy given by k̄ where
k̄ = (1 − δ)k̄ + f (k̄).
This provides a bound on the capital stock for this economy and thus guarantees
that our objective function, u(c), is bounded on the set of feasible consumption
levels, [0, f (k̄) + (1 − δ)k̄]. We assume that both u(c) and f (k) are continuous and
real-valued so there exists a V (k) that solves (5.2).39
The first-order condition is given by:
u′(c) = βV ′(k′). (5.3)
Of course, we don’t know V (k) directly so that we need to use (5.2) to determine
V ′(k). As (5.2) holds for all k ∈ [0, k̄], we can take a derivative and obtain:
V ′(k) = u′(c)(f ′(k) + (1 − δ)).
Updating this one period and inserting this into the first-order condition implies:
u′(c) = βu′(c′)(f ′(k′) + (1 − δ)).
This is an Euler condition that is not unlike the one we encountered in the cake
eating problem. Here the left side is the cost of reducing consumption by ε today.
The right side is then the increase in utility in the next period from the extra capital
created by investment of the ε. As in the cake eating structure, if the Euler equation
holds then no single period deviations will increase utility of the household. As with
that problem, this is a necessary but not a sufficient condition for optimality.40
From the discussion in Chapter 2, V (k) is strictly concave. Consequently, from
(5.3), k′ must be increasing in k. To see why, suppose that current capital increases
but future capital falls. Then current consumption will certainly increase so that
the left side of (5.3) decreases. Yet with k′ falling and V (k) strictly concave, the
right side of (5.3) increases. This is a contradiction.
5.2.1 An Example
Suppose that u(c) = ln(c), f (k) = kα and δ = 1. With this special structure, we can
actually solve this model. As in Sargent (1987), we guess that the value function is
given by:
V (k) = A + B ln k
for all k. If this guess is correct, then we must be able to show that it satisfies (5.2).
If it does, then the first-order condition, (5.3), can be written:
$$\frac{1}{c} = \frac{\beta B}{k'}.$$
Using the resource constraint (kα = c + k′),
βB(kα − k′) = k′
or
$$k' = \left( \frac{\beta B}{1 + \beta B} \right) k^{\alpha}. \tag{5.4}$$
So, if our guess on V (k) is correct, this is the policy function.
Given this policy function, we can now verify whether or not our guess on V (k)
satisfies the functional equation, (5.2). Substitution of (5.4) into (5.2) yields
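$$A + B \ln k = \ln\left[ \left( \frac{1}{1 + \beta B} \right) k^{\alpha} \right] + \beta \left[ A + B \ln\left( \left( \frac{\beta B}{1 + \beta B} \right) k^{\alpha} \right) \right] \tag{5.5}$$
for all $k$. Here we use $c = y - k'$ so that
$$c = \left( \frac{1}{1 + \beta B} \right) k^{\alpha}.$$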
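Grouping constant terms implies:
$$A = \ln\left( \frac{1}{1 + \beta B} \right) + \beta \left[ A + B \ln\left( \frac{\beta B}{1 + \beta B} \right) \right]$$
and grouping terms that multiply $\ln k$,
$$B = \alpha + \beta B \alpha.$$
Hence $B = \frac{\alpha}{1 - \beta\alpha}$. Using this, $A$ can be determined. Thus, we have found the solution
to the functional equation.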
As for the policy functions, using B, we find
k′ = βαkα
and
c = (1 − βα)kα.
It is important to understand how this type of argument works. We started
with a guess of the value function. Using this guess, we derived a policy function.
Substituting this policy function into the functional equation gave us an expression,
(5.5), that depends only on the current state, k. As this expression must hold for all
k, we grouped terms and solved for the unknown coefficients of the proposed value
function.
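The guess-and-verify argument can also be checked numerically. The sketch below assumes u(c) = ln(c), f(k) = k^alpha and delta = 1, computes the coefficients A and B implied by the two grouping conditions, and then maximizes the right side of the functional equation over a fine grid of k′ values. If the guess is correct, the maximized value should equal A + B ln k and the maximizer should be close to alpha * beta * k^alpha. The function names and the brute-force grid are choices made for this illustration.

```python
import math

def value_coefficients(alpha, beta):
    # B from grouping the ln(k) terms, A from grouping the constants.
    B = alpha / (1.0 - alpha * beta)
    A = (math.log(1.0 - alpha * beta)
         + beta * B * math.log(alpha * beta)) / (1.0 - beta)
    return A, B

def bellman_rhs(k, alpha, beta, n=20000):
    """Maximize ln(k**alpha - k') + beta * (A + B ln k') over a grid of k'."""
    A, B = value_coefficients(alpha, beta)
    y = k ** alpha
    best_value, best_kprime = float("-inf"), None
    for i in range(1, n):
        kprime = y * i / n  # interior points of (0, y)
        value = math.log(y - kprime) + beta * (A + B * math.log(kprime))
        if value > best_value:
            best_value, best_kprime = value, kprime
    return best_value, best_kprime
```

For instance, with alpha = 0.36 and beta = 0.95, `bellman_rhs(1.0, 0.36, 0.95)` should return a value close to A and a maximizer close to 0.342, since ln(1) = 0 and alpha * beta = 0.342.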
Exercise 5.1
To see how this approach to finding a solution to the nonstochastic growth model
could ”fail”, argue that the following cannot be solutions to the functional equation:
1. V (k) = A
2. V (k) = B ln k
3. V (k) = A + Bkα
5.2.2 Numerical Analysis
Though the non-stochastic growth model is too simple to seriously take to the
data, it provides an opportunity to again exploit the contraction mapping property
to obtain a numerical solution to the functional equation given in (5.2). This is
valuable as the set of economies for which one can obtain an analytic solution to (5.2)
is very small. Thus techniques must be developed to obtain policy functions in more
general environments.
The Matlab code grow.m solves (5.2), for the functional forms given below,
using a value function iteration routine.41 The code has four main sections that we
discuss in turn.
Functional Forms
There are two primitive functions that must be specified for the nonstochastic growth
model. The first is the production function and the second is the utility function of
the household. The grow.m code assumes that the production function is given by:
f (k) = kα.
Here α is restricted to lie in the interval (0, 1) so that f (k) is strictly increasing and
strictly concave.
The household’s utility function is given by:
$$u(c) = \frac{c^{1-\sigma}}{1 - \sigma}.$$
With this utility function, the curvature of the utility function,
−u′′(c)c/u′(c)
is equal to σ.42 We assume that σ is positive so that u(c) is strictly increasing and
strictly concave. When σ = 1, u(c) is given by ln(c).
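The curvature claim is easy to confirm numerically. The following sketch is not taken from grow.m; it codes the utility function with the logarithmic case handled separately (the power formula is undefined at sigma = 1) and approximates the derivatives by central finite differences. The step size h is an arbitrary choice.

```python
import math

def crra(c, sigma):
    if sigma == 1.0:
        return math.log(c)
    return c ** (1.0 - sigma) / (1.0 - sigma)

def curvature(c, sigma, h=1e-4):
    """Finite-difference approximation of -u''(c) c / u'(c)."""
    u1 = (crra(c + h, sigma) - crra(c - h, sigma)) / (2.0 * h)
    u2 = (crra(c + h, sigma) - 2.0 * crra(c, sigma) + crra(c - h, sigma)) / (h * h)
    return -u2 * c / u1
```

For any consumption level, `curvature(c, sigma)` should return approximately sigma, including the logarithmic case.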
Parameter Values
The second component of the program specifies parameter values. The code is
written so that the user can either accept some baseline parameters (which you can
edit) or input values in the execution of the program. Let
Θ = (α, β, δ, σ)
denote the vector of parameters that are inputs to the program. In an estimation
exercise, Θ would be chosen so that the model’s quantitative implications match
data counterparts. Here we are simply interested in the anatomy of the program
and thus Θ is set at somewhat arbitrary values.
Spaces
As noted earlier, the value function iteration approach does require an approxima-
tion to the state space of the problem. That is, we need to make the capital state
space discrete. Let κ represent the capital state space. We solve the functional
equation for all k ∈ κ with the requirement that k′ lie in κ as well. So the code
for the non-stochastic growth model does not interpolate between the points in this
grid but rather solves the problem on the grid.
The choice of κ is important. For the nonstochastic growth model we might be
interested in transition dynamics: if the economy is not at the steady state, how
does it return to the steady state? Let k∗ be the steady state value of the capital
stock which, from (5.2), solves
$$1 = \beta \left[ \alpha (k^*)^{\alpha - 1} + (1 - \delta) \right].$$
This value of the steady state is computed in grow.m. Then the state space is
built in the neighborhood of the steady state through the definitions of the highest
and lowest values of the capital stock, khi and klow in the code.43 Finally, a grid is
set-up between these two extreme values. The researcher specifies the fineness of the
grid with two considerations in mind. A finer grid provides a better approximation
but is ”expensive” in terms of computer time.44
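A minimal sketch of this step, written in Python rather than Matlab, is given below. It solves the steady state condition for k* in closed form and builds an evenly spaced grid between klow and khi. The names khi and klow follow the text, but the choice of 50 percent below and above the steady state and the number of grid points are assumptions made for the illustration, not the settings used in grow.m.

```python
def steady_state_capital(alpha, beta, delta):
    # Solves 1 = beta * (alpha * k**(alpha - 1) + 1 - delta) for k.
    return ((1.0 / beta - 1.0 + delta) / alpha) ** (1.0 / (alpha - 1.0))

def capital_grid(kstar, n, lo_frac=0.5, hi_frac=1.5):
    # Evenly spaced grid around the steady state (bounds are assumptions).
    klow, khi = lo_frac * kstar, hi_frac * kstar
    step = (khi - klow) / (n - 1)
    return [klow + i * step for i in range(n)]
```

A larger n gives a finer grid, at the cost of a number of comparisons per iteration that grows with the square of n when every pair of current and future capital is evaluated.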
Value function Iteration
The fourth section of the program solves (5.2) using a value function iteration rou-
tine. To do so, we need an initial guess on the value function. For this guess, the
program uses the one-period problem in which the household optimally consumes
all output as well as the undepreciated capital stock (termed ytot in grow.m). 45
Given this initial guess, a loop is set-up to perform value function iteration, as
described in some detail in Chapter 3. Note that the program requires two inputs.
The first is the total number of iterations that is allowed, termed T . The second
is the tolerance which is used to determine whether the value function iteration
routine has ”converged”. This tolerance is called toler and this scalar is compared
against the largest percent difference between the last two calculations of the value
function V and v in the grow.m program.
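The loop described above can be sketched as follows. This is a plain Python illustration of value function iteration on a discrete capital grid, not a transcription of grow.m. It borrows the names T, toler and ytot from the text; everything else, including the function names and the default values, is an assumption.

```python
import math

def utility(c, sigma):
    if c <= 0.0:
        return float("-inf")  # infeasible consumption
    return math.log(c) if sigma == 1.0 else c ** (1.0 - sigma) / (1.0 - sigma)

def solve_growth(grid, alpha, beta, delta, sigma, T=1000, toler=1e-6):
    n = len(grid)
    ytot = [k ** alpha + (1.0 - delta) * k for k in grid]
    # payoff[i][j]: utility of moving from grid[i] today to grid[j] tomorrow.
    payoff = [[utility(ytot[i] - grid[j], sigma) for j in range(n)]
              for i in range(n)]
    # Initial guess: one-period problem in which all of ytot is consumed.
    v = [utility(y, sigma) for y in ytot]
    policy = [0] * n
    for _ in range(T):
        new_v = []
        for i in range(n):
            vals = [payoff[i][j] + beta * v[j] for j in range(n)]
            j_best = max(range(n), key=vals.__getitem__)
            policy[i] = j_best
            new_v.append(vals[j_best])
        # Largest percent difference between successive value functions.
        gap = max(abs(a - b) / max(abs(b), 1e-12) for a, b in zip(new_v, v))
        v = new_v
        if gap < toler:
            break
    return v, [grid[j] for j in policy]
```

With log utility, f(k) = k^alpha and delta = 1, the returned policy can be compared with the closed form k′ = alpha * beta * k^alpha from Section 5.2.1, which gives a simple check that the routine works before moving to cases without an analytic solution.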
Evaluating the Results
Once the program has converged, aspects of the policy function can be explored.
The program produces two plots. The first, (Figure 5.1 below), plots the policy
function: k′ as a function of k. The policy function is upward sloping as argued
earlier. The second, (Figure 5.2), plots the level of net investment (k′ − k) for
each level of k in the state space. This line crosses zero at the steady state and
is downward sloping. So, for values of k below the steady state the capital stock
is increasing (net investment is positive) while for k above k∗, net investment is
negative.
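These properties can be illustrated without running the numerical routine by using the special case with a known solution: log utility, f(k) = k^alpha and delta = 1, for which the policy function is k′ = alpha * beta * k^alpha. The sketch below, with hypothetical function names, evaluates net investment and traces the path of capital from an arbitrary starting point.

```python
def policy(k, alpha, beta):
    # Closed-form rule for log utility, f(k) = k**alpha and delta = 1.
    return alpha * beta * k ** alpha

def net_investment(k, alpha, beta):
    return policy(k, alpha, beta) - k

def transition_path(k0, alpha, beta, periods):
    path = [k0]
    for _ in range(periods):
        path.append(policy(path[-1], alpha, beta))
    return path
```

In this special case the steady state is k* = (alpha * beta)^(1/(1 - alpha)). Starting below k*, the path rises monotonically towards it; starting above, it falls.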
[Figure 5.1 approximately here]
[Figure 5.2 approximately here]
The program also allows you to calculate transition dynamics starting from an
(arbitrary) initial capital stock. There are at least two interesting exercises one can
perform from this piece of the code.
Exercise 5.2
1. Study how other variables (output, consumption, the real interest rate) behave
along the transition path. Explain the patterns of these variables.
2. Study how variations in the parameters in Θ influence the speed and other
properties of the transitional dynamics.
5.3 Stochastic Growth Model
We build upon the discussion of the nonstochastic growth model to introduce ran-
domness into the environment. We start from a specification of the basic economic
environment. The point is to make clear the nature of the intertemporal choice
problem and the assumptions underlying the specification of preferences and tech-
nology.
We then turn to the planners’ optimization problem. We take the approach
of a planner with an objective of maximizing the expected lifetime utility of a
representative agent.46 In this way, we can characterize allocations as the results
of a single optimization problem rather than through the solution of a competitive
equilibrium. Given that there are no distortions in the economy, it is straightforward
to determine the prices that support the allocation as a competitive equilibrium.
We do this later in a discussion of the recursive equilibrium concept.
5.3.1 Environment
The stochastic growth model we study here is based upon an economy with infinitely
lived households. Each household consumes some of the single good (ct) and invests
the remainder (it). Investment augments the capital stock (kt) with a one period lag:
i.e. investment today creates more capital in the next period. There is an exogenous
rate of capital depreciation denoted by δ ∈ (0, 1). For now, we assume there is a
single good which is produced each period from capital and labor inputs.47 The
capital input is predetermined from past investment decisions and the labor input
is determined by the household.
Fluctuations in the economy are created by shocks to the process of producing
goods. Thus, “good times” represent higher productivity of both labor and capital
inputs. The planner will optimally respond to these variations in productivity by
adjusting household labor supply and savings (capital accumulation) decisions. Of
course, investment is a forward looking decision since the new capital is durable
and is not productive until the next period. Further, the extent to which the labor
decision responds to the productivity variation depends, in part, on whether capital
and labor are likely to be more productive in the future. Consequently, the serial
correlation properties of the shocks are critical for understanding the responses of
employment and investment.
More formally, the households' preferences over consumption (ct) and leisure (lt)
are given by:
$$\sum_{t=0}^{\infty} \beta^{t} u(c_t, l_t)$$
where the discount factor β ∈ (0, 1). We will assume that the function u(c, l) is
continuously differentiable and strictly concave. The households face a constraint
on their time allocation:
1 = lt + nt
where the unit time endowment must be allocated between leisure and work (nt).
The production side of the economy is represented by a constant returns to scale
production function over the two inputs. Since scale is not determined, we model
the economy as if there was a single competitive firm that hires the labor services
of the households (Nt) and uses the households’ capital in the production process.
The production function is expressed as:
Yt = AtF (Kt, Nt)
where F (K, N ) is increasing in both inputs, exhibits constant returns to scale and
is strictly concave. Variations in total factor productivity, At will be the source
of fluctuations in this economy. Here upper case variables refer to economywide
aggregates and lower case variables are household (per capita) variables.
Finally, there is a resource constraint: the sum of consumption and investment
cannot exceed output in each period. That is:
Yt = Ct + It.
For characterizing the solution to the planner’s problem, this is all the informa-
tion that is necessary. That is, given the statement of preferences, the production
function and the time and resource constraints, the planner’s problem can be speci-
fied. In fact, the natural approach might be to allow the planner to choose a sequence
of history dependent functions that describe the choices of consumption, investment
and employment for all time periods conditional on the state of the economy at that
point in time. In this most general formulation the description of the state would
include all productivity shocks and the value of the capital stock.
Instead of solving a planner’s problem in which the choice is a sequence of state
contingent functions, the tools of dynamic programming can be used. We turn to
that approach now.
5.3.2 Bellman’s Equation
To begin the analysis, we assume that labor is inelastically supplied at one unit per
household. Thus we consider preferences represented by u(c). This allows us to
focus on the dynamics of the problem. Of course, we will want to include a labor
supply decision before confronting the data, else we would be unable to match any
moments with labor variations. Hence we turn to the more general endogenous
labor supply formulation later.
In this case, we use the constant returns to scale assumption on F (K, N ) to
write per capita output (yt) as a strictly concave function of the per capita capital
stock (kt):
yt ≡ AtF (Kt/N, 1) ≡ Atf (kt).
As F (K, N ) exhibits constant returns to scale, f (k) will be strictly concave. Bell-
man’s equation for the infinite horizon stochastic growth model is specified as
V (A, k) = maxk′ u(Af (k) + (1 − δ)k − k′) + βEA′|AV (A′, k′) (5.6)
for all (A, k). Here the transition equation used to construct (5.6) is k′ = Af (k) +
(1 − δ)k − c.
An important element of this model is the multiplicative productivity shock.
Through the introduction of this shock, the model is constructed to capture pro-
cyclical fluctuations in productivity. An important question is whether the fluc-
tuations in output, employment, consumption, investment, etc. induced by these
shocks match relevant features of the data.
For the quantitative analysis, we assume that A is a bounded, discrete random
variable that follows a first-order Markov process. The transition matrix is given by
Π and this is, implicitly, used in the conditional expectation in (5.6).48
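To make the role of Π concrete, the following minimal sketch computes a conditional expectation from a transition matrix. The two-state grid, the probabilities and the names A_grid, Pi and conditional_expectation are illustrative assumptions, not values taken from the text.

```python
# Hypothetical two-state productivity process; the numbers are illustrative only.
A_grid = [0.9, 1.1]                  # bounded, discrete support for A
Pi = [[0.8, 0.2],                    # Pi[i][j] = Prob(A' = A_grid[j] | A = A_grid[i])
      [0.2, 0.8]]


def conditional_expectation(values_next, Pi):
    """Return E[g(A') | A = A_grid[i]] for each current state i.

    values_next[j] holds g(A_grid[j]), for example V(A', k') at a fixed k'.
    """
    return [sum(p * v for p, v in zip(row, values_next)) for row in Pi]


# Each row of a transition matrix must sum to one.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Pi)

# Example: the conditional mean of next period's productivity.
expected_A = conditional_expectation(A_grid, Pi)
```

The same function can be applied to a column of the value function, which is how the expectation in (5.6) would be evaluated on a grid.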
As in the general discussion of Chapter 2, one important question is whether
there exists a solution to the functional equation. A second is characterizing the
optimal policy function.
For the growth model, it is important to be sure that the problem is bounded.
For this, let k̄ solve:
k = A+f (k) + (1 − δ)k (5.7)
where A+ is the largest productivity shock. Since consumption must be non-
negative, then, from the transition equation, the k that solves this expression is
the largest amount of capital that this economy could accumulate. Since f (k) is
strictly concave, there will exist a unique finite value of k̄ that satisfies (5.7). This
then implies that the largest level of consumption is also k̄: the largest feasible
consumption occurs when the largest capital stock is consumed in a single period.
Thus we can bound utility by u(k̄).
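As a numerical check on (5.7), the sketch below finds k̄ by bisection and compares it with the closed form that results when f (k) = kα. That functional form and the parameter values are assumptions made for illustration. With this form, (5.7) reduces to δk̄ = A+k̄α.

```python
alpha, delta, A_plus = 0.36, 0.1, 1.1   # illustrative values, not from the text


def f(k):
    return k ** alpha                    # assumed production function


def excess(k):
    # Right side of (5.7) minus its left side: positive below k_bar, negative above.
    return A_plus * f(k) + (1 - delta) * k - k


def bisect(func, lo, hi, tol=1e-10):
    # Assumes func(lo) > 0 > func(hi), so that a root lies in between.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


k_bar = bisect(excess, 1e-6, 1e6)
k_bar_closed = (A_plus / delta) ** (1 / (1 - alpha))   # solves delta * k = A_plus * k**alpha
```

The bisection version is the one that carries over to production functions without a closed-form solution.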
Given that we have bounded the problem, assumed that the discount factor is
less than one and assumed the shocks follow a bounded, first-order Markov process,
the results from Chapter 2 will apply. Thus we know that there exists a unique
value function V (A, k) that solves (5.6). Further, we know that there is a policy
function given by: k′ = φ(A, k).
Our goal is to learn more about the properties of this solution. To stress an
important point, the policy function represents the bridge from the optimization
problem to the data. The policy function itself depends on the underlying struc-
tural parameters and delivers a relationship between variables, some of which are
observable. So, the inference problem is clean: what can we determine about the
structural parameters from observations on output, capital, consumption, produc-
tivity, etc.?
5.3.3 Solution Methods
Linearization
One approach to characterizing a solution to the stochastic growth model written
above is through analysis of the resource constraints and the intertemporal Euler
equation. The latter is a necessary condition for optimality and can be obtained
directly from the sequence problem representation of the planner’s problem. Alternatively,
using Bellman’s equation, the first-order condition for the planner is
u′(Af (k) + (1 − δ)k − k′) = βEA′|AVk′(A′, k′) (5.8)
for all (A, k). Though we do not know V (A, k), we can solve for its derivative. From
(5.6),
Vk(A, k) = u′(c)[Af ′(k) + (1 − δ)].
Substituting this into (5.8) and evaluating it at (A′, k′) implies:
u′(c) = βEA′|Au′(c′)[A′f ′(k′) + (1 − δ)] (5.9)
where
c = Af (k) + (1 − δ)k − k′ (5.10)
and c′ is defined accordingly. These two expressions, along with the evolution of A
(specified below), define a system of equations. So, one can represent the optimal
growth model as a system of first order stochastic difference equations in (c, k, A).
In order to approximately characterize this solution, it is common to linearize
this condition and the resource constraints around the steady state, (c∗, k∗).49 To
do so, we fix A at its mean value, Ā. The steady state value of the capital stock
will then satisfy:
1 = β[Āf ′(k∗) + (1 − δ)]. (5.11)
Further, in steady state k′ = k = k∗ so the steady state level of consumption satisfies
c∗=Āf (k∗) − δk∗.
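For a concrete case, the steady state can be computed in closed form once a production function is chosen. The sketch below assumes f (k) = kα, so that (5.11) becomes 1 = β[Āαk∗^(α−1) + (1 − δ)]; the parameter values are illustrative assumptions.

```python
alpha, beta, delta, A_bar = 0.36, 0.95, 0.1, 1.0   # illustrative values

# Solve 1 = beta * (A_bar * alpha * k**(alpha - 1) + 1 - delta) for k.
k_star = (alpha * A_bar / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))

# Steady state consumption: output net of the investment that replaces depreciation.
c_star = A_bar * k_star ** alpha - delta * k_star

# Right side of (5.11) evaluated at k_star; it should equal one.
euler_rhs = beta * (A_bar * alpha * k_star ** (alpha - 1) + 1 - delta)
```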
Following King et al. (1988), let ĉt, k̂t and Ât denote percent deviations from
their steady state values respectively. So, for example, x̂t ≡ (xt − x∗)/x∗. Assume that in
terms of deviations from mean, the shocks follow a first-order autoregressive process,
Ât+1 = ρÂt + εt+1 with ρ ∈ (0, 1).
Then we can rewrite the Euler condition, (5.9), as:
ξĉt = ξĉt+1 + νρÂt + νχk̂t+1 (5.12)
where ξ is the elasticity of the marginal utility of consumption, ξ ≡ u′′(c∗)c∗/u′(c∗). The
parameter ν ≡ βĀf ′(k∗) equals 1 − β(1 − δ) in the steady state. The parameter
ρ is the serial correlation of the deviation of the shock from steady state and
χ ≡ f ′′(k∗)k∗/f ′(k∗) is the elasticity of the marginal product of capital with respect to capital.
The resource condition, (5.10), can be approximated by:
$$\hat{k}_{t+1} = \frac{1}{\beta}\hat{k}_t + \frac{\delta}{1 - s_c}\hat{A}_t - \frac{s_c}{1 - s_c}\,\delta\,\hat{c}_t. \qquad (5.13)$$
Here sc is consumption’s steady state share of output.
If the researcher specifies a problem such that preferences and the production
function exhibit constant elasticities, then ξ and χ are fixed parameters and one never
has to solve explicitly for a steady state. For example, if the production
function is Cobb-Douglas where α is capital’s share, then χ is simply (α − 1).
Likewise, ν just depends on the discount factor and the rate of physical capital
depreciation. Finally, the consumption share sc is just a function of the parameters
of the economy as well.
For example, in the Cobb-Douglas case, (5.11) can be written as:
1 = β[α(y∗/k∗) + (1 − δ)]
where y∗ is the steady state level of output. Since the steady state level of investment
i∗ = δk∗, then this can be rewritten as:
1 = β[αδ/(1 − sc) + (1 − δ)].
Solving this,
$$(1 - s_c) = \frac{\beta\alpha\delta}{1 - \beta(1 - \delta)}$$
so that sc can be calculated directly from the underlying parameters.
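The mapping from structural parameters to the coefficients ν, χ and sc can be sketched as follows for the Cobb-Douglas case. The function name and the parameter values are assumptions for illustration; the cross-check recomputes sc from the steady state directly, with Ā normalized to one.

```python
def loglinear_coefficients(alpha, beta, delta):
    nu = 1 - beta * (1 - delta)          # nu = beta * A_bar * f'(k*) in steady state
    chi = alpha - 1                      # elasticity of f'(k) with respect to k
    investment_share = beta * alpha * delta / (1 - beta * (1 - delta))
    return nu, chi, 1 - investment_share  # last element is s_c


alpha, beta, delta = 0.36, 0.95, 0.1     # illustrative values
nu, chi, s_c = loglinear_coefficients(alpha, beta, delta)

# Cross-check: compute the consumption share from the steady state itself.
k_star = (alpha / (1 / beta - (1 - delta))) ** (1 / (1 - alpha))
y_star = k_star ** alpha
s_c_direct = 1 - delta * k_star / y_star  # investment equals delta * k_star
```

Both routes give the same consumption share, which is the sense in which no explicit steady state calculation is required.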
This approach thus delivers a log-linearized system whose parameters are determined
by the underlying specification of preferences, technology and the driving
processes of the economy. This system can be simplified by solving out for ĉt yield-
ing a stochastic system characterizing the evolution of the state variables, i.e. the
system can be written solely in terms of (Ât, k̂t). At this point, the response of
the system to productivity innovations can be evaluated and, as discussed further
below, taken to the data.50
Value Function Iteration
Instead of obtaining an approximate solution by log-linearization, one can attack
the dynamic programming problem directly. To more fully characterize a solution,
we often resort to specific examples or numerical analysis.
As a leading example, assume that u(c) = ln(c) and that the rate of depreciation
of capital is 100%. Further, suppose that the process for the shocks is given by
lnA′ = ρ lnA + ε
where ρ ∈ (−1, 1), so that the process is stationary. Finally, suppose that the
production function has the form f (k) = Akα.
With these restrictions, the Euler equation (5.9) reduces to:
$$\frac{1}{c} = \beta E_{A'|A}\left[\frac{A'\alpha k'^{\alpha-1}}{c'}\right]. \qquad (5.14)$$
Note that here we take the expectation, over A′ given A, of the ratio since future
consumption, c′, will presumably depend on the realized value of the productivity
shock next period.
To solve for the policy function, we make a guess and verify it.51 We assert that
the policy function k′ = φ(A, k) is given by
φ(A, k) = λAkα
where λ is an unknown constant. That is, we will try a guess that the future capital
is proportional to output which is quite similar to the policy function we deduced
for the example of the nonstochastic growth model. Given the resource constraint,
this implies
c = (1 − λ)Akα.
To verify this guess and determine λ, we use this proposed policy function in (5.14).
This yields:
$$\frac{1}{(1-\lambda)Ak^{\alpha}} = \beta E_{A'|A}\left[\frac{A'\alpha k'^{\alpha-1}}{(1-\lambda)A'k'^{\alpha}}\right].$$
Solving for the policy function yields:
k′ = βαAkα. (5.15)
Hence our guess is verified and λ = αβ. This implies that consumption is propor-
tional to income:
c = (1 − βα)Akα. (5.16)
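The guess-and-verify step can also be checked numerically. The sketch below evaluates the two sides of (5.14) under the policy k′ = αβAkα, using a sample average over draws of A′ in place of the conditional expectation. The parameter values, the lognormal draws and the function names are assumptions for illustration. Because A′ cancels inside the expectation, the residual should be zero up to rounding for any distribution of A′.

```python
import random

alpha, beta = 0.36, 0.95              # illustrative parameter values
lam = alpha * beta                    # candidate value of lambda


def policy(A, k):
    return lam * A * k ** alpha       # k' = lambda * A * k**alpha


def consumption(A, k):
    return (1 - lam) * A * k ** alpha


def euler_residual(A, k, A_next_draws):
    k_next = policy(A, k)
    # Sample average stands in for the expectation over A' given A.
    rhs = beta * sum(
        a * alpha * k_next ** (alpha - 1) / consumption(a, k_next)
        for a in A_next_draws
    ) / len(A_next_draws)
    return 1 / consumption(A, k) - rhs


rng = random.Random(0)
draws = [rng.lognormvariate(0.0, 0.1) for _ in range(1000)]
residual = euler_residual(A=1.05, k=0.2, A_next_draws=draws)
```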
In this case, one can show that the value function that solves (5.6) is given by:
V (A, k) = G + B ln(k) + D ln(A)
for all (A, k), where G, B and D are unknown constants which we can solve for.
If so, then using (5.15) and (5.16), the functional equation is given by:
G + B ln(k) + D ln(A) = ln((1 − βα)Akα) + β[G + B ln(βαAkα) + DEA′|A ln(A′)] (5.17)
for all (A, k). Importantly, there is no maximization here as we have substituted the
policy function into the functional equation.52 Since EA′|A ln(A′) = ρ ln A, we make
use of the fact that this relationship holds for all (A, k) and group terms together
as we did in the analysis of the nonstochastic growth model. So the constants must
be the same on both sides of (5.17):
G = ln(1 − βα) + βG + βB ln(βα).
Similarly, for the coefficients multiplying ln(k), we must have:
B = α + βBα.
Finally, with respect to ln(A),
D = 1 + βB + βDρ.
So if (G, B, D) solve this system of equations, then they solve the functional
equation. As this solution is unique, we verify our guess. While tedious, one can
show that the solution is:
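$$G = \frac{\ln(1-\beta\alpha) + \beta\left(\frac{\alpha}{1-\beta\alpha}\right)\ln(\beta\alpha)}{1-\beta}, \quad B = \frac{\alpha}{1-\beta\alpha}, \quad D = \frac{1}{(1-\beta\rho)(1-\beta\alpha)}.$$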
Note here the role of discounting: if β = 1, then G is infinity.
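A quick way to confirm these constants is to evaluate both sides of (5.17) at a few points of the state space. The sketch below does so; the parameter values and the evaluation points are assumptions chosen for illustration.

```python
import math

alpha, beta, rho = 0.36, 0.95, 0.9    # illustrative values

B = alpha / (1 - beta * alpha)
D = 1 / ((1 - beta * rho) * (1 - beta * alpha))
G = (math.log(1 - beta * alpha) + beta * B * math.log(beta * alpha)) / (1 - beta)


def V(A, k):
    return G + B * math.log(k) + D * math.log(A)


def bellman_rhs(A, k):
    # Right side of (5.17): the policy is substituted in and E[ln A' | A] = rho * ln A.
    k_next = beta * alpha * A * k ** alpha
    c = (1 - beta * alpha) * A * k ** alpha
    expected_V_next = G + B * math.log(k_next) + D * rho * math.log(A)
    return math.log(c) + beta * expected_V_next


# Largest discrepancy between the two sides over a handful of states.
gap = max(abs(V(A, k) - bellman_rhs(A, k))
          for A in (0.8, 1.0, 1.3) for k in (0.05, 0.2, 1.0))
```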
Unfortunately, this is a very special case. We will use it again when we discuss
empirical implications of the stochastic growth model.
Exercise 5.3
Verify that if there is less than 100% depreciation, the solution given by φ(A, k) =
λAkα fails.
Outside of the special examples, one is left with a direct analysis of (5.6). It is
straightforward to apply the analysis of Chapter 2 to this problem so that a solution
to the functional equation will exist.53 Further, one can show that the value function
is a strictly concave function of k. Consequently, the policy function is increasing
in k. To see this, consider (5.8). For a given k′, an increase in k raises current
consumption and so lowers the left side of this expression. If k′ doesn’t rise, then (5.8)
will not hold: an increase in k′ raises the left side and, from the concavity of V (A, k),
lowers the right side, which restores the equality.
Further details about the policy function require numerical analysis. One can
build a stochastic version of the program termed grow.m that was discussed above.
Doing so is a good exercise to be sure that you understand how to write a value
function iteration program.54 We take this up again in the next section once we
introduce a labor supply decision to the model.
Exercise 5.4
Drawing on grow.m, write a value function iteration program to find the solution
to (5.6).
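As a starting point, a value function iteration for (5.6) might be organized as in the sketch below. It is written in Python rather than as an extension of grow.m, and it assumes log utility, 100% depreciation, f (k) = kα and a two-state Markov shock, so that the result can be compared with the closed-form policy (5.15). The grids, parameter values and variable names are illustrative assumptions.

```python
import math

# Illustrative values (assumptions): log utility, full depreciation, f(k) = k**alpha.
alpha, beta = 0.36, 0.9
A_grid = [0.9, 1.1]                       # two-state productivity shock
Pi = [[0.8, 0.2], [0.2, 0.8]]             # Pi[a][b] = Prob(A' = A_grid[b] | A = A_grid[a])
nk = 80
k_grid = [0.05 + 0.45 * i / (nk - 1) for i in range(nk)]
nA = len(A_grid)
NEG_INF = float("-inf")

# Period utility for every (A, k, k') triple; infeasible choices get -inf.
util = [[[math.log(A * k ** alpha - kn) if A * k ** alpha > kn else NEG_INF
          for kn in k_grid] for k in k_grid] for A in A_grid]

V = [[0.0] * nk for _ in range(nA)]
policy = [[0] * nk for _ in range(nA)]    # index of the optimal k' on k_grid
for iteration in range(1000):
    # E[V(A', k') | A] for each current shock and each candidate k'.
    EV = [[sum(Pi[a][b] * V[b][j] for b in range(nA)) for j in range(nk)]
          for a in range(nA)]
    V_new = [[0.0] * nk for _ in range(nA)]
    for a in range(nA):
        for i in range(nk):
            values = [util[a][i][j] + beta * EV[a][j] for j in range(nk)]
            best_j = max(range(nk), key=values.__getitem__)
            V_new[a][i], policy[a][i] = values[best_j], best_j
    diff = max(abs(V_new[a][i] - V[a][i]) for a in range(nA) for i in range(nk))
    V = V_new
    if diff < 1e-6:                       # sup-norm convergence criterion
        break

# Distance from the closed-form policy (5.15), available only in this special case.
max_gap = max(abs(k_grid[policy[a][i]] - alpha * beta * A_grid[a] * k_grid[i] ** alpha)
              for a in range(nA) for i in range(nk))
```

With less than full depreciation the closed form is no longer available, but the same loop applies once the resource constraint inside util is changed accordingly.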
5.3.4 Decentralization
To study the decentralized economy, the household’s problem must be supplemented
by a budget constraint and the sources of income (labor income, capital income,
profits) would have to be specified along with the uses of these funds (consumption,
savings). Likewise, the firm’s demands for labor and capital inputs will have to be
specified as well. We discuss these in turn using the recursive equilibrium concept.55
The firm’s problem is static as we assume the households hold the capital. Thus
the firm rents capital from the household at a price of r per unit and hires labor at
a wage of ω per hour. The wage and rental rates are all in terms of current period
output. Taking these prices as given, the representative firm will maximize profits
by choosing inputs (K, N ) such that:
AfN (K, N ) = ω and AfK (K, N )+(1-δ) = r.
Here we stipulate that the capital rental agreement allows the firm to use the capital
and to retain the undepreciated capital which it then sells for the same price as
output in the one-sector model. Due to the constant returns to scale assumption,
the number and size of the firms is not determined. We assume for simplicity that
there is a single firm (though it acts competitively) which employs all the capital
and labor in the economy, denoted by upper case letters.
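The factor price conditions above can be illustrated with a Cobb-Douglas technology, F (K, N ) = KαN 1−α, which is an assumption made here for concreteness along with the parameter values. Under constant returns, paying each input according to these conditions exhausts the firm’s resources (output plus undepreciated capital), so profits are zero.

```python
def factor_prices(A, K, N, alpha, delta):
    # Wage equals the marginal product of labor.
    wage = A * (1 - alpha) * K ** alpha * N ** (-alpha)
    # Rental rate equals the marginal product of capital plus undepreciated capital.
    rental = A * alpha * K ** (alpha - 1) * N ** (1 - alpha) + (1 - delta)
    return wage, rental


A, K, N, alpha, delta = 1.0, 2.0, 1.0, 0.36, 0.1   # illustrative values
wage, rental = factor_prices(A, K, N, alpha, delta)

output = A * K ** alpha * N ** (1 - alpha)
# Resources of the firm minus payments to capital and labor.
profit = output + (1 - delta) * K - rental * K - wage * N
```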
For the households, their problem is:
V (A, k, K) = maxk′u(r(K)k + ω(K) + Π − k′) + βEA′|AV (A′, k′, K′) (5.18)
where Π is the flow of profits from the firms to the households. This is a different
expression than (5.6) as there is an additional state variable, K. Here k is the
household’s own stock of capital while K is the per capita capital stock economy
wide. The household needs to know the current value of K since factor prices
depend on this aggregate state variable through the factor demand equations. This
is indicated in (5.18) by the dependence of r(K) and ω(K) on K.
Let K′ = H(A, K) represent the evolution of the aggregate capital stock. As
the household is competitive, it takes the evolution of the aggregate state variable
as given. Thus the household takes current and future factor prices as given.
The first-order condition for the household’s capital decision is:
u′(c) = βEVk(A′, k′, K′). (5.19)
Here the household uses the law of motion for K. Using (5.18), we know that
Vk = r(K)u′(c) so that the first-order condition can be written as:
u′(c) = βEr′u′(c′). (5.20)
A recursive equilibrium is comprised of:
• factor price functions: r(K) and ω(K)
• individual policy functions: h(A, k, K) from (5.18)
• a law of motion for K: H(A, K)
such that:
• households and firms optimize
• markets clear
• H(A, k) = h(A, k, k)
By using the first-order conditions from the factor demand of the operating firm,
it is easy to see that the solution to the planner’s problem is a recursive equilibrium.
5.4 A Stochastic Growth Model with Endogenous
Labor Supply
We now supplement the version of the stochastic growth model given above with
an endogenous labor supply decision. For now, we retain the perspective of the
planner’s problem and discuss decentralization later in this section.
5.4.1 Planner’s Dynamic Programming Problem
Supplementing preferences and the technology with a labor input, the modified
planner’s problem is given by:
V (A, k) = maxk′,nu(Af (k, n) + (1 − δ)k − k′, 1 − n) + βEA′|AV (A′, k′). (5.21)
for all (A, k). Here the variables are measured in per capita terms: k and n are the
capital and labor inputs per capita.
The optimization problem entails the dynamic choice between consumption and
investment that was key to the stochastic growth model with fixed labor input. In
addition, given k′, (5.21) has a “static” choice of n.56 This distinction is impor-
tant when we turn to a discussion of programming the solution to this functional
equation.
For given (A, k, k′), define σ(A, k, k′) from:
σ(A, k, k′) = maxnu(Af (k, n) + (1 − δ)k − k′, 1 − n) (5.22)
and let n = φ̂(A, k, k′) denote the solution to the optimization problem. The first-
order condition for this problem is given by:
uc(c, 1 − n)Afn(k, n) = ul(c, 1 − n). (5.23)
This condition equates the marginal gain from increasing employment and consum-
ing the extra output with the marginal cost in terms of the reduction in leisure time.
This is clearly a necessary condition for optimality: in an optimal solution, this type
of static variation should not increase welfare.
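For a specific case, condition (5.23) can be solved for n by bisection. The sketch below assumes u(c, 1 − n) = ln(c) + ξ(1 − n) and f (k, n) = kαn1−α; the function name labor_choice and the parameter values are hypothetical. With these forms the left side of (5.23), divided by ul = ξ, is decreasing in n wherever consumption is positive, which is what makes bisection applicable.

```python
def labor_choice(A, k, k_next, alpha, delta, xi, tol=1e-12):
    def consumption(n):
        return A * k ** alpha * n ** (1 - alpha) + (1 - delta) * k - k_next

    def foc_gap(n):
        c = consumption(n)
        if c <= 0:
            return float("inf")       # infeasible region: more work is needed
        # u_c * A * f_n - u_l, with u_c = 1/c and u_l = xi
        return A * (1 - alpha) * k ** alpha * n ** (-alpha) / c - xi

    lo, hi = 1e-9, 1.0
    if foc_gap(hi) >= 0:
        return hi                     # corner: the full time endowment is worked
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative call; the arguments are assumptions, not calibrated values.
n_star = labor_choice(A=1.0, k=3.0, k_next=3.0, alpha=0.36, delta=0.1, xi=2.0)
```

If consumption is non-positive even at n = 1, the function returns the corner value and the caller should treat that (A, k, k′) combination as infeasible.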
Thus given the current productivity shock and the current capital stock and
given a level of capital for the future, φ̂(A, k, k′) characterizes the employment
decision. We can think of σ(A, k, k′) as a return function given the current state
(A, k) and the control (k′).
Using the return function from this choice of the labor input, rewrite the func-
tional equation as:
V (A, k) = maxk′ σ(A, k, k′) + βEA′|AV (A′, k′). (5.24)
for all (A, k). This has the same structure as the stochastic growth model with a
fixed labor supply though the return function, σ(A, k, k′), is not a primitive object.
Instead, it is derived from a maximization problem and thus inherits its properties
from the more primitive u(c, 1 − n) and f (k, n) functions. Using the results in
Chapter 2, there will be a solution to this problem and a stationary policy function
will exist. Denote the policy function by k′ = h(A, k).
The first-order condition for the choice of the future capital stock is given by:
σk′(A, k, k′) + βEA′|AVk′(A′, k′) = 0
where the subscripts denote partial derivatives. Using (5.24), we can solve for
EA′|AVk(A′, k′) yielding an Euler equation:
−σk′(A, k, k′) = βEA′|Aσk′(A′, k′, k′′).
Using (5.22), this can be rewritten in more familiar terms as:
uc(c, 1 − n) = βEA′|A[uc(c′, 1 − n′)[A′fk(k′, n′) + (1 − δ)]] (5.25)
where c = Af (k, n) + (1 − δ)k − k′ and c′ is defined similarly. This Euler equation is
another necessary condition for an optimum: else a variation in the level of savings
could increase lifetime expected utility.
The policy functions will exhibit a couple of key properties revolving around
the themes of intertemporal substitution and consumption smoothing. The issue is
essentially understanding the response of consumption and employment to a pro-
ductivity shock. By intertemporal substitution, the household will be induced to
work more when productivity is high. But, due to potentially offsetting income and
substitution effects, the response to a productivity shock will be smaller the more
permanent the shock is.57 By consumption smoothing, a household will optimally
adjust consumption in all periods in response to an increase in productivity. The more
persistent the shock to productivity, the more responsive consumption will be to
it.58
5.4.2 Numerical Analysis
A discussion along the same lines as that for the stochastic growth model with
fixed labor input applies here as well. As in King et al. (1988), one can attack the
set of necessary conditions ((5.23), (5.25) and the resource constraint) through a
log-linearization procedure. The reader is urged to study that approach from their
paper.
Alternatively, one can again simply solve the functional equation directly. This
is just an extension of the programming exercise given at the end of the previous
section on the stochastic growth model with fixed labor supply. The outline of the
program will be discussed here leaving the details as an additional exercise.
The program should be structured to focus on solving (5.24) through value func-
tion iteration. The problem is that the return function is derived and thus must be
solved for inside of the program. The researcher can obtain an approximate solution
to the employment policy function, given above as φ̂(A, k, k′). This is achieved by
specifying grids for the shocks, the capital state space and the employment space.59
As noted earlier, this is the point of approximation in the value function iteration
routine: finer grids yield better approximations but are costly in terms of computer
time. Once φ̂(A, k, k′) is obtained, then
σ(A, k, k′) = u(Af (k, φ̂(A, k, k′)) + (1 − δ)k − k′, 1 − φ̂(A, k, k′))
can be calculated and stored. This should all be done prior to starting the value
function iteration phase of the program. So, given σ(A, k, k′), the program would
then proceed to solve (5.24) through the usual value function iteration routine.
The output of the program is then the policy function for capital accumulation,
k′ = h(A, k), and a policy function for employment, n = φ(A, k) where
φ(A, k) = φ̂(A, k, h(A, k)).
Hence both of these policy functions ultimately depend only on the state variables,
(A, k). These policy functions provide a link between the primitive functions (and
their parameters) and observables. We turn now to a discussion of exploiting that
link as the stochastic growth model confronts the data.
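The precomputation step described above, in which σ(A, k, k′) and φ̂(A, k, k′) are stored before value function iteration begins, might look like the following sketch. It assumes u(c, 1 − n) = ln(c) + ξ(1 − n) and f (k, n) = kαn1−α, and uses deliberately coarse grids; all names and numbers are illustrative assumptions.

```python
import math

alpha, delta, xi = 0.36, 0.1, 2.0             # illustrative values
n_grid = [i / 200 for i in range(1, 200)]      # employment grid on (0, 1)


def utility(c, leisure):
    return math.log(c) + xi * leisure


def static_labor_choice(A, k, k_next):
    """Grid search over n; returns (phi_hat, sigma), or (None, -inf) if infeasible."""
    best_n, best_u = None, float("-inf")
    for n in n_grid:
        c = A * k ** alpha * n ** (1 - alpha) + (1 - delta) * k - k_next
        if c > 0:
            u = utility(c, 1 - n)
            if u > best_u:
                best_n, best_u = n, u
    return best_n, best_u


A_grid = [0.95, 1.05]
k_grid = [2.0 + 0.5 * i for i in range(5)]

# Store the return function and the employment rule for every (A, k, k') triple.
sigma, phi_hat = {}, {}
for a, A in enumerate(A_grid):
    for i, k in enumerate(k_grid):
        for j, k_next in enumerate(k_grid):
            phi_hat[(a, i, j)], sigma[(a, i, j)] = static_labor_choice(A, k, k_next)
```

Once k′ = h(A, k) has been found by iterating on (5.24) with the stored σ, the employment policy follows by looking up phi_hat at the index of the chosen k′.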
5.5 Confronting the Data
Since Kydland and Prescott (1982), macroeconomists have debated the empirical
success of the stochastic growth model. This debate is of interest both because of
its importance for the study of business cycles and for its influence on empirical
methodology. Our focus here is on the latter point as we use the stochastic growth
model as a vehicle for exploring alternative approaches to the quantitative analysis
of dynamic equilibrium models.
Regardless of the methodology, the link between theory and data is provided by
the policy functions. To set notation, let Θ denote a vector of unknown parameters.
We will assume that the production function is Cobb-Douglas and is constant returns
to scale. Let α denote capital’s share. Further, we will assume that
u(c, 1 − n) = ln(c) + ξ(1 − n)
as our specification of the utility function.60 Thus the parameter vector is:
Θ = (α, δ, β, ξ, ρ, σ)
where α characterizes the technology, δ determines the rate of depreciation of the
capital stock, β is the discount factor, and ξ parameterizes preferences. The tech-
nology shock process is parameterized by a serial correlation (ρ) and a variance (σ).
To make clear that the properties of this model economy depend on these parame-
ters, we index the policy functions by Θ: k′ = hΘ(A, k) and n = φΘ(A, k). At this
point, we assume that for a given Θ these policy functions have been obtained from
a value function iteration program. The question is then how to estimate Θ.
5.5.1 Moments
One common approach to estimation of Θ is based upon matching moments. The
researcher specifies a set of moments from the data and then finds the value of
Θ to match (as closely as possible) these moments. A key element, of course, is
determining the set of moments to match.
The presentation in Kydland and Prescott (1982) is a leading example of one version
of this approach, termed calibration. Kydland and Prescott consider a much
richer model than that presented in the previous section, as they include a sophisticated
time-to-build model of capital accumulation, non-separable preferences and a
signal extraction problem associated with the technology shock. They pick the parameters
for their economy using moments obtained from applied studies and from
low frequency observations of the U.S. economy. In their words,
“Our approach is to focus on certain statistics for which the noise introduced
by approximations and measurement errors is likely to be small
relative to the statistic.”
Since the model we have studied thus far is much closer to that analyzed by
King, Plosser and Rebelo, we return to a discussion of that paper for an illustration
of this calibration approach.61 King, Plosser and Rebelo calibrate their parameters
from a variety of sources. As in Kydland and Prescott, the technology parameter is
chosen to match factor shares. The Cobb-Douglas specification implies that labor’s
share in the National Income and Product Accounts should equal (1−α). The rate of
physical depreciation is set at 10% annually and the discount rate is chosen to match
a 6.5% average annual return on capital. The value of ξ is set so that on average
hours worked are 20% of total hours corresponding to the average hours worked
between 1948 and 1986. King, Plosser and Rebelo use variations in the parameters
of the stochastic process (principally ρ) as a tool for understanding the response of
economic behavior as the permanence of shocks is varied. In other studies, such as
Kydland and Prescott, the parameters of the technology shock process are inferred
from the residual of the production function.
Note that for these calibration exercises, the model does not have to be solved
in order to pick the parameters. That is, the policy functions are not actually used
in the calibration of the parameters. Instead, the parameters are chosen by looking
at evidence that is outside of business cycle properties, such as time series averages.
Comparing the model’s predictions against actual business cycle moments is thus
an informal overidentification exercise.
The table below shows a set of moments from U.S. data as well as the predic-
tions of these moments from the King, Plosser and Rebelo model parameterized as
described above.62 The first set of moments is the standard deviation of key macroe-
conomic variables relative to output. The second set of moments is the correlation
of these variables with respect to output.
[Table 5.1 approximately here]
In this literature, this is a common set of moments to study. Note that the
stochastic growth model, as parameterized by King, Plosser and Rebelo exhibits
many important features of the data. In particular, the model produces consumption
smoothing as the standard deviation of consumption is less than that of output.
Further, as in U.S. data, the variability of investment exceeds that of output. The
cross correlations are all positive in the model as they are in the data. One apparent
puzzle is the low correlation of hours and output in the data relative to the model.63
Still, based on casual observation, the model “does well”. However, these papers do
not provide “tests” of how close the moments produced by the model actually are
to the data.
Of course, one can go a lot further with this moment matching approach. Letting
ΨD be the list of 8 moments from U.S. data shown in Table 5.1, one could solve the
problem of:
minΘ (ΨD − ΨS(Θ))W (ΨD − ΨS(Θ))′ (5.26)
where ΨS(Θ) is a vector of simulated moments that depend on the vector of param-
eters (Θ) that are inputs into the stochastic growth model. As discussed in Chapter
4, W is a weighting matrix. So, for their parameterization, the ΨS(Θ) produced
by the KPR model is simply the column of moments reported in Table 5.1. But,
as noted earlier, the parameter vector was chosen based on other moments and
evidence from other studies.
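The mechanics of (5.26) can be sketched with a deliberately simple stand-in for the model. In the code below an AR(1) process replaces the stochastic growth model, Θ is the single parameter ρ, the moments are a standard deviation and a first-order autocorrelation, W is the identity matrix and the “data” are themselves simulated. All of these choices, and every name in the block, are assumptions made only to show the structure of the minimization.

```python
import random


def simulate(rho, T, seed):
    # Hypothetical stand-in for the structural model: an AR(1) with unit-variance shocks.
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(T):
        x = rho * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def moments(path):
    # Psi = (standard deviation, first-order autocorrelation).
    mean = sum(path) / len(path)
    dev = [x - mean for x in path]
    var = sum(d * d for d in dev) / len(dev)
    autocov = sum(a * b for a, b in zip(dev[1:], dev[:-1])) / len(dev)
    return [var ** 0.5, autocov / var]


def objective(psi_d, psi_s, W):
    # Quadratic form (Psi_D - Psi_S) W (Psi_D - Psi_S)'.
    g = [d - s for d, s in zip(psi_d, psi_s)]
    return sum(g[i] * W[i][j] * g[j] for i in range(len(g)) for j in range(len(g)))


psi_data = moments(simulate(rho=0.9, T=2000, seed=0))    # synthetic "data" moments
W = [[1.0, 0.0], [0.0, 1.0]]                             # identity weighting matrix
rho_grid = [i / 20 for i in range(20)]

# The same seed is reused for every candidate so that only rho changes across simulations.
rho_hat = min(
    rho_grid,
    key=lambda r: objective(psi_data, moments(simulate(r, 2000, seed=1)), W),
)
```

Holding the simulation draws fixed across candidate parameters keeps the objective from jumping around for reasons unrelated to Θ. In an application the grid search would be replaced by a numerical optimizer and simulate by the solved model.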
Exercise 5.5
Using a version of the stochastic growth model to create the mapping ΨS(Θ), solve
(5.26).
5.5.2 GMM
Another approach, closer to the use of orthogonality conditions in the GMM ap-
proach, is used by Christiano and Eichenbaum (1992). Their intent is to enrich
the RBC model to encompass observations on the correlation between the
labor input (hours worked) and the return to working (the wage and/or the average
product of labor). To do so, they add shocks to government purchases, financed by
lump-sum taxes. Thus government shocks influence the labor choice of households
through income effects. For their exercise, this is important as this shift in labor
supply interacts with variations in labor demand thereby reducing the excessively
high correlation between hours and the return to work induced by technology shocks
alone.
While the economics here is of course of interest, we explore the estimation
methodology employed by Christiano and Eichenbaum. They estimate eight parameters:
the rate of physical depreciation (δ), the labor share of the Cobb-Douglas
technology (α), a preference parameter for household’s marginal rate of substitution
between consumption and leisure (γ), as well as the parameters characterizing the
distributions of the shocks to technology and government spending.
Their estimation routine has two phases. In the first, they estimate the param-
eters and in the second they look at additional implications of the model.
For the first phase, they use unconditional moments to estimate these parame-
ters. For example, using the capital accumulation equation, the rate of depreciation
can be solved for as:
$$\delta = 1 - \frac{k_{t+1} - i_t}{k_t}.$$
Given data on the capital stock and on investment, an estimate of δ can be ob-
tained as the time series average of this expression. 64 Note that there is just
a single parameter in this condition so that δ is estimated independently of the
other parameters of the model. Building on this estimate, Christiano and Eichenbaum
then use the intertemporal optimality condition (under the assumption that
u(c) = ln(c)) to determine capital’s share in the production function. They proceed
in this fashion of using unconditional moments to identify each of the structural
parameters.
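The depreciation moment is simple enough to illustrate directly. The sketch below builds synthetic capital and investment series from the accumulation equation with a known δ and then recovers it as the time series average of 1 − (kt+1 − it)/kt. The series, the value of δ and the function name are assumptions for illustration; with actual data the individual terms would differ across periods.

```python
import random


def estimate_delta(capital, investment):
    # capital holds k_0, ..., k_T and investment holds i_0, ..., i_{T-1}.
    terms = [1 - (capital[t + 1] - investment[t]) / capital[t]
             for t in range(len(investment))]
    return sum(terms) / len(terms)


# Synthetic series built from k_{t+1} = (1 - delta) * k_t + i_t with a known delta.
rng = random.Random(0)
true_delta = 0.025
capital, investment = [10.0], []
for _ in range(200):
    i_t = rng.uniform(0.2, 0.3)
    investment.append(i_t)
    capital.append((1 - true_delta) * capital[-1] + i_t)

delta_hat = estimate_delta(capital, investment)
```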
Christiano and Eichenbaum then construct a larger parameter vector, termed Φ,
which consists of the parameters described above from their version of the stochastic
growth model and a vector of second moments from the data. They place these
moments within the GMM framework. Given this structure, they can use GMM to
estimate the parameters and to obtain an estimate of the variance covariance matrix
which is then used to produce standard errors for their parameter estimates. 65
As the point of the paper is to confront observations on the correlation of hours
and the average product of labor, corr(y/n, n), and the relative standard deviations
of the labor input and the average productivity of labor, σ_n/σ_{y/n}, they test whether
their model, at the estimated parameters, is able to match the values of these
moments in the data. Note that this is in the spirit of an overidentification test
though the model they estimate is just identified. They find that the stochastic
growth model with the addition of government spending shocks is unable (with one
exception) to match the observations for these two labor market statistics. The
most successful version of the model is estimated with establishment data, assumes
that the labor input is indivisible and government spending is not valued at all by
the households.66
5.5.3 Indirect Inference
Smith (1993) illustrates the indirect inference methodology using a version of the
simple stochastic growth model with fixed employment, as in (5.6). There is one
important modification: Smith considers an accumulation equation of the form:
k′ = k(1 − δ) + Ztit
where Zt is a second shock in the model. Greenwood et al. (1988) interpret this as
a shock to new investment goods and Cooper and Ejarque (2000) view this as an
“intermediation shock”.
With this additional shock, the dynamic programming problem for the represen-
tative household becomes:
$$V(A, Z, k) = \max_{k', n}\; u\left(Af(k, n) + \frac{(1-\delta)k - k'}{Z},\; 1 - n\right) + \beta E_{A'|A} V(A', Z', k'). \qquad (5.27)$$
Note the timing here: the realized value of Z is known prior to the accumulation
decision. As with the stochastic growth model, this dynamic programming problem
can be solved using value function iteration or by linearization around the steady
state.
From the perspective of the econometrics, by introducing this second source of
uncertainty, the model has enough randomness to avoid zero likelihood observations.67
As with the technology shock, there is a variance and a serial correlation parame-
ter used to characterize this normally distributed shock. Smith assumes that the
innovations to these shocks are uncorrelated.
To take the model to the data, Smith estimates a VAR(2) on log detrended
quarterly U.S. time series for the period 1947:1-1988:4. The vector used for the
analysis is:
xt = [yt it]′
where yt is the detrended log of output and it is the detrended log of investment
expenditures. With two lags of each variable, two constants and three elements of
the variance-covariance matrix, Smith generates 13 coefficients.
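The count of 13 auxiliary parameters can be checked with a short sketch. The code is illustrative only and is not Smith's: the function name `var2_auxiliary_params` is hypothetical, the VAR(2) is fitted by ordinary least squares, and white noise stands in for the detrended output and investment series.

```python
import numpy as np

def var2_auxiliary_params(x):
    """Fit a bivariate VAR(2) by OLS and stack its parameters.

    x is a (T, 2) array whose columns play the roles of y_t and i_t.
    Returns the 10 regression coefficients followed by the 3 distinct
    elements of the residual variance-covariance matrix.
    """
    T = x.shape[0]
    lhs = x[2:]                                   # x_t for t = 2, ..., T-1
    rhs = np.column_stack([np.ones(T - 2),        # constant
                           x[1:-1],               # first lag of both variables
                           x[:-2]])               # second lag of both variables
    coefs, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)   # shape (5, 2)
    resid = lhs - rhs @ coefs
    sigma = resid.T @ resid / (T - 2)
    return np.concatenate([coefs.ravel(), sigma[np.triu_indices(2)]])

# Placeholder data (white noise), used only to exercise the function.
rng = np.random.default_rng(0)
aux = var2_auxiliary_params(rng.standard_normal((200, 2)))
print(len(aux))
```

Each of the two equations has four lag coefficients and one constant, which gives ten regression coefficients; the symmetric 2x2 covariance matrix adds three distinct elements.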
He estimates 9 parameters using the SQML procedure. As outlined in his paper
and Chapter 3, this procedure finds the structural parameters which maximize the
likelihood of observing the data when the likelihood function is evaluated at the
coefficients produced by running the VARs on simulated data created from the
model at the estimated structural parameters. Alternatively, one could directly
choose the structural parameters to minimize the difference between the VAR(2)
coefficients on the actual and simulated data.
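The alternative described in the last sentence, matching auxiliary coefficients on actual and simulated data, can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the structural model is an MA(1) process rather than the stochastic growth model, the auxiliary model is a univariate AR(2) rather than a VAR(2), the distance is an unweighted sum of squares, and the minimization is a grid search. The simulation shocks are held fixed across candidate parameter values so that the objective does not jump around with the random draws.

```python
import numpy as np

def simulate_ma1(theta, shocks):
    """Toy structural model (an assumption): y_t = e_t + theta * e_{t-1}."""
    return shocks[1:] + theta * shocks[:-1]

def ar2_coefficients(y):
    """Auxiliary model: OLS regression of y_t on a constant and two lags."""
    lhs = y[2:]
    rhs = np.column_stack([np.ones(len(lhs)), y[1:-1], y[:-2]])
    coefs, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    return coefs

def indirect_inference(data, theta_grid, sim_shocks):
    """Return the theta whose simulated auxiliary coefficients are closest to the data's."""
    target = ar2_coefficients(data)
    losses = [np.sum((ar2_coefficients(simulate_ma1(th, sim_shocks)) - target) ** 2)
              for th in theta_grid]
    return theta_grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
true_theta = 0.5
data = simulate_ma1(true_theta, rng.standard_normal(5001))   # stands in for observed data
sim_shocks = rng.standard_normal(20001)                      # fixed across candidate thetas
theta_grid = np.linspace(0.0, 0.9, 91)
theta_hat = indirect_inference(data, theta_grid, sim_shocks)
print(theta_hat)
```

In an application the toy simulator would be replaced by simulation from the solved structural model, and the auxiliary regression by the VAR(2) described above.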
5.5.4 Maximum Likelihood Estimation
Last but certainly not least, versions of the stochastic growth model have been
estimated using the maximum likelihood approach. As in the indirect inference
approach, it is necessary to supplement the basic model with additional sources of
randomness to avoid the zero likelihood problem. This point is developed in the
discussion of maximum likelihood estimation in Kocherlakota et al. (1994). Their
goal is to evaluate the contribution of technology shocks to aggregate fluctuations.
Kocherlakota et al. (1994) construct a model economy which includes shocks to
the production function and stochastic depreciation. In particular, the production
function is given by $Y_t = A_t K_t^{\alpha} (N_t X_t)^{1-\alpha} + Q_t$. Here $X_t$ is exogenous technological
progress, $Y_t$ is the output of the single good, $K_t$ is the capital stock and $N_t$ is the labor
input. The transition equation for capital accumulation is $K_{t+1} = (1 - \delta_t)K_t + I_t$,
where $\delta_t$ is the rate of depreciation and $I_t$ is the level of investment.
The authors first consider a version of the stochastic growth model without a
labor input. They show that the linearized decision rules imply that consumption
and the future capital stock are proportional to the current stock of capital.68
They then proceed to the estimation of their model economy with these three
sources of uncertainty. They assume that the shocks follow an AR(1) process.
Kocherlakota et al. (1994) construct a representation of the equilibrium process
for consumption, employment and output as a function of current and lagged values
of the shocks. This relationship can then be used to construct a likelihood function,
conditional on initial values of the shocks.
Kocherlakota et al. (1994) fix a number of the parameters that one might ulti-
mately be interested in estimating and focus attention on Σ, the variance-covariance
matrix of the shocks. This is particularly relevant to their exercise of determining
the contribution of technology shocks to fluctuations in aggregate output. In this
regard, they argue that without additional assumptions about the stochastic process
of the shocks, they are unable to identify the relative variances of the shocks.
There are a number of other papers that have taken the maximum likelihood
approach.69 Altug (1989) estimates a version of the Kydland and Prescott (1982)
model with a single fundamental shock to technology and measurement error else-
where. Altug (1989) finds some difficulty matching the joint behavior of labor and
other series.
Hall (1996) studies a version of a labor hoarding model which is then compared
to the overtime labor model of Hansen and Sargent (1988). While the Hall (1996)
paper is too complex to present here, the paper is particularly noteworthy for its
comparison of results from estimating parameters using GMM and maximum like-
lihood.
5.6 Some Extensions
The final section of this chapter considers extensions of the basic models. These are
provided here partly as exercises for readers interested in going beyond the models
presented here.70 One of the compelling aspects of the stochastic growth model is
its flexibility in terms of admitting a multitude of extensions.
5.6.1 Technological Complementarities
As initially formulated in a team production context by Bryant (1983) and explored
subsequently in the stochastic growth model by Baxter and King (1991), supple-
menting the individual agent’s production function with a measure of the level of
activity by other agents is a convenient way to introduce interactions across agents.71
The idea is to introduce a complementarity into the production process so that high
levels of activity in other firms imply that a single firm is more productive as well.
Let y represent the output at a given firm, Y be aggregate output, k and n the
firm’s input of capital and labor respectively. Consider a production function of:
$$y = A k^{\alpha} n^{\phi} Y^{\gamma} Y_{-1}^{\varepsilon} \qquad (5.28)$$
where A is a productivity shock that is common across producers. Here γ parameterizes
the contemporaneous interaction between producers. If γ is positive, then
there is a complementarity at work: as other agents produce more, the productivity
of the individual agent increases as well. In addition, this specification allows
for a dynamic interaction through lagged aggregate output, parameterized by ε. As discussed in Cooper and
Johri (1997), this may be interpreted as a dynamic technological complementarity
or even a learning by doing effect. This production function can be embedded into
a stochastic growth model.
Consider the problem of a representative household with access to a production
technology given by (5.28). This is essentially a version of (5.21) with a different
technology.
There are two ways to solve this problem. The first is to write the dynamic
programming problem, carefully distinguishing between individual and aggregate
variables. As in our discussion of the recursive equilibrium concept, a law of motion
must be specified for the evolution of the aggregate variables. Given this law of
motion, the individual household’s problem is solved and the resulting policy func-
tion compared to the one that governs the economy-wide variables. If these policy
functions match, then there is an equilibrium. Else, another law of motion for the
aggregate variables is specified and the search continues.72
Alternatively, one can use the first-order conditions for the individual's optimization
problem. As all agents are identical and all shocks are common, the representative
household will accumulate its own capital, supply its own labor and interact
with other agents only due to the technological complementarity. In a symmetric
equilibrium, yt = Yt. As in Baxter and King (1991), this equilibrium condition is
neatly imposed through the first-order conditions when the marginal products of la-
bor and capital are calculated. From the set of first-order conditions, the symmetric
equilibrium can be analyzed by approximation around a steady state.
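To see what imposing the symmetric equilibrium condition does to the technology, the following sketch substitutes y = Y into (5.28) and solves for aggregate output, which requires γ < 1. The function names and all parameter values are hypothetical and serve only to check the algebra numerically.

```python
def firm_output(A, k, n, Y, Y_lag, alpha, phi, gamma, eps):
    """Output of a single firm from (5.28), taking aggregate output Y as given."""
    return A * k**alpha * n**phi * Y**gamma * Y_lag**eps

def symmetric_output(A, k, n, Y_lag, alpha, phi, gamma, eps):
    """Aggregate output when every firm produces y = Y.

    Setting y = Y in (5.28) gives Y**(1 - gamma) = A k^alpha n^phi Y_lag^eps.
    """
    if gamma >= 1.0:
        raise ValueError("the symmetric solution used here requires gamma < 1")
    return (A * k**alpha * n**phi * Y_lag**eps) ** (1.0 / (1.0 - gamma))

# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha=0.3, phi=0.6, gamma=0.2, eps=0.1)
Y = symmetric_output(A=1.1, k=2.0, n=0.3, Y_lag=1.5, **params)
y = firm_output(A=1.1, k=2.0, n=0.3, Y=Y, Y_lag=1.5, **params)
print(Y, y)
```

With γ > 0 the exponent 1/(1 − γ) exceeds one, so a given productivity shock has a larger effect on aggregate output than on the output of a single firm that takes Y as given.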
The distinguishing feature of this economy from the traditional RBC model is
the presence of the technological complementarity parameters, γ and ε. It is possible
to estimate these parameters directly from the production function or to infer them
from the equilibrium relationships.73
5.6.2 Multiple Sectors
The stochastic growth model explored so far has a single sector of production. Of
course this is just an abstraction which allows the researcher to focus on intertemporal
allocations without being very precise about the multitude of activities arising
contemporaneously.
As an example, suppose there are two sectors in the economy. Sector one produces
consumption goods and sector two produces investment goods.74 Let the production
function for sector j = 1, 2 be given by:
$$y^j = A^j f(k^j, n^j)$$
Here there are sector specific total factor productivity shocks. An important issue
for this model is the degree of correlation across the sectors of activity.
Assuming that both capital and labor can be costlessly shifted across sectors of
production, the state vector contains the aggregate stock of capital rather than its
use in the previous period. Further, there is only a single accumulation equation for
capital. The dynamic programming problem for the planner becomes:
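$$V(A^1, A^2, k) = \max_{\{k^j, n^j\}} u(c, 1 - n) + \beta E_{A^{1\prime}, A^{2\prime} | A^1, A^2} V(A^{1\prime}, A^{2\prime}, k') \qquad (5.29)$$
subject to:
$$c = A^1 f(k^1, n^1) \qquad (5.30)$$
$$k' = k(1 - \delta) + A^2 f(k^2, n^2) \qquad (5.31)$$
$$n = n^1 + n^2 \qquad (5.32)$$
$$k = k^1 + k^2 \qquad (5.33)$$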
This optimization problem can be solved using value function iteration and the
properties of the simulated economy can, in principle, be compared to data. For this
economy, the policy functions will specify the state contingent allocation of capital
and labor across sectors.
Economies generally exhibit positive comovement of employment and output
across sectors. This type of correlation may be difficult for a multi-sector economy
to match unless there is sufficient correlation in the shocks across sectors.75
This problem can be enriched by introducing costs of reallocating capital and/or
labor across the sectors. At the extreme, capital may be entirely sector specific. In
that case, the state space for the dynamic programming problem must include the
allocation of capital across sectors inherited from the past. By adding this friction
to the model, the flow of factors across the sectors may be reduced.
Exercise 5.6
Extend the code for the one sector stochastic growth model to solve (5.29). Use
the resulting policy functions to simulate the model and compute moments as a
function of key parameters, such as the correlation of the shocks across the sectors.
Relate these to observed correlations across sectors.
5.6.3 Taste Shocks
Another source of uncertainty that is considered within the stochastic growth model
allows for randomness in tastes. This may be a proxy for variations in the value
of leisure brought about by technology changes in a home production function.
Here we specify a model with shocks to the marginal rate of substitution between
consumption and work. Formally, consider:
$$V(A, S, k) = \max_{\{k', n\}} u(c, 1 - n, S) + \beta E_{A', S' | A, S} V(A', S', k') \qquad (5.34)$$
subject to the usual production function and capital accumulation equations. Here
S represents the shocks to tastes. This problem may be interpreted as a two sector
model where the second sector produces leisure from time and a shock (S). Empir-
ically this type of specification is useful as there is a shock, internal to the model,
that allows the intratemporal first order condition to be violated, assuming that S
is not observable to the econometrician.
As usual, the policy functions will specify state contingent employment and
capital accumulation. Again, the model can be solved, say through value function
iteration, and then parameters selected to match moments of the data.
Exercise 5.7
Extend the code for the one sector stochastic growth model to solve (5.34). Use
the resulting policy functions to simulate the model and compute moments as a
function of key parameters, including the variance/covariance matrix for the shocks.
Relate these to observed correlations from US data. Does the existence of taste shocks
“help” the model fit the data better?
5.6.4 Taxes
One important extension of the stochastic growth model introduces taxes and gov-
ernment spending. These exercises are partly motivated as attempts to determine
the sources of fluctuations. Further, from a policy perspective, the models are used
to evaluate the impacts of taxes and spending on economic variables and, given
that the models are based on optimizing households, one can evaluate the welfare
implications of various policies.
McGrattan (1994) and Braun (1994) study these issues. We summarize the
results and approach of McGrattan (1994) to elaborate on maximum likelihood
estimation of these models.
McGrattan (1994) specifies a version of the stochastic growth model with four
sources of fluctuations: productivity shocks, government spending shocks, capital
taxes and labor taxes. The government’s budget is balanced each period by the use
of lump-sum taxes/transfers to the households. So, household preferences are given
by U (c, g, n) where c is private consumption, g is public consumption and n is the
labor input.76 The budget constraint for the household in any period t is given by:
$$c_t + i_t = (1 - \tau^k_t) r_t k_t + (1 - \tau^n_t) w_t n_t + \delta \tau^k_t k_t + T_t \qquad (5.35)$$
where $i_t$ is investment by the household and the right side represents income from
capital rentals, labor supply, depreciation allowances and a lump-sum transfer. Here
$\tau^k_t$ and $\tau^n_t$ are the period t tax rates on capital and labor respectively. Given the
presence of these distortionary taxes, McGrattan (1994) cannot appeal to a planner’s
optimization problem to characterize optimal decision rules and thus works directly
with a decentralized allocation.
As in the above discussion of recursive equilibrium, the idea is to specify state
contingent transitions for the aggregate variables and thus, in equilibrium, for rela-
tive prices. These prices are of course relevant to the individual through the sequence
of budget constraints, (5.35). Individual households take these aggregate variables
as given rules and optimize. In equilibrium, the representative household’s choices
and the evolution of the aggregate variables coincide.77
McGrattan (1994) estimates the model using maximum likelihood techniques.
To do so, the fundamental shocks are supplemented by measurement errors through
the specification of a measurement equation. Assuming innovations are normally
distributed, McGrattan (1994) can write down a likelihood function for the model
economy. Given quarterly observations on output, investment, government pur-
chases, hours, capital and the tax rates on capital and labor, the parameters of the
model are estimated. Included in the list of parameters are those that characterize
the utility function, production function as well as the stochastic process for the
shocks in the system. McGrattan (1994) finds a capital share of 0.397, a discount
factor of 0.9927, and a capital depreciation rate of about 0.02. Interestingly, government
purchases do not appear to enter directly into the household’s utility function. Further,
the log utility specification cannot be rejected.
5.7 Conclusions
The models presented in this chapter represent some simple versions of the stochastic
growth model. This is one of the workhorse models of macroeconomics. There is an
enormous literature about this model and solution techniques. The intention was
more to provide insights into the solution and estimation of these models using the
dynamic programming approach than to provide a case for or against the usefulness
of these models in the evaluation of aggregate fluctuations.
There is an almost endless list of extensions of the basic framework. Using the
approach in this chapter, the researcher can solve these problems numerically and
begin the task of confronting the models with data.
Chapter 6
Consumption
6.1 Overview and Motivation
The next two chapters study consumption. We devote multiple chapters to this
topic due to its importance in macroeconomics and also due to the common (though
unfortunate) separation of consumption into a study of (i) nondurables and services
and (ii) durables.
From the perspective of business cycle theory, consumption is the largest com-
ponent of total expenditures. One of the main aspects of consumption theory is the
theme of consumption smoothing (defined below). This is evident in the data as
the consumption of nondurables/services is not as volatile as income. Relatedly,
durable expenditures are one of the more volatile elements in the GDP accounts.
These are important facts that our theories and estimated models must confront.
This chapter focuses on the consumption of nondurables and services. We start
with a simple two-period model to build intuition. We then progress to more com-
plex models of consumption behavior by going to the infinite horizon, adding various
forms of uncertainty and also considering borrowing restrictions. In keeping with the
theme of this book, we pay particular attention to empirical studies that naturally
grow out of consideration of these dynamic optimization problems.
6.2 Two-Period Problem
The two-period problem is, as always, a good starting point to build intuition about
the consumption and savings decisions. We start with a statement of this problem
and its solution and then discuss some extensions.
6.2.1 Basic Problem
The consumer maximizes the discounted present value of utility from consumption over the
two-period horizon. Assuming that preferences are separable across periods, we represent
lifetime utility as:
$$\sum_{t=0}^{1} \beta^t u(c_t) = u(c_0) + \beta u(c_1) \qquad (6.1)$$
where β ∈ [0, 1] is called the discount factor. As you may know from the
optimal growth model, this parameter of tastes is tied to the marginal product of
capital as part of an equilibrium allocation; here it is treated as a fixed parameter.
Period 0 is the initial period, so its utility carries the weight $\beta^0 = 1$.
The consumer is endowed with some initial wealth at the start of period 0 and
earns income yt in period t=0,1. For now, these income flows are exogenous; we
later discuss adding a labor supply decision to the choice problem. We assume that
the agent can freely borrow and lend at a fixed interest rate between each of the two
periods of life. Thus the consumer faces a pair of constraints, one for each period
of life, given by:
a1 = r0(a0 + y0 − c0)
and
a2 = r1(a1 + y1 − c1).
Here yt is period t income and at is the agent’s wealth at the start of period t.
It is important to appreciate the timing and notational assumptions made in these
budget constraints. First, rt represents the gross return on wealth between period
t and period t+1. Second, the consumer earns this interest on wealth plus income
less consumption over the period. It is as if the income and consumption decisions
were made at the start of the period and then interest was earned over the period.
Nothing critical hinges on these timing decisions but it is necessary to be consistent
about them.
There are some additional constraints to note. First, we restrict consumption to
be non-negative. Second, the stock of assets remaining at the end of the consumer’s
life (a2) must be non-negative. Else, the consumer would set a2 = −∞ and die (rel-
atively happily) with an enormous outstanding debt. We leave open the possibility
of a2 > 0.
This formulation of the consumer’s constraints is similar to the ones used
throughout this book in our statement of dynamic programming problems. These
constraints are often termed flow constraints since they emphasize the intertemporal
evolution of the stock of assets being influenced by consumption. As we shall see,
it is natural to think of the stock of assets as a state variable and consumption as
a control variable.
There is an alternative way to express the consumer’s constraints that combines
these two flow conditions by substituting the first into the second. After some
rearranging, this yields:
a2/(r1r0) + c1/r0 + c0 = (a0 + y0) + y1/r0 (6.2)
The left side of this expression represents the expenditures of the consumer on goods
in both periods of life and on the stock of assets held at the start of period 2. The
right side measures the total amount of resources available to the household for
spending over its lifetime. This is a type of “sources” vs. “uses” formulation of
the lifetime budget constraint. The numeraire for this expression of the budget
constraint is period 0 consumption goods.
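The equivalence between the two flow constraints and the lifetime constraint (6.2) can be checked numerically. In this sketch the function names and the numbers are hypothetical; any consumption plan can be used, since (6.2) is an accounting identity once a2 is computed from the flow constraints.

```python
def terminal_assets(a0, y0, y1, c0, c1, r0, r1):
    """Apply the two flow constraints in turn and return a2."""
    a1 = r0 * (a0 + y0 - c0)      # wealth carried into period 1
    a2 = r1 * (a1 + y1 - c1)      # wealth left at the start of period 2
    return a2

def lifetime_budget_sides(a0, y0, y1, c0, c1, r0, r1):
    """Return the 'uses' and 'sources' sides of (6.2), in period 0 goods."""
    a2 = terminal_assets(a0, y0, y1, c0, c1, r0, r1)
    uses = a2 / (r1 * r0) + c1 / r0 + c0
    sources = (a0 + y0) + y1 / r0
    return uses, sources

# Arbitrary illustrative plan; r0 and r1 are gross returns.
uses, sources = lifetime_budget_sides(a0=1.0, y0=2.0, y1=1.5,
                                      c0=1.2, c1=2.0, r0=1.05, r1=1.03)
print(uses, sources)
```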
Maximization of (6.1) with respect to (c0, c1) subject to (6.2) yields:
$$u'(c_0) = \lambda = \beta r_0 u'(c_1) \qquad (6.3)$$
as a necessary condition for optimality where λ is the multiplier on (6.2). This is
an intertemporal first order condition (often termed the consumer’s Euler equation)
that relates the marginal utility of consumption across two periods.
It is best to think about this condition from the perspective of a deviation from
a proposed solution to the consumer’s optimization problem. So, given a candidate
solution, suppose that the consumer reduces consumption by a small amount in
period 0 and increases savings by this same amount. The cost of this deviation
is given by u′(c0) from (6.3). The household will earn r0 between the two periods
and will consume those extra units of consumption in period 1. This leads to a
discounted gain in utility given by the right side of (6.3). When this condition
holds, lifetime utility cannot be increased through such a perturbation from the
optimal path.
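The perturbation argument can be illustrated numerically. The sketch below assumes log utility and a2 = 0, neither of which is imposed in the discussion above, so that the Euler equation has the closed-form solution c0 = (a0 + y0 + y1/r0)/(1 + β). The function names and parameter values are hypothetical.

```python
import math

def lifetime_utility(c0, a0, y0, y1, r0, beta):
    """Lifetime utility with u(c) = ln(c), assuming all resources are consumed (a2 = 0)."""
    c1 = r0 * (a0 + y0 - c0) + y1
    return math.log(c0) + beta * math.log(c1)

def optimal_c0(a0, y0, y1, r0, beta):
    """Period 0 consumption solving 1/c0 = beta * r0 / c1 together with the budget."""
    return (a0 + y0 + y1 / r0) / (1.0 + beta)

a0, y0, y1, r0, beta = 0.5, 1.0, 1.0, 1.04, 0.96
c0_star = optimal_c0(a0, y0, y1, r0, beta)
c1_star = r0 * (a0 + y0 - c0_star) + y1

# The Euler residual is zero at the candidate solution ...
euler_residual = 1.0 / c0_star - beta * r0 / c1_star
# ... and shifting a small amount of consumption in either direction lowers utility.
base = lifetime_utility(c0_star, a0, y0, y1, r0, beta)
save_more = lifetime_utility(c0_star - 0.01, a0, y0, y1, r0, beta)
save_less = lifetime_utility(c0_star + 0.01, a0, y0, y1, r0, beta)
print(euler_residual, base - save_more, base - save_less)
```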
As in our discussion of the cake eating problem in chapter 2, this is just a
necessary condition since (6.3) captures a very special type of deviation from a
proposed path: reduce consumption today and increase it tomorrow. For more
general problems (more than 2 periods) there will be other deviations to consider.
But, even in the two-period problem, the consumer could have taken the reduced
consumption in period 0 and used it to increase a2.
Of course, there is another first-order condition associated with (6.1): the choice
of a2. The derivative with respect to a2 is given by:
λ = φ
where φ is the multiplier on the non-negativity constraint for a2. So, clearly the non-
negativity constraint binds (φ > 0) if and only if the marginal utility of consumption
is positive (λ > 0). That is, it is sub-optimal to leave money in the bank when more
consumption is desirable.
This (somewhat obvious but very important) point has two implications to keep
in mind. First, in thinking about perturbations from a candidate solution, we were
right to ignore the possibility of using the reduction in c0 to increase a2 as this is
clearly not desirable. Second, and perhaps more importantly, knowing that a2 = 0
is a critical part of solving this problem. Looking at the Euler equation (6.3) alone
guarantees that consumption is optimally allocated across periods but this condition
can hold for any value of a2. So it is valuable to realize that (6.3) is only a necessary
condition for optimality; a2 = 0 is necessary as well.
With a2 = 0, the consumer’s constraint simplifies to:
c1/r0 + c0 = a0 + y0 + y1/r0 ≡ w0 (6.4)
where w0 is lifetime wealth for the agent in terms of period 0 goods. Clearly,
the optimal consumption choices depend on the measure of lifetime wealth (w0)
and the intertemporal terms of trade (r0). In the absence of any capital market
restrictions, the timing of income across the household’s lifetime is irrelevant for
its consumption decisions. Instead, variations in the timing of income, given w0,
are simply reflected in the level of savings between the two periods.78
As an example, suppose utility is quadratic in consumption:
$$u(c) = a + bc - (d/2)c^2$$
where we require that u′(c) = b − dc > 0. In this case, the Euler condition simplifies
to:
b − dc0 = βr0(b − dc1).
With the further simplification that βr0 = 1, we have constant consumption: c0 =
c1. Note that this prediction is independent of the timing of income over the periods
0 and 1. This is an example of a much more general phenomenon, termed consumption
smoothing, that will guide our discussion of consumption policy functions.
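A small sketch of the quadratic example follows. It combines the Euler condition b − dc0 = βr0(b − dc1) with the lifetime constraint (6.4) and solves for c0; the closed form in the code is our own algebra, and the function name and numbers are hypothetical. Two income paths with the same lifetime wealth are compared.

```python
def quadratic_plan(a0, y0, y1, r0, beta, b, d):
    """Consumption and saving under quadratic utility with free borrowing and lending."""
    w0 = a0 + y0 + y1 / r0                      # lifetime wealth from (6.4)
    # Substitute c1 = r0 * (w0 - c0) into the Euler condition and solve for c0.
    c0 = (b * (1.0 - beta * r0) + beta * r0**2 * d * w0) / (d * (1.0 + beta * r0**2))
    c1 = r0 * (w0 - c0)
    saving = a0 + y0 - c0
    return c0, c1, saving

# beta * r0 = 1; both income paths have lifetime wealth w0 = 3.
early = quadratic_plan(a0=0.0, y0=2.0, y1=1.25, r0=1.25, beta=0.8, b=10.0, d=1.0)
late = quadratic_plan(a0=0.0, y0=1.0, y1=2.5, r0=1.25, beta=0.8, b=10.0, d=1.0)
print(early)
print(late)
```

Consumption is the same in both periods and under both income paths; only saving differs, and it is negative (borrowing) when income arrives late.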
6.2.2 Stochastic Income
We now add some uncertainty to the problem by supposing that income in period 1
(y1) is not known to the consumer in period 0. Further, we use the result of A2 = 0
and rewrite the optimization problem more compactly as:
$$\max_{c_0} E_{y_1|y_0}\left[u(c_0) + \beta u(R_0(A_0 + y_0 - c_0) + y_1)\right]$$
where we have substituted for c1 using the budget constraint. Note that the expec-
tation is taken here with respect to the only unknown variable (y1) conditional on
knowing y0, period 0 income. In fact, we assume that
y1 = ρy0 + ε1
where |ρ| ∈ [0, 1]. Here ε1 is a shock to income that is not forecastable using period
0 information. In solving the optimization problem, the consumer is assumed to
take the information about future income conveyed by observed current income into
account.
The Euler equation for this problem is given by:
$$u'(c_0) = E_{y_1|y_0} \beta R_0 u'(R_0(A_0 + y_0 - c_0) + y_1).$$
Note here that the marginal utility of future consumption is stochastic. Thus the
tradeoff given by the Euler equation reflects the loss of utility today from reduc-
ing consumption relative to the expected gain which depends on the realization of
income in period 1.
The special case of quadratic utility and βR0 = 1 highlights the dependence of
the consumption decision on the persistence of income fluctuations. For this case,
the Euler equation simplifies to:
$$c_0 = E_{y_1|y_0} c_1 = R_0(A_0 + y_0 - c_0) + E_{y_1|y_0} y_1.$$
Solving for c0 and calculating Ey1|y0 y1 yields:
$$c_0 = \frac{R_0(A_0 + y_0)}{1 + R_0} + \frac{\rho y_0}{1 + R_0} = \frac{R_0 A_0}{1 + R_0} + y_0 \frac{R_0 + \rho}{1 + R_0}. \qquad (6.5)$$
This expression relates period 0 consumption to period 0 income through two
separate channels. First, variations in y0 directly affect the resources currently
available to the household. Second, variations in y0 provide information about
future income (unless ρ = 0).
From (6.5),
$$\frac{\partial c_0}{\partial y_0} = \frac{R_0 + \rho}{1 + R_0}.$$
In the extreme case of iid income shocks (ρ = 0), consumers will save a fraction 1/(1 + R0) of an
income increase and consume the remainder. In the opposite extreme of permanent
shocks (ρ = 1), current consumption moves one-for-one with current income. For
this case, savings does not respond to income at all. Clearly the sensitivity of
consumption to income variations depends on the permanence of those shocks.79
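The dependence of the consumption response on persistence can be tabulated directly from (6.5). The function names and the values of R0 and A0 below are hypothetical; the sketch compares a finite-difference response of c0 to y0 with the analytical derivative.

```python
def consumption_rule(A0, y0, R0, rho):
    """Period 0 consumption from (6.5): quadratic utility, beta * R0 = 1, E[y1|y0] = rho * y0."""
    return (R0 * (A0 + y0) + rho * y0) / (1.0 + R0)

def marginal_propensity_to_consume(R0, rho):
    """Derivative of (6.5) with respect to y0."""
    return (R0 + rho) / (1.0 + R0)

R0, A0, y0, dy = 1.05, 2.0, 1.0, 1e-6
for rho in (0.0, 0.5, 1.0):
    numerical = (consumption_rule(A0, y0 + dy, R0, rho)
                 - consumption_rule(A0, y0, R0, rho)) / dy
    print(rho, numerical, marginal_propensity_to_consume(R0, rho))
```

Since (6.5) is linear in y0, the finite difference and the derivative agree up to rounding. The response rises from R0/(1 + R0) at ρ = 0 to one at ρ = 1.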
Both of these extreme results reflect a fundamental property of the optimal
consumption problem: consumption smoothing. This property means that vari-
ations in current income are spread over time periods in order to satisfy the Euler
equation condition that marginal utility today is equal to the discounted marginal
utility of consumption tomorrow, given the return R0. In fact, consumption smooth-
ing is the intertemporal expression of the normality of goods property found in static
demand theory.
But, there is an interesting aspect of consumption smoothing highlighted by
our example: as the persistence of shocks increases, so does the responsiveness
of consumption to income variations. In fact, this makes good sense: if income
increases today are likely to persist, there is no need to save any of the current
income gain since it will reappear in the next period. These themes of consumption
smoothing and the importance of the persistence of shocks will reappear throughout
our discussion of the infinite horizon consumer optimization problem.
6.2.3 Portfolio Choice
A second extension of the two-period problem is of interest: the addition of multiple
assets. Historically, there has been a close link between the optimization problem
of a consumer and asset pricing models. We will make these links clearer as we
proceed and begin here with a savings problem in which there are two assets.
Assume that the household has no initial wealth and can save current income
through two assets. One is nonstochastic and has a one period gross return of Rs.
The second asset is risky with a return denoted by $\tilde R^r$ and a mean return of $\bar R^r$. Let
$a^r$ and $a^s$ denote the consumer’s holdings of asset type j = r, s. Asset prices are
normalized at 1 in period 0.
The consumer’s choice problem can then be written as:
$$\max_{a^r, a^s} u(y_0 - a^r - a^s) + E_{\tilde R^r} \beta u(\tilde R^r a^r + R^s a^s + y_1).$$
Here we make the simplifying assumption that y1 is known with certainty. The first
order conditions are:
$$u'(y_0 - a^r - a^s) = \beta R^s E_{\tilde R^r} u'(\tilde R^r a^r + R^s a^s + y_1)$$
and
$$u'(y_0 - a^r - a^s) = \beta E_{\tilde R^r} \tilde R^r u'(\tilde R^r a^r + R^s a^s + y_1).$$
Note we have not imposed any conditions regarding the holding of these assets. In
particular, we have allowed the agent to buy or sell the two assets.
Suppose that u(c) is strictly concave, so that the agent is risk averse. Further,
suppose we search for conditions such that the household is willing to hold positive
amounts of both assets. In this case, we would expect that the agent would have
to be compensated for the risk associated with holding the risky asset. This can
be seen by equating these two first order conditions (which hold with equality) and
then using the fact that the expectation of the product of two random variables is
the product of the expectations plus the covariance. This manipulation yields:
$$R^s = \bar R^r + \frac{\mathrm{cov}[\tilde R^r, u'(\tilde R^r a^r + R^s a^s + y_1)]}{E_{\tilde R^r} u'(\tilde R^r a^r + R^s a^s + y_1)}. \qquad (6.6)$$
The sign of the numerator of the ratio on the right depends on the sign of ar.
If the agent holds both the riskless and the risky asset (ar > 0 and as > 0 ),
then the strict concavity of u(c) implies that the covariance must be negative. In
this case, R̄r must exceed Rs : the agent must be compensated for holding the risky
asset.
If the average returns are equal then the agent will not hold the risky asset
(ar = 0) and (6.6) will hold. Finally, if $\bar R^r$ is less than $R^s$, the agent will sell the
risky asset and buy additional units of the riskless asset.
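For reference, the steps leading to (6.6) can be written out as follows. The shorthand $c_1 = \tilde R^r a^r + R^s a^s + y_1$ for period 1 consumption is introduced here only to shorten the expressions and is not notation used elsewhere in the chapter.

```latex
\begin{align*}
R^s \, E\,u'(c_1) &= E\big[\tilde R^r u'(c_1)\big]
  && \text{(equate the two first order conditions and cancel } \beta\text{)} \\
&= \bar R^r \, E\,u'(c_1) + \mathrm{cov}\big[\tilde R^r, u'(c_1)\big]
  && \text{(expectation of a product)} \\
\Longrightarrow \quad R^s &= \bar R^r + \frac{\mathrm{cov}\big[\tilde R^r, u'(c_1)\big]}{E\,u'(c_1)}
  && \text{(divide by } E\,u'(c_1) > 0\text{)}
\end{align*}
```

When $a^r > 0$, a high realization of $\tilde R^r$ raises $c_1$ and, by strict concavity, lowers $u'(c_1)$, which is why the covariance is negative in that case.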
6.2.4 Borrowing Restrictions
A final extension of the two-period model is to impose a restriction on the borrowing
of agents. To illustrate, consider a very extreme constraint where the consumer is
able to save but not to borrow: c0 ≤ y0. Thus the optimization problem of the
agent is:
$$\max_{c_0 \le y_0} \left[u(c_0) + \beta u(R_0(A_0 + y_0 - c_0) + y_1)\right].$$
Denoting the multiplier on the borrowing constraint by µ, the first-order condition is
given by:
$$u'(c_0) = \beta R_0 u'(R_0(A_0 + y_0 - c_0) + y_1) + \mu.$$
If the constraint does not bind, then the consumer has non-negative savings and the
familiar Euler equation for the two-period problem holds. However, if µ > 0, then
c0 = y0 and
$$u'(y_0) > \beta R_0 u'(R_0 A_0 + y_1),$$
which reduces to $u'(y_0) > \beta R_0 u'(y_1)$ when $A_0 = 0$.
The borrowing constraint is less likely to bind if βR0 is large and if y0 is
large relative to y1, since both make the right side of this inequality large relative to the left.
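A minimal sketch of the constrained problem follows, assuming log utility and A0 = 0 by default; both are assumptions made for the example, and the function names are hypothetical. Because the objective is concave in c0, the constrained choice is the smaller of the unconstrained optimum and y0.

```python
def unconstrained_c0(y0, y1, R0, beta, A0=0.0):
    """Log-utility solution of the Euler equation with free borrowing and lending."""
    return (A0 + y0 + y1 / R0) / (1.0 + beta)

def constrained_plan(y0, y1, R0, beta, A0=0.0):
    """Consumption plan under the restriction c0 <= y0; also reports whether it binds."""
    desired = unconstrained_c0(y0, y1, R0, beta, A0)
    c0 = min(desired, y0)
    c1 = R0 * (A0 + y0 - c0) + y1
    return c0, c1, desired > y0

# Three income paths with the same present value (R0 = 1, so y0 + y1 = 4).
for y0, y1 in [(1.0, 3.0), (1.5, 2.5), (3.0, 1.0)]:
    print((y0, y1), constrained_plan(y0, y1, R0=1.0, beta=1.0))
```

In the first two cases the desired c0 exceeds y0, so consumption follows income; in the third the consumer saves and the familiar Euler equation holds.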
An important implication of the model with borrowing constraints is that con-
sumption will depend on the timing of income receipts and not just W0. That is,
imagine a restructuring of income that increased y0 and decreased y1 leaving W0
unchanged. In the absence of a borrowing restriction, consumption patterns would
not change. But, if the borrowing constraint binds, then this restructuring of income
will lead to an increase in c0 and a reduction in c1 as consumption “follows” income.
To the extent that this change in the timing of income flows could reflect govern-
ment tax policy (yt is then viewed as after tax income), the presence of borrowing
restrictions implies that the timing of taxes can matter for consumption flows and
thus for welfare.
The weakness of this and more general models is that the basis for the borrowing
restrictions is not provided. Given this, it is not surprising that researchers have
been interested in understanding the source of borrowing restrictions. We return to
this point below.
6.3 Infinite Horizon Formulation: Theory and Empirical Evidence
We now consider the infinite horizon version of the optimal consumption problem.
In doing so, we see how the basic intuition of consumption smoothing and other
aspects of optimal consumption allocations carry over to the infinite horizon setting.
In addition, we introduce empirical evidence into our presentation.
6.3.1 Bellman’s equation for the Infinite Horizon Problem
Consider a household with a stock of wealth denoted by A, a current flow of income
y and a given return on its investments over the past period given by R−1. Then the
state vector of the consumer’s problem is (A, y, R−1) and the associated Bellman
equation is:
$$v(A, y, R_{-1}) = \max_{c} u(c) + \beta E_{y', R | R_{-1}, y} v(A', y', R)$$
for all (A, y, R−1) where the transition equation for wealth is given by:
A′ = R(A + y − c).
We assume that the problem is stationary so that no time subscripts are necessary.80
This requires, among other things, that income and returns are stationary random
variables and that the joint distribution of (y′, R) depends only on (y, R−1).
The transition equation has the same timing as we assumed in the two period
problem: interest is earned on wealth plus income less consumption over the period.
Further, the interest rate that applies is not necessarily known at the time of the
consumption decision. Thus the expectation in Bellman’s equation is over the two
unknowns (y′, R) where the given state variables provide information on forecasting
these variables.81
6.3.2 Stochastic Income
To analyze this problem, we first consider the special case where the return on
savings is known and the individual faces uncertainty only with respect to income.
We then build on this model by adding in a portfolio choice, endogenous labor
supply and borrowing restrictions.
Theory
In this case, we study:
$$v(A, y) = \max_{c} \; u(c) + \beta E_{y' \mid y}\, v(A', y') \tag{6.7}$$
where A′ = R(A + y − c) for all (A, y). The solution to this problem is a policy
function that relates consumption to the state vector: c = φ(A, y). The first order
condition is:
$$u'(c) = \beta R\, E_{y' \mid y}\, v_A(A', y') \tag{6.8}$$
which holds for all (A, y), where vA(A′, y′) denotes ∂v(A′, y′)/∂A′.
Using (6.7) to solve for Ey′|yvA(A′, y′) yields the Euler equation:
$$u'(c) = \beta R\, E_{y' \mid y}\, u'(c'). \tag{6.9}$$
The interpretation of this equation is that the marginal loss of reducing consumption
is balanced by the discounted expected marginal utility from consuming the proceeds
in the following period. As usual, this Euler equation implies that a one-period
deviation from a proposed solution that satisfies this relationship will not increase
utility. The Euler equation, (6.9), holds when consumption today and tomorrow
is evaluated using this policy function. In the special case of βR = 1, the theory
predicts that the marginal utility of consumption follows a random walk.
In general, one cannot generate a closed-form solution of the policy function
from these conditions for optimality. Still, some properties of the policy functions
can be deduced. Given that u(c) is strictly concave, one can show that v(A, y) is
strictly concave in A. As argued in Chapter 2, the value function will inherit some
of the curvature properties of the return function. Using this and (6.8), the policy
function, φ(A, y), must be increasing in A. Else, an increase in A would reduce
consumption and thus increase A′. This would contradict (6.8).
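Although no closed form is available in general, the functional equation (6.7) can be solved numerically. The following is a minimal sketch of value function iteration on a grid, not the procedure used for any result in this chapter. The two-state income process, its transition matrix, the wealth grid (which rules out A′ < 0) and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def solve_consumption_vfi(beta=0.95, R=1.03, gamma=2.0, n_A=120,
                          tol=1e-6, max_iter=2000):
    """Iterate on v(A, y) = max_c u(c) + beta * E[v(A', y') | y], A' = R(A + y - c)."""
    y_grid = np.array([0.8, 1.2])                 # assumed two-state income
    P = np.array([[0.8, 0.2],                     # assumed transition matrix,
                  [0.2, 0.8]])                    # P[j, k] = Prob(y' = y_k | y = y_j)
    A_grid = np.linspace(0.0, 10.0, n_A)          # assumed wealth grid (so A' >= 0)

    # c[i, j, k]: consumption at (A_i, y_j) when next-period wealth is A'_k
    c = A_grid[:, None, None] + y_grid[None, :, None] - A_grid[None, None, :] / R
    feasible = c > 0
    safe_c = np.where(feasible, c, 1.0)
    util = np.where(feasible, safe_c ** (1 - gamma) / (1 - gamma), -np.inf)

    v = np.zeros((n_A, y_grid.size))
    for _ in range(max_iter):
        Ev = v @ P.T                              # Ev[k, j] = E[v(A'_k, y') | y_j]
        rhs = util + beta * Ev.T[None, :, :]      # indexed [i, j, k]
        v_new = rhs.max(axis=2)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break

    k_star = rhs.argmax(axis=2)                   # index of the optimal A'
    c_policy = A_grid[:, None] + y_grid[None, :] - A_grid[k_star] / R
    return A_grid, y_grid, v, c_policy

A_grid, y_grid, v, c_policy = solve_consumption_vfi()
```

Here `c_policy[i, j]` approximates φ(A_i, y_j). Plotting it against `A_grid` for each income state gives a visual check on the monotonicity argument above, up to the error introduced by the discrete grid.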
As a leading example, consider the specification of utility where
$$u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}$$
where γ = 1 is the special case of u(c) = ln(c). This is called the constant relative
risk aversion case (CRRA) since −cu′′(c)/u′(c) = γ.
Using this utility function, (6.9) becomes:
$$1 = \beta R\, E\left(\frac{c'}{c}\right)^{-\gamma}$$
where the expectation is taken with respect to future consumption which, through
the policy function, depends on (A′, y′). As discussed in some detail below, this
equation is then used to estimate the parameters of the utility function, (β, γ).
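The step from (6.9) to this expression uses only the CRRA marginal utility, u′(c) = c^{−γ}:

$$
\begin{aligned}
u'(c) &= c^{-\gamma}, \\
c^{-\gamma} &= \beta R\, E_{y' \mid y}\, (c')^{-\gamma} \quad \text{(from (6.9))}, \\
1 &= \beta R\, E_{y' \mid y} \left(\frac{c'}{c}\right)^{-\gamma}.
\end{aligned}
$$

The last line divides both sides by c^{−γ}, which is known in the current period and can therefore be moved inside the expectation. With γ = 1 (log utility) the condition reads 1 = βR E(c/c′).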
Evidence
Hall (1978) studies the case in which u(c) is quadratic so that the marginal utility
of consumption is linear. In this case, consumption itself is predicted to follow a
random walk. Hall uses this restriction to test the predictions of this model of
consumption. In particular, if consumption follows a random walk then:
$$c_{t+1} = c_t + \varepsilon_{t+1}.$$
The theory predicts that the growth in consumption (εt+1) should be orthogonal to
any variables known in period t: Etεt+1 = 0. Hall uses aggregate quarterly data for
nondurable consumption. He shows that lagged stock market prices significantly
predict consumption growth, which violates the permanent income hypothesis. 82
Flavin (1981) extends Hall’s analysis, allowing for a general ARMA process for
income. Income is commonly found to predict consumption growth. Flavin
points out that this finding is not necessarily in opposition with the prediction of the
model. Current income might be correlated with consumption growth not because
of a failure of the permanent income hypothesis, but because current income signals
changes in permanent income. However, she also rejects the model.
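The orthogonality restriction tested by Hall can be illustrated on artificial data. The sketch below simulates a consumption series that is a random walk by construction and regresses consumption growth on variables known one period earlier. The series, the extra regressor `z` and the sample size are hypothetical and stand in for the aggregate data used in the literature.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400
eps = rng.normal(0.0, 1.0, T)        # innovations, unforecastable by construction
c = 100.0 + np.cumsum(eps)           # simulated random-walk consumption (not real data)
z = rng.normal(0.0, 1.0, T)          # hypothetical variable observed in period t

dc = np.diff(c)                      # dc[k] = c[k+1] - c[k]
y = dc[1:]                           # growth between periods k+1 and k+2
X = np.column_stack([np.ones(y.size), dc[:-1], z[1:-1]])   # all dated k+1 or earlier

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Conventional OLS t-statistics for the constant and the two lagged regressors
sigma2 = (resid @ resid) / (y.size - X.shape[1])
t_stats = coef / np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
```

Under the null that consumption is a random walk, the lagged regressors should have no explanatory power, so `r_squared` should be close to zero. A significant coefficient on a lagged variable, such as the stock price effect reported by Hall, is evidence against the restriction.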
The importance of current income to explain consumption growth has been seen
as evidence of liquidity constraints (see section 6.3.5). A number of authors have
investigated this issue. 83 However, most of the papers used aggregate data to test
the model. Blundell et al. (1994) test the model on micro data and find that when
one controls for demographics and household characteristics, current income does
not appear to predict consumption growth. Meghir and Weber (1996) explicitly test
for the presence of liquidity constraints using US panel data and do not find any
evidence.
6.3.3 Stochastic Returns: Portfolio choice
We already considered a simple portfolio choice problem for the two-period problem
so this discussion will be intentionally brief. We then turn to empirical evidence
based upon this model.
Theory
Assume that there are N assets available. Let R−1 denote the N -vector of gross
returns between the current and previous period and let A be the current stock of
wealth. Let si denote the share of asset i = 1, 2, …N held by the agent. Normalizing
the price of each asset to be unity, the current consumption of the agent is then:
$$c = A - \sum_i s_i.$$
With this in mind, the Bellman equation is given by:
$$v(A, y, R_{-1}) = \max_{s_i} \; u\Big(A - \sum_i s_i\Big) + \beta E_{R, y' \mid R_{-1}, y}\, v\Big(\sum_i R_i s_i,\, y',\, R\Big) \tag{6.10}$$
where Ri is the stochastic return on asset i. Note that R−1 is in the state vector only
because of the informational value it provides on the return over the next period,
R.
The first order condition for the optimization problem holds for i = 1, 2, …, N
and is:
$$u'(c) = \beta E_{R, y' \mid R_{-1}, y}\, R_i\, v_A\Big(\sum_i R_i s_i,\, y',\, R\Big)$$
where again vA() is defined as ∂v()/∂A. Using (6.10) to solve for the derivative of
the value function, we obtain:
$$u'(c) = \beta E_{R, y' \mid R_{-1}, y}\, R_i\, u'(c') \quad \text{for } i = 1, 2, \ldots, N$$
where, of course, the level of future consumption will depend on the vector of returns,
R, and the realization of future income, y′.
This system of Euler equations forms the basis for financial models that link
asset prices to consumption flows. This system is also the basis for the argument
that conventional models are unable to explain the observed differential between the
return on equity and relatively safe bonds. Finally, these conditions are also used
to estimate the parameters of the utility function, such as the curvature parameter
in the traditional CRRA specification.
This approach is best seen through a review of Hansen and Singleton (1982). To
understand this approach recall that Hall uses the orthogonality conditions to test
a model of optimal consumption. Note that Hall’s exercise does not estimate any
parameters as the utility function is assumed to be quadratic and the real interest
rate is fixed. Instead, Hall essentially tests a restriction imposed by his model at
the assumed parameter values.
The logic pursued by Hansen-Singleton goes a step further. Instead of using the
orthogonality constraints to evaluate the predictions of a parameterized model they
use these conditions to estimate a model. In fact, if one imposes more conditions
than there are parameters (i.e. if the exercise is overidentified), then the researcher
can both estimate the parameters and test the validity of the model.
Empirical implementation
The starting point for the analysis is the Euler equation for the household’s problem
with N assets. We rewrite that first order condition here using time subscripts to
make clear the timing of decisions and realizations of random variables:
$$u'(c_t) = \beta E_t\, R^i_{t+1}\, u'(c_{t+1}) \quad \text{for } i = 1, 2, \ldots, N \tag{6.11}$$
where Rit+1 is defined as the real return on asset i between period t and t + 1. The
expectation here is conditional on all variables observed in period t. Unknown t + 1
variables include the return on the assets as well as period t + 1 income.
The power of the GMM approach derives from this first-order condition. Es-
sentially, the theory tells us that while ex post this first-order condition need not
hold, any deviations from it must be unpredictable given period t information. That
is, the period t + 1 realization of, say, income may lead the consumer to increase
consumption in period t + 1, thus implying that ex post (6.11) does not hold. This
deviation is not inconsistent with the theory as long as it was not predictable given
period t information.
Formally, define εit+1(θ) as
$$\varepsilon^i_{t+1}(\theta) \equiv \frac{\beta R^i_{t+1}\, u'(c_{t+1})}{u'(c_t)} - 1, \quad \text{for } i = 1, 2, \ldots, N \tag{6.12}$$
Thus εit+1(θ) is a measure of the deviation for an asset i. We have added θ as an
argument in this error to highlight its dependence on the parameters describing the
household’s preferences. Household optimization implies that
$$E_t\big(\varepsilon^i_{t+1}(\theta)\big) = 0 \quad \text{for } i = 1, 2, \ldots, N.$$
Let zt be a q-vector of variables that are in the period t information set.84 This
restriction on conditional expectations implies:
$$E\big(\varepsilon^i_{t+1}(\theta) \otimes z_t\big) = 0 \quad \text{for } i = 1, 2, \ldots, N. \tag{6.13}$$
where ⊗ is the Kronecker product. So the theory implies the Euler equation errors
from any of the N first-order conditions ought to be orthogonal to any of the zt
variables in the information set. This creates N·q restrictions.
The idea of GMM estimation is then to find the vector of structural parameters
(θ) such that (6.13) holds. Of course, applied economists only have access to a
sample, say of length T. Let mT(θ) be an N·q-vector where the component relating
asset i to one of the variables in zt, $z^j_t$, is defined by:
$$\frac{1}{T}\sum_{t=1}^{T} \varepsilon^i_{t+1}(\theta)\, z^j_t.$$
The GMM estimator is defined as the value of θ that minimizes
$$J_T(\theta) = m_T(\theta)'\, W_T\, m_T(\theta).$$
Here WT is an Nq × Nq matrix that is used to weight the various moment restrictions.
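As a rough sketch of how the objects above fit together, the code below builds the Euler equation errors (6.12) under CRRA utility, stacks the sample moments into mT(θ), and evaluates JT(θ) with an identity weighting matrix. The data are simulated so that the Euler equations hold at assumed "true" parameters. The two artificial assets, the instrument set (a constant and lagged consumption growth) and every numerical value are illustrative assumptions, not the Hansen and Singleton data or instruments.

```python
import numpy as np

def euler_errors(theta, R_next, growth_next):
    """eps^i_{t+1}(theta) = beta * R^i_{t+1} * (c_{t+1}/c_t)^(-gamma) - 1, shape (T, N)."""
    beta, gamma = theta
    return beta * R_next * growth_next[:, None] ** (-gamma) - 1.0

def moment_vector(theta, R_next, growth_next, Z):
    """Sample analogue of E(eps_{t+1} kron z_t): a vector of length N * q."""
    eps = euler_errors(theta, R_next, growth_next)            # (T, N)
    return (eps[:, :, None] * Z[:, None, :]).mean(axis=0).ravel()

def J(theta, R_next, growth_next, Z, W=None):
    """GMM criterion m' W m; W defaults to the identity matrix."""
    m = moment_vector(theta, R_next, growth_next, Z)
    W = np.eye(m.size) if W is None else W
    return float(m @ W @ m)

# Simulated data (hypothetical numbers), built so that the Euler equations hold
rng = np.random.default_rng(1)
beta0, gamma0, T = 0.97, 2.0, 5000
mu, sigma = 0.02, 0.03
growth = np.exp(rng.normal(mu, sigma, T + 1))                 # c_{t+1} / c_t
eta = np.exp(rng.normal(-0.5 * 0.05 ** 2, 0.05, T + 1))       # mean-one return shock
R_risky = growth ** gamma0 / beta0 * eta
R_safe = np.full(T + 1, np.exp(gamma0 * mu - 0.5 * (gamma0 * sigma) ** 2) / beta0)
R = np.column_stack([R_risky, R_safe])                        # N = 2 assets

Z = np.column_stack([np.ones(T), growth[:-1]])                # q = 2 instruments dated t
m_true = moment_vector((beta0, gamma0), R[1:], growth[1:], Z)
J_true = J((beta0, gamma0), R[1:], growth[1:], Z)
J_wrong = J((0.90, gamma0), R[1:], growth[1:], Z)
```

In this artificial sample the criterion is close to zero at the parameters used to generate the data and noticeably larger at a discount factor of 0.90. An estimator would pass `J` to a numerical minimizer and, in a second step, replace the identity matrix by an estimate of the optimal weighting matrix.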
Hansen and Singleton (1982) use monthly seasonally adjusted aggregate data on
US nondurable consumption, or nondurables and services, between 1959 and 1978.
They use as a measure of stock returns, the equally weighted average return on all
stocks listed on the New York Stock Exchange. They choose a constant relative
risk aversion utility function $u(c) = c^{1-\gamma}/(1-\gamma)$. With this specification, there are
two parameters to estimate, the curvature of the utility function γ and the discount
factor β. Thus, θ = (β, γ). The authors use as instruments $z^j_t$ lagged values of $\varepsilon^i_{t+1}$
and estimate the model with 1, 2, 4 or 6 lags. Depending on the number of lags and
the series used, they find values for γ which vary between 0.67 and 0.97 and values
for the discount factor between 0.942 and 0.998. As the model is overidentified, there
is scope for an overidentification test. Depending on the number of lags and the
series used, the test gives mixed results as the restrictions are sometimes satisfied
and sometimes rejected.
Note that the authors do not adjust for possible trends in the estimation. Suppose
that log consumption is characterized by a linear trend:
$$c_t = \exp(\alpha t)\, \tilde{c}_t$$
where c̃t is the detrended consumption. In that case, equation (6.12) is rewritten
as:
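$$\varepsilon^i_{t+1}(\theta) \equiv \frac{\beta e^{-\alpha\gamma} R^i_{t+1}\, \tilde{c}_{t+1}^{-\gamma}}{\tilde{c}_t^{-\gamma}} - 1, \quad \text{for } i = 1, 2, \ldots, N$$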
Hence the estimated discount factor is the product of the true discount factor
and a trend effect. Ignoring the trend would bias the estimate of the discount factor.
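The trend term arises from a single substitution. With CRRA utility and $c_t = \exp(\alpha t)\tilde{c}_t$,

$$
\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}
= \left(\frac{e^{\alpha (t+1)}\, \tilde{c}_{t+1}}{e^{\alpha t}\, \tilde{c}_t}\right)^{-\gamma}
= e^{-\alpha\gamma} \left(\frac{\tilde{c}_{t+1}}{\tilde{c}_t}\right)^{-\gamma},
$$

so that

$$
\beta R^i_{t+1} \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} - 1
= \big(\beta e^{-\alpha\gamma}\big) R^i_{t+1} \left(\frac{\tilde{c}_{t+1}}{\tilde{c}_t}\right)^{-\gamma} - 1 .
$$

As a purely hypothetical illustration, with β = 0.998, γ = 0.8 and a trend of α = 0.002 per period, the coefficient multiplying the detrended ratio is 0.998 × e^{−0.0016} ≈ 0.9964 rather than 0.998.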
6.3.4 Endogenous Labor Supply
Of course, it is natural to add a labor supply decision to this model. In that case,
we can think that the stochastic income, taken as given above, actually comes from
a stochastic wage (w) and a labor supply decision (n). In this case, consider the
following functional equation:
$$v(A, w) = \max_{A', n} \; U\big(A + wn - (A'/R),\, n\big) + \beta E_{w' \mid w}\, v(A', w')$$
for all (A, w). Here we have substituted in for current consumption so that the agent
is choosing labor supply and future wealth.
Note that the labor supply choice, given (A, A′), is purely static. That is, the
level of employment and thus labor earnings has no dynamic aspect other than sup-
plementing the resources available to finance current consumption and future wealth.
Correspondingly, the first order condition with respect to the level of employment
does not directly involve the value function and is given by:
$$w\, U_c(c, n) = -U_n(c, n). \tag{6.14}$$
Using c=A + wn − (A′/R), this first order condition relates n to (A, w, A′). Denote
this relationship as n = ϕ(A, w, A′). This can then be substituted back into the
dynamic programming problem yielding a simpler functional equation:
$$v(A, w) = \max_{A'} \; Z(A, A', w) + \beta E_{w' \mid w}\, v(A', w')$$
where
$$Z(A, A', w) \equiv U\big(A + w\,\varphi(A, w, A') - (A'/R),\, \varphi(A, w, A')\big)$$
This simplified Bellman equation can be analyzed using standard methods, thus
ignoring the static labor supply decision. 85 Once a solution is found, the level of
employment can then be determined from the condition n = ϕ(A, w, A′).
Using a similar model, MaCurdy (1981) studies the labor supply of young men
using the Panel Study of Income Dynamics (PSID). The estimation of the model
is done in several steps. First, the intra period allocation (6.14) is estimated. The
coefficients are then used to get at the intertemporal part of the model.
To estimate the parameters of the utility function, one has to observe hours
of work and consumption, but in the PSID, total consumption is not reported.
To identify the model, the author uses a utility function which is separable between
consumption and labor supply. The utility function is specified as
$$u(c_t, n_t) = \gamma_{1t}\, c_t^{\omega_1} - \gamma_{2t}\, n_t^{\omega_2},$$
where γ1t and γ2t are two deterministic functions of observed characteristics
which might affect preferences such as age, education or the number of children.
With this specification, the marginal utility of labor supply, Un(c, n), is independent of
the consumption decision. Using (6.14), hours of work can be expressed as:
$$\ln(n_t) = \frac{\ln w_t}{\omega_2 - 1} + \frac{1}{\omega_2 - 1}\big(\ln U_c(c_t, n_t) - \ln \gamma_{2t} - \ln \omega_2\big)$$
While the first term on the right-hand side is observed, the second term contains
the unobserved marginal utility of consumption. Uc(ct, nt) can be expressed as a
function of the Lagrange multiplier associated with the wealth constraint in period
0:
$$U_c(c_t, n_t) = \frac{\lambda_0}{\beta^t (1 + r_1) \cdots (1 + r_t)}$$
The author treats the unobserved multiplier λ0, as a fixed effect and uses panel
data to estimate a subset of the parameters of the utility function using first differ-
ences. In a next step, the fixed effect is backed out. At this point, some additional
identification assumptions are needed. A specific functional form is assumed for the
Lagrange multiplier, written as a function of wages over the life cycle and initial
wealth, all of them being unobserved in the data set. The author then uses fixed
characteristics such as education or age to proxy for the Lagrange multiplier. The
author finds that a 10% increase in the real wage induces a one to five percent
increase in hours worked.
Eichenbaum et al. (1988) analyze the time series properties of a household model
with both a savings and a labor supply decision. They pay particular attention to
specifications in which preferences are non-separable, both across time and between
consumption and leisure contemporaneously. They estimate their model using GMM
on time series evidence on real consumption (excluding durables) and hours worked.
They find support for non-time separability in preferences though in some cases they
found little evidence against the hypothesis that preferences were separable within
a period.
6.3.5 Borrowing Constraints
The Model and Policy Function
The extension of the two period model with borrowing constraints to the infinite
horizon case is discussed by Deaton (1991). 86 One of the key additional insights
from extending the horizon is to note that even if the borrowing constraint does
not bind in a period, this does not imply that consumption and savings take the
same values as they would in the problem without borrowing constraints. Simply
put, consumers anticipate that borrowing restrictions may bind in the future (i.e.
in other states) and this influences their choices in the current state.
Following Deaton (1991), let x = A + y represent cash on hand. Then the
transition equation for wealth implies:
A′ = R(x − c)
where c is consumption. In the event that income variations are iid, we can write
the Bellman equation for the household as:
$$v(x) = \max_{0 \le c \le x} \; u(c) + \beta E_{y'}\, v\big(R(x - c) + y'\big) \tag{6.15}$$
so that the return R is earned on the available resources less consumption, x − c.
Note that income is not a state variable here as it is assumed to be iid. Hence cash
on hand completely summarizes the resources available to the consumer.
The borrowing restriction takes the simple form of c ≤ x so that the consumer
is unable to borrow. Of course this is extreme and entirely ad hoc but it does allow
us to explore the consequences of this restriction. As argued by Deaton, the Euler
equation for this problem must satisfy:
$$u'(c) = \max\{u'(x),\, \beta R\, E u'(c')\}. \tag{6.16}$$
So, either the borrowing restriction binds so that c = x or it doesn’t so that the
more familiar Euler equation holds. Only for low values of x will u′(x) > βREu′(c′)
and only in these states, as argued for the two-period problem, will the constraint
bind. To emphasize an important point: even if u′(x) < βREu′(c′) so that the
standard condition of
$$u'(c) = \beta R\, E u'(c')$$
holds, the actual state dependent levels of consumption may differ from those that
are optimal for the problem in which c is not bounded above by x.
Alternatively, one might consider a restriction on wealth of the form: A ≥
Amin(s) where s is the state vector describing the household. In this case the house-
hold may borrow but its assets are bounded below. In principle, the limit on wealth
may depend on the state variables of the household: all else the same, a household
with a high level of income may be able to borrow more. One can look at the implications
of this type of constraint and, through estimation, uncover Amin(s) (see
Adda and Eaton (1997)).
To solve the optimal problem, one can use the value function iteration approach,
described in chapters 2 and 3, based on the Bellman equation (6.15). Deaton (1991)
uses another approach, working from the Euler equation (6.16). The method is sim-
ilar to the projection methods presented in chapter 3, but the optimal consumption
function is obtained by successive iterations instead of solving a system of non linear
equations. Although there is no formal proof that iterations on the Euler equation
actually converge to the optimal solution, the author notes that, empirically, convergence
always occurs. Figure 6.1 displays the optimal consumption rule in the case
of a serially correlated income. In this case, the problem has two state variables,
the cash-on-hand and the current realization of income, which provide information
on future income. The policy rule has been computed using a (coarse) grid with
three points for the current income and with 60 equally spaced points for the cash-
on-hand. When cash-on-hand is low, the consumer is constrained and is forced to
consume all his cash-on-hand. The policy rule is then the 45 degree line. For higher
values of the cash-on-hand, the consumer saves part of the cash-on-hand for future
consumption.
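The idea of iterating on the Euler equation (6.16) can be sketched for the simpler iid case of (6.15), where cash on hand is the only state variable. This is an illustrative implementation in the spirit of the approach described above and is not Deaton’s code. The income support, the grid, the bisection solver and the parameter values are all assumptions, and the consumption rule is held flat beyond the last grid point.

```python
import numpy as np

beta, R, gamma = 0.95, 1.02, 2.0                 # assumed parameters, beta * R < 1
y_vals = np.array([0.7, 1.0, 1.3])               # assumed iid income support
y_prob = np.array([1.0, 1.0, 1.0]) / 3.0
x_grid = np.linspace(0.5, 8.0, 100)              # grid for cash on hand

def marginal_utility(c):
    return c ** (-gamma)                         # CRRA: u'(c) = c^(-gamma)

def euler_gap(c, c_rule):
    """u'(c) - beta*R*E u'(c_rule(x')) with x' = R(x - c) + y', at every grid point."""
    x_next = R * (x_grid - c)[:, None] + y_vals[None, :]
    # np.interp holds c_rule flat beyond the last grid point (a crude extrapolation)
    c_next = np.interp(x_next.ravel(), x_grid, c_rule).reshape(x_next.shape)
    return marginal_utility(c) - beta * R * (marginal_utility(c_next) @ y_prob)

def update_rule(c_rule, n_bisect=60):
    """One iteration on (6.16): c = x where the constraint binds, else solve the Euler equation."""
    binds = euler_gap(x_grid, c_rule) >= 0.0     # u'(x) >= beta*R*E u'(c') at c = x
    lo, hi = np.full_like(x_grid, 1e-8), x_grid.copy()
    for _ in range(n_bisect):                    # the gap is decreasing in c
        mid = 0.5 * (lo + hi)
        too_low = euler_gap(mid, c_rule) > 0.0
        lo, hi = np.where(too_low, mid, lo), np.where(too_low, hi, mid)
    return np.where(binds, x_grid, 0.5 * (lo + hi))

c_rule = x_grid.copy()                           # initial guess: consume all cash on hand
for iteration in range(1000):
    c_new = update_rule(c_rule)
    change = np.max(np.abs(c_new - c_rule))
    c_rule = c_new
    if change < 1e-8:
        break
```

The resulting `c_rule` coincides with the 45 degree line at low values of cash on hand and lies below it at high values, which is the qualitative shape described above. The serially correlated case would add current income as a second state variable.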
[Figure 6.1 approximately here]
[Figure 6.2 approximately here]
Figure 6.2 displays a simulation of consumption and assets over 200 periods. The
income follows an AR(1) process with unconditional mean of 100, a persistence of
0.5 and the innovations to income are drawn from N(0, 10). The path of consumption is
asymmetric, as good income shocks are smoothed by savings whereas the liquidity
constraint prevents the smoothing of low income realizations. Consumption is
smoother than income, with a standard deviation of 8.9 instead of 11.5.
An Estimation Exercise
In section 6.3.3, we presented a GMM estimation by Hansen and Singleton (1982)
based on the Euler equation. Hansen and Singleton (1982) find a value for γ of
about 0.8. This is under the null that the model is correctly specified, and in
particular, that the Euler equation holds in each period. When liquidity constraints
are binding, the standard Euler equation does not hold. An estimation procedure
which does not take into account this fact would produce biased estimates.
Suppose that the real world is characterized by potentially binding liquidity constraints.
If one ignores them and considers a simpler model without any constraints,
how would it affect the estimation of the parameter γ?
To answer this question, we chose different values for γ, solved the model with
liquidity constraints and simulated it. The simulated consumption series are used
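to get an estimate $\hat{\gamma}_{GMM}$ such that:
$$\hat{\gamma}_{GMM} = \arg\min_{\gamma} \frac{1}{T}\sum_{t=1}^{T} \varepsilon_t(\gamma) \quad \text{with} \quad \varepsilon_t(\gamma) = \beta(1 + r)\frac{c_{t+1}^{-\gamma}}{c_t^{-\gamma}} - 1$$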
[Table 6.1 approximately here]
The results are displayed in Table 6.1. When γ is low, the consumer is less risk
averse and consumes more out of the available cash-on-hand and saves less. The
result is that the liquidity constraints are binding more often. In this case, the bias
in the GMM estimate is the biggest. The bias shrinks as the proportion of
liquidity constrained periods falls, since the standard Euler equation holds when
liquidity constraints are almost absent. From Table 6.1, there is no value of γ which would
generate a GMM estimate of 0.8 as found by Hansen and Singleton.
6.3.6 Consumption Over the Life Cycle
Gourinchas and Parker (2001) investigate the ability of a model of intertemporal
choice with realistic income uncertainty to match observed life cycle profiles of con-
sumption. (For a related study see also Attanasio et al. (1999)). They parameterize
a model of consumption over the life cycle, which is solved numerically. The parameters
of the model are estimated by the simulated method of moments,
using data on household consumption over the life cycle. We first present a sim-
plified version of their model. We then discuss the numerical computation and the
estimation methods.
Following Zeldes (1989a) 87, the log income process is modelled as a random
walk with a moving average error. This specification is similar to the one used in
empirical work (see Abowd and Card (1989)) and seems to fit the data well. Denote
Yt the income of the individual:
$$Y_t = P_t U_t$$
$$P_t = G_t P_{t-1} N_t$$
Income is the product of two components. Ut is a transitory shock which is indepen-
dently and identically distributed and takes a value of 0 with a probability p and a
positive value with a probability (1 − p). Pt is a permanent component which grows
at a rate Gt which depends on age. Nt is the innovation to the permanent component.
ln Nt and ln Ut, conditionally on Ut > 0, are normally distributed with mean
0 and variance $\sigma_n^2$ and $\sigma_u^2$ respectively. The consumer faces a budget constraint:
Wt+1 = (1 + r)(Wt + Yt − Ct)
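The income process described above is straightforward to simulate. The sketch below draws permanent and transitory shocks for a panel of households. The probability of zero income, the shock variances and the constant growth factor are placeholder values chosen for illustration, not the estimates used by Gourinchas and Parker.

```python
import numpy as np

def simulate_income(T=40, n_households=1000, p=0.01, sigma_n=0.1,
                    sigma_u=0.2, G=None, P0=1.0, seed=0):
    """Simulate Y_t = P_t * U_t with P_t = G_t * P_{t-1} * N_t."""
    rng = np.random.default_rng(seed)
    G = np.full(T, 1.02) if G is None else np.asarray(G)     # assumed growth factors
    N = np.exp(rng.normal(0.0, sigma_n, (n_households, T)))  # permanent innovations
    U = np.exp(rng.normal(0.0, sigma_u, (n_households, T)))  # transitory shocks
    U[rng.random((n_households, T)) < p] = 0.0               # zero income with probability p
    P = P0 * np.cumprod(G[None, :] * N, axis=1)              # permanent component
    Y = P * U
    return P, U, Y

P, U, Y = simulate_income()
zero_share = np.mean(Y == 0)
```

An age-dependent growth profile can be supplied through the `G` argument as a vector of length `T`.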
The consumer can borrow and save freely. However, under the assumption that there
is a probability that income will be zero and that the marginal utility of consumption
is infinite at zero, the consumer will choose never to borrow against future income.
Hence, the outcome of the model is close to the one proposed by Deaton (1991) and
presented in section 6.3.5. Note that in the model, the agent can only consume non-
durables. The authors ignore the durable decision, or equivalently assume that this
decision is exogenous. This might be a strong assumption. Fernández-Villaverde
and Krueger (2001) argue that the joint dynamics of durables and non durables are
important to understand the savings and consumption decisions over the life cycle.
Define the cash-on-hand as the total of assets and income:
$$X_t = W_t + Y_t, \qquad X_{t+1} = R(X_t - C_t) + Y_{t+1}$$
Define Vt(Xt, Pt) as the value function at age T −t. The value function is indexed by
age as it is assumed that the consumer has a finite life horizon. The value function
depends on two state variables, the cash-on-hand which indicates the maximal limit
that can be consumed, and the realization of the permanent component which pro-
vides information on future values of income. The program of the agent is defined
as:
$$V_t(X_t, P_t) = \max_{C_t} \big[u(C_t) + \beta E_t V_{t+1}(X_{t+1}, P_{t+1})\big]$$
The optimal behavior is given by the Euler equation:
$$u'(C_t) = \beta R\, E_t\, u'(C_{t+1})$$
As income is assumed to be growing over time, cash-on-hand and consumption are
also non-stationary. This problem can be solved by normalizing the variables by
the permanent component. Denote xt = Xt/Pt and ct = Ct/Pt. The normalized
cash-on-hand evolves as:
$$x_{t+1} = (x_t - c_t)\frac{R}{G_{t+1} N_{t+1}} + U_{t+1}$$
Under the assumption that the utility function is $u(c) = c^{1-\gamma}/(1-\gamma)$, the Euler
equation can be rewritten with only stationary variables:
$$u'(c_t) = \beta R\, E_t\, u'(c_{t+1} G_{t+1} N_{t+1})$$
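Both normalized expressions follow from dividing by the permanent component and using $P_{t+1} = G_{t+1} P_t N_{t+1}$ and $Y_{t+1} = P_{t+1} U_{t+1}$:

$$
\begin{aligned}
x_{t+1} = \frac{X_{t+1}}{P_{t+1}}
&= \frac{R (X_t - C_t)}{G_{t+1} N_{t+1} P_t} + \frac{Y_{t+1}}{P_{t+1}}
= (x_t - c_t) \frac{R}{G_{t+1} N_{t+1}} + U_{t+1}, \\
C_t^{-\gamma} &= \beta R\, E_t\, C_{t+1}^{-\gamma}
\;\Longrightarrow\;
(c_t P_t)^{-\gamma} = \beta R\, E_t \big(c_{t+1} G_{t+1} N_{t+1} P_t\big)^{-\gamma}
\;\Longrightarrow\;
c_t^{-\gamma} = \beta R\, E_t \big(c_{t+1} G_{t+1} N_{t+1}\big)^{-\gamma}.
\end{aligned}
$$

The last step divides by $P_t^{-\gamma}$, which is known at time t. This is where the homogeneity of CRRA marginal utility is needed.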
As the horizon of the agent is finite, one has to postulate some terminal condition
for the consumption rule. It is taken to be linear in the normalized cash-on-hand:
$$c_{T+1} = \gamma_0 + \gamma_1 x_{T+1}.$$
Gourinchas and Parker (2001) use this Euler equation to compute numerically
the optimal consumption rule. Normalized consumption is only a function of the
normalized cash-on-hand. By discretizing the cash-on-hand over a grid, the problem
is solved recursively by evaluating ct(x) at each point of the grid using:
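$$
\begin{aligned}
u'(c_t(x)) ={}& \beta R (1 - p) \int\!\!\int u'\!\left( c_{t+1}\!\left( (x - c_t)\frac{R}{G_{t+1} N} + U \right) G_{t+1} N \right) dF(N)\, dF(U) \\
&+ \beta R\, p \int u'\!\left( c_{t+1}\!\left( (x - c_t)\frac{R}{G_{t+1} N} \right) G_{t+1} N \right) dF(N)
\end{aligned}
$$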
The first term on the right-hand side calculates the expected value of the future
marginal utility conditional on a strictly positive income, while the second term is the
expectation conditional on a zero income. The integrals are solved by a
quadrature method (see Chapter 3). The optimal consumption rules are obtained
by minimizing the distance between the left hand side and the right hand side.
Figure 6.3 displays the consumption rule at different ages. 88
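For integrals over a lognormal shock such as N, one common quadrature choice is Gauss-Hermite. The sketch below is an illustration of that technique, not necessarily the rule used by the authors. It approximates E[f(N)] when ln N is normal with mean zero and checks the result against a case with a known closed form. The values of γ and σn are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def lognormal_expectation(f, sigma, n_nodes=10):
    """Approximate E[f(N)] when ln N ~ Normal(0, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)            # rule for integrals against exp(-x^2)
    N = np.exp(np.sqrt(2.0) * sigma * nodes)       # change of variables z = sqrt(2)*sigma*x
    return np.sum(weights * f(N)) / np.sqrt(np.pi)

gamma, sigma_n = 2.0, 0.1                          # hypothetical values
approx = lognormal_expectation(lambda N: N ** (-gamma), sigma_n)
exact = np.exp(0.5 * (gamma * sigma_n) ** 2)       # E[N^(-gamma)] for a lognormal N
```

In the recursion above, `f` would be the function mapping N into next period’s marginal utility at a given grid point, and the double integral over (N, U) would use the product of two such rules.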
Once the consumption rules are determined, the model can be simulated to gen-
erate average life cycle profiles of consumption. This is done using the approximated
consumption rules and by averaging the simulated behavior of a large number of
households. The simulated profiles are then compared to actual profiles from US
data. Figure 6.4 displays the predicted consumption profile for two values of the intertemporal
elasticity of substitution, as well as the observed consumption profiles
constructed from the US Consumer Expenditure Survey. 89
More formally, the estimation method is the simulated method of moments (see
Chapter 4). The authors minimize the distance between observed and predicted
consumption at different ages. As neither the cash-on-hand nor the permanent
component of income is directly observed, the authors integrate out the state
variables to calculate the unconditional mean of (log) consumption at a given age:
$$\ln C_t(\theta) = \int \ln C_t(x, P, \theta)\, dF_t(x, P, \theta)$$
where θ is the vector of parameters characterizing the model and where Ft() is the
density of the state variables for individuals of age t. Characterizing this density
is difficult as it has no closed form solution. Hence, the authors use simulations to
approximate ln Ct(θ). Denote
$$g(\theta) = \frac{1}{I_t}\sum_{i=1}^{I_t} \ln C_{it} - \frac{1}{S}\sum_{s=1}^{S} \ln C_t(X_t^s, P_t^s, \theta)$$
The first part is the average log consumption for households of age t and It is the
number of observed households in the data set. The second part is the average
predicted log consumption over S simulated paths. θ is estimated by minimizing
$$g(\theta)'\, W\, g(\theta)$$
where W is a weighting matrix.
The estimated model is then used to analyze the determinants of savings. There
are two reasons to accumulate savings in this model. First, it cushions the agent
from uninsurable income shocks, to avoid facing a high marginal utility of consumption. Second,
savings are used to finance retirement consumption. Gourinchas and Parker (2001)
show that the precautionary motive dominates at least until age 40 whereas older
agents save mostly for retirement.
[Figure 6.3 approximately here]
[Figure 6.4 approximately here]
6.4 Conclusion
This chapter demonstrates how to use the approach of dynamic programming to
characterize the solution of the household’s optimal consumption problem and to
link it with observations. In fact, the chapter goes beyond the savings decision to
integrate it with the labor supply and portfolio decisions.
As in other chapters, there are numerous extensions that are open for the re-
searcher to consider. The next chapter is devoted to one of these, the introduction
of durable goods. Further, there are many policy related exercises that can be evaluated
using one of these estimated models, including a variety of policies intended
to influence savings decisions.90
Chapter 7
Durable Consumption
7.1 Motivation
Up to now, the consumption goods we have looked at are all classified as either
nondurables or services. This should be clear since consumption expenditures affect
utility directly in the period of the purchase and then disappear.91 However,
durable goods play a prominent role in business cycles as durable expenditures are
quite volatile.92
This chapter studies two approaches to understanding durable consumption.
The first is an extension of the models studied in the previous chapter in which a
representative agent accumulates durables to provide a flow of services. Here we
present the results of Mankiw (1982) which effectively rejects the representative
agent model. 93
The second model introduces a non-convexity into the household’s optimization
problem. The motivation for doing so is evidence that households do not continu-
ously adjust their stock of durables. This section of the chapter explores this through
the specification and estimation of a dynamic discrete choice model.
7.2 Permanent Income Hypothesis Model of Durable
Expenditures
We begin with a model that builds upon the permanent income hypothesis structure
that we used in the previous chapter to study nondurable expenditures. We first
exhibit theoretical properties of the model and then discuss its empirical implemen-
tation.
7.2.1 Theory
To model expenditures on both durable and non-durable goods, we consider a model
of household behavior in which the consumer has a stock of wealth (A), a stock
of durable goods (D) and current income (y). The consumer uses wealth plus
current income to finance expenditures on current nondurable consumption (c) and
to finance the purchase of durable goods (e) at a relative price of p.
There are two transition equations for this problem. One is the accumulation
equation for wealth given by:
A′ = R(A + y − c − pe).
The accumulation equation for durables is similar to that used for capital held by
the business sector:
D′ = D(1 − δ) + e (7.1)
where δ ∈ (0, 1) is the depreciation rate for the stock of durables.
Utility depends on the flow of services from the stock of durables and the pur-
chases of nondurables. In terms of timing, assume that durables bought in the
current period yield services starting in the next period. So, as with capital there
is a time lag between the order and the use of the durable good.94
With these details in mind, the Bellman equation for the household is given by:
$$V(A, D, y, p) = \max_{D', A'} \; u(c, D) + \beta E_{y', p' \mid y, p}\, V(A', D', y', p') \tag{7.2}$$
for all (A, D, y, p) with
$$c = A + y - (A'/R) - p\big(D' - (1 - \delta) D\big) \tag{7.3}$$
and the transition for the stock of durables given by (7.1). The maximization gives
rise to two first-order conditions:
$$u_c(c, D) = \beta R\, E_{y', p' \mid y, p}\, V_A(A', D', y', p') \tag{7.4}$$
and
$$u_c(c, D)\, p = \beta E_{y', p' \mid y, p}\, V_D(A', D', y', p').$$
In both cases, these conditions can be interpreted as equating the marginal costs of
reducing either nondurable or durable consumption in the current period with the
marginal benefits of increasing the (respective) state variables in the next period.
Using the functional equation (7.2), we can solve for the derivatives of the value
function and then update these two first order conditions. This implies:
$$u_c(c, D) = \beta R\, E_{y' \mid y}\, u_c(c', D') \tag{7.5}$$
and
$$p\, u_c(c, D) = \beta E_{y', p' \mid y, p}\big[u_D(c', D') + p'(1 - \delta)\, u_c(c', D')\big] \tag{7.6}$$
The first condition should be familiar from the optimal consumption problem
without durables. The marginal gain of increasing consumption is offset by the
reduction in wealth and thus consumption in the following period. In this specifi-
cation, the marginal utility of non-durable consumption may depend on the level
of durables. So, to the extent there is an interaction within the utility function be-
tween nondurable and durable goods, empirical work that looks solely at nondurable
consumption may be inappropriate.95
The second first order condition compares the benefits of buying durables with
the marginal costs. The benefits of a durable expenditure come from two sources.
First, increasing the stock of durables has direct utility benefits in the subsequent
period. Second, as the Euler equation characterizes a one-period deviation from
a proposed solution, the undepreciated part of the additional stock is sold and
consumed. This is reflected by the second term on the right side. The marginal
cost of the durable purchase is the reduction in expenditures on nondurables that
the agent must incur.
A slight variation in the problem assumes that durables purchased in the current
period provide services starting that period. Since this formulation is also found
in the literature, we present it here as well. In this case, the dynamic programming
problem is:
$$V(A, D, y, p) = \max_{D', A'} \; u(c, D') + \beta E_{y'|y} V(A', D', y', p') \tag{7.7}$$
for all (A, D, y, p) with c defined in (7.3).
Manipulation of the conditions for optimality implies (7.5) and
$$p\, u_c(c, D') = \left[u_D(c, D') + \beta E_{y',p'|y,p}\, p'(1 - \delta) u_c(c', D'')\right] \tag{7.8}$$
If prices are constant (p = p′), then this becomes
$$u_D(c, D') = \beta R E_{y'|y} u_D(c', D'').$$
This condition corresponds to a variation in which the stock of durables is reduced by
ε in the current period, the resources are saved and then used to purchase durables
in the subsequent period.96 As in the case of nondurable consumption, in the special
case of βR = 1, the marginal utility from durables follows a random walk.
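The algebra behind the constant-price condition is short and can be sketched as follows. With $p' = p$, (7.8) reads $p\, u_c(c, D') = u_D(c, D') + (1 - \delta) p\, \beta E_{y'|y} u_c(c', D'')$, while (7.5) under this timing gives $\beta E_{y'|y} u_c(c', D'') = u_c(c, D')/R$. Combining the two:
$$u_D(c, D') = p\left[1 - \frac{1 - \delta}{R}\right] u_c(c, D').$$
The term in brackets is a constant, so the same relation holds in the next period. Multiplying it by $\beta R$, taking expectations and using (7.5) once more:
$$\beta R E_{y'|y} u_D(c', D'') = p\left[1 - \frac{1 - \delta}{R}\right] \beta R E_{y'|y} u_c(c', D'') = p\left[1 - \frac{1 - \delta}{R}\right] u_c(c, D') = u_D(c, D').$$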
Note too that regardless of the timing assumption, there are interactions between
the two Euler equations. One source of interrelationship arises if utility is not
separable between durables and nondurables ($u_{cD} \neq 0$). Further, shocks to income
will influence both durable and nondurable expenditures.
7.2.2 Estimation of a Quadratic Utility Specification
Mankiw (1982) studied the pattern of durable expenditures when u(c, D′) is separable
and quadratic. In this case, Mankiw finds that durable expenditures follow
an ARMA(1,1) process given by:
$$e_{t+1} = a_0 + a_1 e_t + \varepsilon_{t+1} - (1 - \delta)\varepsilon_t$$
where $a_1 = \beta R$. Here the MA piece is parameterized by the rate of depreciation.
Empirically, when estimating the model on U.S. data, Mankiw finds that δ is
quite close to 1. So, durables appear not to be so durable after all!
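To see how the MA coefficient carries information about depreciation, the following is a minimal simulation sketch. It is not Mankiw's estimator: the function names, the parameter values ($a_0 = 0$, $a_1 = 0.98$) and the moment-based inversion are assumptions made here for illustration, and $a_1$ is treated as known. The quasi-difference $e_{t+1} - a_0 - a_1 e_t$ is an MA(1) with coefficient $-(1-\delta)$, so its first-order autocorrelation is $-\theta/(1+\theta^2)$ with $\theta = 1 - \delta$, which can be inverted for δ.

```python
import numpy as np

def simulate_expenditures(delta, a0=0.0, a1=0.98, T=100_000, seed=0):
    """Simulate e_{t+1} = a0 + a1 e_t + eps_{t+1} - (1 - delta) eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    e = np.zeros(T)
    for t in range(T - 1):
        e[t + 1] = a0 + a1 * e[t] + eps[t + 1] - (1.0 - delta) * eps[t]
    return e

def implied_depreciation(e, a0=0.0, a1=0.98):
    """Back out delta from the lag-1 autocorrelation of the quasi-difference."""
    w = e[1:] - a0 - a1 * e[:-1]           # equals eps_{t+1} - (1 - delta) eps_t
    w = w - w.mean()
    rho1 = (w[1:] @ w[:-1]) / (w @ w)      # theory: -theta / (1 + theta^2)
    rho1 = float(np.clip(rho1, -0.499, 0.499))
    if abs(rho1) < 1e-12:
        theta = 0.0
    else:
        # invertible root of rho1 * theta^2 + theta + rho1 = 0
        theta = -(1.0 - np.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)
    return 1.0 - theta, rho1
```

With δ = 1 the quasi-difference should be close to white noise, which is the pattern Mankiw reports for U.S. data; a smaller δ produces a clearly negative first-order autocorrelation.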
Adda and Cooper (2000b) study the robustness of Mankiw’s results across differ-
ent time periods, different frequencies and across countries (US and France). Their
results are summarized in the following table of estimates.
[Table 7.1 approximately here]
These are annual series for France and the US. The rows pertain to both aggre-
gated durable expenditures and estimates based on cars (both total expenditures
on cars (for France) and new car registrations). The model is estimated with and
without a linear trend.
For both countries, the hypothesis that the rate of depreciation is close to 100%
per year would not be rejected for most of the specifications. Mankiw’s “puzzle”
seems to be robust across categories of durables, countries, time periods and the
method of detrending.
Over the past few years, there has been considerable effort to understand Mankiw’s
result. One approach, described below, is to embellish the basic representative agent
model through the addition of adjustment costs and the introduction of shocks other
than variations in income. A second approach, coming from Bar-Ilan and Blinder
(1992) and Bertola and Caballero (1990), is to recognize that at the household level
durable expenditures are often discrete. We consider these two lines of research in turn.
7.2.3 Quadratic Adjustment Costs
Bernanke (1985) goes beyond this formulation by adding in price variations and
costs of adjustment. As he notes, it is worthwhile to look jointly at the behavior of
durable and nondurable expenditures as well.97 Consider the dynamic optimization
problem of:
$$V(A, D, y, p) = \max_{D', A'} \; u(c, D, D') + \beta E_{y'|y} V(A', D', y', p') \tag{7.9}$$
for all (A, D, y, p) where the functional equation holds for all values of the state
vector. Bernanke assumes a quadratic utility function with quadratic adjustment
costs of the form:
$$u(c, D, D') = -\frac{1}{2}(\bar{c} - c)^2 - \frac{a}{2}(\bar{D} - D)^2 - \frac{d}{2}(D' - D)^2$$
where ct is non-durable consumption and Dt is the stock of durables. The adjustment
cost is part of the utility function rather than the budget constraints for tractability
reasons. Given the quadratic structure, the model (7.9) can be solved explicitly
as a (non-linear) function of the parameters. Current non-durable consumption
is a function of lagged non-durable consumption, the current and lagged stock of
durables and of the innovation to the income process. Durables can be expressed
as a function of the past stock of durables and of the innovation to income. The
two equations with an equation describing the evolution of income are estimated
jointly by non-linear three stage least squares where current income, non-durable
consumption and the stock of durables were instrumented to control for simultaneity
and for measurement error bias. Instruments are lagged measures of prices, non-
durable consumption, durable stocks and disposable income.
Overall, the model is rejected by the data when testing the overidentifying
restrictions. The estimation of the cost of adjustment gives conflicting results, as
described in more detail in Bernanke (1985). The non-linear function of this
parameter implies an important cost of adjustment whereas the parameter itself is
not statistically different from zero.
Bernanke (1984) tests the permanent income hypothesis model at the micro level by
looking at car expenditures for a panel of households. While Bernanke does not
reject the model on this type of data, it is at odds with observations (described
below) as it predicts continuous adjustment of the stock whereas car expenditures
are typically lumpy at the individual level.
Exercise 7.1
Write a program to solve (7.9). Obtain the decision rules by the household. Use
these decision rules to create a panel data set, allowing households to have different
realizations of income. Consider estimating the Euler equations from the house-
hold’s optimization problem. If there were non-separabilities present in u(c, D, D′),
particularly $u_{cD} \neq 0$, which were ignored by the researcher, what types of “incorrect
inferences” would be reached?
7.3 Non Convex Adjustment Costs
The model explored in the previous section is intended to capture the behavior of
a representative agent. Despite its theoretical elegance, the model has difficulty
matching two aspects of the data. First, as noted above, Mankiw’s estimate of close
to 100% depreciation should be viewed as a rejection of the model. Second, there
is evidence at the household level that adjustment of the stock of durables is not
continuous. Instead, household purchases of some durables, such as cars as studied
by Lam (1991), are relatively infrequent. This may reflect irreversibility due to
imperfect information about the quality of used durable goods, the discrete nature
of some durable goods or the nature of adjustment costs.
Bar-Ilan and Blinder (1992) and Bar-Ilan and Blinder (1988) present a simple
setting in which a fixed cost of adjustment implies inaction from the agent when the
stock of durables is not too far from the optimal one. They argue that the optimal
consumption of durables should follow an (S,s) policy. When the durable stock
depreciates to a lower value s, the agent increases the stock to a target value S as
depicted in Figure 7.1.
[Figure 7.1 approximately here]
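The mechanics of an (S,s) policy can be sketched in a few lines. This is a deterministic toy version, not the Bar-Ilan and Blinder model: the function names and the values S = 1, s = 0.5 and δ = 0.1 are assumptions chosen for illustration. The stock depreciates each period and is reset to the target S whenever it reaches the trigger s, so purchases are infrequent and lumpy.

```python
import math

def simulate_Ss(S=1.0, s=0.5, delta=0.1, T=50):
    """Return the durable stock path and a list of (period, purchase size)."""
    D, path, purchases = S, [], []
    for t in range(T):
        D *= 1.0 - delta                # depreciation within the period
        if D <= s:                      # lower trigger reached: adjust to target
            purchases.append((t, S - D))
            D = S
        path.append(D)
    return path, purchases

def periods_between_purchases(S, s, delta):
    """Smallest n such that S (1 - delta)^n <= s."""
    return math.ceil(math.log(s / S) / math.log(1.0 - delta))
```

Calling `simulate_Ss()` and inspecting the purchase dates shows long spells of inaction separated by discrete adjustments; widening the gap between S and s lengthens the spells.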
7.3.1 General Setting
To gain some insight into the importance of irreversibility, consider the following
formalization of a model in which irreversibility is important. By this we mean that
due to some friction in the market for durables, households receive only a fraction
of the true value of a product they wish to sell. This can be thought of as a version
of Akerlof’s famous lemons problem.98
In particular, suppose that the price of durables is normalized to 1 when they
are purchased (e) but that the price of durables when they are sold (s) is given by
$p_s < 1$. The Bellman equation for the household’s optimization problem is given by:
$$V(A, D, y) = \max\left(V^b(A, D, y),\, V^s(A, D, y),\, V^i(A, D, y)\right) \tag{7.10}$$
where
$$V^b(A, D, y) = \max_{e, A'} \; u(A + y - (A'/R) - e, D) + \beta E_{y'|y} V(A', D(1 - \delta) + e, y') \tag{7.11}$$
$$V^s(A, D, y) = \max_{s, A'} \; u(A + y - (A'/R) + p_s s, D) + \beta E_{y'|y} V(A', D(1 - \delta) - s, y') \tag{7.12}$$
$$V^i(A, D, y) = \max_{A'} \; u(A + y - (A'/R), D) + \beta E_{y'|y} V(A', D(1 - \delta), y') \tag{7.13}$$
for all (A, D, y). This is admittedly a complex problem as it includes elements of a
discrete choice (to adjust or not) and also an intensive margin (given adjustment,
the level of durable purchases (sales) must be determined).
The presence of a gap between the buying and selling price of durables will create
inaction. Imagine a household with a substantial stock of durables that experiences
an income loss say due to a layoff. In the absence of irreversibility (ps = 1), the
household may optimally sell off some durables. If a job is found and the income
flow returns, then the stock of durables will be rebuilt. However, in the presence
of irreversibility, the sale and subsequent purchase of durables is costly due to the
wedge between the buying and selling price of durables. Thus, in response to an
income shock, the household may be inactive and thus not adjust its stock.
The functional equation in (7.10) cannot be solved using linearization techniques
as there is no simple Euler equation given the discrete choice nature of the problem.
Instead, value function iteration techniques are needed. As in the dynamic discrete
choice problem specified in Chapter 3, one starts with initial guesses of the values
of the three options and then induces V (A, D, y) through the max operator. Given
these initial solutions, the iteration procedure begins. As there is also an intensive
margin in this problem (given adjustment, the stock of durables one can choose is a
continuous variable), a state space for durables as well as assets must be specified.
This is a complex setting but one that the value function iteration approach can
handle.
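A minimal sketch of value function iteration for (7.10) follows. Everything numerical here is an assumption made for illustration: log utility $u(c, D) = \ln c + \theta \ln D$, a two-state income process, no borrowing, the grid sizes and all parameter values. The durable grid is geometric with ratio $1 - \delta$, so that doing nothing moves the household exactly one grid point down; the buy, sell and inaction branches then collapse into a single choice of D′ with an asymmetric price (1 when buying, $p_s$ when selling).

```python
import numpy as np

def solve_irreversible(ps=0.8, beta=0.95, R=1.04, delta=0.1, theta=0.3,
                       nA=12, nD=12, tol=1e-6, max_iter=2000):
    """Value function iteration for (7.10) on a small grid (illustrative values)."""
    A = np.linspace(0.0, 4.0, nA)                 # asset grid; no borrowing (assumed)
    D = 3.0 * (1.0 - delta) ** np.arange(nD)      # geometric grid: D[k+1] = (1-delta) D[k]
    y = np.array([0.6, 1.4])                      # two income states (assumed)
    P = np.array([[0.8, 0.2], [0.2, 0.8]])        # P[j, l] = Pr(y' = y[l] | y = y[j])

    # Net trade needed to reach D'[m] from D[k]: x > 0 is a purchase, x < 0 a sale.
    x = D[None, :] - (1.0 - delta) * D[:, None]
    x[np.abs(x) < 1e-12] = 0.0                    # inaction is exactly a zero trade
    spend = np.where(x > 0.0, x, ps * x)          # buy at price 1, sell at price ps

    # Consumption for every (A, D, y, A', D'); axes are (a, k, j, a', m).
    c = (A[:, None, None, None, None] + y[None, None, :, None, None]
         - A[None, None, None, :, None] / R - spend[None, :, None, None, :])
    flow = np.where(c > 0.0, np.log(np.maximum(c, 1e-12)), -np.inf)
    flow = flow + theta * np.log(D)[None, :, None, None, None]   # u(c, D)

    V = np.zeros((nA, nD, 2))
    for _ in range(max_iter):
        EV = np.einsum('jl,aml->jam', P, V)       # E[V(A', D', y') | y]
        total = (flow + beta * EV[None, None, :, :, :]).reshape(nA, nD, 2, -1)
        V_new = total.max(axis=-1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    m_star = total.argmax(axis=-1) % nD           # index of the chosen D'
    k = np.arange(nD)[None, :, None]
    inaction = (m_star == k + 1)                  # chosen stock = depreciated stock
    return V, inaction
```

Comparing `inaction.mean()` for `ps=0.8` and `ps=1.0` gives a rough sense of how the price wedge affects the frequency of adjustment on this grid. A serious application would use finer grids and interpolation rather than a geometric durable grid.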
So, given a vector of parameters describing preferences and the stochastic pro-
cesses, policy functions can be created. In principle, these can be used to generate
moments that can be matched with observations in an estimation exercise. This is
described in some detail, for a different model, in the subsequent subsections.
7.3.2 Irreversibility and Durable Purchases
Grossman and Laroque (1990) develop a model of durable consumption and also
consider an optimal portfolio choice. They assume that the durable good is illiquid
as the agent incurs a proportional transaction cost when selling the good. The
authors show that under the assumption of a constant relative risk aversion utility
function, the state variable is the ratio of wealth A over the stock of durables D.
The optimal behavior of the agent is to follow an [s, S] rule, with a target s∗ ∈ [s, S].
The agent does not change the stock of durables if the ratio A/D is within the two
bands s and S. If the ratio drifts out of this interval, the agent adjusts it by buying
or selling the good such that A/D = s∗.
Eberly (1994) empirically investigates the relevance of some aspects of the Grossman-
Laroque model. She uses data from the Survey of Consumer Finances which reports
information on assets, income and major purchases. She estimates the bands s and
S. These bands can be computed by observing the ratio A/D for individuals just
before an adjustment is made. The target s∗ can be computed as the average ra-
tio just after adjustment. Eberly (1994) estimates the band width and investigates
its determinants. She finds that the year to year income variance and the income
growth rate are strong predictors of the width of the band.
Attanasio (2000) develops a more elaborate estimation strategy for these bands,
allowing for unobserved heterogeneity at the individual level. This heterogeneity
is needed as, conditional on household characteristics and the value of the ratio
of wealth to consumption, some are adjusting their stock and some are not. The
estimation is done by maximum likelihood on data drawn from the Consumer Expenditure
Survey. The widths of the bands are functions of household characteristics
such as age and race. The estimated model is then aggregated to study the aggregate
demand for durables.
Caballero (1993) uses the Grossman and Laroque (1990) approach to investigate
the aggregate behavior of durable goods. The individual agent is assumed to follow
an [s,S] consumption rule because of transaction costs. In the absence of transac-
tion costs, the agent would follow a PIH type behavior as described in section 7.2.
Caballero postulates that the optimal behavior of the agent can be described by the
distance between the stock of durables held by the agent and the “target” defined
as the optimal stock in the PIH model. The agent adjusts the stock when the gap
between the realized and the desired stock is big enough. In this setting, the state
variables are the stock of durables and the target. The target stock is assumed
to follow a known stochastic process. Hence in this model, it is assumed that the
evolution of the target is a sufficient statistic to inform of all the relevant economic
variables such as prices or income.
The aggregate demand for durables is the sum of all agents who decide to adjust
their stock in a given period. Hence, Caballero stresses the importance of the cross
sectional distribution of the gap between the target and the realized stock. When
there is an aggregate shock on the target, the aggregate response depends not only
on the size of the shock but also on the number of individuals close to the adjustment
line. The aggregate demand for durables can therefore display complicated dynamic
patterns. The model is estimated on aggregate US data.
7.3.3 A Dynamic Discrete Choice Model
Suppose that instead of irreversibility, there is a restriction that households can
have either no car or one car.99 Thus, by assumption, the household solves a dy-
namic discrete choice problem. We discuss solutions of that problem, estimation of
parameters and aggregate implications in this section.100
Optimal Behavior
We start with the dynamic programming problem as specified in Adda and Cooper
(2000b). At the start of a period, the household has a car of a particular age, a
level of income and a realization of a taste shock. Formally, the household’s state
is described by the age of its car, i, a vector Z = (p, Y, ε) of aggregate variables
and a vector z = (y) of idiosyncratic variables. Here, p is the relative price of the
(new) durable good. Current income is given by the sum Y + y where Y represents
aggregate income and y represents idiosyncratic shocks to nondurable consumption
that could reflect variations in household income or required expenditures on car
maintenance and other necessities.101 The final element in the state vector is a taste
shock, ε.
At every point in time, the household decides whether to retain a car of age
i, trade it or scrap it. If the household decides to scrap the car, then it receives
the scrap value of π and has the option to purchase a new car. If the household
retains the car, then it receives the flow of services from that car and cannot, by
assumption, purchase another car. Thus the household is constrained to own at
most a single car.
Formally, let Vi(z,Z) represent the value of having a car of age i to a household
in state (z, Z). Further, let $V_i^k(z, Z)$ and $V_i^r(z, Z)$ represent the values from keeping
and replacing an age i car in state (z, Z). Then,
$$V_i(z, Z) = \max\left[V_i^k(z, Z),\, V_i^r(z, Z)\right]$$
where
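$$V_i^k(z, Z) = u(s_i, y + Y, \varepsilon) + \beta(1 - \delta) E V_{i+1}(z', Z') + \beta\delta\left\{E V_1(z', Z') - u(s_1, y' + Y', \varepsilon') + u(s_1, y' + Y' - p' + \pi, \varepsilon')\right\} \tag{7.14}$$
and
$$V_i^r(z, Z) = u(s_1, y + Y - p + \pi, \varepsilon) + \beta(1 - \delta) E V_2(z', Z') + \beta\delta\left\{E V_1(z', Z') - u(s_1, y' + Y', \varepsilon') + u(s_1, y' + Y' - p' + \pi, \varepsilon')\right\}.$$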
In the definition of V ki (z, Z), the car is assumed to be destroyed (from accidents
and breakdowns) with probability δ leading the agent to purchase a new car in the
next period. The cost of a new car in numeraire terms is p′ − π, which is stochastic
since the price of a new car in the next period is random. Further, since it is assumed
that there is no borrowing and lending, the utility cost of the new car is given by
$u(s_1, y' + Y', \varepsilon') - u(s_1, y' + Y' - p' + \pi, \varepsilon')$ which exceeds p′ − π as long as u(·) is
strictly concave in nondurable consumption. It is precisely at this point that the
borrowing restriction appears as an additional transactions cost.
Adding in either borrowing and lending or the purchase and sale of used cars
presents no modelling difficulties. But adding in wealth as well as resale prices as
state variables certainly increases the dimensionality of the problem. This remains
as work in progress.
Exercise 7.2
Reformulate (7.14) to allow the household to borrow/lend and also to resell cars
in a used car market. What additional state variables would you have to add when
these choices are included? What are the new necessary conditions for optimal
behavior of the household?
Further Specification
For the application the utility function is defined to be additively separable between
durables and nondurables:
$$u(s_i, c) = \left[i^{-\gamma} + \frac{\varepsilon (c/\lambda)^{1-\xi}}{1 - \xi}\right]$$
where c is the consumption of non-durable goods, γ is the curvature for the service
flow of car ownership, ξ the curvature for consumption and λ is a scale factor. In
this specification, the taste shock (ε) influences the contemporaneous marginal rate
of substitution between car services and non-durables.
In order for the agent’s optimization problem to be solved, a stochastic process
for income, prices and the aggregate taste shocks must be specified. Aggregate
income, prices and the unobserved preference shock are assumed to follow a VAR(1)
process given by:102
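$$Y_t = \mu_Y + \rho_{YY} Y_{t-1} + \rho_{Yp}\, p_{t-1} + u_{Yt}$$
$$p_t = \mu_p + \rho_{pY} Y_{t-1} + \rho_{pp}\, p_{t-1} + u_{pt}$$
$$\varepsilon_t = \mu_\varepsilon + \rho_{\varepsilon Y} Y_{t-1} + \rho_{\varepsilon p}\, p_{t-1} + u_{\varepsilon t}$$
The covariance matrix of the innovations $u = \{u_{Yt}, u_{pt}, u_{\varepsilon t}\}$ is
$$\Omega = \begin{bmatrix} \omega_Y & \omega_{Yp} & 0 \\ \omega_{pY} & \omega_p & 0 \\ 0 & 0 & \omega_\varepsilon \end{bmatrix}$$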
As the aggregate taste shock is unobserved, we impose a block diagonal structure
on the VAR, which enables us to identify all the parameters involving prices and
aggregate income in a simple first step regression. This considerably reduces the
number of parameters to be estimated in the structural model. We allow prices and
income to depend on lagged income and lagged prices. 103
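The role of the block structure can be illustrated with a short simulation. The parameter values below are made up for the example and are not estimates from the text; the function names are likewise hypothetical. Because the taste shock neither enters the income and price equations nor is correlated with their innovations, an ordinary least squares regression of $(Y_t, p_t)$ on a constant and their lags recovers the corresponding coefficients without any reference to ε.

```python
import numpy as np

def simulate_var(mu, rho, omega, T=20_000, seed=0):
    """Simulate x_t = mu + rho @ x_{t-1} + u_t with x = (Y, p, eps)."""
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(np.zeros(3), omega, size=T)
    x = np.zeros((T, 3))
    for t in range(1, T):
        x[t] = mu + rho @ x[t - 1] + u[t]
    return x

def first_step_ols(x):
    """OLS of (Y_t, p_t) on a constant and (Y_{t-1}, p_{t-1})."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1, :2]])
    coef, *_ = np.linalg.lstsq(X, x[1:, :2], rcond=None)
    return coef[0], coef[1:].T          # intercepts, 2x2 matrix of rho's

# Illustrative (assumed) values. The last column of rho is zero because the
# taste shock does not feed back into income, prices or itself.
mu = np.array([0.1, 0.2, 0.0])
rho = np.array([[0.70, 0.10, 0.0],
                [0.05, 0.60, 0.0],
                [0.10, 0.20, 0.0]])
omega = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 0.5]])
```

Running `first_step_ols(simulate_var(mu, rho, omega))` returns estimates of the income and price block only; the third row of `rho` and the variance of the taste shock are the pieces left for the structural estimation.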
The aggregate taste shock potentially depends on lagged prices and income. The
coefficients of this process along with ωε are estimated within the structural model.
By allowing a positive correlation between the aggregate taste shock and lagged
prices, given that prices are serially correlated, we can reconcile the model with the
fact that sales and prices are positively correlated in the data. This allows us to
better capture some additional dynamics of sales and prices in the structural esti-
mation. An alternative way would be to model jointly the producer and consumer
side of the economy, to get an upward-sloping supply curve. However, solving for
the equilibrium is computationally very demanding.
Solving the Model
The model is solved by the value function iteration method. Starting with an
initial guess for Vi(z, Z), the value function is updated by backward iterations until
convergence.
The policy functions that are generated from this optimization problem are of
an optimal stopping variety. That is, given the state of the household, the car is
scrapped and replaced if and only if the car is older than a critical age. Letting
$h_k(z_t, Z_t; \theta)$ represent the probability that a car of age k is scrapped, the policy
functions imply that $h_k(z_t, Z_t; \theta) = \delta$ if $k < J(z_t, Z_t; \theta)$ and $h_k(z_t, Z_t; \theta) = 1$ otherwise.
Here $J(z_t, Z_t; \theta)$ is the optimal scrapping age in state $(z_t, Z_t)$ when θ is the vector
of parameters describing the economic environment.
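The optimal stopping structure can be seen in a stripped-down version of (7.14). The sketch below is not the estimated model: it assumes a constant price, no aggregate shocks, a taste shock fixed at 1, i.i.d. idiosyncratic income on a three-point grid, a maximum car age beyond which the continuation value is held fixed, and illustrative parameter values. The function name is hypothetical.

```python
import numpy as np

def solve_replacement(beta=0.95, delta=0.05, gamma=0.5, xi=2.0, lam=1.0,
                      p=0.8, pi=0.05, n_age=25, tol=1e-8, max_iter=5000):
    """Iterate on V_i(y) = max(V_i^k, V_i^r) with constant price and i.i.d. income."""
    y = np.array([1.0, 1.5, 2.0])                 # income grid (assumed)
    prob = np.array([0.25, 0.5, 0.25])            # i.i.d. probabilities (assumed)
    age = np.arange(1, n_age + 1).astype(float)

    def u(i, c):                                  # utility with taste shock fixed at 1
        return i ** (-gamma) + (c / lam) ** (1.0 - xi) / (1.0 - xi)

    u_keep = u(age[:, None], y[None, :])          # rows: age, columns: income
    u_new = u(1.0, y - p + pi)                    # utility in a period with a purchase
    loss = prob @ (u(1.0, y) - u_new)             # expected utility cost of a forced purchase

    V = np.zeros((n_age, len(y)))
    for _ in range(max_iter):
        EV = V @ prob                             # E V_i(y'), one entry per age
        EV_next = np.append(EV[1:], EV[-1])       # E V_{i+1}; oldest age capped (assumed)
        destroyed = beta * delta * (EV[0] - loss)
        Vk = u_keep + (beta * (1.0 - delta) * EV_next + destroyed)[:, None]
        Vr = u_new[None, :] + beta * (1.0 - delta) * EV[1] + destroyed
        V_new = np.maximum(Vk, Vr)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    replace = Vr > Vk                             # True where replacing is optimal
    return V, replace
```

For each income level, the first age at which `replace` is true plays the role of the critical age J; households with different income draws generally have different critical ages.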
In particular, for each value of the idiosyncratic shock z, there is an optimal
scrapping age. Aggregating over all possible values of this idiosyncratic shock pro-
duces an aggregate policy function which indicates the fraction of cars of a given
vintage which are scrapped when the aggregate state of the world is Zt:
$$H_k(Z_t, \theta) = \int h_k(z_t, Z_t, \theta)\, \phi(z_t)\, dz_t$$
where φ(·) is the density function of zt, taken to be the normal distribution. Hk(·) is
an increasing function of the vintage and bounded between δ and 1. The aggregated
hazard can be used to predict aggregate sales and the evolution of the cross section
distribution of car vintages over time. Letting $f_t(k)$ denote the period t cross sectional
distribution of k, aggregate sales are given by
$$S_t(Z_t, \theta) = \sum_k H_k(Z_t, \theta) f_t(k) \tag{7.15}$$
From an initial condition on the cross sectional distribution, it is possible to generate
a time series for the cross sectional distribution given a particular parameterization
of the hazard function. The evolution of ft(k) is given by:
$$f_{t+1}(k, Z_t, \theta) = [1 - H_k(Z_t; \theta)] f_t(k - 1) \quad \text{for } k > 1 \tag{7.16}$$
and
$$f_{t+1}(1, Z_t, \theta) = S_t(Z_t, \theta)$$
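The aggregation of individual hazards and the law of motion of the vintage distribution can be sketched as follows. The function names and the grid-based approximation of the normal integral are assumptions made here. The code applies the hazard to each cohort at its current age before the cohort ages by one period; this is one reading of the timing in (7.16), chosen so that the distribution keeps summing to one when the oldest vintage is scrapped with certainty.

```python
import numpy as np

def aggregate_hazard(scrap_age, delta, n_age, z_grid=np.linspace(-4.0, 4.0, 801)):
    """H_k = integral of h_k(z) phi(z) dz, approximated on a grid of z values."""
    w = np.exp(-0.5 * z_grid ** 2)
    w = w / w.sum()                                # discretized standard normal weights
    J = scrap_age(z_grid)                          # critical scrapping age for each z
    k = np.arange(1, n_age + 1)[:, None]
    h = np.where(k < J[None, :], delta, 1.0)       # h_k(z) = delta if k < J(z), else 1
    return h @ w

def step(f, H):
    """One period of the vintage distribution; returns (sales, next distribution)."""
    sales = H @ f                                  # aggregate sales as in (7.15)
    f_next = np.empty_like(f)
    f_next[0] = sales                              # scrapped cars are replaced by new ones
    f_next[1:] = (1.0 - H[:-1]) * f[:-1]           # survivors age by one period
    return sales, f_next
```

As a usage example with a made-up critical age rule, `H = aggregate_hazard(lambda z: np.clip(np.round(12 + 3 * z), 2, 20), 0.05, 20)` produces a hazard that starts at δ for new cars and rises to 1 at age 20; iterating `step` from any initial distribution then generates a time series of sales and vintage distributions.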
Thus for a given θ and a given draw of T aggregate shocks one can simulate both
sales and the cross sectional distribution. This can be repeated N times to produce
N simulated data sets of length T , which can be used in the estimation. Define
$S_{tn}(Z_t, \theta) = S_t(p_t, Y_t, \varepsilon_{nt}, \theta)$ as the predicted aggregate sales given prices, aggregate
income and unobserved taste shock $\varepsilon_{nt}$. Define $\bar{S}_t(Z_t, \theta) = \frac{1}{N}\sum_{n=1}^{N} S_{tn}(Z_t, \theta)$ as
the average aggregate sales conditional on prices, aggregate income and period t − 1
cross sectional distribution.
Estimation Method and Results
In total there are eight parameters to estimate: $\theta = \{\gamma, \delta, \lambda, \xi, \sigma_y, \rho_{\varepsilon Y}, \rho_{\varepsilon p}, \omega_\varepsilon\}$. The
estimation method follows Adda and Cooper (2000b) and is a mix between simulated
non-linear least squares and simulated method of moments. The first part of the
criterion matches predicted sales of new cars with the observed ones, conditional
on prices and aggregate income. The second part of the criterion matches the
predicted shape of the cross section distribution of car vintages to the observed one.
The objective function to minimize is written as the sum of the two criteria:
$$L_N(\theta) = \alpha L_N^1(\theta) + L_N^2(\theta)$$
where N is the number of simulated draws for the unobserved aggregate taste shock
εnt. The two criteria are defined by:
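$$L_N^1(\theta) = \frac{1}{T}\sum_{t=1}^{T}\left[(S_t - \bar{S}_t(\theta))^2 - \frac{1}{N(N-1)}\sum_{n=1}^{N}(S_{tn}(\theta) - \bar{S}_t(\theta))^2\right]$$
$$L_N^2(\theta) = \sum_{i \in \{5, 10, 15, AR, MA\}} \alpha_i\left(\bar{F}^i - \bar{F}^i(\theta)\right)^2$$
where $\bar{S}_t(\theta)$ is the average of simulated sales defined above, $\bar{F}^i$, i = 5, 10, 15, is the
average fraction of cars of age i across all periods and $\bar{F}^i$, i = AR, MA, are the
autoregressive and moving average
coefficients from an ARMA(1,1) estimated on aggregate sales.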
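The two criteria translate directly into code. The sketch below only evaluates the objective for given observed and simulated series; the function names and array layout (simulated sales stored as a T by N array) are assumptions made here, and producing the simulated inputs requires solving the model for each candidate θ. The second term inside the first criterion subtracts the simulation variance, which would otherwise inflate the squared distance for finite N.

```python
import numpy as np

def snls_criterion(S_obs, S_sim):
    """First criterion: S_obs has shape (T,), S_sim has shape (T, N) with N >= 2."""
    S_obs = np.asarray(S_obs, dtype=float)
    S_sim = np.asarray(S_sim, dtype=float)
    T, N = S_sim.shape
    S_bar = S_sim.mean(axis=1)                     # average over simulations
    fit = (S_obs - S_bar) ** 2
    correction = ((S_sim - S_bar[:, None]) ** 2).sum(axis=1) / (N * (N - 1))
    return float((fit - correction).mean())

def moment_criterion(F_obs, F_sim, weights):
    """Second criterion: weighted squared distance between observed and simulated moments."""
    d = np.asarray(F_obs, dtype=float) - np.asarray(F_sim, dtype=float)
    return float(np.sum(np.asarray(weights, dtype=float) * d ** 2))

def objective(S_obs, S_sim, F_obs, F_sim, weights, alpha=1.0):
    """Weighted sum of the two criteria."""
    return alpha * snls_criterion(S_obs, S_sim) + moment_criterion(F_obs, F_sim, weights)
```

In an estimation loop, `objective` would be passed to a numerical minimizer over θ, with the same random draws reused across evaluations so that the criterion is a smooth function of the parameters.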
The estimation uses two criteria for identification reasons. Matching aggregate
sales at each period extracts information on the effect of prices and income on
behavior and helps to identify the parameter of the utility function as well as the
parameters describing the distribution of the aggregate taste shock. However, the
model is able to match aggregate sales under different values for the agent’s optimal
stopping time. In other words, there can be different cross section distributions
that produce aggregated sales which are close to the observed ones. In particular,
the parameter δ is poorly identified by using only the first criterion. The second
criterion pins down the shape of the cross section distribution of car vintages.
[Figure 7.2 approximately here]
[Figure 7.3 approximately here]
The data come from France and the US and consist of the cross sectional distribution
of car vintages over time, as well as the aggregate sales of new cars, prices
and aggregate income. The estimated aggregate hazard functions Ht(Z) over the
period 1972-1995 for France and 1981-1995 for the US are displayed in Figures 7.2
and 7.3. Note that the probability of replacement for young cars, which is equal to
δ, is estimated at a low value, between 5 and 10%. Hence, in contrast with the estimated
PIH models described in section 7.2, the model is able to produce a sensible
estimate of the rate of depreciation. Moreover, when estimating an ARMA(1,1), as
in section 7.2.2, on the predicted aggregate sales, the MA coefficient is estimated
close to zero as in the observed data. Hence, viewed from a PIH perspective, the
model appears to support a 100% depreciation rate at the aggregate level, whereas
at the micro level, the depreciation rate is low.
Once the model is estimated, Adda and Cooper (2000b) investigate the ability
of the model to reproduce a number of other features such as the impulse response
of sales to an increase in prices. They also use the estimated model to decompose
the source of variation in aggregate sales. Within the model, there are two main
sources, the endogenous evolution of the cross section distribution and the effect of
aggregate variables such as prices or income. Caballero (1993) seems to imply that
the evolution of the cross section distribution is an important determinant. However,
the empirical decomposition shows that its role is relatively minor, compared with
the effect of income and prices.
The Impact of Scrapping Subsidies
Adda and Cooper (2000a) use the same framework to analyze the impact of scrapping
subsidies introduced first in France and later in a number of European countries
such as Spain or Italy.
From February 1994 to June 1995 the French government offered individuals
5000 francs (approximately 5 to 10% of the value of a new car) for the scrapping
of an old car (ten years or older) and the purchase of a new car. Sales of new cars
which had been low in the preceding period (see Figure 7.4) increased markedly
during the period the policy was in place. From September 1995 to September 1996,
the government re-introduced the policy, with an age limit of eight years. After
September 1996, the demand for new cars collapsed to a record low level.
As evident from Figure 7.4, the demand for cars is very cyclical and follows the
business cycle. The increased demand for new cars during the period 1994-1996
could be due either to the policy or to the cyclical nature of demand. If the latter
is true, the French government has been wasting money on car owners who would
have replaced their cars during that period anyway. Even if the increased demand
was entirely fueled by the scrapping subsidies, the government has been giving out
money to car owners who would have replaced their car in the periods ahead. The
effect of the policy is then to bring new sales forward, creating future and potentially
bigger cycles in car demand. As a large number of new cars were sold in this period,
demand for new cars was low when the policy stopped, but a peak in demand is
likely to appear about 10 years after the policy as the cars bought in 1995-1996 are
scrapped.
[Figure 7.4 approximately here]
Adda and Cooper (2000a) estimate the model in section 7.3.3 on the pre-policy
period. The policy works through the scrapping price π, which is constant and at a
low value (around 500 French francs) before 1993. When the policy is in place, this
scrapping price increases and is age specific:
π(i) = 500 if i < 10
π(i) = 5000 if i ≥ 10
Given the estimated model, the effect of the policy can be simulated as well as the
counterfactual without the policy in place. This is done conditional on the cross
section distribution of cars at the beginning of the period and conditional on the
realized income and prices. Prices of new cars are assumed to be independent of
the policy; while this is debatable, empirical evidence suggests that prices remained
stable throughout the period, mainly because the government negotiated a stable
price with car producers.
While the first scrapping subsidy was largely unexpected by the consumers, the
second one was partly anticipated. Just after the first subsidy, there were discussions
on whether to implement a new one. This is taken into account in the model by
adding the scrapping price π(i) as a stochastic state variable. More precisely, π is
assumed to follow a first order Markov process, with four states. These four states
are described in Table 7.2. The first state models the 1994 reform and the second one
the 1995 reform. State 3 is a state with heightened uncertainty, in which there are
no subsidies. State 4 is the baseline state. In state 1, the scrap value is set at 5500 F
for cars older than 10 years. This state is not assumed to be very permanent: there
is only a one percent chance that the subsidy will be in effect in the next period,
conditional on being in force in the current period. In state 2, the scrap value is
also 5500F but for cars older than 8 years old.
[Table 7.2 approximately here]
Figures 7.5 and 7.6 display the predicted sales and government revenue relative
to baseline. The model captures the peak in sales during the two policies, as well
as the decline in between due to the uncertainty. The sales are lower for about 10
years, with little evidence of a subsequent peak. This result is in line with the one
discussed in section 7.3.3 where it was found that the evolution of the cross section
distribution has little effect on aggregate sales.
[Figure 7.5 approximately here]
[Figure 7.6 approximately here]
Government revenues are lower over the whole period. The government revenue
consists of the value added taxes collected on purchases of new cars, minus
the scrapping subsidies given out for eligible cars. From the perspective of govern-
ment revenues, the policy is clearly undesirable. In terms of sales, the subsidies
accounted for about 8 to 10% of the increased demand.
Chapter 8
Investment
8.1 Overview/Motivation
This chapter studies capital accumulation. Investment expenditures are one of the
most volatile elements of the aggregate economy. From the perspective of policy
interventions, investment is also key. The dependence of investment on real interest
rates is critical to many discussions of the impact of monetary policy. Further, many
fiscal policy instruments, such as investment tax credits and accelerated depreciation
allowances, act directly through their influence on capital accumulation.
One might expect, then, that macroeconomics would have developed and evaluated
numerous models to meet this challenge. Yet, relative to the enormous work done
on consumption, research on investment lags behind. As noted in Caballero (1999),
this has changed dramatically in the last 10 or so years.104 Partly, we now have the
ability to characterize investment behavior in fairly rich settings. Combined with
plant-level data sets, researchers are able to confront a rich set of observations with
these sophisticated models.
Investment, with its emphasis on uncertainty and nonconvexities is a ripe area for
applications of dynamic programming techniques. In this chapter, we first analyze a
general dynamic optimization problem and then focus on special cases of convex and
non-convex adjustment costs. This then sets the stage for the empirical analyses
that follow. We also discuss the use of these estimates for the analysis of policy
interventions.
8.2 General Problem
The unit of analysis will be the plant though for some applications (such as consider-
ation of borrowing constraints) focusing on the firm may be more appropriate. The
“manager” is assumed to maximize the value of the plant: there are no incentive
problems between the manager and the owners. The problem involves the choice of
factors of production that are rented for the production period, the hiring of labor
and the accumulation of capital. To focus on the investment decision, we assume
that demand for the variable inputs (denoted by x) is optimally determined given
factor prices (represented by the vector w) and the state variables of the plant’s
optimization problem, represented by (A, K). Here the vector of flexible factors of
production might include labor, materials and energy inputs into the production
process.
The result of this optimization leaves a profit function, denoted by Π(A, K)
which depends solely on the state of the plant, where
$$\Pi(A, K) = \max_{x} \; R(\hat{A}, K, x) - wx.$$
Here R(Â, K, x) denotes revenues given the inputs of capital (K), the variable factors
(x) and a shock to revenues and/or productivity, denoted by Â. The reduced form
profit function thus depends on the stochastic variable A, that encompasses both
 and w, and the stock of physical capital (K). Thus we often refer to A as
a profitability shock since it reflects variations in technology, demand and factor
prices.
Taking this profit function as given, we consider variations of the following sta-
tionary dynamic programming problem:
\[
V(A, K, p) = \max_{K'} \; \Pi(A, K) - C(K', A, K) - p\,(K' - (1-\delta)K) + \beta E_{A'|A} V(A', K', p') \tag{8.1}
\]
for all (A, K, p) where K′ = K(1 − δ) + I is the capital accumulation equation and
I is investment. Here unprimed variables are current values and primed variables
refer to future values. In this problem, the manager chooses the level of the future
capital stock denoted K′. The timing assumption is that new investment becomes
productive with a one-period lag. The rate of depreciation of the capital stock is
denoted by δ ∈ [0, 1]. The manager discounts the future at a fixed rate of β.105
Exercise 8.1
Suppose that, in contrast to (8.1), investment in period t is productive in that
period. Compare these two formulations of the investment problem. Assuming that
all functions are differentiable, create Euler equations for each specification. Explain
any differences.
Exercise 8.2
How would you modify (8.1) to allow the manager’s discount factor to be influ-
enced by variations in the real interest rate?
There are no borrowing restrictions in this framework. So, the choice of in-
vestment and thus future capital is not constrained by current profits or retained
earnings. We return to this issue later in the chapter when we discuss the implica-
tions of capital market imperfections.
There are two costs of obtaining new capital. The first is the direct purchase
price, denoted by p. Notice that this price is part of the state vector as it is a source
of variation in this economy.106
Second, there are costs of adjustment given by the function C(K′, A, K). These
costs are assumed to be internal to the plant and might include: installation costs,
disruption of productive activities in the plant, the need to retrain workers, the need
to reconfigure other aspects of the production process, etc. This function is general
enough to have components of both convex and non-convex costs of adjustment as
well as a variety of transactions costs.
8.3 No Adjustment Costs
To make clear the contribution of adjustment costs, it is useful to start with a
benchmark case in which these costs are absent: C(K′, A, K) ≡ 0 for all (K′, A, K).
Note though that there is still a time to build aspect of investment so that capital
accumulation remains forward looking. The first-order condition for the optimal
investment policy is given by:
\[
\beta E_{A',p'|A,p} V_k(A', K', p') = p \tag{8.2}
\]
where subscripts on the functions denote partial derivatives. This condition implies
that the optimal capital stock depends on the realized value of profitability, A, only
through an expectations mechanism: given the time to build, current profitability
is not relevant for investment except as a signal of future profitability. Further the
optimal capital stock does not depend on the current stock of capital. Using (8.1)
to solve for E(A′,p′|A,p)Vk(A′, K′, p′) yields:
\[
\beta E_{(A',p'|A,p)} \left[ \Pi_k(A', K') + (1-\delta)p' \right] = p. \tag{8.3}
\]
This condition has a natural interpretation. The cost of an additional unit of capital
today (p) is equated to the marginal return on capital. This marginal return has
two pieces: the marginal profits from the capital (Πk(A′, K′)) and the resale value
of undepreciated capital at the future price ((1 − δ)p′).
Substituting for the future price of capital and iterating forward, we find:
\[
p_t = \beta \sum_{\tau=0}^{\infty} [\beta(1-\delta)]^{\tau} E_{A_{t+\tau}|A_t} \Pi_K(K_{t+\tau+1}, A_{t+\tau+1})
\]
where pt is the price of capital in period t. So the firm’s investment policy equates
the purchase price of capital today with the discounted present value of marginal
profits in the future. Note that in stating this condition, we are assuming that the
firm will be optimally resetting its capital stock in the future so that (8.3) holds in
all subsequent periods.
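To see what this condition implies in the simplest case, suppose (purely as an illustrative assumption) that expected marginal profit is a constant m in every future period. The sum is then a geometric series, and the short Python check below compares a truncated version of the sum with its closed form. The values of β, δ and m are hypothetical.

```python
# Illustrative check only: expected marginal profit is assumed constant at m.
beta, delta, m = 0.95, 0.15, 0.25   # hypothetical values

# Truncated version of p_t = beta * sum_tau [beta (1 - delta)]^tau * m
p_sum = beta * sum((beta * (1 - delta)) ** tau * m for tau in range(2000))

# Closed form of the same geometric series
p_closed = beta * m / (1 - beta * (1 - delta))

print(p_sum, p_closed)
```

With a constant price, the closed form is also the price that satisfies (8.3) period by period, since β[m + (1 − δ)p] = p rearranges to the same expression.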
While simple, the model without adjustment costs does not fit the data well.
Cooper and Haltiwanger (2000) argue that relative to observations, this model
without adjustment costs implies excessive sensitivity of investment to variations
in profitability. So, one of the empirical motivations for the introduction of adjust-
ment costs is to temper the otherwise excessively volatile movements in investment.
Further, this model is unable to match the observation of inaction in capital ad-
justment seen (and discussed below) in plant-level data. For these reasons, various
models of adjustment costs are considered.107
8.4 Convex Adjustment Costs
In this section, we assume that C(K′, A, K) is a strictly increasing, strictly convex
function of future capital, K′.108 The firm chooses tomorrow’s capital (K′) using its
conditional expectations of future profitability, A′. Of course, to the extent that A′
is correlated with A, current profits will be correlated with future profits.
Assuming that V (K, A, p) exists, an optimal policy, obtained by solving the
maximization problem in (8.1), must satisfy:
\[
C_{K'}(K', A, K) + p = \beta E_{(A',p'|A,p)} V_{K'}(A', K', p'). \tag{8.4}
\]
The left side of this condition is a measure of the marginal cost of capital accumula-
tion and includes the direct cost of new capital as well as the marginal adjustment
cost. The right side of this expression measures the expected marginal gains of
more capital through the derivative of the value function. This is conventionally
termed “marginal Q” and denoted by q. Note the timing: the appropriate measure
of marginal Q is the expected discounted value for the following period due to the
one-period investment delay.
Using (8.1) to solve for E(A′,p′|A,p)VK′(A′, K′, p′), (8.4) can be simplified to an
Euler equation:
\[
C_{K'}(K', A, K) + p = \beta E_{(A',p'|A,p)} \left\{ \Pi_K(K', A') + p'(1-\delta) - C_{K'}(K'', A', K') \right\}. \tag{8.5}
\]
To interpret this necessary condition for an optimal solution, consider increasing
current investment by a small amount. The cost of this investment is measured on
the left side of this expression: there is the direct cost of the capital (p) as well as the
marginal adjustment cost. The gain comes in the following period. The additional
capital increases profits. Further, as the manager “returns” to the optimal path
following this deviation, the undepreciated capital is valued at the future market
price p′ and adjustment costs are reduced.
Exercise 8.3
Suppose that the problem had been written, perhaps more traditionally, with the
choice of investment rather than the future capital stock. Derive and analyze the
resulting Euler equation.
8.4.1 Q Theory: Models
One of the difficult aspects of investment theory with adjustment costs is empirical
implementation. As the value function and hence its derivative is not observable,
(8.4) cannot be directly estimated. Thus the theory is tested either by finding a
suitable proxy for the derivative of V (A, K, p) or by estimating the Euler equation,
(8.5). We focus here on the development of a theory which facilitates estimation
based upon using the average value of the firm as a substitute for the marginal value
of an additional unit of capital.
This approach, called Q theory, places additional structure on (8.1). In particu-
lar, following Hayashi (1982), assume that: Π(K, A) is proportional to K, and that
the cost of adjustment function is quadratic.109 Further, we assume that the price
of capital is constant. So consider:
\[
V(A, K) = \max_{K'} \; AK - \frac{\gamma}{2}\left(\frac{K' - (1-\delta)K}{K}\right)^{2} K - p\,(K' - (1-\delta)K) + \beta E_{A'|A} V(A', K') \tag{8.6}
\]
As always, Bellman’s equation must be true for all (A, K). Suppose that the shock
to profitability, A, follows an autoregressive process given by:
\[
A' = \rho A + \varepsilon'
\]
where |ρ| < 1 and ε′ is white noise. The first order condition for the choice of the
investment level implies that the investment rate (i ≡ I/K) is given by:
\[
i = \frac{1}{\gamma}\left(\beta E_{A'|A} V_K(A', K') - p\right). \tag{8.7}
\]
Here EA′|AVK(A′, K′) is again the expected value of the derivative of the value
function, a term we called “marginal Q”. To solve this dynamic programming problem,
we can guess at a solution and verify that it works. Given the linear-quadratic
structure of the problem, it is natural to guess that:
V (A, K) = φ(A)K
where φ(A) is some unknown function. Using this guess, expected marginal Q is a
function of A given by:
\[
E_{A'|A} V_K(A', K') = E_{A'|A}\,\phi(A') \equiv \tilde{\phi}(A).
\]
Note that in this case the expected value of marginal and average Q (defined as
V (A, K)/K = φ(A)) are the same.110 Using this in the Euler equation implies that
\[
i = \frac{1}{\gamma}\left(\beta\tilde{\phi}(A) - p\right) \equiv z(A).
\]
This expression implies that the investment rate is actually independent of the
current level of the capital stock.
To verify our guess, substitute this investment policy function into the original
functional equation implying:
\[
\phi(A)K = AK - \frac{\gamma}{2}(z(A))^{2}K - p\,z(A)K + \beta\tilde{\phi}(A)K\,[(1-\delta) + z(A)]
\]
must hold for all (A, K). Clearly, the guess that the value function is proportional to
K is indeed correct: the value of K cancels from the above expression. So, given the
conjecture that V (A, K) is proportional to K, we find an optimal investment policy
which confirms the asserted proportionality. The remaining part of the unknown
value function φ(A) is given implicitly by the expression above.111
The result that the value function is proportional to the stock of capital is, at this
point, a nice property of the linear-quadratic formulation of the capital accumulation
problem. In the discussion of empirical evidence, it forms the basis for a wide range
of empirical exercises since it allows the researcher to substitute the average value
of Q (observable from the stock market) for marginal Q (unobservable).
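The guess-and-verify argument can also be checked numerically. The following Python sketch assumes a two-state Markov chain for A in place of the autoregressive process, and hypothetical values for β, δ, γ and p; it iterates on the functional equation (after dividing by K) until φ(A) stops changing. It is an illustration under those assumptions, not the procedure used in the studies cited here.

```python
import numpy as np

# Hypothetical parameter values, chosen only so that the iteration converges
beta, delta, gamma, p = 0.95, 0.15, 5.0, 1.0
A = np.array([0.23, 0.27])          # two-state profitability shock (assumption)
P = np.array([[0.8, 0.2],           # P[i, j] = Prob(A' = A[j] | A = A[i])
              [0.2, 0.8]])

def update(phi):
    """Apply the right side of the functional equation once, after dividing by K."""
    phi_tilde = P @ phi                      # E[phi(A') | A]
    z = (beta * phi_tilde - p) / gamma       # investment rate implied by (8.7)
    new_phi = (A - 0.5 * gamma * z**2 - p * z
               + beta * phi_tilde * ((1 - delta) + z))
    return new_phi, z

phi = np.zeros_like(A)
for _ in range(5000):
    new_phi, z = update(phi)
    converged = np.max(np.abs(new_phi - phi)) < 1e-12
    phi = new_phi
    if converged:
        break

phi_tilde = P @ phi
z = (beta * phi_tilde - p) / gamma           # z(A): depends on A only, not on K
print("phi(A):", phi)
print("investment rate z(A):", z)
```

Because K has been divided out, the same φ(A) and z(A) apply at every level of the capital stock, which is the proportionality result in numerical form. Convergence is not guaranteed for arbitrary parameters: the value is finite only if discounted capital growth, β[(1 − δ) + z(A)], stays below one.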
8.4.2 Q Theory: Evidence
Due to its relatively simple structure, the convex adjustment cost model is one of the
leading models of investment. In fact, as discussed above, the convex model is often
simplified further so that adjustment costs are quadratic, as in (8.6). Necessary
conditions for optimality for this model are expressed in two ways.
First, from the first-order conditions, the investment rate is linearly related to
the difference between the future marginal value of new capital and the current
price of capital, as in (8.7). Using the arguments from above, this marginal value of
capital can under some conditions be replaced by the average value of capital. This
sets the basis for the Q-theory empirical approach discussed below.
Second, one can base an empirical analysis on the Euler equation that emerges
from (8.6). This naturally leads to estimation using GMM and is discussed below
as well.
The discussion of estimation based upon Q-theory draws heavily upon two pa-
pers. The first by Gilchrist and Himmelberg (1995) provides a clean and clear
presentation of the basic approach and evidence on Q-theory based estimation of
capital adjustment models. A theme in this and related papers is that empirically
investment depends on variables other than average Q, particularly measures of cash
flow.
The second by Cooper and Ejarque (2001) works from Gilchrist and Himmel-
berg (1995) to explore the significance of imperfect competition and credit market
frictions.112 This paper illustrates the use of indirect inference.
Tests of Q theory on panel data are frequently conducted using an empirical
specification of:
\[
(I/K)_{it} = a_{i0} + a_1 \beta E\bar{q}_{it+1} + a_2 (X_{it}/K_{it}) + \upsilon_{it} \tag{8.8}
\]
Here the i subscript refers to firm or plant i and the t subscript represents time.
From (8.7), a1 should equal 1/γ. This is an interesting aspect of this specification:
under the null hypothesis, one can infer the adjustment cost parameter from this
regression. There is a constant term in the regression which is plant specific. This
comes from a modification of the quadratic cost of adjustment to:
\[
C(K', K) = \frac{\gamma}{2}\left(\frac{K' - (1-\delta)K}{K} - a_i\right)^{2} K
\]
as in Gilchrist and Himmelberg (1995).113
Finally, this regression includes a third term, (Xit/Kit). In fact, Q theory does
not suggest the inclusion of other variables in (8.8) since all relevant information is
incorporated in average Q. Rather, these variables are included as a means of testing
the theory, where the theory predicts that these variables from the information set
should be insignificant. Hence researchers focus on the statistical and economic
significance of a2. In particular, Xit often includes financial variables as a way of
evaluating an alternative hypothesis in which the effects of financial constraints are
not included in average Q.
The results obtained using this approach have been mixed. Estimates of large
adjustment costs are not uncommon. Hayashi (1982) estimates a1 = 0.0423 and
thus γ of about 25. Gilchrist and Himmelberg (1995) estimate a1 at 0.033.
Further, many studies estimate a positive value for a2 when Xit is a measure of
profits and/or cash flow.114 This is taken as a rejection of the Q theory, which of
course implies that the inference drawn about γ from the estimate of a1 may not
be valid. Moreover, the significance of the financial variables has led researchers
to conclude that capital market imperfections must be present.
Cooper and Ejarque (2001) argue that the apparent failure of Q theory stems
from misspecification of the firm’s optimization problem: market power is ignored.
As shown by Hayashi (1982), if firms have market power, then average and marginal
Q diverge. Consequently, the substitution of average for marginal Q in the standard
investment regression induces measurement error that may be positively correlated
with profits.115 Cooper and Ejarque (2001) ask whether one might find positive and
significant a2 in (8.8) in a model without any capital market imperfections.
Their methodology follows the indirect inference procedures described in Gourier-
oux and Monfort (1996) and Gourieroux et al. (1993). This approach to estimation
was discussed in Chapter 4. This is a minimum distance estimation routine in which
the structural parameters of the optimization problem are chosen to bring the re-
duced form coefficients from the regression on the simulated data close to those from
the actual data. The key is that the same reduced form regression is run on both
the actual and simulated data.
Cooper and Ejarque (2001) use the parameter estimates of Gilchrist and Himmel-
berg (1995) for (8.8) as representative of the Q theory based investment literature.
Denote these estimates from their pooled panel sample using the average (Tobin’s)
Q measure by (a∗1, a∗2) = (.03, .24).116 Cooper and Ejarque (2001) add three other
moments reported by Gilchrist and Himmelberg (1995): the serial correlation of
investment rates (.4), the standard deviation of profit rates (.3) and the average
value of average Q (3). Let Ψd denote the vector of moments from the data. In the
Cooper and Ejarque (2001) study,
Ψd = [.03 .24 .4 .3 3].
The estimation focuses on two key parameters: the curvature of the profit func-
tion (α) and the level of the adjustment costs (γ). So, they set other parameters
at levels found in previous studies: δ = .15 and β = .95. This leaves (α,γ) and
the stochastic process for the firm-specific shocks to profitability as the param-
eters remaining to be estimated. Cooper and Ejarque (2001) estimate the serial
correlation (ρ) and the standard deviation (σ) of the profitability shocks while the
aggregate shock process is represented as a two-state Markov process with
a symmetric transition matrix in which the probability of remaining in either of the
two aggregate states is .8.117
As described in Chapter 4, the indirect inference procedure proceeds, in this
application, by:
• given a vector of parameters, Θ ≡ (α,γ, ρ, σ), solve the firm’s dynamic pro-
gramming problem of
\[
V(A, K) = \max_{K'} \; AK^{\alpha} - \frac{\gamma}{2}\left(\frac{K' - (1-\delta)K}{K}\right)^{2} K - p\,(K' - (1-\delta)K) + \beta E_{A'|A} V(A', K') \tag{8.9}
\]
for all (A, K) using value function iteration. The method outlined in Tauchen
(1986) is used to create a discrete state space representation of the shock
process given (ρ, σ). Use this in the conditional expectation of the optimization
problem.
• given the policy functions obtained by solving the dynamic programming prob-
lem, create a panel data set by simulation
• estimate the Q theory model, as in (8.8), on the simulated data and calculate
relevant moments. Let Ψs(Θ) denote the corresponding moments from the
simulated data
• Compute J(Θ) defined as:
J(Θ) = (Ψd − Ψs(Θ))′W (Ψd − Ψs(Θ)) (8.10)
where W is an estimate of the inverse of the variance-covariance matrix of Ψd.
• The estimator of Θ, Θ̂, solves:
\[
\min_{\Theta} J(\Theta).
\]
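The steps above can be summarized in a short Python sketch of the criterion in (8.10). The data moments are those quoted in the text; the weighting matrix is set to the identity and the function `toy_moments` is a made-up stand-in for the solve, simulate and regress steps, so neither should be read as part of the Cooper and Ejarque (2001) procedure.

```python
import numpy as np

# Data moments quoted in the text, in the order:
# a1, a2, serial correlation of I/K, std. dev. of profit rates, mean average Q
psi_d = np.array([0.03, 0.24, 0.4, 0.3, 3.0])

# The text uses an estimate of the inverse variance-covariance matrix of psi_d.
# The identity matrix is a placeholder assumption.
W = np.eye(len(psi_d))

def J(theta, simulate_moments):
    """Quadratic distance between data and simulated moments, as in (8.10)."""
    gap = psi_d - simulate_moments(theta)
    return float(gap @ W @ gap)

def toy_moments(theta):
    """Hypothetical stand-in for: solve (8.9), simulate a panel, run (8.8).
    It is NOT the structural model; it only makes the sketch runnable."""
    alpha, gamma, rho, sigma = theta
    return np.array([0.2 * gamma, 0.35 * alpha, rho, sigma, 2.0 + 1.5 * alpha])

theta_guess = np.array([0.7, 0.15, 0.1, 0.85])   # (alpha, gamma, rho, sigma)
print(J(theta_guess, toy_moments))
```

In an actual application `toy_moments` would be replaced by a routine that performs the first three steps of the list for a given Θ, and J would be passed to a numerical minimizer.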
The second row of Table 8.1 presents the estimates of structural parameters and
standard errors reported in Cooper and Ejarque (2001).118 Table 8.2 reports the
resulting regression results and moments. Here the row labelled GH95 represents
the regression results and moments reported by Gilchrist and Himmelberg (1995).
[Table 8.1 approximately here]
[Table 8.2 approximately here]
The model, with its four parameters, does a good job of matching four of the five
estimates/moments but is unable to reproduce the high level of serial correlation in
plant-level investment rates. This appears to be a consequence of the fairly low level
of γ which implies that adjustment costs are not very large. Raising the adjustment
costs will increase the serial correlation of investment.
The estimated curvature of the profit function of .689 implies a markup of about
15%.119 This estimate of α and hence the markup is not at variance with results
reported in the literature.
The other interesting parameter is the estimate of the level associated with the
quadratic cost of adjustment, γ. Relative to other studies, this appears quite low.
However, an interesting point from these results is that the estimate of γ is
not identified from the regression coefficient on average Q. From this table, the
estimated value of γ = .149 is far from the inverse of the coefficient on average Q
(about 4). So clearly the identification of the quadratic cost of adjustment parameter
from a1 is misleading in the presence of market power.
Exercise 8.4
Write a program to solve
\[
V(A, K) = \max_{K'} \; AK^{\alpha} - \frac{\gamma}{2}\left(\frac{K' - (1-\delta)K}{K}\right)^{2} K - p\,(K' - (1-\delta)K) + \beta E_{A'|A} V(A', K') \tag{8.11}
\]
using a value function iteration routine given a parameterization of the problem.
Use the results to explore the relationship of investment to average Q. Is there a
nonlinearity in this relationship? How is investment related to profitability in your
simulated data set?
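As a possible starting point for this exercise, the Python sketch below applies value function iteration to (8.11) on a discrete grid. The two-state shock process, the capital grid and the values of α, γ and p are assumptions chosen only to make the example run; it is one way to set up the problem, not a reference solution.

```python
import numpy as np

# Hypothetical parameterization (delta and beta follow the values quoted in the text)
alpha, gamma, delta, beta, p = 0.7, 0.15, 0.15, 0.95, 1.0
A = np.array([0.9, 1.1])                     # two-state shock (assumption)
P = np.array([[0.8, 0.2], [0.2, 0.8]])       # P[i, j] = Prob(A' = A[j] | A = A[i])
K = np.linspace(5.0, 200.0, 200)             # capital grid (assumption)

# Current payoff for every (A, K, K') combination
Kc = K[None, :, None]                        # current capital
Kn = K[None, None, :]                        # next-period capital K'
I = Kn - (1 - delta) * Kc                    # investment
payoff = (A[:, None, None] * Kc**alpha
          - 0.5 * gamma * (I / Kc)**2 * Kc
          - p * I)

V = np.zeros((len(A), len(K)))
for it in range(5000):
    EV = P @ V                               # EV[a, k'] = E[V(A', K') | A = A[a]]
    V_new = (payoff + beta * EV[:, None, :]).max(axis=2)
    err = np.max(np.abs(V_new - V))
    V = V_new
    if err < 1e-8:
        break

best = (payoff + beta * (P @ V)[:, None, :]).argmax(axis=2)
policy = K[best]                                             # K'(A, K)
inv_rate = (policy - (1 - delta) * K[None, :]) / K[None, :]  # I / K
avg_Q = V / K[None, :]                                       # average Q = V / K
```

The arrays `inv_rate` and `avg_Q` can then be compared state by state, or the policy can be used to simulate a panel on which a regression such as (8.8) is run. A finer grid or interpolation would reduce the approximation error that the discrete grid introduces into the investment rate.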
8.4.3 Euler Equation Estimation
This approach to estimation shares with the consumption applications presented
in Chapter 6 a simple but powerful logic. The Euler equation given in (8.5) is a
necessary condition for optimality. In the quadratic cost of adjustment model case
this simplifies to:
\[
i_t = \frac{1}{\gamma}\left[ \beta E_t\left( \pi_K(A_{t+1}, K_{t+1}) + p_{t+1}(1-\delta) + \frac{\gamma}{2} i_{t+1}^{2} + \gamma(1-\delta) i_{t+1} \right) - p_t \right].
\]
Let εt+1 be defined from realized values of these variables:
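\[
\varepsilon_{t+1} = i_t - \frac{1}{\gamma}\left[ \beta\left( \pi_K(A_{t+1}, K_{t+1}) + p_{t+1}(1-\delta) + \frac{\gamma}{2} i_{t+1}^{2} + \gamma(1-\delta) i_{t+1} \right) - p_t \right]. \tag{8.12}
\]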
Then the restriction imposed by the theory is that Etεt+1 = 0. It is precisely
this orthogonality condition that the GMM procedure exploits in the estimation of
underlying structural parameters, θ = (β, γ, δ, α).
To illustrate, we have solved and simulated a model with quadratic adjustment
costs (γ = 2) with constant investment good prices. Using that data set, we can
estimate the parameters of the firm’s problem using GMM.
To make this as transparent as possible, assume that the researcher knows the
values of all parameters except for γ. Thus, we can rely on a single orthogonality
condition to determine γ. Suppose that we use the lagged profitability shock as the
instrument. Define
\[
\Omega(\gamma) = \frac{1}{T}\sum_{t} \varepsilon_{t+1}(\gamma) A_t \tag{8.13}
\]
The GMM estimate of γ is obtained from the minimization of Ω(γ). This function
is shown in Figure 8.1. Clearly, this function is minimized near γ = 2.120
[Figure 8.1 approximately here]
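A minimal Python sketch of this single-moment calculation follows. Because the simulated data set behind Figure 8.1 is not reproduced here, the sketch builds synthetic series that satisfy the orthogonality condition at γ = 2 by construction; the processes for A, for marginal profits and for investment rates are hypothetical. Since Ω(γ) can take either sign, the sketch minimizes its square over a grid.

```python
import numpy as np

beta, delta, p = 0.95, 0.15, 1.0      # treated as known (values are assumptions)
gamma_true, T = 2.0, 20000
rng = np.random.default_rng(0)

def eps(gamma, i_t, i_next, piK_next):
    """Euler-equation residual (8.12) with a constant price of capital p."""
    future = (piK_next + p * (1 - delta)
              + 0.5 * gamma * i_next**2 + gamma * (1 - delta) * i_next)
    return i_t - (beta * future - p) / gamma

# Synthetic data, NOT model output: built so that E_t[eps_{t+1}] = 0 at gamma_true.
A = np.empty(T + 1)
A[0] = 1.0
for t in range(T):
    A[t + 1] = 1.0 + 0.8 * (A[t] - 1.0) + 0.05 * rng.standard_normal()
piK_next = 0.25 * A[1:]                          # hypothetical marginal profit
i_next = 0.10 + 0.02 * rng.standard_normal(T)    # hypothetical i_{t+1}
u = 0.01 * rng.standard_normal(T)                # expectational error, independent of A_t
i_t = (beta * (piK_next + p * (1 - delta) + 0.5 * gamma_true * i_next**2
               + gamma_true * (1 - delta) * i_next) - p) / gamma_true + u

def Omega(gamma):
    """Sample moment (8.13): residual times the instrument A_t."""
    return np.mean(eps(gamma, i_t, i_next, piK_next) * A[:-1])

grid = np.linspace(0.5, 4.0, 351)
gamma_hat = grid[np.argmin([Omega(g) ** 2 for g in grid])]
print(gamma_hat)
```

With several instruments the same residual function would feed a weighted quadratic form in the stacked moments rather than a single squared term.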
Whited (1998) contains a thorough review and analysis of existing evidence
on Euler equation estimation of investment models. As Whited notes, the Euler
equation approach certainly has a virtue over the Q-theory based model: there is no
need to try to measure marginal Q. Thus some of the restrictions imposed on the
estimation, such as the conditions specified by Hayashi, do not have to be imposed.
Estimation based upon an investment Euler equation generally leads to rejection of
the overidentifying restrictions and, as in the Q-theory based empirical work, the
inclusion of financial constraints improves the performance of the model.
The point of Whited (1998) is to dig further into these results. Importantly, her
analysis brings the importance of fixed adjustment costs into the evaluation of the
Euler equation estimation. As noted earlier and discussed at some length below,
investment studies have been broadened to go beyond convex adjustment costs to
match the observations of non-adjustment in the capital stock. Whited (1998) takes
this into account by dividing her sample into the set of firms which undertakes
positive investment. Estimation of the Euler equation for this subset is much more
successful. Further Whited (1998) finds that while financial variables are important
overall, they are also weakly relevant for the firms with ongoing investment.
These results are provocative. They force us to think jointly about the pres-
ence of non-convex adjustment costs and financial variables. We now turn to these
important topics.
8.4.4 Borrowing Restrictions
Thus far, we have ignored the potential presence of borrowing restrictions. These
have a long history in empirical investment analysis. As in our discussion of the
empirical Q-theory literature, financial frictions are often viewed as the source of
the significance of profit rates and/or cash flow in investment regressions.
There is nothing particularly difficult about introducing borrowing restrictions
into the capital accumulation problem. Consider:
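\[
V(A, K) = \max_{K' \in \Gamma(A,K)} \; AK^{\alpha} - \frac{\gamma}{2}\left(\frac{K' - (1-\delta)K}{K}\right)^{2} K \tag{8.14}
\]
\[
\qquad\qquad - p\,(K' - (1-\delta)K) + \beta E_{A'|A} V(A', K') \tag{8.15}
\]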
for all (A, K) where Γ(A, K) constrains the choice set for the future capital stock.
So, for example, if capital purchases had to be financed out of current profits, then
the financial restriction is
\[
K' - (1-\delta)K \le AK^{\alpha} \tag{8.16}
\]
so that
\[
\Gamma(A, K) = [0,\; AK^{\alpha} + (1-\delta)K] \tag{8.17}
\]
The dynamic optimization problem with a restriction of this form can certainly
be evaluated using value function iteration techniques. The problem of the firm can
be broadened to include retained earnings as a state variable and to include other
financial variables in the state vector. There are a number of unresolved issues
though that have limited research in this area:
• What are the Γ(A, K) functions suggested by theory?
• For what Γ(A, K) functions is there a wedge between average and marginal
Q?
The first point is worthy of note: while we have many models of capital accu-
mulation without borrowing restrictions, the alternative model of investment with
borrowing restrictions is not on the table. Thus, the rejection of the model without
constraints in favor of one with constraints is not as convincing as it could be.
The second point, related to work by Chirinko (1993) and Gomes (2001), returns
to the evidence discussed earlier on Q theory based empirical models of investment.
The value function, V (A, K) that solves (8.15) contains all the information about
the constrained optimization problem. As long as this function is differentiable
(which restricts the Γ(A, K) function), marginal Q will still measure the return to
an extra unit of capital. The issue is whether the borrowing friction introduces a
wedge between marginal and average Q.121 Empirically, the issue is whether this
wedge between marginal and average Q can create the regression results such as
those reported in Gilchrist and Himmelberg (1995).
8.5 Non-Convex Adjustment: Theory
Empirically, one finds that at the plant level there are frequent periods of invest-
ment inactivity and also bursts of investment activity. Table 8.3 below, taken from
Cooper and Haltiwanger (2000), documents the nature of capital adjustment in
the Longitudinal Research Database (LRD), a plant level U.S. manufacturing data
set.122
[Table 8.3 approximately here]
Here inaction is defined as a plant level investment rate less than .01 and a spike
is an investment rate in excess of 20%. Clearly the data exhibit both inaction as
well as large bursts of investment.
As argued by Caballero et al. (1995), Cooper et al. (1999) and Cooper and
Haltiwanger (2000) it is difficult to match this type of evidence with a quadratic
cost of adjustment model. Thus we turn to alternative models which can produce
inaction. In the first type of model, we relax the convex adjustment cost structure
and assume that the costs of adjustment depend only on whether investment has
been undertaken and not its magnitude. We then consider a second type of model
in which there is some type of irreversibility. The next section reports on estimation
of these models.
8.5.1 Non-convex Adjustment Costs
For this formulation of adjustment costs, we follow Cooper and Haltiwanger (1993)
and Cooper et al. (1999) and consider a dynamic programming problem specified at
the plant level as:
\[
V(A, K, p) = \max\{V^{i}(A, K, p),\, V^{a}(A, K, p)\} \quad \text{for all } (A, K, p) \tag{8.18}
\]
where the superscripts refer to active investment “a” and inactivity “i”. These
options, in turn, are defined by:
\[
V^{i}(A, K, p) = \Pi(A, K) + \beta E_{A',p'|A,p} V(A', K(1-\delta), p')
\]
and
\[
V^{a}(A, K, p) = \max_{K'} \; \Pi(A, K)\lambda - FK - p\,(K' - (1-\delta)K) + \beta E_{A',p'|A,p} V(A', K', p').
\]
Here there are two costs of adjustment that are independent of the level of investment
activity. The first is the loss of a fraction (1 − λ) of the profit flow. This is intended to capture
an opportunity cost of investment in which the plant must be shut down during a
period of investment activity. The second non-convex cost is simply subtracted from
the flow of profits as F K. The inclusion of K here is intended to capture the idea
that these fixed costs, while independent of the current level of investment activity,
may have some scale aspects to them.123 In this formulation, the relative price of
capital (p) is allowed to vary as well.
Before proceeding to a discussion of results, it might be useful to recall from
Chapter 3 how one might obtain a solution to a problem such as (8.18).124 The first
step is to specify a profit function, say Π(A, K) = AKα and to set the parameters,
(F, β, λ, α, δ) as well as the stochastic processes for the random variables (A, p).
Denote this parameter vector by Θ. The second step is to specify a space for
the state variables, (A, K, p) and thus for control variable K′. Once these steps
are complete, the value function iteration logic (subscripts denote iterations of the
mapping) takes over:
• provide an initial guess for V1(A, K, p), such as the one period solution
• using this initial guess, compute the values for the two options, $V^{a}_{1}(A, K, p)$
and $V^{i}_{1}(A, K, p)$
• using these values, solve for the next guess of the value function: $V_{2}(A, K, p) =
\max\{V^{a}_{1}(A, K, p),\, V^{i}_{1}(A, K, p)\}$
• continue this process until convergence
• once the value function is known, it is straightforward to compute the set of
state variables such that action (inaction) is optimal as well as the investment
level in the event adjustment is optimal.
• given these policy functions, the model can be simulated to create either a
panel or a time series data set.
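The iteration described in the list above can be sketched in Python as follows. The sketch assumes a constant price of capital p (the text allows p to vary), a two-state Markov chain for A, and a capital grid with ratio (1 − δ) between adjacent points so that inaction lands exactly on the grid; all parameter values are hypothetical.

```python
import numpy as np

alpha, beta, delta = 0.7, 0.95, 0.15         # hypothetical values
lam, F, p = 0.95, 0.02, 1.0                  # lambda, fixed cost F, constant price p
A = np.array([0.9, 1.1])                     # two-state shock (assumption)
P = np.array([[0.8, 0.2], [0.2, 0.8]])       # P[i, j] = Prob(A' = A[j] | A = A[i])

# Grid with ratio (1 - delta): inaction moves capital exactly one grid point down.
n = 40
K = 300.0 * (1 - delta) ** np.arange(n)      # K[0] is the largest, K[-1] the smallest
down = np.minimum(np.arange(n) + 1, n - 1)   # index of K(1 - delta); capped at the
                                             # last point, which is an approximation

profit = A[:, None] * K[None, :] ** alpha    # Pi(A, K) = A K^alpha

V = profit.copy()                            # initial guess: one-period payoff
for it in range(5000):
    EV = P @ V                               # E[V(A', K') | A]
    V_i = profit + beta * EV[:, down]        # inaction
    # The best K' under adjustment does not depend on current K
    best = (-p * K[None, :] + beta * EV).max(axis=1)
    V_a = (lam * profit - F * K[None, :]
           + p * (1 - delta) * K[None, :] + best[:, None])   # active investment
    V_new = np.maximum(V_i, V_a)
    err = np.max(np.abs(V_new - V))
    V = V_new
    if err < 1e-8:
        break

adjust = V_a > V_i      # True where active investment is the better option
```

The boolean array `adjust` marks the states in which adjustment is chosen, and the argmax behind `best` gives the target capital stock conditional on adjusting; together they are the two dimensions of the policy function.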
The policy function for this problem will have two important dimensions. First,
there is the determination of whether the plant will adjust its capital stock or not.
Second, conditional on adjustment, the plant must determine its level of investment.
As usual, the optimal choice of investment depends on the marginal value of capital
in the next period. However, in contrast to say the quadratic cost of adjustment
model, the future value of additional capital depends on future choice with respect
to adjustment. Thus there is no simple Euler equation linking the marginal cost of
additional capital today with future marginal benefit, as in (8.5), since there is no
guarantee that this plant will be adjusting its capital stock in the future period.
Note that the two types of costs have very different implications for the cyclical
properties of investment. In particular, when adjustment costs interfere with the
flow of profits (λ < 1) then it is more expensive to invest in periods of high
profitability. Yet, if the shocks are sufficiently correlated, there is a gain to investing
in good times. In contrast, if costs are largely lump sum, then given the time to
build aspect of the accumulation decision, the best time to invest is when it is prof-
itable to do so (A is high) assuming that these shocks are serially correlated. Thus
whether investment is procyclical or countercyclical depends on both the nature of
the adjustment costs and the persistence of shocks.
We shall discuss the policy functions for an estimated version of this model
below. For now, we look at a simple example to build intuition.
Machine Replacement Example
As an example, we turn to a modified version of the simple model of machine
replacement studied by Cooper and Haltiwanger (1993). Here there is no choice of
the size of the investment expenditure. Investment means the purchase of a new
machine at a net price of p. By assumption the old machine is scrapped. The size
of the new machine is normalized to 1.125
Further, to simplify the argument, we assume that new capital becomes pro-
ductive immediately. In addition, the price of new capital good is assumed to be
constant and can be interpreted as including the fixed cost of adjusting the capital
stock. In this case, we can write the Bellman equation as:
\[
V(A, K) = \max\{V^{i}(A, K),\, V^{a}(A, K)\}
\]
for all (A, K) where the superscripts refer to active investment “a” and inactivity
“i”. These options, in turn, are defined by:
\[
V^{i}(A, K) = \Pi(A, K) + \beta E_{A'|A} V(A', K(1-\delta))
\]
and
\[
V^{a}(A, K) = \Pi(A, 1)\lambda - p + \beta E_{A'|A} V(A', (1-\delta)).
\]
So here “action” means that a new machine is bought and is immediately productive.
The cost of this is the net price of the new capital and the disruption caused by the
adjustment process. Let ∆(A, K) be the relative gains to action so:
\[
\Delta(A, K) \equiv V^{a}(A, K) - V^{i}(A, K) = \Pi(A, 1)\lambda - \Pi(A, K) - p + \beta\left( E_{A'|A} V(A', (1-\delta)) - E_{A'|A} V(A', K(1-\delta)) \right)
\]
The problem posed in this fashion is clearly one of the optimal stopping vari-
ety. Given the state of profitability (A), there is a critical size of the capital stock
(K∗(A)) such that machine replacement occurs if and only if K ≤ K∗(A). To see
why this policy is optimal, note that by our timing assumption, $V^{a}(A, K)$ is in fact
independent of K. Clearly $V^{i}(A, K)$ is increasing in K. Thus there is a unique
crossing of these two functions at K∗(A). In other words, ∆(A, K) is decreasing in K,
given A, with ∆(A, K∗(A)) = 0.
Is K∗ between 0 and 1? With Π(A, 0) sufficiently small, $V^{i}(A, K) < V^{a}(A, K)$
for K near 0. Hence, K∗ > 0. Further, with the costs of acquiring new capital
(p > 0, λ < 1) large enough and the rate of depreciation low enough, capital will
not be replaced each period: K∗ < 1. Thus there will be a “replacement cycle” in
which there is a burst of investment activity followed by inactivity until the capital
ages enough to warrant replacement.126
The policy function is then given by z(A, K) ∈ {0, 1} where z(A, K) = 0 means
inaction and z(A, K) = 1 means replacement. From the argument above, for each
A there exists K∗(A) such that z(A, K) = 1 if and only if K ≤ K∗(A).
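The cutoff rule can be illustrated numerically. The Python sketch below assumes Π(A, K) = AK^α, a two-state Markov chain for A and hypothetical parameter values. Because a new machine has size 1 and depreciates at rate δ, capital only takes the values (1 − δ)^j, so the state can be indexed by the age j of the machine.

```python
import numpy as np

alpha, beta, delta = 0.7, 0.95, 0.10     # hypothetical values
lam, p = 0.9, 0.2                        # disruption factor and net machine price
A = np.array([0.8, 1.2])                 # two-state shock (assumption)
P = np.array([[0.8, 0.2], [0.2, 0.8]])   # P[i, j] = Prob(A' = A[j] | A = A[i])

# Reachable capital levels: (1 - delta)^j for machine age j = 0, 1, ..., J
J = 60
K = (1 - delta) ** np.arange(J + 1)
older = np.minimum(np.arange(J + 1) + 1, J)   # age next period without replacement
                                              # (capped at J, an approximation)

profit = A[:, None] * K[None, :] ** alpha     # Pi(A, K) = A K^alpha (assumption)

V = np.zeros_like(profit)
for it in range(5000):
    EV = P @ V
    V_i = profit + beta * EV[:, older]                 # keep the old machine
    # New machine: size 1 today, size (1 - delta) = K[1] tomorrow
    V_a = (lam * A - p + beta * EV[:, 1])[:, None] + np.zeros_like(profit)
    V_new = np.maximum(V_i, V_a)
    err = np.max(np.abs(V_new - V))
    V = V_new
    if err < 1e-10:
        break

z = V_a > V_i                       # z(A, K) = 1 means replacement
K_star = K[z.argmax(axis=1)]        # largest capital level at which replacement occurs
print(K_star)
```

In each row of `z` the entries switch from inaction to replacement once and stay there as the machine ages, which is the threshold property derived above.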
With the assumption that capital becomes productively immediately, the re-
sponse of K∗(A) to variations in A can be analyzed.127 Suppose for example that
λ = 1 and A is iid. In this case, the dependence of ∆(A, K) on A is solely through
current profits. Thus ∆(A, K) is increasing in A as long as the marginal productivity
of capital is increasing in A, $\Pi_{AK}(A, K) > 0$. So, K∗(A) will be increasing in A
and replacement will be more likely in good times.
Alternatively, suppose that λ < 1. In this case, during periods of high produc-
tivity it is desirable to have new capital but it is also costly to install it. If A is
positively serially correlated, then the effect of A on ∆(A, K) will reflect both the
direct effect on current profits and the effects on the future values. If the opportu-
nity cost is large (a small λ) and shocks are not persistent enough, then machine
replacement will be delayed until capital is less productive.
Aggregate Implications of Machine Replacement
This model of capital adjustment at the plant level can be used to generate aggre-
gate implications. Let ft(K) be the current distribution of capital across a fixed
population of plants. Suppose that the shock in period t, At, has two components,
At = atεt. The first is aggregate and the second is plant specific. Following Cooper
et al. (1999), assume that the aggregate shock takes on two values and the plant
specific shock takes on 20 values. Further, assume that the idiosyncratic shocks
are iid. With this decomposition, write the policy function as z(at, εt, Kt) where
z(at, εt, Kt) = 1 signifies action and z(at, εt, Kt) = 0 indicates inaction. Clearly
the decision on replacement will generally depend differentially on the two types of
shocks since they may be drawn from processes with different stochastic properties. For example,
if the aggregate shock is more persistent than the plant specific one, the response
to a variation in at will be larger than the response to an innovation in εt.
Define
$$H(a_t, K) = \int_{\varepsilon} z(a_t, \varepsilon, K)\, dG_t(\varepsilon)$$
where Gt(ε) is the period t cumulative distribution function of the plant specific
shocks. Here H(at, K) is a hazard function representing the probability of adjust-
ment for all plants with capital K in aggregate state at. To the extent that the
researcher may be able to observe aggregate but not plant specific shocks, H(at, K)
represents a hazard that averages over the {0, 1} choices of the individual plants so
that H(at, K) ∈ [0, 1].
Using this formulation, let I(at; ft(K)) be the rate of investment in state at
given the distribution of capital holdings ft(K) across plants. Aggregate investment
is defined as:
$$I(a_t; f_t(K)) = \sum_{K} H(a_t, K) f_t(K). \tag{8.19}$$
Thus total investment reflects the interaction between the average adjustment haz-
ard and the cross sectional distribution of capital holdings.
The evolution of the cross sectional distribution of capital is given by:
$$f_{t+1}((1 - \delta)K) = (1 - H(a_t, K)) f_t(K). \tag{8.20}$$
Expressions such as these are common in aggregate models of discrete adjust-
ment, see for example, Rust (1985) and Caballero et al. (1995). Given an initial
cross sectional distribution and a hazard function, a sequence of shocks will thus
generate a sequence of aggregate investment levels from (8.19) and a sequence of
cross sectional distributions from (8.20).
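The recursion in (8.19) and (8.20) is straightforward to simulate. The sketch below is illustrative only: it indexes capital by age, uses made-up hazards that rise with age, and assumes that plants which replace their machine hold capital of age 1 in the following period (consistent with the timing above, but not stated explicitly in the text). The function names and the input values are hypothetical.

```python
def aggregate_step(hazard, dist):
    """One period of (8.19)-(8.20) on an age grid.

    hazard[j]: probability that a plant with capital of age j replaces it
    dist[j]:   fraction of plants whose capital has age j
    Returns (investment_rate, next_dist).
    """
    n = len(dist)
    investment = sum(h * f for h, f in zip(hazard, dist))    # equation (8.19)
    next_dist = [0.0] * n
    next_dist[1] += investment      # assumption: replaced capital has age 1 next period
    for j in range(n):
        stay = (1.0 - hazard[j]) * dist[j]                   # equation (8.20)
        next_dist[min(j + 1, n - 1)] += stay                 # oldest bin absorbs overflow
    return investment, next_dist


def simulate(hazards_by_state, states, dist):
    """Feed a sequence of aggregate states through the recursion."""
    path = []
    for s in states:
        inv, dist = aggregate_step(hazards_by_state[s], dist)
        path.append(inv)
    return path, dist


# Hypothetical inputs: hazards rise with age and are higher in the good state.
N_AGES = 10
HAZARD = {
    "low": [min(1.0, 0.05 * j) for j in range(N_AGES)],
    "high": [min(1.0, 0.10 * j) for j in range(N_AGES)],
}
dist0 = [1.0 / N_AGES] * N_AGES
path, dist_T = simulate(HAZARD, ["low", "high", "high", "low"], dist0)
```

In an actual application the hazards would come from the solved policy function z(a, ε, K) integrated over ε, rather than from the ad hoc schedule used here.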
Thus the machine replacement problem can generate both a panel data set and,
through aggregation, time series as well. In principle, estimation from aggregate
data supplements the perhaps more direct route of estimating a model such as this
from a panel.
Exercise 8.5
Use a value function iteration routine to solve the dynamic optimization problem
with a firm when there are non-convex adjustment costs. Suppose there is a panel of
such firms. Use the resulting policy functions to simulate the time series of aggre-
gate investment. Now, use a value function iteration routine to solve the dynamic
optimization problem with a firm when there are quadratic adjustment costs. Create
a time series from the simulated panel. How well can a quadratic adjustment cost
model approximate the aggregate investment time series created by the model with
non-convex adjustment costs?
8.5.2 Irreversibility
The specifications considered thus far do not distinguish between the buying and
selling prices of capital. However, there are good reasons to think that investment
is at least partially irreversible so that the selling price of a unit of used capital is
less than the cost of a unit of new capital. This reflects frictions in the market for
used capital as well as specific aspects of capital equipment that may make them
imperfectly suitable for uses at other production sites. To allow for this, we alter
our optimization problem to distinguish the buying and selling prices of capital.
The value function for this specification is given by:
V (A, K) = max{V b(A, K), V s(A, K), V i(A, K)}
for all (A, K) where the superscripts refer to the act of buying capital "b", selling
capital "s" and inaction "i". These options, in turn, are defined by:
$$V^b(A, K) = \max_{I} \; \Pi(A, K) - I + \beta E_{A'|A} V(A', K(1 - \delta) + I),$$
$$V^s(A, K) = \max_{R} \; \Pi(A, K) + p_s R + \beta E_{A'|A} V(A', K(1 - \delta) - R)$$
and
$$V^i(A, K) = \Pi(A, K) + \beta E_{A'|A} V(A', K(1 - \delta)).$$
Under the buy option, the plant obtains capital at a cost normalized to one.
Under the sell option, the plant retires R units of capital at a price ps. The third
option is inaction so that the capital stock depreciates at a rate of δ. Intuitively, the
gap between the buying and selling price of capital will produce inaction. Suppose
that there is an adverse shock to the profitability of the plant. If this shock was
known to be temporary, then selling capital and repurchasing it in the near future
would not be profitable for the plant as long as ps < 1. Thus inaction may be
optimal. Clearly though, the amount of inaction that this model can produce will
depend on both the size of ps relative to 1 and the serial correlation of the shocks.128
8.6 Estimation of a Rich Model of Adjustment Costs
Using this dynamic programming structure to understand the optimal capital deci-
sion at the plant (firm) level, we confront the data on investment decisions allowing
for a rich structure of adjustment costs.129 To do so, we follow Cooper and Halti-
wanger (2000) and consider a model with quadratic adjustment costs, non-convex
adjustment costs and irreversibility. We describe the optimization problem and then
the estimation results obtained by Cooper and Haltiwanger.
8.6.1 General Model
The dynamic programming problem for a plant is given by:
V (A, K) = max{V b(A, K), V s(A, K), V i(A, K)} (8.21)
for all (A, K) where, as above, the superscripts refer to the act of buying capital
"b", selling capital "s" and inaction "i". These options, in turn, are defined by:
$$V^b(A, K) = \max_{I} \; \Pi(A, K) - FK - I - \frac{\gamma}{2} [I/K]^2 K + \beta E_{A'|A} V(A', K(1 - \delta) + I),$$
$$V^s(A, K) = \max_{R} \; \Pi(A, K) + p_s R - FK - \frac{\gamma}{2} [R/K]^2 K + \beta E_{A'|A} V(A', K(1 - \delta) - R)$$
and
$$V^i(A, K) = \Pi(A, K) + \beta E_{A'|A} V(A', K(1 - \delta)).$$
Cooper and Haltiwanger (2000) estimate three parameters, Θ ≡ (F, γ, ps) and
assume that β = 0.95, δ = 0.069. Further, they specify a profit function of $\Pi(A, K) = AK^{\theta}$
with θ = 0.50 estimated from a panel data set of manufacturing plants.130 Note
that the adjustment costs in (8.21) exclude any disruptions to the production process
so that the Π(A, K) can be estimated and the shock process inferred independently
of the estimation of adjustment costs. If these additional adjustment costs were
added, then the profit function and the shocks would have to be estimated along
with the parameters of the adjustment cost function.
These parameters are estimated using an indirect inference routine. The reduced
form regression used in the analysis is:
$$i_{it} = \alpha_i + \psi_0 + \psi_1 a_{it} + \psi_2 (a_{it})^2 + u_{it} \tag{8.22}$$
where iit is the investment rate at plant i in period t, ait is the log of a profitability
shock at plant i in period t and αi is a fixed effect.131 This specification was chosen
as it captures in a parsimonious way the nonlinear relationship between investment
rates and fundamentals. The profitability shocks are inferred from the plant level
data using the estimated profit function.132 Cooper and Haltiwanger document the
extent of the nonlinear response of investment to shocks.
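In an indirect inference routine, the regression (8.22) is run once on the actual data and again on each simulated panel, and the structural parameters are chosen to bring the two sets of coefficients close together. The following is a rough sketch of the regression step only, assuming a balanced or unbalanced panel stored as a dictionary; the function name and the example data are hypothetical. The plant effects αi and the constant ψ0 are removed by demeaning within each plant, so only ψ1 and ψ2 are returned.

```python
def within_ols(panel):
    """Fixed-effects regression of i_it on a_it and a_it**2.

    panel: dict mapping plant id -> list of (investment_rate, log_shock) pairs.
    Returns (psi1, psi2).
    """
    s11 = s12 = s22 = s1y = s2y = 0.0
    for obs in panel.values():
        n = len(obs)
        y_bar = sum(y for y, _ in obs) / n
        x1_bar = sum(a for _, a in obs) / n
        x2_bar = sum(a * a for _, a in obs) / n
        for y, a in obs:
            d1, d2, dy = a - x1_bar, a * a - x2_bar, y - y_bar
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
            s1y += d1 * dy
            s2y += d2 * dy
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    psi1 = (s22 * s1y - s12 * s2y) / det
    psi2 = (s11 * s2y - s12 * s1y) / det
    return psi1, psi2


# Hypothetical noiseless data with psi1 = 0.5 and psi2 = 2.0 and plant effects 0.01 * i.
shocks = (-0.2, -0.1, 0.0, 0.1, 0.3)
panel = {i: [(0.01 * i + 0.1 + 0.5 * a + 2.0 * a * a, a) for a in shocks]
         for i in range(4)}
psi1, psi2 = within_ols(panel)
```

The distance between the data coefficients and their simulated counterparts, weighted appropriately, would then be minimized over Θ; that outer loop is omitted here.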
Table 8.4 reports Cooper and Haltiwanger’s results for four different models
along with standard errors. The first row shows the estimated parameters for the
most general model. The parameter vector Θ = [0.043, 0.00039, 0.967] implies the
presence of statistically significant (non-zero) convex and non-convex adjustment costs
and a relatively substantial transaction cost. Restricted versions of the
model are also reported for purposes of comparison. Clearly the mixed model does
better than any of the restricted models.
[Table 8.4 approximately here]
Cooper and Haltiwanger argue that these results are reasonable.133 First, as
noted above a low level for the convex cost of adjustment parameter is consistent
with the estimates obtained from the Q-theory based models due to the presence
of imperfect competition. Further, the estimation implies that the fixed cost of
adjustment is about 0.04% of average plant level profits. Cooper and Haltiwanger
find that this cost is significant relative to the difference between adjusting and not
adjusting the capital stock. So, in fact, the estimated fixed cost of adjustment, along
with the irreversibility, produces a large amount of inaction. Finally, the estimated
selling price of capital is much higher than the estimate reported in Ramey and Shapiro
(2001) for some plants in the aerospace industry.
Cooper and Haltiwanger (2000) also explore the aggregate implications of their
model. They contrast the time series behavior of the estimated model with both
convex and non-convex adjustment costs against one in which there are only convex
adjustment costs. Even though the model with only convex adjustment costs does
relatively poorly on the plant-level data, it does reasonably well in terms of matching
time series. In particular, Cooper and Haltiwanger (2000) find that over 90% of the
time series variation in investment created by a simulation of the estimated model
can be accounted for by a quadratic adjustment model. Of course, this also implies
that the quadratic model misses 10% of the variation.
Note too that this framework for aggregation captures the smoothing by ag-
gregating over heterogeneous plants but misses smoothing created by variations in
relative prices. From Thomas (2000) and Kahn and Thomas (2001) we know that
this additional source of smoothing can be quite powerful as well.
8.6.2 Maximum Likelihood Estimation
A final approach to estimation follows the approach in Rust (1987). Consider again,
for example, the stochastic machine replacement problem given by:
V (A, K, F ) = max{V i(A, K, F ), V a(A, K, F )} for all (A, K, F ) (8.23)
where:
$$V^i(A, K, F) = \Pi(A, K) + \beta E_{A'|A} V(A', K(1 - \delta), F')$$
and
$$V^a(A, K, F) = \max_{K'} \; \Pi(A, K)\lambda - FK - p(K' - (1 - \delta)K) + \beta E_{A'|A} V(A', K', F').$$
Here we have added the fixed cost of adjustment into the state vector as we assume
that the adjustment costs are random at the plant level. Let G(F ) represent the cu-
mulative distribution function for these adjustment costs. Assume that these are iid
shocks. Then, given a guess for the functions {V (A, K, F ), V i(A, K, F ), V a(A, K, F )},
the likelihood of inaction can be computed directly from the cumulative distribu-
tion function G(·). Thus a likelihood function can be constructed which depends
on the parameters of the distribution of adjustment costs and those underlying the
dynamic optimization problem. From there, a maximum likelihood estimate can be
obtained.134
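To see how the inaction probability follows from G(·), note that F enters V a only through the term −FK and that F′ is iid, so the gain from adjusting gross of the fixed cost does not depend on the current F. A plant adjusts when F is below the cutoff F∗ = gain/K. The sketch below builds a log likelihood on that basis. It is an illustration under stated assumptions: the `gain` inputs would come from solved value functions (not computed here), and the exponential form for G is purely hypothetical.

```python
import math


def exponential_cdf(mean):
    """Assumed distribution G for the fixed cost F (exponential with the given mean)."""
    return lambda f: 1.0 - math.exp(-f / mean) if f > 0 else 0.0


def prob_inaction(gain, capital, cdf):
    """Pr(no adjustment) = 1 - G(F*), where F* = gain / K leaves the plant indifferent."""
    return 1.0 - cdf(gain / capital)


def log_likelihood(observations, cdf):
    """observations: iterable of (adjusted, gain, capital), with adjusted in {0, 1}.

    Assumes gain > 0 for every observation with adjusted == 1; otherwise the
    model assigns that observation probability zero.
    """
    total = 0.0
    for adjusted, gain, capital in observations:
        p0 = prob_inaction(gain, capital, cdf)
        total += math.log(1.0 - p0) if adjusted else math.log(p0)
    return total


G = exponential_cdf(0.05)
sample = [(1, 0.08, 1.0), (0, 0.02, 0.9), (0, 0.01, 0.8), (1, 0.10, 0.5)]
ll = log_likelihood(sample, G)
```

In estimation, the value functions (and hence the gains) would be re-solved for each candidate parameter vector before evaluating the likelihood.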
8.7 Conclusions
The theme of this chapter has been the dynamics of capital accumulation. From
the plant-level perspective, the investment process is quite rich and entails periods
of intense activity followed by times of inaction. This has been documented at the
plant-level. Using the techniques of the estimation of dynamic programming models,
this chapter has presented evidence on the nature of adjustment costs.
Many open issues remain. First, the time series implications of non-convexities are
still not clear. How much does the lumpiness at the plant-level matter for aggregate
behavior? Put differently, how much smoothing obtains from aggregation across
heterogeneous plants as well as through variations in relative prices?
Second, there are a host of policy experiments to be considered. What, for exam-
ple, are the implications of investment tax credits given the estimates of adjustment
cost parameters?
Exercise 8.6
Add in variations in the price of new capital into the optimization problem given
in (8.21). How would you use this to study the impact of, say, an investment tax
credit?
Chapter 9
Dynamics of Employment Adjustment
9.1 Motivation
This chapter studies labor demand. The usual textbook model of labor demand
depicts a firm as choosing the number of workers and their hours given a wage rate.
But, the determination of wages, employment and hours is much more complex than
this. The key is to recognize that the adjustment of many factors of production,
including labor, is not costless. We study the dynamics of capital accumulation
elsewhere in this book and in this chapter focus attention on labor demand.
Understanding the nature of adjustment costs and thus the factors determining
labor demand is important for a number of reasons. First, many competing models
of the business cycle depend crucially on the operation of labor markets. As empha-
sized in Sargent (1978), a critical point in distinguishing competing theories of the
business cycle is whether labor market observations could plausibly be the outcome
of a dynamic market clearing model. Second, attempts to forecast macroeconomic
conditions often resort to consideration of observed movements in hours and em-
ployment to infer the state of economic activity. Finally, policy interventions in the
labor market are numerous and widespread. These include: restrictions on wages,
restrictions on hours, costs of firing workers and so forth. Policy evaluation requires
a model of labor demand.
We begin the chapter with the simplest models of dynamic labor demand where
adjustment costs are assumed to be convex and continuously differentiable. These
models are analytically tractable as we can often estimate their parameters directly
from first-order conditions. However, they have implications of constant adjustment
that are not consistent with microeconomic observations. Nickell (1978) argues:
“One point worth noting is that there seems little reason to suppose
costs per worker associated with either hiring or firing increase with the
rate at which employees flow in or out. Indeed, given the large fixed costs
associated with personnel and legal departments, it may even be more
reasonable to suppose that the average cost of adjusting the workforce
diminishes rather than increases with the speed of adjustment.”
This quote is supported by recent evidence in Hamermesh (1989) and Caballero
et al. (1997) that labor adjustment is rather erratic at the plant level with periods of
inactivity punctuated by large adjustments. Thus this chapter goes beyond the con-
vex case and considers models of adjustment which can mimic these microeconomic
facts.
9.2 General Model of Dynamic Labor Demand
In this chapter, we consider variants of the following dynamic programming problem:
$$V(A, e_{-1}) = \max_{h, e} \; R(A, e, h) - \omega(e, h, A) - C(e, e_{-1}) + \beta E_{A'|A} V(A', e) \tag{9.1}$$
for all (A, e−1). Here A represents a shock to the profitability of the plant and/or
firm. As in our discussion of the investment problem, this shock could reflect vari-
ations in product demands or variations in the productivity of inputs. Generally
A will have a component that is common across plants, denoted a, and one that is
plant specific, denoted ε.135 The function R(A, e, h) represents the revenues which
depend on the hours worked (h) and the number of workers (e) as well as the prof-
itability shock. Other factors of production, such as capital, are assumed to be
rented and optimization over these inputs is incorporated into R(A, e, h).136
The function ω(e, h, A) is the total cost of hiring e workers when each supplies
h units of labor time. This general specification allows for overtime pay and other
provisions. Assume that this compensation function is increasing in both of its
arguments and is convex with respect to hours. Further, we allow this compensation
function to be state dependent. This may reflect a covariance with the idiosyncratic
profitability shocks (due, perhaps, to profit sharing arrangements) or an exogenous
stochastic component in aggregate wages.
The function C(e, e−1) is the cost of adjusting the number of workers. Hamer-
mesh (1993) and Hamermesh and Pfann (1996) provide a lengthy discussion of var-
ious interpretations and motivations for adjustment costs. This function is meant
to cover costs associated with:
• search and recruiting
• training
• explicit firing costs
• variations in complementary activities (capital accumulation, reorganization
of production activities, etc.)
It is important to note the timing implicit in the statement of the optimization
problem. The state vector includes the stock of workers in the previous period,
e−1. In contrast to the capital accumulation problem, the number of workers in the
current period is not predetermined. Instead, workers hired in the current period
are immediately utilized in the production process: there is no "time to build".
The next section of the chapter is devoted to the study of adjustment cost
functions such that the marginal cost of adjustment is positive and increasing in e
given e−1. We then turn to more general adjustment cost functions which allow for
more nonlinear and discontinuous behavior.
9.3 Quadratic Adjustment Costs
Without putting additional structure on the problem, particularly the nature of
adjustment costs, it is difficult to say much about dynamic labor demand. As a
starting point, suppose that the cost of adjustment is given by
$$C(e, e_{-1}) = \frac{\eta}{2} \left( e - (1 - q) e_{-1} \right)^2 \tag{9.2}$$
so C(e, e−1) is convex in e and continuously differentiable. Here, q is an exogenous
quit rate.
In this specification of adjustment costs, the plant/firm incurs a cost of changing
the level of employment relative to the stock of workers ((1 − q)e−1) that remain
on the job from the previous period. Of course, this is a modelling choice: one can
also consider the case where the adjustment cost is based on net rather than gross
hires.137
The first-order conditions for (9.1) using (9.2) are:
Rh(A, e, h) = ωh(e, h, A) and (9.3)
Re(A, e, h) − ωe(e, h, A) − η(e − (1 − q)e−1) + βEVe(A′, e) = 0. (9.4)
Here the choice of hours, given in (9.3) is static: the firm weighs the gains to the
increasing labor input against the marginal cost (assumed to be increasing in hours)
of increasing hours.
In contrast, (9.4) is a dynamic relationship since the number of employees is a
state variable. Assuming that the value function is differentiable, EVe(A′, e) can be
evaluated using (9.1) leading to:
Re(A, e, h) − ωe(e, h, A) − η(e − (1 − q)e−1) + βE[η(e′ − (1 − q)e)(1 − q)] = 0 (9.5)
The solution to this problem will yield policy functions for hours and employment
given the state vector. Let e = φ(A, e−1) denote the employment policy function
and h = H(A, e−1) denote the hours policy function. These functions jointly satisfy
(9.3) and (9.5).
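The step from (9.4) to (9.5) rests on an envelope argument. As a sketch, write e′ for next period's employment choice; differentiating (9.1), dated one period ahead, with respect to the state e and using (9.2) gives:

```latex
V_e(A', e) = -\frac{\partial C(e', e)}{\partial e}
           = \eta\,(1 - q)\,\bigl(e' - (1 - q)\,e\bigr)
```

Substituting this expression into the term βEVe(A′, e) of (9.4) yields (9.5).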
As a benchmark, suppose there were no adjustment costs, η ≡ 0, and the com-
pensation function is given by:
ω(e, h, A) = eω̃(h).
Here compensation per worker depends only on hours worked. Further, suppose
that revenues depend on the product eh so that only total hours matters for the
production process. Specifically,
R(A, e, h) = AR̃(eh) (9.6)
with R̃(eh) strictly increasing and strictly concave.
In this special case, the two first-order conditions can be manipulated to imply
$$1 = h \, \frac{\tilde{\omega}'(h)}{\tilde{\omega}(h)}.$$
So, in the absence of adjustment costs and with the functional forms given above,
hours are independent of either e or A. Consequently, all variations in the labor
input arise from variations in the number of workers rather than hours. This is
efficient given that the marginal cost of hours is increasing in the number of hours
worked while there are no adjustment costs associated with varying the number of
workers.
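For completeness, the manipulation behind the hours condition in this benchmark can be sketched as follows. With η ≡ 0 the continuation value does not depend on e, so both choices are static. The first line is the first-order condition for hours, the second is the one for employment, and the third substitutes the first into the second:

```latex
\begin{aligned}
A\,\tilde{R}'(eh)\,e &= e\,\tilde{\omega}'(h), \\
A\,\tilde{R}'(eh)\,h &= \tilde{\omega}(h), \\
h\,\tilde{\omega}'(h) &= \tilde{\omega}(h).
\end{aligned}
```

The last line involves neither e nor A, which is why hours are constant in this case.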
At another extreme, suppose there are adjustment costs (η ≠ 0). Further, suppose
that compensation is simply
ω(e, h, A) = eh
so there are no costs to hours variation. In this case, (9.3) implies AR̃′(eh) = 1.
Using this, (9.5) is clearly satisfied at a constant level of e. Hence, the variation
in the labor input would be only in terms of hours and we would never observe
employment variations.
Of course, in the presence of adjustment costs and a strictly convex (in h) compensation
function, the plant/firm will optimally balance the costs of adjusting
hours against those of adjusting the labor force. This is empirically relevant since
in the data both employment and hours variation are observed. Note though that
it is only adjustment in the number of workers that contains a dynamic element.
The dynamic in hours is derived from the dynamic adjustment of employees.138 It
is this tradeoff between hours and worker adjustment that lies at the heart of the
optimization problem.
Given functional forms, these first-order conditions can be used in an estimation
routine which exploits the implied orthogonality conditions. Alternatively, a value
function iteration routine can be used to approximate the solution to (9.1) using
(9.2). We consider below some specifications.
A Simulated Example
Here we follow Cooper and Willis (2001) and study the policy functions gener-
ated by a quadratic adjustment cost model with some particular functional form
assumptions.139 Suppose output is a Cobb-Douglas function of total labor input
(eh) and capital and assume the firm has market power as a seller. In this case,
consider:
$$R(A, e, h) = A(eh)^{\alpha} \tag{9.7}$$
where α reflects labor’s share in the production function as well as the elasticity of
the demand curve faced by the firm.
Further, impose a compensation schedule that follows Bils (1987):
$$\omega(e, h) = w \cdot e \cdot \left[ w_0 + h + w_1 (h - 40) + w_2 (h - 40)^2 \right] \tag{9.8}$$
where w is the straight-time wage.
Instead of working with (9.5), Cooper and Willis (2001) solve the dynamic pro-
gramming problem, (9.1), with the above functional forms, using value function
iteration. The functional equation for the problem is:
$$V(A, e_{-1}) = \max_{h, e} \; A(eh)^{\alpha} - \omega(e, h) - \frac{\eta}{2} \frac{(e - e_{-1})^2}{e_{-1}} + \beta E_{A'|A} V(A', e) \tag{9.9}$$
for all (A, e−1).
In this analysis, decisions are assumed to be made at the plant level. Accordingly,
the profitability shocks are assumed to have two components: a piece that is common
across plants (an aggregate shock) and a piece that is plant specific. Both types of
shocks are assumed to follow first-order Markov processes. These are embedded in
the conditional expectation in (9.9).
In this formulation, the adjustment costs are paid on net changes in employment.
Further, the adjustment costs depend on the rate of adjustment rather than the
absolute change alone.140
The policy function that solves (9.9) is given by e = φ(A, e−1). This policy
function can be characterized given a parameterization of (9.9).
Cooper and Willis (2001) assume:
• Labor’s share is 0.65 and the markup is 25% so that α in (9.7) is 0.72.
• the compensation function uses the estimates of Bils (1987) and Shapiro
(1986): {w0, w1, w2} = {1.5, 0.19, 0.03} and the straight time wage, w, is
normalized to 0.05 for convenience. The elasticity of the wage with respect to
hours is close to 1 on average.
• the profitability shocks are represented by a first-order Markov process and
are decomposed into aggregate (A) and idiosyncratic components (ε). A ∈
{0.9, 1.1} and ε takes on 15 possible values. The serial correlation for the
plant-level shocks is 0.83 and is 0.8 for the aggregate shocks.141
This specification leaves open the parameterization of η in the cost of adjustment
function. In the literature, this is a key parameter to estimate.
The policy functions computed for two values of A at these parameter choices are
depicted in Figure 9.1. Here we have set η = 1 which is at the low end of estimates
in the literature. These policy functions have two important characteristics:
• φ(A, e−1) is increasing in (e−1).
• φ(A, e−1) is increasing in A: as profitability increases, so does the marginal
gain to adjustment and thus e is higher.
[Figure 9.1 approximately here]
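Policy functions of this kind can be computed with a short value function iteration. The sketch below is a simplified illustration rather than a replication of Cooper and Willis (2001): it keeps only the two-state aggregate shock (no idiosyncratic component), assumes β = 0.95, and uses coarse hypothetical grids for employment and hours. Because hours affect only current profits, they are chosen statically for each (A, e) before iterating.

```python
# Sketch of value function iteration for (9.9). Grids, BETA and the restriction
# to aggregate shocks only are simplifying assumptions.
ALPHA, BETA, ETA = 0.72, 0.95, 1.0
W, W0, W1, W2 = 0.05, 1.5, 0.19, 0.03
SHOCKS = [0.9, 1.1]
TRANS = [[0.9, 0.1], [0.1, 0.9]]                 # symmetric chain, serial correlation 0.8
E_GRID = [150.0 + 10.0 * i for i in range(46)]   # employment from 150 to 600
H_GRID = [30.0 + 1.0 * i for i in range(21)]     # hours from 30 to 50


def compensation(e, h):
    """Equation (9.8)."""
    return W * e * (W0 + h + W1 * (h - 40.0) + W2 * (h - 40.0) ** 2)


def flow_profit(a, e):
    """Revenue net of compensation at the best hours on H_GRID."""
    return max(a * (e * h) ** ALPHA - compensation(e, h) for h in H_GRID)


def solve(tol=1e-6, max_iter=2000):
    """Return (V, policy) where policy[a][i] is the chosen e given E_GRID[i] = e_lag."""
    n_a, n_e = len(SHOCKS), len(E_GRID)
    pi = [[flow_profit(SHOCKS[a], e) for e in E_GRID] for a in range(n_a)]
    # cost[i][k]: adjustment cost of moving from E_GRID[i] to E_GRID[k].
    cost = [[0.5 * ETA * (e - e_lag) ** 2 / e_lag for e in E_GRID]
            for e_lag in E_GRID]
    v = [[0.0] * n_e for _ in range(n_a)]
    policy = [[0] * n_e for _ in range(n_a)]
    for _ in range(max_iter):
        v_new = [[0.0] * n_e for _ in range(n_a)]
        gap = 0.0
        for a in range(n_a):
            # Current profit plus discounted expected value for each candidate e.
            cont = [pi[a][k] + BETA * sum(TRANS[a][b] * v[b][k] for b in range(n_a))
                    for k in range(n_e)]
            for i in range(n_e):
                best_k = max(range(n_e), key=lambda k: cont[k] - cost[i][k])
                policy[a][i] = best_k
                v_new[a][i] = cont[best_k] - cost[i][best_k]
                gap = max(gap, abs(v_new[a][i] - v[a][i]))
        v = v_new
        if gap < tol:
            break
    return v, [[E_GRID[k] for k in row] for row in policy]


V, POLICY = solve()
```

`POLICY[0]` and `POLICY[1]` give φ(A, e−1) on the grid for the low and high shock. A finer grid, or interpolation between grid points, would be needed for results that are quantitatively comparable to the figure.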
The quadratic adjustment cost model can be estimated either from plant (firm)
data or aggregate data. To illustrate this, we next discuss the approach of Sargent
(1978). We then discuss a more general approach to estimation in a model with a
richer specification of adjustment costs.
Exercise 9.1
Write down the necessary conditions for the optimal choices of hours and em-
ployment in (9.9). Provide an interpretation of these conditions.
Sargent: Linear Quadratic Specification
A leading example of bringing the quadratic adjustment cost model directly to the
data is Sargent (1978). In that application, Sargent assumes there are two types of
labor input: straight-time and overtime workers. The production function is linear-
quadratic in each of the two inputs and the costs of adjustment are quadratic and
separable across the types of labor. As the two types of labor inputs do not interact
in either the production function or the adjustment cost function, we will focus on
the model of straight-time employment in isolation. Following Sargent, assume that
revenue from straight-time employment is given by:
$$R(A, e) = (R_0 + A)e - (R_1/2)e^2 \tag{9.10}$$
Here A is a productivity shock and follows an AR(1) process. Sargent does not
include hours variation in his model except through the use of overtime labor. Ac-
cordingly, there is no direct dependence of the wage bill on hours. Instead he assumes
that the wage rate follows an exogenous (with respect to employment) process given by:
$$w_t = \nu_0 + \sum_{i=1}^{n} \nu_i w_{t-i} + \zeta_t. \tag{9.11}$$
In principle, the innovation to wages can be correlated with the shocks to revenues.142
With this structure, the firm’s first-order condition with respect to employment
is given by:
$$\beta E_t e_{t+1} - e_t \left( \frac{R_1}{\eta} + (1 + \beta) \right) + e_{t-1} = \frac{1}{\eta} (w_t - R_0 - A_t) \tag{9.12}$$
From this Euler equation, current employment will depend on the lagged level of
employment (through the cost of adjustment) and on (expected) future values of
the stochastic variables, productivity and wages, as these variables influence the
future level of employment. As described by Sargent, the solution to this Euler
equation can be obtained so that employment in a given period depends on lagged
employment, current and (conditional expectations of) future wages and current and
(conditional expectations of) future productivity shocks. Given the driving process
for wages and productivity shocks, these conditional expectations can be evaluated
so that employment in period t is solely a function of lagged employment, current
and past wages. The past wages are relevant for predicting future wages.
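The solution procedure can be illustrated in a special case. Factoring the second-order difference equation (9.12) gives one root inside the unit circle, which governs the dependence on lagged employment, and one outside, which is solved forward. The sketch below assumes, for simplicity only, that the wage is constant and that A follows an AR(1) with E_t A_{t+j} = ρ^j A_t; Sargent's wage process (9.11) is richer. The function names and the numerical values are hypothetical.

```python
import math


def stable_root(beta, eta, r1):
    """Smaller root of beta*x**2 - (r1/eta + 1 + beta)*x + 1 = 0; it lies in (0, 1)."""
    phi = r1 / eta + 1.0 + beta
    return (phi - math.sqrt(phi * phi - 4.0 * beta)) / (2.0 * beta)


def employment_rule(e_lag, shock, wage, beta, eta, r0, r1, rho):
    """Decision rule implied by (9.12) under a constant wage and AR(1) productivity."""
    lam = stable_root(beta, eta, r1)
    # Discounted sum of expected future (R0 + A - w), with discount factor beta * lam.
    forcing = (r0 - wage) / (1.0 - beta * lam) + shock / (1.0 - beta * lam * rho)
    return lam * e_lag + (lam / eta) * forcing


# Hypothetical parameter values, used only to check the rule against (9.12).
beta, eta, r0, r1, rho, wage = 0.95, 2.0, 10.0, 0.5, 0.7, 4.0
e_lag, shock = 8.0, 0.3
e_now = employment_rule(e_lag, shock, wage, beta, eta, r0, r1, rho)
# The rule is linear in the shock, so E_t e_{t+1} is the rule evaluated at rho * shock.
e_next = employment_rule(e_now, rho * shock, wage, beta, eta, r0, r1, rho)
residual = (beta * e_next - e_now * (r1 / eta + 1.0 + beta) + e_lag
            - (wage - r0 - shock) / eta)
```

The residual of the Euler equation is zero up to rounding, and with A = 0 the rule has the steady state e = (R0 − w)/R1.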
Sargent estimates the resulting VAR model of wages and employment using maximum
likelihood techniques.143 The parameters he estimated included (R1, η, ρ)
where ρ is the serial correlation of the productivity shocks. In addition, Sargent
estimated the parameters of the wage process.
The model is estimated using quarterly data on total US civilian employment.
Interestingly, he also decides to use seasonally unadjusted data for some of the
estimation, arguing that, in effect, there is no reason to separate the responses to
seasonal and nonseasonal variations. The data are detrended to correspond to the
stationarity of the model.
He finds evidence of adjustment costs insofar as η is significantly different from
zero.144 Sargent [pg. 1041] argues that these results "..are moderately comforting to
the view that the employment-real-wage observations lie along a demand schedule
for employment".
Exercise 9.2
There are a number of exercises to consider working from this simple model.
1. Write a program to solve (9.9) for the employment and hours policy functions
using value function iteration. What are the properties of these policy functions?
How do these functions change as you vary the elasticity of the compensation func-
tion and the cost of adjustment parameter?
2. Solve (9.9) using a log-linearization technique. Compare your results with
those obtained by the value function iteration approach.
3. Consider some moments such as the relative variability of hours and employ-
ment and the serial correlations of these two variables. Calculate these moments
from a simulated panel and also from a time series constructed from the panel.
Look for studies that characterize these moments at the micro and/or aggregate lev-
els. Or, better yet, calculate them yourself. Construct an estimation exercise using
these moments.
4. Suppose that you wanted to estimate the parameters of (9.9) using GMM.
How would you proceed?
9.4 Richer Models of Adjustment
In part, the popularity of the quadratic adjustment cost structure reflects its tractability.
But, the implications of these models conflict with evidence of inactivity and
bursts at the plant level. Thus researchers have been motivated to consider a richer
set of models. Those are studied here and then are used for estimation purposes be-
low. For these models of adjustment, we discuss the dynamic optimization problem
and present policy functions.
9.4.1 Piecewise Linear Adjustment Costs
One of the criticisms of the quadratic adjustment cost specification is the implication
of continuous adjustment. At the plant-level, as mentioned earlier, there is evidence
that adjustment is much more erratic than the pattern implied by the quadratic
model. Piecewise linear adjustment costs can produce inaction.
For this case, the cost of adjustment function is:
$$C(e, e_{-1}) = \begin{cases} \gamma^{+} \Delta e & \text{if } \Delta e > 0 \\ \gamma^{-} \Delta e & \text{if } \Delta e < 0 \end{cases} \tag{9.13}$$
The optimal policy rules are then determined by solving (9.1) using this specification
of the adjustment cost function.
The optimal policy rule will look quite different from the one produced with
quadratic adjustment costs. This difference is a consequence of the lack of differen-
tiability in the neighborhood of zero adjustment. Consequently, small adjustments
will not occur since the marginal cost of adjustment does not go to zero as the size of
the adjustment goes to zero. Further, this specification of adjustment costs implies
there is no partial adjustment. Since the marginal cost of adjustment is constant,
there is no basis for smoothing adjustment.
The optimal policy is characterized by two boundaries: e−(A) and e+(A). If
e−1 ∈ [e−(A), e+(A)], then there is no adjustment. In the event of adjustment, the
optimal adjustment is to e−(A) if e−1 < e−(A) and is to e+(A) if e−1 > e+(A).
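The band policy just described can be written in a few lines. This is only an illustration of the rule: the boundary values used in the example are made up, whereas in the model they come from solving (9.1) with (9.13) for each value of A.

```python
def employment_policy(e_lag, lower, upper):
    """Inaction band: adjust to a boundary only when e_lag lies outside [lower, upper].

    lower and upper stand for the boundaries e^-(A) and e^+(A) at the current A.
    """
    if e_lag < lower:
        return lower      # hire up to the lower boundary
    if e_lag > upper:
        return upper      # shed workers down to the upper boundary
    return e_lag          # inaction: stay on the 45 degree line


# Hypothetical boundaries for two values of the shock.
BOUNDS = {"low": (200.0, 260.0), "high": (380.0, 470.0)}
choices = [employment_policy(e, *BOUNDS["low"]) for e in (150.0, 230.0, 300.0)]
```

There is no partial adjustment in this rule: a plant outside the band moves all the way to the nearest boundary in one step.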
Following Cooper and Willis (2001) and using the same basic parameters as
described above, we can study the optimal policy function for this type of adjustment
cost. Assume that γ+ = γ− = .05 which produces inaction at the plant level in 23%
of the observations.145 Then (9.1) along with (9.13) can be solved using value
function iteration and the resulting policy functions evaluated.
These are shown in Figure 9.2. Note that there is no adjustment for values of
e−1 in an interval: the employment policy function coincides with the 45 degree
line. Outside of that interval there are two targets: e−(A) and e+(A). Again,
this policy function is indexed by the values of γ+ and γ−, so these parameters
can be estimated by matching the implications of the model against observations
of employment adjustment at the plant and/or aggregate levels. We will return to
this point below.
[Figure 9.2 approximately here]
Exercise 9.3
Specify the dynamic programming problem for labor adjustment using a piece-wise
linear adjustment cost structure. What determines the region of inaction? Study this
model numerically by solving the dynamic programming problem and obtaining policy
functions.
9.4.2 Non-Convex Adjustment Costs
The observations of inactivity at the plant level that motivate the piecewise linear
specification are also used to motivate consideration of fixed costs in the adjustment
process. As noted by Hamermesh and Pfann (1996) the annual recruiting activities
of economics departments provide a familiar example of the role of fixed costs. In
the US, hiring requires the posting of an advertisement of vacancies, the extensive
review of material provided by candidates, the travel of a recruiting team to a
convention site, interviews of leading candidates, university visits and finally a vote
to select among the candidates. Clearly there are fixed cost components to many of
these activities that comprise the hiring of new employees. 146
As a formal model of this, consider:
$$V(A, e_{-1}) = \max\left[V^a(A, e_{-1}),\; V^n(A, e_{-1})\right] \tag{9.14}$$
for all (A, e−1) where V a (A, e−1) represents the value of adjusting employment and
V n (A, e−1) represents the value of not adjusting employment. These are given by
$$V^a(A, e_{-1}) = \max_{h,e}\; R(A, e, h) - \omega(e, h) - F + \beta E_{A'|A} V(A', e) \tag{9.15}$$
$$V^n(A, e_{-1}) = \max_{h}\; R(A, e_{-1}, h) - \omega(e_{-1}, h) + \beta E_{A'|A} V(A', e_{-1}). \tag{9.16}$$
So, in this specification, the firm can either adjust the number of employees or
not. These two options are labelled action (V a (A, e−1)) and inaction (V n (A, e−1)).
In either case, hours are assumed to be freely adjusted and thus will respond to
variations in profitability even if there is no adjustment in the number of workers.
Note too that this specification assumes adjustment costs depend on gross changes
in the number of workers. In this way the model can potentially match the inaction
in employment adjustment at the plant level defined by zero changes in the number
of workers.
The optimal policy has three dimensions. First, there is the choice of whether
to adjust or not. Let z(A, e−1) ∈ {0, 1} indicate this choice where z(A, e−1) = 1
if and only if there is adjustment. Second, there is the choice of employment in
the event of adjustment. Let φ(A, e−1) denote that choice where φ(A, e−1) = e−1 if
z(A, e−1) = 0. Finally, there is the choice of hours, h(A, e−1), which will reflect the
decision of the firm whether or not to adjust employment. As these employment
adjustments depend on (A, e−1) through e = φ(A, e−1), one can always consider
hours to be a function of the state vector alone.
There are some rich trade-offs between hours and employment variations imbed-
ded in this model. Suppose that there is a positive shock to profitability: A rises. If
this variation is large and permanent, then the optimal response of the firm will be
to adjust employment. Hours will vary only slightly. If the shock to profitability is
not large or permanent enough to trigger adjustment, then by definition employment
will remain fixed. In that case, the main variation will be in worker hours.
These variations in hours and employment are shown in Figure 9.3. The policy
functions underlying this figure were created using baseline parameters with fixed
costs at .1 of the steady state profits.147
[Figure 9.3 approximately here]
Exercise 9.4
Specify the dynamic programming problem for labor adjustment using a non-
convex adjustment cost structure. What determines the frequency of inaction? What
comovement of hours and employment is predicted by the model? What features
of the policy functions distinguish this model from the one with piece-wise linear
adjustment costs? Study this model numerically by solving the dynamic programming
problem and obtaining policy functions.
9.4.3 Asymmetries
As discussed in Hamermesh and Pfann (1996), there is certainly evidence in favor of
asymmetries in the adjustment costs. For example, there may be a cost of advertising
and evaluation that is proportional to the number of workers hired but no costs of
firing workers. Alternatively, it may be of interest to evaluate the effects of firing
costs on hiring policies as discussed in the context of some European economies.
It is relatively straightforward to introduce asymmetries into the model. Given
the approach to obtaining policy functions by solving (9.1) through a value function
iteration routine, asymmetries do not present any additional difficulties. As with
the other parameterizations of adjustment costs, these models can be estimated using
a variety of techniques. Pfann and Palm (1993) provide a nice example of this
approach. They specify an adjustment cost function of:
$$C(e, e_{-1}) = -1 + e^{\gamma \Delta e} - \gamma \Delta e + \frac{1}{2}\eta(\Delta e)^2 \tag{9.17}$$
where ∆e ≡ (e − e−1). If γ ≡ 0, then this reduces to (9.2) with q = 0.
As Pfann and Palm (1993) illustrate, the asymmetry in adjustment costs is
controlled by γ. For example, if γ < 0, then firing costs exceed hiring costs.
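To see the asymmetry concretely, one can evaluate (9.17) at a hire and a fire of equal size. The sketch below does this; the function name and the parameter values are assumptions chosen only for illustration and are not the Pfann and Palm estimates.

```python
import math


def pfann_palm_cost(delta_e, gamma, eta):
    """Adjustment cost (9.17) as a function of the employment change delta_e."""
    return (-1.0 + math.exp(gamma * delta_e) - gamma * delta_e
            + 0.5 * eta * delta_e ** 2)


gamma, eta = -0.5, 1.0                         # illustrative values only
hire = pfann_palm_cost(+1.0, gamma, eta)       # cost of raising employment by 1
fire = pfann_palm_cost(-1.0, gamma, eta)       # cost of cutting employment by 1
print(hire, fire)                              # with gamma < 0, fire exceeds hire

# With gamma = 0 only the symmetric quadratic term remains
print(pfann_palm_cost(+1.0, 0.0, eta), pfann_palm_cost(-1.0, 0.0, eta))
```

The asymmetry comes entirely from the exponential term: exp(x) − 1 − x is larger for positive x than for negative x of the same magnitude, and with γ < 0 the argument γ∆e is positive when workers are fired.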
Using this model of adjustment costs, Pfann and Palm (1993) estimate parame-
ters using a GMM approach on data for manufacturing in the Netherlands (quarterly,
seasonally unadjusted data, 1971(I)-1984(IV)) and annual data for U.K. manufac-
turing. They have data on both production and nonproduction workers and the
employment choices are interdependent through the production function.
For both countries they find evidence of the standard quadratic adjustment cost
model: η is positive and significantly different from zero for both types of workers.
Moreover, there is evidence of asymmetry. They report that the costs of firing
production workers are lower than the hiring costs. But, the opposite is true for the
non-production workers.
9.5 The Gap Approach
The work in Caballero and Engel (1993b) and Caballero et al. (1997) pursues an
alternative approach to studying dynamic labor adjustment. Instead of solving an
explicit dynamic optimization problem, they postulate that labor adjustment will
respond to a gap between the actual and desired employment level at a plant. They
then test for nonlinearities in this relationship.
The theme of creating an employment target to define an employment gap as
a proxy for the current state is quite intuitive and powerful. As noted in our dis-
cussion of non-convex adjustment costs, when a firm is hit by a profitability shock,
a gap naturally emerges between the current level of employment and the level of
employment the firm would choose if there were no costs of adjustment. This gap
should then be a good proxy for the gains to adjustment. These gains, of course, are
then compared to the costs of adjustment which depend on the specification of the
adjustment cost function. This section reviews some attempts to study the nature
of adjustment costs using this approach.148
The power of this approach is the simplification of the dynamic optimization
problem as the target level of employment summarizes the current state. However,
as we shall see, these gains may be difficult to realize. The problem arises from the
fact that the target level of employment and thus the gap is not observable.
To understand this approach, it is useful to begin with a discussion of the par-
tial adjustment model. We then return to evidence on adjustment costs from this
approach.
9.5.1 Partial Adjustment Model
Researchers often specify a partial adjustment model in which the firm is assumed
to adjust the level of employment to a target.149 The assumed model of labor
adjustment would be:
$$e_t - e_{t-1} = \lambda(e^* - e_{t-1}). \tag{9.18}$$
So here the change in employment et − et−1 is proportional to the difference between
the previous level of employment and a target, e∗, where λ parameterizes how quickly
the gap is closed.
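A short simulation makes the role of λ visible. This is a sketch with a constant target and made-up numbers; the function name is ours.

```python
def partial_adjustment_path(e0, target, lam, periods):
    """Iterate (9.18): e_t = e_{t-1} + lam * (target - e_{t-1})."""
    path = [e0]
    for _ in range(periods):
        e_prev = path[-1]
        path.append(e_prev + lam * (target - e_prev))
    return path


# Illustrative values: start at 0, constant target of 1, lambda = 0.5
path = partial_adjustment_path(e0=0.0, target=1.0, lam=0.5, periods=3)
print(path)   # the remaining gap shrinks by the factor (1 - lam) each period
```

With a constant target, the gap after t periods is (1 − λ)^t times the initial gap, so a larger λ means faster convergence.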
Where does this partial adjustment structure come from? What does the target
represent?
Cooper and Willis (2001) consider a dynamic programming problem given by:
$$\pounds(e^*, e_{-1}) = \min_{e}\; \frac{(e - e^*)^2}{2} + \frac{\kappa}{2}(e - e_{-1})^2 + \beta E_{e^{*\prime}|e^*}\, \pounds(e^{*\prime}, e). \tag{9.19}$$
where the loss depends on the gap between the current stock of workers (e) and the
target (e∗). The target is taken as an exogenous process though in general it reflects
the underlying shocks to profitability that are explicit in the optimizing model. In
particular, suppose that e∗ follows an AR(1) process with serial correlation of ρ.
Further, assume that there are quadratic adjustment costs, parameterized by κ.
The first-order condition to the optimization problem is:
$$(e - e^*) + \kappa(e - e_{-1}) - \beta\kappa E(e' - e) = 0 \tag{9.20}$$
where the last term was obtained from using (9.19) to solve for ∂£/∂e. Given
that the problem is quadratic, it is natural to conjecture a policy function in which
the control variable (e) is linearly related to the two elements of the state vector
(e∗, e−1).
$$e = \lambda_1 e^* + \lambda_2 e_{-1}. \tag{9.21}$$
Using this conjecture in (9.20) and taking expectations of the future value of e∗
yields:
$$(e - e^*) + \kappa(e - e_{-1}) - \beta\kappa\left(\lambda_1 \rho e^* + (\lambda_2 - 1)e\right) = 0. \tag{9.22}$$
This can be used to solve for e as a linear function of (e∗, e−1) with coefficients
given by:
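$$\lambda_1 = \frac{1 + \beta\kappa\lambda_1\rho}{1 + \kappa - \beta\kappa(\lambda_2 - 1)} \tag{9.23}$$
and
$$\lambda_2 = \frac{\kappa}{1 + \kappa - \beta\kappa(\lambda_2 - 1)}. \tag{9.24}$$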
Clearly, if the shocks follow a random walk (ρ = 1), then partial adjustment is
optimal (λ1 + λ2 = 1). Otherwise, the optimal policy created by minimization of
the quadratic loss is linear but does not dictate partial adjustment.
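Since λ1 and λ2 appear on both sides of (9.23) and (9.24), it may help to solve for them explicitly. Equation (9.24) is a quadratic in λ2 and, given λ2, (9.23) is linear in λ1. The sketch below does this and checks the random walk claim numerically; the function name and the parameter values are illustrative assumptions.

```python
import math


def policy_coefficients(beta, kappa, rho):
    """Solve (9.23)-(9.24) for (lambda1, lambda2); requires beta*kappa > 0."""
    # (9.24) rearranged:
    #   beta*kappa*l2**2 - (1 + kappa + beta*kappa)*l2 + kappa = 0
    a = beta * kappa
    b = -(1.0 + kappa + beta * kappa)
    c = kappa
    # The smaller root lies in (0, 1); the other root exceeds one.
    lam2 = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    denom = 1.0 + kappa - beta * kappa * (lam2 - 1.0)
    # (9.23) rearranged: lam1 * (denom - beta*kappa*rho) = 1
    lam1 = 1.0 / (denom - beta * kappa * rho)
    return lam1, lam2


for rho in (1.0, 0.5):                       # illustrative values
    lam1, lam2 = policy_coefficients(beta=0.95, kappa=2.0, rho=rho)
    print(rho, lam1, lam2, lam1 + lam2)      # the sum equals one only when rho = 1
```

For ρ < 1 the coefficient on the target falls while λ2 is unchanged, so the two coefficients sum to less than one and the rule is no longer of the partial adjustment form (9.18).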
9.5.2 Measuring the Target and the Gap
Taking this type of model directly to the data is problematic as the target e∗ is not
observable. In the literature (see, for example, the discussion in Caballero and Engel
(1993b)) the target is meant to represent the destination of the adjustment
process. There are two representations of the target.
One, termed a static target, treats e∗ as the solution of a static optimization
problem, as if adjustment costs did not exist. Thus, e∗ solves (9.5) with η ≡ 0 and
hours set optimally.
A second approach treats e∗ as the level of employment the firm would choose
if there were no adjustment costs for a single period. This is termed the frictionless
target. This level of employment solves e = φ(A, e) where φ(A, e−1) is the policy
function for employment for the quadratic adjustment cost model. Thus the target
is the level of employment where the policy function, contingent on the profitability
shock, crosses the 45 degree line, as in Figure 9.1.
Following Caballero et al. (1997) define the gap as the difference between desired
(e∗i,t) and actual employment levels (in logs):
$$\tilde{z}_{i,t} \equiv e^*_{i,t} - e_{i,t-1}. \tag{9.25}$$
Here ei,t−1 is the number of workers inherited from the previous period. So z̃i,t
measures the gap between the desired and actual levels of employment in period
t prior to any adjustments but after any relevant period t random variables are
realized as these shocks are embedded in the target and thus the gap.
The policy function for the firm is assumed to be:150
$$\Delta e_{i,t} = \phi(\tilde{z}_{i,t}). \tag{9.26}$$
The key of the empirical work is to estimate the function φ(·).
Unfortunately, estimation of (9.26) is not feasible as the target and thus the gap
are not observable. So, the basic theory must be augmented with a technique to
measure the gap. There are two approaches in the literature corresponding to the
two notions of a target level of employment, described earlier.
Caballero et al. (1997) pursue the theme of a frictionless target. To implement
this, they postulate a second relationship between another (closely related) measure
of the gap, (z̃1i,t), and plant specific deviations in hours:
$$\tilde{z}^1_{i,t} = \theta(h_{i,t} - \bar{h}). \tag{9.27}$$
Here z̃1i,t is the gap in period t after adjustments in the level of e have been made:
z̃1i,t = z̃i,t − ∆ei,t.
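The two gap measures can be backed out from observables once θ is known. The sketch below shows the arithmetic implied by (9.27) and the identity relating the two gaps. The function name and the plant-level numbers are hypothetical; θ = 1.26 is the estimate reported by Caballero et al. (1997).

```python
def gap_measures(hours, mean_hours, delta_e, theta):
    """Return (pre-adjustment gap, post-adjustment gap) for one plant-period.

    post gap: z1 = theta * (h - h_bar)      from (9.27)
    pre gap:  z  = z1 + delta_e             from z1 = z - delta_e
    """
    post_gap = theta * (hours - mean_hours)
    pre_gap = post_gap + delta_e
    return pre_gap, post_gap


# Hypothetical plant: hours 0.1 above average, employment growth of 0.05 (logs)
pre, post = gap_measures(hours=0.1, mean_hours=0.0, delta_e=0.05, theta=1.26)
print(pre, post)
```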
The argument in favor of this approach again returns to our discussion of the
choice between employment and hours variation in the presence of adjustment costs.
In that case we saw that the firm chose between these two forms of increasing output
when profitability rose. Thus, if hours are measured to be above average, this will
reflect a gap between actual and desired workers. If there was no cost of adjustment,
the firm would choose to hire more workers. But, in the presence of these costs the
firm maintains a positive gap and hours worked are above average.
The key to (9.27) is θ. Since the left side of (9.27) is also not observable, the
analysis is further amended to generate an estimate of θ. Caballero et al. (1997)
estimate θ from:
$$\Delta e_{i,t} = \alpha - \theta \Delta h_{i,t} + \varepsilon_{i,t} \tag{9.28}$$
where the error term includes unobserved changes in the target level of employment
(∆e∗i,t) as well as measurement error. Caballero et al. (1997) note that the equation
may have omitted variable bias as the change in the target may be correlated with
changes in hours. From the discussion in Cooper and Willis (2001), this omitted
variable bias can be quite important.
Once θ is estimated, Caballero et al. (1997) can construct plant specific gap
measures using observed hours variations. In principle, the model of employment
adjustment using these gap measures can be estimated from plant level data. In-
stead, Caballero et al. (1997) focus on the aggregate time series implications of their
model. In particular, the growth rate of aggregate employment is given by:
$$\Delta E_t = \int_z z\,\Phi(z) f_t(z)\,dz \tag{9.29}$$
where Φ(z) is the adjustment rate or hazard function characterizing the fraction
of the gap that is closed by employment adjustment. From aggregate data, this
expression can be used to estimate Φ(z). As discussed in Caballero et al. (1997),
if Φ(z) is say a quadratic, then (9.29) can be expanded implying that employment
growth will depend on the first and third moments of the cross sectional distribution
of gaps.
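To make the expansion explicit, suppose, purely for illustration, that the hazard is quadratic with no linear term, $\Phi(z) = \lambda_0 + \lambda_2 z^2$. This functional form is an assumption made here and need not be the exact specification used in the papers. Substituting into (9.29) gives

$$\Delta E_t = \int_z z\left(\lambda_0 + \lambda_2 z^2\right) f_t(z)\,dz = \lambda_0\, m^{(1)}_t + \lambda_2\, m^{(3)}_t, \qquad m^{(k)}_t \equiv \int_z z^k f_t(z)\,dz,$$

so that employment growth depends on the first and third moments of the cross sectional distribution of gaps. With a flat hazard ($\lambda_2 = 0$) only the mean gap matters. If the hazard also contained a linear term $\lambda_1 z$, the same calculation would add $\lambda_1 m^{(2)}_t$, bringing in the second moment.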
The findings of Caballero et al. (1997) can be summarized as:
• using (9.28), θ is estimated at 1.26.
• the relationship between the average adjustment rate and the gap is nonlinear.
• they find some evidence of inaction in employment adjustment.
• aggregate employment growth depends on the second moment of the distribu-
tion of employment gaps.
In contrast, Caballero and Engel (1993b) do not estimate θ. Instead they cali-
brate it from a structural model of static optimization by a firm with market power.
In doing so, they are adopting a target that ignores the dynamics of adjustment.
From their perspective, the gap is defined using (9.25) where e∗i,t corresponds to the
solution of a static optimization problem over both hours and employment with-
out any adjustment costs. They argue that this static target will approximate the
frictionless target quite well if shocks are random walks. As with Caballero et al.
(1997), once the target is determined a measure of the gap can be created.
This approach to approximating the dynamic optimization problem is applied
extensively because it is so easy to characterize. Further, it is a natural extension
of the partial adjustment model. But as argued in Cooper and Willis (2001) the
approach may place excessive emphasis on static optimization.151
Caballero and Engel (1993b) estimate their model using aggregate observations
on net and gross flows for US manufacturing employment. They find that a quadratic
hazard specification fits the aggregate data better than the flat hazard specification.
The key point in both of these papers is the rejection of the flat hazard model.
Both Caballero et al. (1997) and Caballero and Engel (1993b) argue that the es-
timates of the hazard function from aggregate data imply that the cross sectional
distribution “matters” for aggregate dynamics. Put differently, both studies reject
a flat hazard specification in which a constant fraction of the gap is closed each
period.
Given that this evidence is obtained from time series, this implies that the non-
convexities at the plant-level have aggregate implications. This is an important
finding in terms of the way macroeconomists build models of labor adjustment.
To the extent that the flat hazard model is the outcome of a quadratic adjustment
cost model, both papers reject that specification in favor of a model that generates
some nonlinearities in the adjustment process. But, as these papers do not consider
explicit models of adjustment, one cannot infer from these results anything about
the underlying adjustment cost structure.
Further, as argued by Cooper and Willis (2001), the methodology of these studies
may itself induce the nonlinear relationship between employment adjustment and
the gap. Cooper and Willis (2001) construct a model economy with quadratic
adjustment costs. They assume that shocks follow a first-order Markov process, with
serial correlation less than unity.152 They find that using either the Caballero et al.
(1997) or Caballero and Engel (1993b) measurements of the gap, the cross sectional
distribution of employment gaps may be significant in a time series regression of
employment growth.
9.6 Estimation of a Rich Model of Adjustment Costs
Thus far we have discussed some evidence associated with the quadratic adjustment
cost models and provided some insights into the optimal policy functions from more
complex adjustment cost models. In this section we go a step further and discuss
attempts to evaluate models that may have both convex and non-convex adjustment
costs.
As with other dynamic optimization problems studied in this book, there is, of
course, a direct way to estimate the parameters of labor adjustment costs. This
requires the specification of a model of adjustment that nests the variety of special
cases described above along with a technique to estimate the parameters. In this
subsection, we outline this approach.153
Letting A represent the profitability of a production unit (e.g. a plant), we
consider the following dynamic programming problem:
$$V(A, e_{-1}) = \max_{h,e}\; R(A, e, h) - \omega(e, h, A) - C(A, e_{-1}, e) + \beta E_{A'|A} V(A', e). \tag{9.30}$$
As above, let,
$$R(A, e, h) = A(eh)^{\alpha} \tag{9.31}$$
where the parameter α is again determined by the shares of capital and labor in the
production function as well as the elasticity of demand.
The function ω(e, h, A) represents total compensation to workers as a function of
the number of workers and their average hours. As before, this compensation func-
tion could be taken from other studies or perhaps a constant elasticity formulation
might be adequate: $w = w_0 + w_1 h^{\zeta}$.
The costs of adjustment function nests quadratic and non-convex adjustment
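costs of changing employment:
$$C(A, e_{-1}, e) = \begin{cases} F^H + \dfrac{\nu}{2}\left(\dfrac{e - e_{-1}}{e_{-1}}\right)^2 e_{-1}, & \text{if } e > e_{-1} \\[2ex] F^F + \dfrac{\nu}{2}\left(\dfrac{e - e_{-1}}{e_{-1}}\right)^2 e_{-1}, & \text{if } e < e_{-1} \end{cases} \tag{9.32}$$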
where F H and F F represent the respective fixed costs of hiring and firing workers.
Note that quadratic adjustment costs are based upon net not gross hires. In (9.32),
ν parameterizes the level of the adjustment cost function. This adjustment cost
function yields the following dynamic optimization problem
$$V(A, e_{-1}) = \max\left\{V^H(A, e_{-1}),\; V^F(A, e_{-1}),\; V^N(A, e_{-1})\right\} \tag{9.33}$$
for all (A, e−1) where N refers to the choice of no adjustment of employment. These
options are defined by:
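$$V^H(A, e_{-1}) = \max_{h,e}\; R(A, e, h) - \omega(e, h, A) - F^H - \frac{\nu}{2}\left(\frac{e - e_{-1}}{e_{-1}}\right)^2 e_{-1} + \beta E_{A'|A} V(A', e) \quad \text{if } e > e_{-1}$$
$$V^F(A, e_{-1}) = \max_{h,e}\; R(A, e, h) - \omega(e, h, A) - F^F - \frac{\nu}{2}\left(\frac{e - e_{-1}}{e_{-1}}\right)^2 e_{-1} + \beta E_{A'|A} V(A', e) \quad \text{if } e < e_{-1}$$
$$V^N(A, e_{-1}) = \max_{h}\; R(A, e_{-1}, h) - \omega(e_{-1}, h, A) + \beta E_{A'|A} V(A', e_{-1})$$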
This problem looks formidable. It contains both an extensive (adjustment or
no adjustment) as well as an intensive (the choice of e, given adjustment) margin.
Further, there is no simple Euler equation to study given the non-convex adjustment
costs.154
But, given the methodology of this book, attacking a problem like this is feasible.
In fact, one could build additional features into this model, such as allowing for a
piece-wise linear adjustment cost structure.155
From our previous discussion, we know that “solving” a model with this com-
plexity is relatively straightforward. Let Θ represent the vector of parameters nec-
essary to solve the model.156 Then, for a given value of this vector, a value function
iteration procedure will generate a solution to (9.30).
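As a rough illustration of such a procedure, the sketch below runs value function iteration on a discretized version of (9.30) with the adjustment cost function (9.32). It is deliberately simplified and is not the routine used to produce the figures: hours are held fixed at h = 1, compensation is a constant wage per worker, and the grids, the transition matrix and all parameter values are assumptions chosen for illustration.

```python
def adjustment_cost(e_prev, e, fixed_hire, fixed_fire, nu):
    """Adjustment cost (9.32); zero when employment is unchanged."""
    if e == e_prev:
        return 0.0
    fixed = fixed_hire if e > e_prev else fixed_fire
    return fixed + 0.5 * nu * ((e - e_prev) / e_prev) ** 2 * e_prev


def solve_model(a_grid, trans, e_grid, alpha=0.6, wage=0.5, beta=0.95,
                fixed_hire=0.01, fixed_fire=0.01, nu=1.0,
                tol=1e-6, max_iter=5000):
    """Value function iteration; returns (value, policy) indexed [shock][e_prev].

    policy[i][j] is the grid index of the employment choice. Hours are fixed
    at one and compensation is wage * e (simplifying assumptions).
    """
    n_a, n_e = len(a_grid), len(e_grid)
    # Flow payoff for every (shock, inherited employment, chosen employment)
    flow = [[[a * e ** alpha - wage * e
              - adjustment_cost(e_prev, e, fixed_hire, fixed_fire, nu)
              for e in e_grid] for e_prev in e_grid] for a in a_grid]
    value = [[0.0] * n_e for _ in range(n_a)]
    policy = [[0] * n_e for _ in range(n_a)]
    for _ in range(max_iter):
        # Expected continuation value of ending the period with e_grid[m]
        cont = [[sum(trans[i][k] * value[k][m] for k in range(n_a))
                 for m in range(n_e)] for i in range(n_a)]
        new_value = [[0.0] * n_e for _ in range(n_a)]
        gap = 0.0
        for i in range(n_a):
            for j in range(n_e):
                best = max(range(n_e),
                           key=lambda m: flow[i][j][m] + beta * cont[i][m])
                policy[i][j] = best
                new_value[i][j] = flow[i][j][best] + beta * cont[i][best]
                gap = max(gap, abs(new_value[i][j] - value[i][j]))
        value = new_value
        if gap < tol:
            break
    return value, policy


# Illustrative grids: two profitability states, employment from 0.5 to 3.0
e_grid = [0.5 + 0.1 * k for k in range(26)]
value, policy = solve_model([0.9, 1.1], [[0.8, 0.2], [0.2, 0.8]], e_grid)
for i in range(2):
    inaction = [e_grid[j] for j in range(len(e_grid)) if policy[i][j] == j]
    print("state", i, "inaction at e_prev in", inaction)
```

Maximizing over the whole employment grid, with the cost set to zero at e = e−1, is equivalent to taking the maximum over the hiring, firing and no-adjustment options in (9.33). Restoring the hours choice would add an inner maximization over h for each candidate e.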
Once a solution to the functional equation is obtained, then policy functions can
be easily created. Figure 9.4 produces a policy function for the case of η = 1 and
F F = F H = .01.
[Figure 9.4 approximately here]
One can obtain correlations from a simulated panel. For this parameterization,
some moments of interest are: corr(e,A)=.856; corr(h,A)=.839 and corr(e,h)=.461.
Clearly, employment and hours adjustment are both positively related to the shock.
Further, we find that the correlation of hours and employment is positive indicating
that the adjustment towards a target, in which the correlation is negative, is offset
by the joint response of these variables to a shock.
Computation of these moments for a given Θ opens the door to estimation. If
these moments can be computed for a given Θ, then:
• it is easy to compute other moments (including regression coefficients)
• it is easy to find a value of Θ to bring the actual and simulated moments close
together
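The second step can be sketched as a search over candidate parameter values for the one whose simulated moments are closest to the data moments. The code below is a minimal illustration, assuming identity weighting and a coarse grid search. The function `toy_moments` is a made-up placeholder for the expensive step of solving the model by value function iteration, simulating a panel and computing moments such as corr(e,A).

```python
def moment_distance(actual, simulated):
    """Sum of squared deviations between actual and simulated moments
    (identity weighting matrix, an assumption made for simplicity)."""
    return sum((a - s) ** 2 for a, s in zip(actual, simulated))


def best_parameter(simulate_moments, actual, candidates):
    """Return the candidate whose simulated moments are closest to the data."""
    return min(candidates,
               key=lambda params: moment_distance(actual, simulate_moments(params)))


def toy_moments(params):
    """Placeholder mapping from a scalar parameter to two moments.
    In practice: solve the model, simulate a panel, compute moments."""
    return [params, params ** 2]


estimate = best_parameter(toy_moments, actual=[2.0, 4.0],
                          candidates=[1.0, 2.0, 3.0])
print(estimate)
```

In applications the grid search would typically be replaced by a numerical optimizer and the identity weighting by an estimate of the optimal weighting matrix.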
The techniques of this book are then easily applied to a study of labor market
dynamics using either panel data or time series.157 Of course, this exercise may be
even more interesting using data from countries other than the US which, through
institutional constraints, have richer adjustment costs.
9.7 Conclusions
The point of this chapter has been to explore the dynamics of labor adjustment. In
the presence of adjustment costs, the conventional model of static labor demand is
replaced by a possibly complex dynamic optimization problem. Solving these prob-
lems and estimating parameters using either plant-level or aggregate observations
is certainly feasible using the techniques developed in this book.
In terms of policy implications, governments often impose restrictions on em-
ployment and hours. The dynamic optimization framework facilitates the analysis
of those interventions.158 Further, these policies (such as restrictions on hours and/or
the introduction of firing costs) may provide an opportunity to infer key structural
parameters.159
Chapter 10
Future Developments
10.1 Overview/Motivation
This final section of this book covers an assortment of additional topics. These
represent active areas of research which utilize the approach of this book. In some
cases, the research is not yet that far along. Examples of this would include ongoing
research on the integration of pricing and inventory problems or the joint evolution
of capital and labor. In a second category are search models of the labor market
which illustrate the usefulness of empirical work on dynamic programming though
they generally are not part of a standard course in applied macroeconomics.
Consequently, the presentation is different than other chapters. Here we focus
mainly on the statement of coherent dynamic optimization problems and properties
of policy functions. To the extent that there are empirical studies, we summarize
them.
10.2 Price Setting
We begin with a very important problem in macroeconomics, the determination of
prices. For this discussion, we do not rely on the Walrasian auctioneer to miracu-
lously set prices. Instead, we allow firms to set prices and study this interaction in
a monopolistic competition setting.160
The natural specification includes a fixed cost of adjusting prices so that the
firm optimally chooses between adjusting or not. Hence we term this the state de-
pendent pricing model. These have been most recently termed “menu cost” models
to highlight the fact that a leading parable of the model is one where a seller finds
it costly to literally change the posted price. In fact, this terminology is somewhat
unfortunate as it tends to trivialize the problem. Instead, it is best to view these
costs as representing a wide range of sources of frictions in the pricing of goods.
Besides presenting a basic optimization problem, this section summarizes two
empirical exercises. The first reports on an attempt to use indirect inference to
estimate the cost of price adjustment for magazine prices. The second is a study of
the aggregate implications of state dependent pricing.
10.2.1 Optimization Problem
Consider a dynamic optimization problem at the firm level where, by assumption,
prices are costly to adjust. The firm has some market power, represented by a
downward sloping demand curve. This demand curve may shift around so that
the price the firm would set in the absence of adjustment costs is stochastic. The
question is: how, in the presence of adjustment costs, do firms behave?
Suppose, to be concrete, that product demand comes from the CES specification
of utility so that the demand for product i is given by:
$$q^d_i(p, D, P) = \left(\frac{p}{P}\right)^{-\gamma} \frac{D}{P} \tag{10.1}$$
Here all variables are nominal. The price of product i is p while the general price level
is P . Finally, nominal spending, taken to be exogenous and stochastic is denoted
D.
Given this specification of demand and the realized state, (p, D, P ), the firm’s
real profits are:
$$\pi(p, D, P) = q^d_i(p, D, P)\,\frac{p}{P} - c\left(q^d_i(p, D, P)\right) \tag{10.2}$$
where c(·) is assumed to be a strictly increasing and strictly convex function of
output.
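The demand and profit functions (10.1) and (10.2) translate directly into code. In the sketch below the quadratic cost function and all numerical values are assumptions chosen for illustration; any strictly increasing, strictly convex c(·) could be passed in instead.

```python
def demand(p, D, P, gamma):
    """Demand for product i, equation (10.1)."""
    return (p / P) ** (-gamma) * D / P


def real_profit(p, D, P, gamma, cost):
    """Real profits, equation (10.2); cost is a function of output."""
    q = demand(p, D, P, gamma)
    return q * p / P - cost(q)


def quadratic_cost(q):
    """Hypothetical cost function: strictly increasing for q > 0 and strictly convex."""
    return 0.5 * q ** 2


print(real_profit(1.0, 1.0, 1.0, 2.0, quadratic_cost))
# Doubling all nominal variables leaves real demand unchanged
print(demand(1.0, 1.0, 1.0, 2.0), demand(2.0, 2.0, 2.0, 2.0))
```

The second line of output illustrates that demand depends only on the relative price p/P and real spending D/P, which is why a change in P with p held fixed matters for the firm.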
The dynamic optimization problem of the firm, taking the current values and
evolution of (D, P ) as given, is:
$$V(p, D, P, F) = \max\left\{V^a(p, D, P, F),\; V^n(p, D, P, F)\right\} \tag{10.3}$$
for all (p, D, P, F ) where
$$V^a(p, D, P, F) = \max_{\tilde{p}}\; \pi(\tilde{p}, D, P) - F + \beta E_{(D',P',F'|D,P,F)} V(\tilde{p}, D', P', F') \tag{10.4}$$
$$V^n(p, D, P, F) = \pi(p, D, P) + \beta E_{(D',P',F'|D,P,F)} V(p, D', P', F') \tag{10.5}$$
Here the state vector is (p, D, P, F ). The cost of changing a price is F . It enters
the state vector since, in this specification, we allow this adjustment cost to be
stochastic.161
The firm has two options. If the firm does not change its price, it enjoys a profit
flow, avoids adjustment costs and then, in the next period, has the same nominal
price. Of course, if the aggregate price level changes (P ≠ P′), then the firm’s
relative price will change over time. Note that the cost here is associated with
adjustment of the nominal price.
Alternatively, the firm can pay the “menu cost” F and adjust its price to p̃. This
price change becomes effective immediately so that the profit flow given adjustment
is π(p̃, D, P ). This price then becomes part of the state vector for the next period.
The policy function for this problem will have two components. First, there is
a discrete component indicating whether or not price adjustment will take place.
Second, conditional on adjustment, there is the policy function characterizing the
dependence of p̃ on the state vector (D, P, F ). Interestingly, the choice of p̃ is
independent of p once the decision to adjust has been made.
There is a very important difference between this optimization problem and most
of the others studied in this book. From (10.3), the choice at the individual firm
level depends on the choices of other firms, summarized by P . Thus, given the
specification of demand, the behavior of a single firm depends on the behavior of
other firms.162 This feature opens up a number of alternative ways of solving the
model.
As a starting point, one might characterize the exogenous evolution of P , per-
haps through a regression model, and impose this in the optimization problem of
the firm.163 In this case, the individual optimizer is simply using an empirical model
of the evolution of P .
Using this approach, there is no guarantee that the aggregate evolution of P
assumed by the individual agent actually accords with the aggregated behavior of
these agents. This suggests a second approach in which this consistency between the
beliefs of agents and their aggregate actions is imposed on the model. Essentially
this amounts to:
• solving (10.3) given a transition equation for P
• using the resulting policy functions to solve for the predicted evolution of P
• stopping if these functions are essentially the same
• iterating if they are not.
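The four steps above amount to a fixed-point iteration on the perceived law of motion for P. A stylized sketch follows, in which the law of motion is summarized by a single number. The mapping `toy_implied_law` is a made-up stand-in for the expensive steps of solving (10.3) and aggregating the resulting policy functions, and the damping factor is our own addition, not part of the procedure described in the text.

```python
def find_consistent_law(implied_law, guess, tol=1e-8, max_iter=500, damping=0.5):
    """Iterate on a perceived law-of-motion parameter until it agrees with
    the law implied by the agents' optimal behavior.

    implied_law(belief) stands for: solve the firm's problem given the
    belief, then compute the evolution of P predicted by the policy functions.
    """
    belief = guess
    for iteration in range(max_iter):
        implied = implied_law(belief)
        if abs(implied - belief) < tol:        # essentially the same: stop
            return belief, iteration
        # otherwise update the belief (partially) and iterate
        belief = damping * implied + (1.0 - damping) * belief
    raise RuntimeError("no convergence: an equilibrium may be hard to find")


def toy_implied_law(belief):
    """Hypothetical mapping from perceived to implied persistence of P."""
    return 0.3 + 0.5 * belief


law, n_iter = find_consistent_law(toy_implied_law, guess=0.0)
print(law, n_iter)
```

In the toy example the mapping is a contraction, so the iteration converges from any starting guess. Nothing guarantees this for the actual pricing model.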
There is a point of caution here though. For the dynamic programming problem, we
can rely on the contraction mapping property to guarantee that the value function
iteration process will find the unique solution to the functional equation. We have no
such theorem to guide us in the iterative procedure described above. Consequently,
finding an equilibrium may be difficult and, further, there is no reason to suspect
that the equilibrium is unique.164
10.2.2 Evidence on Magazine Prices
Willis (2000a) studies the determination of magazine price adjustment using a data
set initially used by Cecchetti (1986). The idea is to use data on the frequency and
magnitude of magazine price adjustment to estimate a dynamic menu cost model.165
Willis postulates a theory model similar to that given above. For the empir-
ical analysis, he specifies an auxiliary equation in which the probability of price
adjustment is assumed to depend on:
• the number of years since the last price adjustment
• cumulative inflation since the last price adjustment
• cumulative growth in industry demand since the last price adjustment
• current inflation
• current industry demand.
This specification is partly chosen as it mimics some of the key elements of the
specification in Cecchetti (1986). Further, the cumulative inflation and demand since
the last price change are, from the dynamic programming problem, key elements in
the incentive to adjust prices. Interestingly, there seems to be little support for any
time dependence, given the presence of the proxies for the state variables.
Willis estimates this auxiliary model and then uses it, through an indirect infer-
ence procedure, to estimate the structural parameters of his model. These include:
• the curvature of the profit function
• the curvature of the cost function
• the distribution of menu costs.
Willis (2000a) finds that magazine sellers have a significant amount of market
power but that production is essentially constant returns to scale. Finally, Willis is
able to distinguish the average adjustment cost in the distribution from the average
that is actually paid. He finds that the former is about 35% of revenues while the
latter is only about 4% of revenues.166
10.2.3 Aggregate Implications
A large part of the motivation for studying models with some form of price rigidity
reflected the arguments, advanced by macroeconomists, that inflexible prices were
a source of aggregate inefficiency. Further, rigidity of prices and/or wages provides
a basis for the non-neutrality of money, thus generating a link between the stock
of nominal money and real economic activity. But, these arguments rest on the
presence of quantitatively relevant rigidities at the level of individual sellers. Can
these costs of adjusting prices “explain” observations at both the microeconomic
and aggregate levels?
One approach to studying these issues is to model the pricing behavior of sellers
in a particular industry. This estimated model can then be aggregated to study the
effects of, say, money on output. An alternative, more aggregate approach, is to
specify and estimate a macroeconomic model with price rigidities.
At this point, while the estimation of such a model is not complete, there is some
progress. A recent paper by Dotsey et al. (1999) studies the quantitative implications
of state dependent pricing for aggregate variables. We summarize those results here.
The economy studied by Dotsey et al. (1999) has a number of key elements:
• as in Blanchard and Kiyotaki (1987) the model is based upon monopolistic
competition between producers of final goods
• sellers face a (stochastic) iid fixed cost of adjusting their price (expressed in
terms of labor time)
• sellers meet all demand forthcoming at their current price
• there is an exogenously specified demand for money
At the individual level, firms solve a version of (10.3) where the cost of adjustment
F is assumed to be iid. Further, heterogeneity across firms is restricted to two
dimensions, (F, p). That is, firms may be in different states because they began the
period with a different price or because their price adjustment cost for that period
is different from that of other firms. There is a very important consequence of this
restricted form of heterogeneity: if two firms choose to adjust, they select the same
price.
Interestingly, Dotsey et al. solve the dynamic optimization problem of a firm by
using a first-order condition. This is somewhat surprising as we have not used first-
order conditions to characterize the solutions to dynamic discrete choice problems.
Consider the choice of a price by a firm conditional on adjustment, as in (10.4). The
firm optimally sets the price taking into account the effects on current profits and
on the future value.
In the price setting model, the price only affects the future value if the firm elects
not to adjust in the next period. If the firm adjusts its price in the next period, as
in (10.4), then the value of the price at the start of the period is irrelevant.
So, there is a first-order condition which weighs the effects of the price on current
profits and on future values along the no-adjustment branch of the value function.
As long as the value function of the firm along this branch is differentiable in p̃,
there will be a first-order condition characterizing this optimal choice given by:
$$\frac{\partial \pi(\tilde{p}, D, P)}{\partial p} + \beta G(F^*)\, E_{(D',P',F'|D,P,F)} \frac{\partial V^n(\tilde{p}, D', P', F')}{\partial p} = 0. \tag{10.6}$$
where G(F ∗) is the state contingent probability of not adjusting in the next period.
This is not quite an Euler equation as the derivative of the future value remains in
this expression. Dotsey et al. iterate this condition forward and, using a restriction
that the firm eventually adjusts, derivatives of the primitive profit function can
substitute for ∂V n(p̃, D′, P ′, F ′)/∂p.167
The solution of the optimization problem and the equilibrium analysis relies on
a discrete representation of the possible states of the firms. Given a value of p, there
will exist a critical adjustment cost such that sellers adjust if and only if the realized
value of F is less than this critical level. So, given the state of the system, there is
an endogenously determined probability of adjustment for each seller. Dotsey et al.
(1999) use this discrete representation, these endogenous probabilities of adjustment
and the (common) price charged by sellers who adjust to characterize the equilibrium
evolution of their model economy.
Details on computing an equilibrium are provided in Dotsey et al. (1999). In
terms of the effects of money on output they find:
• if the inflation rate is constant at 10% then prices are adjusted at least once
every 5 quarters.
• comparing different constant inflation rate regimes, the higher the inflation
rate, the shorter is the average time to adjustment and the mark-up only
increases slightly
• an unanticipated, permanent monetary expansion leads to higher prices and
higher output at impact and there is some persistence in the output effects.
• as the money shocks become less persistent, the price response dampens and
consequently the output effect is larger.
This discussion of the aggregate implications of monetary shocks in an environ-
ment with state dependent prices nicely complements our earlier discussion of the
estimation of a state dependent pricing model using micro-data. Clearly, there is an
open issue here concerning the estimation of a state dependent pricing model using
aggregate data.168
10.3 Optimal Inventory Policy
The models we have studied thus far miss an important element of firm behavior,
the holding of inventories. This is somewhat ironic as the optimal inventory problem
was one of the earlier dynamic optimization problems studied in economics.169
We begin with a traditional model of inventories in which a seller with a convex
cost function uses inventories to smooth production when demand is stochastic. We
then turn to models which include non-convexities. The section ends with a brief
discussion of a model with dynamic choices over prices and inventories.
10.3.1 Inventories and the Production Smoothing Model
The basic production smoothing argument for the holding of inventories rests upon
the assumption that the marginal cost of production is increasing. In the face of
fluctuating demand, the firm would then profit by smoothing production relative to
sales. This requires the firm to build inventories in periods of low demand and to
liquidate them in periods of high demand.
Formally, consider:
$$V(s, I) = \max_{y} \; r(s) - c(y) + \beta E_{s'|s} V(s', I') \tag{10.7}$$
for all (s, I). Here the state vector is the level of sales s and the stock of inventories
at the start of the period, I. The level of sales is assumed to be random and
outside of the firm’s control. From sales, the firm earns revenues of r(s). The firm
chooses its level of production (y) where c(y) is a strictly increasing, strictly convex
cost function. Inventories at the start of the next period are given by a transition
equation:
I′ = R(I + y − s). (10.8)
where R is the return on a marginal unit of inventory (which may be less than
unity).170 From this problem, a necessary condition for optimality is:
$$c'(y) = \beta R\, E_{s'|s}\, c'(y') \tag{10.9}$$
where future output is stochastic and will generally depend on the sales realization
in the next period.
To make clear the idea of production smoothing, suppose that sales follow an
iid process: $E_{s'|s} s'$ is independent of $s$. In that case, the right side of (10.9) is
independent of the current realization of sales. Hence, since (10.9) must hold for
all s, the left side must be constant too. Since production costs are assumed to be
strictly convex, this implies that y must be independent of s.
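As a hedged illustration of this argument, suppose (an assumption not made in the text) that costs are quadratic, c(y) = (a/2)y² with a > 0. Marginal cost is then linear in output and (10.9) reduces to a condition on expected production:

```latex
\[
c'(y) = a y \quad\Longrightarrow\quad y = \beta R \, E_{s'|s}\, y' .
\]
```

If in addition βR = 1, current production equals expected future production whatever the current sales realization, which is the production smoothing result in its starkest form.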
Exercise 10.1
Solve (10.7) using a value function iteration routine (or another for comparison
purposes). Under what conditions will the variance of production be less than the
variance of sales?
Despite its appeal, the implications of the production smoothing model contrast
sharply with observation. In particular, the model predicts that production will
be smoother than sales, but the data do not exhibit such production smoothing.171
One response to this difference between the model’s predictions and observation
is to introduce other shocks into the problem to increase the variability of pro-
duction. A natural candidate would be variations in productivity or the costs of
production. Letting A denote a productivity shock, consider:
$$V(s, I, A) = \max_{y} \; r(s) - c(y, A) + \beta E_{A',s'|A,s} V(s', I', A') \tag{10.10}$$
so that the cost of producing y units is stochastic. In this case, (10.9) becomes:
$$c_y(y, A) = \beta R\, E_{A',s'|A,s}\, c_y(y', A'). \tag{10.11}$$
In this case, inventories are used so that goods can be produced during periods of
relatively low cost and, in the absence of demand variations, sold smoothly over
time.172
Kahn (1987) studies a model with explicit stock-out avoidance. Note
that in (10.7), the seller was allowed to hold negative inventories. As discussed
in Kahn (1987), some researchers add a nonnegativity constraint to the inventory
problem while others are more explicit about a cost of being away from a target level
of inventories (such as a fraction of sales). Kahn (1987) finds that even without a
strictly convex cost function, the nonnegativity constraint alone can increase the
volatility of output relative to sales.
Exercise 10.2
Solve (10.10) using a value function iteration routine (or another for comparison
purposes). Under what conditions on the variance of the two types of shocks and
on the cost function will the variance of production be less than the variance of
sales? Supplement the model with a nonnegativity constraint on inventories and/or
an explicit target level of inventories. Explore the relationship between the variance
of sales and the variance of output.
Alternatively, researchers have introduced non-convexities into this problem.
One approach, as in Cooper and Haltiwanger (1992), is to introduce production
bunching due to the fixed costs of a production run. For that model, consider a
version of (10.7) where the cost of production is given by:
$$c(y) = \begin{cases} 0 & \text{for } y = 0 \\ K + a y & \text{for } y \in (0, Y] \\ \infty & \text{otherwise} \end{cases} \tag{10.12}$$
Here Y represents the total output produced if there is a production run. It repre-
sents a capacity constraint on the existing capital.
In this case, production is naturally more volatile than sales as the firm has an
incentive to have a large production run and then to sell from inventories until the
next burst of production.173
Further, the original inventory models that gave rise to the development of the
(S,s) literature were based upon a fixed cost of ordering.174 One dynamic stochastic
formalization of the models discussed in Arrow et al. (1951) might be:
$$v(x, y) = \max\{v^o(x, y),\, v^n(x, y)\} \tag{10.13}$$
where x measures the state of demand and y the inventories on hand at the sales
site. The optimizer has two options: to order new goods for inventory (vo) or not
(vn). These options are defined as:
$$v^o(x, y) = \max_{q} \; r(s) - c(q) - K + \beta E_{x'|x}\, v(x', (y - s + q)(1 - \delta)) \tag{10.14}$$
and
$$v^n(x, y) = r(s) + \beta E_{x'|x}\, v(x', (y - s)(1 - \delta)). \tag{10.15}$$
Here s is a measure of sales and is given as the minimum of (x, y): demand can only
be met from inventories on hand. The function r(s) is simply the revenues earned
from selling s units.
If the firm orders new inventories, it incurs a fixed cost of K and pays c(q),
an increasing and convex function, to obtain q units. In the case of ordering new
goods, the inventories next period reflect the sales and the new orders. The rate of
inventory depreciation is given by δ.
If the firm does not order inventories, then its inventories in the following period
are the depreciated level of initial inventories less sales. This is zero if the firm
stocks out.
This problem, which is similar to the stochastic investment problem with non-
convex adjustment costs, can be easily solved numerically. It combines a discrete
choice along with a continuous decision given that the firm decides to order new
goods.175
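A minimal sketch of such a numerical solution follows, using value function iteration on (10.13) to (10.15). Everything specific here is an assumption made for illustration and not taken from the text: demand is iid on a small integer grid, revenue and ordering costs are linear, depreciation is set to zero so that inventories stay on an integer grid, inventories are capped at `y_max`, and sales are limited by the stock on hand. The function name and parameter values are hypothetical.

```python
import numpy as np

def solve_ordering_problem(price=3.0, unit_cost=1.0, K=1.5, beta=0.95,
                           demand=(0, 1, 2), probs=(0.2, 0.5, 0.3),
                           y_max=12, tol=1e-8, max_iter=5000):
    """Value function iteration for a fixed-ordering-cost inventory problem.

    Assumptions (illustrative only): iid demand, r(s) = price * s,
    c(q) = unit_cost * q, no depreciation, integer inventories in 0..y_max.
    """
    demand = np.array(demand)
    probs = np.array(probs)
    n_x, n_y = len(demand), y_max + 1
    v = np.zeros((n_x, n_y))                  # v[demand index, inventory level]
    order = np.zeros((n_x, n_y), dtype=int)   # optimal order size (0 = no order)
    for _ in range(max_iter):
        ev = probs @ v                        # E v(x', y') under iid demand
        v_new = np.empty_like(v)
        for ix, x in enumerate(demand):
            for y in range(n_y):
                s = min(x, y)                 # sales limited by stock on hand
                left = y - s                  # stock carried forward
                v_n = price * s + beta * ev[left]          # no order, as in (10.15)
                qs = np.arange(1, y_max - left + 1)        # feasible order sizes
                if qs.size:
                    # value of ordering each feasible q, as in (10.14)
                    v_o_all = price * s - unit_cost * qs - K + beta * ev[left + qs]
                    best = int(np.argmax(v_o_all))
                    v_o, q_best = v_o_all[best], int(qs[best])
                else:
                    v_o, q_best = -np.inf, 0
                if v_o > v_n:                 # discrete choice, as in (10.13)
                    v_new[ix, y], order[ix, y] = v_o, q_best
                else:
                    v_new[ix, y], order[ix, y] = v_n, 0
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, order

v, order = solve_ordering_problem()
```

The array `order` gives the order size for each demand state and inventory level, so the inaction region and the order-up-to level can be read off directly. Allowing depreciation or a convex ordering cost would require interpolating the value function between grid points.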
10.3.2 Prices and Inventory Adjustment
Thus far we have treated pricing problems and inventory problems separately. So,
in the model of costly price adjustment, sellers had no inventories. And, in the
inventory models, sales are usually taken as given. Yet, there is good reason to
think jointly about pricing decisions and inventories.176
First, one of the motivations for the holding of inventories is to smooth produc-
tion relative to sales. But, there is another mechanism for smoothing sales: as its
demand fluctuates, the firm (assuming it has some market power) could adjust its
price. Yet, if prices are costly to adjust, this may be an expensive mechanism. So,
the choices of pricing and inventory policies reflect the efficient response of a profit
maximizing firm to variations in demand and/or technology.
At one extreme, suppose that the firm can hold inventories and faces a cost of
changing its price. In this case, the functional equation for the firm is given by:
$$V(p, I; S, P) = \max\{V^a(p, I; S, P),\, V^n(p, I; S, P)\} \tag{10.16}$$
where
$$V^a(p, I; S, P) = \max_{\tilde{p}} \; \pi(\tilde{p}, I; S, P) - F + \beta E_{(S',P'|S,P)} V(\tilde{p}, I'; S', P') \tag{10.17}$$
$$V^n(p, I; S, P) = \pi(p, I; S, P) + \beta E_{(S',P'|S,P)} V(p, I'; S', P') \tag{10.18}$$
where the transition equation for inventories is again I′ = R(I + y − s). In this
optimization problem, p is again the price of the seller and I is the stock of invento-
ries. These are both controlled by the firm. The other elements in the state vector,
S and P , represent a shock to profits and the general price level respectively. The
function π(p, I; S, P ) represents the flow of profit when the firm charges a price p,
holding inventories I when the demand shock is S and the general price level is P .
Here, in contrast to the inventory problems described above, sales are not exogenous.
Instead, sales come from a stochastic demand function that depends on the
firm’s price (p) and the price index (P ). From this, we see that the firm can in-
fluence sales by its price adjustment. But, of course, this adjustment is costly so
that the firm must balance meeting fluctuating demand through variations in in-
ventories, variations in production or through price changes. The optimal pattern
of adjustment will presumably depend on the driving process of the shocks, the cost
of price adjustment and the curvature of the production cost function (underlying
π(p, I; S, P )).
Exercise 10.3
A recent literature asserts that technology shocks are negatively correlated with
employment in the presence of sticky prices. Use (10.19) to study this issue by
interpreting S as a technology shock.
At the other extreme, suppose that new goods are delivered infrequently due to
the presence of a fixed ordering cost. In that case, the firm will seek other ways
to meet fluctuations in demand, such as changing its price. Formally, consider the
optimization problem of the seller if there is a fixed cost to ordering and, in contrast
to (10.13), prices are endogenous:
$$V(p, I; S, P) = \max\{V^o(p, I; S, P),\, V^n(p, I; S, P)\} \tag{10.19}$$
where
$$V^o(p, I; S, P) = \max_{\tilde{p}, q} \; \pi(\tilde{p}, I; S, P) - K - c(q) + \beta E_{(S',P'|S,P)} V(\tilde{p}, I'; S', P') \tag{10.20}$$
$$V^n(p, I; S, P) = \max_{\tilde{p}} \; \pi(\tilde{p}, I; S, P) + \beta E_{(S',P'|S,P)} V(\tilde{p}, I'; S', P'). \tag{10.21}$$
The transition equation for inventories is again I′ = R(I + q − s).
Aguirregabiria (1999) studies a model with menu costs and lump-sum costs of ad-
justing inventories. This research is partly motivated by the presence of long periods
of time in which prices are not adjusted and by observations of sales promotions.
Interestingly, the model has predictions for the joint behavior of markups and
inventories even if the costs of adjustment are independent. Aguirregabiria (1999)
argues that markups will be high when inventories are low. This reflects the effects
of stock-outs on the elasticity of sales. Specifically, Aguirregabiria assumes that:
s = min(D(p), q + I) (10.22)
where as above, s is sales, q is orders of new goods for inventory and I is the stock of
inventories. Here D(p) represents demand that depends, among other things, on the
current price set by the seller. So, when demand is less than output and the stock
of inventories, then sales equal demand and the price elasticity of sales is equal to
that of demand. But, when demand exceeds q + I, then the elasticity of sales with
respect to price is zero: when the stock-out constraint binds, realized “demand” is
very inelastic. In the model of Aguirregabiria (1999) the firm chooses its price and
the level of inventories prior to the realizations of a demand shock so that stock-outs
may occur.
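The effect of the stock-out constraint on the elasticity of sales can be checked with a small numerical sketch of (10.22). The isoelastic demand curve and its parameters are hypothetical choices for illustration; the text only says that demand depends on the price.

```python
import math

def demand(p, scale=100.0, eta=2.0):
    """Hypothetical isoelastic demand D(p) = scale * p**(-eta)."""
    return scale * p ** (-eta)

def sales(p, q, I):
    """Sales as in (10.22): demand is met only up to the available stock q + I."""
    return min(demand(p), q + I)

def sales_elasticity(p, q, I, h=1e-4):
    """Elasticity of sales with respect to price, by a central log difference."""
    up = math.log(sales(p * (1 + h), q, I))
    down = math.log(sales(p * (1 - h), q, I))
    return (up - down) / (math.log(1 + h) - math.log(1 - h))

# Stock exceeds demand: sales track demand, so the elasticity is that of demand.
slack = sales_elasticity(p=2.0, q=30.0, I=10.0)
# Demand exceeds stock: sales are pinned at q + I and do not respond to price.
binding = sales_elasticity(p=2.0, q=5.0, I=5.0)
```

With these assumed numbers demand at p = 2 is 25 units, so the first case leaves the constraint slack and the second makes it bind.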
Aguirregabiria (1999) estimates the model using monthly data on a supermar-
ket chain. His initial estimation is of a reduced form model for the choice to adjust
prices and/or inventories. In this discrete choice framework he finds an interesting
interaction between the adjustments of inventories and prices. The level of inventories
is significant for the likelihood of price adjustment: large inventories increase
the probability of price adjustment.
Aguirregabiria (1999) estimates a structural model based upon a dynamic pro-
gramming model.177 He finds support for the presence of both types of lump-sum
adjustment costs. Moreover, he argues that the costs of increasing a price appear
to exceed the cost of price reductions.
10.4 Capital and Labor
The grand problem we consider here allows for adjustment costs for both labor and
capital.178 Intuitively, many of the stories of adjustment costs for one factor have
implications for the adjustment of the other. For example, if part of the adjustment
cost for capital requires the shutting down of a plant to install new equipment,
then this may also be a good time to train new workers. Moreover, we observe
inaction in the adjustment of both labor and capital and bursts as well. So, it seems
reasonable to entertain the possibility that both factors are costly to adjust and
that the adjustment processes are interdependent.
For this more general dynamic factor demand problem, we assume that the
dynamic programming problem for a plant is given by:
$$V(A, K, L) = \max_{K', L', h} \; \Pi(A, K, L', h) - \omega(L', h, K, A) - C(A, K, L, K', L') + \beta E_{A'|A} V(A', K', L') \tag{10.23}$$
for all (A, K, L). Here the flow of profits, Π(A, K, L′, h), depends on the profitability
shock, A, the predetermined capital stock, K, the number of workers, L′, and the
hours per worker, h. The function ω(L′, h, K, A) represents the total state dependent
compensation paid to workers. Finally, the general adjustment cost function is given
by C(A, K, L, K′, L′).
To allow the model to capture inaction, the adjustment cost function in (10.23)
contains convex and non-convex adjustment costs for both labor and capital. Fur-
ther, one or both of these components might be interactive. So, for example, there
may be a fixed cost of adjusting capital that may “cover” any adjustments in labor
as well. Or, within the convex piece of the adjustment cost function, there may be
some interaction between the factors.
Writing down and analyzing this dynamic optimization problem is by itself not
difficult. There are some computational challenges posed by the larger state space.
The key is the estimation of the richer set of parameters.
One approach would be to continue in the indirect inference spirit and consider
a VAR estimated from plant-level data in, say, hours, employment and capital.
As with the single factor models, we might also include some nonlinearities in the
specification. We could use the reduced form parameters as the basis for indirect
inference of the structural parameters.
One of the interesting applications of the estimated model will be policy exper-
iments. In particular, the model with both factors will be useful in evaluating the
implications of policy which directly influences one factor on the other. So, for ex-
ample, we can study how restrictions on worker hours might influence the demand
for equipment. Or, how do investment tax credits impact on labor demand?
10.5 Technological Complementarities: Equilibrium Analysis
Here we continue discussion of a topic broached in Chapter 5 where we studied the
stochastic growth model. There we noted that researchers, starting with Bryant
(1983) and Baxter and King (1991), introduced interactions across agents through
the production function. The model captures, in a tractable way, the theme that
high levels of activity by other agents increase the productivity of each agent.179
Let y represent the output at a given firm, Y be aggregate output, k and n the
firm’s input of capital and labor respectively. Consider a production function of:
$$y = A k^{\alpha} n^{\phi} Y^{\gamma} Y_{-1}^{\varepsilon} \tag{10.24}$$
where A is a productivity shock that is common across producers. Here γ param-
eterizes the contemporaneous interaction between producers. If γ is positive, then
there is a complementarity at work: as other agents produce more, the productivity
of the individual agent increases as well. In addition, this specification allows for
a dynamic interaction as well parameterized by ε, where Y−1 is the lagged level of
aggregate output. As discussed in Cooper and Johri (1997), this may be interpreted
as a dynamic technological complementarity or even a learning by doing effect. This
production function can be imbedded into a stochastic growth model.
Consider the problem of a representative household with access to a production
technology given by (10.24). This is essentially a version of the stochastic growth
model with labor but with a different technology.
There are two ways to solve this problem. The first is to write the dynamic
programming problem, carefully distinguishing between individual and aggregate
variables. As in our discussion of the recursive equilibrium concept, a law of motion
must be specified for the evolution of the aggregate variables. Given this law of
motion, the individual household’s problem is solved and the resulting policy func-
tion compared to the one that governs the economy-wide variables. If these policy
functions match, then there is an equilibrium. Else, another law of motion for the
aggregate variables is specified and the search continues. This is similar to the ap-
proach described above for finding the equilibrium in the state dependent pricing
model.180
Alternatively, one can use the first-order conditions for the individual’s optimiza-
tion problem. As all agents are identical and all shocks are common, the represen-
tative household will accumulate its own capital, supply its own labor and interact
with other agents only due to the technological complementarity. In a symmetric
equilibrium, yt = Yt. As in Baxter and King (1991), this equilibrium condition is
neatly imposed through the first-order conditions when the marginal products of la-
bor and capital are calculated. From the set of first-order conditions, the symmetric
equilibrium can be analyzed by approximation around a steady state.
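To see what the symmetric equilibrium condition implies, a short sketch may help. It assumes γ < 1, which the text does not state explicitly. The household takes Y as given when computing marginal products from (10.24), and only afterwards is y = Y imposed:

```latex
\[
\frac{\partial y}{\partial k} = \alpha \frac{y}{k}, \qquad
\frac{\partial y}{\partial n} = \phi \frac{y}{n},
\]
\[
y = Y \;\Longrightarrow\; Y^{1-\gamma} = A k^{\alpha} n^{\phi} Y_{-1}^{\varepsilon}
\;\Longrightarrow\; Y = \left( A k^{\alpha} n^{\phi} Y_{-1}^{\varepsilon} \right)^{1/(1-\gamma)} .
\]
```

The private marginal products keep the exponents α and φ, while for 0 < γ < 1 aggregate output responds to inputs with the larger exponents α/(1 − γ) and φ/(1 − γ).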
The distinguishing feature of this economy from the traditional Real Business
Cycle model is the presence of the technological complementarity parameters, γ and
ε. It is possible to estimate these parameters directly from the production function
or to infer them from the equilibrium relationships.181
10.6 Search Models
This is a very large and active area of research in which the structural approach
to individual decision making has found fertile ground. This partly reflects the
elegance of the search problem at the individual level, the important policy question
surrounding the provision of unemployment insurance and the existence of rich data
sets on firms and workers. This subsection will only introduce the problem and
briefly touch on empirical methodology and results.
10.6.1 A Simple Labor Search Model
The starting point is a model in the spirit of McCall (1970).182 A prospective worker
has a job offer, denoted by ω. If this job is accepted, then the worker stays in this
job for life and receives a return of $u(\omega)/(1-\beta)$. Alternatively, the offer can be rejected.
In this case, the worker can receive unemployment benefits of b for a period and
then may draw again from the distribution. Assume that the draws from the wage
distribution are iid.183
The Bellman equation for a worker with a wage offer of ω in hand is:
$$v(\omega) = \max\left\{ \frac{u(\omega)}{1-\beta},\; u(b) + \beta E v(\omega') \right\}. \tag{10.25}$$
for all ω. The worker either accepts the job, the first option, or rejects it in favor of
taking a draw in the next period.
Given the assumption of iid draws, the return to another draw, Ev(ω′) is just
a constant, denoted κ. It is intuitive to think of this functional equation from the
perspective of value function iteration. For a given value of κ, (10.25) implies a
function v(ω). Use this to create a new expected value of search and thus a new
value for κ. Continue to iterate in this fashion until the process converges.184
Clearly, the gain to accepting the job is increasing in ω while the return associated
with rejecting the job and drawing again is independent of ω. Assuming that the
lower (upper) support of the wage offer distribution is sufficiently low (high) relative
to b, there will exist a critical wage, termed the reservation wage, such that the
worker is indifferent between accepting and rejecting the job. The reservation wage,
w∗ is determined from:
$$\frac{u(w^*)}{1-\beta} = u(b) + \beta\kappa \tag{10.26}$$
where
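$$\begin{aligned} \kappa = E v(w) &= \int_{-\infty}^{+\infty} v(w)\, dF(w) \\ &= F(w^*)\,\bigl(u(b) + \beta\kappa\bigr) + \int_{w^*}^{\infty} \frac{u(w)}{1-\beta}\, dF(w) \end{aligned} \tag{10.27}$$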
For wages below the reservation wage, the value v(·) is constant and independent
of w as the individual chooses to stay in unemployment. For wages above w∗, the
individual accepts the offer and gets the utility of the wage forever.
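The iteration on κ described above can be sketched as follows. This is one possible implementation under assumptions that are not in the text: utility is logarithmic, the wage offer distribution is discrete on a user-supplied grid, and the function name is hypothetical.

```python
import numpy as np

def reservation_wage(wages, probs, b, beta, tol=1e-10, max_iter=10_000):
    """Iterate on kappa = E v(w') for the search problem (10.25).

    Assumes log utility and a discrete offer distribution (wages, probs).
    """
    accept = np.log(wages) / (1.0 - beta)   # value of accepting each offer forever
    kappa = accept @ probs                  # initial guess: accept every offer
    for _ in range(max_iter):
        reject = np.log(b) + beta * kappa   # value of rejecting, the same for all offers
        v = np.maximum(accept, reject)      # v(w) implied by the current kappa
        kappa_new = v @ probs               # updated expected value of a new draw
        done = abs(kappa_new - kappa) < tol
        kappa = kappa_new
        if done:
            break
    # Indifference condition (10.26), inverted for log utility.
    w_star = np.exp((1.0 - beta) * (np.log(b) + beta * kappa))
    return w_star, kappa

# Illustrative parameter values (assumptions).
wages = np.linspace(0.5, 2.0, 200)
probs = np.full(wages.size, 1.0 / wages.size)
w_star, kappa = reservation_wage(wages, probs, b=0.6, beta=0.95)
```

Because the update of κ is a contraction with modulus β, the loop converges from any starting guess. Raising b in this sketch raises the reservation wage, in line with the indifference condition.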
Exercise 10.4
Write a program to solve (10.25) using the approach suggested above.
10.6.2 Estimation of the Labor Search Model
There is now a large literature on the estimation of these models. Here we focus
on estimating the simple model given above and then discuss other parts of the
literature.
The theory implies that there exists a reservation wage that depends on the
underlying parameters of the search problem: w∗(Θ).185 Suppose that the researcher
has data on a set of I individuals over T periods. In particular, suppose that an
observation for agent i in period t is zit ∈ {0, 1} where zit = 0 means that the agent
is searching and zit = 1 means that the agent has a job. For purposes of discussion,
we assume that the model is correct: once an agent has a job, he keeps it forever.
Consider then the record for agent i who, say, accepted a job in period k + 1.
According to the model, the likelihood of this is
$$F(w^*)^k \,(1 - F(w^*)). \tag{10.28}$$
The likelihood function for this problem is equivalent to the coin flipping exam-
ple that we introduced in Chapter 4. There we saw that the likelihood function
would provide a way to estimate the probability of “heads” but would not allow the
researcher to identify the parameters that jointly determine this probability.
The same point is true for the search problem. Using (10.28) for all agents in
the sample, we can represent the likelihood of observing the various durations of
search. But, in the end, the likelihood will only depend on the vector Θ through
w∗.
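A small sketch makes the identification point concrete. Writing q = F(w∗) for the per-period rejection probability, the sample likelihood built from (10.28) depends on the data only through q, and its maximizer has a closed form. The function names and the sample of durations below are hypothetical.

```python
import numpy as np

def log_likelihood(q, durations):
    """Sample log-likelihood from (10.28), with q = F(w*).

    An agent who rejects k offers and then accepts contributes q**k * (1 - q).
    """
    k = np.asarray(durations, dtype=float)
    return float(np.sum(k * np.log(q) + np.log(1.0 - q)))

def mle_rejection_prob(durations):
    """Closed-form maximizer: q_hat = mean(k) / (1 + mean(k))."""
    kbar = float(np.mean(durations))
    return kbar / (1.0 + kbar)

# Hypothetical numbers of rejected offers before acceptance, one per agent.
durations = [0, 1, 2, 3, 4]
q_hat = mle_rejection_prob(durations)
ll_hat = log_likelihood(q_hat, durations)
```

Any two parameter vectors Θ that imply the same value of F(w∗(Θ)) give the same likelihood, so the durations alone pin down q but not the individual elements of Θ.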
Wolpin (1987) estimates a version of this search model with a finite horizon
and costly search. This implies, among other things, that the reservation wage is
not constant as the problem is no longer stationary. Instead, he argues that the
reservation wage falls over time.186 This time variation in the reservation wage is
useful for identification since it creates time variation in the acceptance probability
for given Θ.
Wolpin (1987) also assumes that agents receive an offer each period with a prob-
ability less than one. In order to estimate the model, he specifies a function for the
likelihood that an agent receives an offer in a given period. This probability is allowed
to depend on the duration of unemployment.
Wolpin uses data on both duration to employment and accepted wages. The ad-
dition of wage data is interesting for a couple of reasons. First, the lowest accepted
wage yields an upper bound on the reservation wage. Second, the researcher generally
observes the accepted wage but not the offered wage. Thus there is an interesting
problem of deducing the wage distribution from data on accepted wages.
Wolpin (1987) estimates the model using a panel from the 1979 NLS youth
cohort. In doing so, he allows for measurement error in the wage and also specifies
a distribution for wage offers. Among other things, he finds that a log-normal
distribution of wages fits better than a normal distribution. Further, the estimated
hazard function (giving the likelihood of accepting a job after j periods of search)
mimics the negative slope of that found in the data.
10.6.3 Extensions
Of course, much has been accomplished in the search literature in recent years.
This includes introducing equilibrium aspects to the problem so that the wage dis-
tribution is not completely exogenous. Other contributions introduce bargaining
and search intensity, such as Eckstein and Wolpin (1995). Postel-Vinay and Robin
(2002) develop an equilibrium model where the distribution of wage offers is endogenous
to the model and results from heterogeneous workers and firms and from
frictions in the matching process. The model is then estimated on French data by
maximum likelihood techniques.
The simple model of labor search (10.25) can be extended to include transitions
into unemployment, learning by doing and experience effects, as well as the effect
of unobserved heterogeneity. The model of labor search can also be extended to
education choices. The education choices can be a function of an immediate cost of
education and the future rewards in terms of increased wages. Eckstein and Wolpin
(1999) develop such a model.
Wages and Experience
The model in (10.25) can also be extended to understand why wages are increasing
in age. An important part of the labor literature has tried to understand this
phenomenon. This increase can come from two sources, either through an increase
in productivity through general experience or possibly seniority within the firm, or
through labor market mobility and on the job search. Altonji and Shakotko (1987),
Topel (1991), Altonji and Williams (1997) and Dustmann and Meghir (2001) explore
these issues, although in a non structural framework.
Distinguishing the effect of experience from seniority is mainly done by com-
paring individuals with similar experience but with different seniority. However,
seniority depends on the job to job mobility which is a choice for the agent, possi-
bly influenced by heterogeneity in the return to experience. Hence seniority (and
experience) has to be considered as an endogenous variable. It is difficult to find
good instruments which can deal with the endogeneity. Altonji and Shakotko (1987)
instrument the seniority variable with deviations from job means, while Dustmann
and Meghir (2001) use workers who are fired when the whole plant closes down as
an exogenous event.
We present a structural model below which can potentially be used to distinguish
between the two sources of wage determinants. The wage is a function of labor
market experience X, of seniority in the firm S, of an unobserved fixed component
ε, which is possibly individual specific and a stochastic individual component η
which is specific to the match between the agent and the firm and is potentially
serially correlated. An employed individual earns a wage w(X, S, ε, η). At the end
of the period, the agent has a probability δ of being fired. If not, next period, the
individual receives a job offer represented by a wage w(X′, 0, ε, η̃′). This is compared
to a wage within the firm of w(X′, S′, ε, η′). The value of work and of unemployment
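are defined as:187
$$\begin{aligned} V^W(X, S, \varepsilon, \eta) &= w(X, S, \varepsilon, \eta) + \beta\delta V^U(X', \varepsilon) \\ &\quad + \beta(1 - \delta)\, E_{\eta'|\eta, \tilde{\eta}'} \max\left[ V^W(X', S', \varepsilon, \eta'),\; V^W(X', 0, \varepsilon, \tilde{\eta}') \right] \\ V^U(X, \varepsilon) &= b(X) + \beta E_{\eta'} \max\left[ V^U(X, \varepsilon),\; V^W(X, 0, \varepsilon, \eta') \right] \end{aligned} \tag{10.29}$$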
When employed, the labor market experience evolves as X′ = X +1 and seniority, S,
evolves in a similar way. When unemployed, the individual earns an unemployment
benefit b(X) and receives at the end of the period a job offer characterized by a wage
w(X, 0, ε, η′). The individual then chooses whether to accept the job or to remain
for at least an additional period in unemployment.
An important issue is the unobserved heterogeneity in the return to experience.
The model captures this with the term ε. Here, the identification of the different
sources of wage growth comes from the structural framework and no instruments
are needed. This model could be solved numerically using a value function iteration
approach and then estimated by maximum likelihood, integrating out the unob-
served heterogeneity. This can be done as in Heckman and Singer (1984) allowing
for mass point heterogeneity (see for example Eckstein and Wolpin (1999) for an
implementation in the context of a structural dynamic programming problem).
Equilibrium Search
Yashiv (2000) specifies and estimates a model of search and matching. The impor-
tant feature of this exercise is that it accounts for the behavior of both firms and
workers. In this model, unemployed workers search for jobs and firms with vacancies
search for workers.
Firms have stochastic profit functions and face costs of attracting workers through
the posting of vacancies. Workers have an objective of maximizing the discounted
expected earnings. Workers too face a cost of search and choose their search inten-
sity. These choices yield Euler equations which are used in the GMM estimation.
The key piece of the model is a matching function that brings the search of the
workers and the vacancies of the firms together. The matching function has inputs
of the vacancies opened by firms and the search intensity by the unemployed work-
ers. Though all agents (firms and workers) take the matching probability as given,
this probability is determined by their joint efforts in equilibrium. Empirically, an
important component of the analysis is the estimation of the matching function.
Yashiv (2000) finds that the matching function exhibits increasing returns, contrary
to the assumption made in much of the empirical literature on matching.
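To make the notion of increasing returns concrete, the sketch below assumes a Cobb-Douglas matching function, M = μ V^a S^b, where V is vacancies and S is aggregate search effort. This functional form and the parameter values are assumptions for illustration and are not taken from the text. Under this form, returns to scale are increasing when a + b > 1.

```python
import math


def matches(vacancies, search_effort, mu=1.0, a=0.6, b=0.6):
    """Assumed Cobb-Douglas matching function M = mu * V**a * S**b."""
    return mu * vacancies ** a * search_effort ** b


def scale_elasticity(vacancies, search_effort, factor=2.0, **params):
    """Numerical returns to scale: the log change in matches divided by the
    log change in inputs when both inputs are scaled by the same factor."""
    base = matches(vacancies, search_effort, **params)
    scaled = matches(factor * vacancies, factor * search_effort, **params)
    return math.log(scaled / base) / math.log(factor)


if __name__ == "__main__":
    # With a + b = 1.2, doubling both inputs more than doubles matches.
    print(scale_elasticity(10.0, 5.0))
    # With a + b = 1, the function has constant returns to scale.
    print(scale_elasticity(10.0, 5.0, a=0.5, b=0.5))
```

An elasticity above one corresponds to increasing returns; the constant-returns case with a + b = 1 is the restriction often imposed in empirical work.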
There is a very interesting link between this research and the discussion of dynamic
labor demand. While researchers have specified labor adjustment costs, the
exact source of these costs is less clear. The analysis in Yashiv (2000) is a step
towards bridging this gap: he provides an interpretation of labor adjustment costs
in the estimated search model.
10.7 Conclusions
The intention of this book was to describe a research methodology for bringing
dynamic optimization problems to the data. In this chapter, we have described
some ongoing research programs that utilize this methodology.
Still, there are many avenues for further contributions. In particular, the applica-
tions described here have generally been associated with the dynamic optimization
problem of a single agent. Of course, this agent may be influenced by relative prices
but these prices have been exogenous to the agent.
This does not present a problem as long as we are content to study individual
optimization. But, as noted in the motivation of the book, one of the potential gains
associated with the estimation of structural parameters is the confidence gained in
the examination of alternative policies. In that case, we need to include policy
induced variations in equilibrium variables. That is, we need to go beyond the
single-agent problem to study equilibrium behavior. While some progress has been
made on these issues, estimation of a dynamic equilibrium model with heterogeneous
agents and allowing for non-convex adjustment of factors of production and/or prices
still lies ahead.188
Related to this point, the models we have studied do not allow any strategic
interaction between agents. One might consider the estimation of a structure in
which a small set of agents interact in a dynamic game. The natural approach
is to compute a Markov-perfect equilibrium and use it as a basis for estimating
observed behavior by the agents. Pakes (2000) provides a thorough review of these
issues in the context of applications in industrial organization. Again, extensions to
macroeconomics lie ahead.
Bibliography
Abel, A. and J. Eberly (1994). “A Unified Model of Investment Under Uncertainty.”
American Economic Review, 84, 1369–84.
Abowd, J. and D. Card (1989). “On the Covariance Structure of Earnings and
Hours Changes.” Econometrica, 57, 411–445.
Adda, J. and R. Cooper (2000a). “Balladurette and Juppette: A Discrete
Analysis of Scrapping Subsidies.” Journal of Political Economy, 108(4), 778–806.
Adda, J. and R. Cooper (2000b). “The Dynamics of Car Sales: A Discrete
Choice Approach.” NBER WP No. 7785.
Adda, J., C. Dustmann, C. Meghir, and J.-M. Robin (2002). “Human
capital investment and job transitions.” mimeo University College London.
Adda, J. and J. Eaton (1997). “Borrowing with Unobserved Liquidity Con-
straints: Structural Estimation with an Application to Sovereign Debt.” mimeo,
Boston University.
Aguirregabiria, V. (1997). “Estimation of Dynamic Programming Models with
Censored Dependent Variables.” Investigaciones Economicas, 21, 167–208.
Aguirregabiria, V. (1999). “The Dynamics of Markups and Inventories in Retailing
Firms.” Review of Economic Studies, 66, 275–308.
Altonji, J. and R. Shakotko (1987). “Do Wages Rise with Job Seniority?”
Review of Economic Studies, 54(3), 437–459.
Altonji, J. and Williams (1997). “Do Wages Rise with Job Security?” Review
of Economic Studies, 54(179), 437–460.
Altug, S. (1989). “Time to Build and Aggregate Fluctuations: Some New Evi-
dence.” International Economic Review , 30, 889–920.
Amman, H. M., D. A. Kendrick, and J. Rust (1996). Handbook of Computa-
tional Economics, volume 1. Elsevier Science, North-Holland, Amsterdam, New
York and Oxford.
Arrow, K. J., T. Harris, and J. Marschak (1951). “Optimal Inventory
Policy.” Econometrica, 19(3), 250–72.
Attanasio, O. (2000). “Consumer Durables and Inertial Behaviour: Estimation
and Aggregation of (S, s) Rules for Automobile Purchases.” Review of Economic
Studies, 67(4), 667–696.
Attanasio, O., J. Banks, C. Meghir, and G. Weber (1999). “Humps and
Bumps in Lifetime Consumption.” Journal of Business and Economic Statistics,
17(1), 22–35.
Ball, L. and D. Romer (1990). “Real Rigidities and the Non-neutrality of
Money.” Review of Economic Studies, 57(2), 183–204.
Bar-Ilan, A. and A. Blinder (1988). “The Life-Cycle Permanent-Income Model
and Consumer Durables.” Annales d’Economie et de Statistique, (9).
Bar-Ilan, A. and A. S. Blinder (1992). “Consumer Durables : Evidence on the
Optimality of usually doing Nothing.” Journal of Money, Credit and Banking,
24, 258–272.
Baxter, M. (1996). “Are Consumer Durables Important for Business Cycles?”
The Review of Economics and Statistics, 77, 147–55.
Baxter, M. and R. King (1991). “Production Externalities and Business Cy-
cles.” Federal Reserve Bank of Minneapolis, Discussion Paper 53 .
Bellman, R. (1957). Dynamic Programming. Princeton University Press.
Benassy, J. (1982). The Economics of Market Disequilibrium. NY: Academic
Press.
Benveniste, L. and J. Scheinkman (1979). “On the differentiability of the
value function in dynamic models of economics.” Econometrica, 47(3), 727–732.
Benhabib, J. and R. Farmer (1994). “Indeterminacy and Increasing Returns.”
Journal of Economic Theory, 63, 19–41.
Bernanke, B. (1984). “Permanent Income, Liquidity and Expenditures on Au-
tomobiles: Evidence from Panel Data.” Quarterly Journal of Economics, 99,
587–614.
Bernanke, B. (1985). “Adjustment Costs, Durables and Aggregate Consump-
tion.” Journal of Monetary Economics, 15, 41–68.
Bertola, G. and R. J. Caballero (1990). “Kinked Adjustment Cost and Ag-
gregate Dynamics.” In NBER Macroeconomics Annual , edited by O. J. Blanchard
and S. Fischer. MIT Press, Cambridge, Mass.
Bertsekas, D. (1976). Dynamic Programming and Stochastic Control . Academic
Press.
Bils, M. (1987). “The Cyclical Behavior of Marginal Cost and Price.” American
Economic Review , 77, 838–55.
Bils, M. and P. Klenow (2002). “Some Evidence on the Importance of Sticky
Prices.” NBER Working Paper No. 9069.
Blackwell, D. (1965). “Discounted Dynamic Programming.” Annals of Mathe-
matical Statistics, 36, 226–35.
Blanchard, O. and N. Kiyotaki (1987). “Monopolistic Competition and the
Effects of Aggregate Demand.” American Economic Review , 77, 647–66.
Blinder, A. (1986). “Can the Production Smoothing Model of Inventory Behavior
be Saved?” Quarterly Journal of Economics, 101, 431–53.
Blinder, A. and L. Maccini (1991). “Taking Stock: A Critical Assessment of
Recent Research on Inventories.” Journal of Economic Perspectives, 5(1), 73–96.
Blundell, R., M. Browning, and C. Meghir (1994). “Consumer Demand
and the Life-Cycle Allocation of Household Expenditures.” Review of Economic
Studies, 61, 57–80.
Braun, R. (1994). “Tax Disturbances and Real Economic Activity in Post-War
United States.” Journal of Monetary Economics, 33.
Bryant, J. (1983). “A Simple Rational Expectations Keynes-Type Model.” Quar-
terly Journal of Economics, 97, 525–29.
Caballero, R. (1999). “Aggregate Investment.” In Handbook of Macroeconomics,
edited by J. Taylor and M. Woodford. North Holland.
Caballero, R. and E. Engel (1993a). “Heterogeneity and Output Fluctuation
in a Dynamic Menu-Cost Economy.” Review of Economic Studies, 60, 95–119.
Caballero, R. and E. Engel (1993b). “Heterogeneity and Output Fluctuations
in a Dynamic Menu-Cost Economy.” Review of Economic Studies, 60, 95–119.
Caballero, R., E. Engel, and J. Haltiwanger (1995). “Plant Level Adjustment
and Aggregate Investment Dynamics.” Brookings Papers on Economic
Activity, 2, 1–39.
Caballero, R., E. Engel, and J. Haltiwanger (1997). “Aggregate Employ-
ment Dynamics: Building From Microeconomic Evidence.” American Economic
Review , 87, 115–37.
Caballero, R. J. (1993). “Durable Goods: An Explanation for their Slow Ad-
justment.” Journal of Political Economy, 101, 351–384.
Campbell, J. and G. Mankiw (1989). “Consumption, Income and Interest
Rates: Reinterpreting the Time Series Evidence.” In NBER Macroeconomics
Annual 1989, edited by Olivier Blanchard and Stanley Fischer, pages 1–50. Chicago
University Press.
Caplin, A. and J. Leahy (1991). “State Dependent Pricing and the Dynamics
of Money and Output.” Quarterly Journal of Economics, 106, 683–708.
Caplin, A. and J. Leahy (1997). “Durable Goods Cycles.” mimeo, Boston
University.
Carroll, C. D. (1992). “The Buffer-Stock Theory of Saving : Some Macroeco-
nomic Evidence.” Brookings Papers on Economic Activity, 2, 61–156.
Cecchetti, S. (1986). “The Frequency of Price Adjustment: A Study of Newsstand
Prices of Magazines.” Journal of Econometrics, 31, 255–74.
Chirinko, R. (1993). “Business Fixed Investment Spending.” Journal of Economic
Literature, 31, 1875–1911.
Christiano, L. (1988). “Why Does Inventory Investment Fluctuate So Much?”
Journal of Monetary Economics, 21(2), 247–80.
Christiano, L. and M. Eichenbaum (1992). “Current Real-Business-Cycle The-
ories and Aggregate Labor Market Fluctuations.” American Economic Review ,
82, 430–50.
Cooper, R. (1999). Coordination Games: Complementarities and Macroeco-
nomics. Cambridge University Press.
Cooper, R. (2002). “Estimation and Identification of Structural Parameters in
the Presence of Multiple Equilibria.” NBER Working Paper No. 8941.
Cooper, R. and J. Ejarque (2000). “Financial Intermediation and Aggregate
Fluctuations: A Quantitative Analysis.” Macroeconomic Dynamics, 4, 423–447.
Cooper, R. and J. Ejarque (2001). “Exhuming Q: Market Power vs. Capital
Market Imperfections.” NBER Working Paper .
Cooper, R. and J. Haltiwanger (1993). “On the Aggregate Implications of
Machine Replacement: Theory and Evidence.” American Economic Review , 83,
360–82.
Cooper, R. and J. Haltiwanger (2000). “On the Nature of the Capital Adjustment
Process.” NBER Working Paper No. 7925.
Cooper, R., J. Haltiwanger, and L. Power (1999). “Machine Replacement
and the Business Cycle: Lumps and Bumps.” American Economic Review , 89,
921–946.
Cooper, R. and A. Johri (1997). “Dynamic Complementarities: A Quantitative
Analysis.” Journal of Monetary Economics, 40, 97–119.
Cooper, R. and J. Willis (2001). “The Economics of Labor Adjustment: Mind the
Gap.” NBER Working Paper No. 8527.
Cooper, R. W. and J. C. Haltiwanger (1992). “Macroeconomic implications
of production bunching.” Journal of Monetary Economics, 30(1), 107–27.
De Boor, C. (1978). A Practical Guide to Splines. Springer-Verlag, New York.
Deaton, A. (1991). “Savings and Liquidity Constraints.” Econometrica, 59, 1221–
1248.
Dotsey, M., R. King, and A. Wolman (1999). “State-Dependent Pricing and
the General Equilibrium Dynamics of Prices and Output.” Quarterly Journal of
Economics, 114, 655–90.
Duffie, D. and K. Singleton (1993). “Simulated Moment Estimation of Markov
Models of Asset Prices.” Econometrica, 61(4), 929–952.
Dustmann, C. and C. Meghir (2001). “Wages, experience and seniority.” IFS
working paper W01/01.
Eberly, J. C. (1994). “Adjustment of Consumers’ Durables Stocks : Evidence
from Automobile Purchases.” Journal of Political Economy, 102, 403–436.
Eckstein, Z. and K. Wolpin (1989). “The Specification and Estimation of
Dynamic Stochastic Discrete Choice Models.” Journal of Human Resources, 24,
562–98.
Eckstein, Z. and K. Wolpin (1995). “Duration to First Job and the Return
to Schooling: Estimates from a Search-Matching Model.” Review of Economic
Studies, 62, 263–286.
Eckstein, Z. and K. I. Wolpin (1999). “Why Youths Drop Out of High School:
The Impact of Preferences, Opportunities and Abilities.” Econometrica, 67(6),
1295–1339.
Eichenbaum, M. (1989). “Some Empirical Evidence on the Production Level
and Production Cost Smoothing Models of Inventory Investment.” American
Economic Review , 79(4), 853–64.
Eichenbaum, M., L. Hansen, and K. Singleton (1988). “A Time Series
Analysis of Representative Agent Models of Consumption and Leisure Choice
Under Uncertainty.” Quarterly Journal of Economics, 103, 51–78.
Eichenbaum, M. and L. P. Hansen (1990). “Estimating Models with Intertem-
poral Substitution Using Aggregate Time Series Data.” Journal of Business and
Economic Statistics, 8, 53–69.
Erickson, T. and T. Whited (2000). “Measurement Error and the Relationship
Between Investment and Q.” Journal of Political Economy, 108, 1027–57.
Farmer, R. and J. T. Guo (1994). “Real Business Cycles and the Animal Spirits
Hypothesis.” Journal of Economic Theory, 63, 42–72.
Fermanian, J.-D. and B. Salanié (2001). “A Nonparametric Simulated Maxi-
mum Likelihood Estimation Method.” mimeo CREST-INSEE.
Fernández-Villaverde, J. and D. Krueger (2001). “Consumption and Sav-
ing over the Life Cycle: How Important are Consumer Durables?” mimeo, Stan-
ford University.
Flavin, M. (1981). “The Adjustment of Consumption to Changing Expectations
about future Income.” Journal of Political Economy, 89, 974–1009.
Gallant, R. A. and G. Tauchen (1996). “Which Moments to Match?” Econo-
metric Theory, 12(4), 657–681.
Gilchrist, S. and C. Himmelberg (1995). “Evidence on the role of cash flow
for Investment.” Journal of Monetary Economics, 36, 541–72.
Gomes, J. (2001). “Financing Investment.” American Economic Review , 91(5),
1263–1285.
Gourieroux, C. and A. Monfort (1996). Simulation-Based Econometric Meth-
ods. Oxford University Press.
Gourieroux, C., A. Monfort, and E. Renault (1993). “Indirect Inference.”
Journal of Applied Econometrics, 8, S85–S118.
Gourinchas, P.-O. and J. Parker (2001). “Consumption Over the Life Cycle.”
Forthcoming, Econometrica.
Greenwood, J., Z. Hercowitz, and G. Huffman (1988). “Investment, Ca-
pacity Utilization and the Real Business Cycle.” American Economic Review , 78,
402–17.
Grossman, S. J. and G. Laroque (1990). “Asset Pricing and Optimal Portfolio
Choice in the Presence of Illiquid Durable Consumption Goods.” Econometrica,
58, 25–51.
Hajivassiliou, V. A. and P. A. Ruud (1994). “Classical Estimation Methods
for LDV Models Using Simulation.” In Handbook of Econometrics, edited by D.
McFadden and R. Engle, volume 4, pages 2383–2441. North-Holland, Amsterdam.
Hall, G. (1996). “Overtime, Effort and the propagation of business cycle shocks.”
Journal of Monetary Economics, 38, 139–60.
Hall, G. (2000). “Non-convex costs and capital utilization: A study of production
scheduling at automobile assembly plants.” Journal of Monetary Economics, 45,
681–716.
Hall, G. and J. Rust (2000). “An empirical model of inventory investment
by durable commodity intermediaries.” Carnegie-Rochester Conference Series on
Public Policy, 52, 171–214.
Hall, R. E. (1978). “Stochastic Implications of the Life Cycle-Permanent Income
Hypothesis: Theory and Evidence.” Journal of Political Economy, 86, 971–987.
Hamermesh, D. (1989). “Labor Demand and the Structure of Adjustment Costs.”
American Economic Review , 79, 674–89.
Hamermesh, D. (1993). Labor Demand . Princeton University Press.
Hamermesh, D. and G. Pfann (1996). “Adjustment Costs in Factor Demand.”
Journal of Economic Literature, 34, 1264–92.
Hansen, G. (1985). “Indivisible Labor and the Business Cycle.” Journal of Mon-
etary Economics, 16, 309–27.
Hansen, G. and T. Sargent (1988). “Straight time and overtime in Equilib-
rium.” Journal of Monetary Economics, 21, 281–308.
Hansen, L. P., E. McGrattan, and T. Sargent (1994). “Mechanics of Form-
ing and Estimating Dynamic Linear Economies.” Federal Reserve Bank of Min-
neapolis, Staff Report 182 .
Hansen, L. P. and K. J. Singleton (1982). “Generalized Instrumental Variables
Estimation of Nonlinear Rational Expectations Models.” Econometrica, 50, 1269–
1286.
Hayashi, F. (1982). “Tobin’s marginal Q and average Q: A neoclassical interpre-
tation.” Econometrica, 50, 215–24.
Heckman, J. and B. Singer (1984). “A Method for Minimizing the Impact of
Distributional Assumptions in Econometric Models for Duration Data.” Econo-
metrica, 52(2), 271–320.
House, C. and J. Leahy (2000). “An sS model with Adverse Selection.” NBER
WP No. 8030.
Hubbard, G. (1994). “Investment under Uncertainty: Keeping One’s Options
Open.” Journal of Economic Literature, 32(4), 1816–1831.
John, A. and A. Wolman (1999). “Does State-Dependent Pricing Imply Coor-
dination Failure?” Federal Reserve Bank of Richmond .
Judd, K. (1992). “Projection Methods for Solving Aggregate Growth Models.”
Journal of Economic Theory, 58, 410–452.
Judd, K. (1996). “Approximation, Perturbation and Projection Methods in Eco-
nomic Analysis.” In Handbook of Computational Economics, edited by H. M.
Amman, D. A. Kendrick, and J. Rust. Elsevier Science, North-Holland.
Judd, K. (1998). Numerical methods in economics. MIT Press, Cambridge and
London.
Kahn, A. and J. Thomas (2001). “Nonconvex Factor Adjustments in Equilib-
rium Business Cycle Models: Do Nonlinearities Matter?” mimeo, University of
Minnesota.
Kahn, J. (1987). “Inventories and the Volatility of Production.” American Eco-
nomic Review , 77(4), 667–79.
Keane, M. P. and K. I. Wolpin (1994). “The Solution and Estimation of
Discrete Choice Dynamic Programming Models by Simulation and Interpolation:
Monte Carlo Evidence.” The Review of Economics and Statistics, pages 648–672.
King, R., C. Plosser, and S. Rebelo (1988). “Production, Growth and Busi-
ness Cycles I. The Basic Neoclassical Model.” Journal of Monetary Economics,
21, 195–232.
Kocherlakota, N., B. F. Ingram, and N. E. Savin (1994). “Explaining
Business Cycles: A Multiple Shock Approach.” Journal of Monetary Economics,
34, 415–28.
Kydland, F. and E. Prescott (1982). “Time To Build and Aggregate Fluctu-
ations.” Econometrica, 50, 1345–70.
Laffont, J.-J., H. Ossard, and Q. Vuong (1995). “Econometrics of First-
Price Auctions.” Econometrica, 63, 953–980.
Lam, P. (1991). “Permanent Income, Liquidity and Adjustments of Automobile
Stocks: Evidence from Panel Data.” Quarterly Journal of Economics, 106, 203–
230.
Laroque, G. and B. Salanié (1989). “Estimation of Multi-Market Fix-Price
Models: An Application of Pseudo Maximum Likelihood Methods.” Econometrica, 57(4),
831–860.
Laroque, G. and B. Salanié (1993). “Simulation Based Estimation Models
with Lagged Latent Variables.” Journal of Applied Econometrics, 8, S119–S133.
Lee, B.-S. and B. F. Ingram (1991). “Simulation Estimation of Time-Series
Models.” Journal of Econometrics, 47, 197–205.
Lerman, S. and C. Manski (1981). “On the Use of Simulated Frequencies to
Approximate Choice Probabilities.” In Structural Analysis of Discrete Data with
Econometric Applications, edited by C. Manski and D. McFadden, pages 305–319.
MIT Press, Cambridge.
Ljungqvist, L. and T. J. Sargent (2000). Recursive Macroeconomic Theory.
MIT.
MaCurdy, T. E. (1981). “An Empirical Model of Labor Supply in a Life-Cycle
Setting.” Journal of Political Economy, 89(6), 1059–1085.
Mankiw, G. N. (1982). “Hall’s Consumption Hypothesis and Durable Goods.”
Journal of Monetary Economics, 10, 417–425.
Manski, C. (1993). “Identification of Endogenous Social Effects: The Reflection
Problem.” Review of Economic Studies, 60(3), 531–42.
McCall, J. (1970). “Economics of Information and Job Search.” Quarterly Journal
of Economics, 84(1), 113–26.
McFadden, D. (1989). “A Method of Simulated Moments for Estimation of Dis-
crete Response Models Without Numerical Integration.” Econometrica, 57, 995–
1026.
McFadden, D. and P. A. Ruud (1994). “Estimation by Simulation.” The Review
of Economics and Statistics, 76(4), 591–608.
McGrattan, E. (1994). “The Macroeconomic Effects of Distortionary Taxes.”
Journal of Monetary Economics, 33, 573–601.
McGrattan, E. R. (1996). “Solving the Stochastic Growth Model with a Finite
Element Method.” Journal of Economic Dynamics and Control , 20, 19–42.
Meghir, C. and G. Weber (1996). “Intertemporal Non-Separability or Borrow-
ing Restrictions ? A disaggregate Analysis Using US CEX Panel.” Econometrica,
64(5), 1151–1181.
Miranda, M. J. and P. G. Helmberger (1988). “The Effects of Commodity
Price Stabilization Programs.” American Economic Review , 78(1), 46–58.
Newey, W. K. and K. D. West (1987). “A Simple, Positive, Semi-Definite,
Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econo-
metrica, 55, 703–708.
Nickell, S. (1978). “Fixed Costs, Employment and Labour Demand over the
Cycle.” Economica, 45.
Pakes, A. (1994). “Dynamic Structural Models, Problems and Prospects: Mixed
Continuous discrete controls and market interactions.” In Advances in Econo-
metrics, Sixth World Congress, edited by C. Sims, pages 171–259. Cambridge
University Press.
Pakes, A. (2000). “A Framework for Applied Dynamic Analysis in I.O.” NBER
Paper # 8024.
Pakes, A. and D. Pollard (1989). “Simulation and the Asymptotics of Opti-
mization Estimators.” Econometrica, 57, 1027–1057.
Pfann, G. and F. Palm (1993). “Asymmetric Adjustment Costs in Non-linear
Labour Models for the Netherlands and U.K. Manufacturing Sectors.” Review of
Economic Studies, 60, 397–412.
Postel-Vinay, F. and J.-M. Robin (2002). “Equilibrium Wage Dispersion with
Worker and Employer Heterogeneity.” Econometrica.
Press, W., B. Flannery, S. Teukolsky, and W. Vetterling (1986). Nu-
merical Recipes: The Art of Scientific Computing.
Ramey, V. and M. Shapiro (2001). “Displaced Capital.” Journal of Political
Economy, 109, 958–92.
Reddy, J. (1993). An Introduction to the Finite Element Method . McGraw-Hill,
New York.
Rogerson, R. (1988). “Indivisible Labor, Lotteries and Equilibrium.” Journal of
Monetary Economics, 21, 3–16.
Rust, J. (1985). “Stationary Equilibrium in a Market for Durable Assets.” Econo-
metrica, 53(4), 783–805.
Rust, J. (1987). “Optimal Replacement of GMC Bus Engines: an Empirical Model
of Harold Zurcher.” Econometrica, 55(5), 999–1033.
Rust, J. and C. Phelan (1997). “How Social Security and Medicare Affect
Retirement Behavior in a World of Incomplete Markets.” Econometrica, 65(4),
781–832.
Sakellaris, P. (2001). “Patterns of Plant Adjustment.” Working Paper #2001-
05, Finance and Economics Discussion Series, Division of Research and Statistics
and Monetary Affairs, Federal Reserve Board, Washington D.C.
Sargent, T. (1978). “Estimation of Dynamic Labor Demand Schedules under
Rational Expectations.” Journal of Political Economy, 86(6), 1009–1044.
Sargent, T. (1987). Dynamic Macroeconomic Theory. Harvard University Press.
Scarf, H. (1959). “The Optimality of (S,s) Policies in the Dynamic Inventory
Problem.” In Mathematical Methods in the Social Sciences, edited by K. Arrow,
S. Karlin, and P. Suppes, pages 196–202. Stanford University Press.
Shapiro, M. (1986). “The Dynamic Demand for Labor and Capital.” Quarterly
Journal of Economics, 101, 513–42.
Smith, A. (1993). “Estimating Nonlinear Time-Series Models using Simulated
Vector Autoregressions.” Journal of Applied Econometrics, 8, S63–84.
Stokey, N. and R. Lucas (1989). Recursive Methods in Economic Dynamics.
Harvard University Press.
Tauchen, G. (1986). “Finite State Markov-Chain Approximation to Univariate
and Vector Autoregressions.” Economics Letters, 20, 177–81.
Tauchen, G. (1990). “Solving the Stochastic Growth Model by Using Quadra-
ture Methods and Value-Function Iterations.” Journal of Business and Economic
Statistics, 8(1), 49–51.
Tauchen, G. and R. Hussey (1991). “Quadrature-Based Methods for Obtaining
Approximate Solutions to Nonlinear Asset Pricing Models.” Econometrica, 59,
371–396.
Taylor, J. B. and H. Uhlig (1990). “Solving Nonlinear Stochastic Growth
Models : A Comparison of Alternative Solution Methods.” Journal of Business
and Economic Statistics, 8, 1–17.
Thomas, J. (2000). “Is Lumpy Investment Relevant for the Business Cycle?”
manuscript, Carnegie-Mellon University, forthcoming, JPE.
Topel, R. (1991). “Specific Capital, Mobility, and Wages: Wages Rise with Job
Seniority.” Journal of Political Economy, 99(1), 145–176.
Whited, T. (1998). “Why Do Investment Euler Equations Fail?” Journal of
Business and Economic Statistics, 16(4), 479–488.
Willis, J. (2000a). “Estimation of Adjustment Costs in a Model of State-
Dependent Pricing.” Working Paper RWP 00-07, Federal Reserve Bank of Kansas
City.
Willis, J. (2000b). “General Equilibrium of Monetary Model with State Dependent
Pricing.” mimeo, Boston University.
Wolpin, K. (1987). “Estimating a Structural Search Model: The Transition from
School to Work.” Econometrica, 55(4), 801–18.
Wright, B. D. and J. C. Williams (1984). “The Welfare Effects of the Intro-
duction of Storage.” Quarterly Journal of Economics, 99(1), 169–192.
Yashiv, E. (2000). “The Determinants of Equilibrium Unemployment.” American
Economic Review , 90(5), 1297–1322.
Zeldes, S. (1989a). “Optimal Consumption with Stochastic Income : Deviations
from Certainty Equivalence.” Quarterly Journal of Economics, 104, 275–298.
Zeldes, S. P. (1989b). “Consumption and Liquidity Constraints : An Empirical
Investigation.” Journal of Political Economy, 97, 305–346.
Index
adjustment costs, see costs of adjustment
aggregate implications
durable purchases, 192
machine replacement, 221
menu costs, 260
aggregation, 192, 221
asymptotic properties, 85–104
GMM, 87
indirect inference, 102
maximum likelihood, 91
simulated maximum likelihood, 99
simulated method of moments, 95
autocorrelation, 92
autoregressive process, 62
average Q, 206
Balladurette, 195
Bellman equation
definition, 19, 31
example, 90, 92, 120, 159, 170, 180,
205, 274
numerical solution, 41, 60
Blackwell’s conditions, 32
borrowing restrictions
consumption, 158, 169
investment, 214
cake eating problem
dynamic discrete choice, 26
example, 16
finite horizon, 14
infinite horizon, 18
example, 21
infinite horizon with taste shocks,
24
overview, 10–17
calibration
stochastic growth model, 135
capital
capital accumulation, 200
costs of adjustment, 202
convex, 203
non-convex, 215
quadratic, 205
imperfection in capital markets, 209
labor adjustment, 270–271
car sales, 196
certainty equivalence, 162
CES (constant elasticity of substitu-
tion), 256
Chebyshev polynomial, 49
coin flipping, 67
collocation method, 49
complementarities
technology, 272–273
technology and stochastic growth,
142
consumption
borrowing restrictions, 158
durables, see durable consumption
endogenous labor supply, 167
evidence, 162, 165, 173
GMM estimation, 165
infinite horizon, 159–163
life cycle, 173–177
portfolio choice, 156, 163
random walk, 162
smoothing, 149–154
stochastic income, 160
two period model, 150–159
contraction mapping, 32
control space, 42
control variable, 19
control vector, 29
convergence rate, 44
convex adjustment costs
durables, 183
investment, 203
costs of adjustment
capital, 202
convex, 203
non-convex, 215
quadratic, 205
capital and labor, 270–271
durables
non convex, 184
quadratic, 183
employment
asymmetric adjustment costs, 243
convex and non-convex adjustment costs, 250
non-convex, 241
piece-wise linear, 239
quadratic, 232
CRRA (constant relative risk aversion),
41, 161
decentralisation, 128
demand (and supply), 79
depreciation rate, 138, 179, 182
discount factor, 10, 150
discounting, 30, 33
discrete cake eating problem
estimation, 90, 92, 94, 97, 101
numerical implementation, 51
overview, 26–28
distortionary taxes, 147
durable consumption, 178–198
dynamic discrete choice, 189
estimation
dynamic discrete choice, 193
with quadratic utility, 182
irreversibility, 187
non convex costs, 184–198
PIH, 179–184
scrapping subsidies, 195–198
duration
search, 276
duration model, 101
dynamic discrete choice
durables, 189
dynamic labor demand, 229–254
estimation, 250
linear quadratic specification, 237
non-convex adjustment costs, 241
partial adjustment, 245
piecewise linear adjustment costs,
239
quadratic adjustment costs, 232
dynamic programming theory
Blackwell’s sufficient conditions, 32
cake eating example, 10, 16
control vector, 29
discounting, 32
finite horizon cake eating problem,
14
general formulation, 28
infinite horizon cake eating prob-
lem, 18
monotonicity, 32
optimal stopping problem, 26
overview, 7
state vector, 29
stochastic models, 35
transition equation, 29
value function, 13
value function iteration, 33
education choice, 277
efficient method of moments, 102
elasticity
demand curve, 235
intertemporal, 166, 176
labor supply, 169
employment adjustment, 229–253
asymmetric adjustment costs, 243
convex and non-convex adjustment costs, 250
gap approach, 244
general functional equation, 230
non-convex adjustment costs, 241
partial adjustment model, 245
piece-wise linear adjustment costs,
239
quadratic adjustment costs, 232
quadratic costs of adjustment
simulated example, 234
Sargent’s linear quadratic model,
237
endogenous labor supply
consumption, 167
growth, 130
equilibrium analysis, 272
equilibrium search, 279
ergodicity, 81, 83
Euler equation
consumption, 152, 174
consumption and borrowing con-
straints, 170
consumption and portfolio choice,
164
durables, 180
employment adjustment, 237
estimation, 87, 212
finite horizon cake eating problem,
11
investment, 203, 204
non-stochastic growth model, 111
projection methods, 46–51
stochastic growth model, 125
experience, return to, 277
exponential model, 101
finite element method, 50
functional equation, 19
Galerkin method, 49
gap approach
employment adjustment, 244
Gauss-Legendre quadrature, 61
generalized method of moments
example
capital and quadratic adjustment
costs, 212
consumption, 87, 165, 172
employment adjustment, 243
stochastic growth model, 137
orthogonality restriction, 86
theory, 70–72, 86–89
government spending, 147
heterogeneity, 261
Howard’s improvement algorithm, 45
identification, 76, 84–85
imperfection, in capital market, 158,
169, 209, 214
inaction, 215, 239
income specification, 173
indirect inference
example
cake eating problem, 101
dynamic capital demand, 224
Q theory of investment, 210
stochastic growth model, 139
supply and demand, 83
specification test, 104
theory, 74–76, 100–104
indirect utility, 7
infinite horizon
consumption model, 159–163
information matrix, 92
instrumental variable, 80, 89
integration methods, 60–64
quadrature methods, 61
interpolation methods, 58–60
least squares, 58
linear, 59
splines, 60
intertemporal elasticity of substitution,
166, 176
inventory policy, 263–270
prices, 267
production smoothing, 263
investment
borrowing restrictions, 214
convex adjustment costs, 203
convex and non-convex adjustment
costs, 224
costs of adjustment, 202
Euler equation with no adjustment
costs, 203
functional equation, 201
general formulation, 200
GMM estimation of quadratic ad-
justment cost model, 212
irreversibility, 223
machine replacement problem, 219
aggregate implications, 221
maximum likelihood estimation, 227
no adjustment costs, 202
non-convex adjustment costs, 215
Q theory, 205
evidence, 207
irreversibility
durables, 185, 187–188
investment, 223
IV, see instrumental variable
job mobility, 278
job offer, 276
Juppette, see Balladurette
labor market
experience, 277–280
mobility, 277–280
search, 273–280
transitions, 277–280
wage, 277–280
labor supply
endogenous, 130, 167–169
least squares interpolation, 58
life cycle consumption, 173–177
likelihood, 68, 82, 90
linear interpolation, 59
linear quadratic model of labor demand,
237
linearization, 122
logit model, 75
machine replacement
aggregate implications, 221
model, 219
magazine prices, 259
mapping, 32
marginal q, 204, 206
market power, 209, 235, 249, 256, 260
markov chain, 69
as approximation, 62
example, 197, 210, 235
simulation, 64
maximum likelihood, 68–70, 82–83, 90–
92
asymptotic properties, 91
example
coin flipping, 69
discrete cake eating problem, 90
employment adjustment, 238
investment, 227
stochastic growth model, 141
supply and demand, 82
simulated, see simulated maximum
likelihood
menu costs, 255–263
aggregate implications, 260
evidence, 259
model, 256
method of moments, 70, 80, 86
orthogonality condition, 81
misspecification, 172, 209
mobility, 277
moment calibration, 93
moments
stochastic growth model, 135
monotonicity, 32
multiple sector model, 144
Newey-West estimator, 89
non-convex adjustment costs
durables, 184
employment, 241
investment, 215
non-stochastic growth model, 109
Euler equation, 111
example, 112
MATLAB code, 114
preferences, 110
technology, 110
value function, 110
numerical integration, 60–64
quadrature methods, 61
optimal stopping problem, 26, 192, 220
optimal weighting matrix, 88, 95
orthogonality restriction, 81, 86
overidentification test, 89
partial adjustment model
employment, 245
permanent income hypothesis
durables, 179–184
permanent vs. transitory shocks, 132,
156, 162, 173, 242
PIH, see permanent income hypothesis
planner’s problem, 118
policy evaluation, 195–198
policy function
consumption, 171
definition, 21
policy function iterations, 45
policy rule, 47
portfolio choice, 156, 163
and durables, 187
price setting, 255–263
principle of optimality, 14, 15
production smoothing, 263
projection methods, 46
Q theory
evidence, 207
model, 205
quadratic adjustment costs
durables, 183
employment, 232
quadrature methods, 61
random walk in consumption
durables, 181
non durables, 161, 162
rate of convergence, 44
recursive equilibrium, 129
reduced form, 80
reservation wage, 274
return to
experience, 277
tenure, 277
sales of new cars, 196
score function, 91, 102
scrapping subsidies, 195
search model, 273–280
duration, 276
seniority, 277
sequence problem, 10
finite horizon, 10
serial correlation, 92
simulated maximum likelihood, 73
theory, 72, 98–100
simulated method of moments
asymptotic properties, 95
efficient method of moments, 102
example
cake eating problem, 94
consumption, 176
durables, 193–194
theory, 73, 94–96
simulated non linear least squares
example
cake eating problem, 97
durables, 193–194
theory, 96–98
simulation methods, 65
solution methods
linearization, 122
projection methods, 46–51
value function iteration, 41–45, 52–54, 125
specification test
GMM, 89
indirect inference, 104
spline interpolation, 60
[s,S] models, 185, 187
state space, 42, 52, 115
large, 54
state variable, 19
state vector, 29
stationarity, 19, 30
stochastic cake eating problem
projection methods approach, 46
value function approach, 40
stochastic growth model, 109
calibration, 135
confronting the data, 134–142
calibration, 135
GMM, 137
indirect inference, 139
maximum likelihood, 141
decentralization, 128
endogenous labor supply, 130
example, 125
functional equation, 120
GMM, 137
indirect inference, 139
intermediation shocks, 139
investment shocks, 139
linearization, 122
multiple sectors, 144
overview, 117
taste shocks, 146
technological complementarities, 142
technology, 119
value function iteration, 125, 133
stochastic income, 154, 160
stochastic returns, 163
supply and demand, 79
taste shock
aggregate, 189
cake eating problem, 24, 51, 52
durables, 189
in estimation, 90
stochastic growth model, 141, 146
tax credits, 199
taxes, 147, 153, 198
technological complementarities, 272,
see complementarities
tenure, return to, 277
transition equation, 19, 29
transition matrix, 25
transversality condition, 111
uncertainty
consumption/saving choice, 154–156,
160–163
unobserved heterogeneity, 187, 261, 278,
279
utility
quadratic, 182
utility function
adjustment costs, 183
CRRA, 41, 161
quadratic, 183
value function
implementation, 41, 52
value function iteration, 33
example, 192–193
non-stochastic growth model, 114
stochastic growth model, 125
VAR, 191
wage offer, 277
weighting matrix, 86, 88
Notes
1This exercise is described in some detail in the chapter on consumer durables
in this book.
2Some of the tools for numerical analysis are also covered in Ljungqvist and
Sargent (2000) and Judd (1996).
3Assume that there are J commodities in this economy. This presentation as-
sumes that you understand the conditions under which this optimization problem
has a solution and when that solution can be characterized by first-order conditions.
4For a very complete treatment of the finite horizon problem with uncertainty,
see Bertsekas (1976).
5Throughout, the notation $\{x_t\}_1^T$ is used to define the sequence $(x_1, x_2, \ldots, x_T)$ for
some variable x.
6This comes from the Weierstrass theorem. See Bertsekas (1976), Appendix B,
or Stokey and Lucas (1989), Chpt. 3, for a discussion.
7By the sequence approach, we mean solving the problem using the direct ap-
proach outlined in the previous section.
8As you may already know, stationarity is vital in econometrics as well. Thus
making assumptions of stationarity in economic theory has a natural counterpart
in empirical studies. In some cases, we will have to modify optimization problems
to ensure stationarity.
9To be careful, here we are adding shocks that take values in a finite and thus
countable set. See the discussion in Bertsekas (1976), Section 2.1, for an introduction
to the complexities of the problem with more general statements of uncertainty.
10For more details on Markov chains we refer the reader to Ljungqvist and Sargent
(2000).
11The evolution can also depend on the control of the previous period. Note too
that by appropriate rewriting of the state space, richer specifications of uncertainty
can be encompassed.
12This is a point that we return to below in our discussion of the capital accumu-
lation problem.
13Throughout we denote the conditional expectation of ε′ given ε as $E_{\varepsilon'|\varepsilon}$.
14Eckstein and Wolpin (1989) provide an extensive discussion of the formulation
and estimation of these problems in the context of labor applications.
15In the following chapter on the numerical approach to dynamic programming,
we study this case in considerable detail.
16This section is intended to be self-contained and thus repeats some of the ma-
terial from the earlier examples. Our presentation is by design not as formal as say
that provided in Bertsekas (1976) or Stokey and Lucas (1989). The reader interested
in more mathematical rigor is urged to review those texts and their many references.
17Ensuring that the problem is bounded is an issue in some economic applications,
such as the growth model. Often these problems are dealt with by bounding the
sets C and S.
18Essentially, this formulation inverts the transition equation and substitutes for
c in the objective function. This substitution is reflected in the alternative notation
for the return function.
19Some of the applications explored in this book will not exactly fit these con-
ditions either. In those cases, we will alert the reader and discuss the conditions
under which there exists a solution to the functional equation.
20The notation dates back at least to Bertsekas (1976).
21See Stokey and Lucas (1989) for a statement and proof of this theorem.
22Define $\sigma(s, s')$ as concave if
$\sigma(\lambda(s_1, s_1') + (1 - \lambda)(s_2, s_2')) \geq \lambda\sigma(s_1, s_1') + (1 - \lambda)\sigma(s_2, s_2')$
for all $0 < \lambda < 1$, where the inequality is strict if $s_1 \neq s_2$.
23As noted earlier, this structure is stronger than necessary but accords with the
approach we will take in our empirical implementation. The results reported in
Bertsekas (1976) require that Ψ is countable.
24We present additional code for this approach in the context of the nonstochastic
growth model presented in Chapter 5.
25In some applications, it can be useful to define a grid that is not uniformly
spaced; see the discrete cake eating problem in section 3.3.
26Popular orthogonal bases are Chebyshev, Legendre or Hermite polynomials.
27The polynomials are also defined recursively by $p_i(X) = 2X p_{i-1}(X) - p_{i-2}(X)$,
$i \geq 2$, with $p_0(X) = 1$ and $p_1(X) = X$.
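The recursion above can be sketched in a few lines of Python. This is only an illustration: the function name `chebyshev` is hypothetical and not taken from the book's code, and the comparison with cos(i arccos X) relies on the standard trigonometric characterization of Chebyshev polynomials on [−1, 1].

```python
import math

def chebyshev(order, x):
    """Evaluate p_0(x), ..., p_order(x) using the recursion
    p_i(x) = 2 x p_{i-1}(x) - p_{i-2}(x), with p_0(x) = 1 and p_1(x) = x."""
    values = [1.0, x]
    for _ in range(2, order + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: order + 1]

if __name__ == "__main__":
    x = 0.3
    for i, p in enumerate(chebyshev(5, x)):
        # On [-1, 1], p_i(x) should agree with cos(i * arccos(x)).
        print(i, p, math.cos(i * math.acos(x)))
```

Building the whole sequence at once is convenient when the polynomials serve as a basis, since every order up to the chosen degree is needed at each evaluation point.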
28This is in fact the structure of a probit model.
29This is not I since we have the restriction $\sum_i P_i = 1$.
30If we also want to estimate σD, σS and ρSD, we can include additional moments
such as E(p), E(q), V (p), V (q) or cov(p, q).
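31The variances of U1 and U2 are defined as:
$$\sigma_1^2 = \frac{\sigma_D^2 + \sigma_S^2 - 2\rho_{DS}}{(\alpha_p - \beta_p)^2}$$
$$\sigma_2^2 = \frac{\alpha_p^2\sigma_D^2 + \beta_p^2\sigma_S^2 - 2\alpha_p\beta_p\rho_{DS}}{(\alpha_p - \beta_p)^2}$$
and the covariance between U1 and U2 is:
$$\rho_{12} = \frac{\alpha_p\sigma_D^2 + \beta_p\sigma_S^2 - \rho_{DS}(\alpha_p + \beta_p)}{(\alpha_p - \beta_p)^2}$$
The joint density of U1 and U2 can be expressed as:
$$f(u_1, u_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{u_1^2}{\sigma_1^2} + \frac{u_2^2}{\sigma_2^2} + 2\rho u_1 u_2\right)\right]$$
with $\rho = \rho_{12}/(\sigma_1\sigma_2)$.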
32Here we view T as the length of the data for time series applications and as the
number of observations in a cross section.
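33For instance, if $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$ with $u_t \sim N(0, \sigma^2)$, the probability that the cake
is eaten in period 2 is:
$$\begin{aligned}
p_2 &= P(\varepsilon_1 < \varepsilon^*(W_1),\ \varepsilon_2 > \varepsilon^*(W_2)) \\
&= P(\varepsilon_1 < \varepsilon^*(W_1))\, P(\varepsilon_2 > \varepsilon^*(W_2) \mid \varepsilon_1 < \varepsilon^*(W_1)) \\
&= \Phi\left(\frac{\varepsilon_1^*(W_1)}{\sigma/\sqrt{1 - \rho^2}}\right) \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\varepsilon_2^*}^{+\infty} \int_{-\infty}^{\varepsilon_1^*} \exp\left(-\frac{1}{2\sigma^2}(u - \rho v)^2\right) du\, dv
\end{aligned}$$
If ρ = 0, the double integral reduces to a simple integral of the normal distribution.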
34For instance, $\mu(x) = [x, x^2]$ if one wants to focus on matching the mean and the
variance of the process.
35To see this, define θ∞, the solution to the minimization of the above criterion,
when the sample size T goes to infinity:
$$\begin{aligned}
\theta_\infty &= \arg\min_\theta \lim_T \frac{1}{T}\sum_{t=1}^{T} \left(x(u_t, \theta_0) - \bar{x}(\theta)\right)^2 \\
&= \arg\min_\theta E\left(x(u, \theta_0) - \bar{x}(\theta)\right)^2 \\
&= \arg\min_\theta E\left(x(u, \theta_0)^2 + \bar{x}(\theta)^2 - 2x(u, \theta_0)\bar{x}(\theta)\right) \\
&= \arg\min_\theta V(x(u, \theta_0)) + V(\bar{x}(\theta)) + \left(Ex(u, \theta_0) - E\bar{x}(\theta)\right)^2
\end{aligned}$$
This result holds as $Ex\bar{x} = Ex\,E\bar{x}$, i.e. the covariance between $u_t$ and $u_t^s$ is zero.
Differentiating the last line with respect to θ, we obtain the first order conditions
satisfied by θ∞:
$$\frac{\partial}{\partial\theta} V(\bar{x}(\theta_\infty)) + 2\frac{\partial}{\partial\theta} E\bar{x}(\theta_\infty)\left[E\bar{x}(\theta_\infty) - Ex(u, \theta_0)\right] = 0$$
If θ∞ = θ0, this first order condition is only satisfied if $\frac{\partial}{\partial\theta} V(\bar{x}(\theta_0)) = 0$, which is not
guaranteed. Hence, θ∞ is not necessarily a consistent estimator. This term depends
on the (gradient of the) variance of the variable, where the stochastic element is the
simulated shocks. Using simulated paths instead of the true realization of the shock
leads to this inconsistency.
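A small Monte Carlo sketch can make this argument concrete. Everything in it is an assumption chosen for illustration rather than taken from the text: the model x(u, θ) = θu, shocks drawn from N(1, 1), the value θ0 = 2, and the function name `simulate_estimators`. Under these assumptions the path-matching criterion has the closed-form minimizer Σ x_t u_t^s / Σ (u_t^s)², whose probability limit is θ0 E(u)E(u^s)/E((u^s)²) = θ0/2, while matching the mean recovers θ0.

```python
import random

def simulate_estimators(theta0=2.0, T=100_000, seed=0):
    """Compare a path-matching estimator with a moment-matching estimator
    in the toy model x(u, theta) = theta * u (an illustrative assumption)."""
    rng = random.Random(seed)
    # "Observed" data are generated by the true shocks u_t ~ N(1, 1).
    u_obs = [rng.gauss(1.0, 1.0) for _ in range(T)]
    # Simulated shocks u^s_t are independent draws from the same distribution.
    u_sim = [rng.gauss(1.0, 1.0) for _ in range(T)]
    x_obs = [theta0 * u for u in u_obs]

    # Path matching: minimize sum_t (x_t - theta * u^s_t)^2 over theta.
    # The minimizer is the no-intercept least squares coefficient of x on u^s.
    theta_path = sum(x * s for x, s in zip(x_obs, u_sim)) / sum(s * s for s in u_sim)

    # Moment matching: choose theta so the simulated mean equals the observed mean.
    theta_moment = (sum(x_obs) / T) / (sum(u_sim) / T)
    return theta_path, theta_moment

if __name__ == "__main__":
    path, moment = simulate_estimators()
    print("path-matching estimate:  ", path)
    print("moment-matching estimate:", moment)
```

In this toy model the variance of the simulated variable is θ² V(u^s), so its gradient with respect to θ is nonzero at θ0, which is the term that drives the inconsistency described above.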
36The specification of the model should also be rich enough so that the estimation
makes sense. In particular, the model must contain a stochastic element which
explains why the model is not fitting the data exactly. This can be the case if some
characteristics, such as taste shocks, are unobserved.
37Though in the standard real business cycle model there is no rationale for such
intervention.
38 Equivalently, we could have specified the problem with k as the state, c as the
control and then used a transition equation of: k′ = f (k) + (1 − δ)k − c.
39This follows from the arguments in Chapter 2.
40As noted in the discussion of the cake eating problem, this is but one form of a
deviation from a proposed optimal path. Deviations for a finite number of periods
also do not increase utility if (5.2) holds. In addition, a transversality condition
must be imposed to rule out deviations over an infinite number of periods.
41That code and explanations for its use are available on the web page for this
book.
42In the discussion of King et al. (1988), this term is often called the elasticity of
the marginal utility of consumption with respect to consumption.
43One must take care that the state space is not binding. For the growth model,
we know that k′ is increasing in k and that k′ exceeds (is less than) k when k is less
than (exceeds) k∗. Thus the state space is not binding.
44This tradeoff can be seen by varying the size of the state space in grow.m. In
many empirical applications, there is a limit to the size of the state space in that a
finer grid doesn’t influence the moments obtained from a given parameter vector.
45A useful exercise is to alter this initial guess and determine whether the solution
of the problem is independent of it. Making good initial guesses is often quite
valuable for estimation routines in which there are many loops over parameters so
that solving the functional equation quickly is quite important.
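The exercise suggested in note 45 can be sketched as follows. This is not the book's program: the function `solve_growth`, the log utility and Cobb-Douglas technology with full depreciation, the parameter values and the grid are all assumptions made for a compact, self-contained illustration. The sketch runs value function iteration from an arbitrary initial guess so that the limits can be compared.

```python
import math

def solve_growth(v_init, alpha=0.3, beta=0.9, n=60, tol=1e-6, max_iter=2000):
    """Iterate on V(k) = max_{k'} ln(k**alpha - k') + beta * V(k') over a grid."""
    k_ss = (alpha * beta) ** (1.0 / (1.0 - alpha))  # steady state of this toy model
    grid = [k_ss * (0.5 + i / (n - 1)) for i in range(n)]  # from 0.5*k_ss to 1.5*k_ss
    v = [v_init(k) for k in grid]
    policy = [0] * n
    for _ in range(max_iter):
        v_new = []
        for i, k in enumerate(grid):
            best, best_j = -math.inf, 0
            for j, k_next in enumerate(grid):
                c = k ** alpha - k_next
                if c <= 0.0:
                    break  # the grid is increasing, so larger k' is infeasible too
                value = math.log(c) + beta * v[j]
                if value > best:
                    best, best_j = value, j
            v_new.append(best)
            policy[i] = best_j
        gap = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if gap < tol:
            break
    return grid, v, [grid[j] for j in policy]

if __name__ == "__main__":
    grid, v_zero, pol_zero = solve_growth(lambda k: 0.0)
    _, v_log, pol_log = solve_growth(lambda k: math.log(k))
    print("max value gap:", max(abs(a - b) for a, b in zip(v_zero, v_log)))
    print("policies identical:", pol_zero == pol_log)
```

Because the mapping is a contraction with modulus beta, both runs approach the same fixed point. A better initial guess only reduces the number of iterations, which is why it matters inside estimation loops.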
46Later in this chapter we move away from this framework to discuss economies
with distortions and heterogeneity.
47Later in this chapter, we discuss extensions that would include multiple sectors.
48Some of these restrictions are stronger than necessary to obtain a solution. As we
are going to literally compute the solution to (5.6), we will eventually have to create
a discrete representation anyway. So we have imposed some of these features at
the start of the formulation of the problem. The assumptions on the shocks parallel
those made in the presentation of the stochastic dynamic programming problem in
Chapter 2.
49Thus the problem is quite similar to that described by King et al. (1988) though
here we have not yet introduced employment.
50The discussion in the appendix of King et al. (1988) is recommended for those
who want to study this linearization approach in detail.
51Here we formulate the guess of the policy function rather than the value function.
In either case, the key is to check that the functional equation is satisfied.
52Alternatively, one could start from this guess of the value function and then use
it to deduce the policy function.
53Given that u(c) and f (k) are both strictly concave, it is straightforward to see
that the value function for the one period problem is strictly concave in k. As argued
in Chapter 2, this property is preserved by the T (V ) mapping used to construct a
solution to the functional equation.
54See Tauchen (1990) for a discussion of this economy and a comparison of the
value function iteration solution relative to other solution methods.
55See also the presentation of various decentralizations in Stokey and Lucas (1989).
56Of course, this is static for a given k′. The point is that the choice of n does not
influence the evolution of the state variable.
57In fact, preferences are often specified so that there is no response in hours
worked to permanent shocks. Another specification of preferences, pursued in
Hansen (1985), arises from the assumption that employment is a discrete variable
at the individual level. Rogerson (1988) provides the basic framework for the
“indivisible labor model”.
58We will see this in more detail in the following chapter on household savings
and consumption when there is stochastic income.
59For some specifications of the utility function, φ̂(A, k, k′) can be solved for an-
alytically and inserted into the program. For example, suppose u(c, 1 − n) = U (c +
ξ(1 − n)), where ξ is a parameter. Then the first order condition is Afn(k, n) = ξ
which can be solved to obtain φ̂(A, k, k′) given the production function. To verify
this, assume that Af (k, n) is a Cobb-Douglas function.
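As a hedged illustration of this verification, suppose that $f(k, n) = k^{\alpha} n^{1-\alpha}$. This functional form and the symbol α are assumptions made here, not taken from the text. At an interior solution, the first order condition can then be solved explicitly:

```latex
A f_n(k, n) = (1 - \alpha) A k^{\alpha} n^{-\alpha} = \xi
\quad \Longrightarrow \quad
n = \hat{\phi}(A, k, k') = \left[ \frac{(1 - \alpha) A k^{\alpha}}{\xi} \right]^{1/\alpha}
```

Under these assumptions the employment choice depends on (A, k) only and not on k′, so it can be computed once for each point in the state space before the iteration begins.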
60The interested reader can clearly go beyond this structure though the arguments
put forth by King et al. (1988) on restrictions necessary for balanced growth should
be kept in mind. Here the function ξ(1 − n) is left unspecified for the moment
though we assume it has a constant elasticity given by η.
61Note though that King, Plosser and Rebelo build a deterministic trend into their
analysis which they remove to render the model stationary. As noted in Section 3.2.1
of their paper, this has implications for selecting a discount factor.
62Specifically, the moments from the KPR model are taken from their Table 4,
using the panel data labor supply elasticity and ρ = .9, and the standard deviation
of the technology shock (deviation from steady state) is set at 2.29.
63See King et al. (1988) for a discussion of this.
64As the authors appear to note, this procedure may actually just uncover the de-
preciation rate used to construct the capital series from observations on investment.
65Thus in contrast to many studies in the calibration tradition, this is truly an
estimation exercise, complete with standard errors.
66In this case, the model cannot be rejected at a 15 % level using the J-statistic
computed from the match of these two moments.
67This is the case since the empirical analysis focuses on output and investment
fluctuations.
68When employment is variable and wages are observed, then (5.23) has no error
term either. In this case, researchers include taste shocks. Using this, they find
that current consumption can be written as a function of current output and lagged
consumption without any error term. This prediction is surely inconsistent with
the data.
69See Hansen et al. (1994) for a general formulation of this approach.
70Each of these extensions creates an environment which the interested reader can
use as a basis for specifying and solving a dynamic programming problem and
confronting it with data.
71Cooper (1999) explores a wide variety of ways to model complementarities.
Enriching the neoclassical production function is the one closest to existing models.
See the discussion in Benhabib and Farmer (1994) and Farmer and Guo (1994) about
the use of these models to study indeterminacy. Manski (1993) and Cooper (2002)
discuss issues associated with the estimation of models with complementarities and
multiple equilibria.
72In contrast to the contraction mapping theorem, there is no guarantee that this
process will converge. In some cases, the household’s response to an aggregate law
of motion can be used as the next guess on the aggregate law of motion. Iteration
of this may lead to a recursive equilibrium.
73See Cooper (1999) and the references therein.
74For now think of these as producer durables, though one could also add consumer
durables to this sector or create another sector.
75Similar problems of matching positive comovements arise in multiple-country
real business cycle models.
76McGrattan (1994) allows for past labor to enter current utility as well.
77See McGrattan (1994) and the references therein for a discussion of computing
such equilibria.
78This has a well understood implication for the timing of taxes. Essentially, a
government with a fixed level of spending must decide on the timing of its taxes.
If we interpret the income flows in our example as net of taxes, then intertemporal
variation in taxes (holding fixed their present value) will only change the timing of
household income and not its present value. Thus, tax policy will influence savings
but not consumption decisions.
79If ρ > 1, then $\partial c_0/\partial y_0$ will exceed 1.
80We assume that there exists a solution to this functional equation. This requires,
as always, that the choice be bounded, perhaps by a constraint on the total debt
that a household can accumulate.
81In fact, if there are other variables known to the decision maker that provide
information on (y′, R) then these variables would be included in the state vector as
well.
82Sargent (1978) also provides a test for the permanent income hypothesis and
rejects the model.
83See for instance Zeldes (1989b) or Campbell and Mankiw (1989).
84In fact, the theory does not imply which of the many possible variables should
be used when employing these restrictions in an estimation exercise. That is, the
question of “which moments to match?” is not answered by the theory.
85This is similar to the trick we used in the stochastic growth model with endoge-
nous employment.
86See also Wright and Williams (1984) and Miranda and Helmberger (1988) for
early contributions on this subject, including numerical solutions and simulations
of these models.
87See also Carroll (1992).
88The figure was computed using the following parameterization: β = 0.96, γ =
0.5, $\sigma_u^2 = 0.0212$, $\sigma_n^2 = 0.044$, p = 0.03, γ0 = 0.0196, γ1 = 0.0533. We are grateful
to Gourinchas and Parker for providing us with their codes and data.
89See footnote 88 for the parameterization.
90As an outstanding example, Rust and Phelan (1997) explore the effects of social
security policies on labor supply and retirement decisions in a dynamic programming
framework.
91In a model of habit formation, past consumption can influence current utility
even if the consumption is of a nondurable or service. In that case, the state vector
is supplemented to keep track of that experience. For the case of durable goods, we
will supplement the state vector to take the stock of durables into account.
92From Baxter (1996), the volatility of durable consumption is about five times
that of nondurable consumption.
93To be complete, as we explain, there are also maintained assumptions about
preferences, shocks and the lack of adjustment costs.
94Of course, other possible assumptions on timing are implementable in this frame-
work. We discuss this below.
95That is, movement in the marginal utility of consumption of nondurables may
be the consequence of variations in the stock of durables. We return to this point
in the discussion of empirical evidence.
96This condition doesn’t obtain under the previous timing due to the time to build
aspect of durables assumed there.
97See also Eichenbaum and Hansen (1990).
98See House and Leahy (2000) for a model of durables with an endogenous lemons
premium.
99The assumption that one car is the max is just for convenience. What is im-
portant is that the car choice set is not continuous.
100This presentation relies heavily on Adda and Cooper (2000b).
101Adda and Cooper (2000b) explicitly view this as a household specific income
shock but a broader interpretation is acceptable, particularly in light of their iid
assumption associated with this source of variation.
102Here only a single lag is assumed to economize on the state space of the agents’
problem.
103 As in Adda and Cooper (2000b), we assume that the costs of production are
independent of the level of production. Combined with an assumption of constant
mark-ups, this implies that the product price is independent of the cross sectional
distribution of car vintages.
This assumption of an exogenous price process greatly simplifies the empirical
implementation of the model since we do not have to solve an equilibrium problem.
In fact, we have found that adding information on the moments of the cross sectional
distribution of car vintages has no explanatory power in forecasting car prices in
the French case. Results are mixed for the US case, as the average age of cars
significantly predicts future prices.
104There are numerous surveys of investment. See Caballero (1999) and Chirinko
(1993) and the references therein for further summaries of existing research.
105This corresponds to the outcome of a stochastic growth model if there are
risk neutral consumers. Otherwise, a formulation with variable real interest rates
may be warranted.
106In many economies, it is also influenced by policy variations in the form of
investment tax credits.
107Moreover, the special case of no adjustment costs is generally nested in these
other models.
108In some applications, the cost of adjustment function depends on investment
and is written C(I, K) where I = K′ − (1 − δ)K .
109Abel and Eberly (1994) contain further discussion of the applicability of Q
theory for more general adjustment cost and profit functions.
110Hayashi (1982) was the first to point out that in this case average and marginal
q coincide though his formulation was nonstochastic.
111 Interestingly, the natural conjecture that φ(A) = A does not satisfy the func-
tional equation.
112We are grateful to Joao Ejarque for allowing us to use this material.
113The error term in (8.8) is often ascribed to stochastic elements in the cost of
adjustment function so that ai is modified to become ait = ai + εit.
114 Hubbard (1994) reviews these findings.
115Cooper and Ejarque (2001) do not attempt to characterize this measurement
error analytically but use their simulated environment to understand its implica-
tions. See Erickson and Whited (2000) for a detailed and precise discussion of the
significance of measurement error in the Q regressions.
116 Cooper and Ejarque (2001) have no unobserved heterogeneity in the model so
that the constant from the regression as well as the fixed effects are ignored. The
remaining coefficients are taken to be common across all firms.
117 In fact, the estimates are not very sensitive to the aggregate shocks. The model
is essentially estimated from the rich cross sectional variation, as in the panel study
of Gilchrist and Himmelberg (1995).
118The computation of standard errors follows the description in Chapter 4 of
Gourieroux and Monfort (1996).
119Cooper and Ejarque (2001) show that if $p = y^{-\eta}$ is the demand curve and
$y = A k^{\phi} l^{1-\phi}$ the production function, maximization of profit over the flexible
factor, l, leads to a reduced form profit function where the exponent on capital is
$\frac{\phi(\eta - 1)}{(1 - \phi)(1 - \eta) - 1}$. With φ = .33, η = .1315, which implies a markup of about 15%.
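The numbers in note 119 can be checked with a short calculation. The function names below are hypothetical. The markup formula 1/(1 − η) is not stated in the note; it is derived here from the demand curve, since marginal revenue under p = y^(−η) equals (1 − η)p.

```python
def capital_exponent(phi, eta):
    """Exponent on capital in the reduced form profit function,
    phi * (eta - 1) / ((1 - phi) * (1 - eta) - 1), as given in the note."""
    return phi * (eta - 1.0) / ((1.0 - phi) * (1.0 - eta) - 1.0)

def markup(eta):
    """Price over marginal cost implied by the demand curve p = y**(-eta):
    marginal revenue is (1 - eta) * p, so the ratio is 1 / (1 - eta)."""
    return 1.0 / (1.0 - eta)

if __name__ == "__main__":
    phi, eta = 0.33, 0.1315
    print("exponent on capital:", capital_exponent(phi, eta))
    print("markup over marginal cost:", markup(eta))
```

With φ = .33 and η = .1315 this gives an exponent on capital of roughly 0.69 and a price-cost ratio of roughly 1.15, in line with the markup of about 15% quoted in the note.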
120The program to estimate this model is very simple. Once Ω(γ) is programmed,
it is simply a basic routine to minimize this function. Obtaining Ω(γ) is easy too
using the information on parameters plus observations in the data set on investment
rates and the ratio of output to capital (which is used to determine marginal profit
rates). The minimization may not occur exactly at γ = 2 due to sampling error.
The interested reader can extend this analysis to create a distribution of estimates
by redrawing shocks, simulating and then re-estimating γ from the GMM procedure.
121If, in the example above, α = 1, then the constraint is proportional to K. In
this case, it appears that average and marginal Q are equal.
122Cooper and Haltiwanger provide a detailed description of the data.
123See Abel and Eberly (1994) for a model in which fixed costs are proportional to
K. If these costs were independent of size, then large plants would face lower adjust-
ment costs (relative to their capital stock) and thus might adjust more frequently.
So, as in the quadratic specification, the costs are scaled by size. This is, though, an
assumption and the relationship between plant size and investment activity is still
an open issue.
124 Recall the outline of the basic value function iteration program for the non-
stochastic growth model and the modification of that for non-convex adjustment
costs in Chapter 3.
125As discussed in Cooper and Haltiwanger (1993) and Cooper et al. (1999), this
assumption that a new machine has fixed size can be derived from a model with
embodied technological progress which is rendered stationary by dividing through by
the productivity of the new machine. In this case, the rate of depreciation measures
both physical deterioration and obsolescence.
126 Cooper and Haltiwanger (2000) and Cooper et al. (1999) argue that these
features also hold when there is a one period lag in the installation process.
127Cooper et al. (1999) analyze the more complicated case of a one-period lag in
the installation of new capital.
128An interesting extension of the model would make this gap endogenous.
129The data set is described in Cooper and Haltiwanger (2000) and is a balanced
panel of US manufacturing plants. Comparable data sets are available in other
countries. Similar estimation exercises using these data sets would be of considerable
interest.
130See the discussion in Cooper and Haltiwanger (2000) of the estimation of this
profit function.
131More recent versions of the Cooper-Haltiwanger paper explore adding lagged
investment rates to this reduced form to pick up some of the dynamics of the ad-
justment process.
132This is an important step in the analysis. Determining the nature of adjustment
costs will depend on the characterization of the underlying profitability shocks. For
example, if a researcher is trying to identify non-convex adjustment costs from bursts
of investment, then getting the distribution of shocks right is critical.
133The results are robust to allowing the discount factor to vary with the aggregate
shock in order to mimic the relationship between real interest rates and consumption
growth from a household’s Euler equation.
134The interested reader should read closely the discussion in Rust (1987) and the
papers that followed this line of work. Note that often assumptions are made on
G(·) to ease the computation of the likelihood function.
135Here we are also assuming that the discount factor is fixed. More generally it
might depend on a and a′.
136So, in contrast to the chapter on capital adjustment, here we assume that there
are no costs to adjusting the stock of capital. This is, of course, for convenience
only and a complete model would incorporate both forms of adjustment costs.
137We can study the implications of that specification by setting q = 0 in (9.2) to
study the alternative.
138As well as from the dynamic adjustment of other factors, such as capital.
139As discussed later in this chapter, this model is used in Cooper and Willis (2001)
as a basis for a quantitative analysis of the gap approach.
140The literature on labor adjustment costs contains both specifications. Cooper
and Willis (2001) find that their results are not sensitive to this part of the specifi-
cation.
141Alternatively, the parameters of these processes could be part of an estimation
exercise.
142The factors that help the firm forecast future wages are then included in the
state space of the problem; i.e. they are in the aggregate component of A.
143Sargent (1978) estimates a model with both regular and overtime employment.
For simplicity, we have presented the model of regular employment alone.
144He also discusses in detail the issue of identification and in fact finds multiple
peaks in the likelihood function. Informally, the issue is distinguishing between the
serial correlation in employment induced by lagged employment from that induced
by the serial correlation of the productivity shocks.
145This inaction rate is too high relative to observation: the parameterization is
for illustration only.
146In fact, this depiction also motivates consideration of a search model as the
primitive that underlies a model of adjustment costs. See the treatment of this
point in the discussion of Yashiv (2000) in Chapter 10.
147At this level of fixed costs, there is about 50% employment inaction. Again the
parameterization is just for illustration.
148This presentation draws heavily upon Cooper and Willis (2001). We are grateful
to John Haltiwanger and Jon Willis for helpful discussions on this topic.
149In fact the structure is used to study adjustment of capital as well.
150Based on discussions above, the policy function of the firm should depend jointly
on (A, e−1) and not the gap alone.
151 This point was made some years ago. Nickell (1978) says,
“… the majority of existing models of factor demand simply analyze
the optimal adjustment of the firm towards a static equilibrium and it
is very difficult to deduce from this anything whatever about optimal
behavior when there is no ‘equilibrium’ to aim at.”
152The process is taken from the Cooper and Haltiwanger (2000) study of capital
adjustment. As these shocks were measured using a static labor first order condition,
Cooper and Willis (2001) study the robustness of their results to variations in these
Markov processes.
153This discussion parallels the approach in Cooper and Haltiwanger (2000).
154Though see the discussion in Aguirregabiria (1997) for progress in this direction.
155Of course, it then becomes a question of identification: can one distinguish the
non-convex and piecewise linear models?
156Note that Θ would include the parameters of the stochastic processes as well.
157This is the goal of an ongoing project.
158Though in some cases a more general equilibrium approach is needed to assess
the complete implications of the policy.
159This suggestion is along the lines of the so-called “natural experiments” ap-
proach to estimation where the researcher searches for “exogenous” events that may
allow for the identification of key parameters. Evaluating this approach in the context
of a structural model is of interest.
160Early formulations of the framework we discuss include Benassy (1982), Blanchard
and Kiyotaki (1987), Caballero and Engel (1993a), Caplin and Leahy (1991)
and Caplin and Leahy (1997).
161This is similar to the stochastic adjustment cost structure used in Rust (1987).
162As discussed, for example, in Blanchard and Kiyotaki (1987), there is a com-
plementarity that naturally arises in the pricing decisions in this environment.
163Of course, this may entail adding additional elements to the state space. See
Adda and Cooper (2000a) and Willis (2000a) for discussions of this point.
164Ball and Romer (1990) provide an example of this. John and Wolman (1999)
study these issues in a dynamic setting of price adjustment.
165 The contribution here is bringing the dynamic menu cost model to the data.
Bils and Klenow (2002) provide further evidence on price setting behavior based
upon BLS price data.
166 For this specification, there is assumed to be no serial correlation in the adjustment
costs. See Willis (2000a) for further discussion of this point and estimates
which relax this restriction.
167 Thus in principle one can use this condition for estimation of some parameters
of the model using orthogonality conditions as moments. See the discussion of this
point in Pakes (1994) and Aguirregabiria (1997), where the latter paper includes a
labor example.
168 The findings of Dotsey et al. (1999) are based on a parameterization of the
adjustment cost distribution and the other assumptions noted above. Whether
these properties obtain in an estimated model is an open issue. See Willis (2000b)
for progress on this issue.
169 See the discussion in Arrow et al. (1951) and the references therein.
170 Taken literally, R in excess of unity means that inventories accumulate on their
own, which may seem odd. The literature is much more explicit about various
marginal gains to holding inventories. If R is less than unity, then output will
be independent of the state but will be rising over time. This policy may require
negative inventories, an issue we address below.
171 See Blinder (1986), Blinder and Maccini (1991) and the references therein for
the extensive literature on these points.
172 See, for example, the discussion in Blinder (1986), Eichenbaum (1989) and
Christiano (1988).
173 Hall (2000) studies a model of production scheduling using data on automobile
assembly plants and finds some support for the hypothesis that nonconvexities in the
production process lie behind the observations on the relative volatility of production
and sales.
174 See Scarf (1959) for developments of this argument.
175 Hall and Rust (2000) examine a model of optimal inventory behavior in an
environment where there is a fixed ordering cost with a stochastic product price.
They argue that a calibrated version of their model fits important aspects of their
data from a US steel wholesaler.
176 Kahn (1987) includes a period of price predetermination.
177 The estimation methodology is complex and the reader is urged to study
Aguirregabiria (1999).
178 Estimation of this more general structure using plant level data is part of ongoing
research of R. Cooper and J. Haltiwanger. See Sakellaris (2001) for some interesting
facts concerning the interaction of capital and labor adjustment.
179 This is the underlying theme of the macroeconomic complementarities literature,
as in Cooper (1999).
180 In contrast to the contraction mapping theorem, there is no guarantee that this
process will converge. In some cases, the household’s response to an aggregate law
of motion can be used as the next guess on the aggregate law of motion. Iteration
of this may lead to a recursive equilibrium.
181 See Cooper (1999) and the references therein.
182 Interestingly, McCall mentions that his paper draws on Stanford class notes
from K. Arrow on the reservation wage property.
183 This model is frequently used for expositional purposes in other presentations of
the search process. It can be enriched in many ways, including adding: fires, quits,
costly search, etc.
184 Writing a small program to do this would be a useful exercise. Note that this
dynamic programming model is close to the discrete cake eating problem presented
in Chapters 2 to 4.
185 Here Θ would include the parameters for the individual agent (e.g., those
characterizing u(w) as well as β) and the parameters of the wage distribution.
186 Sometimes unobserved heterogeneity is added to create the same effect.
187 Adda et al. (2002) estimate a related model using panel data on German
workers.
188 As noted earlier, Willis (2000b) makes some progress on this in a pricing problem
and Thomas (2000) studies some of these issues in the context of an investment
problem.
Table 5.1: Observed and Predicted Moments
Moments                            US data    KPR calibrated model
Std relative to output
  consumption                       .69        .64
  investment                       1.35       2.31
  hours                             .52        .48
  wages                            1.14        .69
Cross correlation with output
  consumption                       .85        .82
  investment                        .60        .92
  hours                             .07        .79
  wages                             .76        .90
Table 6.1: GMM Estimation Based on the Euler Equation
γ      Prop. of liquidity       γ̂GMM
       constrained periods
0.5    80%                      2.54
1      50%                      3.05
2      27%                      3.92
3      23%                      4.61
4      11%                      5.23
5      9%                       5.78
6      8%                       6.25
Note: ρ = 0, σ = 10, µ = 100, β = 0.9, r = 0.05. Estimation done on
3000 simulated observations.
Table 7.1: ARMA(1,1) Estimates on US and French Data
Specification                  No trend                     Linear trend
                               α1            δ              α1            δ
US durable expenditures        1.00 (.03)    1.5 (.15)      0.76 (0.12)   1.42 (0.17)
US car registration            0.36 (.29)    1.34 (.30)     0.33 (0.30)   1.35 (0.31)
France durable expenditures    0.98 (0.04)   1.20 (0.2)     0.56 (0.24)   1.2 (0.36)
France car expenditures        0.97 (0.06)   1.3 (0.2)      0.49 (0.28)   1.20 (0.32)
France car registrations       0.85 (0.13)   1.00 (0.26)    0.41 (0.4)    1.20 (0.41)
Notes: Annual data. For the US, source FRED database, 1959:1-1997:3. French
data: source INSEE, 1970:1-1997:2. US registration: 1968-1995.
Table 7.2: Transition Matrix for π
                   state tomorrow
              1        2        3        4
         1    0.01     0.01     0.01     0.97
state    2    0.01     0.01     0.01     0.97
today    3    0.225    0.225    0.1      0.45
         4    0.01     0.01     0.01     0.97
Table 8.1: Estimated Structural Parameters
        Structural Parameters
        α              γ              ρ              σ             θ
GH95
CE      .689 (.011)    .149 (.016)    .106 (.008)    .855 (.04)    2
Table 8.2: Regression Results and Moments
        Reduced Form Coef. Estimates/Moments
        a1      a2      sc I/K    std π/K    q̄
GH95    .03     .24     .4        .25        3
CE      .041    .237    .027      .251       2.95
Table 8.3: Descriptive Statistics, LRD
Variable LRD
Average Investment Rate 12.2%
Inaction Rate: Investment 8.1%
Fraction of Observations with Negative Investment 10.4%
Spike Rate: Positive Investment 18%
Spike Rate: Negative Investment 1.4%
Table 8.4: Parameter Estimates
Spec.      Structural Parm. Estimates (s.e.)                        parm. est. for (8.22)
           γ                 F                     ps               ψ0       ψ1      ψ2
LRD                                                                 -.013    .265    .20
all        .043 (0.00224)    .00039 (.0000549)     .967 (.00112)    -.013    .255    .171
F only     0                 .0333 (.0000155)      1                -.02     .317    .268
γ only     .125 (.000105)    0                     1                -.007    .241    .103
ps only    0                 0                     .93 (.000312)    -.016    .266    .223
Figure 3.1: Stochastic Cake Eating Problem,
i_s=1
do until i_s>n_s                    * Loop over all sizes of the total amount of cake X *
  c_L=X_L                           * Min value for consumption *
  c_H=X[i_s]                        * Max value for consumption *
  i_c=1
  do until i_c>n_c                  * Loop over all consumption levels *
    c=c_L+(c_H-c_L)/n_c*(i_c-1)
    i_y=1
    EnextV=0                        * Initialize the expected next value to zero *
    do until i_y>n_y                * Loop over all possible realizations of the future endowment *
      nextX=R*(X[i_s]-c)+Y[i_y]     * Next period amount of cake *
      nextV=V(nextX)                * Here we use interpolation to find the next value function *
      EnextV=EnextV+nextV*Pi[i_y]   * Accumulate the expected future value using the probabilities Pi *
      i_y=i_y+1
    endo                            * End of loop over endowment *
    aux[i_c]=u(c)+beta*EnextV       * Stores the value of a given consumption level *
    i_c=i_c+1
  endo                              * End of loop over consumption *
  newV[i_s]=max(aux)                * Take the max over all consumption levels *
  i_s=i_s+1
endo                                * End of loop over size of cake *
V=newV                              * Update the new value function *
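The loop in Figure 3.1 can be sketched in Python as follows. This is a minimal illustration and not the book's own code: it assumes log utility, an i.i.d. endowment with probability vector Pi, a consumption grid that starts at 5% of the cake so that log(c) stays finite, and linear interpolation via np.interp (which holds V constant beyond the ends of the grid). The function names, the grid X and the endowment values Y are all hypothetical.

```python
import numpy as np

def bellman_step(V, X, Y, Pi, beta, R, n_c):
    """Apply the Bellman operator once on the cake grid X (one pass of Figure 3.1)."""
    new_V = np.empty_like(V)
    for i_s, x in enumerate(X):
        # Candidate consumption levels: from 5% of the cake up to all of it (assumed floor).
        c = np.linspace(0.05 * x, x, n_c)
        # Next-period cake for every (consumption, endowment) pair.
        next_X = R * (x - c)[:, None] + Y[None, :]
        # Interpolate V off the grid; np.interp clamps outside [X[0], X[-1]].
        next_V = np.interp(next_X.ravel(), X, V).reshape(next_X.shape)
        # Expected continuation value under the i.i.d. probabilities Pi.
        EV = next_V @ Pi
        new_V[i_s] = np.max(np.log(c) + beta * EV)
    return new_V

def solve_cake(X, Y, Pi, beta=0.9, R=1.0, n_c=50, tol=1e-8, max_iter=2000):
    """Iterate the Bellman operator until the sup-norm change falls below tol."""
    V = np.zeros(len(X))
    for _ in range(max_iter):
        new_V = bellman_step(V, X, Y, Pi, beta, R, n_c)
        if np.max(np.abs(new_V - V)) < tol:
            return new_V
        V = new_V
    return V

# Hypothetical grid and endowment process, for illustration only.
X = np.linspace(0.5, 5.0, 30)
Y = np.array([0.2, 0.8])
Pi = np.array([0.5, 0.5])
V = solve_cake(X, Y, Pi)
```

Because beta is below one and the interpolation is a weighted average of grid values, each pass shrinks the distance between successive guesses, so the loop stops after a finite number of iterations.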
Figure 3.2: Value Function, Stochastic Cake Problem
Figure 3.3: Policy Function, Stochastic Cake Eating Problem
Figure 3.4: Stochastic Cake Eating Problem, Projection Method
procedure c(x)                          * Here we define an approximation for the consumption
  cc=psi_0+psi_1*x+psi_2*x*x              function based on a second order polynomial *
  return(cc)
endprocedure
i_s=1
do until i_s>n_s                        * Loop over all sizes of the total amount of cake *
  utoday=U’(c(X[i_s]))                  * marginal utility of consuming *
  ucorner=U’(X[i_s])                    * marginal utility if corner solution *
  i_y=1
  EnextU=0                              * initialize the expected future marginal utility *
  do until i_y>n_y                      * Loop over all possible realizations of the future endowment *
    nextX=R*(X[i_s]-c(X[i_s]))+Y[i_y]   * next amount of cake *
    nextU=U’(c(nextX))                  * next marginal utility of consumption *
    EnextU=EnextU+nextU*Pi[i_y]         * here we compute the expected future marginal utility
                                          of consumption using the probabilities Pi *
    i_y=i_y+1
  endo                                  * end of loop over endowment *
  F[i_s]=utoday-max(ucorner,beta*R*EnextU)
  i_s=i_s+1
endo                                    * end of loop over size of cake *
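A vectorized Python sketch of the residual function in Figure 3.4 is given below. It is an illustration under stated assumptions rather than the book's implementation: utility is taken to be logarithmic, so U'(c) = 1/c; the Euler equation is assumed to be U'(c) = max{U'(X), beta*R*E U'(c')}; and the function name euler_residuals and the sample inputs are hypothetical. A projection method would then choose psi to make these residuals small, for example by least squares over the grid.

```python
import numpy as np

def euler_residuals(psi, X, Y, Pi, beta=0.9, R=1.0):
    """Euler-equation residuals F[i_s] for the rule c(x) = psi0 + psi1*x + psi2*x^2."""
    def c(x):
        # Second-order polynomial approximation of the consumption function.
        return psi[0] + psi[1] * x + psi[2] * x**2

    def u_prime(cons):
        # Marginal utility under log utility (an assumption of this sketch).
        return 1.0 / cons

    u_today = u_prime(c(X))                 # marginal utility of consuming c(X)
    u_corner = u_prime(X)                   # marginal utility if the whole cake is eaten
    # Next-period cake for every (grid point, endowment) pair.
    next_X = R * (X - c(X))[:, None] + Y[None, :]
    # Expected marginal utility of next-period consumption.
    E_next_u = u_prime(c(next_X)) @ Pi
    return u_today - np.maximum(u_corner, beta * R * E_next_u)

# Hypothetical inputs: the rule c(x) = x (eat everything) on a small grid.
X = np.linspace(0.5, 5.0, 10)
Y = np.array([0.2, 0.8])
Pi = np.array([0.5, 0.5])
F = euler_residuals((0.0, 1.0, 0.0), X, Y, Pi)
```

With the rule c(x) = x, next-period cake equals the endowment, so the residual reduces to 1/x minus beta*R*E[1/y] wherever the latter exceeds 1/x. This gives a simple check on the code.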
Figure 3.5: Basis Functions, Finite Element Method
[Figure: basis functions p1(X), p2(X), p3(X), p4(X) over the nodes X1, X2, X3; vertical axis from 0 to 1]
Figure 3.6: Discrete Cake Eating Problem,
i_s=2
do until i_s>n_s                             * Loop over all sizes of the cake *
  i_e=1
  do until i_e>2                             * Loop over all possible realizations of the taste shock *
    ueat=u(W[i_s],e[i_e])                    * utility of eating the cake now *
    nextV1=V[i_s-1,1]                        * next period value if low taste shock *
    nextV2=V[i_s-1,2]                        * next period value if high taste shock *
    EnextV=nextV1*p[i_e,1]+nextV2*p[i_e,2]   * expected next period value *
    newV[i_s,i_e]=max(ueat,beta*EnextV)      * Take the max between eating now or waiting *
    i_e=i_e+1
  endo                                       * end of loop over taste shock *
  i_s=i_s+1
endo                                         * end of loop over size of cake *
V=newV                                       * update the new value function *
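The discrete cake eating iteration of Figure 3.6 can be written compactly in Python. The sketch below is illustrative only. It assumes a utility of the form e*sqrt(W), a cake grid sorted in ascending order so that waiting moves the cake one step down the grid, and that the smallest cake must be eaten (the boundary the pseudocode leaves implicit by starting at i_s=2). The function name and all parameter values are hypothetical.

```python
import numpy as np

def solve_discrete_cake(W, e, p, beta=0.9, tol=1e-10, max_iter=1000):
    """Value function and waiting rule for the eat-or-wait problem.

    W: cake sizes in ascending order; waiting at W[i] leads to W[i-1].
    e: the two taste shocks; p[i, j]: probability of moving from shock i to shock j.
    """
    u_eat = np.sqrt(W)[:, None] * e[None, :]   # utility of eating now (assumed form)
    V = u_eat.copy()
    for _ in range(max_iter):
        new_V = u_eat.copy()                   # smallest cake: eating is the only option
        # Discounted expected value of waiting: cake shrinks one step, shock follows p.
        wait_value = beta * V[:-1] @ p.T
        new_V[1:] = np.maximum(u_eat[1:], wait_value)
        if np.max(np.abs(new_V - V)) < tol:
            V = new_V
            break
        V = new_V
    wait = V > u_eat + 1e-12                   # True where waiting beats eating
    return V, wait

# Hypothetical parameters: the cake shrinks by 10% per period of waiting.
W = 0.9 ** np.arange(9, -1, -1)
e = np.array([0.5, 1.5])
p = np.array([[0.5, 0.5], [0.5, 0.5]])
V, wait = solve_discrete_cake(W, e, p)
```

With these values the agent waits whenever the taste shock is low and the cake is not at its smallest size, and eats as soon as the high shock arrives.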
Figure 3.7: Value Function, Discrete Cake Eating Problem
Figure 3.8: Decision Rule, Discrete Cake Eating Problem
Figure 3.9: Approximation Methods
Figure 3.10: Example of Discretization, N=3
[Figure: density φ(ε) divided into three intervals of probability 1/3 each by the cut points ε2 and ε3, with representative values z1, z2, z3]
Figure 3.11: Simulation of a Markov Process
t=1
oldind=1                     * variable to keep track of state in period t-1 *
y[t]=z[oldind]               * initialize first period *
t=2
do until t>T                 * Loop over the remaining time periods *
  u=uniform(0,1)             * Generate a uniform random variable *
  sum=0                      * will contain the cumulative sum of pi *
  ind=0                      * index over all possible values for process *
  do until u<=sum            * loop to find out the state in period t *
    ind=ind+1
    sum=sum+pi[oldind,ind]   * cumulative sum of pi *
  endo
  y[t]=z[ind]                * state in period t *
  oldind=ind                 * keep track of lagged state *
  t=t+1
endo
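The simulation in Figure 3.11 amounts to inverse-CDF sampling from the row of the transition matrix that corresponds to the previous state. A minimal Python sketch follows; the function name simulate_markov, the seed argument and the zero-based state indices are choices made here for illustration, not part of the original. The example reuses the transition matrix of Table 7.2.

```python
import numpy as np

def simulate_markov(z, pi, T, start=0, seed=0):
    """Simulate T periods of a Markov chain with values z and transition matrix pi."""
    z = np.asarray(z)
    pi = np.asarray(pi, dtype=float)
    rng = np.random.default_rng(seed)
    cum = np.cumsum(pi, axis=1)          # row-wise cumulative probabilities
    ind = np.empty(T, dtype=int)
    ind[0] = start                       # state in the first period
    for t in range(1, T):
        u = rng.uniform()                # uniform draw on [0, 1)
        # First state whose cumulative probability is at least u.
        j = np.searchsorted(cum[ind[t - 1]], u)
        # Guard against rounding that leaves the last cumulative sum just below 1.
        ind[t] = min(j, len(z) - 1)
    return z[ind], ind

# Transition matrix from Table 7.2, with states labeled 1 to 4.
pi72 = np.array([[0.01, 0.01, 0.01, 0.97],
                 [0.01, 0.01, 0.01, 0.97],
                 [0.225, 0.225, 0.1, 0.45],
                 [0.01, 0.01, 0.01, 0.97]])
y, ind = simulate_markov(np.arange(1, 5), pi72, T=5000)
```

Using np.searchsorted on the cumulative row replaces the inner loop of the pseudocode and returns the index of the selected state directly, which avoids off-by-one errors in the index bookkeeping.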
Figure 4.1: Log Likelihood, True θ0 = 0
Figure 4.2: Objective Function, Simulated Method of Moments, true θ0 = 0
Figure 4.3: Just Identification
[Figure: curve M(θ) plotted against θ, vertical axis P from 0 to 1; the value P∗ corresponds to a single θ∗]
Figure 4.4: Non Identification
[Figure: curve M(θ) plotted against θ, vertical axis P from 0 to 1; the value P∗ corresponds to two values, θ∗1 and θ∗2]
Figure 4.5: Zero Likelihood
[Figure: curve M(θ) plotted against θ, vertical axis P from 0 to 1, with P∗ marked]
Figure 4.6: Overview of Methodology
[Flowchart labels: Economic Model; Policy Rules; Predicted Outcome; Observed Outcome; Match? (Yes / No); Vector of parameters; Optimal Parameters; Goodness of fit; Overidentification tests; Policy analysis. Side labels: Chapter 2, Economic Properties; Chapter 3, Numerical solution; Chapter 4, Estimation method.]
Figure 5.1: Policy Function
[Figure: x-axis: current capital (9.5 to 12.5); y-axis: future capital (9.5 to 12.5); series: policy function, current capital]
Figure 5.2: Net Investment
[Figure: x-axis: current capital (9.5 to 12.5); y-axis: net investment (−0.2 to 0.15)]
Figure 6.1: Consumption and Liquidity Constraints: Optimal Consumption Rule
Figure 6.2: Simulations of Consumption and Assets with Serially Correlated Income
Figure 6.3: Optimal Consumption Rule
Figure 6.4: Observed and Predicted Consumption Profiles
Figure 7.1: [s,S] rule
Figure 7.2: Estimated Hazard Function, France
Figure 7.3: Estimated Hazard Function, US
Figure 7.4: Sales of New Cars, in thousands, monthly
Figure 7.5: Expected Aggregate Sales, Relative to Baseline
Figure 7.6: Expected Government Revenue, Relative to Baseline
Figure 8.1: The function Ω(γ)
[Figure: x-axis: Gamma (1 to 4); y-axis: fit (0 to 0.7)]
Figure 9.1: Employment Policy Functions: Quadratic Costs
[Figure: x-axis: current E (0 to 1400); y-axis: future E (0 to 1400); series: high state, low state]
Figure 9.2: Employment Policy Functions: Piece-wise Linear Adjustment Costs
[Figure: x-axis: current E (0 to 1400); y-axis: future E (0 to 1400); series: high state, low state]
Figure 9.3: Employment Policy Functions: Non-convex Adjustment Costs
[Figure: x-axis: current E (0 to 1400); y-axis: future E (0 to 1400); series: high state, low state]
Figure 9.4: Employment Policy Functions: Mixed Adjustment Costs
[Figure: x-axis: current E (0 to 1400); y-axis: future E (0 to 1400); series: high state, low state]
Prof. Andrzej Cieślik
Department of Macroeconomics and International Trade Theory, Faculty of Economic Sciences, University
of Warsaw, 44/50 Długa St., 00-241 Warsaw, email: cieslik@wne.uw.edu.pl
Office hours: Thursday 15.00-16.30, room 409/410.
Advanced Macroeconomics Course Syllabus – Spring 2013:
Microfoundations, Economic Growth, Business Cycles and Labor Market
1. Description:
This is a 60-hour graduate course in advanced macroeconomics that focuses on dynamic real
macroeconomics. The topics cover microeconomic foundations of macroeconomics, growth theories,
business cycles and selected labor market issues. This is an obligatory course for MA Programs in
International Economics and Quantitative Finance. Foreign students visiting the Faculty of Economic
Sciences at the University of Warsaw are also welcome to participate. Polish students with good knowledge
of English from other specialization fields can enroll subject to the instructor’s approval. The course is offered
only in the spring semester. The class meets twice a week, on Tuesdays and Thursdays, for two hours
(9.45-11.20) in room A. The class is accompanied by non-obligatory tutorial classes that meet on Mondays
(9.45-11.20) in room 203 every fortnight starting February 25, 2013.
2. Objectives:
The main objective of this course is to familiarize students with key analytical models in real
macroeconomics. The course consists of four parts. The first part is devoted to microfoundations of
macroeconomic models such as consumption, investment and the government sector. The second part
focuses on exogenous and endogenous growth theories and covers neoclassical models such as Solow-Swan,
Ramsey and OLG models as well as newer models such as AK, Lucas-Uzawa, Romer and Grossman-
Helpman models. The third part concentrates on business cycles and covers real business cycle and new
Keynesian theories. The fourth part is devoted to various labor market issues.
3. Required reading:
There is no single textbook for this course. Materials for this course come from various textbooks and
articles. All assigned readings are required readings. Most often reference will be made to the selected
chapters from the following six books:
[1] Acemoglu D., 2009, Introduction to Modern Economic Growth, Princeton University Press, Princeton,
[2] Adda J., Cooper R., 2003, Dynamic Economics, The MIT Press, Cambridge, M.A.,
[3] Bagliano F.C., Bertola G., 2004, Models for Dynamic Macroeconomics, Oxford University Press,
Oxford,
[4] Barro R.J., Sala-i-Martin X., 2004, Economic Growth, Second Edition, The MIT Press, Cambridge,
M.A.,
[5] Blanchard O.J., Fischer S., 1989, Lectures on Macroeconomics, The MIT Press, Cambridge, M.A.,
[6] Romer D., 2001, Advanced Macroeconomics, Second Edition, McGraw-Hill, New York.
4. Prerequisites:
The main prerequisite for this course is knowledge of both macro- and microeconomics at the undergraduate
level, and of microeconomics and mathematical methods in economics at the graduate level.
5. Exam:
The grading will be based on the final written exam offered on June 13 (starting 9.00 and ending 12.00) in
room A.
Detailed course program description:
Part I. Microeconomic Foundations
Topic 1. Consumption
Adda J., Cooper R., 2003, ch. 6, Consumption, 139-164.
Bagliano F.C., Bertola G., 2004, ch. 1, Dynamic consumption theory, 1-46.
Romer D., 2001, ch. 7, Consumption, 331-362.
Topic 2. Government sector
Romer D., 2001, ch. 11, Budget deficits and fiscal policy, 531-582.
Topic 3. Investment theory
Adda J., Cooper R., 2003, ch. 8, Investment, 187-214.
Bagliano F.C., Bertola G., 2004, ch. 2, Dynamic models of investment, 47-101.
Romer D., 2001, ch. 8., Investment, 367-409.
Sala-i-Martin X., 2000, Internal and external adjustment costs in the theory of fixed investment,
lecture notes.
Hall R., Jorgenson D., 1967, Tax policy and investment behavior, American Economic Review 57,
391-414.
Hayashi F., 1982, Tobin’s marginal q and average q: A neoclassical interpretation, Econometrica 50,
213-224.
Part II. Growth Theory
A. Neoclassical growth theory
Topic 4. Solow-Swan model
Acemoglu D., 2009, ch. 2., The Solow growth model, 26-76.
Barro R.J., Sala-i-Martin X., 2004, ch. 1, Growth models with exogenous saving rates, The
neoclassical model of Solow and Swan, 23-59.
Romer D., 2001, ch. 1, The Solow growth model, 5-43.
Topic 5. Ramsey-Cass-Koopmans (RCK) model
Acemoglu D., 2009, ch. 8., The neoclassical growth model, 287-326.
Barro R.J., Sala-i-Martin X., 2004, ch. 2, Growth models with consumer optimization, 85-133.
Romer D., 2001, ch. 2, Infinite horizon and overlapping generation models, Part A: The Ramsey-
Cass-Koopmans model, 47-74.
Blanchard O.J., Fischer S., 1989, Lectures on Macroeconomics, ch. 2, Consumption and
investment: Basic infinite horizon models, section 2.3, Government in the decentralized economy,
52-58.
Topic 6. Overlapping generations (OLG) model
Acemoglu D., 2009, ch. 9., Growth with overlapping generations, 327-358.
Romer D., 2001, ch. 2, Infinite horizon and overlapping generation models, Part B: The Diamond
model, 75-90.
Blanchard O.J., Fischer S., 1989, Lectures on Macroeconomics, ch. 3, The overlapping generations
model, section 3.2, Social security and capital accumulation, 110-114.
Barro R., 1974, Are government bonds net wealth?, Journal of Political Economy 82, 1095-1117.
Diamond P., 1965, National debt in a neoclassical growth model, American Economic Review 55,
1126-1150.
Samuelson P.A., 1958, An exact consumption-loan model of interest with or without the social
contrivance of money, Journal of Political Economy 66, 467-482.
Abel A., Mankiw N.G., Summers L., Zeckhauser R., 1989, Assessing dynamic inefficiency: Theory
and evidence, Review of Economic Studies 56, 1-20.
Topic 7. Convergence debate
Acemoglu D., 2009, ch. 3., The Solow model and the data, 77-108.
Barro R.J., Sala-i-Martin X., 2004, ch. 11, Empirical analysis of regional datasets, 461-496.
Barro R.J., Mankiw N.G., Sala-i-Martin X., 1995, Capital mobility in neoclassical models of
growth, American Economic Review 85, 103-115.
Islam N., 1995, Growth empirics: A panel data approach, Quarterly Journal of Economics 110,
1127-1170.
Mankiw N.G., Romer D., Weil D.N., 1992, A contribution to the empirics of economic growth,
Quarterly Journal of Economics 107, 407-437.
B. New growth theory
Topic 8. AK models and externalities
Acemoglu D., 2009, ch. 11., First-generation models of endogenous growth, 387-410.
Barro R.J., Sala-i-Martin X., 2004, ch. 1, Growth models with exogenous saving rates, Models of
endogenous growth, 61-71.
Barro R.J., Sala-i-Martin X., 2004, ch. 4, One sector model of endogenous growth, 205-232.
Rebelo S., 1991, Long-run policy analysis and long-run growth, Journal of Political Economy 99,
500-521.
Romer P., 1986, Increasing returns and long run growth, Journal of Political Economy 94, 1002-
1037.
Romer D., 2001, ch. 3, New growth theory, Part B, Cross-country income differences, 120-122.
Topic 9. Lucas-Uzawa model
Barro R.J., Sala-i-Martin X., 2004, ch. 5., Two-sector models of endogenous growth, 239-271.
Romer D., 2001, ch. 3, New growth theory, Part A, Research and development models, 98-160.
Lucas R.E., 1988, On the mechanics of economic development, Journal of Monetary Economics 22,
3-42.
Topic 10. Expanding product variety models
Acemoglu D., 2009, ch. 13., Expanding variety models, 433-457.
Barro R.J., Sala-i-Martin X., 2004, ch. 6, Technological change: Models with an expanding product
variety, 285-313.
Grossman G., Helpman E., 1993, Innovation and growth in the global economy, MIT Press,
Cambridge MA, ch. 3, Expanding product variety, 45-76.
Topic 11. Quality ladder models
Barro R.J., Sala-i-Martin X., 2004, ch. 7, Technological change: Models with improvements in the
quality of products, 317-343.
Grossman G., Helpman E., 1993, Innovation and growth in the global economy, MIT Press,
Cambridge MA, ch. 4, Rising product quality, 86-109.
Topic 12. Growth empirics
Barro R.J., Sala-i-Martin X., 2004, ch. 10, Growth accounting, 433-460.
Barro R.J., Sala-i-Martin X., 2004, ch. 11, Empirical analysis of regional datasets, 461-496.
Part III. Business Cycle Theory
Topic 13. Real business cycles
Acemoglu D., 2009, ch. 16., Stochastic growth models, 566-610.
Barro R.J., Sala-i-Martin X., 2004, ch. 9, Labor supply and population, 9.3, Labor/Leisure choice,
422-428.
Romer D., 2001, ch. 4, Real business cycle theory, 168-212.
Campbell J.Y., 1994, Inspecting the mechanism: An analytical approach to the stochastic growth
model, Journal of Monetary Economics 33, 463-506.
Ritter J.A., 1995, An outsider’s guide to real cycle modeling, Federal Reserve Bank of St. Louis
Review, 49-60.
Christiano L., Eichenbaum M., 1992, Current real business cycle theories and aggregate labor
market fluctuations, American Economic Review 82, 430-450.
Topic 14. Coordination failures and macroeconomic policy
Bagliano F.C., Bertola G., 2004, ch. 5, Coordination and externalities in macroeconomics, 170-187.
Cooper R., 1999, Coordination games: Complementarities and Macroeconomics, Cambridge
University Press, Cambridge.
Cooper R., John A., 1988, Coordinating coordination failures in Keynesian models, Quarterly
Journal of Economics 103, 441-463.
Diamond P., 1982, Aggregate demand management in search equilibrium, Journal of Political
Economy 90, 881-894.
Topic 15. Imperfect competition and real rigidities
Romer D., 2001, ch. 6, Microeconomic foundations of incomplete nominal adjustment, Part B,
Staggered price adjustment, 279-324.
Blanchard O., Kiyotaki N., 1987, Monopolistic competition and the effects of aggregate demand,
American Economic Review 77, 647-666.
Mankiw N.G., 1988, Imperfect competition and the Keynesian cross, Economics Letters 26, 7-13.
Rotemberg J.J., Saloner G., 1986, A supergame-theoretic model of price wars during booms,
American Economic Review 76, 390-407.
Weitzman M., 1982, Increasing returns and the foundations of unemployment theory, Economic
Journal 92, 787-804.
Part IV. Labor Market
Topic 16. Efficiency wage models of unemployment
Romer D., 2001, ch. 9, Unemployment, 410-432.
Yellen J.L., 1984, Efficiency-wage models of unemployment, American Economic Review 74, 200-
205.
Shapiro C., Stiglitz J.E., 1984, Equilibrium unemployment as a worker-discipline device, American
Economic Review 74, 433-444.
Topic 17. Search models of unemployment
Bagliano F.C., Bertola G., 2004, ch. 5, Coordination and externalities in macroeconomics, 188-206.
Romer D., 2001, ch. 9, Unemployment, 444-461.
Models for Dynamic Macroeconomics
Fabio-Cesare Bagliano
Giuseppe Bertola
Great Clarendon Street, Oxford ox2 6dp
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© Fabio-Cesare Bagliano and Giuseppe Bertola 2004
The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First published 2004
First published in paperback 2007
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain
on acid-free paper by
Ashford Colour Press Ltd, Gosport, Hampshire
ISBN 978–0–19–926682–1 (hbk.)
ISBN 978–0–19–922832–4 (pbk.)
10 9 8 7 6 5 4 3 2 1
PREFACE TO PAPERBACK EDITION
The impact of macroeconomics on daily life is less tangible than that of micro-
economics. Everyone has to deal with rising supermarket prices, fluctuations
in the labor market, and other microeconomic problems. Only a handful of
policymakers and government officials really need to worry about fiscal and
monetary policy, or about a country’s overall competitiveness. The highly sim-
plified, and unavoidably controversial nature of theories used to represent the
complex phenomena resulting from the interaction of millions of individuals,
tends to make macroeconomics appear to be a relatively arcane and technical
branch of the social sciences. Its focus is on issues more likely to be of interest
to specialists than the general public.
Yet, macroeconomics and the problems it attempts to deal with are
extremely important, even if they are sometimes difficult to grasp. It cannot
be denied that macroeconomic analysis has become more technical over the
last few decades. The formal treatment of expectations and of inter-temporal
interactions is nowadays an essential ingredient of any model meant to address
practical and policy problems. But, at the same time, it has also become more
pragmatic because modern macroeconomics is firmly rooted in individual
agents’ day-to-day decisions. To understand and appreciate scientific research
papers, the modern macroeconomist has to master the dynamic optimization
tools needed to represent the solution of real, live individuals’ problems in
terms of optimization, equilibrium and dynamic accumulation relationships,
expectations and uncertainty. The macroeconomist, unlike most microecono-
mists, also needs to know how to model and interpret the interactions of
individual decisions that, in different ways and at different levels, make an
economy’s dynamic behavior very different from the simple juxtaposition of
its inhabitants’ actions and objectives.
This book offers its readers a step-by-step introduction to aspects of
macroeconomic engineering, individual optimization techniques and modern
approaches to macroeconomic equilibrium modeling. It applies the relevant
formal analysis to some of the standard topics covered less formally by all
intermediate macroeconomics courses: consumption and investment, employment
and unemployment, and economic growth. Aspects of each topic are
treated in more detail by making use of advanced mathematics and setting
them in a broader context than is the case in standard undergraduate text-
books. The book is not, however, as technically demanding as some other
graduate textbooks. Readers require no more mathematical expertise than is
provided by the majority of undergraduate courses. The exposition seeks to
develop economic intuition as well as technical know-how, and to prepare
students for hands-on solutions to practical problems rather than providing
fully rigorous theoretical analysis. Hence, relatively advanced concepts (such
as integrals and random variables) are introduced in the context of economic
arguments and immediately applied to the solution of economic problems,
which are accurately characterized without an in-depth discussion of the
theoretical aspects of the mathematics involved. The style and coverage of
the material bridges the gap between basic textbooks and modern applied
macroeconomic research, allowing readers to approach research in leading
journals and understand research practiced in central banks and international
research institutions as well as in academic departments.
How to Use This Book
Models for Dynamic Macroeconomics is suitable for advanced undergraduate
and first-year graduate courses and can be taught in about 60 lecture hours.
When complemented by recent journal articles, the individual chapters—
which differ slightly in the relative emphasis given to analytical techniques
and empirical perspective—can also be used in specialized topics courses. The
last section of each chapter often sketches more advanced material and may
be omitted without breaking the book’s train of thought, while the chapters’
appendices introduce technical tools and are essential reading. Some exercises
are found within the chapters and propose extensions of the model discussed
in the text. Other exercises are found at the end of chapters and should be used
to review the material. Many technical terms are contained in the index, which
can be used to track down definitions and sample applications of possibly
unfamiliar concepts.
The book’s five chapters can to some extent be read independently, but
are also linked by various formal and substantive threads to each other and
to the macroeconomic literature they are meant to introduce. Discrete-time
optimization under uncertainty, introduced in Chapter 1, is motivated and
discussed by applications to consumption theory, with particular attention to
empirical implementation. Chapter 2 focuses on continuous-time optimiza-
tion techniques, and discusses the relevant insights in the context of partial-
equilibrium investment models. Chapter 3 revisits many of the previous
chapters’ formal derivations with applications to dynamic labor demand, in
analogy to optimal investment models, and characterizes labor market equi-
librium when not only individual firms’ labor demand is subject to adjustment
costs, but also individual labor supply by workers faces dynamic adjustment
problems. Chapter 4 proposes broader applications of methods introduced by
the previous chapters, and studies continuous-time equilibrium dynamics of
representative-agent economies featuring both consumption and investment
choices, with applications to long-run growth frameworks of analysis. Chapter
5 illustrates the role of decentralized trading in determining aggregate equilib-
ria, and characterizes aggregate labor market dynamics in the presence of fric-
tional unemployment. Chapters 4 and 5 pay particular attention to strategic
interactions and externalities: even when each agent correctly solves his or her
individual dynamic problem, modern micro-founded macroeconomic mod-
els recognize that macroeconomic equilibrium need not have unambiguously
desirable properties.
Brief literature reviews at the end of each chapter outline some recent
directions of progress, but no book can effectively survey a literature as wide-
ranging, complex, and evolving as the macroeconomic one. In the interests
of time and space this book does not cover all of the important analytical and
empirical issues within the topics it discusses. Overlapping generation dynamics
and real and monetary business cycle fluctuations, as well as more technical
aspects, such as those relevant to the treatment of asymmetric information
and to more sophisticated game-theoretic and decision-theoretic approaches,
are not covered. It would be impossible to cover all aspects of all relevant topics
in one compact and accessible volume and the intention is to complement
rather than compete with some of the other texts currently available.∗ The
positive reception of the hardback edition, however, would seem to confirm
that the book does succeed in its intended purpose of covering the essential
elements of a modern macroeconomist’s toolkit. It also enables readers to
knowledgeably approach further relevant research. It is hoped that this paper-
back edition will continue to fulfil that purpose even more efficiently for a
number of years to come.
The first hardback edition was largely based on Metodi Dinamici e
Fenomeni Macroeconomici (il Mulino, Bologna, 1999), translated by Fabio
Bagliano (ch. 1), Giuseppe Bertola (ch. 2), Marcel Jansen (chs. 3, 4, 5, edited
by Jessica Moss Spataro and Giuseppe Bertola). For helpful comments the
authors are indebted to many colleagues (especially Guido Ascari, Onorato
∗ Foundations of Modern Macroeconomics, by Ben J. Heijdra and Frederick van der Ploeg (Oxford
University Press, 2002) is more comprehensive and less technical; the two books can to some extent
complement each other on specific topics. This book offers more technical detail and requires less
mathematical knowledge than Lectures on Macroeconomics, by Olivier J. Blanchard and Stanley Fischer
(MIT Press, 1989), and offers a more up to date treatment of a more limited range of topics. It is less
wide ranging than Advanced Macroeconomics, by David Romer (McGraw-Hill 3rd rev. edn. 2005) but
provides more technical and rigorous hands-on treatment of more advanced techniques. By contrast,
Recursive Macroeconomic Theory, by Lars Ljungqvist and Thomas J. Sargent (MIT Press, 2nd edn. 2004)
offers a more rigorous but not as accessible formal treatment of a broad range of topics, and a narrower
range of technical and economic insights.
Castellino, Elsa Fornero, Pietro Garibaldi, Giulio Fella, Vinicio Guidi, Claudio
Morana) and to the anonymous reviewers. The various editions of the book
have also benefited enormously from the input of the students and teaching
assistants (especially Alberto Bucci, Winfried Koeniger, Juana Santamaria,
Mirko Wiederholt) over many years at the CORIPE Master program in Turin,
at the European University Institute, and elsewhere. Any remaining errors and
all shortcomings are of course the authors’ own.
CONTENTS
DETAILED CONTENTS x
LIST OF FIGURES xiii
1 Dynamic Consumption Theory 1
2 Dynamic Models of Investment 48
3 Adjustment Costs in the Labor Market 102
4 Growth in Dynamic General Equilibrium 130
5 Coordination and Externalities in Macroeconomics 170
ANSWERS TO EXERCISES 221
INDEX 274
DETAILED CONTENTS
LIST OF FIGURES xiii
1 Dynamic Consumption Theory 1
1.1 Permanent Income and Optimal Consumption 1
1.1.1 Optimal consumption dynamics 5
1.1.2 Consumption level and dynamics 7
1.1.3 Dynamics of income, consumption, and saving 9
1.1.4 Consumption, saving, and current income 11
1.2 Empirical Issues 13
1.2.1 Excess sensitivity of consumption to current income 13
1.2.2 Relative variability of income and consumption 15
1.2.3 Joint dynamics of income and saving 19
1.3 The Role of Precautionary Saving 22
1.3.1 Microeconomic foundations 22
1.3.2 Implications for the consumption function 25
1.4 Consumption and Financial Returns 29
1.4.1 Empirical implications of the CCAPM 31
1.4.2 Extension: the habit formation hypothesis 35
Appendix A1: Dynamic Programming 36
Review Exercises 41
Further Reading 43
References 45
2 Dynamic Models of Investment 48
2.1 Convex Adjustment Costs 49
2.2 Continuous-Time Optimization 52
2.2.1 Characterizing optimal investment 55
2.3 Steady-State and Adjustment Paths 60
2.4 The Value of Capital and Future Cash Flows 65
2.5 Average Value of Capital 69
2.6 A Dynamic IS–LM Model 71
2.7 Linear Adjustment Costs 76
2.8 Irreversible Investment Under Uncertainty 81
2.8.1 Stochastic calculus 82
2.8.2 Optimization under uncertainty and irreversibility 85
Appendix A2: Hamiltonian Optimization Methods 91
Review Exercises 97
Further Reading 99
References 100
3 Adjustment Costs in the Labor Market 102
3.1 Hiring and Firing Costs 104
3.1.1 Optimal hiring and firing 107
3.2 The Dynamics of Employment 110
3.3 Average Long-Run Effects 114
3.3.1 Average employment 115
3.3.2 Average profits 117
3.4 Adjustment Costs and Labor Allocation 119
3.4.1 Dynamic wage differentials 122
Appendix A3: (Two-State) Markov Processes 125
Exercises 127
Further Reading 128
References 129
4 Growth in Dynamic General Equilibrium 130
4.1 Production, Savings, and Growth 132
4.1.1 Balanced growth 134
4.1.2 Unlimited accumulation 136
4.2 Dynamic Optimization 138
4.2.1 Economic interpretation and optimal growth 139
4.2.2 Steady state and convergence 140
4.2.3 Unlimited optimal accumulation 141
4.3 Decentralized Production and Investment Decisions 144
4.3.1 Optimal growth 147
4.4 Measurement of “Progress”: The Solow Residual 148
4.5 Endogenous Growth and Market Imperfections 151
4.5.1 Production and non-rival factors 152
4.5.2 Involuntary technological progress 153
4.5.3 Scientific research 156
4.5.4 Human capital 157
4.5.5 Government expenditure and growth 158
4.5.6 Monopoly power and private innovations 160
Review Exercises 163
Further Reading 167
References 168
5 Coordination and Externalities in Macroeconomics 170
5.1 Trading Externalities and Multiple Equilibria 171
5.1.1 Structure of the model 171
5.1.2 Solution and characterization 172
5.2 A Search Model of Money 180
5.2.1 The structure of the economy 180
5.2.2 Optimal strategies and equilibria 182
5.2.3 Implications 185
5.3 Search Externalities in the Labor Market 188
5.3.1 Frictional unemployment 189
5.3.2 The dynamics of unemployment 191
5.3.3 Job availability 192
5.3.4 Wage determination and the steady state 195
5.4 Dynamics 199
5.4.1 Market tightness 199
5.4.2 The steady state and dynamics 203
5.5 Externalities and efficiency 206
Appendix A5: Strategic Interactions and Multipliers 211
Review Exercises 216
Further Reading 217
References 219
ANSWERS TO EXERCISES 221
INDEX 274
LIST OF FIGURES
1.1 Precautionary savings 24
2.1 Unit investment costs 50
2.2 Dynamics of q (supposing that ∂F(·)/∂K is decreasing in K) 57
2.3 Dynamics of K (supposing that ∂ι(·)/∂K − δ < 0) 58
2.4 Phase diagram for the q and K system 59
2.5 Saddlepath dynamics 60
2.6 A hypothetical jump along the dynamic path, and the resulting time
path of λ(t) and investment 63
2.7 Dynamic effects of an announced future change of w 64
2.8 Unit profits as a function of the real wage 68
2.9 A dynamic IS–LM model 73
2.10 Dynamic effects of an anticipated fiscal restriction 75
2.11 Piecewise linear unit investment costs 77
2.12 Installed capital and optimal irreversible investment 79
3.1 Static labor demand 103
3.2 Adjustment costs and dynamic labor demand 111
3.3 Nonlinearity of labor demand and the effect of turnover costs on
average employment, with r = 0 117
3.4 The employer’s surplus when marginal productivity is equal to the wage 118
3.5 Dynamic supply of labor from downsizing firms to expanding firms,
without adjustment costs 121
3.6 Dynamic supply of labor from downsizing firms to expanding firms,
without employers’ adjustment costs, if mobility costs κ per unit of labor 124
4.1 Decreasing marginal returns to capital 134
4.2 Steady state of the Solow model 134
4.3 Effects of an increase in the savings rate 136
4.4 Convergence and steady state with optimal savings 141
5.1 Stationarity loci for e and c∗ 174
5.2 Equilibria of the economy 178
5.3 Optimal (�) response function 184
5.4 Optimal quantity of money M∗ and ex ante probability of
consumption P 187
5.5 Dynamics of the unemployment rate 192
5.6 Equilibrium of the labor market with frictional unemployment 198
5.7 Dynamics of the supply of jobs 201
5.8 Dynamics of unemployment and vacancies 203
5.9 Permanent reduction in productivity 204
5.10 Increase in the separation rate 205
5.11 A temporary reduction in productivity 206
5.12 Strategic interactions 212
5.13 Multiplicity of equilibria 214
1 Dynamic Consumption Theory
Optimizing models of intertemporal choices are widely used by theoretical
and empirical studies of consumption. This chapter outlines their basic ana-
lytical structure, along with some extensions. The technical tools introduced
here aim at familiarizing the reader with recent applied work on consumption
and saving, but they will also prove useful in the rest of the book, when we
shall study investment and other topics in economic dynamics.
The chapter is organized as follows. Section 1.1 illustrates and solves the
basic version of the intertemporal consumption choice model, deriving the-
oretical relationships between the dynamics of permanent income, current
income, consumption, and saving. Section 1.2 discusses problems raised by
empirical tests of the theory, focusing on the excess sensitivity of consumption
to expected income changes and on the excess smoothness of consumption
following unexpected income variations. Explanations of the empirical evi-
dence are offered by Section 1.3, which extends the basic model by introducing
a precautionary saving motive. Section 1.4 derives the implications of optimal
portfolio allocation for joint determination of optimal consumption when
risky financial assets are available. The Appendix briefly introduces dynamic
programming techniques applied to the optimal consumption choice. Biblio-
graphic references and suggestions for further reading bring the chapter to a
close.
1.1. Permanent Income and Optimal Consumption
The basic model used in the modern literature on consumption and saving
choices is based on two main assumptions:
1. Identical economic agents maximize an intertemporal utility function,
defined on the consumption levels in each period of the optimization
horizon, subject to the constraint given by overall available resources.
2. Under uncertainty, the maximization is based on expectations of future
relevant variables (for example, income and the rate of interest) formed
rationally by agents, who use optimally all information at their disposal.
We will therefore study the optimal behavior of a representative agent who
lives in an uncertain environment and has rational expectations. Implications
of the theoretical model will then be used to interpret aggregate data. The
representative consumer faces an infinite horizon (like any aggregate econ-
omy), and solves at time t an intertemporal choice problem of the following
general form:
$$\max_{\{c_{t+i};\; i = 0, 1, \ldots\}} U(c_t, c_{t+1}, \ldots) \equiv U_t,$$
subject to the constraint (for $i = 0, \ldots, \infty$)
$$A_{t+i+1} = (1 + r_{t+i})\, A_{t+i} + y_{t+i} - c_{t+i},$$
where $A_{t+i}$ is the stock of financial wealth at the beginning of period $t+i$;
$r_{t+i}$ is the real rate of return on financial assets in period $t+i$; $y_{t+i}$ is
labor income earned at the end of period $t+i$; and $c_{t+i}$ is consumption, also
assumed to take place at the end of the period. The constraint therefore accounts
for the evolution of the consumer’s financial wealth from one period to the next.
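As a minimal sketch of how this constraint carries wealth from one period to the next, the Python function below iterates the accumulation equation for given income and consumption paths. The name `wealth_path`, the constant rate of return, and all numbers are illustrative assumptions, not part of the text.

```python
def wealth_path(a0, r, incomes, consumptions):
    """Iterate A_{t+i+1} = (1 + r) A_{t+i} + y_{t+i} - c_{t+i}.

    Hypothetical helper: a0 is initial wealth A_t and r is a constant rate
    of return (an assumption; the general constraint lets r vary over time).
    Returns the list [A_t, A_{t+1}, ...].
    """
    path = [a0]
    for y, c in zip(incomes, consumptions):
        path.append((1 + r) * path[-1] + y - c)
    return path


# Consuming labor income plus the return on wealth leaves wealth unchanged.
path = wealth_path(a0=100.0, r=0.05,
                   incomes=[10.0, 10.0, 10.0],
                   consumptions=[15.0, 15.0, 15.0])
print(path)
```

In this example consumption equals labor income (10) plus the return on wealth (5), so the stock of wealth stays at its initial level in every period.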
Several assumptions are often made in order to derive empirically testable
implications easily from the basic model. The main assumptions (some of
which will be relaxed later) are as follows.
• Intertemporal separability (or additivity over time). The generic utility
function $U_t(\cdot)$ is specified as
$$U_t(c_t, c_{t+1}, \ldots) = v_t(c_t) + v_{t+1}(c_{t+1}) + \ldots$$
(with $v'_{t+i} > 0$ and $v''_{t+i} < 0$ for any $i \ge 0$), where $v_{t+i}(c_{t+i})$ is
the valuation at $t$ of the utility accruing to the agent from consumption $c_{t+i}$
at $t+i$. Since $v_{t+i}$ depends only on consumption at $t+i$, the ratio of
marginal utilities of consumption in any two periods is independent of
consumption in any other period. This rules out goods whose effects on
utility last for more than one period, either because the goods themselves
are durable, or because their consumption creates long-lasting habits.
(Habit formation phenomena will be discussed at the end of this chapter.)
• A way of discounting utility in future periods that guarantees intertemporally
consistent choices. Dynamic inconsistencies arise when the valuation
at time $t$ of the relative utility of consumption in any two future periods,
$t+k_1$ and $t+k_2$ (with $t < t+k_1 < t+k_2$), differs from the valuation of
the same relative utility at a different time $t+i$. In this case the optimal
levels of consumption for $t+k_1$ and $t+k_2$ originally chosen at $t$ may
not be considered optimal at some later date: the consumer would then
wish to reconsider his original choices simply because time has passed,
even if no new information has become available. To rule out this phenomenon,
it is necessary that the ratios of discounted marginal utilities
of consumption in $t+k_1$ and $t+k_2$ depend, in addition to $c_{t+k_1}$ and
$c_{t+k_2}$, only on the distance $k_2 - k_1$, and not also on the moment in time
when the optimization problem is solved. With a discount factor for the
utility of consumption in $t+k$ of the form $(1+\rho)^{-k}$ (called “exponential
discounting”), we can write
$$v_{t+k}(c_{t+k}) = \left(\frac{1}{1+\rho}\right)^{k} u(c_{t+k}),$$
and dynamic consistency of preferences is ensured: under certainty, the
agent may choose the optimal consumption plan once and for all at the
beginning of his planning horizon.¹
• The adoption of expected utility as the objective function under uncertainty
(additivity over states of nature). In discrete time, a stochastic process
specifies a random variable for each date $t$, that is, a real number associated
with the realization of a state of nature. If it is possible to give a probability
to different states of nature, it is also possible to construct an expectation
of future income, weighting each possible level of income with the
probability of the associated state of nature. In general, the probabilities
used depend on available information, and therefore change over time
when new information is made available. Given her information set at
$t$, $I_t$, the consumer maximizes expected utility conditional on $I_t$:
$$U_t = E\left(\sum_{i=0}^{\infty} v_{t+i}(c_{t+i}) \,\Big|\, I_t\right).$$
Together with the assumption of intertemporal
separability (additivity over periods of time), the adoption of expected
utility entails an inverse relationship between the degree of intertemporal
substitutability, measuring the agent’s propensity to substitute current
consumption with future consumption under certainty, and risk aversion,
determining the agent’s choices among different consumption levels
under uncertainty over the state of nature: the latter, and the inverse of
the former, are both measured in absolute terms by $-v''_t(c)/v'_t(c)$ at time
$t$ and for consumption level $c$. (We will expand on this point on page 6.)
• Finally, we make the simplifying assumption that there exists only one
financial asset with certain and constant rate of return r . Financial wealth
A is the stock of the safe asset allowing the agent to transfer resources
through time in a perfectly forecastable way; the only uncertainty is on
the (exogenously given) future labor incomes y. Stochastic rates of return
on n financial assets are introduced in Section 1.4 below.
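The dynamic-consistency requirement in the second assumption above can be illustrated numerically. The sketch below compares the relative weight placed on two fixed calendar dates, $t+5$ and $t+6$, when evaluated at $t$ and again at $t+4$. The exponential weight is the one in the text; the hyperbolic form $1/(1+\alpha k)$ and all parameter values are assumptions chosen only for illustration.

```python
def exponential_weight(k, rho):
    """Weight (1 + rho)^(-k) attached to utility k periods ahead."""
    return (1.0 + rho) ** (-k)


def hyperbolic_weight(k, alpha):
    """Illustrative hyperbolic weight 1 / (1 + alpha * k) (an assumed form)."""
    return 1.0 / (1.0 + alpha * k)


def relative_weight(weight, horizon_near, horizon_far, **params):
    """Ratio of the weights on two future dates, seen from one planning date."""
    return weight(horizon_far, **params) / weight(horizon_near, **params)


# Dates t+5 and t+6 are 5 and 6 periods away at t, but 1 and 2 away at t+4.
exp_at_t = relative_weight(exponential_weight, 5, 6, rho=0.05)
exp_at_t4 = relative_weight(exponential_weight, 1, 2, rho=0.05)
hyp_at_t = relative_weight(hyperbolic_weight, 5, 6, alpha=0.5)
hyp_at_t4 = relative_weight(hyperbolic_weight, 1, 2, alpha=0.5)

print(exp_at_t, exp_at_t4)   # equal: only the distance k2 - k1 matters
print(hyp_at_t, hyp_at_t4)   # differ: the trade-off shifts as time passes
```

With exponential weights the ratio is $1/(1+\rho)$ from either planning date, so a plan made at $t$ is still optimal at $t+4$. With the assumed hyperbolic weights the ratio changes, which is the source of the preference reversals described above.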
Under the set of hypotheses above, the consumer’s problem may be speci-
fied as follows:
$$\max_{\{c_{t+i},\; i = 0, 1, \ldots\}} U_t = E_t\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+\rho}\right)^{i} u(c_{t+i})\right] \qquad (1.1)$$
¹ A strand of the recent literature (see the last section of this chapter for references) has explored
the implications of a different discount function: a “hyperbolic” discount factor declines at a relatively
higher rate in the short run (consumers are relatively “impatient” at short horizons) than in the long
run (consumers are “patient” at long horizons), implying dynamically inconsistent preferences.
subject to the constraint (for $i = 0, \ldots, \infty$):²
$$A_{t+i+1} = (1 + r)\, A_{t+i} + y_{t+i} - c_{t+i}, \qquad A_t \text{ given}. \qquad (1.2)$$
In (1.1) $\rho$ is the consumer’s intertemporal rate of time preference and $E_t[\cdot]$
is the (rational) expectation formed using information available at $t$: for a
generic variable $x_{t+i}$ we have $E_t x_{t+i} = E(x_{t+i} \mid I_t)$. The hypothesis of rational
expectations implies that the forecast error $x_{t+i} - E(x_{t+i} \mid I_t)$ is uncorrelated
with the variables in the information set $I_t$: $E_t(x_{t+i} - E(x_{t+i} \mid I_t)) = 0$ (we
will often use this property below). The value of current income $y_t$ is included
in $I_t$.
In the constraint (1.2) financial wealth A may be negative (the agent is not
liquidity-constrained); however, we impose the restriction that the consumer’s
debt cannot grow at a rate greater than the financial return r by means of the
following condition (known as the no-Ponzi-game condition):
$$\lim_{j \to \infty}\left(\frac{1}{1+r}\right)^{j} A_{t+j} \ge 0. \qquad (1.3)$$
The condition in (1.3) is equivalent, in the infinite-horizon case, to the
non-negativity constraint $A_{T+1} \ge 0$ for an agent with a life lasting until period $T$:
in the absence of such a constraint, the consumer would borrow to finance
infinitely large consumption levels. Although in its general formulation (1.3)
is an inequality, if marginal utility of consumption is always positive this
condition will be satisfied as an equality. Equation (1.3) with strict equality
is called the transversality condition and can be directly used in the problem’s
solution.
Similarly, without imposing (1.3), interest on debt could be paid for by
further borrowing on an infinite horizon. Formally, from the budget constraint
(1.2) at time $t$, repeatedly substituting $A_{t+i}$ up to period $t+j$, we get
the following equation:
$$\frac{1}{1+r}\sum_{i=0}^{j-1}\left(\frac{1}{1+r}\right)^{i} c_{t+i} + \left(\frac{1}{1+r}\right)^{j} A_{t+j} = \frac{1}{1+r}\sum_{i=0}^{j-1}\left(\frac{1}{1+r}\right)^{i} y_{t+i} + A_t.$$
The present value of consumption flows from $t$ up to $t+j-1$ can exceed the
consumer’s total available resources, given by the sum of the initial financial
wealth $A_t$ and the present value of future labor incomes from $t$ up to $t+j-1$.
In this case $A_{t+j} < 0$ and the consumer will have a stock of debt at the beginning
of period $t+j$. When the horizon is extended to infinity, the constraint
(1.3) stops the agent from consuming more than his lifetime resources, using
further borrowing to pay the interest on the existing debt in any period up
to infinity. Assuming an infinite horizon and using (1.3) with equality, we get
² In addition, a non-negativity constraint on consumption must be imposed: $c_{t+i} \ge 0$. We assume
that this constraint is always fulfilled.
the consumer’s intertemporal budget constraint at the beginning of period t (in
the absence of liquidity constraints that would rule out, or limit, borrowing):
$$\frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} c_{t+i} = \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} y_{t+i} + A_t. \qquad (1.4)$$
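The finite-horizon identity obtained by repeated substitution of (1.2) can be checked numerically. The sketch below, with hypothetical function names and arbitrary illustrative numbers, iterates the budget constraint over $j = 4$ periods and compares the two sides of the identity; with consumption above income, wealth ends up negative, which is the debt position that (1.3) prevents from being rolled over forever.

```python
def discounted_sum(flows, r):
    """(1/(1+r)) * sum_i (1/(1+r))^i * flows[i], the present value used in the text."""
    return sum(x / (1.0 + r) ** (i + 1) for i, x in enumerate(flows))


def terminal_wealth(a0, r, incomes, consumptions):
    """Apply A_{t+i+1} = (1+r) A_{t+i} + y_{t+i} - c_{t+i} over the whole path."""
    wealth = a0
    for y, c in zip(incomes, consumptions):
        wealth = (1.0 + r) * wealth + y - c
    return wealth


# Illustrative numbers only: consumption exceeds income in every period.
r, a0 = 0.04, 10.0
incomes = [20.0, 22.0, 18.0, 25.0]
consumptions = [30.0, 30.0, 30.0, 30.0]
j = len(incomes)

a_end = terminal_wealth(a0, r, incomes, consumptions)
lhs = discounted_sum(consumptions, r) + a_end / (1.0 + r) ** j
rhs = discounted_sum(incomes, r) + a0
print(lhs, rhs, a_end)
```

The two sides agree up to rounding for any paths of income and consumption, because the identity is only an accounting restatement of (1.2). The economic restriction comes from the limit condition on the discounted terminal wealth term.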
1.1.1. OPTIMAL CONSUMPTION DYNAMICS
Substituting the consumption level derived from the budget constraint (1.2)
into the utility function, we can write the consumer’s problem as
$$\max U_t = E_t \sum_{i=0}^{\infty}\left(\frac{1}{1+\rho}\right)^{i} u\bigl((1+r)A_{t+i} - A_{t+i+1} + y_{t+i}\bigr)$$
with respect to wealth $A_{t+i}$ for $i = 1, 2, \ldots$, given initial wealth $A_t$ and subject
to the transversality condition derived from (1.3). The first-order conditions
$$E_t u'(c_{t+i}) = \frac{1+r}{1+\rho}\, E_t u'(c_{t+i+1})$$
are necessary and sufficient if utility $u(c)$ is an increasing and concave function
of consumption (i.e. if $u'(c) > 0$ and $u''(c) < 0$). For the consumer’s choice
in the first period (when $i = 0$), noting that $u'(c_t)$ is known at time $t$, we get
the so-called Euler equation:
$$u'(c_t) = \frac{1+r}{1+\rho}\, E_t u'(c_{t+1}). \qquad (1.5)$$
At the optimum the agent is indifferent between consuming immediately one
unit of the good, with marginal utility $u'(c_t)$, and saving in order to consume
$1+r$ units in the next period, $t+1$. The same reasoning applies to any period $t$
in which the optimization problem is solved: the Euler equation gives the
dynamics of marginal utility in any two successive periods.³
³ An equivalent solution of the problem is found by maximizing the Lagrangian function:
$$L_t = E_t \sum_{i=0}^{\infty}\left(\frac{1}{1+\rho}\right)^{i} u(c_{t+i}) - \lambda\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} E_t c_{t+i} - (1+r)A_t - \sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} E_t y_{t+i}\right],$$
where $\lambda$ is the Lagrange multiplier associated with the intertemporal budget constraint (here evaluated
at the end of period $t$). From the first-order conditions for $c_t$ and $c_{t+1}$, we derive the Euler
equation (1.5). In addition, we get $u'(c_t) = \lambda$. The shadow value of the budget constraint, measuring the
increase of maximized utility that is due to an infinitesimal increase of the resources available at the end
of period $t$, is equal to the marginal utility of consumption at $t$. At the optimum, the Euler equation
holds: the agent is indifferent between consumption in the current period and consumption in any
The evolution over time of marginal utility and consumption is governed
by the difference between the rate of return $r$ and the intertemporal rate of
time preference $\rho$. Since $u''(c_t) < 0$, lower consumption yields higher marginal
utility: if $r > \rho$, the consumer will find it optimal to increase consumption over
time, exploiting a return on saving higher than the utility discount rate; when
$r = \rho$, optimal consumption is constant, and when $r < \rho$ it is decreasing. The
shape of marginal utility as a function of $c$ (i.e. the concavity of the utility
function) determines the magnitude of the effect of $r - \rho$ on the time path of
consumption: if $|u''(c)|$ is large relative to $u'(c)$, large variations of marginal
utility are associated with relatively small fluctuations in consumption, and
then optimal consumption shows little change over time even when the rate
of return differs substantially from the utility discount rate.
Also, the agent’s degree of risk aversion is determined by the concavity of
the utility function. It has already been mentioned that our assumptions on
preferences imply a negative relationship between risk aversion and intertemporal
substitutability (where the latter measures the change in consumption
between two successive periods owing to the difference between $r$ and $\rho$ or,
if $r$ is not constant, to changes in the rate of return). It is easy to find such a
relationship for the case of a CRRA (constant relative risk aversion) utility
function, namely:
$$u(c_t) = \frac{c_t^{1-\gamma} - 1}{1 - \gamma}, \qquad \gamma > 0,$$
with $u'(c) = c^{-\gamma}$. The degree of relative risk aversion, whose general measure
is the absolute value of the elasticity of marginal utility, $-u''(c)\,c/u'(c)$, is in
this case independent of the consumption level, and is equal to the parameter
$\gamma$.⁴ The measure of intertemporal substitutability is obtained by solving the
consumer’s optimization problem under certainty. The Euler equation corresponding
to (1.5) is
$$c_t^{-\gamma} = \frac{1+r}{1+\rho}\, c_{t+1}^{-\gamma} \quad\Rightarrow\quad \left(\frac{c_{t+1}}{c_t}\right)^{\gamma} = \frac{1+r}{1+\rho}.$$
Taking logarithms, and using the approximations $\log(1+r) \simeq r$ and
$\log(1+\rho) \simeq \rho$, we get
$$\Delta \log c_{t+1} = \frac{1}{\gamma}\,(r - \rho).$$
³ (continued) future period, since both alternatives provide additional utility given by $u'(c_t)$. In the
Appendix to this chapter, the problem’s solution is derived by means of dynamic programming techniques.
⁴ The denominator of the CRRA utility function is zero if $\gamma = 1$, but marginal utility can nevertheless
have unitary elasticity: in fact, $u'(c) = c^{-\gamma} = 1/c$ if $u(c) = \log(c)$. The presence of the constant
term “−1” in the numerator makes utility well defined also when $\gamma \to 1$. This limit can be computed,
by l’Hôpital’s rule, as the ratio of the limits of the numerator’s derivative, $d c^{1-\gamma}/d\gamma = -\log(c)\,c^{1-\gamma}$,
and the denominator’s derivative, which is −1.
The elasticity of intertemporal substitution, which is the effect of changes in
the interest rate on the growth rate of consumption $\Delta \log c$, is constant and is
measured by the reciprocal of the coefficient of relative risk aversion $\gamma$.
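A short numerical sketch of the CRRA result under certainty: the functions below (hypothetical names, illustrative parameter values) compute the exact log growth rate of consumption implied by the Euler equation and the approximation $(r - \rho)/\gamma$, for several values of $\gamma$.

```python
import math


def crra_growth(r, rho, gamma):
    """Exact log consumption growth from (c_{t+1}/c_t)^gamma = (1 + r)/(1 + rho)."""
    return (math.log(1.0 + r) - math.log(1.0 + rho)) / gamma


def crra_growth_approx(r, rho, gamma):
    """Approximation (r - rho)/gamma, using log(1 + x) ~ x for small x."""
    return (r - rho) / gamma


# Illustrative values: the return on saving exceeds the rate of time preference.
r, rho = 0.05, 0.02
for gamma in (0.5, 2.0, 5.0):
    print(gamma, crra_growth(r, rho, gamma), crra_growth_approx(r, rho, gamma))
```

A higher $\gamma$ (more risk aversion, less willingness to substitute over time) gives a flatter consumption path for the same gap between $r$ and $\rho$, and growth is zero when $r = \rho$.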
1.1.2. CONSUMPTION LEVEL AND DYNAMICS
Under uncertainty, the expected value of utility may well differ from its
realization. Letting
$$u'(c_{t+1}) - E_t u'(c_{t+1}) \equiv \eta_{t+1},$$
we have by definition that $E_t \eta_{t+1} = 0$ under the hypothesis of rational
expectations. Then, from (1.5), we get
$$u'(c_{t+1}) = \frac{1+\rho}{1+r}\, u'(c_t) + \eta_{t+1}. \qquad (1.6)$$
If we assume also that $r = \rho$, the stochastic process describing the evolution
over time of marginal utility is
$$u'(c_{t+1}) = u'(c_t) + \eta_{t+1}, \qquad (1.7)$$
and the change of marginal utility from $t$ to $t+1$ is given by a stochastic term
unforecastable at time $t$ ($E_t \eta_{t+1} = 0$).
In order to derive the implications of the above result for the dynamics of
consumption, it is necessary to specify a functional form for $u(c)$. To obtain
a linear relation like (1.7), directly involving the level of consumption, we can
assume a quadratic utility function $u(c) = c - (b/2)c^2$, with linear marginal
utility $u'(c) = 1 - bc$ (positive only for $c < 1/b$). This simple and somewhat
restrictive assumption lets us rewrite equation (1.7) as
$$c_{t+1} = c_t + u_{t+1}, \qquad (1.8)$$
where $u_{t+1} \equiv -(1/b)\,\eta_{t+1}$ is such that $E_t u_{t+1} = 0$. If marginal utility is linear in
consumption, as is the case when the utility function is quadratic, the process
(1.8) followed by the level of consumption is a martingale, or a random walk,
with the property:⁵
$$E_t c_{t+1} = c_t. \qquad (1.9)$$
This is the main implication of the intertemporal choice model with rational
expectations and quadratic utility: the best forecast of next period’s consumption
is current consumption. The consumption change from $t$ to $t+1$
⁵ A martingale is a stochastic process $x_t$ with the property $E_t x_{t+1} = x_t$. With $r = \rho$, marginal
utility and, under the additional hypothesis of quadratic utility, the level of consumption have this
property. No assumptions have been made about the distribution of the process $x_{t+1} - x_t$, for example
concerning time-invariance, which is a feature of a random walk process.
cannot be forecast on the basis of information available at $t$: formally, $u_{t+1}$ is
orthogonal to the information set used to form the expectation $E_t$, including
all variables known to the consumer and dated $t, t-1, \ldots$ This implication
has been widely tested empirically. Such orthogonality tests will be discussed
below.
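For reference, the step from (1.7) to (1.8) can be written out explicitly. The lines below only substitute the quadratic-utility marginal utility $u'(c) = 1 - bc$ into (1.7) and rearrange; no new assumption is involved.

```latex
\begin{align*}
u'(c_{t+1}) &= u'(c_t) + \eta_{t+1} \\
1 - b\,c_{t+1} &= 1 - b\,c_t + \eta_{t+1} \\
c_{t+1} &= c_t - \frac{1}{b}\,\eta_{t+1} \;\equiv\; c_t + u_{t+1},
\qquad E_t u_{t+1} = -\frac{1}{b}\,E_t \eta_{t+1} = 0 .
\end{align*}
```

The orthogonality of $u_{t+1}$ to information dated $t$ is therefore inherited directly from the same property of the marginal-utility innovation $\eta_{t+1}$.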
The solution of the consumer’s intertemporal choice problem given by (1.8)
cannot be interpreted as a consumption function. Indeed, that equation does
not link consumption in each period to its determinants (income, wealth, rate
of interest), but only describes the dynamics of consumption from one period
to the next. The assumptions listed above, however, make it possible to derive
the consumption function, combining what we know about the dynamics of
optimal consumption and the intertemporal budget constraint (1.4). Since
the realizations of income and consumption must fulfill the constraint, (1.4)
holds also with expected values:
$$\frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} E_t c_{t+i} = \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} E_t y_{t+i} + A_t. \qquad (1.10)$$
Linearity of the marginal utility function, and a discount rate equal to
the interest rate, imply that the level of consumption expected for any
future period is equal to current consumption. Replacing $E_t c_{t+i}$ with $c_t$ on
the left-hand side of (1.10), we get
$$\frac{1}{r}\, c_t = A_t + \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} E_t y_{t+i} \equiv A_t + H_t. \qquad (1.11)$$
The last term in (1.11), the present value at $t$ of future expected labor incomes,
is the consumer’s “human wealth” $H_t$. The consumption function can then be
written as
$$c_t = r\,(A_t + H_t) \equiv y^P_t. \qquad (1.12)$$
Consumption in $t$ is now related to its determinants, the levels of financial
wealth $A_t$ and human wealth $H_t$. The consumer’s overall wealth at the beginning
of period $t$ is given by $A_t + H_t$. Consumption in $t$ is then the annuity
value of total wealth, that is, the return on wealth in each period: $r(A_t + H_t)$.
That return, which we define as permanent income ($y^P_t$), is the flow that could
be earned for ever on the stock of total wealth. The conclusion is that the agent
chooses to consume in each period exactly his permanent income, computed
on the basis of expectations of future labor incomes.
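The consumption function (1.12) can be sketched in a few lines of Python. The helper names, the device of holding income constant after the listed horizons, and the numbers are all assumptions made for illustration; only the formulas for $H_t$ and $c_t$ come from the text.

```python
def human_wealth(expected_incomes, r, tail_income=None):
    """H_t = (1/(1+r)) * sum_i (1/(1+r))^i * E_t y_{t+i}.

    expected_incomes lists E_t y_t, E_t y_{t+1}, ...; if tail_income is given,
    income is assumed (for illustration) to stay at that level forever after.
    """
    beta = 1.0 / (1.0 + r)
    h = sum(beta ** (i + 1) * y for i, y in enumerate(expected_incomes))
    if tail_income is not None:
        n = len(expected_incomes)
        h += beta ** n * tail_income / r   # present value of the perpetual tail
    return h


def permanent_income(a, expected_incomes, r, tail_income=None):
    """Consumption function (1.12): c_t = r * (A_t + H_t)."""
    return r * (a + human_wealth(expected_incomes, r, tail_income))


r = 0.05
# Constant expected income of 10 forever and financial wealth of 100.
c_flat = permanent_income(100.0, [], r, tail_income=10.0)
# Same, except that current income is temporarily 20 instead of 10.
c_windfall = permanent_income(100.0, [20.0], r, tail_income=10.0)
print(c_flat, c_windfall)
```

With constant expected income, human wealth is $y/r$ and consumption equals $rA_t + y$. A purely temporary income gain of 10 raises consumption only by its annuity value, $10\,r/(1+r)$, which is the sense in which consumption responds to permanent rather than current income.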
1.1.3. DYNAMICS OF INCOME, CONSUMPTION, AND SAVING
Given the consumption function (1.12), we note that the evolutions through
time of consumption and permanent income coincide. Leading (1.12) one
period, we have
$$y^P_{t+1} = r\,(A_{t+1} + H_{t+1}). \qquad (1.13)$$
Taking the expectation at time $t$ of $y^P_{t+1}$, subtracting the resulting expression
from (1.13), and noting that $E_t A_{t+1} = A_{t+1}$ from (1.2), since realized income
$y_t$ is included in the consumer’s information set at $t$, we get
$$y^P_{t+1} - E_t y^P_{t+1} = r\,(H_{t+1} - E_t H_{t+1}). \qquad (1.14)$$
Permanent income calculated at time $t+1$, conditional on information available
at that time, differs from the expectation formed one period earlier,
conditional on information at $t$, only if there is a “surprise” in the agent’s
human wealth at time $t+1$. In other words, the “surprise” in permanent
income at $t+1$ is equal to the annuity value of the “surprise” in human wealth
arising from new information on future labor incomes, available only at $t+1$.
Since $c_t = y^P_t$, from (1.9) we have
$$E_t y^P_{t+1} = y^P_t.$$
All information available at $t$ is used to calculate permanent income $y^P_t$,
which is also the best forecast of the next period’s permanent income. Using
this result, the evolution over time of permanent income can be written as
$$y^P_{t+1} = y^P_t + r\left[\frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} (E_{t+1} - E_t)\, y_{t+1+i}\right],$$
where the “surprise” in human wealth in $t+1$ is expressed as the revision in
expectations on future incomes: $y^P$ can change over time only if those expectations
change, that is, if, when additional information accrues to the agent in
$t+1$, $(E_{t+1} - E_t)\, y_{t+1+i} \equiv E_{t+1} y_{t+1+i} - E_t y_{t+1+i}$ is nonzero for at least one $i$. The
evolution over time of consumption follows that of permanent income, so
that we can write
$$c_{t+1} = c_t + r\left[\frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} (E_{t+1} - E_t)\, y_{t+1+i}\right] = c_t + u_{t+1}. \qquad (1.15)$$
It can be easily verified that the change of consumption between $t$ and $t+1$
cannot be foreseen as of time $t$ (since it depends only on information available
in $t+1$): $E_t u_{t+1} = 0$. Thus, equation (1.15) enables us to attach a well defined
economic meaning and a precise measure to the error term $u_{t+1}$ in the Euler
equation (1.8).
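To see what (1.15) implies quantitatively, the sketch below evaluates the consumption innovation $u_{t+1}$ for two stylized revisions of expected income: a surprise that affects only $y_{t+1}$, and one that shifts every future income by the same amount. The function name and the treatment of the infinite tail are assumptions made for illustration.

```python
def consumption_innovation(revisions, r, permanent_tail=0.0):
    """u_{t+1} = r * (1/(1+r)) * sum_i (1/(1+r))^i * (E_{t+1} - E_t) y_{t+1+i}.

    revisions[i] is the revision in the expectation of y_{t+1+i}; beyond the
    listed horizons every revision is assumed equal to permanent_tail.
    """
    beta = 1.0 / (1.0 + r)
    pv = sum(beta ** (i + 1) * d for i, d in enumerate(revisions))
    pv += beta ** len(revisions) * permanent_tail / r   # perpetual part
    return r * pv


r = 0.05
u_transitory = consumption_innovation([1.0], r)                  # one-period surprise
u_permanent = consumption_innovation([], r, permanent_tail=1.0)  # surprise lasts forever
print(u_transitory, u_permanent)
```

A one-period income surprise of 1 changes consumption by only $r/(1+r)$, while a surprise expected to last forever changes consumption one for one.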
Intuitively, permanent income theory has important implications not only
for the optimal consumption path, but also for the behavior of the agent’s
saving, governing the accumulation of her financial wealth. To discover these
implications, we start from the definition of disposable income $y^D$, the sum of
labor income and the return on financial wealth:
$$y^D_t = r A_t + y_t.$$
Saving $s_t$ (the difference between disposable income and consumption) is
easily derived by means of the main implication of permanent income theory
($c_t = y^P_t$):
$$s_t \equiv y^D_t - c_t = y^D_t - y^P_t = y_t - r H_t. \qquad (1.16)$$
The level of saving in period $t$ is then equal to the difference between current
(labor) income $y_t$ and the annuity value of human wealth $r H_t$. Such a difference,
being transitory income, does not affect consumption: if it is positive
it is entirely saved, whereas if it is negative it determines a decumulation of
financial assets of an equal amount. Thus, the consumer, faced with a variable
labor income, changes the stock of financial assets so that the return earned
on it ($rA$) allows her to keep consumption equal to permanent income.
Unfolding the definition of human wealth Ht in (1.16), we can write saving
at t as
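$$\begin{aligned}
s_t &= y_t - \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i}E_t y_{t+i} \\
&= \frac{1}{1+r}\,y_t - \left[\frac{1}{1+r} - \left(\frac{1}{1+r}\right)^{2}\right]E_t y_{t+1} - \left[\left(\frac{1}{1+r}\right)^{2} - \left(\frac{1}{1+r}\right)^{3}\right]E_t y_{t+2} - \dots \\
&= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}E_t \Delta y_{t+i},
\end{aligned} \tag{1.17}$$
where $\Delta y_{t+i} = y_{t+i} - y_{t+i-1}$. Equation (1.17) sheds further light on the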
motivation for saving in this model: the consumer saves, accumulating finan-
cial assets, to face expected future declines of labor income (a “saving for a
rainy day” behavior). Equation (1.17) has been extensively used in the empir-
ical literature, and its role will be discussed in depth in Section 1.2.
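The equivalence between (1.16) and (1.17) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the 5% interest rate, and the assumption that expected income stays at its last listed value forever are all illustrative choices.

```python
def saving_from_levels(path, r):
    """s_t = y_t - r*H_t as in (1.16). path[i] stands for E_t y_{t+i};
    the last entry is assumed to persist forever (assumption of this sketch)."""
    d = 1.0 / (1.0 + r)
    head = sum(d**i * y for i, y in enumerate(path[:-1]))
    tail = d**(len(path) - 1) * path[-1] / (1.0 - d)  # perpetuity of the last value
    human_wealth = d * (head + tail)
    return path[0] - r * human_wealth


def saving_from_changes(path, r):
    """s_t = -sum_{i>=1} (1/(1+r))^i E_t (y_{t+i} - y_{t+i-1}) as in (1.17)."""
    d = 1.0 / (1.0 + r)
    return -sum(d**i * (path[i] - path[i - 1]) for i in range(1, len(path)))


r = 0.05
falling = [100.0, 100.0, 80.0]   # income expected to fall two periods ahead
rising = [100.0, 120.0]          # income expected to rise next period

s_fall_levels = saving_from_levels(falling, r)
s_fall_changes = saving_from_changes(falling, r)
s_rise_levels = saving_from_levels(rising, r)
s_rise_changes = saving_from_changes(rising, r)

print(s_fall_levels, s_fall_changes)  # equal and positive: saving for a rainy day
print(s_rise_levels, s_rise_changes)  # equal and negative: borrowing against a rise
```

Under these assumptions both formulas give the same number: an expected decline in income produces positive saving, and an expected increase produces negative saving.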
1.1.4. CONSUMPTION, SAVING, AND CURRENT INCOME
Under certainty on future labor incomes, permanent income does not change
over time. As a consequence, with $r = \rho$, consumption is constant and
unrelated to current income $y_t$. On the contrary, when future incomes are
uncertain, permanent income changes when new information causes a revision in
expectations. Moreover, there is a link between current income and consumption
if changes in income cause revisions in the consumer’s expected permanent
income. To explore the relation between current and permanent income,
we assume a simple first-order autoregressive process generating income $y$:
$$y_{t+1} = \lambda y_t + (1-\lambda)\bar{y} + \varepsilon_{t+1}, \qquad E_t\varepsilon_{t+1} = 0, \tag{1.18}$$
where $0 \le \lambda \le 1$ is a parameter and $\bar{y}$ denotes the unconditional mean of
income. The stochastic term εt +1 is the component of income at t + 1 that
cannot be forecast on the basis of information available at t (i.e. the income
innovation). Suppose that the stochastic process for income is in the con-
sumer’s information set. From ( 1.18) we can compute the revision, between
t and t + 1, of expectations of future incomes caused by a given realization of
the stochastic term εt +1. The result of this calculation will then be substituted
into (1.15) to obtain the effect on consumption c t +1.
The revision in expectations of future incomes is given by
$$E_{t+1}y_{t+1+i} - E_t y_{t+1+i} = \lambda^{i}\varepsilon_{t+1}, \qquad \forall i \ge 0.$$
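The revision formula follows from iterating (1.18) forward. A short worked derivation, written out here as a supplementary step (the index $j$ is introduced only for this sketch):

```latex
\begin{aligned}
y_{t+1+i} &= \lambda^{i+1} y_t + \left(1-\lambda^{i+1}\right)\bar{y}
             + \sum_{j=0}^{i} \lambda^{i-j}\,\varepsilon_{t+1+j}, \\
E_t\, y_{t+1+i} &= \lambda^{i+1} y_t + \left(1-\lambda^{i+1}\right)\bar{y}, \\
E_{t+1}\, y_{t+1+i} &= \lambda^{i+1} y_t + \left(1-\lambda^{i+1}\right)\bar{y}
             + \lambda^{i}\,\varepsilon_{t+1}.
\end{aligned}
```

At $t$ all future innovations have zero expected value, while at $t+1$ only $\varepsilon_{t+1}$ is known; subtracting the second line from the third leaves $\lambda^{i}\varepsilon_{t+1}$.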
Substituting this expression into (1.15) for each period t + 1 + i , we have
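$$r\left[\frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i}\lambda^{i}\varepsilon_{t+1}\right] = \left[\varepsilon_{t+1}\,\frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{\lambda}{1+r}\right)^{i}\right], \tag{1.19}$$
and solving the summation, we get⁶
$$c_{t+1} = c_t + \left(\frac{r}{1+r-\lambda}\right)\varepsilon_{t+1}, \tag{1.20}$$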
which directly links current income innovation εt +1 to current consumption
c t +1. Like equation (1.8), (1.20 ) is a Euler equation; the error term is the inno-
vation in permanent income, here expressed in terms of the current income
innovation. Given an unexpected increase of income in period t + 1 equal to
εt +1, the consumer increases consumption in t + 1 and expected consumption
in all future periods by the annuity value of the increase in human wealth,
⁶ The right-hand side expression in (1.19) can be written $\varepsilon_{t+1}\,(r/(1+r))\,S_\infty(\lambda/(1+r))$ if we denote by
$S_N(\alpha)$ a geometric series with parameter $\alpha$, of order $N$. Since $S_N(\alpha) - \alpha S_N(\alpha) = (1 + \alpha + \alpha^2 + \dots + \alpha^N) - (\alpha + \alpha^2 + \alpha^3 + \dots + \alpha^{N+1}) = 1 - \alpha^{N+1}$,
such a series takes values $S_N(\alpha) = (1 - \alpha^{N+1})/(1 - \alpha)$
and, as long as $|\alpha| < 1$, converges to $S_\infty(\alpha) = (1 - \alpha)^{-1}$ as $N$ tends to infinity. Using this formula in
(1.19) yields the result.
$r\varepsilon_{t+1}/(1 + r - \lambda)$. The portion of additional income that is not consumed, i.e.
$$\varepsilon_{t+1} - \frac{r}{1+r-\lambda}\,\varepsilon_{t+1} = \frac{1-\lambda}{1+r-\lambda}\,\varepsilon_{t+1},$$
is saved and added to the outstanding stock of financial assets. Starting from
the next period, the return on this saving will add to disposable income,
enabling the consumer to keep the higher level of consumption over the whole
infinite future horizon.
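The split of an income innovation between consumption and saving in (1.20) can be illustrated with a few lines of Python. This is a sketch under assumed parameter values; the function names and the 1% interest rate are illustrative, and the truncated sum is only a numerical stand-in for the infinite series in (1.19).

```python
def consumed_share(r, lam):
    """Closed-form coefficient on the innovation in (1.20): r / (1 + r - lam)."""
    return r / (1.0 + r - lam)


def consumed_share_truncated(r, lam, n_terms=5000):
    """Left-hand side of (1.19) per unit of innovation, with the infinite
    sum cut off after n_terms terms (a numerical approximation)."""
    d = 1.0 / (1.0 + r)
    return r * d * sum((lam * d) ** i for i in range(n_terms))


def saved_share(r, lam):
    """Portion of the innovation that is saved: (1 - lam) / (1 + r - lam)."""
    return (1.0 - lam) / (1.0 + r - lam)


r = 0.01
for lam in (0.0, 0.5, 1.0):
    print(lam, consumed_share(r, lam), saved_share(r, lam))
```

With a purely transitory innovation (persistence parameter 0) almost all of it is saved; with a permanent innovation (persistence parameter 1) it is entirely consumed.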
It is important to notice that the magnitude of the consumption change
between t and t + 1 resulting from an innovation in current income εt +1
depends, for a given interest rate $r$, on the parameter $\lambda$, capturing the degree
of persistence of an innovation in t + 1 on future incomes. To see the role of
this parameter, it is useful to consider two polar cases.
1. $\lambda = 0$. In this case $y_{t+1} = \bar{y} + \varepsilon_{t+1}$. The innovation in current income is
purely transitory and does not affect the level of income in future periods.
Given an innovation $\varepsilon_{t+1}$, the consumer’s human wealth, calculated
at the beginning of period $t + 1$, changes by $\varepsilon_{t+1}/(1 + r)$. This change in
$H_{t+1}$ determines a variation of permanent income (and consumption)
equal to $r\varepsilon_{t+1}/(1 + r)$. In fact, from (1.20) with $\lambda = 0$, we have
$$c_{t+1} = c_t + \left(\frac{r}{1+r}\right)\varepsilon_{t+1}. \tag{1.21}$$
2. $\lambda = 1$. In this case $y_{t+1} = y_t + \varepsilon_{t+1}$. The innovation in current income is
permanent, causing an equal change of all future incomes. The change
in human wealth is then $\varepsilon_{t+1}/r$ and the variation in permanent income
and consumption is simply $\varepsilon_{t+1}$. From (1.20), with $\lambda = 1$, we get
$$c_{t+1} = c_t + \varepsilon_{t+1}.$$
Exercise 1 In the two polar cases $\lambda = 0$ and $\lambda = 1$, find the effect of $\varepsilon_{t+1}$ on saving
in $t + 1$ and on saving and disposable income in the following periods.
Exercise 2 Using the stochastic process for labor income in (1.18), prove that the
consumption function that holds in this case (linking $c_t$ to its determinants $A_t$,
$y_t$, and $\bar{y}$) has the following form:
$$c_t = rA_t + \frac{r}{1+r-\lambda}\,y_t + \frac{1-\lambda}{1+r-\lambda}\,\bar{y}.$$
What happens if $\lambda = 1$ and if $\lambda = 0$?
1.2. Empirical Issues
The dynamic implications of the permanent income model of consumption
illustrated above motivated many recent empirical studies on consump-
tion. Similarly, the life-cycle theory of consumption developed mainly
by F. Modigliani has been subjected to empirical scrutiny. The partial-
equilibrium perspective of this chapter makes it difficult to discuss the rela-
tionship between long-run saving and growth rates at the aggregate level:
as we shall see in Chapter 4, the link between income growth and saving
depends also on the interest rate, and becomes more complicated when the
assumption of an exogenously given income process is abandoned. But even
empirical studies based on cross-sectional individual data show that saving, if
any, occurs only in the middle and old stages of the agent’s life: consumption
tracks income too closely to explain wealth accumulation only on the basis of
a life-cycle motive.
As regards aggregate short-run dynamics, the first empirical test of the fun-
damental implication of the permanent income/rational expectations model
of consumption is due to R. E. Hall (1978), who tests the orthogonality of
the error term in the Euler equation with respect to past information. If the
theory is correct, no variable known at time t − 1 can explain changes in con-
sumption between t − 1 and t . Formally, the test is carried out by evaluating
the statistical significance of variables dated t − 1 in the Euler equation for
time t . For example, augmenting the Euler equation with the income change
that occurred between $t - 2$ and $t - 1$, we get
$$\Delta c_t = \alpha\,\Delta y_{t-1} + e_t, \tag{1.22}$$
where $\alpha = 0$ if the permanent income theory holds. Hall’s results for the USA
show that the null hypothesis cannot be rejected for several past aggregate
variables, including income. However, some lagged variables (such as a stock
index) are significant when added to the Euler equation, casting some doubt
on the validity of the model’s basic version.
Since Hall’s contribution, the empirical literature has further investigated
the dynamic implications of the theory, focusing mainly on two empirical
regularities apparently at variance with the model: the consumption’s excess
sensitivity to current income changes, and its excess smoothness to income
innovations. The remainder of this section illustrates these problems and
shows how they are related.
1.2.1. EXCESS SENSITIVITY OF CONSUMPTION TO CURRENT INCOME
A different test of the permanent income model was originally proposed
by M. Flavin (1981). Flavin’s test is based on (1.15) and an additional equation
for the stochastic process for income yt . Consider the following stochastic
process for income (AR(1) in first differences):
$$\Delta y_t = \mu + \lambda\,\Delta y_{t-1} + \varepsilon_t, \tag{1.23}$$
where $\varepsilon_t$ is the change in current income, $\Delta y_t$, that is unforecastable using
past income realizations. According to the model, the change in consumption
between t − 1 and t is due to the revision of expectations of future incomes
caused by $\varepsilon_t$. Letting $\theta$ denote the intensity of this effect, the behavior of
consumption is then
$$\Delta c_t = \theta\varepsilon_t. \tag{1.24}$$
Consumption is excessively sensitive to current income if $c_t$ reacts to changes
of $y_t$ by more than is justified by the change in permanent income, measured
by $\theta\varepsilon_t$.
Empirically, the Excess Sensitivity Hypothesis is formalized by augmenting
(1.24) with the change in current income,
$$\Delta c_t = \beta\,\Delta y_t + \theta\varepsilon_t + v_t, \tag{1.25}$$
where $\beta$ (if positive) measures the overreaction of consumption to a change in
current income, and vt captures the effect on consumption of information
about permanent income, available to agents at t but unrelated to current
income changes.
According to the permanent income model, an increase in current income
causes a change in consumption only by the amount warranted by the revision
of permanent income. Only innovations (that is, unpredictable changes) in
income cause consumption changes: the term $\theta\varepsilon_t$ in (1.25) captures precisely
this effect. An estimated value for $\beta$ greater than zero is then interpreted as
signaling an overreaction of consumption to anticipated changes in income.
The test on $\beta$ in (1.25) is equivalent to Hall’s orthogonality test in (1.22). In
fact, substituting the stochastic process for income (1.23) into (1.25), we get
$$\Delta c_t = \beta\mu + \beta\lambda\,\Delta y_{t-1} + (\theta + \beta)\varepsilon_t + v_t. \tag{1.26}$$
From this expression for the consumption change, we note that the hypothesis
$\beta = 0$ in (1.25) implies that $\alpha = 0$ in (1.22): if consumption is excessively
sensitive to income, then the orthogonality property of the error term in
the equation for $\Delta c_t$ does not hold. Equation (1.26) highlights a potential
difficulty with the orthogonality test. Indeed, $\Delta c_t$ may be found to be
uncorrelated with $\Delta y_{t-1}$ if the latter does not forecast future income changes. In
this case $\lambda = 0$ and the orthogonality test fails to reject the theory, even though
consumption is excessively sensitive to predictable changes in income. Thus,
differently from Hall’s test, the approach of Flavin provides an estimate of the
excess sensitivity of consumption, measured by $\beta$, which is around 0.36 on US
quarterly data over the 1949–79 period.7
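The weakness of the orthogonality test when lagged income changes have no forecasting power can be illustrated by simulation. The sketch below is illustrative only: it sets $v_t = 0$, $\theta = 1$ and $\mu = 0$, draws Gaussian innovations, and uses hypothetical function names. By (1.26), the population coefficient on $\Delta y_{t-1}$ in regression (1.22) is $\beta\lambda$.

```python
import random


def simulate(beta, lam, theta=1.0, mu=0.0, n=20000, seed=0):
    """Simulate income changes from (1.23) and consumption changes from
    (1.25) with v_t = 0 (an assumption of this sketch)."""
    rng = random.Random(seed)
    dy_prev = 0.0
    dys, dcs = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        dy = mu + lam * dy_prev + eps      # income process (1.23)
        dc = beta * dy + theta * eps       # excess-sensitivity equation (1.25)
        dys.append(dy)
        dcs.append(dc)
        dy_prev = dy
    return dys, dcs


def ols_slope(x, y):
    """Slope of a least-squares regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def hall_alpha(dys, dcs):
    """Estimate of alpha in (1.22): regress the consumption change on the lagged income change."""
    return ols_slope(dys[:-1], dcs[1:])


alpha_persistent = hall_alpha(*simulate(beta=0.36, lam=0.44))
alpha_no_persistence = hall_alpha(*simulate(beta=0.36, lam=0.0))
print(alpha_persistent)      # close to 0.36 * 0.44
print(alpha_no_persistence)  # close to 0 although beta = 0.36
```

In the second run consumption is excessively sensitive by construction, yet the estimated coefficient on the lagged income change is close to zero, so a test based on (1.22) alone would not reject the theory.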
Among the potential explanations for the excess sensitivity of consumption,
a strand of the empirical literature focused on the existence of liquidity
constraints, which limit the consumer’s borrowing capability, thus prevent-
ing the realization of the optimal consumption plan. With binding liquid-
ity constraints, an increase in income, though perfectly anticipated, affects
consumption only when it actually occurs.8 A different rationale for excess
sensitivity, based on the precautionary saving motive, will be analyzed in
Section 1.3.9
1.2.2. RELATIVE VARIABILITY OF INCOME AND CONSUMPTION
One of the most appealing features of the permanent income theory, since
the original formulation due to M. Friedman, is a potential explanation of
why consumption typically is less volatile than current income: even in simple
textbook Keynesian models, a marginal propensity to consume c < 1 in aggre-
gate consumption functions of the form C = c̄ + c Y is crucial in obtaining
the basic concept of multiplier of autonomous expenditure. By relating con-
sumption not to current but to permanent, presumably less volatile, income,
the limited reaction of consumption to changes in current income is theoret-
ically motivated. The model developed thus far, adopting the framework of
intertemporal optimization under rational expectations, derived the implica-
tions of this original intuition, and formalized the relationship between cur-
rent income, consumption, and saving. (We shall discuss in the next chapter
formalizations of simple textbook insights regarding investment dynamics:
investment, like changes in consumption, is largely driven by revision of
expectations regarding future variables.)
In particular, according to theory, the agent chooses current consumption
on the basis of all available information on future incomes and changes
optimal consumption over time only in response to unanticipated changes
(innovations) in current income, causing revisions in permanent income.
⁷ However, Flavin’s test cannot provide an estimate of the change in permanent income resulting
from a current income innovation, $\theta$, if $\varepsilon$ and $v$ in (1.26) have a non-zero covariance. Using aggregate
data, any change in consumption due to vt is also reflected in innovations in current income εt , since
consumption is a component of aggregate income. Thus, the covariance between ε and v tends to be
positive.
⁸ Applying instrumental variables techniques to (1.25), Campbell and Mankiw (1989, 1991) directly
interpret the estimated $\beta$ as the fraction of liquidity-constrained consumers, who simply spend their
current income.
⁹ While we do not focus in this chapter on aggregate equilibrium considerations, it is worth
mentioning that binding liquidity constraints and precautionary savings both tend to increase the
aggregate saving rate: see Aiyagari (1994), Jappelli and Pagano (1994).
Therefore, on the empirical level, it is important to analyze the relationship
between current income innovations and changes in permanent income, tak-
ing into account the degree of persistence over time of such innovations.
The empirical research on the properties of the stochastic process generat-
ing income has shown that income y is non-stationary: an innovation at time
t does not cause a temporary deviation of income from trend, but has perma-
nent effects on the level of y, which does not display any tendency to revert
to a deterministic trend. (For example, in the USA the estimated long-run
change in income is around 1.6 times the original income innovation.10) The
implication of this result is that consumption, being determined by permanent
income, should be more volatile than current income.
To clarify this point, consider again the following process for income:
$$\Delta y_{t+1} = \mu + \lambda\,\Delta y_t + \varepsilon_{t+1}, \tag{1.27}$$
where $\mu$ is a constant, $0 < \lambda < 1$, and $E_t\varepsilon_{t+1} = 0$. The income change between
t and t + 1 follows a stationary autoregressive process; the income level is
permanently affected by innovations ε.11 To obtain the effect on permanent
income and consumption of an innovation εt +1 when income is governed by
(1.27), we can apply the following property of ARMA stochastic processes,
which holds whether or not income is stationary (Deaton, 1992). For a given
stochastic process for y of the form
$$a(L)\,y_t = \mu + b(L)\,\varepsilon_t,$$
where $a(L) = a_0 + a_1 L + a_2 L^2 + \dots$ and $b(L) = b_0 + b_1 L + b_2 L^2 + \dots$ are
two polynomials in the lag operator $L$ (such that, for a generic variable $x$,
we have $L^i x_t = x_{t-i}$), we derive the following expression for the variance of
the change in permanent income (and consequently in consumption):¹²
$$\frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i}(E_{t+1}-E_t)\,y_{t+1+i} = \frac{r}{1+r}\,\frac{\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i}b_i}{\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i}a_i}\,\varepsilon_{t+1}. \tag{1.28}$$
In the case of (1.27), we can write
$$y_t = \mu + (1 + \lambda)\,y_{t-1} - \lambda\,y_{t-2} + \varepsilon_t;$$
¹⁰ The feature of non-stationarity of income (in the USA and in other countries as well) is still
an open issue. Indeed, some authors argue that, given the low power of the statistical tests used to
assess the non-stationarity of macroeconomic time series, it is impossible to distinguish between non-
stationarity and the existence of a deterministic time trend on the basis of available data.
¹¹ A stochastic process of this form, with $\lambda = 0.44$, is a fairly good statistical description of the
(aggregate) income dynamics for the USA, as shown by Campbell and Deaton (1989) using quarterly
data for the period 1953–84.
¹² The following formula can also be obtained by computing the revisions in expectations of future
incomes, as has already been done in Section 1.1.
hence we have $a(L) = 1 - (1 + \lambda)L + \lambda L^2$ and $b(L) = 1$. Applying the general
formula (1.28) to this process, we get
$$\Delta c_{t+1} = \frac{r}{1+r}\left(\frac{r(1+r-\lambda)}{(1+r)^2}\right)^{-1}\varepsilon_{t+1} = \frac{1+r}{1+r-\lambda}\,\varepsilon_{t+1}.$$
This is formally quite similar to (1.20), but, because the income process is
stationary only in first differences, features a different numerator on the
right-hand side: the relation between the innovation $\varepsilon_{t+1}$ and the change in
consumption $\Delta c_{t+1}$ is linear, but the slope is greater than 1 if $\lambda > 0$ (that is if, as is
realistic in business-cycle fluctuations, above-average growth tends to be
followed by still fast, if mean-reverting, growth in the following period). The
same coefficient measures the ratio of the variability of consumption (given
by the standard deviation of the consumption change) and the variability
of income (given by the standard deviation of the innovation in the income
process):
$$\frac{\sigma_{\Delta c}}{\sigma_{\varepsilon}} = \frac{1+r}{1+r-\lambda}.$$
For example, $\lambda = 0.44$ and a (quarterly) interest rate of 1% yield a coefficient
of 1.77. The implied variability of the (quarterly) change of consumption
would be 1.77 times that of the income innovation. For non-durable goods
and services, Campbell and Deaton (1989) estimate a coefficient of only
0.64. Then, the response of consumption to income innovations seems to
be at variance with the implications of the permanent income theory: the
reaction of consumption to unanticipated changes in income is too smooth
(this phenomenon is called excess smoothness). This conclusion could be
questioned by considering that the estimate of the income innovation, ε,
depends on the variables included in the econometric specification of the
income process. In particular, if a univariate process like (1.27) is specified,
the information set used to form expectations of future incomes and to derive
innovations is limited to past income values only. If agents form their expecta-
tions using additional information, not available to the econometrician, then
the “true” income innovation, which is perceived by agents and determines
changes in consumption, will display a smaller variance than the innovation
estimated by the econometrician on the basis of a limited information set.
Thus, the observed smoothness of consumption could be made consistent
with theory if it were possible to measure the income innovations perceived
by agents.13
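The coefficient implied by formula (1.28) can be computed directly from the lag-polynomial coefficients. The sketch below is illustrative (the function name and list-based representation of $a(L)$ and $b(L)$ are assumptions of this example); it reproduces the value 1.77 quoted above and, for an AR(1) in levels, the coefficient in (1.20).

```python
def consumption_response(a, b, r):
    """Coefficient on the innovation in (1.28).
    a and b are lists of lag-polynomial coefficients [a0, a1, ...], [b0, b1, ...]."""
    d = 1.0 / (1.0 + r)
    num = sum(bi * d**i for i, bi in enumerate(b))
    den = sum(ai * d**i for i, ai in enumerate(a))
    return r / (1.0 + r) * num / den


lam, r = 0.44, 0.01

# Income stationary in first differences, as in (1.27):
# a(L) = 1 - (1 + lam) L + lam L^2, b(L) = 1.
ratio_differences = consumption_response([1.0, -(1.0 + lam), lam], [1.0], r)

# Income stationary in levels, as in (1.18): a(L) = 1 - lam L, b(L) = 1.
ratio_levels = consumption_response([1.0, -lam], [1.0], r)

print(round(ratio_differences, 2))  # 1.77
print(ratio_levels)                 # equals r / (1 + r - lam)
```

The two calls differ only in the polynomial $a(L)$, which makes explicit why the same innovation moves consumption by more than one-for-one when income is stationary only in first differences.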
A possible solution to this problem exploits the essential feature of the
permanent income theory under rational expectations: agents choose optimal
consumption (and saving) using all available information on future incomes.
¹³ Relevant research includes Pischke (1995) and Jappelli and Pistaferri (2000).
It is the very behavior of consumers that reveals their available information.
If such behavior is observed by the econometrician, it is possible to use it
to construct expected future incomes and the associated innovations. This
approach has been applied to saving, which, as shown by ( 1.17), depends
on expected future changes in income.
To formalize this point, we start from the definition of saving and make
explicit the information set used by agents at time t to forecast future
incomes, $I_t$:
$$s_t = -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}E(\Delta y_{t+i} \mid I_t). \tag{1.29}$$
The information set available to the econometrician is $\Omega_t$, with $\Omega_t \subseteq I_t$
(agents know everything the econometrician knows but the reverse is not
necessarily true). Moreover, we assume that saving is observed by the
econometrician: $s_t \in \Omega_t$. Then, taking the expected value of both sides of (1.29)
with respect to the information set $\Omega_t$ and applying the “law of iterated
expectations,” we get
$$\begin{aligned}
E(s_t \mid \Omega_t) &= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}E\left[E(\Delta y_{t+i} \mid I_t) \mid \Omega_t\right] \\
\Longrightarrow\quad s_t &= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}E(\Delta y_{t+i} \mid \Omega_t),
\end{aligned} \tag{1.30}$$
where we use the assumption that saving is included in $\Omega_t$. According to
theory, then, saving is determined by the discounted future changes in labor
incomes, even if they are forecast on the basis of the smaller information
set $\Omega_t$.
Since saving choices, according to (1.29), are made on the basis of all
information available to agents, it is possible to obtain predictions on future
incomes that do not suffer from the limited information problem typical of
the univariate models widely used in the empirical literature. Indeed, pre-
dictions can be conditioned on past saving behavior, thus using the larger
information set available to agents. This is equivalent to forming predictions
of income changes $\Delta y_t$ by using not only past changes, $\Delta y_{t-1}$, but also past
saving, $s_{t-1}$.
In principle, this extension of the forecasting model for income could
reduce the magnitude of the estimated innovation variability $\sigma_\varepsilon$. In practice,
as is shown in some detail below, the evidence of excess smoothness of con-
sumption remains unchanged after this extension.
1.2.3. JOINT DYNAMICS OF INCOME AND SAVING
Studying the implications derived from theory on the joint behavior of income
and saving usefully highlights the connection between the two empirical puz-
zles mentioned above (excess sensitivity and excess smoothness). Even though
the two phenomena focus on the response of consumption to income changes
of a different nature (consumption is excessively sensitive to anticipated
income changes, and excessively smooth in response to unanticipated income
variations), it is possible to show that the excess smoothness and excess sensi-
tivity phenomena are different manifestations of the same empirical anomaly.
To outline the connection between the two, we proceed in three successive
steps.
1. First, we assume a stochastic process jointly governing the evolution of
income and saving over time and derive its implications for equations
like (1.22), used to test the orthogonality property of the consumption
change with respect to lagged variables. (Recall that the violation of
the orthogonality condition entails excess sensitivity of consumption to
predicted income changes.)
2. Then, given the expectations of future incomes based on the assumed
stochastic process, we derive the behavior of saving implied by theory
according to (1.17), and obtain the restrictions that must be imposed on
the estimated parameters of the process for income and saving to test the
validity of the theory.
3. Finally, we compare such restrictions with those required for the orthog-
onality property of the consumption change to hold.
We start with a simplified representation of the bivariate stochastic process
governing income (expressed in first differences as in (1.27) to allow for
non-stationarity, and imposing $\mu = 0$ for simplicity) and saving:
$$\Delta y_t = a_{11}\,\Delta y_{t-1} + a_{12}\,s_{t-1} + u_{1t}, \tag{1.31}$$
$$s_t = a_{21}\,\Delta y_{t-1} + a_{22}\,s_{t-1} + u_{2t}. \tag{1.32}$$
With s t−1 in the model, it is now possible to generate forecasts on future
income changes by exploiting the additional informational value of past sav-
ing. Inserting the definition of saving (s t = r At + yt − c t ) into the accumula-
tion constraint (1.2), we get
$$A_{t+1} = A_t + (rA_t + y_t - c_t) \;\Rightarrow\; s_t = A_{t+1} - A_t. \tag{1.33}$$
Obviously, the flow of saving is the change of the stock of financial assets
from one period to the next, and this makes it possible to write the change in
consumption by taking the first difference of the definition of saving
used above:
$$\begin{aligned}
\Delta c_t &= \Delta y_t + r\,\Delta A_t - \Delta s_t \\
&= \Delta y_t + r s_{t-1} - s_t + s_{t-1} \\
&= \Delta y_t + (1 + r)\,s_{t-1} - s_t.
\end{aligned} \tag{1.34}$$
Finally, substituting for $\Delta y_t$ and $s_t$ from (1.31) and (1.32), we obtain the
following expression for the consumption change $\Delta c_t$:
$$\Delta c_t = \gamma_1\,\Delta y_{t-1} + \gamma_2\,s_{t-1} + v_t, \tag{1.35}$$
where
$$\gamma_1 = a_{11} - a_{21}, \qquad \gamma_2 = a_{12} - a_{22} + (1 + r), \qquad v_t = u_{1t} - u_{2t}.$$
The implication of the permanent income theory is that the consumption
change between t − 1 and t cannot be predicted on the basis of information
available at time $t - 1$. This entails the orthogonality restriction $\gamma_1 = \gamma_2 = 0$,
which in turn imposes the following restrictions on the coefficients of the joint
process generating income and savings:
$$a_{11} = a_{21}, \qquad a_{22} = a_{12} + (1 + r). \tag{1.36}$$
If these restrictions are fulfilled, the consumption change $\Delta c_t = u_{1t} - u_{2t}$
is unpredictable using lagged variables: the change in consumption (and in
permanent income) is equal to the current income innovation (u1t ) less the
innovation in saving (u2t ), which reflects the revision in expectations of future
incomes calculated by the agent on the basis of all available information. Now,
from the definition of savings (1.17), using the expectations of future income
changes derived from the model in (1.31) and (1.32), it is possible to obtain
the restrictions imposed by the theory on the stochastic process governing
income and savings. Letting
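$$x_t \equiv \begin{pmatrix} \Delta y_t \\ s_t \end{pmatrix}, \qquad A \equiv \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad u_t = \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix},$$
we can rewrite the process in (1.31)–(1.32) as
$$x_t = A x_{t-1} + u_t. \tag{1.37}$$
From (1.37), the expected values of $\Delta y_{t+i}$ can be easily derived:
$$E_t x_{t+i} = A^{i} x_t, \qquad i \ge 0;$$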
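hence (using a matrix algebra version of the geometric series formula)
$$\begin{aligned}
-\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}E_t x_{t+i} &= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}A^{i}x_t \\
&= -\left[\left(I - \frac{1}{1+r}A\right)^{-1} - I\right]x_t.
\end{aligned} \tag{1.38}$$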
The element of vector x we are interested in (saving s ) can be “extracted” by
applying to x a vector e2 ≡ (0 1)′, which simply selects the second element of
x. Similarly, to apply the definition in (1.17), we have to select the first element
of the vector in (1.38) using $e_1 \equiv (1\ 0)'$. Then we get
$$e_2' x_t = -e_1'\left[\left(I - \frac{1}{1+r}A\right)^{-1} - I\right]x_t \;\Rightarrow\; e_2' = -e_1'\left[\left(I - \frac{1}{1+r}A\right)^{-1} - I\right],$$
yielding the relation
$$e_2' = (e_2' - e_1')\,\frac{1}{1+r}\,A. \tag{1.39}$$
Therefore, the restrictions imposed by theory on the coefficients of matrix
$A$ are
$$a_{11} = a_{21}, \qquad a_{22} = a_{12} + (1 + r). \tag{1.40}$$
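The coincidence of the two sets of restrictions can be verified numerically for a particular coefficient matrix. The sketch below is illustrative only: the parameter values and function names are hypothetical, and the 2×2 inverse is written out by hand so that only the standard library is needed. It computes the row vector $-e_1'[(I - A/(1+r))^{-1} - I]$ from (1.38), which theory requires to equal $e_2' = (0\ 1)$, together with the coefficients $\gamma_1$ and $\gamma_2$ of (1.35).

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def saving_loading(A, r):
    """Row vector -e1'[(I - A/(1+r))^{-1} - I]: the weights on (dy_t, s_t)
    in the discounted sum of expected income declines, from (1.38)."""
    k = 1.0 / (1.0 + r)
    M = [[1.0 - k * A[0][0], -k * A[0][1]],
         [-k * A[1][0], 1.0 - k * A[1][1]]]
    Minv = inv2(M)
    return [-(Minv[0][0] - 1.0), -Minv[0][1]]


def gammas(A, r):
    """Coefficients of the lagged income change and lagged saving in (1.35)."""
    return A[0][0] - A[1][0], A[0][1] - A[1][1] + (1.0 + r)


r = 0.01
a11, a12 = 0.3, -0.2                      # illustrative values
restricted = [[a11, a12], [a11, a12 + 1.0 + r]]   # imposes the theory's restrictions
unrestricted = [[0.3, -0.2], [0.1, 0.6]]          # violates both restrictions

print(saving_loading(restricted, r), gammas(restricted, r))
print(saving_loading(unrestricted, r), gammas(unrestricted, r))
```

For the restricted matrix the loading vector is $(0, 1)$ and both $\gamma$ coefficients vanish; for the unrestricted matrix neither condition holds, in line with the argument that the two anomalies are manifestations of the same deviation from theory.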
These restrictions on the joint process for income and saving, which rule
out the excess smoothness phenomenon, are exactly the same as those in
equation (1.36) that must be fulfilled for the orthogonality property to hold,
and therefore also ensure elimination of excess sensitivity.14 Summarizing,
the phenomena of excess sensitivity and excess smoothness, though related to
income changes of a different nature (anticipated and unanticipated, respec-
tively), signal the same deviation from the implications of the permanent
income theory. If agents excessively react to expected income changes, they
must necessarily display a lack of reaction to unanticipated income changes.
In fact, any variation in income is made up of a predicted component and a
(unpredictable) innovation: if the consumer has an “excessive” reaction to the
former component, the intertemporal budget constraint forces him to react in
an “excessively smooth” way to the latter component of the change in current
income.
¹⁴ The coincidence of the restrictions necessary for orthogonality and for ruling out excess smooth-
ness is obtained only in the special case of a first-order stochastic process for income and saving. In
the more general case analyzed by Flavin (1993), the orthogonality restrictions are nested in those
necessary to rule out excess smoothness. Then, in general, orthogonality conditions analogous to
(1.36) imply—but are not implied by—those analogous to (1.40).
1.3. The Role of Precautionary Saving
Recent developments in consumption theory have been aimed mainly at
solving the empirical problems illustrated above. The basic model has been
extended in various directions, by relaxing some of its most restrictive
assumptions. On the one hand, as already mentioned, liquidity constraints
can prevent the consumer from borrowing as much as required by the optimal
consumption plan. On the other hand, it has been recognized that in the basic
model saving is motivated only by a rate of interest higher than the rate-of-
time preference and/or by the need for redistributing income over time, when
current incomes are unbalanced between periods. Additional motivations for
saving may be relevant in practice, and may contribute to the explanation of,
for example, the apparently insufficient decumulation of wealth by older gen-
erations, the high correlation between income and consumption of younger
agents, and the excess smoothness of consumption in reaction to income
innovations. This section deals with the latter strand of literature, studying
the role of a precautionary saving motive in shaping consumers’ behavior.
First, we will spell out the microeconomic foundations of precautionary
saving, pointing out which assumption of the basic model must be relaxed to
allow for a precautionary saving motive. Then, under the new assumptions,
we shall derive the dynamics of consumption and the consumption function,
and compare them with the implications of the basic version of the permanent
income model previously illustrated.
1.3.1. MICROECONOMIC FOUNDATIONS
Thus far, with a quadratic utility function, uncertainty has played only a
limited role. Indeed, only the expected value of income y affects consumption
choices—other characteristics of the income distribution (e.g. the variance)
do not play any role.
With quadratic utility, marginal utility is linear and the expected value of
the marginal utility of consumption coincides with the marginal utility of
expected consumption. An increase in uncertainty on future consumption,
with an unchanged expected value, does not cause any reaction by the con-
sumer.15 As we shall see, if marginal utility is a convex function of consump-
tion, then the consumer displays a prudent behavior, and reacts to an increase
in uncertainty by saving more: such saving is called precautionary, since it
depends on the uncertainty about future consumption.
¹⁵ In the basic version of the model, the consumer is interested only in the certainty equivalent value
of future consumption.
Convexity of the marginal utility function u′(c ) implies a positive sign
of its second derivative, corresponding to the third derivative of the utility
function: u′′′(c ) > 0. A precautionary saving motive, which does not arise
with quadratic utility (u′′′(c ) = 0), requires the use of different functional
forms, such as exponential utility.16 With risk aversion (u′′(c ) < 0) and convex
marginal utility (u′′′(c ) > 0), under uncertainty about future incomes (and
consumption), unfavorable events determine a loss of utility greater than the
gain in utility obtained from favorable events of the same magnitude. The
consumer fears low-income states and adopts a prudent behavior, saving in
the current period in order to increase expected future consumption.
An example can make this point clearer. Consider a consumer living for two
periods, t and t + 1, with no financial wealth at the beginning of period t . In
the first period labor income is $\bar{y}$ with certainty, whereas in the second period
it can take one of two values, $y^A_{t+1}$ or $y^B_{t+1} < y^A_{t+1}$, with equal probability.
To focus on the precautionary motive, we rule out any other motivation
for saving by assuming that $E_t(y_{t+1}) = \bar{y}$ and $r = \rho = 0$. In equilibrium the
following relation holds: $E_t u'(c_{t+1}) = u'(c_t)$. At time $t$ the consumer chooses
saving $s_t$ (equal to $\bar{y} - c_t$) and his consumption at time $t + 1$ will be equal to
saving $s_t$ plus realized income. Considering actual realizations of income, we
can write the budget constraint as
$$\left.\begin{matrix} c^A_{t+1} \\ c^B_{t+1} \end{matrix}\right\} = \bar{y} - c_t + \begin{cases} y^A_{t+1} \\ y^B_{t+1} \end{cases} = s_t + \begin{cases} y^A_{t+1} \\ y^B_{t+1} \end{cases}.$$
Using the definition of saving, $s_t \equiv \bar{y} - c_t$, the Euler equation becomes
$$E_t\left(u'(y_{t+1} + s_t)\right) = u'(\bar{y} - s_t). \tag{1.41}$$
Now, let us see how the consumer chooses saving in two different cases,
beginning with that of linear marginal utility (u′′′(c) = 0). In this case we have
$E_t u'(\cdot) = u'(E_t(\cdot))$. Recalling that $E_t(y_{t+1}) = \bar{y}$, condition (1.41) becomes
$$ u'(\bar{y} + s_t) = u'(\bar{y} - s_t), \tag{1.42} $$
and is fulfilled by s t = 0. The consumer does not save in the first period,
and his second-period consumption will coincide with current income. The
uncertainty on income in t + 1 reduces overall utility but does not induce
the consumer to modify his choice: there is no precautionary saving. On the
contrary, if, as in Figure 1.1, marginal utility is convex (u′′′(c ) > 0), then,
¹⁶ A quadratic utility function has another undesirable property: it displays increasing absolute risk
aversion. Formally, −u′′(c )/u′(c ) is an increasing function of c . This implies that, to avoid uncertainty,
the agent is willing to pay more the higher is his wealth, which is not plausible.
Figure 1.1. Precautionary savings
from “Jensen’s inequality,” $E_t u'(c_{t+1}) > u'(E_t(c_{t+1}))$.¹⁷ If the consumer were
to choose zero saving, as was optimal under a linear marginal utility, we would
have (for $s_t = 0$, and using Jensen’s inequality)
$$ E_t\big(u'(c_{t+1})\big) > u'(c_t). \tag{1.43} $$
The optimality condition would be violated, and expected utility would not
be maximized. To re-establish equality in the problem’s first-order condition,
marginal utility must decrease in t + 1 and increase in t : as shown in the figure,
this may be achieved by shifting an amount of resources s t from the first to the
second period. As the consumer saves more, decreasing current consumption
c t and increasing c t +1 in both states (good and bad), marginal utility in t
increases and expected marginal utility in t + 1 decreases, until the optimal-
ity condition is satisfied. Thus, with convex marginal utility, uncertainty on
future incomes (and consumption levels) entails a positive amount of saving
in the first period and determines a consumption path trending upwards over
time ( E t c t +1 > c t ), even though the interest rate is equal to the utility discount
rate. Formally, the relation between uncertainty and the upward consumption
path depends on the degree of consumer’s prudence, which we now define
rigorously. Approximating (by means of a second-order Taylor expansion)
around $c_t$ the left-hand side of the Euler equation $E_t u'(c_{t+1}) = u'(c_t)$, we get
$$ E_t(c_{t+1} - c_t) = -\frac{1}{2}\,\frac{u'''(c_t)}{u''(c_t)}\,E_t(c_{t+1} - c_t)^2 \equiv \frac{1}{2}\,a\,E_t(c_{t+1} - c_t)^2, \tag{1.44} $$
¹⁷ Jensen’s inequality states that, given a strictly convex function f (x ) of a random variable x , then
E ( f (x )) > f ( E x ).
where a ≡ −u′′′(c )/u′′(c ) is the coefficient of absolute prudence. Greater
uncertainty, increasing E t ((c t +1 − c t )2), induces a larger increase in con-
sumption between t and t + 1. The definition of the coefficient measuring
prudence is formally similar to that of risk-aversion coefficients: however, the
latter is related to the curvature of the utility function, whereas prudence is
determined by the curvature of marginal utility. It is also possible to define
the coefficient of relative prudence, $-u'''(c)\,c/u''(c)$. Dividing both sides of
(1.44) by $c_t$, we get
$$ E_t\left(\frac{c_{t+1} - c_t}{c_t}\right) = -\frac{1}{2}\,\frac{u'''(c_t)\cdot c_t}{u''(c_t)}\,E_t\left(\frac{c_{t+1} - c_t}{c_t}\right)^2 = \frac{1}{2}\,p\,E_t\left(\frac{c_{t+1} - c_t}{c_t}\right)^2, $$
where p ≡ −(u′′′(c ) · c /u′′(c )) is the coefficient of relative prudence. Readers
can check that this is constant for a CRRA function, and determine its rela-
tionship to the coefficient of relative risk aversion.
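The two-state example above can be checked numerically. The following is a minimal sketch, assuming exponential (CARA) utility with an illustrative coefficient `gamma` and hypothetical income values; the function names are not from the text. It solves the Euler equation (1.41) for $s_t$ by bisection. Under this assumed utility the solution also has the closed form $s_t = \log\cosh(\gamma d)/(2\gamma)$, where $d = y^A_{t+1} - \bar{y}$ is the half-spread of second-period income, so saving is zero without uncertainty and rises with the spread.

```python
import math


def marginal_utility(c, gamma):
    """u'(c) for exponential utility u(c) = -(1/gamma) * exp(-gamma * c)."""
    return math.exp(-gamma * c)


def euler_gap(s, y_bar, y_a, y_b, gamma):
    """E_t u'(y_{t+1} + s) - u'(y_bar - s); equals zero at the optimum of (1.41)."""
    expected_mu = 0.5 * marginal_utility(y_a + s, gamma) \
        + 0.5 * marginal_utility(y_b + s, gamma)
    return expected_mu - marginal_utility(y_bar - s, gamma)


def optimal_saving(y_bar, y_a, y_b, gamma, tol=1e-12):
    """Solve (1.41) by bisection; the gap is strictly decreasing in s."""
    lo, hi = -y_bar, y_bar  # bracket assumed wide enough for the chosen values
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if euler_gap(mid, y_bar, y_a, y_b, gamma) > 0:
            lo = mid  # expected future marginal utility still too high: save more
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: mean income 1, second-period income 1.5 or 0.5, gamma = 2.
s_risky = optimal_saving(1.0, 1.5, 0.5, 2.0)
s_certain = optimal_saving(1.0, 1.0, 1.0, 2.0)
closed_form = math.log(math.cosh(2.0 * 0.5)) / (2.0 * 2.0)
```

With these illustrative numbers saving is roughly 0.108 under uncertainty and zero when second-period income is certain.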
Exercise 3 Suppose that a consumer maximizes
log (c 1) + E [log (c 2)]
under the constraint c 1 + c 2 = w1 + w2 (i.e., the discount rate of period 2 utility
and the rate of return on saving w1 − c 1 are both zero). When c 1 is chosen, there
is uncertainty about w2: the consumer will earn w2 = x or w2 = y with equal
probability. What is the optimal level of c 1?
1.3.2. IMPLICATIONS FOR THE CONSUMPTION FUNCTION
We now solve the consumer’s optimization problem in the case of a non-
quadratic utility function, which motivates precautionary saving. The setup
of the problem is still given by (1.1) and (1.2), but the utility function in each
period is now of the exponential form:
$$ u(c_{t+i}) = -\frac{1}{\gamma}\,e^{-\gamma c_{t+i}}, \tag{1.45} $$
where $\gamma > 0$ is the coefficient of absolute prudence and also, for such a
constant absolute risk aversion (CARA) utility function, the coefficient of
absolute risk aversion.¹⁸ Assume that labor income follows the AR(1) stochastic process:
$$ y_{t+i} = \lambda y_{t+i-1} + (1 - \lambda)\bar{y} + \varepsilon_{t+i}, \tag{1.46} $$
¹⁸ Since for the exponential utility function u′(0) = 1 < ∞, in order to rule out negative values
for consumption it would be necessary to explicitly impose a non-negativity constraint; however, a
closed-form solution to the problem would not be available if that constraint were binding.
where $\varepsilon_{t+i}$ are independent and identically distributed (i.i.d.) random variables,
with zero mean and variance $\sigma^2_\varepsilon$. We keep the simplifying hypothesis
that $r = \rho$.
The problem’s first-order condition, for i = 0, is given by
$$ e^{-\gamma c_t} = E_t\big(e^{-\gamma c_{t+1}}\big). \tag{1.47} $$
To proceed, we guess that the stochastic process followed by consumption over
time has the form
$$ c_{t+i} = c_{t+i-1} + K_{t+i-1} + v_{t+i}, \tag{1.48} $$
where $K_{t+i-1}$ is a deterministic term (which may however depend on the
period’s timing within the individual’s life cycle) and $v_{t+i}$ is the innovation
in consumption ($E_{t+i-1} v_{t+i} = 0$). Both the sequence of $K_t$ terms and the
features of the distribution of v must be determined so as to satisfy the Euler
equation (1.47) and the intertemporal budget constraint (1.4). Using (1.48),
from the Euler equation, after eliminating the terms in $c_t$, we get
$$ e^{\gamma K_t} = E_t\big(e^{-\gamma v_{t+1}}\big) \;\Rightarrow\; K_t = \frac{1}{\gamma}\log E_t\big(e^{-\gamma v_{t+1}}\big). \tag{1.49} $$
The value of K depends on the characteristics of the distribution of v,
yet to be determined. Using the fact that $\log E(\cdot) > E(\log(\cdot))$ by Jensen’s
inequality and the property of consumption innovations $E_t v_{t+1} = 0$, we can
however already write
$$ K_t = \frac{1}{\gamma}\log E_t\big(e^{-\gamma v_{t+1}}\big) > \frac{1}{\gamma}E_t\big(\log(e^{-\gamma v_{t+1}})\big) = \frac{1}{\gamma}E_t(-\gamma v_{t+1}) = 0 \;\Rightarrow\; K_t > 0. \tag{1.50} $$
The first result is that the consumption path is increasing over time: the
consumption change between t and t + 1 is expected to equal K t > 0, whereas
with quadratic utility (maintaining the assumption ρ = r) consumption
changes would have zero mean. Moreover, from (1.49) we interpret −K t as
the “certainty equivalent” of the consumption innovation vt +1, defined as the
(negative) certain change of consumption from t to t + 1 that the consumer
would accept to avoid the uncertainty on the marginal utility of consumption
in t + 1.
To obtain the consumption function (and then to determine the effect of
the precautionary saving motive on the level of consumption) we use the
intertemporal budget constraint (1.10) computing the expected values E t c t +i
from (1.48). Knowing that $E_t v_{t+i} = 0$, we have
$$ \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} c_t + \frac{1}{1+r}\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\sum_{j=1}^{i} K_{t+j-1} = A_t + H_t. \tag{1.51} $$
Solving for $c_t$, we finally get
$$ c_t = r(A_t + H_t) - \frac{r}{1+r}\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\sum_{j=1}^{i} K_{t+j-1}. \tag{1.52} $$
The level of consumption is made up of a component analogous to the def-
inition of permanent income, r ( At + Ht ), less a term that depends on the
constants K and captures the effect of the precautionary saving motive: since
the individual behaves prudently, her consumption increases over time, but
(consistently with the intertemporal budget constraint) the level of consump-
tion in t is lower than in the case of quadratic utility.
As the final step of the solution, we derive the form of the stochastic term
vt +i , and its relationship to the income innovation εt +i . To this end we use the
budget constraint (1.4), where c t +i and yt +i are realizations and not expected
values, and write future realized incomes as the sum of the expected value
at time t and the associated “surprise”: $y_{t+i} = E_t y_{t+i} + (y_{t+i} - E_t y_{t+i})$. The
budget constraint becomes
$$ \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} c_{t+i} = A_t + H_t + \frac{1}{1+r}\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}(y_{t+i} - E_t y_{t+i}). $$
Substituting for $c_{t+i}$ (with i > 0) from (1.48) and for $c_t$ from the consumption
function (1.52), we get
$$ \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\sum_{j=1}^{i} v_{t+j} = \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}(y_{t+i} - E_t y_{t+i}). $$
Given the stochastic process for income (1.46) we can compute the income
“surprises,”
$$ y_{t+i} - E_t y_{t+i} = \sum_{k=0}^{i-1}\lambda^{k}\varepsilon_{t+i-k}, $$
and insert them into the previous equation, to obtain
$$ \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\sum_{j=1}^{i} v_{t+j} = \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\sum_{k=0}^{i-1}\lambda^{k}\varepsilon_{t+i-k}. \tag{1.53} $$
Developing the summations, collecting terms containing v and ε with the
same time subscript, and using the fact that v and ε are serially uncorrelated
processes, we find the following condition that allows us to determine the form
of $v_{t+i}$:
$$ \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i}\left(v_{t+h} - \lambda^{i-1}\varepsilon_{t+h}\right) = 0, \quad \forall h \geq 1. \tag{1.54} $$
Solving the summation in (1.54), we arrive at the final form of the stochastic
terms of the Euler equation guessed in (1.48): at all times t + h,
$$ v_{t+h} = \frac{r}{1 + r - \lambda}\,\varepsilon_{t+h}. \tag{1.55} $$
As in the quadratic utility case (1.20), the innovation in the Euler equation can
be interpreted as the annuity value of the revision of the consumer’s human
wealth arising from an innovation in income for the assumed stochastic
process.
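The step from (1.54) to (1.55) can be verified numerically. The sketch below is only an illustration under assumed parameter values: it truncates the two geometric sums in (1.54) at a large number of terms (the truncation length and the function name are choices made here, not part of the text) and recovers the ratio $v_{t+h}/\varepsilon_{t+h}$.

```python
def innovation_loading(r, lam, n_terms=5000):
    """Ratio v/eps implied by (1.54), with both sums truncated at n_terms."""
    disc = 1.0 / (1.0 + r)
    # Coefficient multiplying v_{t+h} in (1.54).
    sum_v = sum(disc ** i for i in range(1, n_terms + 1))
    # Coefficient multiplying eps_{t+h} in (1.54).
    sum_eps = sum(disc ** i * lam ** (i - 1) for i in range(1, n_terms + 1))
    return sum_eps / sum_v


# Hypothetical values: r = 5%, income persistence lambda = 0.7.
loading = innovation_loading(0.05, 0.7)
closed_form = 0.05 / (1.0 + 0.05 - 0.7)  # r / (1 + r - lambda), as in (1.55)
```

Two limiting cases are easy to read off: with λ = 1 (random-walk income) the loading is one, and with λ = 0 (purely transitory shocks) it falls to r/(1 + r).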
Expression (1.55) for vt +1 can be substituted in the equation for K t (1.49).
The fact that the innovations ε are i.i.d. random variables implies that K t does
not change over time: K t +i −1 = K in (1.48). The evolution of consumption
over time is then given by
$$ c_{t+1} = c_t + K + \frac{r}{1 + r - \lambda}\,\varepsilon_{t+1}. \tag{1.56} $$
Substituting the constant value for K into (1.52), we get a closed-form consumption function:¹⁹
$$ \begin{aligned} c_t &= r(A_t + H_t) - \frac{r}{1+r}\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^{i} i \cdot K \\ &= r(A_t + H_t) - \frac{r}{1+r}\,K\,\frac{1+r}{r^2} \\ &= r(A_t + H_t) - \frac{K}{r}. \end{aligned} $$
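As a quick numerical check of this closed form, the following sketch evaluates the truncated sum in (1.52) with a constant K and compares it with $r(A_t + H_t) - K/r$. The parameter values, the truncation length, and the function name are illustrative assumptions.

```python
def consumption_level(r, total_wealth, k, n_terms=5000):
    """c_t from (1.52) with constant K; total_wealth stands for A_t + H_t."""
    disc = 1.0 / (1.0 + r)
    # With constant K, the inner sum over j equals i * K.
    weighted = sum(disc ** i * i * k for i in range(1, n_terms + 1))
    return r * total_wealth - r / (1.0 + r) * weighted


# Hypothetical values: r = 5%, A_t + H_t = 100, K = 0.01.
c_truncated = consumption_level(0.05, 100.0, 0.01)
c_closed = 0.05 * 100.0 - 0.01 / 0.05  # r (A_t + H_t) - K / r
```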
Finally, to determine the constant K and its relationship with the uncer-
tainty about future labor incomes, some assumptions on the distribution of
ε have to be made. If ε is normally distributed, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$, then, letting
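¹⁹ To verify this result, note that
$$ \begin{aligned} \sum_{i=1}^{\infty}\alpha^{i}\, i &= \sum_{i=1}^{\infty}\alpha^{i} + \sum_{i=2}^{\infty}\alpha^{i} + \sum_{i=3}^{\infty}\alpha^{i} + \dots \\ &= \sum_{i=1}^{\infty}\alpha^{i} + \alpha\sum_{i=1}^{\infty}\alpha^{i} + \alpha^{2}\sum_{i=1}^{\infty}\alpha^{i} + \dots \\ &= (1 + \alpha + \alpha^{2} + \dots)\sum_{i=1}^{\infty}\alpha^{i} \\ &= \sum_{i=0}^{\infty}\alpha^{i}\left(\sum_{i=0}^{\infty}\alpha^{i} - 1\right), \end{aligned} $$
which equals $\frac{1}{1-\alpha}\cdot\frac{\alpha}{1-\alpha} = \alpha/(1-\alpha)^2$ as long as $\alpha < 1$, which holds true in the relevant $\alpha = 1/(1+r)$
case with r > 0.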
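$\theta \equiv r/(1 + r - \lambda)$, we have²⁰
$$ K_t = \frac{1}{\gamma}\log E_t\big(e^{-\gamma\theta\varepsilon_{t+1}}\big) = \frac{1}{\gamma}\log e^{\gamma^2\theta^2\sigma^2_\varepsilon/2} = \frac{\gamma\theta^2\sigma^2_\varepsilon}{2}. \tag{1.57} $$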
The dynamics of consumption over time and its level in each period are then
given by
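$$ c_{t+1} = c_t + \frac{\gamma\theta^2\sigma^2_\varepsilon}{2} + \theta\varepsilon_{t+1}, $$
$$ c_t = r(A_t + H_t) - \frac{1}{r}\,\frac{\gamma\theta^2\sigma^2_\varepsilon}{2}. $$
The innovation variance $\sigma^2_\varepsilon$ has a positive effect on the change in consumption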
between t and t + 1, and a negative effect on the level of consumption in t .
Increases in the uncertainty about future incomes (captured by the variance of
the innovations in the process for y) generate larger changes of consumption
from one period to the next and drops in the level of current consumption.
Thus, allowing for a precautionary saving motive can rationalize the slow
decumulation of wealth by old individuals, and can explain why (increasing)
income and consumption paths are closer to each other than would be implied
by the basic permanent income model. Moreover, if positive innovations in
current income are associated with higher uncertainty about future income,
the excess smoothness phenomenon may be explained, since greater uncer-
tainty induces consumers to save more and may then reduce the response of
consumption to income innovations.
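The closed form (1.57) can be cross-checked against the general expression (1.49). The sketch below is an illustration under assumed parameter values: it computes $K = (1/\gamma)\log E(e^{-\gamma\theta\varepsilon})$ by trapezoid integration over a normal density and compares the result with $\gamma\theta^2\sigma^2_\varepsilon/2$. The integration grid and all function names are choices made here, not part of the text.

```python
import math


def theta(r, lam):
    """theta = r / (1 + r - lambda), the loading of consumption on income shocks."""
    return r / (1.0 + r - lam)


def drift_closed_form(gamma, r, lam, sigma_eps):
    """K from (1.57): gamma * theta^2 * sigma_eps^2 / 2."""
    return gamma * theta(r, lam) ** 2 * sigma_eps ** 2 / 2.0


def drift_numeric(gamma, r, lam, sigma_eps, n=20001, width=10.0):
    """K from (1.49), with E[exp(-gamma*theta*eps)] computed by the trapezoid rule."""
    th = theta(r, lam)
    lo = -width * sigma_eps
    step = 2.0 * width * sigma_eps / (n - 1)
    total = 0.0
    for k in range(n):
        eps = lo + k * step
        density = math.exp(-0.5 * (eps / sigma_eps) ** 2) \
            / (sigma_eps * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(-gamma * th * eps) * density * step
    return math.log(total) / gamma


# Hypothetical values: gamma = 2, r = 5%, lambda = 0.7, sigma_eps = 1.
k_exact = drift_closed_form(2.0, 0.05, 0.7, 1.0)
k_quad = drift_numeric(2.0, 0.05, 0.7, 1.0)
level_reduction = k_exact / 0.05  # K / r: fall in c_t relative to r (A_t + H_t)
```

Doubling the standard deviation of the income innovation multiplies both the drift K and the reduction in the level of consumption K/r by four.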
Exercise 4 Assuming $u(c) = c^{1-\gamma}/(1-\gamma)$ and $r \neq \rho$, derive the first-order condition
of the consumer’s problem under uncertainty. If $c_{t+1}/c_t$ has a lognormal
distribution (i.e. if the rate of change of consumption $\Delta\log c_{t+1}$ is normally
distributed with constant variance $\sigma^2$), write the Euler equation in terms of the
expected rate of change of consumption $E_t(\Delta\log c_{t+1})$. How does the variance $\sigma^2$
affect the behavior of the rate of change of c over time? (Hint: make use of the
fact mentioned in note 20, recall that $c_{t+1}/c_t = e^{\Delta\log c_{t+1}}$, and express the Euler
equations in logarithmic terms.)
1.4. Consumption and Financial Returns
In the model studied so far, the consumer uses a single financial asset with a
certain return to implement the optimal consumption path. A precautionary
saving motive has been introduced by abandoning the hypothesis of quadratic
²⁰ To derive (1.57) we used the following statistical fact: if $x \sim N(E(x), \sigma^2)$, then $e^x$ is a lognormal
random variable with mean $E(e^x) = e^{E(x) + \sigma^2/2}$.
utility. However, there is still no choice on the allocation of saving. If we
assume that the consumer can invest his savings in n financial assets with
uncertain returns, we generate a more complicated choice of the composition
of financial wealth, which interacts with the determination of the optimal
consumption path. The chosen portfolio allocation will depend on the char-
acteristics of the consumer’s utility function (in particular the degree of risk
aversion) and of the distribution of asset returns. Thereby extended, the model
yields testable implications on the joint dynamics of consumption and asset
returns, and becomes the basic version of the consumption-based capital asset
pricing model (CCAPM).
With the new hypothesis of n financial assets with uncertain returns,
the consumer’s budget constraint must be reformulated accordingly. The
beginning-of-period stock of the jth asset, measured in units of consumption,
is given by $A^j_{t+i}$. Therefore, total financial wealth is $A_{t+i} = \sum_{j=1}^{n} A^j_{t+i}$. Here $r^j_{t+i+1}$
denotes the real rate of return of asset j in period t + i, so that $A^j_{t+i+1} = (1 + r^j_{t+i+1})A^j_{t+i}$.
This return is not known by the agent at the beginning
of period t + i. (This explains the time subscript t + i + 1, whereas labor
income, observed by the agent at the beginning of the period, has subscript
t + i.) The accumulation constraint from one period to the next takes the form
$$ \sum_{j=1}^{n} A^j_{t+i+1} = \sum_{j=1}^{n}(1 + r^j_{t+i+1})A^j_{t+i} + y_{t+i} - c_{t+i}, \quad i = 0, \dots, \infty. \tag{1.58} $$
The solution at t of the maximization problem yields the levels of consump-
tion and of the stocks of the n assets from t to infinity. Like in the solution of
the consumer’s problem analyzed in Section 1.1 (but now with uncertain asset
returns), we have a set of n Euler equations,
$$ u'(c_t) = \frac{1}{1+\rho}\,E_t\left[(1 + r^j_{t+1})\,u'(c_{t+1})\right] \quad \text{for } j = 1, \dots, n. \tag{1.59} $$
Since $u'(c_t)$ is not stochastic at time t, we can write the first-order conditions as
$$ 1 = E_t\left[(1 + r^j_{t+1})\,\frac{1}{1+\rho}\,\frac{u'(c_{t+1})}{u'(c_t)}\right] \equiv E_t\left[(1 + r^j_{t+1})\,M_{t+1}\right], \tag{1.60} $$
where Mt +1 is the “stochastic discount factor” applied at t to consumption
in the following period. Such a factor is the intertemporal marginal rate of
substitution, i.e. the discounted ratio of marginal utilities of consumption in
any two subsequent periods. From equation (1.60) we derive the fundamental
result of the CCAPM, using the following property:
$$ E_t\left[(1 + r^j_{t+1})\,M_{t+1}\right] = E_t(1 + r^j_{t+1})\,E_t(M_{t+1}) + \mathrm{cov}_t(r^j_{t+1}, M_{t+1}). \tag{1.61} $$
Inserting (1.61) into (1.60) and rearranging terms, we get
$$ E_t(1 + r^j_{t+1}) = \frac{1}{E_t(M_{t+1})}\left[1 - \mathrm{cov}_t(r^j_{t+1}, M_{t+1})\right]. \tag{1.62} $$
In the case of the safe asset (with certain return $r^0$) considered in the previous
sections,²¹ (1.62) reduces to
$$ 1 + r^0_{t+1} = \frac{1}{E_t(M_{t+1})}. \tag{1.63} $$
Substituting (1.63) into (1.62), we can write the expected return of each asset
j in excess of the safe asset as
$$ E_t(r^j_{t+1}) - r^0_{t+1} = -(1 + r^0_{t+1})\,\mathrm{cov}_t(r^j_{t+1}, M_{t+1}). \tag{1.64} $$
Equation (1.64) is the main result from the model with risky financial
assets: in equilibrium, an asset j whose return has a negative covariance with
the stochastic discount factor yields an expected return higher than $r^0$. In fact,
such an asset is “risky” for the consumer, since it yields lower returns when
the marginal utility of consumption is relatively high (owing to a relatively
low level of consumption). The agent willingly holds the stock of this asset in
equilibrium only if such risk is appropriately compensated by a “premium,”
given by an expected return higher than the risk-free rate $r^0$.
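The pricing relation (1.64) holds exactly for any distribution, which makes it easy to illustrate with a two-state example. The sketch below is a hedged illustration: the state probabilities, the values of the stochastic discount factor, and the payoffs are hypothetical numbers chosen here, and the helper names are not from the text. The risky asset is priced so that (1.60) holds, and its premium over the safe rate is compared with the right-hand side of (1.64).

```python
def expectation(values, probs):
    """Probability-weighted mean over states."""
    return sum(p * v for p, v in zip(probs, values))


def covariance(x, y, probs):
    """Covariance of two state-contingent variables."""
    ex, ey = expectation(x, probs), expectation(y, probs)
    return sum(p * (a - ex) * (b - ey) for p, a, b in zip(probs, x, y))


def returns_from_payoffs(payoffs, sdf, probs):
    """Price the payoff as E[M * payoff], so that 1 = E[(1 + r) M] as in (1.60)."""
    price = expectation([m * x for m, x in zip(sdf, payoffs)], probs)
    return [x / price - 1.0 for x in payoffs]


probs = [0.5, 0.5]      # good state, bad state (assumed equally likely)
sdf = [0.90, 1.02]      # hypothetical M: low when consumption is high
payoffs = [1.3, 0.8]    # hypothetical asset paying more in the good state

risky = returns_from_payoffs(payoffs, sdf, probs)
safe_rate = 1.0 / expectation(sdf, probs) - 1.0                   # (1.63)
premium = expectation(risky, probs) - safe_rate                   # left side of (1.64)
predicted = -(1.0 + safe_rate) * covariance(risky, sdf, probs)    # right side of (1.64)
```

Because the asset pays less in the state where the discount factor is high, its covariance with M is negative and the premium is positive; reversing the payoffs would make the asset a hedge with a negative premium.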
1.4.1. EMPIRICAL IMPLICATIONS OF THE CCAPM
In order to derive testable implications from the model, we consider a CRRA
utility function,
$$ u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}, $$
where $\gamma > 0$ is the coefficient of relative risk aversion. In this case, (1.60)
becomes
$$ 1 = E_t\left[(1 + r^j_{t+1})\,\frac{1}{1+\rho}\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right] \quad \text{for } j = 1, \dots, n. \tag{1.65} $$
²¹ The following results hold also if the safe return rate $r^0$ is random, as long as it has zero
covariance with the stochastic discount factor M.
Moreover, let us assume that the rate of growth of consumption and the rates
of return of the n assets have a lognormal joint conditional distribution.²²
Taking logs of (1.65) (with the usual approximation $\log(1 + \rho) \approx \rho$), we get
$$ 0 = -\rho + \log E_t\left[(1 + r^j_{t+1})\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right], $$
and by the property mentioned in the preceding footnote we obtain
$$ \log E_t\left[(1 + r^j_{t+1})\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right] = E_t(r^j_{t+1} - \gamma\Delta\log c_{t+1}) + \frac{1}{2}\Sigma_j, \tag{1.66} $$
where
$$ \Sigma_j = E\left\{\left[(r^j_{t+1} - \gamma\Delta\log c_{t+1}) - E_t(r^j_{t+1} - \gamma\Delta\log c_{t+1})\right]^2\right\}. $$
Note that the unconditional expectation E[·] in the definition of $\Sigma_j$ may be
used under the hypothesis that the innovations in the joint process for returns
and the consumption growth rate have constant variance (homoskedasticity).
Finally, from (1.66) we can derive the expected return on the jth asset:
$$ E_t r^j_{t+1} = \gamma E_t(\Delta\log c_{t+1}) + \rho - \frac{1}{2}\Sigma_j. \tag{1.67} $$
Several features of equation (1.67) can be noticed. In the first place, (1.67)
can be immediately interpreted as the Euler equation that holds for each
asset j . This interpretation can be seen more clearly if (1.67) is rewritten with
the expected rate of change of consumption on the left-hand side. (See the
solution to exercise 4 for the simpler case of only one safe asset.)
Second, the most important implication of (1.67) is the existence of a
precise relationship between the forecastable component of (the growth rate
of) consumption and asset returns. A high growth rate of consumption is
associated with a high rate of return, so as to enhance saving, for a given
intertemporal discount rate ρ. The degree of risk aversion γ is a measure of this
effect, which is the same for all assets. At the empirical level, (1.67) suggests
the following methodology to test the validity of the model.
1. A forecasting model for $\Delta\log c_{t+1}$ is specified; vector $x_t$ contains only
those variables, from the wider information set available to agents at
time t, which are relevant for forecasting consumption growth.
²² In general, when two random variables x and y have a lognormal joint conditional
probability distribution, then $\log E_t(x_{t+1}y_{t+1}) = E_t(\log(x_{t+1}y_{t+1})) + \frac{1}{2}\mathrm{var}_t(\log(x_{t+1}y_{t+1}))$, where
$\mathrm{var}_t(\log(x_{t+1}y_{t+1})) = E_t\{[\log(x_{t+1}y_{t+1}) - E_t(\log(x_{t+1}y_{t+1}))]^2\}$.
2. The following system for $\Delta\log c_{t+1}$ and $r^j_{t+1}$ is estimated:
$$ \Delta\log c_{t+1} = \delta' x_t + u_{t+1}, $$
$$ r^j_{t+1} = \pi_j' x_t + k_j + v^j_{t+1}, \quad j = 1, \dots, n, $$
where $k_j$ is a constant and u and v are random errors uncorrelated with
the elements of x.
3. The following restrictions on the estimated parameters are tested:
$$ \pi_j = \gamma\,\delta, \quad j = 1, \dots, n. $$
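The three steps above can be sketched on simulated data. The example below is a deliberately simplified illustration, not the procedure used in the empirical literature: it assumes a single forecasting variable, one asset, artificial parameter values, and simple OLS, and it only compares point estimates. A formal test of the restriction would also need standard errors, which are omitted here. All names are hypothetical.

```python
import random


def ols_slope_intercept(x, y):
    """Slope and intercept of a one-regressor OLS fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_and_estimate(gamma=2.0, delta=0.5, k=0.01, n_obs=5000, seed=0):
    """Generate data satisfying pi = gamma * delta, then estimate delta and pi."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]            # forecasting variable
    dlogc = [delta * xi + rng.gauss(0.0, 0.5) for xi in x]     # consumption growth
    ret = [gamma * delta * xi + k + rng.gauss(0.0, 1.0) for xi in x]  # asset return
    delta_hat, _ = ols_slope_intercept(x, dlogc)
    pi_hat, k_hat = ols_slope_intercept(x, ret)
    return delta_hat, pi_hat, k_hat


delta_hat, pi_hat, k_hat = simulate_and_estimate()
implied_gamma = pi_hat / delta_hat  # should be close to the gamma used to simulate
```

In this artificial setting the ratio of the two slope estimates recovers the risk-aversion parameter used to generate the data, which is the content of the restriction in step 3.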
Finally, the value of $\Sigma_j$ differs from one asset return to another, because
of differences in the variability of return innovations and differences in the
covariances between such innovations and the innovation of the consumption
change. In fact, by the definition of $\Sigma_j$ and the lognormality assumption, we
have
$$ \begin{aligned} \Sigma_j &= E\left[(r^j_{t+1} - E_t(r^j_{t+1}))^2\right] + \gamma^2 E\left[(\Delta\log c_{t+1} - E_t(\Delta\log c_{t+1}))^2\right] \\ &\quad - 2\gamma E\left[(r^j_{t+1} - E_t(r^j_{t+1}))(\Delta\log c_{t+1} - E_t(\Delta\log c_{t+1}))\right] \\ &\equiv \sigma_j^2 + \gamma^2\sigma_c^2 - 2\gamma\sigma_{jc}. \end{aligned} \tag{1.68} $$
The expected return of an asset is negatively affected by the variance of the
return itself and is positively affected by its covariance with the rate of change
in consumption. Thus, using (1.67) and (1.68), we obtain, for any asset j,
$$ E_t r^j_{t+1} = \gamma E_t(\Delta\log c_{t+1}) + \rho - \frac{\gamma^2\sigma_c^2}{2} - \frac{\sigma_j^2}{2} + \gamma\sigma_{jc}. \tag{1.69} $$
This equation specializes the general result given in (1.62), and it is interesting
to interpret each of the terms on its right-hand side. Faster expected consumption
growth implies that the rate of return should be higher than the rate of
time preference ρ, to an extent that depends on intertemporal substitutability
as indexed by γ. “Precaution,” also indexed by γ, implies that the rate of
return consistent with optimal consumption choices is lower when consumption
is more volatile (a higher $\sigma_c^2$). The variance of returns has a somewhat
counterintuitive negative effect on the required rate of return: however, this
term appears only because of Jensen’s inequality, owing to the approximation
that replaced $\log E_t(1 + r^j_{t+1})$ with $E_t r^j_{t+1}$ in equation (1.69). But it is again
interesting and intuitive to see that the return’s covariance with consumption
growth implies a higher required rate of return. In fact, the consumer will be
satisfied by a lower expected return if an asset yields more when consumption
is decreasing and marginal utility is increasing; this asset provides a valuable
hedge against declines in consumption to risk-averse consumers. Hence an
asset with positive covariance between the own return innovations and the
innovations in the rate of change of consumption is not attractive, unless (as
must be the case in equilibrium) it offers a high expected return.
When there is also an asset with a safe return $r^0$, the model yields the
following relationship between $r^0$ and the stochastic properties of $\Delta\log c_{t+1}$
(see again the solution of exercise 4):
$$ r^0_{t+1} = \gamma E_t(\Delta\log c_{t+1}) + \rho - \frac{\gamma^2\sigma_c^2}{2}. \tag{1.70} $$
(The return variance and covariance with consumption are both zero in
this case.) Equations (1.69) and (1.70) show the determinants of the returns
on different assets in equilibrium. All returns depend positively on the
intertemporal rate of time preference ρ, since, for a given growth rate of consumption,
a higher discount rate of future utility induces agents to borrow in
order to finance current consumption: higher interest rates are then required
to offset this incentive and leave the growth rate of consumption unchanged.
Similarly, given ρ, a higher growth rate of consumption requires higher rates
of return to offset the incentive to shift resources to the present, reducing
the difference between the current and the future consumption levels. (The
strength of this effect is inversely related to the intertemporal elasticity of
substitution, given by 1/γ in the case of a CRRA utility function.) Finally, the
uncertainty about the rate of change of consumption captured by $\sigma_c^2$ generates
a precautionary saving motive, inducing the consumer to accumulate financial
assets with a depressing effect on their rates of return. According to (1.69), the
expected rate of return on the jth risky asset is also determined by $\sigma_j^2$ (as a
result of the approximation) and by the covariance between rates of return
and consumption changes. The strength of the latter effect is directly related
to the degree of the consumer’s risk aversion.
For any asset j, the “risk premium,” i.e. the difference between the expected
return $E_t r^j_{t+1}$ and the safe return $r^0_{t+1}$, is
$$ E_t r^j_{t+1} - r^0_{t+1} = -\frac{\sigma_j^2}{2} + \gamma\sigma_{jc}. \tag{1.71} $$
An important strand of literature, originated by Mehra and Prescott (1985),
has tested this implication of the model. Many studies have shown that the
observed premium on stocks (amounting to around 6% per year in the USA),
given the observed covariance $\sigma_{jc}$, can be explained by (1.71) only by values
of γ too large to yield a plausible description of consumers’ attitudes towards
risk. Moreover, when the observed values of $\Delta\log c$ and $\sigma_c^2$ are plugged into
(1.70), with plausible values for ρ and γ, the resulting safe rate of return is
much higher than the observed rate. Only the (implausible) assumption of a
negative ρ could make equation (1.70) consistent with the data.
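The arithmetic behind the two puzzles can be sketched by inverting (1.71) for γ and evaluating (1.70). In the example below only the 6% premium comes from the text; the return volatility, the consumption volatility, their correlation, the growth rate, and the discount rate are hypothetical round numbers chosen for illustration, not estimates.

```python
def implied_risk_aversion(premium, sigma_j, sigma_jc):
    """Invert (1.71): gamma = (premium + sigma_j^2 / 2) / sigma_jc."""
    return (premium + 0.5 * sigma_j ** 2) / sigma_jc


def safe_rate(gamma, rho, growth, sigma_c):
    """Safe return from (1.70)."""
    return gamma * growth + rho - 0.5 * gamma ** 2 * sigma_c ** 2


# Hypothetical inputs (annual): only the 6% premium is taken from the text.
sigma_j = 0.17       # assumed standard deviation of stock returns
sigma_c = 0.03       # assumed standard deviation of consumption growth
correlation = 0.4    # assumed correlation between the two
sigma_jc = correlation * sigma_j * sigma_c

gamma_needed = implied_risk_aversion(0.06, sigma_j, sigma_jc)
rate_with_moderate_gamma = safe_rate(2.0, 0.02, 0.02, sigma_c)
```

Under these assumed numbers the premium requires a γ above 30, while a moderate γ of 2 with ρ = 2% and 2% consumption growth implies a safe rate close to 6%. The magnitudes depend entirely on the assumed inputs, but they show why a small covariance $\sigma_{jc}$ forces γ to be large.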
These difficulties in the model’s empirical implementation are known as
the equity premium puzzle and the risk-free rate puzzle, respectively, and have
motivated various extensions of the basic model. For example, a more gen-
eral specification of the consumer’s preferences may yield a measure of risk
aversion that is independent of the intertemporal elasticity of substitution.
It is therefore possible that consumers at the same time display a strong
aversion toward risk, which is consistent with (1.71), and a high propensity to
intertemporally substitute consumption, which solves the risk-free rate puzzle.
A different way of making the above model more flexible, recently
put forward by Campbell and Cochrane (1999), relaxes the hypothesis of
intertemporal separability of utility. The next section develops a simple ver-
sion of their model.
1.4.2. EXTENSION: THE HABIT FORMATION HYPOTHESIS
As a general hypothesis on preferences, we now assume that what provides
utility to the consumer in each period is not the whole level of consumption
by itself, but only the amount of consumption in excess of a “habit” level.
An individual’s habit level changes over time, depending on the individual’s
own past consumption, or on the history of aggregate consumption.
In each period t , the consumer’s utility function is now
$$ u(c_t, x_t) = \frac{(c_t - x_t)^{1-\gamma} - 1}{1 - \gamma} \equiv \frac{(z_t c_t)^{1-\gamma} - 1}{1 - \gamma}, $$
where $z_t \equiv (c_t - x_t)/c_t$ is the surplus consumption ratio, and $x_t$ (with $c_t > x_t$)
is the level of habit. The evolution of x over time is here determined by
aggregate (per capita) consumption and is not affected by the consumption
choices of the individual consumer. Then, marginal utility is simply
$$ u_c(c_t, x_t) = (c_t - x_t)^{-\gamma} \equiv (z_t c_t)^{-\gamma}. $$
The first-order conditions of the problem (see equation (1.65)) now have
the following form:
$$ 1 = E_t\left[(1 + r^j_{t+1})\,\frac{1}{1+\rho}\left(\frac{z_{t+1}}{z_t}\right)^{-\gamma}\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right], \quad \text{for } j = 1, \dots, n. \tag{1.72} $$
The evolution over time of habit and aggregate consumption, denoted by $\bar{c}$,
are modeled as
$$ \Delta\log z_{t+1} = \phi\,\varepsilon_{t+1}, \tag{1.73} $$
$$ \Delta\log\bar{c}_{t+1} = g + \varepsilon_{t+1}. \tag{1.74} $$
Aggregate consumption grows at the constant average rate g, with innovations
$\varepsilon \sim N(0, \sigma_c^2)$. Such innovations affect the consumption habit,²³ with the
parameter φ capturing the sensitivity of z to ε. Under the maintained hypothesis
of lognormal joint distribution of asset returns and the consumption
growth rate (and using the fact that, with identical individuals, in equilibrium
$c = \bar{c}$), taking logarithms of (1.72), we get
$$ 0 = -\rho + E_t r^j_{t+1} - \gamma E_t(\Delta\log z_{t+1}) - \gamma E_t(\Delta\log c_{t+1}) + \frac{1}{2}\mathrm{var}_t(r^j_{t+1} - \gamma\Delta\log z_{t+1} - \gamma\Delta\log c_{t+1}). $$
Using the stochastic processes specified in (1.73) and (1.74), we finally obtain
the risk premium on asset j and the risk-free rate of return:
$$ E_t r^j_{t+1} - r^0_{t+1} = -\frac{\sigma_j^2}{2} + \gamma(1 + \phi)\sigma_{jc}, \tag{1.75} $$
$$ r^0_{t+1} = \gamma g + \rho - \frac{\gamma^2(1 + \phi)^2\sigma_c^2}{2}. \tag{1.76} $$
Comparing (1.75) and (1.76) with the analogous equations (1.71) and (1.70),
we note that the magnitude of φ has a twofold effect on returns. On the one
hand, a high sensitivity of habit to innovations in c enhances the precautionary
motive for saving, determining a stronger incentive to asset accumulation
and consequently a decrease in returns, as already shown by the last term in
(1.70).²⁴ On the other hand, a high φ magnifies the effect of the covariance
between risky returns and consumption ($\sigma_{jc}$) on the premium required to
hold risky assets in equilibrium.
Therefore, the introduction of habit formation can (at least partly) solve the
two problems raised by empirical tests of the basic version of the CCAPM: for
given values of other parameters, a sufficiently large value of φ can bring the
risk-free rate implied by the model closer to the lower level observed on the
markets, at the same time yielding a relatively high risk premium.
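The twofold effect of φ can be seen by evaluating (1.75) and (1.76) side by side. The sketch below simply codes the two formulas; the parameter values are hypothetical and the function names are not from the text. Setting φ = 0 recovers the basic expressions (1.71) and (1.70).

```python
def habit_risk_premium(gamma, phi, sigma_j, sigma_jc):
    """Risk premium with habit formation, equation (1.75)."""
    return -0.5 * sigma_j ** 2 + gamma * (1.0 + phi) * sigma_jc


def habit_safe_rate(gamma, phi, rho, g, sigma_c):
    """Risk-free rate with habit formation, equation (1.76)."""
    return gamma * g + rho - 0.5 * gamma ** 2 * (1.0 + phi) ** 2 * sigma_c ** 2


# Hypothetical values chosen for illustration only.
gamma, rho, g = 2.0, 0.02, 0.02
sigma_j, sigma_c, sigma_jc = 0.17, 0.03, 0.002

premium_basic = habit_risk_premium(gamma, 0.0, sigma_j, sigma_jc)
premium_habit = habit_risk_premium(gamma, 10.0, sigma_j, sigma_jc)
safe_basic = habit_safe_rate(gamma, 0.0, rho, g, sigma_c)
safe_habit = habit_safe_rate(gamma, 10.0, rho, g, sigma_c)
```

With a positive covariance $\sigma_{jc}$, raising φ increases the premium and lowers the safe rate at the same time, which is the direction needed to address both puzzles.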
APPENDIX A1: DYNAMIC PROGRAMMING
This appendix outlines the dynamic programming methods widely used in the macro-
economic literature and in particular in consumption theory. We deal first with the
representative agent’s intertemporal choice under certainty on future income flows;
the extension to the case of uncertainty follows.
²³ The assumed stochastic process for the logarithm of z satisfies the condition c > x (z > 0):
consumption is never below habit.
²⁴ A constant φ is assumed here for simplicity. Campbell and Cochrane (1999) assume that φ
decreases with z: the variability of consumption has a stronger effect on returns when the level of
consumption is closer to habit.
A1.1. Certainty
Let’s go back to the basic model of Section 1.1, assuming that future labor incomes are
known to the consumer and that the safe asset has a constant return. The maximization
problem then becomes
$$ \max_{c_{t+i}}\left[U_t = \sum_{i=0}^{\infty}\left(\frac{1}{1+\rho}\right)^{i} u(c_{t+i})\right], $$
subject to the accumulation constraint (for all i ≥ 0),
$$ A_{t+i+1} = (1 + r)A_{t+i} + y_{t+i} - c_{t+i}. $$
Under certainty, we can write the constraint using the following definition of
total wealth, including the stock of financial assets A and human capital H: $W_t = (1 + r)(A_t + H_t)$.
$W_t$ measures the stock of total wealth at the end of period t but
before consumption $c_t$ occurs, whereas $A_t$ and $H_t$ measure financial and human
wealth at the beginning of the period. In terms of total wealth W, the accumulation
constraint for period t becomes
$$ \begin{aligned} W_{t+1} &= (1 + r)\left[A_{t+1} + \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} y_{t+1+i}\right] \\ &= (1 + r)\left[(1 + r)A_t + y_t - c_t + \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^{i} y_{t+1+i}\right] \\ &= (1 + r)\left[(1 + r)(A_t + H_t) - c_t\right] \\ &= (1 + r)(W_t - c_t). \end{aligned} $$
The evolution over time of total wealth is then (for all i ≥ 0)
$$ W_{t+i+1} = (1 + r)(W_{t+i} - c_{t+i}). $$
Formally, Wt +i is the state variable, giving, in each period t + i , the total amount
of resources available to the consumer; and c t +i is the control variable, whose level,
optimally chosen by the utility-maximizing consumer, affects the amount of resources
available in the next period, t + i + 1. The intertemporal separability of the objective
function and the accumulation constraints allow us to use dynamic programming
methods to solve the above problem, which can be decomposed into a sequence of two-
period optimization problems. To clarify matters, suppose that the consumer’s horizon
ends in period T , and impose a non-negativity constraint on final wealth: WT +1 ≥ 0.
Now consider the optimization problem at the beginning of the final period T , given
the stock of total wealth WT . We maximize u(c T ) with respect to c T , subject to the
constraints WT +1 = (1 + r )(WT − c T ) and WT +1 ≥ 0. The solution yields the optimal
level of consumption in period T as a function of wealth: c T = c T (WT ). Also, the
maximum value of utility in period T (V ) depends, through the optimal consumption
choice, on wealth. The resulting value function VT (WT ) summarizes the solution of
the problem for the final period T .
Now consider the consumer’s problem in the previous period, T − 1, for a given
value of WT −1. Formally, the problem is
$$ \max_{c_{T-1}}\left(u(c_{T-1}) + \frac{1}{1+\rho}V_T(W_T)\right), $$
subject to the constraint $W_T = (1 + r)(W_{T-1} - c_{T-1})$. As in the case above, the problem’s
solution has the following form: $c_{T-1} = c_{T-1}(W_{T-1})$, with an associated maximized
value of utility (now over periods T − 1 and T) given by $V_{T-1}(W_{T-1})$. The
same procedure can be applied to earlier periods recursively (backward recursion). In
general, the problem can be written in terms of the Bellman equation:
$$ V_t(W_t) = \max_{c_t}\left(u(c_t) + \frac{1}{1+\rho}V_{t+1}(W_{t+1})\right), \tag{1.A1} $$
subject to Wt +1 = (1 + r )(Wt − c t ). Substituting for Wt +1 into the objective function
and differentiating with respect to c t , we get the following first-order condition:
$$u'(c_t) = \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1}). \tag{1.A2}$$
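Using the Bellman equation at time t and differentiating with respect to Wt , we obtain
V ′t (Wt ):
$$\begin{aligned}
V'_t(W_t) &= u'(c_t)\frac{\partial c_t}{\partial W_t} + \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1}) - \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1})\frac{\partial c_t}{\partial W_t} \\
&= \left[ u'(c_t) - \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1}) \right]\frac{\partial c_t}{\partial W_t} + \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1}) \\
&= \frac{1+r}{1+\rho} V'_{t+1}(W_{t+1}),
\end{aligned}$$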
where we use the fact that the term in square brackets in the second line equals zero by
(1.A2). Finally, using again the first-order condition, we find
$$V'_t(W_t) = u'(c_t). \tag{1.A3}$$
The effect on utility Vt of an increase in wealth Wt is equal to the marginal utility
from immediately consuming the additional wealth. Along the optimal consumption
path, the agent is indifferent between immediate consumption and saving. (The term
in square brackets is zero.) The additional wealth can then be consumed in any period
with the same effect on utility, measured by u′(c t ) in (1.A2): this is an application of
the envelope theorem.
Inserting condition (1.A3) in period t + 1 into (1.A2), we get the Euler equation,
$$u'(c_t) = \frac{1+r}{1+\rho} u'(c_{t+1}),$$
which is the solution of the problem (here under certainty) already discussed in
Section 1.1.
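The backward recursion and the resulting Euler equation can be illustrated with a minimal sketch. It assumes a CRRA utility function u(c) = c^(1−γ)/(1−γ) with γ ≠ 1 and a horizon of T + 1 periods; the function name `consumption_path` and all parameter values are hypothetical choices made for illustration only. Under these assumptions the policy in each period is c = W / k, with k = 1 in the final period and k = 1 + B·k_next in earlier periods, where B = (1 + r)^((1−γ)/γ) (1 + ρ)^(−1/γ).

```python
def consumption_path(W0, r, rho, gamma, T):
    """Optimal consumption over periods 0..T under CRRA utility (illustrative sketch)."""
    B = (1 + r) ** ((1 - gamma) / gamma) * (1 + rho) ** (-1 / gamma)

    # Backward recursion: in the last period all wealth is consumed (k = 1);
    # one period earlier the policy is c = W / k with k = 1 + B * k_next.
    k = [1.0]
    for _ in range(T):
        k.append(1.0 + B * k[-1])
    k.reverse()  # k[0] now applies to the first period, k[T] = 1 to the last

    # Forward simulation of wealth using W_next = (1 + r) * (W - c).
    W, path = W0, []
    for k_t in k:
        c = W / k_t
        path.append(c)
        W = (1 + r) * (W - c)
    return path, W  # W is the wealth left after the final period


if __name__ == "__main__":
    path, leftover = consumption_path(W0=100.0, r=0.05, rho=0.03, gamma=2.0, T=10)
    print([round(c, 3) for c in path], leftover)
```

Along the simulated path, marginal utilities in adjacent periods satisfy the Euler equation, and no wealth is left after the final period.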
The recursive structure of the problem and the backward solution procedure pro-
vide the optimal consumption path with the property of time consistency. Maximiza-
tion of (1.A1) at time t takes into account Vt +1(Wt +1), which is the optimal solution of
the same problem at time t + 1, obtained considering also Vt +2(Wt +2), and so forth.
As time goes on, then, consumption proceeds optimally along the path originally
chosen at time t . (This time consistency property of the solution is known as Bellman’s
optimality principle.)
Under regularity conditions, iterating the Bellman equation starting from a
(bounded and continuous) value function VT (·) leads to a limit function V (·), which
is unique and invariant over time. Such a function V = lim_{j→∞} V_{T−j} solves the
consumer’s problem over an infinite horizon. In this case also, the function that gives
the agent’s consumption c (W) is invariant over time. Operationally, if the problem
involves (1) a quadratic utility function, or (2) a logarithmic utility function and
Cobb–Douglas constraints, it can be solved by first guessing a functional form for
V (·) and then checking that such a function satisfies the Bellman equation (1.A1).
As an example, consider the case of the CRRA utility function²⁵
$$u(c) = \frac{c^{1-\gamma}}{1-\gamma}.$$
The Bellman equation is
$$V(W_t) = \max_{c_t} \left( \frac{c_t^{1-\gamma}}{1-\gamma} + \frac{1}{1+\rho} V(W_{t+1}) \right),$$
subject to the constraint Wt +1 = (1 + r )(Wt − c t ). Let us assume (to be proved later
on) that the value function has the same functional form as utility:
$$V(W_t) = K \frac{W_t^{1-\gamma}}{1-\gamma}, \tag{1.A4}$$
with K being a positive constant to be determined. Using (1.A4), we can write the
Bellman equation as
$$K \frac{W_t^{1-\gamma}}{1-\gamma} = \max_{c_t} \left( \frac{c_t^{1-\gamma}}{1-\gamma} + \frac{1}{1+\rho} K \frac{W_{t+1}^{1-\gamma}}{1-\gamma} \right). \tag{1.A5}$$
²⁵ The following solution procedure can be applied also when γ > 1 and the utility function is
unbounded. To guarantee this result an additional condition will be imposed below; see Stokey, Lucas,
and Prescott (1989) for further details.
From this equation, using the constraint and differentiating with respect to c t , we get
the first-order condition
$$c_t^{-\gamma} = \frac{1+r}{1+\rho} K \left[ (1+r)(W_t - c_t) \right]^{-\gamma},$$
and solving for c t we obtain the consumption function c t (Wt ):
$$c_t = \frac{1}{1 + (1+r)^{\frac{1-\gamma}{\gamma}} (1+\rho)^{-\frac{1}{\gamma}} K^{\frac{1}{\gamma}}}\, W_t, \tag{1.A6}$$
where K is still to be determined.
To complete the solution, we combine the Bellman equation (1.A5) with the con-
sumption function (1.A6) and define
$$B \equiv (1+r)^{\frac{1-\gamma}{\gamma}} (1+\rho)^{-\frac{1}{\gamma}}$$
to simplify notation. We can then write
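$$K \frac{W_t^{1-\gamma}}{1-\gamma} = \frac{1}{1-\gamma} \left[ \frac{W_t}{1 + B K^{\frac{1}{\gamma}}} \right]^{1-\gamma} + \frac{1}{1+\rho} \frac{K}{1-\gamma} \left[ (1+r) \left( \frac{B K^{\frac{1}{\gamma}}}{1 + B K^{\frac{1}{\gamma}}} \right) W_t \right]^{1-\gamma}, \tag{1.A7}$$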
where the terms in square brackets are, respectively, c_t and W_{t+1}. The value of K that
satisfies (1.A7) is found by equating the coefficients of W_t^{1−γ} on the two sides of the
equation, noting that (1 + r)^{1−γ}(1 + ρ)^{−1} ≡ B^γ, and solving for K :
$$K = \left( \frac{1}{1-B} \right)^{\gamma}. \tag{1.A8}$$
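The coefficient-matching step can be spelled out as follows. This is a sketch of the algebra implied by (1.A7) and the definition of B, using no symbols beyond those already introduced:

```latex
\begin{aligned}
K &= \left(1 + B K^{1/\gamma}\right)^{-(1-\gamma)}
   + \frac{(1+r)^{1-\gamma}}{1+\rho}\, K \left(B K^{1/\gamma}\right)^{1-\gamma}
     \left(1 + B K^{1/\gamma}\right)^{-(1-\gamma)} \\
  &= \left(1 + B K^{1/\gamma}\right)^{\gamma-1}
     \left[\, 1 + B^{\gamma} K\, B^{1-\gamma} K^{\frac{1-\gamma}{\gamma}} \right]
   = \left(1 + B K^{1/\gamma}\right)^{\gamma-1}\left(1 + B K^{1/\gamma}\right)
   = \left(1 + B K^{1/\gamma}\right)^{\gamma}, \\
K^{1/\gamma} &= 1 + B K^{1/\gamma}
\;\;\Longrightarrow\;\;
K^{1/\gamma} = \frac{1}{1-B}
\;\;\Longrightarrow\;\;
K = \left(\frac{1}{1-B}\right)^{\gamma}.
\end{aligned}
```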
Under the condition that B < 1, the complete solution of the problem is
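$$V(W_t) = \left( \frac{1}{1 - (1+r)^{\frac{1-\gamma}{\gamma}} (1+\rho)^{-\frac{1}{\gamma}}} \right)^{\gamma} \frac{W_t^{1-\gamma}}{1-\gamma},$$
$$c(W_t) = \left[ 1 - (1+r)^{\frac{1-\gamma}{\gamma}} (1+\rho)^{-\frac{1}{\gamma}} \right] W_t.$$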
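The guess-and-verify solution can also be checked numerically. The sketch below plugs the guessed value function into the right-hand side of the Bellman equation, maximizes over a fine grid of consumption levels, and compares the result with the left-hand side and with the closed-form consumption rule. The function name `check_bellman`, the grid size, and the parameter values are illustrative assumptions; the check requires γ ≠ 1 and B < 1.

```python
import numpy as np


def check_bellman(W, r, rho, gamma, n=200_001):
    """Compare both sides of the Bellman equation at wealth W (illustrative sketch)."""
    B = (1 + r) ** ((1 - gamma) / gamma) * (1 + rho) ** (-1 / gamma)
    if not B < 1:
        raise ValueError("the closed-form solution requires B < 1")
    K = (1 / (1 - B)) ** gamma

    def V(w):
        # Guessed value function: same functional form as utility, scaled by K.
        return K * w ** (1 - gamma) / (1 - gamma)

    # Grid of feasible consumption levels strictly inside (0, W).
    c = np.linspace(1e-4 * W, (1 - 1e-4) * W, n)
    rhs = c ** (1 - gamma) / (1 - gamma) + V((1 + r) * (W - c)) / (1 + rho)
    best = int(np.argmax(rhs))
    return V(W), rhs[best], c[best], (1 - B) * W


if __name__ == "__main__":
    lhs, rhs, c_grid, c_closed_form = check_bellman(W=10.0, r=0.05, rho=0.04, gamma=2.0)
    print(lhs, rhs, c_grid, c_closed_form)
```

With these parameter values the two sides of the Bellman equation agree up to grid error, and the grid maximizer lies close to (1 − B) W.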
A1.2. Uncertainty
The recursive structure of the problem ensures that, even under uncertainty, the solu-
tion procedure illustrated above is still appropriate. The consumer’s objective function
to be maximized now becomes
$$U_t = E_t \sum_{i=0}^{\infty} \left( \frac{1}{1+\rho} \right)^{i} u(c_{t+i}),$$
subject to the usual budget constraint (1.2). Now we assume that future labor incomes
yt +i (i > 0) are uncertain at time t , whereas the interest rate r is known and constant.
The state variable at time t is the consumer’s certain amount of resources at the end of
period t : (1 + r ) At + yt . The value function is then Vt ((1 + r ) At + yt ), where subscript
t means that the value of available resources depends on the information set at time t .
Under uncertainty, the Bellman equation becomes
$$V_t[(1+r)A_t + y_t] = \max_{c_t} \left\{ u(c_t) + \frac{1}{1+\rho} E_t V_{t+1}[(1+r)A_{t+1} + y_{t+1}] \right\}. \tag{1.A9}$$
The value of Vt +1(·) is stochastic, since future incomes are uncertain, and enters (1.A9)
as an expected value.
Differentiating with respect to c t and using the budget constraint, we get the follow-
ing first-order condition:
$$u'(c_t) = \frac{1+r}{1+\rho} E_t V'_{t+1}[(1+r)A_{t+1} + y_{t+1}].$$
As in the certainty case, by applying the envelope theorem and using the condition
obtained above, we have
$$V'_t(\cdot) = \frac{1+r}{1+\rho} E_t V'_{t+1}(\cdot) = u'(c_t).$$
Combining the last two equations, we finally get the stochastic Euler equation
$$u'(c_t) = \frac{1+r}{1+\rho} E_t u'(c_{t+1}),$$
already derived in Section 1.1 as the first-order condition of the problem.
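A small numerical sketch may help to see the stochastic Euler equation at work. It deliberately simplifies the setting to two periods (an assumption made here for brevity, not the infinite-horizon problem of the text), with CRRA utility and a discrete distribution of next-period income. Current resources X stand for (1 + r) A_t + y_t, and next-period consumption is (1 + r)(X − c_t) + y_{t+1}. The function names and parameter values are hypothetical.

```python
def euler_residual(c, X, r, rho, gamma, incomes, probs):
    """u'(c_t) minus the discounted expected marginal utility of c_{t+1} (CRRA utility)."""
    def marginal_utility(x):
        return x ** (-gamma)

    expected = sum(p * marginal_utility((1 + r) * (X - c) + y)
                   for y, p in zip(incomes, probs))
    return marginal_utility(c) - (1 + r) / (1 + rho) * expected


def solve_consumption(X, r, rho, gamma, incomes, probs, iterations=200):
    """Find c_t that satisfies the stochastic Euler equation by bisection."""
    # The residual is decreasing in c: positive near zero, negative near the
    # level at which consumption in the worst income state would hit zero.
    lo = 1e-9
    hi = X + min(incomes) / (1 + r) - 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if euler_residual(mid, X, r, rho, gamma, incomes, probs) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    certain = solve_consumption(1.0, 0.03, 0.03, 2.0, [1.0], [1.0])
    risky = solve_consumption(1.0, 0.03, 0.03, 2.0, [1.5, 0.5], [0.5, 0.5])
    print(certain, risky)
```

In this sketch, a mean-preserving spread of future income lowers current consumption, because CRRA marginal utility is convex; this is the precautionary saving motive discussed in Section 1.3.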
REVIEW EXERCISES
Exercise 5 Using the basic version of the rational expectations/permanent income model
(with quadratic utility and r = ρ), assume that labor income is generated by the following
stochastic process:
$$y_{t+1} = \bar{y} + \varepsilon_{t+1} - \delta \varepsilon_t, \qquad \delta > 0,$$
where ȳ is the mean value of income and ε is an innovation with E_t ε_{t+1} = 0.
(a) Discuss the impact of an increase of ȳ (Δȳ > 0) on the agent’s permanent income,
consumption and saving.
(b) Now suppose that, in period t + 1 only, a positive innovation in income occurs:
ε_{t+1} > 0. In all past periods income has been equal to its mean level: y_{t−i} = ȳ for
i = 0, . . . , ∞. Find the change in consumption between t and t + 1 (Δc_{t+1}) as a
function of ε_{t+1}, providing the economic intuition for your result.
(c) With reference to question (b), discuss what happens to saving in periods t + 1 and
t + 2.
Exercise 6 Suppose the consumer has the following utility function:
$$U_t = \sum_{i=0}^{\infty} \left( \frac{1}{1+\rho} \right)^{i} u(c_{t+i}, S_{t+i}),$$
where St +i is the stock of durable goods at the beginning of period t + i . There is no
uncertainty. The constraints on the optimal consumption choice are:
$$S_{t+i+1} = (1-\delta) S_{t+i} + d_{t+i},$$
$$A_{t+i+1} = (1+r) A_{t+i} + y_{t+i} - c_{t+i} - p_{t+i} d_{t+i},$$
where δ is the physical depreciation rate of durable goods, d is the expenditure on durable
goods, p is the price of durable goods relative to non-durables, and St and At are given.
Note that the durable goods purchased at time t + i start to provide utility to the consumer
only from the following period, as part of the stock at the beginning of period t + i + 1
(St +i +1). Set up the consumer’s utility maximization problem and obtain the first-order
conditions, providing the economic intuition for your result.
Exercise 7 The representative consumer maximizes the following intertemporal utility
function:
$$U_t = E_t \sum_{i=0}^{\infty} \left( \frac{1}{1+\rho} \right)^{i} u(c_{t+i}, c_{t+i-1}),$$
where
$$u(c_{t+i}, c_{t+i-1}) = (c_{t+i} - \gamma c_{t+i-1}) - \frac{b}{2} (c_{t+i} - \gamma c_{t+i-1})^2, \qquad \gamma > 0.$$
In each period t + i , utility depends not only on current consumption, but also on con-
sumption in the preceding period, t + i − 1. All other assumptions made in the chapter
are maintained (in particular ρ = r ).
(a) Give an interpretation of the above utility function in terms of habit formation.
(b) From the first-order condition of the maximization problem, derive the dynamic
equation for c_{t+1}, and check that this formulation of utility violates the property of
orthogonality of Δc_{t+1} with respect to variables dated t .
Exercise 8 Suppose that labor income y is generated by the following stochastic process:
$$y_t = \lambda y_{t-1} + x_{t-1} + \varepsilon_{1t},$$
$$x_t = \varepsilon_{2t},$$
where x_t (= ε_{2t}) does not depend on its own past values (x_{t−1}, x_{t−2}, . . .) and
E(ε_{1t} · ε_{2t}) = 0. x_{t−1} is the only additional variable (realized at time t − 1) which affects income
in period t besides past income y_{t−1}. Moreover, suppose that the information set used
by agents to calculate their permanent income y^P_t is I_{t−1} = {y_{t−1}, x_{t−1}}, whereas the
information set used by the econometrician to estimate the agents’ permanent income
is Ω_{t−1} = {y_{t−1}}. Therefore, the additional information in x_{t−1} is used by agents in
forecasting income but is ignored by the econometrician.
(a) Using equation (1.7) in the text (lagged one period), find the changes in permanent
income computed by the agents (Δy^P_t) and by the econometrician (Δỹ^P_t),
considering the different information set used (I_{t−1} or Ω_{t−1}).
(b) Compare the variance of Δy^P_t and Δỹ^P_t, and show that the variability of permanent
income according to agents’ forecast is lower than the variability obtained by the
econometrician with limited information. What does this imply for the interpretation
of the excess smoothness phenomenon?
Exercise 9 Consider the consumption choice of an individual who lives for two periods
only, with consumption c 1 and c 2 and incomes y1 and y2. Suppose that the utility function
in each period is
$$u(c) = \begin{cases} a c - (b/2) c^2 & \text{for } c < a/b; \\ a^2/(2b) & \text{for } c \ge a/b. \end{cases}$$
(Even though the above utility function is quadratic, we rule out the possibility that a
higher consumption level reduces utility.)
(a) Plot marginal utility as a function of consumption.
(b) Suppose that r = ρ = 0, y1 = a/b, and y2 is uncertain:
$$y_2 = \begin{cases} a/b + \sigma, & \text{with probability } 0.5; \\ a/b - \sigma, & \text{with probability } 0.5. \end{cases}$$
Write the first-order condition relating c 1 to c 2 (random variable) if the consumer
maximizes expected utility. Find the optimal consumption when σ = 0, and discuss
the effect of a higher σ on c 1.
FURTHER READING
The consumption theory based on the intertemporal smoothing of optimal consump-
tion paths builds on the work of Friedman (1957) and Modigliani and Brumberg
(1954). A critical assessment of the life-cycle theory of consumption (not explicitly
mentioned in this chapter) is provided by Modigliani (1986). Abel (1990, part 1),
Blanchard and Fischer (1989, para. 6.2), Hall (1989), and Romer (2001, ch. 7) present
consumption theory at a technical level similar to ours. Thorough overviews of the
theoretical and empirical literature on consumption can be found in Deaton (1992)
and, more recently, in Browning and Lusardi (1997) and Attanasio (1999), with a
particular focus on the evidence from microeconometric studies. When confronting
theory and microeconomic data, it is of course very important (and far from straight-
forward) to account for heterogeneous objective functions across individuals or house-
holds. In particular, empirical work has found that theoretical implications are typi-
cally not rejected when the marginal utility function is allowed to depend flexibly on
the number of children in the household, on the household head’s age, and on other
observable characteristics. Information may also be heterogeneous: the information
set of individual agents need not be more refined than the econometrician’s (Pischke,
1995), and survey measures of expectations formed on its basis can be used to test
theoretical implications (Jappelli and Pistaferri, 2000).
The seminal paper by Hall (1978) provides the formal framework for much later
work on consumption, including the present chapter. Flavin (1981) tests the empirical
implications of Hall’s model, and finds evidence of excess sensitivity of consumption
to expected income. Campbell (1987) and Campbell and Deaton (1989) derive theoretical
implications for saving behavior and address the problem of excess smoothness of
consumption to income innovations. Campbell and Deaton (1989) and Flavin (1993)
also provide the joint interpretation of “excess sensitivity” and “excess smoothness”
outlined in Section 1.2.
Empirical tests of the role of liquidity constraints, also with a cross-country
perspective, are provided by Jappelli and Pagano (1989, 1994), Campbell and Mankiw
(1989, 1991) and Attanasio (1995, 1999). Blanchard and Mankiw (1988) stress the
importance of the precautionary saving motive, and Caballero (1990) solves analyt-
ically the optimization problem with precautionary saving assuming an exponential
utility function, as in Section 1.3. Weil (1993) solves the same problem in the case of
constant but unrelated intertemporal elasticity of substitution and relative risk aver-
sion parameters. A precautionary saving motive arises also in the models of Deaton
(1991) and Carroll (1992), where liquidity constraints force consumption to closely
track current income and induce agents to accumulate a limited stock of financial
assets to support consumption in the event of sharp reductions in income (buffer-stock
saving). Carroll (1997, 2001) argues that the empirical evidence on consumers’ behav-
ior can be well explained by incorporating in the life-cycle model both a precautionary
saving motive and a moderate degree of impatience. Sizeable responses of consumption
to predictable income changes are also generated by models of dynamically inconsistent
preferences arising from hyperbolic discounting of future utility; Angeletos et al.
(2001) and Frederick, Loewenstein, and O’Donoghue (2002) provide surveys of this
strand of literature.
The general setup of the CCAPM used in Section 1.4 is analyzed in detail by
Campbell, Lo, and MacKinlay (1997, ch. 8) and Cochrane (2001). The model’s empirical
implications with a CRRA utility function and a lognormal distribution of returns
and consumption are derived by Hansen and Singleton (1983) and extended by,
among others, Campbell (1996). Campbell, Lo, and MacKinlay (1997) also provide
a complete survey of the empirical literature. Campbell (1999) has documented the
international relevance of the equity premium and the risk-free rate puzzles, origi-
nally formulated by Mehra and Prescott (1985) and Weil (1989). Aiyagari (1993),
Kocherlakota (1996), and Cochrane (2001, ch. 21) survey the theoretical and empirical
literature on this topic. Constantinides, Donaldson, and Mehra (2002) provide an
explanation of those puzzles by combining a life-cycle perspective and borrowing
constraints. Campbell and Cochrane (1999) develop the CCAPM with habit formation
behavior outlined in Section 1.4 and test it on US data. An exhaustive survey of the
theory and the empirical evidence on consumption, asset returns, and macroeconomic
fluctuations is found in Campbell (1999).
Dynamic programming methods with applications to economics can be found in
Dixit (1990), Sargent (1987, ch. 1) and Stokey, Lucas, and Prescott (1989), at an
increasing level of difficulty and analytical rigor.
REFERENCES
Abel, A. (1990) “Consumption and Investment,” in B. Friedman and F. Hahn (ed.), Handbook of Monetary Economics, Amsterdam: North-Holland.
Aiyagari, S. R. (1993) “Explaining Financial Market Facts: the Importance of Incomplete Markets and Transaction Costs,” Federal Reserve Bank of Minneapolis Quarterly Review, 17, 17–31.
Aiyagari, S. R. (1994) “Uninsured Idiosyncratic Risk and Aggregate Saving,” Quarterly Journal of Economics, 109, 659–684.
Angeletos, G.-M., D. Laibson, A. Repetto, J. Tobacman, and S. Weinberg (2001) “The Hyperbolic Consumption Model: Calibration, Simulation and Empirical Evaluation,” Journal of Economic Perspectives, 15(3), 47–68.
Attanasio, O. P. (1995) “The Intertemporal Allocation of Consumption: Theory and Evidence,” Carnegie–Rochester Conference Series on Public Policy, 42, 39–89.
Attanasio, O. P. (1999) “Consumption,” in J. B. Taylor and M. Woodford (ed.), Handbook of Macroeconomics, vol. 1B, Amsterdam: North-Holland, 741–812.
Blanchard, O. J. and S. Fischer (1989) Lectures on Macroeconomics, Cambridge, Mass.: MIT Press.
Blanchard, O. J. and N. G. Mankiw (1988) “Consumption: Beyond Certainty Equivalence,” American Economic Review (Papers and Proceedings), 78, 173–177.
Browning, M. and A. Lusardi (1997) “Household Saving: Micro Theories and Micro Facts,” Journal of Economic Literature, 34, 1797–1855.
Caballero, R. J. (1990) “Consumption Puzzles and Precautionary Savings,” Journal of Monetary Economics, 25, 113–136.
Campbell, J. Y. (1987) “Does Saving Anticipate Labour Income? An Alternative Test of the Permanent Income Hypothesis,” Econometrica, 55, 1249–1273.
Campbell, J. Y. (1996) “Understanding Risk and Return,” Journal of Political Economy, 104, 298–345.
Campbell, J. Y. (1999) “Asset Prices, Consumption and the Business Cycle,” in J. B. Taylor and M. Woodford (ed.), Handbook of Macroeconomics, vol. 1C, Amsterdam: North-Holland.
Campbell, J. Y. and J. H. Cochrane (1999) “By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior,” Journal of Political Economy, 107(2), 205–251.
Campbell, J. Y. and A. Deaton (1989) “Why is Consumption So Smooth?” Review of Economic Studies, 56, 357–374.
Campbell, J. Y. and N. G. Mankiw (1989) “Consumption, Income and Interest Rates: Reinterpreting the Time-Series Evidence,” NBER Macroeconomics Annual, 4, 185–216.
Campbell, J. Y. and N. G. Mankiw (1991) “The Response of Consumption to Income: a Cross-Country Investigation,” European Economic Review, 35, 715–721.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay (1997) The Econometrics of Financial Markets, Princeton: Princeton University Press.
Carroll, C. D. (1992) “The Buffer-Stock Theory of Saving: Some Macroeconomic Evidence,” Brookings Papers on Economic Activity, 2, 61–156.
Carroll, C. D. (1997) “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,” Quarterly Journal of Economics, 112, 1–55.
Carroll, C. D. (2001) “A Theory of the Consumption Function, With and Without Liquidity Constraints,” Journal of Economic Perspectives, 15(3), 23–45.
Cochrane, J. H. (2001) Asset Pricing, Princeton: Princeton University Press.
Constantinides, G. M., J. B. Donaldson, and R. Mehra (2002) “Junior Can’t Borrow: A New Perspective on the Equity Premium Puzzle,” Quarterly Journal of Economics, 117, 269–298.
Deaton, A. (1991) “Saving and Liquidity Constraints,” Econometrica, 59, 1221–1248.
Deaton, A. (1992) Understanding Consumption, Oxford: Oxford University Press.
Dixit, A. K. (1990) Optimization in Economic Theory, 2nd edn, Oxford: Oxford University Press.
Flavin, M. (1981) “The Adjustment of Consumption to Changing Expectations about Future Income,” Journal of Political Economy, 89, 974–1009.
Flavin, M. (1993) “The Excess Smoothness of Consumption: Identification and Interpretation,” Review of Economic Studies, 60, 651–666.
Frederick, S., G. Loewenstein, and T. O’Donoghue (2002) “Time Discounting and Time Preference: A Critical Review,” Journal of Economic Literature, 40, 351–401.
Friedman, M. (1957) A Theory of the Consumption Function, Princeton: Princeton University Press.
Hall, R. E. (1978) “Stochastic Implications of the Permanent Income Hypothesis: Theory and Evidence,” Journal of Political Economy, 86, 971–987.
Hall, R. E. (1989) “Consumption,” in R. Barro (ed.), Handbook of Modern Business Cycle Theory, Oxford: Basil Blackwell.
Hansen, L. P. and K. J. Singleton (1983) “Stochastic Consumption, Risk Aversion, and the Temporal Behavior of Asset Returns,” Journal of Political Economy, 91, 249–265.
Jappelli, T. and M. Pagano (1989) “Consumption and Capital Market Imperfections: An International Comparison,” American Economic Review, 79, 1099–1105.
Jappelli, T. and M. Pagano (1994) “Saving, Growth and Liquidity Constraints,” Quarterly Journal of Economics, 109, 83–109.
Jappelli, T. and L. Pistaferri (2000) “Using Subjective Income Expectations to Test for Excess Sensitivity of Consumption to Predicted Income Growth,” European Economic Review, 44, 337–358.
Kocherlakota, N. R. (1996) “The Equity Premium: It’s Still a Puzzle,” Journal of Economic Literature, 34(1), 42–71.
Mehra, R. and E. C. Prescott (1985) “The Equity Premium: A Puzzle,” Journal of Monetary Economics, 15(2), 145–161.
Modigliani, F. (1986) “Life Cycle, Individual Thrift, and the Wealth of Nations,” American Economic Review, 76, 297–313.
Modigliani, F. and R. Brumberg (1954) “Utility Analysis and the Consumption Function: An Interpretation of Cross-Section Data,” in K. K. Kurihara (ed.), Post-Keynesian Economics, New Brunswick, NJ: Rutgers University Press.
Pischke, J.-S. (1995) “Individual Income, Incomplete Information, and Aggregate Consumption,” Econometrica, 63, 805–840.
Romer, D. (2001) Advanced Macroeconomics, 2nd edn, New York: McGraw-Hill.
Sargent, T. J. (1987) Dynamic Macroeconomic Theory, Cambridge, Mass.: Harvard University Press.
Stokey, N., R. E. Lucas, and E. C. Prescott (1989) Recursive Methods in Economic Dynamics, Cambridge, Mass.: Harvard University Press.
Weil, P. (1989) “The Equity Premium Puzzle and the Risk-Free Rate Puzzle,” Journal of Monetary Economics, 24, 401–421.
Weil, P. (1993) “Precautionary Savings and the Permanent Income Hypothesis,” Review of Economic Studies, 60, 367–383.
2 Dynamic Models of Investment
Macroeconomic IS–LM models assign a crucial role to business investment
flows in linking the goods market and the money market. As in the case of con-
sumption, however, elementary textbooks do not explicitly study investment
behavior in terms of a formal dynamic optimization problem. Rather, they
offer qualitatively sensible interpretations of investment behavior at a point
in time. In this chapter we analyze investment decisions from an explicitly
dynamic perspective. We simply aim at introducing dynamic continuous-time
optimization techniques, which will also be used in the following chapters,
and at offering a formal, hence more precise, interpretation of qualitative
approaches to the behavior of private investment in macroeconomic models
encountered in introductory textbooks. Other aspects of the subject matter are
too broad and complex for exhaustive treatment here: empirical applications
of the theories we analyze and the role of financial imperfections are men-
tioned briefly at the end of the chapter, referring readers to existing surveys of
the subject.
As in Chapter 1’s study of consumption, in applying dynamic optimiza-
tion methods to macroeconomic investment phenomena, one can view the
dynamics of aggregate variables as the solution of a “representative agent”
problem. In this chapter we study the dynamic optimization problem of a firm
that aims at maximizing present discounted cash flows. We focus on technical
insights rather than on empirical implications, and the problem’s setup may at
first appear quite abstract. When characterizing its solution, however, we will
emphasize analogies between the optimality conditions of the formal problem
and simple qualitative approaches familiar from undergraduate textbooks.
This will make it possible to apply economic intuition to mathematical for-
mulas that would otherwise appear abstruse, and to verify the robustness of
qualitative insights by deriving them from precise formal assumptions.
Section 2.1 introduces the notion of “convex” adjustment costs, i.e. techno-
logical features that penalize fast investment. The next few sections illustrate
the character of investment decisions from a partial equilibrium perspective:
we take as given the firm’s demand and production functions, the dynamics
of the price of capital and of other factors, and the discount rate applied to
future cash flows. Optimal investment decisions by firms are forward looking,
and should be based on expectations of future events. Relevant techniques and
mathematical results introduced in this context are explained in detail in the
Appendix to this chapter. The technical treatment of firm-level investment
decisions sets the stage for a discussion of an explicitly dynamic version of
the familiar IS–LM model. The final portion of the chapter returns to the
firm-level perspective and studies specifications where adjustment costs do
not discourage fast investment, but do impose irreversibility constraints, and
Section 2.8 briefly introduces technical tools for the analysis of this type of
problem in the presence of uncertainty.
2.1. Convex Adjustment Costs
In what follows, F (t ) denotes the difference between a firm’s cash receipts
and outlays during period t . We suppose that such cash flows depend on the
capital stock K (t ) available at the beginning of the period, on the flow I (t ) of
investment during the period, and on the amount N(t ) employed during the
period of another factor of production, dubbed “labor”:
$$F(t) = R(t, K(t), N(t)) - P_k(t)\, G(I(t), K(t)) - w(t) N(t). \tag{2.1}$$
The R(·) function represents the flow of revenues obtained from sales of the
firm’s production flow. This depends on the amounts employed of the two
factors of production, K and N, and also on the technological efficiency of
the production function and/or the strength of demand for the firm’s product.
In (2.1), possible variations over time of such exogenous features of the firm’s
technological and market environment are taken into account by including
the time index t alongside K and N as arguments of the revenue function. We
assume that revenue flows are increasing in both factors, i.e.
$$\frac{\partial R(\cdot)}{\partial K} > 0, \qquad \frac{\partial R(\cdot)}{\partial N} > 0, \tag{2.2}$$
as is natural if the marginal productivity of all factors and the market price of
the product are positive. To prevent the optimal size of the firm from diverging
to infinity, it is necessary to assume that the revenue function R(·) is concave
in K and N. If the price of its production is taken as given by the firm, this is
ensured by non-increasing returns to scale in production. If instead physical
returns to scale are increasing, the revenue function R(·) can still be concave
if the firm has market power and its demand function’s slope is sufficiently
negative.
The two negative terms in the cash-flow expression (2.1) represent costs
pertaining to investment, I , and employment of N. As to the latter, in this
chapter we suppose that its level is directly controlled by the firm at each point
in time and that utilization of a stock of labor N entails a flow cost w per
unit time, just as in the static models studied in introductory microeconomic
courses. As to investment costs, a formal treatment of the problem needs to
be precise as to the moment when the capital stock used in production during
each period is measured. If we adopt the convention that the relevant stock
is measured at the beginning of the period, it is simply impossible for the
firm to vary K (t ) at time t . When the production flow is realized, the firm
cannot control the capital stock, but can only control the amount of positive
or negative investment: any resulting increase or decrease of installed capital
begins to affect production and revenues only in the following period. On this
basis, the dynamic accumulation constraint reads
$$K(t + \Delta t) = K(t) + I(t)\Delta t - \delta K(t)\Delta t, \tag{2.3}$$
where δ denotes the depreciation rate of capital, and Δt is the length of the
time period over which we measure cash flows and the investment rate per
unit time I (t ).
By assumption, the firm cannot affect current cash flows by varying the
available capital stock. The amount of gross investment I (t ) during period Δt
does, however, affect the cash flow: in (2.1) investment costs are represented
by a price Pk (t ) times a function G (·) which, as in Figure 2.1, we shall assume
increasing and convex in I (t ):
$$\frac{\partial G(\cdot)}{\partial I} > 0, \qquad \frac{\partial^2 G(\cdot)}{\partial I^2} > 0. \tag{2.4}$$
Figure 2.1. Unit investment costs
The function G (·) is multiplied by a price in the definition (2.1) of cash
flows. Hence it is defined in physical units, just like its arguments I and
K . For example, it might measure the physical length of a production line,
or the number of personal computers available in an office. The investment
rate I (t ) is linearly related to the change in capital stock in equation (2.3)
but, since G (·) is not linear, the cost of each unit of capital installed is not
constant. For instance, we might imagine that a greenhouse needs to purchase
G ( I, K ) flower pots in order to increase the available stock by I units, and that
the quantities purchased and effectively available for future production are
different because a certain fraction (variable as a function of I and K ) of pots
purchased break and become useless. In the context of this example it is also
easy to imagine that a fraction of pots in use also break during each period,
and that the parameter δ represents this phenomenon formally in (2.3).
While such examples can help reduce the rather abstract character of the
formal model we are considering, its assumptions may be more easily justified
in terms of their implications than in those of their literal realism. For pur-
poses of modeling investment dynamics, the crucial feature of the G ( I, K )
function is the strict convexity assumed in (2.4). This implies that the average
unit cost (measured, after normalization by Pk , by the slope of lines such as
OA and OB in Figure 2.1) of investment flows is increasing in the total flow
invested during a period. Thus, a given total amount of investment is less
costly when spread out over multiple periods than when it is concentrated
in a single period. For this reason, the optimal investment policy implied by
convex adjustment costs is to some extent gradual.
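The gain from spreading investment over time can be illustrated with a minimal sketch. It assumes a specific quadratic cost function G(I) = I + (b/2) I², which satisfies G(0) = 0, a unit slope at zero, and the convexity in (2.4) for I > −1/b. This functional form, the parameter b, and the function names are illustrative assumptions not taken from the text; the sketch also ignores discounting, depreciation, and any dependence of G on K.

```python
def G(I, b=0.2):
    """Hypothetical convex investment cost in physical units (valid for I > -1/b)."""
    return I + 0.5 * b * I ** 2


def average_unit_cost(I, b=0.2):
    """Slope of the line from the origin to the point (I, G(I)), as for OA in Figure 2.1."""
    return G(I, b) / I


def total_cost(total_investment, periods, b=0.2):
    """Cost of installing a given total in equal instalments over several periods."""
    return periods * G(total_investment / periods, b)


if __name__ == "__main__":
    print(average_unit_cost(2.0), average_unit_cost(4.0))  # rises with the flow invested
    print(total_cost(10.0, 1), total_cost(10.0, 5))        # spreading lowers total cost
```

Under this assumed form the total cost of installing a given amount over n periods is I + b I²/(2n), which falls towards the linear cost I as the investment is spread more thinly.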
The functional form of investment costs plays an important role not only
when the firm intends to increase its capital stock, but also when it wishes
to keep it constant, or decrease it. It is quite natural to assume that the firm
should not bear costs when gross investment is zero (and capital may evolve
over time only as a consequence of exogenous depreciation at rate δ). Hence,
as in Figure 2.1,
G (0, ·) = 0,
and the positive first derivative assumed in (2.4) implies that G ( I, ·) < 0 for
I < 0: the cost function is negative (and makes positive contributions to the
firm’s cash flow) when gross investment is negative, and the firm is selling used
equipment or structures.
In the figure, the G (·) function lies above a 45° line through the origin, and
it is tangent to it at zero, where its slope is unitary:
∂ G (0, ·)/∂ I = 1.
This property makes it possible to interpret Pk as “the” unit price of capital
goods, a price that would apply to all units installed if the convexity of G ( I, ·)
did not deter larger than infinitesimal investments of either sign.
When negative investment rates are considered, convexity of adjustment
costs similarly implies that the unit amount recouped from each unit scrapped
(as measured by the slope of lines such as OB) is smaller when I is more
negative, and this makes speedy reduction of the capital stock unattractive.
Comparing the slope of lines such as OA and OB, it is immediately apparent
that alternating positive and negative investments is costly: even though there
are no net effects on the final capital stock, the firm cannot fully recoup
the original cost of positive investment from subsequent negative invest-
ment. First increasing, then decreasing the capital stock (or vice versa) entails
adjustment costs.
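The cost of reversing an investment can be made concrete under the same kind of assumption, namely a hypothetical quadratic cost function G(I) = I + (b/2) I² with a constant capital price Pk. Neither the functional form nor the parameter values come from the text, and the sketch is only meaningful for |I| < 1/b, where G remains increasing.

```python
def G(I, b=0.2):
    """Hypothetical convex investment cost; negative values are proceeds from sales."""
    return I + 0.5 * b * I ** 2


def round_trip_outlay(I, Pk=1.0, b=0.2):
    """Net outlay from installing I units and then removing the same I units."""
    return Pk * (G(I, b) + G(-I, b))


def unit_resale_value(units_sold, b=0.2):
    """Amount recouped per unit scrapped, as for the slope of OB in Figure 2.1."""
    return -G(-units_sold, b) / units_sold


if __name__ == "__main__":
    print(round_trip_outlay(2.0))                        # positive although net investment is zero
    print(unit_resale_value(1.0), unit_resale_value(2.0))  # falls as more units are sold at once
```

With this assumed form the round trip costs Pk·b·I², so the loss grows with the square of the size of the reversal.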
In summary, the form of the function displayed in Figure 2.1 implies that
investment decisions should be based not only on the contribution of capital
to profits at a given moment in time, but also on their future outlook. If
the relevant exogenous conditions indexed by t in R(·) and the dynamics
of the other, equally exogenous, variables Pk (t ), w(t ), r (t ) suggest that the
firm should vary its capital stock, the adjustment should be gradual, as will
be set out below. Moreover, if large positive and negative fluctuations of
exogenous variables are expected, the firm should not vary its investment rate
sharply, because the cost and revenues generated by upward and downward
capital stock fluctuations do not offset each other exactly. Convexity of the
adjustment cost function implies that the total cost of any given capital stock
variation is smaller when that variation is diluted through time, hence the firm
should behave in a forward looking fashion when choosing the dynamics of its
investment rate and should try to keep the latter stable by anticipating the
dynamics of exogenous variables.
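The claim that a given capital stock variation is cheaper when diluted through time can be checked with a minimal sketch. It again assumes the hypothetical cost G(I) = I + (b/2)I² and, for simplicity, ignores discounting and depreciation (both of which matter in the full problem); the function name and parameter values are illustrative.

```python
def total_installation_cost(delta_k: float, periods: int, b: float = 0.5) -> float:
    """Undiscounted cost of raising capital by delta_k in `periods` equal steps.

    Assumes the hypothetical cost G(I) = I + (b/2) * I**2 per step and
    ignores depreciation and discounting.
    """
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    step = delta_k / periods
    return periods * (step + 0.5 * b * step ** 2)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        # Total cost is delta_k + b * delta_k**2 / (2 * n): decreasing in n.
        print(n, total_installation_cost(1.0, n))
```

With these assumptions the convex part of the cost is b(ΔK)²/(2n), which shrinks as the same total variation is spread over more periods n.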
2.2. Continuous-Time Optimization
Neither the realism nor the implications of convex adjustment costs depend
on the length Δt of the period over which revenue, cost, and investment flows
are measured. The discussion above, however, was based on the idea that
current investment cannot increase the capital stock available for use within
each such period, implying that K (t ) could be taken as given when evaluating
opportunities for further investment. This accounting convention, of course,
is more accurate when the length of the period is shorter.
Accordingly, we consider the limit case where Δt → 0, and suppose that the
firm makes optimizing choices at every instant in continuous time. Optimiza-
tion in continuous time yields analytically cleaner and often more intuitive
results than qualitatively similar results from discrete time specifications, such
as those encountered in this book when discussing consumption (in Chap-
ter 1) and labor demand under costly adjustment (in Chapter 3). We also
assume, for now, that the dynamics of exogenous variables is deterministic.
(Only at the end of the chapter do we introduce uncertainty in a continuous-
time investment problem.) This also makes the problem different from that
discussed in Chapter 1: the characterization offered by continuous-time
models without uncertainty is less easily applicable to empirical discrete-
time observations, but is also quite insightful, and each of the modeling
approaches we outline could fruitfully be applied to the various substantive
problems considered. The economic intuition afforded by the next chapter’s
models of labor demand under uncertainty would be equally valid if applied
to investment in plant and equipment rather than in workers,
and we shall encounter consumption and investment problems in continuous
time (and in the absence of uncertainty) when discussing growth models in
Chapter 4.
In continuous time, the maximum present value (discounted at rate r ) of
cash flows generated by a production and investment program can be written
as an integral:
$$V(0) \equiv \max \int_0^\infty F(t)\, e^{-\int_0^t r(s)\,ds}\, dt,$$
$$\text{subject to } \dot K(t) = I(t) - \delta K(t), \text{ for all } t. \tag{2.5}$$
The Appendix to this chapter defines the integral and offers an introduction to
Hamiltonian dynamic optimization. This method suggests a simple recipe for
solution of this type of problem (which will also be encountered in Chapter 4).
The Hamiltonian of optimization problem (2.5) is
$$H(t) = e^{-\int_0^t r(s)\,ds} \bigl( F(t) + \lambda(t)\,( I(t) - \delta K(t) ) \bigr),$$
where λ(t) denotes the shadow price of capital at time t in current value terms
(that is, in terms of resources payable at the same time t).
The first-order conditions of the dynamic optimization problem we are
studying are
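$$\frac{\partial H}{\partial N} = 0 \;\Rightarrow\; \frac{\partial F(\cdot)}{\partial N} = 0 \;\Rightarrow\; \frac{\partial R(\cdot)}{\partial N} = w(t),$$
$$\frac{\partial H}{\partial I} = 0 \;\Rightarrow\; \frac{\partial F(\cdot)}{\partial I} = -\lambda(t) \;\Rightarrow\; P_k \frac{\partial G}{\partial I} = \lambda(t), \tag{2.6}$$
$$-\frac{\partial H}{\partial K} = \frac{d}{dt}\left( \lambda(t)\, e^{-\int_0^t r(s)\,ds} \right) \;\Rightarrow\; \dot\lambda - r\lambda = -\left( \frac{\partial F(\cdot)}{\partial K} - \delta\lambda \right).$$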
The limit “transversality” condition must also be satisfied, in the form
$$\lim_{t \to \infty} e^{-\int_0^t r(s)\,ds}\, \lambda(t) K(t) = 0. \tag{2.7}$$
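The step from the Hamiltonian to the dynamic condition for the shadow price (the last of the first-order conditions) may be spelled out as follows. This is only a sketch of the intermediate algebra, writing λ for the current-value shadow price of capital and δ for the depreciation rate, and using the fact that the time derivative of the discount factor is −r(t) times the discount factor itself.

```latex
\begin{aligned}
-\frac{\partial H}{\partial K}
  &= -\,e^{-\int_0^t r(s)\,ds}\left(\frac{\partial F(\cdot)}{\partial K} - \delta\lambda(t)\right), \\
\frac{d}{dt}\left(\lambda(t)\,e^{-\int_0^t r(s)\,ds}\right)
  &= e^{-\int_0^t r(s)\,ds}\left(\dot\lambda(t) - r(t)\lambda(t)\right).
\end{aligned}
```

Equating the two right-hand sides and dividing by the (strictly positive) discount factor yields \dot\lambda - r\lambda = -(\partial F(\cdot)/\partial K - \delta\lambda).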
The Appendix shows that these optimality conditions are formally analogous
to those of more familiar static constrained optimization problems. Here, we
discuss their economic interpretation. The condition
$$\frac{\partial R(\cdot)}{\partial N} = w(t) \tag{2.8}$$
simply requires that, in flow terms, the marginal revenue yielded by employ-
ment of the flexible factor N be equal to its cost w, at every instant t . This is
quite intuitive, since the level of N may be freely determined by the firm. The
condition
$$P_k \frac{\partial G(\cdot)}{\partial I} = \lambda(t) \tag{2.9}$$
calls for equality, along an optimal investment path, of the marginal value of
capital λ(t) and the marginal cost of the investment flows that determine an
increase (or decrease) of the capital stock at every instant. That marginal cost,
in turn, is P_k ∂G(·)/∂I in the problem we are considering. Such considerations,
holding at every given time t, do not suffice to represent the dynamic
aspects of the firm’s problem. These aspects are in fact crucial in the third
condition listed in (2.6), which may be rewritten in the form
$$r\lambda = \frac{\partial F(\cdot)}{\partial K} - \delta\lambda + \dot\lambda$$
and interpreted in terms of financial asset valuation. For simplicity, let δ = 0.
From the viewpoint of time t, the marginal unit of capital adds ∂F/∂K to
current cash flows, and this is a “dividend” paid by that unit to its owner
at that time (the firm). The marginal unit of capital, however, also offers
capital gains, in the amount λ̇. If the firm attaches a (shadow) value λ to the
unit of capital, then it must be the case that its total return in terms of both
dividends and capital gains is financially fair. Hence it should coincide with
the return rλ that the firm could obtain from λ units of purchasing power
in a financial market where, as in (2.5), cash flows are discounted at rate r.
If δ > 0, similar considerations hold true but should take into account that
a fraction of the marginal unit of capital is lost during every instant of time.
Hence its value, amounting to δλ per unit time, needs to be subtracted from
current “dividends.”
Such considerations also offer an intuitive economic interpretation of the
transversality condition (2.7), which would be violated if the “financial” value
λ(t) grew at a rate greater than or equal to the equilibrium rate of return r(s)
while the capital stock, and the marginal dividend afforded by the investment
policy, tend to a finite limit. In such a case, λ(t) would be influenced by a
speculative “bubble”: the only reason to hold the asset corresponding to the
marginal value of capital is the expectation of everlasting further capital gains,
not linked to profits actually earned from its use in production. Imposing
condition (2.7), we acknowledge that such expectations have no economic
basis, and we deny that purely speculative behavior may be optimal for the
firm.
2.2.1. CHARACTERIZING OPTIMAL INVESTMENT
Consider the variable
$$q(t) \equiv \frac{\lambda(t)}{P_k(t)},$$
the ratio of the marginal capital unit’s shadow value to parameter P_k, which
represents the market price of capital (that is, the unit cost of investment in
the neighborhood of the zero gross investment point, where adjustment costs
are negligible).
This variable, known as marginal q, has a crucial role in the determination
of optimal investment flows. In fact, the second condition in (2.6) implies that
$$\frac{\partial G(I(t), K(t))}{\partial I(t)} = q(t), \tag{2.10}$$
and if (2.4) holds then ∂G(·)/∂I is a strictly increasing function of I. Such a
function has an inverse: let ι(·) denote the inverse of ∂G(·)/∂I as a function
of I. Both ∂G(·)/∂I and its inverse may depend on the capital stock K. The
ι(q, K) function implicitly defined by
$$\frac{\partial G(\iota(q, K), K)}{\partial \iota} \equiv q$$
returns investment flows in such a way as to equate the marginal investment
cost ∂G(·)/∂I to a given q, for a given K. Condition (2.10) may then be
equivalently written
$$I(t) = \iota(q(t), K(t)). \tag{2.11}$$
Since K(t) is given at time t, (2.11) determines the investment rate as a
function of q(t).
Since, by assumption, the investment cost function G ( I, ·) has unitary slope
at I = 0, zero gross investment is optimal when q = 1; positive investment
is optimal when q > 1; and negative investment is optimal when q < 1.
Intuitively, when q > 1 (hence λ > P_k) capital is worth more inside the firm
than in the economy at large; hence it is a good idea to increase the capital
stock installed in the firm. Symmetrically, q < 1 suggests that the capital stock
should be reduced. In both cases, the speed at which capital is transferred
towards the firm or away from it depends not only on the difference between
q and unity, but also on the degree of convexity of the G (·) function, that is,
on the relevance of capital adjustment costs. If the slope of the function in
Figure 2.1 increases quickly with I , even q values very different from unity are
associated with modest investment flows.
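To illustrate how the degree of convexity governs the response of investment to q, the following sketch inverts the marginal cost for a hypothetical quadratic specification G(I) = I + (b/2)I², which does not depend on K and has unit slope at I = 0. The functional form, the parameter b, and the function name are assumptions made here for illustration only.

```python
def investment_rate(q: float, b: float) -> float:
    """Invert the hypothetical marginal cost dG/dI = 1 + b * I = q for I."""
    if b <= 0:
        raise ValueError("b must be positive for strictly convex costs")
    return (q - 1.0) / b


if __name__ == "__main__":
    for b in (0.5, 5.0):
        # A larger b (steeper marginal cost) damps the response to a given q.
        print(b, [round(investment_rate(q, b), 3) for q in (0.8, 1.0, 1.5)])
```

In this special case investment is zero at q = 1, positive for q > 1, and negative for q < 1, and a larger b means a smaller flow for any given distance of q from unity.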
Exercise 10 Show that, if capital has positive value, then investment would
always be positive if the total investment cost were quadratic, for example if
G(K, I) = x · I² where P_k = 1 and x ≥ 0 may depend on K. Discuss the realism
of more general specifications where G(K, I) = x · I^β for β > 0.
Determining the optimal investment rate as a function of q does not yield a
complete solution to the dynamic optimization problem. In fact, in order to
compute q one needs to know the shadow value λ(t) of capital, which (unlike
the market price of capital, P_k(t)) is part of the problem’s solution, rather
than part of its exogenous parameterization. However, it is possible to
characterize graphically and qualitatively the complete solution of the problem on
the basis of the Hamiltonian conditions.
Since we expressed the shadow value of capital in current terms, calendar
time t appears in the optimality conditions only as an argument of the functions,
such as λ(·) and K(·), which determine optimal choices of I and N.
Noting that
$$\dot q(t) = \frac{d}{dt} \frac{\lambda(t)}{P_k(t)} = \frac{\dot\lambda(t)}{P_k(t)} - \frac{\lambda(t)}{P_k(t)} \frac{\dot P_k(t)}{P_k(t)},$$
let us define Ṗ_k(t)/P_k(t) ≡ π_k (the rate of inflation in terms of capital), and
recall that λ̇ = (r + δ)λ − ∂F(·)/∂K by the last optimality condition in (2.6).
Thus, we may write the rate of change of q as a function of q itself, of K, and
of parameters:
$$\dot q = (r + \delta - \pi_k)\, q - \frac{1}{P_k} \frac{\partial F(\cdot)}{\partial K}. \tag{2.12}$$
In this expression the calendar time t is omitted for simplicity, but all
variables—particularly those, not explicitly listed, that determine the size of
cash flows F (·) and their derivative with respect to K —are measured at a
given moment in time.
Combining the constraint K̇(t) = I(t) − δK(t) with condition (2.11), we
obtain a relationship between the rate of change of K, K itself, and the level
of q:
$$\dot K = \iota(q, K) - \delta K. \tag{2.13}$$
Now, if we suppose that all exogenous variables are constant (including the
price of capital P_k, so that its inflation rate π_k is zero), and recall that the
investment rate and N depend on q and K through the optimality conditions in (2.6), the
time-varying elements of the system formed by (2.12) and (2.13) are just q(t)
and K(t), that is, precisely those for whose dynamics we have derived explicit
expressions.
Thus, the dynamics of the two variables may be studied in the phase diagram
of Figure 2.2. On the axes of the diagram we measure the dynamic variables
of interest. On the horizontal axis of this and subsequent diagrams, one reads
the level of K; on the vertical axis, a level of q. If only K and q (and variables
uniquely determined by them, such as the investment rate I = ι(q, K)) are
time-varying, then each point in (K, q)-space is uniquely associated with
their dynamic changes. Picking any point in the diagram, and knowing the
functional form of the expressions in (2.12) and (2.13), one could in principle
compute both q̇ and K̇. Graphically, the movement in time of the two
variables may be represented by placing in the diagram appropriately oriented
arrows.
Figure 2.2. Dynamics of q (supposing that ∂F(·)/∂K is decreasing in K)
In practice, the characterization exercise needs first to identify points where
one of the variables remains constant in time. In Figure 2.2, the downward-
sloping line represents combinations of K and q such that the expression on
the right-hand side of (2.12) is zero. This is the case when
$$q = (r + \delta)^{-1} \frac{1}{P_k} \frac{\partial F(\cdot)}{\partial K}.$$
Given that (r + δ)P_k > 0, the locus of points along which q̇ = 0 has a negative
slope if a higher capital stock is associated with a smaller “dividend” ∂F(·)/∂K
from the marginal capital unit in (2.12).
This is not, in general, guaranteed by the condition ∂²F(·)/∂K² < 0. When
drawing the phase diagram, in fact, the firm’s cash flow,
F(·) = R(t, K, N) − P_k(t)G(I, K) − w(t)N,
should be evaluated under the assumptions that the flexible factor N is always
adjusted so as to satisfy the condition ∂R(K, N)/∂N = w, and that investment
satisfies the condition P_k ∂G(I, K)/∂I = λ. Thus, as K varies, both the
optimal employment of N, which we may write as N∗ = n(K, w), and the
optimal investment flow ι(q, K) vary as well. Exercise 12 highlights certain
implications of this fact for a properly drawn phase diagram. It will be convenient
for now to suppose that the q̇ = 0 locus slopes downwards, as is the case
(for example) if the adjustment cost function G(·) does not depend on K and
revenues R(·) are an increasing and strictly concave function of K only.
Once we have identified the locus of points where q̇ = 0, we need to deter-
mine the sign of q̇ for points in the diagram that are not on that locus. For each
level of K , one and only one level of q implies that q̇ equals zero. If for example
we consider point A along the horizontal axis of the figure, q is steady only if
its level is at the height of point B. If we move up to a higher value of q for the
same level of K, such as that corresponding to point C in the figure, equation
(2.12), where q is multiplied by r + δ > 0, implies that q̇ is not equal to
zero, as in point B, but is larger than zero. In the figure this is represented
by an upward-pointing arrow: if one imagines placing a pen on the diagram
at point C, and following the dynamic instructions given by (2.12), the pen
should slide towards even higher values of q . The same reasoning holds for
all points above the q̇ = 0 locus, for example point D, whence an upward-
sloping arrow also starts. The speed of the dynamic movement represented is
larger for larger values of r + δ, and for greater distances from the stationary
locus: the latter fact could be represented by drawing larger arrows for points
farther from the q̇ = 0 locus. To convince oneself that q̇ > 0 in D, one may
also consider point E on q̇ = 0 and, holding q constant, note that, if (2.12)
identifies a downward-sloping locus, then a higher level of K must result in q̇
larger than zero. Symmetrically, we have q̇ < 0 at every point below and to the
left of the q̇ = 0 locus, such as those marked with downward-sloping arrows
in the figure.
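The sign pattern of q̇ around its stationary locus can be verified numerically. The sketch below assumes P_k = 1, no capital-price inflation, and a hypothetical marginal cash flow ∂F/∂K = αAK^(α−1) that is decreasing in K; all parameter values and function names are illustrative assumptions rather than part of the text.

```python
R, DELTA = 0.05, 0.10   # hypothetical interest and depreciation rates
A, ALPHA = 1.0, 0.5     # hypothetical parameters of revenues A * K**ALPHA


def marginal_cash_flow(k: float) -> float:
    """Assumed 'dividend' of the marginal capital unit, decreasing in K."""
    return ALPHA * A * k ** (ALPHA - 1.0)


def q_dot(q: float, k: float) -> float:
    """Right-hand side of the q dynamics with P_k = 1 and constant P_k."""
    return (R + DELTA) * q - marginal_cash_flow(k)


def q_locus(k: float) -> float:
    """Level of q at which q_dot is zero for a given K."""
    return marginal_cash_flow(k) / (R + DELTA)


if __name__ == "__main__":
    k = 4.0
    print("locus:", q_locus(k))
    print("above:", q_dot(q_locus(k) + 0.3, k))   # positive: q rises further
    print("below:", q_dot(q_locus(k) - 0.3, k))   # negative: q falls further
    print("to the right:", q_dot(q_locus(k), 9.0))  # positive at higher K
```

Under these assumptions the locus is downward sloping, and q̇ is positive above (or to the right of) it and negative below (or to the left of) it.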
Applying the same reasoning to equation (2.13) enables us to draw
Figure 2.3. To determine the slope of the locus along which K̇ = 0, note that
the right-hand side of (2.13) is certainly increasing in q since a higher q is
associated with a larger investment flow. The effect on K̇ of a higher K is
ambiguous: as long as δ > 0 it is certainly negative through the second term,
but it may be positive through the first term. If a firm with a larger installed
capital stock bears smaller unit costs for installation of a given additional
investment flow I, a larger optimal investment flow is associated with a given
q, and a larger K has a negative effect on G(·) and a positive effect on ι(·).
The relevance of this channel is studied in Exercise 12, but the figure is drawn
supposing that the negative effect dominates the positive one: for example,
because the adjustment cost function G(·) does not depend on K, and δ > 0
suffices to imply a positive slope for the K̇ = 0 locus. It is then easy to show
that K̇ > 0, as indicated by arrows pointing to the right, at all points above
that locus; a value of q higher than that which would maintain a steady capital
stock, in fact, can only be associated with a larger investment flow and an
increasing K. Symmetrically, arrows point to the left at all points below the
K̇ = 0 locus.
Figure 2.3. Dynamics of K (supposing that ∂ι(·)/∂K − δ < 0)
Figure 2.4. Phase diagram for the q and K system
Figure 2.4, which simply superimposes the two preceding figures, considers
the joint dynamic behavior of q and K . Since arrows point up and to the
right in the region above both stationary loci, from that region the system can
only diverge (at the increasing speed implied by values of q and K that are
increasingly far from those consistent with their stability) towards infinitely
large values of q and/or K . Such dynamic behavior is quite peculiar from the
economic point of view, and in fact it can be shown to violate the transversality
condition (2.7) for plausible forms of the F (·) function. Also, starting from
points in the lower quadrant of the diagram, the dynamics of the system,
driven by arrows pointing left and downwards, can only lead to economically
nonsensical values of q and/or K .
The system’s configuration is much more sensible at the point where the
K̇ = 0 and q̇ = 0 loci cross, the unique steady state of the dynamic system
we are considering. Thus, we can focus attention on dynamic paths starting
from the left and right regions of Figure 2.4, where arrows pointing towards
the steady state allow the dynamic system to evolve in its general direction.
Figure 2.5. Saddlepath dynamics
As shown in Figure 2.5, however, it is quite possible for trajectories start-
ing in those regions to cross the K̇ = 0 locus (vertically) or the q̇ = 0 locus
(horizontally) and then, instead of reaching the steady state, proceed in the
regions where arrows point away from it—implying that (2.7) is violated, or
that capital eventually becomes negative.
In the figure, however, a pair of dynamic paths is drawn that start from
points to the left and right of the steady state and continue towards it (at
decreasing speed) without ever meeting the system’s stationarity loci. All
points along such paths are compatible with convergence towards the steady
state, and together form the saddlepath of the dynamic system. For any given
K , such as that labeled K (0) in the figure, only one level of q (or, equiva-
lently, only one rate of investment) puts the system on a trajectory converging
towards the steady state. If q were higher, and the I (0) investment rate larger,
the firm should continue to invest at a rate faster than that leading to the
steady state in order to keep on satisfying the last optimality condition in
(2.6), and the (2.12) dynamic equation deriving from it. Sooner or later, this
would lead the firm to cross the q̇ = 0 line and, along a path of ever increasing
investment, to violate the transversality condition. Symmetrically, if the firm
invested less than what is implied by the saddlepath value of q , it would find
itself investing less and less over time, and would diverge towards excessively
small capital stocks rather than converge to the steady state.
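The saddlepath property can be checked in a linear special case. The sketch below assumes P_k = 1, a constant price of capital, a hypothetical quadratic cost G(I) = I + (b/2)I², and a linear marginal cash flow ∂F/∂K = a − cK with c > 0, so that the system is K̇ = (q − 1)/b − δK and q̇ = (r + δ)q − (a − cK). These functional forms and the names used are assumptions introduced here for illustration.

```python
import math
from typing import Tuple


def saddle_eigenvalues(r: float, delta: float, b: float, c: float) -> Tuple[float, float]:
    """Eigenvalues of the Jacobian [[-delta, 1/b], [c, r + delta]] in (K, q) order."""
    trace = r
    det = -delta * (r + delta) - c / b   # negative whenever b > 0 and c > 0
    root = math.sqrt(trace ** 2 - 4.0 * det)
    return 0.5 * (trace - root), 0.5 * (trace + root)


def saddlepath_slope(r: float, delta: float, b: float, c: float) -> float:
    """Slope dq/dK of the stable eigenvector, from (-delta - mu) * x + y / b = 0."""
    stable, _ = saddle_eigenvalues(r, delta, b, c)
    return b * (stable + delta)


if __name__ == "__main__":
    print(saddle_eigenvalues(0.05, 0.10, 2.0, 0.1))
    print(saddlepath_slope(0.05, 0.10, 2.0, 0.1))
```

Because the determinant is negative, the two roots have opposite signs: only initial conditions on the stable eigenvector converge, which is why a single level of q is admissible for each given K. In this special case the saddlepath slopes downwards.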
2.3. Steady-State and Adjustment Paths
For a given (and supposed constant) value of exogenous variables, the firm’s
investment rate should be that implied by the q level corresponding on the
saddlepath to the capital stock, which, at any point in time, is determined by
past investment decisions. The capital stock and its shadow value then move
towards their steady state (if they are not there yet). Setting q̇ = K̇ = 0 and
capital-price inflation π_k = 0 in (2.12) and (2.13), we can study the steady-state levels q_ss and K_ss:
$$(r + \delta)\, q_{ss} = \frac{1}{P_k} \left. \frac{\partial F(\cdot)}{\partial K} \right|_{K = K_{ss}}, \tag{2.14}$$
$$\iota(q_{ss}, K_{ss}) = \delta K_{ss}. \tag{2.15}$$
The second equation simply indicates that the gross investment rate I_ss =
ι(q_ss, K_ss) must be such as to compensate depreciation in the stock of capital
(a stock that is constant, by definition, in steady state). The first equation is less
obvious. Recalling that q_ss = λ_ss/P_k, however, we may rewrite it as
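$$\lambda_{ss} = (r + \delta)^{-1} \left. \frac{\partial F(\cdot)}{\partial K} \right|_{K = K_{ss}} = \int_t^\infty e^{-(r + \delta)(\tau - t)} \left. \frac{\partial F(\cdot)}{\partial K} \right|_{K = K_{ss}} d\tau.$$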
Thus, in steady state the shadow value of capital is equal to the stream of future
marginal contributions by capital to the firm’s cash flows, discounted at rate
r + δ > 0 over the infinite planning horizon. If it were the case that r + δ = 0,
the relevant present value would be ill-defined: hence, as mentioned above
and discussed in more detail below, it must be the case that r + δ > 0 in a
well-defined investment problem.
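The equality between the steady-state shadow value and the discounted integral rests on an elementary computation, sketched here for constant r and δ (with τ the variable of integration); it also shows why a strictly positive r + δ is needed for the integral to converge.

```latex
\int_t^\infty e^{-(r+\delta)(\tau - t)}\, d\tau
  = \left[ -\frac{e^{-(r+\delta)(\tau - t)}}{r+\delta} \right]_{\tau = t}^{\tau \to \infty}
  = \frac{1}{r+\delta},
  \qquad r + \delta > 0 .
```

Since the marginal cash flow is constant in steady state, it can be taken out of the integral, and multiplying it by 1/(r + δ) gives the steady-state shadow value.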
The steady state is readily interpreted along the lines of a simple approach
to investment which should be familiar from undergraduate textbooks (see
Jorgenson 1963, 1971). One may treat the capital stock as a factor of production
whose user cost is (r_k + δ)P_k when P_k is the price of each stock unit,
r_k = i − ΔP_k/P_k is the real rate of interest in terms of capital, and δ is the
physical depreciation rate of capital. If the profit flow is an increasing concave
function F(K, ...) of capital K, the first-order condition
$$\frac{\partial F(K^*(\ldots), \ldots)}{\partial K} = (r_k + \delta) P_k \tag{2.16}$$
identifies the K∗ stock that maximizes F(K, ...) in each period, neglecting
adjustment costs. If capital does not depreciate and δ = 0, however, condition
(2.15) implies that q_ss = 1, since ∂G(·)/∂I = 1 when I = 0, and condition
(2.14) simply calls for capital’s marginal productivity to coincide
with its financial cost, just as in static approaches to optimal use of
capital:
$$\frac{\partial F(\cdot)}{\partial K} = r P_k.$$
If instead δ > 0, then steady-state investment is given by I_ss = δK_ss > 0, and
therefore q_ss > 1. The unit cost of capital being installed to offset ongoing
depreciation is higher than P_k, because of adjustment costs.
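The two steady-state conditions can be solved in closed form in a linear-quadratic special case. The sketch assumes P_k = 1, a hypothetical cost G(I) = I + (b/2)I² (so that investment equals (q − 1)/b), and a linear marginal cash flow ∂F/∂K = a − cK; these forms, the parameter values, and the function name are illustrative assumptions.

```python
from typing import Tuple


def steady_state(r: float, delta: float, a: float, b: float, c: float) -> Tuple[float, float]:
    """Solve (r + delta) * q = a - c * K together with (q - 1) / b = delta * K."""
    k_ss = (a - (r + delta)) / (c + (r + delta) * b * delta)
    q_ss = 1.0 + b * delta * k_ss
    return k_ss, q_ss


if __name__ == "__main__":
    for delta in (0.0, 0.10):
        k_ss, q_ss = steady_state(r=0.05, delta=delta, a=1.0, b=2.0, c=0.1)
        print(f"delta={delta}: K_ss={k_ss:.4f}, q_ss={q_ss:.4f}")
```

With δ = 0 the sketch returns q_ss = 1 and a marginal cash flow equal to r, as in the static condition; with δ > 0 it returns q_ss > 1, since replacement investment is carried out at a marginal cost above P_k.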
Phase diagrams are useful not only for characterizing adjustment paths
starting from a given initial situation, but also for studying the investment
effects of permanent changes in parameters. To this end, one may specify a
functional form for cash flows F (·) in (2.1), as is done in the exercises at the
end of the chapter, and study the effects of a change in its parameters on the
q̇ = 0 locus, on the steady-state capital stock, and on the system’s adjustment
path.
Consider, for example, the effect of a smaller wage w. This event, as the
following exercise verifies in a special case, may (or may not) imply an increase
in the optimal capital stock in the static context of introductory economics
textbooks and, equivalently, a higher stock K_ss in the steady state of the
dynamic problem we are studying.
Exercise 11 Suppose that the adjustment cost function G(·) does not depend
on the capital stock, and let δ = 0. If the firm’s revenue function has the
Cobb–Douglas form R(K, N, t) = K^α N^β, does a lower w increase or decrease the
steady-state capital stock K_ss?
At the time when parameters change, however, the capital stock is given.
The new configuration of the system can affect only q and the investment
rate, and the resulting dynamics gradually increase (or decrease) the capital
stock. The gradual character of the optimal adjustment path derives from
strictly convex adjustment costs, which, as we know, make fast investment
unattractive. At any time, the speed of adjustment depends on the difference
between the current and steady-state levels of q . Hence the speed of movement
along the saddlepath is decreasing, and the growth rate of capital becomes
infinitesimally small as the steady state is approached. In fact, it is by avoiding
perpetually accelerating capital and investment trajectories that the “saddle”
adjustment paths can satisfy the transversality condition.
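The declining speed of adjustment along the saddlepath can be illustrated in a linear special case. The sketch assumes δ = 0, P_k = 1, a hypothetical cost G(I) = I + (b/2)I², and a linear marginal cash flow with slope −c, so that deviations from the steady state decay at the stable root of the system; the parameter values and names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def stable_root(r: float, b: float, c: float) -> float:
    """Negative root of mu**2 - r * mu - c / b = 0 (linear system with delta = 0)."""
    return 0.5 * (r - math.sqrt(r * r + 4.0 * c / b))


def saddle_path(k0: float, k_ss: float, r: float, b: float, c: float,
                times: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Return (t, K, q, K_dot) along the saddlepath starting from K(0) = k0."""
    mu = stable_root(r, b, c)
    path = []
    for t in times:
        gap = (k0 - k_ss) * math.exp(mu * t)   # K(t) - K_ss shrinks geometrically
        path.append((t, k_ss + gap, 1.0 + b * mu * gap, mu * gap))
    return path


if __name__ == "__main__":
    for row in saddle_path(5.0, 9.5, 0.05, 2.0, 0.1, [0.0, 1.0, 5.0, 20.0]):
        print(tuple(round(x, 4) for x in row))
```

In this sketch both q − 1 and the rate of change of K are proportional to the remaining gap between K and its steady-state level, so the adjustment slows down as the steady state is approached.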
It is also interesting to study the effects on investment of future expected
events. Suppose that at time t = 0 it becomes known that the wage will remain
constant at w(0) until t = T , will then fall to w(T ) < w(0), and will remain
constant at that new level. The optimal investment flow anticipates such a
future exogenous event: if a lower wage and the resulting higher employment
of N implies a larger marginal contribution of capital to cash flows, then
the firm begins at time zero to invest more than what would be optimal
if it were known that w(t ) = w(0) for ever. However, since between t = 0
and t = T the wage is still w(0) and there is no reason to increase N for
given K , it cannot be optimal for the firm to behave as in the solution of
the exercise above, where the wage decreased permanently to w(t ) = w(T )
for all t .
In order to characterize the optimal investment policy, recall that to avoid
divergent dynamics the firm should select a dynamic path that leads towards
the steady state while satisfying the optimality conditions. From time T
onwards, all parameters are constant and we know that the firm should be
on the saddlepath leading to the new steady state. To figure out the dynamics
of q and K during the period when the system’s dynamics are still those
implied by w(0), note that the system should evolve so as to find itself on the
new saddlepath at time T, without experiencing discontinuous jumps. To see
why, consider the implications of a dynamic path such that a discontinuous
jump of q is needed to bring the system on the saddlepath, as in Figure 2.6.
Formally, it would be impossible to define q̇(T), hence λ̇(T), and neither
equation (2.12) nor the optimality condition (2.6) could be satisfied. From
the economic point of view, recall that a sudden change of q would necessarily
entail a similarly abrupt variation of the investment flow, as in the figure. As we
know, however, strictly convex adjustment costs imply that such an investment
policy is more costly than a smoother version, such as that represented by dots
in the figure. Whenever a path with foreseeable discontinuities is considered as
an optimal-policy candidate, it can be ruled out by the fact that a more gradual
investment policy would reduce overall investment costs. (A more gradual
investment policy also affects the capital path, of course, but investment can be
redistributed over time so as to make this effect relatively small on a present
discounted basis.) Since such reasoning can be applied at every instant, the
optimal path is necessarily free of discontinuities, other than the unavoidable
one associated with the initial re-optimization in light of new, unforeseen
information arriving at time zero.
Figure 2.6. A hypothetical jump along the dynamic path, and the resulting time path of
λ(t) and investment (↑ + ↓) ⇒ smaller investment costs
Figure 2.7. Dynamic effects of an announced future change of w
It is now easy to display graphically, as in Figure 2.7, the dynamic response
of the system. Starting from the steady state, the height of q ’s jump at time
zero (when the parameter change to be realized at time T is announced)
depends on how far in the future is the expected event. In the limit case where
T = 0 (that is, where the parameter change occurs immediately) q would
jump directly on the new saddlepath. If, as in Figure 2.7, T is rather far in the
future, q jumps to a point intermediate between the initial one (the old steady
state, in the figure) and the saddlepath: the firm then follows the dynamics
implied by the initial parameters until time T , when the dynamic path meets
the new saddlepath. Intuitively, the firm finds it convenient to dilute over
time the adjustment it foresees. For larger values of T the height of the ini-
tial jump would be smaller, and the apparently divergent dynamics induced
by the expectation of future events would follow slower, more prolonged,
dynamics.
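The dependence of the initial jump on the announcement horizon can be made explicit in a linear special case. The sketch assumes δ = 0, P_k = 1, a hypothetical cost G(I) = I + (b/2)I², and a linear marginal cash flow with slope −c whose intercept rises permanently at time T, raising the steady-state capital stock by delta_k. The path before T follows the old dynamics, starts from the old steady-state capital stock, and is required to reach the new saddlepath exactly at T. All functional forms, parameter values, and names are assumptions introduced for illustration.

```python
import math
from typing import Tuple


def roots(r: float, b: float, c: float) -> Tuple[float, float]:
    """Stable and unstable roots of the linear (K, q) system with delta = 0."""
    disc = math.sqrt(r * r + 4.0 * c / b)
    return 0.5 * (r - disc), 0.5 * (r + disc)


def initial_jump(delta_k: float, horizon: float, r: float, b: float, c: float) -> float:
    """q(0) - 1 when the change is announced at time 0 and occurs at `horizon`."""
    stable, unstable = roots(r, b, c)
    return -b * stable * delta_k * math.exp(-unstable * horizon)


def path_before_change(t: float, delta_k: float, horizon: float,
                       r: float, b: float, c: float) -> Tuple[float, float]:
    """(K(t) - K_old, q(t) - 1) for 0 <= t <= horizon under the old dynamics."""
    stable, unstable = roots(r, b, c)
    weight = -stable * delta_k * math.exp(-unstable * horizon) / (unstable - stable)
    k_gap = weight * (math.exp(unstable * t) - math.exp(stable * t))
    q_gap = b * weight * (unstable * math.exp(unstable * t)
                          - stable * math.exp(stable * t))
    return k_gap, q_gap


if __name__ == "__main__":
    for horizon in (0.0, 4.0, 8.0):
        print(horizon, round(initial_jump(1.0, horizon, 0.05, 2.0, 0.1), 4))
```

In this sketch the jump for T = 0 coincides with the new saddlepath value, and it shrinks exponentially (at the unstable root) as T increases, in line with the qualitative description above.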
These results offer a more precise interpretation of the investment deter-
mination assumptions made in the IS–LM model familiar from introductory
macroeconomics courses, where business investment I depends on exogenous
variables (say, Ī) and negatively on the interest rate. This relationship can
be rationalized qualitatively considering that the propensity to invest should
depend on (exogenous) expectations of future (hence, discounted) profits
to be obtained from capital installed through current investment. From this
point of view, any variable relevant to expectations of future profits influences
the exogenous component Ī of investment flows. Since the present discounted
value of profits is lower at higher discount rates, for any given Ī the investment
flow I is a decreasing function of the current interest rate i. In the
context of the dynamic model we are considering, the firm’s investment tends
to a steady state, which, inasmuch as it depends on future events, depends in
obvious and important ways on expectations.²⁶
2.4. The Value of Capital and Future Cash Flows
As we have seen, in steady state it is possible to express q (t ) in terms of the
present value of future marginal effects of K on the firm’s cash flows. In fact,
a similar expression is always valid along an optimal investment path. If we
set P_k(t) = 1 for all t (and therefore capital-price inflation π_k ≡ 0) for simplicity, then q and λ are
equal. The last condition in (2.6) may be written
$$\frac{d}{d\tau} \lambda(\tau) - (r + \delta) \lambda(\tau) = -F_K(\tau), \tag{2.17}$$
where
$$F_K(\tau) = \frac{\partial F(\tau, K(\tau), N(\tau))}{\partial K} \tag{2.18}$$
denotes the marginal cash-flow effect of capital at every time τ along the firm’s
optimal trajectory. Multiplying by e^{−(r+δ)τ}, we can rewrite (2.17) in the form
$$\frac{d}{d\tau} \left( \lambda(\tau) e^{-(r + \delta)\tau} \right) = -F_K(\tau) e^{-(r + \delta)\tau},$$
which may be integrated between τ = 0 and τ = T to obtain
$$e^{-(r + \delta)T} \lambda(T) - \lambda(0) = -\int_0^T F_K(\tau) e^{-(r + \delta)\tau} d\tau.$$
In the limit for T → ∞, as long as K(∞) > 0 condition (2.7) implies that the
first term vanishes and
$$\lambda(0) = \int_0^\infty F_K(\tau) e^{-(r + \delta)\tau} d\tau. \tag{2.19}$$
Along an optimal investment trajectory, the marginal value of capital at time
zero is the present value of cash flows generated by an additional unit of
capital at time zero which, depreciating steadily over time at rate δ, adds e^{−δt}
units of capital at each time t > 0. Taking as given the capital stock installed at
time t, each additional unit of capital increases cash flows according to F_K(·).
The firm could indeed install such an additional unit and then, keeping its
investment policy unchanged, increase discounted cash flows by the amount in
(2.19).
²⁶ Keynes (1936, ch. 12) emphasizes the relevance of expectation later adopted as a key feature of
Keynesian IS–LM models. Of course, his framework of analysis is quite different from that adopted
here, and does not quite agree with the notion that investment should always tend to some long-run
equilibrium configuration.
This reasoning does not take into account the fact that a hypothetical
variation of investment (hence of capital in use in subsequent periods) should
lead the firm to vary its choices of further investment. Any such variation,
however, has no effect on capital’s marginal value as long as its size is infinites-
imally small. If at time zero a small additional amount of capital were in fact
installed, the firm would indeed vary its future investment policy, but only
by similarly small amounts. This would have no effect on discounted cash
flows around an optimal trajectory, where first-order conditions are satisfied
and small perturbations of endogenous variables have no first-order effect on
the firm’s value.
This fact, an application of the envelope theorem, makes it possible to com-
pute capital’s marginal value taking as given the optimal dynamic path of
capital—or, equivalently, to gauge the optimality of each investment decision
taking all other such decisions as given. In general, equation (2.19) does not
offer an explicit solution for λ(0), because its right-hand side depends on
future levels of K whenever ∂F_K(·)/∂K ≠ 0, that is, whenever the function
linking cash flows to capital is strictly concave. Inasmuch as the marginal
contribution of capital to cash flows depends on the stock of capital, one
would need to know the level of K(τ) for τ > 0 in order to compute the right-hand
side of (2.19). But future capital stocks depend on current investment
flows, which in turn depend on the very λ that one is attempting to evaluate.
The obvious circularity of this reasoning generally makes it impossible to
compute the optimal policy through this route. For a finite planning horizon
T , one could obtain a solution starting from the given (possibly zero) value of
capital at the time when the firm ceases to exist. But if T → ∞ one needs to
compute the optimal policy as a whole, or at least to characterize it graphically
as we did above. In fact, it is easy to interpret the dynamics of q in Figure 2.7
in terms of expected cash flows: favorable exogenous events become nearer in
time (and are more weakly discounted) along the first portion of the dynamic
path illustrated in the figure.
It can be the case, however, that F(·) is only weakly concave (hence linear)
in K; then F_K(·) ≡ ∂F(·)/∂K does not depend on endogenous variables,
equation (2.19) yields an explicit value for λ, and the firm’s investment policy
follows immediately. For example, if
$$\frac{\partial G(\cdot)}{\partial K} = 0, \qquad R(t, K(t), N(t)) = \tilde{R}(t)\,K(t), \tag{2.20}$$
then (2.19) reads
$$\lambda(0) = \int_0^\infty \tilde{R}(\tau)\, e^{-(r+\delta)\tau}\, d\tau. \tag{2.21}$$
The first equation in (2.20) states that capital’s installation costs depend only
on I , not on K . Hence, unit investment costs do depend on the size of
investment flows per unit time, but the cost of a given capital stock increase is
independent of the firm’s initial size. The second equation in (2.20) states that
each unit of installed capital makes the same contribution to the firm’s
revenues, again denying that the firm’s size is relevant at the margin.
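As a rough numerical illustration of (2.21), the sketch below approximates the discounted integral with a trapezoid rule and compares it with the closed forms available in two simple cases. The parameter values, the function name `marginal_value`, and the two unit-revenue paths (constant, and growing at a constant rate) are assumptions made only for this example; they do not come from the text.

```python
import math

def marginal_value(unit_revenue, r, delta, horizon=400.0, steps=200_000):
    """Trapezoid-rule approximation of the integral in (2.21):
    unit_revenue(tau) * exp(-(r + delta) * tau), integrated from 0 to `horizon`."""
    h = horizon / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * unit_revenue(tau) * math.exp(-(r + delta) * tau)
    return total * h

# Illustrative (hypothetical) parameter values.
r, delta, growth = 0.05, 0.10, 0.02

# Constant unit revenue of 1: the closed form is 1 / (r + delta).
lam_constant = marginal_value(lambda tau: 1.0, r, delta)

# Unit revenue growing at rate `growth`: the closed form is 1 / (r + delta - growth).
lam_growing = marginal_value(lambda tau: math.exp(growth * tau), r, delta)

print(lam_constant, 1.0 / (r + delta))
print(lam_growing, 1.0 / (r + delta - growth))
```

In both cases the marginal value of capital is computed without any knowledge of the capital stock, which is the point of the linear case: nothing on the right-hand side depends on K.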
A relationship in the form (2.21) holds true, more generally, whenever the
scale of the firm’s operations is irrelevant at the margin. Consider the case of a
firm using a production function f (K , N) with constant returns to scale, and
operating in a competitive environment (taking as given prices and wages).
By the constant-returns assumption, f (K , N) = f (K /x, N/x )x , and, setting
x = K , total revenues may be written
R(t, K , N) = P (t ) f (K , N) = P (t ) f (1, N/K )K.
The first-order condition ∂R(·)/∂N = w, which takes the form
$$P(t)\, f_N(1, N/K) = w(t),$$
determines the optimal N/K ratio as a function ν(·) of the w(t)/P(t) ratio. In
the absence of adjustment costs for factor N, this condition holds at all times,
and N(t)/K(t) = ν(w(t)/P(t)) for all t. Hence,
$$F(t) = P(t)\, f\bigl(1, \nu(w(t)/P(t))\bigr)\, K - w(t)\, \nu(w(t)/P(t))\, K - P_k(t)\, G(I(t), K(t)),$$
and, using the first equation in (2.20), we arrive at
$$\frac{\partial F(\cdot)}{\partial K} = P(t)\, f\bigl(1, \nu(w(t)/P(t))\bigr) - w(t)\, \nu(w(t)/P(t)). \tag{2.22}$$
This expression is independent of K , like R̃(·) in (2.21), and allows us to
conclude that the constant-returns function F (·) is simply proportional to K.
This algebraic derivation introduces simple mathematical results that will
be useful when characterizing the average value of capital in the next section. It
also has interesting implications, however, when one allows for the possibility
that future realizations of exogenous variables such as w(t ) and P (t ) may
be random. A formal redefinition of the problem to allow for uncertainty in
continuous time requires more advanced technical tools, introduced briefly
in the last section of this chapter. Intuitively, however, if the firm’s objective
function is defined as the expected value of the integral in (2.5), an expression
similar to (2.19) should also hold in expectation:
$$\lambda(0) = \int_0^\infty E_0\left[F_K(\tau)\right] e^{-(r+\delta)\tau}\, d\tau. \tag{2.23}$$
In discrete time, one would replace the integral with a summation and the
exponential function with compound discount factors. It would still be true,
of course, that along an optimal investment trajectory the marginal value of
Figure 2.8. Unit profits as a function of the real wage
capital is equal to the present expected value of its contributions to future cash
flows.
Now, if the firm operates in perfectly competitive markets, produces under
constant returns, and chooses the flexible factor optimally at all times, so that
(2.22) holds, then optimal cash flows are a convex function of the real product
wage, w(t )/ P (t ). It is easy to see why when we consider Figure 2.8, which
displays the profit accruing to the firm from each unit of capital. (A study of
unit profits is equivalent to that of total profits if, as in the present case, the
latter are proportional to the former.) If the firm did not vary its employment
of N in response to a change in the real wage, then, for given K , the difference
between revenues and variable-factor costs would be a linear function of the
real wage. By definition, the profits afforded by optimal adjustment of N
must be larger for every possible real wage, and will be equal only where the
supposedly constant employment level is optimal. Thus, profits are a convex
function of the real wage. Flexibility in employment of N allows the firm to
use each unit of capital so as to exploit favorable conditions and to limit losses
in unfavorable ones.
By Jensen’s inequality (already encountered when introducing precaution-
ary savings in Chapter 1), the conditions listed above imply that
$$\operatorname{Var}\left(\frac{w(t)}{P(t)}\right) > 0 \;\Rightarrow\; E_t\left[F_K\bigl(w(t)/P(t)\bigr)\right] > F_K\bigl(E_t\left[w(t)/P(t)\right]\bigr).$$
Thus, uncertainty increases expected profits earned by each unit of capital, and
induces more intense investment by a firm that, like the one we are studying, is
risk-neutral (that is, is concerned only with expectations of future cash flows).
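The convexity argument can be illustrated with a small numerical sketch. It assumes, purely for illustration, a Cobb-Douglas technology with output per unit of capital equal to n**a (n being employment per unit of capital), and a real wage that takes two equally likely values around a mean of one. The function names and all numbers are hypothetical.

```python
def optimal_employment(real_wage, a=0.5):
    """Employment per unit of capital solving a * n**(a - 1) = real_wage."""
    return (a / real_wage) ** (1.0 / (1.0 - a))

def unit_profit(real_wage, a=0.5):
    """Profit per unit of capital (in units of output) when n is chosen optimally."""
    n_star = optimal_employment(real_wage, a)
    return n_star ** a - real_wage * n_star

def fixed_employment_profit(real_wage, n_fixed, a=0.5):
    """Profit per unit of capital when employment is held at n_fixed: linear in the wage."""
    return n_fixed ** a - real_wage * n_fixed

# Hypothetical mean-preserving spread of the real wage around 1.0.
wages = [0.8, 1.2]
probs = [0.5, 0.5]
mean_wage = sum(p * w for p, w in zip(probs, wages))

expected_profit = sum(p * unit_profit(w) for p, w in zip(probs, wages))
profit_at_mean = unit_profit(mean_wage)

print(expected_profit, profit_at_mean)  # expected profit exceeds profit at the mean wage
```

Holding employment fixed at the level that is optimal for the mean wage gives the linear schedule described in the text; `unit_profit` lies on or above it at every wage, which is what makes it convex.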
2.5. Average Value of Capital
We now recall the expression for F ( · ) in (2.1), and consider the case where
R( · ) and G ( · ) are linearly homogeneous as functions of K , N, and I . A
function f(·) is linearly homogeneous if
$$f(\lambda x, \lambda y) = \lambda f(x, y),$$
as in the case of constant-returns production functions. Then, Euler’s theorem
states that²⁷
$$f(x, y) = \frac{\partial f(x, y)}{\partial x}\, x + \frac{\partial f(x, y)}{\partial y}\, y.$$
If G ( I, K ) did not depend on K , as in the case considered above, then it
could be linearly homogeneous only if adjustment costs were linear (hence
not strictly convex) in the investment flow I . But in the more general case,
omitting t and denoting partial derivatives by subscripts as in (2.18), we obtain
$$\begin{aligned}
F(t) &= R(t, K(t), N(t)) - P_k(t)\, G(I(t), K(t)) - w(t) N(t) \\
&= K R_K + N R_N - (I G_I + K G_K) P_k - w N \\
&= (R_K - G_K P_k) K - P_k G_I I,
\end{aligned} \tag{2.24}$$
where the first step applies Euler’s theorem to R(·) and G (·), and the second
recognizes that R N = w by the second condition in (2.6).
Noting that R_K − G_K P_k ≡ F_K, and that the other conditions in (2.6) and
the accumulation constraint imply
$$P_k G_I = \lambda, \qquad F_K = (r + \delta)\lambda - \dot{\lambda}, \qquad I = \dot{K} + \delta K,$$
equation (2.24) simplifies to
$$F(t) = r\lambda(t)K(t) - \dot{\lambda}(t)K(t) - \lambda(t)\dot{K}(t) \tag{2.25}$$
along an optimal trajectory. It is immediately verified that this is equivalent
to
$$e^{-rt} F(t) = \frac{d}{dt}\left[-e^{-rt}\lambda(t)K(t)\right], \tag{2.26}$$
²⁷ This implies that, if x and y are factors of production whose units are compensated according
to marginal productivity, then the total compensation of the two factors exhausts production. (There
are no pure profits.) This will be relevant when, in Ch. 4, we discuss income distribution in a dynamic
general equilibrium.
and it is easy to evaluate the integral in the definition (2.5) of the firm’s value:
$$V(0) = \int_0^\infty F(t)\, e^{-rt}\, dt = \left[-e^{-rt}\lambda(t)K(t)\right]_0^\infty = \lambda(0)K(0), \tag{2.27}$$
where the last step recognizes that $\lim_{t\to\infty} e^{-rt}\lambda(t)K(t) = 0$ if the limit
condition (2.7) holds.
Thus, λ(0) = V(0)/K(0), and since this holds true for any time zero and all
steps are valid for any P_k(t) (constant, or variable), we have in general
$$q(t) \equiv \frac{\lambda(t)}{P_k(t)} = \frac{V(t)}{P_k(t)K(t)}. \tag{2.28}$$
Hence marginal q , which in the models considered above determines optimal
investment, is the same as the ratio of the firm’s market value to the replace-
ment cost of its capital stock.
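The step from (2.25) to (2.26), which the text describes as immediately verified, can be spelled out with the product rule. This is only a worked check using symbols already defined above (the marginal value of capital λ, the capital stock K, and the interest rate r); nothing new is assumed.

```latex
\begin{aligned}
\frac{d}{dt}\left[-e^{-rt}\lambda(t)K(t)\right]
  &= r e^{-rt}\lambda(t)K(t) - e^{-rt}\dot{\lambda}(t)K(t) - e^{-rt}\lambda(t)\dot{K}(t) \\
  &= e^{-rt}\left[r\lambda(t)K(t) - \dot{\lambda}(t)K(t) - \lambda(t)\dot{K}(t)\right] \\
  &= e^{-rt}F(t),
\end{aligned}
```

where the last line uses (2.25). Integrating both sides from zero to infinity then gives (2.27), with the upper limit vanishing under the limit condition (2.7).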
This result offers a precise interpretation for another intuitive idea familiar
from introductory textbooks, namely the Tobin (1969) notion that investment
flows may be interpreted on the basis of financial considerations. In other
words, it is profitable to install capital and increase the production possibilities
of each firm and of the whole economy only if the cost of investment compares
favorably to the value of installed capital, as measured by the value of firms in
the financial market. As we have seen, the average q measure identified by
the Tobin approach is indeed the determinant of investment decisions when
firms face convex, linearly homogeneous adjustment costs, and produce under
constant returns.
Exercise 12 If the F (·) and G (·) functions are linearly homogeneous in K , I,
and N (so that average and marginal q coincide), what is the shape of the K̇ = 0
and q̇ = 0 loci in the phase diagram discussed in Section 2.2.1?
Such reasoning and results suggest an empirical approach to the study of
investment. On the basis of equation (2.11), investment should be completely
explained by q , which in turn is directly measurable from stock market
and balance-sheet data under the hypotheses listed above. Investment does
depend on (unobservable) expectations of future events. But, since the same
expectations also affect the value of the firm in a rational financial market,
one may test the proposed theoretical framework by considering empirical
relationships between investment flows and measured q . Of course, both
the value of the firm (and the average q it implies) and its investment are
endogenous variables. Hence the empirical strategy is akin to that, based
on Euler equations for aggregate consumption, encountered in Chapter 1.
One does not estimate a function relating investment (or consumption) to
exogenous variables, but rather verifies a property that endogenous variables
should display under certain theoretical assumptions.
As regards revenues, the assumption leading to the conclusion that invest-
ment and average q should be strictly related may be interpreted supposing
that the firm produces under constant returns to scale and behaves in perfectly
competitive fashion. As regards adjustment costs, the assumption is that they
pertain to proportional increases of the firm’s size, rather than to absolute
investment flows. A larger firm bears smaller costs to undertake a given
amount of investment, and the whole optimal investment program may be
scaled upwards or downwards if doubling the size of the firm yields the same
unit investment costs for twice-as-large investment flows, that is, if the
adjustment cost function has constant returns to scale and G(I, K) = g(I/K)K.
The realism of these (like any other) assumptions is debatable, of course. They
do imply that different initial sizes of the firm simply yield a proportionally
rescaled optimal investment program. As always under constant returns to
scale and perfectly competitive conditions, the firm does not have an optimal
size and, in fact, does not quite have a well-defined identity. In more general
models, the value of the firm is less intimately linked to its capital stock and
therefore may vary independently of optimal investment flows.
2.6. A Dynamic IS–LM Model
We are now ready to apply the economic insights and technical tools intro-
duced in the previous sections to study an explicitly macroeconomic, and
explicitly dynamic, modeling framework. Specifically, we discuss a simplified
version of the dynamic IS–LM model of Blanchard (1981), capturing the
interactions between forward-looking prices of financial assets and output
and highlighting the role of expectations in determining (through investment)
macroeconomic outcomes and the effects of monetary and fiscal policies. As in
the static version of the IS–LM model, the level of goods prices is exogenously
fixed and constant over time. However, the previous sections’ positive rela-
tionship between the forward-looking q variable and investment is explicitly
accounted for by the aggregate demand side of the model.
A linear equation describes the determinants of aggregate goods spending
y^D(t):
$$y^D(t) = \alpha\, q(t) + c\, y(t) + g(t), \qquad \alpha > 0, \quad 0 < c < 1. \tag{2.29}$$
Spending is determined by aggregate income y (through consumption), by
the flow g of public spending (net of taxes) set exogenously by the fiscal
authorities, and by q as the main determinant of private investment spending.
We shall view q as the market valuation of the capital stock of the economy
incorporated in the level of stock prices: for simplicity, we disregard the dis-
tinction between average and marginal q , as well as any role of stock prices in
determining aggregate consumption.
Output y evolves over time according to the following dynamic equation:
$$\dot{y}(t) = \beta\left(y^D(t) - y(t)\right), \qquad \beta > 0. \tag{2.30}$$
Output responds to the excess demand for goods: when spending is larger
than current output, firms meet demand by running down inventories and
by increasing production gradually over time. In our setting, output is a
“predetermined” variable (like the capital stock in the investment model of
the preceding sections) and cannot be instantly adjusted to fill the gap between
spending and current production.
A conventional linear LM curve describes the equilibrium on the money
market:
$$\frac{m(t)}{p} = h_0 + h_1\, y(t) - h_2\, r(t), \tag{2.31}$$
where the left-hand side is the real money supply (the ratio of nominal money
supply m to the constant price level p), and the right-hand side is money
demand. The latter depends positively on the level of output and negatively on
the interest rate r on short-term bonds.²⁸ Conveniently, we assume that such
bonds have an infinitesimal duration; then, the instantaneous rate of return
from holding them coincides with the interest rate r with no possibility of
capital gains or losses.
Shares and short-term bonds are assumed to be perfect substitutes in
investors’ portfolios (a reasonable assumption in a context of certainty); con-
sequently, the rates of return on shares and bonds must be equal for any
arbitrage possibility to be ruled out. The following equation must then hold
in equilibrium:
$$\frac{\pi(t)}{q(t)} + \frac{\dot{q}(t)}{q(t)} = r(t), \tag{2.32}$$
where the left-hand side is the (instantaneous) rate of return on shares, made
up of the firms’ profits (entirely paid out as dividends to shareholders) and
the capital gain (or loss) q̇ . At any time this composite rate of return on shares
must equal the interest rate on bonds r.²⁹ Finally, profits are positively related
to the level of output:
$$\pi(t) = a_0 + a_1\, y(t). \tag{2.33}$$
²⁸ The assumption of a constant price level over time implies a zero expected inflation rate; there
is then no need to make explicit the difference between the nominal and real rates of return.
²⁹ If long-term bonds were introduced as an additional financial asset, a further “no arbitrage”
equation similar to (2.32) should hold between long and short-term bonds.
Figure 2.9. A dynamic IS–LM model
The two dynamic variables of interest are output y and the stock market
valuation q . In order to study the steady-state and the dynamics of the system
outside the steady-state, following the procedure adopted in the preceding
sections, we first derive the two stationary loci for y and q and plot them in
a (q , y)-phase diagram. Setting ẏ = 0 in (2.30) and using the specification
of aggregate spending in (2.29), we get the following relationship between
y and q :
$$y = \frac{\alpha}{1 - c}\, q + \frac{1}{1 - c}\, g, \tag{2.34}$$
represented as an upward-sloping line in Figure 2.9. A higher value of q stim-
ulates aggregate spending through private investment and increases output
in the steady state. This line is the equivalent of the IS schedule in a more
traditional IS–LM model linking the interest rate to output. For each level of
output, there exists a unique value of q for which output equals spending:
higher values of q determine larger investment flows and a corresponding
excess demand for goods, and, according to the dynamic equation for y,
output gradually increases. As shown in the diagram by the arrows pointing
to the right, ẏ > 0 at all points above the ẏ = 0 locus. Symmetrically, ẏ < 0 at
all points below the stationary locus for output.
The stationary locus for q is derived by setting q̇ = 0 in (2.32), which yields
$$q = \frac{\pi}{r} = \frac{a_0 + a_1 y}{h_0/h_2 + (h_1/h_2)\, y - (1/h_2)\, m/p}, \tag{2.35}$$
where the last equality is obtained using (2.33) and (2.31). The steady-state
value of q is given by the ratio of dividends to the interest rate, and both
are affected by output. As y increases, profits and dividends increase, raising
q ; also, the interest rate (at which profits are discounted) increases, with a
depressing effect on stock prices. The slope of the q̇ = 0 locus then depends
on the relative strength of those two effects; in what follows we assume that
the “interest rate effect” dominates, and consequently draw a downward-
sloping stationary locus for q.³⁰ The dynamics of q out of its stationary locus
are governed by the dynamic equation (2.32). For each level of output (that
uniquely determines dividends and the interest rate), only the value of q on
the stationary locus is such that q̇ = 0. Higher values of q reduce the dividend
component of the rate of return on shares, and a capital gain, implying q̇ > 0,
is needed to fulfill the “no arbitrage” condition between shares and bonds: q
will then move upwards starting from all points above the q̇ = 0 line, as shown
in Figure 2.9. Symmetrically, at all points below the q̇ = 0 locus, capital losses
are needed to equate returns and, therefore q̇ < 0.
The unique steady state of the system is found at the point where the two
stationary loci cross and output and stock prices are at y_ss and q_ss
respectively. As in the dynamic model analyzed in previous sections, in the present
framework too there is a unique trajectory converging to the steady-state, the
saddlepath of the dynamic system. To rationalize its negative slope in the
(q, y) space, let us consider at time t_0 a level of output y(t_0) < y_ss. The associated
level q(t_0) on the saddlepath is higher than the value of q on the stationary
locus ẏ = 0. Therefore, there is excess demand for goods owing to a high level
of investment, and output gradually increases towards its steady-state value.
As y increases, the demand for money increases also and, with a given money
supply m, the interest rate rises. The behavior of q is best understood if the
dynamic equation (2.32) is solved forward, yielding the value of q(t_0) as the
present discounted value of future dividends:³¹
$$q(t_0) = \int_{t_0}^{\infty} \pi(t)\, e^{-\int_{t_0}^{t} r(s)\, ds}\, dt. \tag{2.36}$$
Over time q changes, for two reasons: on the one hand, q is positively
affected by the increase in dividends (resulting from higher output); on the
other, future dividends are discounted at higher interest rates, with a negative
effect on q. Under our maintained assumption that the “interest rate effect”
dominates, q declines over time towards its steady-state value q_ss.
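The saddlepath property can be checked numerically for a particular parameterization. The sketch below solves (2.34) and (2.35) for the steady state and then inspects the eigenvalues of the system (2.30), (2.32) linearized around it. All parameter values are hypothetical, chosen only so that the "interest rate effect" dominates; they are not taken from the text.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
alpha, c, beta, g = 0.5, 0.6, 1.0, 0.2         # spending and output adjustment, (2.29)-(2.30)
h0, h1, h2, real_money = 0.0, 0.1, 1.0, 0.05   # LM curve (2.31); real_money stands for m/p
a0, a1 = 0.1, 0.02                             # profit equation (2.33)

def interest_rate(y):
    """Interest rate implied by the LM curve (2.31) at output y."""
    return (h0 + h1 * y - real_money) / h2

# Steady state: substituting (2.34) into (2.35) gives A*y**2 + B*y + C = 0.
A = (1 - c) * h1
B = (1 - c) * (h0 - real_money) - g * h1 - alpha * h2 * a1
C = -g * (h0 - real_money) - alpha * h2 * a0
y_ss = (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)  # root with positive r and q
q_ss = ((1 - c) * y_ss - g) / alpha
r_ss = interest_rate(y_ss)

# Jacobian of (y_dot, q_dot) at the steady state, using q_dot = r*q - pi from (2.32).
j11, j12 = beta * (c - 1), beta * alpha
j21, j22 = (h1 / h2) * q_ss - a1, r_ss
trace = j11 + j22
det = j11 * j22 - j12 * j21
root = math.sqrt(trace * trace - 4 * det)
eig_low, eig_high = (trace - root) / 2, (trace + root) / 2

print(y_ss, q_ss, r_ss)
print(eig_low, eig_high)  # one negative and one positive root indicate a saddle
```

With the interest rate effect dominating, the lower-left Jacobian entry is positive, so the determinant is negative and the two eigenvalues have opposite signs: only one trajectory converges to the steady state.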
Let us now use our dynamic IS–LM model to study the effects of a change
in macroeconomic policy. Suppose that at time t = 0 a future fiscal restriction
is announced, to be implemented at time t = T : public spending, which is
initially constant at g (0), will be decreased to g (T ) < g (0) at t = T and will
then remain permanently at this lower level. The effects of this anticipated
fiscal restriction on the steady-state levels of output and the interest rate
are immediately clear from a conventional IS–LM (static) model: in the new
³⁰ Formally, $dq/dy|_{\dot{q}=0} < 0 \Leftrightarrow a_1 < q\,(h_1/h_2)$. Moreover, as indicated in Fig. 2.9, the q̇ = 0 line
has the following asymptote: $\lim_{y\to\infty} q|_{\dot{q}=0} = a_1 h_2/h_1$.
³¹ In solving the equation, the terminal condition $\lim_{t\to\infty} \pi(t)\, e^{-\int_{t_0}^{t} r(s)\, ds} = 0$ is imposed.
Figure 2.10. Dynamic effects of an anticipated fiscal restriction
steady state both y and r will be lower. Both changes affect the new steady-state
level of q : lower output and dividends depress stock prices, whereas a lower
interest rate raises q . Again, the latter effect is assumed to dominate, leading
to an increase in the steady-state value of q . This is shown in Figure 2.10 by
an upward shift of the stationary locus ẏ = 0, which occurs at t = T along an
unchanged q̇ = 0 schedule, leading to a higher q and a lower y in steady-state.
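The steady-state comparison can be reproduced with a short numerical sketch. It assumes the linear specifications (2.29), (2.31) and (2.33) with hypothetical parameter values under which the interest rate effect dominates, and compares the steady state before and after a cut in public spending. The function name `steady_state` and all numbers are illustrative assumptions.

```python
import math

# Hypothetical parameter values (illustration only).
ALPHA, C_PROP = 0.5, 0.6                       # spending equation (2.29)
H0, H1, H2, REAL_MONEY = 0.0, 0.1, 1.0, 0.05   # LM curve (2.31); REAL_MONEY stands for m/p
A0, A1 = 0.1, 0.02                             # profit equation (2.33)

def steady_state(g):
    """Return (y, q, r) where the stationary loci (2.34) and (2.35) cross."""
    a = (1 - C_PROP) * H1
    b = (1 - C_PROP) * (H0 - REAL_MONEY) - g * H1 - ALPHA * H2 * A1
    c = -g * (H0 - REAL_MONEY) - ALPHA * H2 * A0
    y = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # root with positive r and q
    q = ((1 - C_PROP) * y - g) / ALPHA
    r = (H0 + H1 * y - REAL_MONEY) / H2
    return y, q, r

y_old, q_old, r_old = steady_state(g=0.2)  # initial level of public spending
y_new, q_new, r_new = steady_state(g=0.1)  # after the fiscal restriction

print(y_old, q_old, r_old)
print(y_new, q_new, r_new)  # lower output and interest rate, higher q
```

Under these assumed values the new steady state has lower output and a lower interest rate but a higher q, matching the upward shift of the ẏ = 0 locus along an unchanged q̇ = 0 schedule.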
In order to characterize the dynamics of the system, we note that, from
time T onwards, no further change in the exogenous variables occurs: to
converge to the steady state, the economy must then be on the saddlepath
portrayed in the diagram. Accordingly, from T onwards, output decreases
(since the lower public spending causes aggregate demand to fall below cur-
rent production) and q increases (owing to the decreasing interest rate). What
happens between the time of the fiscal policy announcement and that of its
delayed implementation? At t = 0, when the future policy becomes known,
agents in the stock market anticipate lower future interest rates. (They also
foresee lower dividends, but this effect is relatively weak.) Consequently, they
immediately shift their portfolios towards shares, bidding up share prices.
Then at the announcement date, with output and the interest rate still at their
initial steady-state levels, q increases. The ensuing dynamics from t = 0 up
to the date T of implementation follow the equations of motion in (2.30)
and (2.32) on the basis of the parameters valid in the initial steady state. A
higher value of q stimulates investment, causing an excess demand for goods;
starting from t = 0, then, output gradually increases, and so does the interest
rate. The dynamic adjustment of output and q is such that, when the fiscal
policy is implemented at T (and the stationary locus ẏ = 0 shifts upwards),
the economy is exactly on the saddlepath leading to the new steady-state:
aggregate demand falls and output starts decreasing along with the interest
rate, whereas q and investment continue to rise. Therefore, an apparently
“perverse” effect of fiscal policy (an expansion of investment and output fol-
lowing the announcement of a future fiscal restriction) can be explained by the
forward-looking nature of stock prices, anticipating future lower interest rates.
Exercise 13 Consider the dynamic IS–LM model proposed in this section, but
suppose that (contrary to what we assumed in the text) the “interest rate effect”
is dominated by the “dividend effect” in determining the slope of the stationary
locus for q .
(a) Give a precise characterization of the q̇ = 0 schedule and of the dynamic
properties of the system under the new assumption.
(b) Analyze the effects of an anticipated permanent fiscal restriction
(announced at t = 0 and implemented at t = T ), and contrast the results
with those reported in the text.
2.7. Linear Adjustment Costs
We now return to a typical firm’s partial equilibrium optimal investment
problem, questioning the realism of some of the assumptions made above and
assessing the robustness of the qualitative results obtained from the simple
model introduced in Section 2.1. There, we assumed that a given increase
of the capital stock would be more costly when enacted over a shorter time
period, but this is not necessarily realistic. It is therefore interesting to study
the implications of relaxing one of the conditions in (2.4) to
$$\frac{\partial^2 G(\cdot)}{\partial I^2} = 0, \tag{2.37}$$
so that in Figure 2.1 the G(I, ·) function would coincide with the 45° line. Its
slope, ∂ G (·)/∂ I , is constant at unity, independently of the capital stock.
Since the cost of investment does not depend on its intensity or the speed
of capital accumulation, the firm may choose to invest “infinitely quickly”
and the capital stock is not given (predetermined) at each point in time.
This appears to call into question all the formal apparatus discussed above.
However, if we suppose that all paths of exogenous variables are continuous in
time and simply proceed to insert ∂G/∂I = 1 (hence λ = P_k, λ̇ = Ṗ_k = π_k P_k)
in conditions (2.6), we can obtain a simple characterization of the firm’s
optimal policy. As in the essentially static cost-of-capital approach outlined
above, condition (2.12) is replaced by
$$\frac{\partial F(\cdot)}{\partial K} = (r + \delta - \pi_k)\, P_k(t). \tag{2.38}$$
Hence the firm does not need to look forward when choosing investment.
Rather, it should simply invest at such a (finite, or infinite) rate as needed
to equate the current marginal revenues of capital to its user cost. The latter
concept is readily understood noting that, in order to use temporarily an
additional unit of capital, one may borrow its purchase cost, P_k, at rate r and
re-sell the undepreciated (at rate δ) portion at the new price implied by π_k. If
F_K(·) is a decreasing function of installed capital (because the firm produces
under decreasing returns and/or faces a downward-sloping demand function),
then equation (2.38) identifies the desired stock of capital as a function of
exogenous variables. Investment flows can then be explained in terms of the
dynamics of such exogenous variables between the beginning and the end of
each period. In continuous time, the investment rate per unit time is well
defined if exogenous variables do not change discontinuously.
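A minimal sketch of how (2.38) pins down the desired capital stock is given below. It assumes, purely for illustration, a cash-flow function of the form scale * K**a with 0 < a < 1, so that the marginal revenue of capital is decreasing in K. The function names and parameter values are hypothetical.

```python
def user_cost(r, delta, pi_k, p_k):
    """Right-hand side of (2.38): (r + delta - pi_k) * P_k."""
    return (r + delta - pi_k) * p_k

def marginal_cash_flow(k, scale=1.0, a=0.5):
    """Derivative with respect to K of the assumed cash-flow function scale * K**a."""
    return a * scale * k ** (a - 1.0)

def desired_capital(r, delta, pi_k, p_k, scale=1.0, a=0.5):
    """Capital stock at which the marginal cash flow equals the user cost."""
    return (a * scale / user_cost(r, delta, pi_k, p_k)) ** (1.0 / (1.0 - a))

# Hypothetical values: interest rate, depreciation, capital-price inflation, capital price.
k_star = desired_capital(r=0.05, delta=0.10, pi_k=0.02, p_k=1.0)
k_star_high_r = desired_capital(r=0.08, delta=0.10, pi_k=0.02, p_k=1.0)

print(k_star, marginal_cash_flow(k_star), user_cost(0.05, 0.10, 0.02, 1.0))
print(k_star_high_r)  # a higher interest rate lowers the desired capital stock
```

Because nothing in the calculation refers to future or past values of the exogenous variables, the example also makes visible the static character of the cost-of-capital approach.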
Recall that we had to rule out all changes of exogenous variables (other than
completely unexpected or perfectly foreseen one-time changes) when drawing
phase diagrams. In the present setting, conversely, it is easy to study the
implications of ongoing exogenous dynamics. This enhances the realism and
applicability of the model, but the essentially static character of the perspective
encounters its limits when applied to real-life data. In reality, not only the
growth rates of the exogenous variables in (2.38), but also their past and future
dynamics appear relevant to current investment flows.
An interesting compromise between strict convexity and linearity is offered
by piecewise linear adjustment costs. In Figure 2.11, the G ( I, ·) function has
unit slope when gross investment is positive, implying that Pk is the cost of
Figure 2.11. Piecewise linear unit investment costs
each unit of capital purchased and installed by the firm, regardless of how
many units are purchased together. The adjustment cost function remains
linear for I < 0, but its slope is smaller. This implies that when selling pre-
viously installed units of capital the firm receives a price that is independent
of I (t ), but lower than the purchase price. This adjustment cost structure
is realistic if investment represents purchases of equipment with given off-
the-shelf price, such as personal computers, and constant unit installation
cost, such as the cost of software installation. If installation costs cannot be
recovered when the firm sells its equipment, each firm’s capital stock has a
degree of specificity, while capital would need to be perfectly transferable
into and out of each firm for (2.16) to apply at all times. Linear adjustment
costs do not make speedy investment or scrapping unattractive, as strictly
convex adjustment costs would. The kink at the origin, however, still makes
it unattractive to mix periods of positive and negative gross investment. If a
positive investment were immediately followed by a negative one, the firm
would pay installation costs without using the marginal units of capital for
any length of time. In general, a firm whose adjustment costs have the form
illustrated in Figure 2.11 should avoid investment when very temporary events
call for capital stock adjustment. Installation costs put a premium on inac-
tivity: the firm should cease to invest, even as current conditions improve,
if it expects (or, in the absence of uncertainty, knows) that bad news will
arrive soon.
To study the problem formally in the simplest possible setting, it is con-
venient to suppose that the price commanded by scrapped units of capital
is so low as to imply that investment decisions are effectively irreversible.
This is the case when the slope of G ( I, ·) for I < 0 is so small as to fall
short of what can be earned, on a present discounted basis, from the use
of capital in production. Since adjustment costs do not induce the firm to
invest slowly, the investment rate may optimally jump between positive and
negative values. In fact, nothing prevents optimal investment from becoming
infinitely positive or negative, or the optimal capital stock path from jumping.
If exogenous variables follow continuous paths, however, there is no reason
for any such jump to occur along an optimal path. Hence the Hamiltonian
solution method remains applicable. Among the conditions in (2.6), only the
first needs to be modified: if capital has price Pk when purchased and is never
sold, the first-order condition for investment reads
$$P_k \begin{cases} = \lambda(t), & \text{if } I > 0, \\ \geq \lambda(t), & \text{if } I = 0. \end{cases} \tag{2.39}$$
The optimality condition in (2.39) requires λ(t), the marginal value of capital
at time t, to be equal to the unit cost of investment only if the firm is indeed
investing. Hence in periods when I(t) > 0 we have λ(t) = P_k, λ̇(t) = π_k P_k(t),
and the third condition in (2.6) implies that (2.38) is valid at all t such that
I(t) > 0. If the firm is investing, capital installed must line up with ∂F(·)/∂K
and with the user cost of capital at each instant.
It is not necessarily optimal, however, always to perform positive invest-
ment. It is optimal for the firm not to invest whenever the marginal value of
capital is (weakly) lower than what it would cost to increase its stock by a unit.
In fact, when the firm expects unfavorable developments in the near future
of the variables determining the “desired” capital stock that satisfies condition
(2.38), then if it continued to invest it would find itself with an excessive
capital stock.
To characterize periods when the firm optimally chooses zero investment,
recall that the third condition in (2.6) and the limit condition (2.7) imply, as
in (2.19), that
$$q(t) \equiv \frac{\lambda(t)}{P_k(t)} = \frac{1}{P_k(t)} \int_t^\infty F_K(\tau)\, e^{-(r+\delta)(\tau - t)}\, d\tau. \tag{2.40}$$
In the upper panel of Figure 2.12, the curve represents a possible dynamic
path of desired capital, determined by cyclical fluctuations of F (·) for given
K . Since that curve falls faster than capital depreciation for a period, the
firm ceases to invest at time t0 and starts again at time t1. We know from the
Figure 2.12. Installed capital and optimal irreversible investment
optimality condition (2.39) that the present value (2.40) of marginal revenue
products of capital must be equal to the purchase price Pk (t ) at all t when
gross investment is positive, such as t0 and t1. Thus, if we write
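$$\begin{aligned}
P_k(t_0) &= \int_{t_0}^{\infty} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau \\
&= \int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau + \int_{t_1}^{\infty} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau,
\end{aligned} \tag{2.41}$$
noting that
$$\int_{t_1}^{\infty} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau = e^{-(r+\delta)(t_1 - t_0)} \int_{t_1}^{\infty} F_K(\tau)\, e^{-(r+\delta)(\tau - t_1)}\, d\tau,$$
and recognizing λ(t_1) = P_k(t_1) in the last integral, we obtain
$$P_k(t_0) = \int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau + e^{-(r+\delta)(t_1 - t_0)}\, P_k(t_1)$$
from (2.41). If the inflation rate in terms of capital is constant at π_k, then
P_k(t_1) = P_k(t_0) e^{π_k (t_1 − t_0)} and
$$\begin{aligned}
P_k(t_0) &= \int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau + P_k(t_0)\, e^{-(r+\delta-\pi_k)(t_1 - t_0)} \\
\Rightarrow\; P_k(t_0)\left(1 - e^{-(r+\delta-\pi_k)(t_1 - t_0)}\right) &= \int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau.
\end{aligned}$$
Noting that
$$\int_{t_0}^{t_1} (r + \delta - \pi_k)\, e^{-(r+\delta-\pi_k)(\tau - t_0)}\, d\tau = 1 - e^{-(r+\delta-\pi_k)(t_1 - t_0)},$$
we obtain
$$\int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau - P_k(t_0) \int_{t_0}^{t_1} (r + \delta - \pi_k)\, e^{-(r+\delta-\pi_k)(\tau - t_0)}\, d\tau = 0.$$
Again, using P_k(t_0) e^{π_k (τ − t_0)} = P_k(τ) yields
$$\int_{t_0}^{t_1} F_K(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau - \int_{t_0}^{t_1} (r + \delta - \pi_k)\, P_k(\tau)\, e^{-(r+\delta)(\tau - t_0)}\, d\tau = 0,$$
and (2.41) may be rewritten as
$$\int_{t_0}^{t_1} \left[F_K(\tau) - (r + \delta - \pi_k)\, P_k(\tau)\right] e^{-(r+\delta)(\tau - t_0)}\, d\tau = 0. \tag{2.42}$$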
Thus, the marginal revenue product of capital should be equal to its user
cost in present discounted terms (at rate r + δ) not only when the firm invests
continuously, but also over periods throughout which it is optimal not to
invest. In Figure 2.12, area A should have the same size as the discounted
value of B. Adjustment costs, as usual, affect the dynamic aspects of the firm’s
behavior. As the cyclical peak nears, the firm stops investing because it knows
that in the near future it would otherwise be impossible to preserve equality
between marginal revenues and costs of capital.
Similar reasoning is applicable, with some slightly more complicated nota-
tion, to the case where the firm may sell installed capital at a positive price
pk (t ) < Pk (t ) and find it optimal to do so at times. In this case, we should
draw in Figure 2.12 another dynamic path, below that representing the desired
capital stock when investment is positive, to represent the capital stock that
satisfies condition (2.38) when the user cost of capital is computed on the basis
of its resale price. The firm should follow this path whenever its desired invest-
ment is negative and optimal inaction would lead it from the former to the
latter line.
Even though the speed of investment is not constrained, the existence of
transaction costs implies that the firm’s behavior should be forward-looking.
Investment should cease before a slump reveals that it would be desirable to
reduce the capital stock. This is yet another instance of the general importance
of expectations in dynamic optimization problems. Symmetrically, the capital
stock at any given time is not independent of past events. In the latter portion
of the inaction period illustrated in the figure, the capital stock is larger than
what would be optimal if it could be chosen in light of current conditions. This
illustrates another general feature of dynamic optimization problems, namely
the character of interaction between endogenous capital and exogenous forc-
ing variables: the former depends on the whole dynamic path of the latter,
rather than on their level at any given point in time.
2.8. Irreversible Investment Under Uncertainty
Throughout the previous sections, the firm was supposed to know with
certainty the future dynamics of exogenous variables relevant to its optimiza-
tion problem. (And, in order to make use of phase diagrams, we assumed
that those variables were constant through time, or only changed discretely in
perfectly foreseeable fashion.) This section briefly outlines formal modeling
techniques allowing uncertainty to be introduced in explicit, if stylized, ways
into the investment problem of a firm facing linear adjustment costs.
We try, as far as possible, to follow the same logical thread as in the
derivations encountered above. We continue to suppose that the firm oper-
ates in continuous time. The assumption that time is indefinitely divisible
is of course far from completely realistic; also less than fully realistic are the
assumptions that the capital stock is made up of infinitesimally small particles,
and that it may be an argument of a differentiable production function. As
was the case under certainty, however, such assumptions make it possible to
obtain precise and elegant quantitative results by means of analytical calculus
techniques.
2.8.1. STOCHASTIC CALCULUS
First of all, we need to introduce uncertainty into the formal continuous-time
optimization framework introduced above. So far, all exogenous features of
the firm’s problem were determined by the time index, t : knowing the position
in time of the dynamic system was enough to know the product price, the cost
of factors, and any other variable whose dynamics are taken as given by the
firm. To prevent such dynamics from being perfectly foreseeable, one must let
them depend not only on time, but also on something else: an index, denoted
ω, of the unknown state of nature. A function {z(t; ω)} of a time index t and
of the state of nature ω is a stochastic process, that is, a collection of random
variables. The state of nature, by definition, is not observable. If the true ω
were known, in fact, the path of the process would again depend on t only,
and there would be no uncertainty. But if ω belongs to a set on which a
probability distribution is defined, one may formally assign likelihood levels
to different possible ω and different possible time paths of the process. This
makes it possible to formulate precise answers to questions, clearly of interest
to the firm, concerning the probability that processes such as revenues or costs
reach a given level within a given time interval.
In order to illustrate practical uses of such concepts, it will not be neces-
sary to deal further with the theory of stochastic processes. We shall instead
introduce a type of stochastic process of special relevance in applications:
Brownian motion. A standard Brownian motion, or Wiener process, is a basic
building block for a class of stochastic process that admits a stochastic coun-
terpart to the functional relationships studied above, such as integrals and
differentials. This process, denoted {W(t )} in what follows, can be defined by
its probabilistic properties. {W(t )} is a Wiener process if
1. W(0; ω) = 0 for “almost all” ω, in the sense that the probability is one
that the process takes value zero at t = 0;
2. fixing ω, {W(t; ω)} is continuous in t with probability one;
3. fixing t ≥ 0, probability statements about W(t; ω) can be made viewing
W(t) as a normally distributed random variable, with mean zero and
variance t as of time zero: realizations of W(t) are quite concentrated for
small values of t, while more and more probability is attached to values
far from zero for larger and larger values of t;
4. W(t′) − W(t), for every t′ > t, is also a normally distributed random
variable with mean zero and variance (t′ − t); and W(T′) − W(T) is
uncorrelated with, and independent of, W(t′) − W(t) for all T′ >
T > t′ > t.
Assumption 1 is a simple normalization, and assumption 2 rules out jumps
of W(t ) to imply that large changes of W(t ) become impossible as smaller
and smaller time intervals are considered. Indeed, property 3 states that the
variance of changes is proportional to time lapsed, hence very small over
short periods of time. The process, however, has normally distributed incre-
ments over any finite interval of time. Since the normal distribution assigns
positive probability to any finite interval of the real line, arbitrarily large
variations have positive probability on arbitrarily short (but finite) intervals
of time.
Normality of the process’s increments is useful in applications, because
linear transformations of W(t ) can also be normal random variables with
arbitrary mean and variance. And the independence over time of such incre-
ments stated as property 4 (which implies their normality, by an application of
the Central Limit Theorem) makes it possible to make probabilistic statements
on all future values of W(t ) on the basis of its current level only. It is particu-
larly important to note that, if {W(t ); 0 ≤ t ≤ t1} is known with certainty, or
equivalently if observation of the process’s trajectory has made it possible to
rule out all states of the world ω that would not be consistent with the observed
realization of the process up to time t1, then the probability distribution of the
process’s behavior in subsequent periods is completely characterized. Since
increments are independent over non-overlapping periods, W(t ) − W(t1) is
a normal random variable with mean zero and variance t − t1. Hence the
process enjoys the Markov property in levels, in that its realization at any time
τ contains all information relevant to formulating probabilistic statements as
to its realizations at all t > τ.
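The defining properties listed above can be checked numerically. The following is a minimal sketch, assuming a discretization into equal time steps; the function names, the seed, and the sample sizes are illustrative choices, not part of the text. It builds paths from independent normal increments and compares sample moments with the theoretical ones (mean zero, variance equal to elapsed time, uncorrelated increments over non-overlapping intervals).

```python
import math
import random


def simulate_wiener(horizon, n_steps, rng):
    """Return one discretized path W(0), W(dt), ..., W(horizon)."""
    dt = horizon / n_steps
    path = [0.0]  # property 1: the process starts at zero
    for _ in range(n_steps):
        # property 4: independent normal increments with variance dt
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def sample_moments(values):
    """Sample mean and (unbiased) sample variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


rng = random.Random(0)          # fixed seed so the run is reproducible
horizon, n_steps, n_paths = 2.0, 50, 4000
paths = [simulate_wiener(horizon, n_steps, rng) for _ in range(n_paths)]

# property 3: W(horizon) should have mean 0 and variance equal to horizon
mean_T, var_T = sample_moments([p[-1] for p in paths])

# property 4: increments over [0, horizon/2] and [horizon/2, horizon]
half = n_steps // 2
first = [p[half] - p[0] for p in paths]
second = [p[-1] - p[half] for p in paths]
m1, v1 = sample_moments(first)
m2, v2 = sample_moments(second)
cov = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (n_paths - 1)
corr = cov / math.sqrt(v1 * v2)

print(f"mean of W(T): {mean_T:.3f}, variance of W(T): {var_T:.3f}")
print(f"correlation of non-overlapping increments: {corr:.3f}")
```

With a few thousand paths the sample variance should be close to the horizon and the correlation close to zero, up to sampling error.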
Independence of the process’s increments has an important and somewhat
awkward implication: for a fixed ω, the path {W(t)} is continuous but (with
probability one) not differentiable at any point t . Intuitively, a process with
differentiable sample paths would have locally predictable increments, because
extrapolation of its behavior over the last d t would eliminate all uncertainty
about the behavior of the process in the immediate future. This, of course,
would deny independence of the process’s increments (property 4 above). For
increments to be independent over any t interval, including arbitrarily short
ones, the direction of movement must be random at arbitrarily close t points.
A typical sample path then turns so frequently that it fails to be differentiable
at any t point, and has infinite variation: the sum of the absolute values of its
increments over ever finer subdivisions of an arbitrarily short time interval
diverges.
Non-existence of the derivative makes it impossible to apply familiar cal-
culus tools to functions when one of their arguments is a Brownian process
{W}. Such functions (which, like their argument, depend on t and ω and are
themselves stochastic processes) may however be manipulated by stochastic
calculus tools, developed half a century ago by Japanese mathematician K.
Itô along the lines of classical calculus. Given a process {A(t)} with finite
variation, a process {y(t )} which satisfies certain regularity conditions, and
a Wiener process {W(t )}, the integral
$$ z(T;\omega) = z(t;\omega) + \int_t^T y(\tau;\omega)\, dW(\tau;\omega) + \int_t^T dA(\tau;\omega) \tag{2.43} $$
defines an Itô process {z(t)}. The expression $\int y\, dW$ denotes a stochastic or
Itô integral. Its exact definition need not concern us here: we may simply note
that it is akin to a weighted sum of the Wiener process’s increments dW(t),
where the weight function {y(t)} is itself a stochastic process in general.
The properties of Itô integrals are similar to those of more familiar integrals
(or summations). Stochastic integrals of linear combinations can be written
as linear combinations of stochastic integrals, and the integration by parts
formula
$$ z(t)\,x(t) = z(0)\,x(0) + \int_0^t z(\tau)\, dx(\tau) + \int_0^t x(\tau)\, dz(\tau) \tag{2.44} $$
holds when z and x are processes in the class defined by (2.43) and one of
them has finite variation. The stochastic integral has one additional important
property. By the unpredictable character of the Wiener process’s increments,
$$ E_t\left( \int_t^T y(\tau)\, dW(\tau) \right) = 0, $$
for any {y(t)} such that the expression is well defined, where E_t[·] denotes the
conditional expectation at time t (that is, an integral weighting possible realiza-
tions with the probability distribution reflecting all available information on
the state of nature as of that time).
Recall that, if function x (t ) has first derivative x ′(t ) = d x (t )/d t = ẋ , and
function f ( · ) has first derivative f ′(x ) = d f (x )/d x, then the following rela-
tionships are true:
d x = ẋ d t, d f (x ) = f ′(x ) d x, d f (x ) = f ′(x ) ẋ d t. (2.45)
The integral (2.43) has differential form
d z(t ) = y(t ) d W(t ) + d A(t ), (2.46)
and it is natural to formulate a stochastic version of the “chain rule”
relationships in (2.45), used in integration “by substitution.” The rule is as
follows: if a function f(·) is endowed with first and second derivatives, and {z(t)}
is an Itô process with differential as in (2.46), then
$$ df(z(t)) = f'(z(t))\, y(t)\, dW(t) + f'(z(t))\, dA(t) + \tfrac{1}{2}\, f''(z(t))\, (y(t))^2\, dt. \tag{2.47} $$
Comparing (2.46–2.47) with (2.45), note that, when applied to an Itô
process, variable substitution must take into account not only the first, but also
the second, derivative of the transformation. Heuristically, the order of mag-
nitude of d W(t ) increments is higher than that of d t if uncertainty is present
in every d t interval, no matter how small. Independent increments also imply
that the sign of d W(t ) is just as likely to be positive as to be negative, and
by Jensen’s inequality the curvature of f (z) influences locally non-random
behavior even in the infinitesimal limit. Taking conditional expectations in
(2.47), where E_t[dW(t)] = 0 by unpredictability of the Wiener process, we
have
$$ E_t[df(z(t))] = f'(z(t))\, dA(t) + \tfrac{1}{2}\, f''(z(t))\, (y(t))^2\, dt. $$
Hence E_t[df(z(t))] ≷ f′(z(t)) E_t[dz(t)] depending on whether f″(z(t))
≷ 0.
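The role of the second-derivative term in (2.47) can be illustrated with a small simulation. The sketch below assumes the simplest case z(t) = W(t) (so y = 1 and dA = 0) and the convex function f(z) = z²; these choices, the step count, and the seed are illustrative assumptions. It accumulates df along one path twice, once with the Itô correction term and once with the ordinary chain rule, and compares both with the directly computed f(W(T)).

```python
import math
import random


def compare_chain_rules(horizon=1.0, n_steps=10_000, seed=1):
    """Accumulate df for f(z) = z**2 along one simulated Wiener path.

    Returns (f(W(T)), Ito-rule sum, ordinary-chain-rule sum).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    w = 0.0
    ito_sum = 0.0
    naive_sum = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        first_order = 2.0 * w * dw          # f'(z) y dW with f'(z) = 2z, y = 1
        naive_sum += first_order            # ordinary chain rule only
        ito_sum += first_order + 0.5 * 2.0 * dt  # plus (1/2) f''(z) y^2 dt
        w += dw
    return w * w, ito_sum, naive_sum


exact, ito, naive = compare_chain_rules()
print(f"f(W(T)) computed directly : {exact:.4f}")
print(f"sum using Ito's rule      : {ito:.4f}")
print(f"sum using ordinary rule   : {naive:.4f}")
print(f"gap left by ordinary rule : {exact - naive:.4f}")
```

The ordinary rule misses f(W(T)) by approximately the elapsed time T, which is the integral of ½ f″ y² dt in this example; the Itô sum matches up to discretization error.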
2.8.2. OPTIMIZATION UNDER UNCERTAINTY AND IRREVERSIBILITY
We are now ready to employ these formal tools in the study of a firm that,
in partial equilibrium, maximizes the present discounted value at rate r of
its cash flows. In the presence of uncertainty, exogenous variables relevant to
profits are represented by the realization of a stochastic process, Z (t ), rather
than by the time index t . As seen above, the optimal profit flow may be
a convex function of exogenous variables (but it may also, under different
assumptions, be concave). In such cases Jensen’s inequality introduces a link
between the expected value and variability of capital’s marginal revenue prod-
uct. For simplicity, we will disregard such effects, supposing that the profit
flow is linear in Z . Like in the previous section, let K (t ) be the capital stock
installed at time t . For simplicity, let this be the only factor of production, so
that the firm’s cash flow gross of investment-related expenditures is F (K ) Z .
We suppose further that units of capital may be purchased at a constant price
Pk and have no scrap value. As long as capital is useful—that is, as long as
F ′(K ) > 0—this implies that all investment is irreversible.
The exogenous variable Z , which multiplies a function of installed capital,
could be interpreted as the product’s price. Let its dynamics be described by a
stochastic process with differential
$$ dZ(t) = \theta Z(t)\, dt + \sigma Z(t)\, dW(t). $$
This is a simple special case of the general expression in (2.46), with dA(t) =
θZ(t) dt and y(t) = σZ(t) for θ and σ constant parameters. This process is
a geometric Brownian motion, and it is well suited to economic applications
because Z(t) is positive (as a price should be) for all t > 0 if Z(0) > 0; as
it gets closer to zero, in fact, this process’s increments become increasingly
small in absolute value, and it can never reach zero. If σ is equal to zero,
the proportional growth rate of Z, (dZ/dt)/Z = θ, is constant and known with
certainty, implying that Z(T) is known for all T > 0 if Z(0) is. But if σ is
larger than zero, that deterministic proportional growth rate is added, during
each time interval (t′ − t), to the realization of a normally distributed random
variable with mean zero and variance (t′ − t)σ². This implies that the
logarithm of Z is normally distributed (that is, Z (t ) is a lognormal random
variable), and that the dispersion of future possible levels of Z is increasingly
wide over longer forecasting horizons.
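A short simulation can make the lognormal property concrete. The sketch below is an Euler discretization of the differential above, under assumed illustrative parameter values (drift 0.03, volatility 0.2, unit horizon); the function name and sample sizes are likewise hypothetical. It uses the standard result, obtained by applying rule (2.47) to ln Z, that ln Z(T) − ln Z(0) is normal with mean (θ − σ²/2)T and variance σ²T, while the expected level grows at rate θ.

```python
import math
import random


def simulate_gbm_euler(z0, theta, sigma, horizon, n_steps, rng):
    """Euler scheme for dZ = theta*Z*dt + sigma*Z*dW; returns Z(horizon)."""
    dt = horizon / n_steps
    z = z0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        z += theta * z * dt + sigma * z * dw
    return z


rng = random.Random(42)
z0, theta, sigma, horizon = 1.0, 0.03, 0.2, 1.0   # illustrative values
finals = [simulate_gbm_euler(z0, theta, sigma, horizon, 100, rng)
          for _ in range(5000)]

logs = [math.log(z) for z in finals]
n = len(logs)
mean_log = sum(logs) / n
var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
mean_level = sum(finals) / n

print(f"mean of ln Z(T): {mean_log:.4f}  "
      f"(theory {(theta - 0.5 * sigma ** 2) * horizon:.4f})")
print(f"var  of ln Z(T): {var_log:.4f}  (theory {sigma ** 2 * horizon:.4f})")
print(f"mean of Z(T)   : {mean_level:.4f}  "
      f"(theory {z0 * math.exp(theta * horizon):.4f})")
```

Because the mean of the logarithm is below θT when σ > 0 while the mean of the level grows at rate θ, the example also illustrates the Jensen's-inequality effect discussed earlier.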
As we shall see, the firm’s optimal investment policy implies that one may
not generally write an expression for K̇ = dK(t)/dt. If capital depreciates at
rate δ, the accumulation constraint is better written in differential form,
$$ dK(t) = dX(t) - \delta K(t)\, dt, $$
for a process X(t) that would correspond to the integral $\int_0^t I(\tau)\, d\tau$ of gross
investment I(t) per unit time if such concepts were well defined.
Apart from such formal peculiarities, the firm’s problem is substantially
similar to those studied above. We can define, also in the presence of uncer-
tainty, the shadow value of capital at time t , which still satisfies the relationship
$$ \lambda(t) = \int_t^{\infty} E_t\left[ F'(K(\tau))\, Z(\tau) \right] e^{-(r+\delta)(\tau-t)}\, d\tau. \tag{2.48} $$
As in the previous sections, and quite intuitively, the optimal investment
policy must be such as to equate λ(t), the marginal contribution of capital to
the firm’s value, to the marginal cost of investment. If the second derivative
of F (·) is not zero, however, the marginal revenue products on the right-hand
side of (2.48) depend on the (optimal) investment policy, which therefore
must be determined simultaneously with the shadow value of capital.
If investment is irreversible and has constant unit cost Pk , then the firm, as
we saw in (2.39), must behave so as to obtain λ(t) = Pk when gross investment
is positive, that is when dX(t) > 0 in the notation introduced here; and to
ensure that λ(t) ≤ Pk at all times. The shadow value of capital is smaller
than its cost when a binding irreversibility constraint prevents the firm from
keeping them equal to one another, as would be possible (and optimal) if, as in
the first few sections of this chapter, investment costs were uniformly convex
and smoothly differentiable at the origin.
Now, if the firm only acts when λ(t)/Pk ≡ q equals unity, and since the
future path of the {Z(τ)} process depends only on its current level Z(t), the
expected value in (2.48) and the level of q must be functions of K(t) and Z(t).
Thus, we may write λ(t)/Pk ≡ q(K(t), Z(t)), noting that (2.48) implies
$$ (r+\delta)\, \frac{\lambda(t)}{P_k}\, dt = \frac{F'(K(t))\, Z(t)}{P_k}\, dt + \frac{E_t[d\lambda(t)]}{P_k}, \tag{2.49} $$
and we use a multivariate version of the differentiation rule (2.47) to expand
the expectation in (2.49) to
$$ (r+\delta)\, q(K,Z) = \frac{F'(K)\, Z}{P_k} + \frac{\partial q(K,Z)}{\partial K}\, (-\delta K) + \frac{\partial q(K,Z)}{\partial Z}\, \theta Z + \frac{\partial^2 q(K,Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2, \tag{2.50} $$
an equation satisfied by q at all times when the firm is not investing (and
therefore when capital is depreciating at a rate δ).
This is a relatively simple differential equation, which may be further
simplified supposing that δ = 0. A particular solution of
$$ r\, q(K,Z) = \frac{F'(K)\, Z}{P_k} + \frac{\partial q(K,Z)}{\partial Z}\, \theta Z + \frac{\partial^2 q(K,Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2 \tag{2.51} $$
is linear in Z and reads
$$ q_0(K,Z) = \frac{F'(K)\, Z}{(r-\theta)\, P_k}. $$
The “homogeneous” part of the equation,
$$ r\, q_i(K,Z) = \frac{\partial q_i(K,Z)}{\partial Z}\, \theta Z + \frac{\partial^2 q_i(K,Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2, $$
is solved by functions in the form
$$ q_i(K,Z) = A_i Z^{\beta} \tag{2.52} $$
if β is a solution of the quadratic equation
$$ r = \beta\theta + (\beta-1)\,\beta\, \frac{\sigma^2}{2}, \tag{2.53} $$
for any constant A_i, as is easily checked inserting its derivatives
$$ \frac{\partial q_i(K,Z)}{\partial Z} = A_i\, \beta\, Z^{\beta-1}, \qquad \frac{\partial^2 q_i(K,Z)}{\partial Z^2} = (\beta-1)\,\beta\, A_i\, Z^{\beta-2} $$
in the differential equation and simplifying the resulting expression.
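The quadratic equation has two distinct roots if σ² > 0:
$$ \beta_1 = \frac{1}{\sigma^2} \left[ -\left( \theta - \tfrac{1}{2}\sigma^2 \right) + \sqrt{ \left( \theta - \tfrac{1}{2}\sigma^2 \right)^2 + 2\sigma^2 r } \right] > 0, $$
$$ \beta_2 = \frac{1}{\sigma^2} \left[ -\left( \theta - \tfrac{1}{2}\sigma^2 \right) - \sqrt{ \left( \theta - \tfrac{1}{2}\sigma^2 \right)^2 + 2\sigma^2 r } \right] < 0. $$
Thus, there exist two groups of solutions in the form (2.52), $q_1(K,Z) = A_1 Z^{\beta_1}$
and $q_2(K,Z) = A_2 Z^{\beta_2}$. Hence, all solutions to (2.51) may be written
$$ q(K,Z) = \frac{F'(K)\, Z}{(r-\theta)\, P_k} + A_1 Z^{\beta_1} + A_2 Z^{\beta_2}. $$
(Recall that we have set δ = 0.)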
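The two roots of the quadratic (2.53) are easy to compute and verify numerically. The following sketch uses hypothetical function names and illustrative parameter values (interest rate 0.05, drift 0.02, volatility 0.2); it evaluates the closed-form roots and checks that each satisfies the quadratic. It also illustrates a fact that follows from evaluating the quadratic at β = 1: when r exceeds the drift, the positive root is larger than one.

```python
import math


def characteristic_roots(r, theta, sigma):
    """Roots (positive, negative) of r = beta*theta + (beta-1)*beta*sigma^2/2."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive for two distinct roots")
    s2 = sigma ** 2
    a = theta - 0.5 * s2
    disc = math.sqrt(a * a + 2.0 * s2 * r)
    return (-a + disc) / s2, (-a - disc) / s2


def quadratic_residual(beta, r, theta, sigma):
    """Right-hand side minus left-hand side of (2.53); zero at a root."""
    return beta * theta + (beta - 1.0) * beta * sigma ** 2 / 2.0 - r


r, theta, sigma = 0.05, 0.02, 0.2   # illustrative values with r > theta
beta1, beta2 = characteristic_roots(r, theta, sigma)

print(f"beta1 = {beta1:.4f}, residual = {quadratic_residual(beta1, r, theta, sigma):.2e}")
print(f"beta2 = {beta2:.4f}, residual = {quadratic_residual(beta2, r, theta, sigma):.2e}")
```

For these values the drift term θ − σ²/2 happens to be zero, so the two roots are symmetric around zero.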
To determine the constants A1 and A2, we recall that this expression
represents the ratio of capital’s marginal value to its purchase price. From this
economic point of view, it is easy to argue that A2, the constant associated with
the negative root of (2.53), must be zero. Otherwise, as Z tends to zero the
shadow value of capital would diverge towards infinity (or negative infinity),
which would be quite difficult to interpret since capital’s contribution to
profits tends to vanish in that situation.
We also know that the firm’s investment policy prevents q (K , Z ) from
exceeding unity. The other constant, A1, and the firm’s investment policy
should therefore satisfy the equation
$$ \frac{F'(K^*(Z))\, Z}{(r-\theta)\, P_k} + A_1 Z^{\beta_1} = 1, \tag{2.54} $$
where K∗(Z) denotes the capital stock chosen by the firm when exogenous
conditions are indexed by Z and the irreversibility constraint is not binding,
so that it is possible to equate capital’s shadow value and cost (λ = Pk, and
q = 1).
The single equation (2.54) does not suffice to determine both K ∗( Z ) and
A1. But the structure of the problem implies that another condition should
also be satisfied by these two variables: this is the smooth pasting or high-
contact condition that
$$ \frac{\partial q(\cdot)}{\partial Z} = 0 $$
whenever q = 1, K = K ∗( Z ) and, therefore, gross investment d X may be
positive. To see why, consider the character of the firm’s optimal investment
policy. When following the proposed optimal policy, the firm invests if and
only if an infinitesimal stochastic increment of the Z process would other-
wise lead q to exceed unity. Since the stochastic process that describes Z ’s
dynamics has “infinite variation,” each instant when investment is positive is
followed immediately by an instant (at least) when Z declines and there is no
investment. (It is for this reason that the time path of the capital stock, while
of finite variation, is not differentiable and the notation could not feature the
usual rate of investment per unit time, I (t ) = d K (t )/d t .) When K = K ∗( Z ),
a relationship in the form (2.50) should be satisfied:
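$$
\begin{aligned}
(r+\delta)\, q(K^*(Z),Z)\, dt ={}& \frac{F'(K^*(Z))\, Z}{P_k}\, dt + \frac{\partial q(K^*(Z),Z)}{\partial K}\, dK \\
&+ \frac{\partial q(K^*(Z),Z)}{\partial Z}\, \theta Z\, dt + \frac{\partial^2 q(K^*(Z),Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2\, dt,
\end{aligned}
\tag{2.55}
$$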
where the variation of both arguments of the q (·) function is taken into
account, and d K (t ) may be positive when d X > 0.
Along the K = K ∗( Z ) locus the relationship q (K ∗( Z ), Z ) = 1 is also sat-
isfied. As long as the function is differentiable (as it is in this model), total
differentiation yields
$$ \frac{\partial q(K^*(Z),Z)}{\partial K}\, dK = -\frac{\partial q(K^*(Z),Z)}{\partial Z}\, dZ. $$
Inserting this in (2.55) yields
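$$
\begin{aligned}
(r+\delta)\, q(K^*(Z),Z)\, dt ={}& \frac{F'(K^*(Z))\, Z}{P_k}\, dt + \frac{\partial q(K^*(Z),Z)}{\partial Z}\, (\theta Z\, dt - dZ) \\
&+ \frac{\partial^2 q(K^*(Z),Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2\, dt.
\end{aligned}
\tag{2.56}
$$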
Since the path of all variables is continuous, the q (·) function must also satisfy
the differential equation (2.51) that holds during zero-investment periods.
Thus, it must be the case that
$$ r\, q(K,Z) = \frac{F'(K)\, Z}{P_k} + \frac{\partial q(K,Z)}{\partial Z}\, \theta Z + \frac{\partial^2 q(K,Z)}{\partial Z^2}\, \frac{\sigma^2}{2}\, Z^2 \tag{2.57} $$
for K and Z values arbitrarily close to those that induce the firm to invest. By
continuity, in the limit where investment becomes positive (for an instant) and
(θZ dt − dZ) ≠ θZ dt, equations (2.56) and (2.57) can hold simultaneously
only if
$$ \frac{\partial q(K^*(Z),Z)}{\partial Z} = 0, $$
the smooth-pasting condition.
In the case we are studying,
$$ \left. \frac{\partial q(K,Z)}{\partial Z} \right|_{K=K^*(Z)} = \frac{F'(K^*(Z))}{(r-\theta)\, P_k} + A_1\, \beta_1\, Z^{\beta_1-1}. $$
Setting this expression to zero we have
$$ A_1 = -\frac{F'(K^*(Z))\, Z^{1-\beta_1}}{\beta_1\, (r-\theta)\, P_k}, $$
and inserting this in (2.54) we obtain a characterization of the firm’s optimal
investment policy:
$$ F'(K^*(Z))\, Z = \frac{\beta_1}{\beta_1-1}\, (r-\theta)\, P_k, $$
or, recalling that β1 is the positive solution of equation (2.53) and rearranging
it to read $\beta/(\beta-1) = \left( r + \tfrac{1}{2}\beta\sigma^2 \right)/(r-\theta)$,
$$ F'(K^*(Z))\, Z = \left( r + \tfrac{1}{2}\, \beta_1 \sigma^2 \right) P_k. $$
It is instructive to compare this equation with that which would hold
if the firm could sell as well as purchase capital at the constant price Pk .
In that case, since δ = 0, it should be true that F′(K)Z = rPk at all times.
Since β1 > 0, at times of positive investment the marginal revenue product of
capital is higher (and, with F″(·) < 0, the capital stock is lower) than in the
case of reversible investment. Intuitively, the irreversibility constraint makes
it suboptimal to invest so as to equate the current marginal revenue product
of capital to its user cost rPk: the firm knows that it will be impossible to
reduce the capital stock in response to future negative developments, and aims
at avoiding large excessive capacity in such instances by restraining investment
in good times. The β1 root is a function of σ, and it is possible to show that
β1σ² is increasing in σ or, quite intuitively, that a larger wedge between the
current marginal profitability of capital and its user cost is needed to trigger
investment when the wedge may very quickly be erased by more highly volatile
fluctuations.
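The size of the wedge between the investment trigger and the reversible-investment user cost can be tabulated directly. The sketch below is an illustration under assumed parameter values (r = 0.05, drift 0.02, a small grid of volatilities) and hypothetical function names. For each volatility it computes the positive root of (2.53) and the required marginal revenue product per unit of capital price in the two equivalent forms derived above.

```python
import math


def positive_root(r, theta, sigma):
    """Positive root of r = beta*theta + (beta-1)*beta*sigma^2/2."""
    s2 = sigma ** 2
    a = theta - 0.5 * s2
    return (-a + math.sqrt(a * a + 2.0 * s2 * r)) / s2


def trigger_per_unit_price(r, theta, sigma):
    """Required F'(K*)Z / P_k under irreversibility: r + beta1*sigma^2/2."""
    beta1 = positive_root(r, theta, sigma)
    return r + 0.5 * beta1 * sigma ** 2


r, theta = 0.05, 0.02               # illustrative values with r > theta
sigmas = [0.1, 0.2, 0.3, 0.4]
triggers = [trigger_per_unit_price(r, theta, s) for s in sigmas]

for s, trig in zip(sigmas, triggers):
    beta1 = positive_root(r, theta, s)
    alt = beta1 / (beta1 - 1.0) * (r - theta)   # equivalent expression
    print(f"sigma={s:.1f}  beta1={beta1:.3f}  trigger={trig:.4f}  "
          f"(alt form {alt:.4f})  reversible user cost={r:.4f}")
```

In this grid the trigger exceeds r at every volatility and rises with σ, in line with the comparative-statics statement in the text; the check is numerical for these values only, not a proof.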
Substantially similar, but more complex, derivations can be performed for
cases where capital depreciates and/or the firm employs perfectly flexible
factors (such as N in the previous sections’ models). To obtain closed-form
solutions in such cases, it is necessary to assume that the firm’s demand and
production functions have constant-elasticity forms. (Further details are in
the references at the end of the chapter.)
Irreversible investment models are more complex and realistic than the
models introduced above. They do not, however, deny their fundamental
assumption that optimal investment policies rule out arbitrage opportunities,
and similarly support simple present-value financial considerations. Intu-
itively, if investment is irreversible future decisions to install capital may only
increase its stock and—under decreasing returns—reduce its marginal rev-
enue product. Just like in the certainty model of Section 2.6, it is precisely
the expectation of future excess capacity (and low marginal revenue products)
that makes the firm reluctant to invest. In present expected value terms, in
fact, capital’s marginal revenue product fluctuates around the same user-cost
level that would determine it in the absence of adjustment costs.
A model with long periods of inaction, of course, cannot represent well the
dynamics of aggregate investment, which is empirically much smoother than
would be implied by the dynamics illustrated in Figure 2.12, or by similar
pictures one might draw tracing the dynamics of a stochastic desired capital
stock and of the irreversibly installed stock associated with it. This suggests
that aggregate dynamics should not be interpreted as the optimal choices of a
single, “representative” firm—as is assumed by most micro-founded macro-
economic models. If one allows part of uncertainty to be “idiosyncratic,” that
is relevant only to individual firms but completely offset in the aggregate, then
aggregation of intermittent and heterogeneous firm-level investment poli-
cies yields smoother macroeconomic dynamics. Inaction by individual firms
implies some degree of inertia in the aggregate series’ response to aggregate
shocks. Such inertia could be interpreted in terms of convex adjustment costs
for a hypothetical representative firm, but reflects heterogeneity of micro-
economic dynamics if one maintains that adjustment costs do not necessar-
ily imply higher unit costs for faster investment. This interpretation allows
aggregate variables to react quickly to unusual large events and, in particular,
to drastic changes of future expectations. Co-existence of firms with very
different dynamic experiences is perhaps most obvious from a labor-market
perspective, since employment typically increases in some sectors and firms
at the same time as it declines in others. Employment changes can in fact be
interpreted as “investment” if, as is often the case, hiring and firing workers
entails costs for employers. The next chapter discusses dynamic labor demand
issues from this perspective.
APPENDIX A2: HAMILTONIAN OPTIMIZATION METHODS
The chapter’s main text makes use of Hamiltonian methods for the solution of
dynamic optimization problems in continuous time, emphasizing economic inter-
pretations of optimality conditions which, as usual, impose equality at the margin
between (actual or opportunity) costs and revenues. The more technical treatment in
this appendix illustrates the formal meaning of the same conditions. More detailed
expositions of the relevant continuous-time optimization techniques may be found in
Dixit (1990) or Barro and Sala-i-Martin (1995).
In continuous time, any optimization problem must be posed in terms of
relationships among functions, more complex mathematical objects than ordinary real
numbers. As we shall see, however, the appropriate methods may still be interpreted
in light of the simple notions that are familiar from the solution of static constrained
optimization problems. Dynamic optimization problems under uncertainty may be
formulated in substantially similar ways, taking into account that optimality con-
ditions introduce links among yet more complex mathematical objects: stochastic
processes which, like those introduced in the final section of the chapter, are functions
not only of time, but also of the state of nature.
It is important, first of all, to recall what precisely is meant by posing a problem in
continuous time. Investment is a flow variable, measured during a time period; capital
is a stock variable, measured at a point in time, for example at the beginning of each
period. In continuous time, stock and flow variables are measured at extremely small
time intervals. If in discrete time the obvious accounting relationship (2.3) holds,
that is if
$$ K(t+\Delta t) = K(t) + I(t)\,\Delta t - \delta K(t)\,\Delta t, $$
where I(t) denotes the average rate of investment between t and t + Δt, then
when going to continuous time we need to consider the limit for Δt → 0 in that
relationship. Recalling the definition of a derivative, we obtain
$$ \lim_{\Delta t \to 0} \frac{K(t+\Delta t) - K(t)}{\Delta t} \equiv \frac{dK(t)}{dt} \equiv \dot{K}(t) = I(t) - \delta K(t). $$
This expression is a particular case of more general dynamic constraints encountered
in economic applications. The level and rate of change of y(t ), a stock variable, are
linked to one or more flow variables z and to exogenous variables by an accumulation
constraint in the form of
ẏ(t ) = g (t, z(t ), y(t )). (2.A1)
The presence of t as an argument of this function represents exogenous variables
which, in the absence of uncertainty, may all be simply indexed by calendar time. The
flow variable, z(t ), is directly controlled by an economic agent, hence it is endogenous
to the problem under study. It is important to realize, however, that the stock vari-
able y(t ) is also under dynamic control by the agent. In (2.A1), the rate of change of
the stock ẏ depends, at every point in time, on the levels of y(t ) and z(t ). Since (2.A1)
states that the stock y(t ) has a first derivative with respect to time, its level is given by
past history at every t . Hence, y(t ) cannot be an instrument of optimization at time t ;
however, z(t ) may be chosen, and since in turn this affects ẏ(t ), it is possible to change
future levels of the y stock variable.
Working in continuous time, it will be necessary to make use of simple integrals.
Recall that the accumulation relationship
$$ K(t+n\Delta t) = K(t) + \sum_{j=1}^{n} \left[ I(t+j\Delta t) - \delta K(t+j\Delta t) \right] \Delta t $$
has a continuous time counterpart: fixing t + nΔt = T and letting n → ∞ as Δt → 0,
$$ K(T) = K(t) + \int_t^T \left[ I(\tau) - \delta K(\tau) \right] d\tau. $$
Given a function of time I(τ) and a starting point K(t) = κ̄, the integral determining
K(T) is solved by a function K(·) such that
$$ \frac{d}{d\tau} K(\tau) = I(\tau) - \delta K(\tau), \qquad K(t) = \bar{\kappa}. $$
For example, let I(τ) = ϕK(τ). A function in the form $K(\tau) = C e^{(\varphi-\delta)\tau}$ with
$C = e^{-(\varphi-\delta)t}\,\bar{\kappa}$ satisfies the conditions, and capital grows exponentially at rate ϕ − δ:
$$ K(T) = K(t)\, e^{(\varphi-\delta)(T-t)}. $$
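The passage from the discrete accounting relationship to its continuous-time limit can be illustrated numerically. The sketch below assumes investment proportional to the capital stock, as in the example above, with illustrative values for the proportionality factor (`inv_rate`), the depreciation rate, and the horizon; all names are hypothetical. It iterates the discrete-time relationship with ever smaller time steps and compares the result with the exponential solution.

```python
import math


def accumulate(k0, inv_rate, delta, horizon, n_steps):
    """Iterate K(t+dt) = K(t) + I(t)*dt - delta*K(t)*dt with I = inv_rate*K."""
    dt = horizon / n_steps
    k = k0
    for _ in range(n_steps):
        investment = inv_rate * k
        k += investment * dt - delta * k * dt
    return k


k0, inv_rate, delta, horizon = 1.0, 0.10, 0.04, 10.0   # illustrative values
exact = k0 * math.exp((inv_rate - delta) * horizon)

errors = []
for n_steps in (10, 1_000, 100_000):
    approx = accumulate(k0, inv_rate, delta, horizon, n_steps)
    errors.append(abs(approx - exact))
    print(f"n_steps={n_steps:>7}  K(T)={approx:.6f}  error={errors[-1]:.2e}")
print(f"continuous-time solution K(T)={exact:.6f}")
```

The error shrinks roughly in proportion to the step length, which is the sense in which the differential equation is the limit of the period-by-period accounting identity.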
A2.1. Objective function
In order to characterize the economic agent’s optimal choices, we need to know not
only the form of the accumulation constraint,
$$ \dot{K}(t) = I(t) - \delta K(t), \tag{2.A2} $$
but also that of an objective function which also explicitly recognizes the time dimen-
sion.
If flows of benefits (utility, profits,. . .) are given by some function f (t, I (t ), K (t )),
the total value as of time zero of a dynamic optimization program may be measured
by the integral
$$ \int_0^{\infty} f(t, I(t), K(t))\, e^{-\rho t}\, dt, \tag{2.A3} $$
where ρ ≥ 0 (supposed constant for simplicity) is the intertemporal rate of discount
of the relevant benefits.
One could of course express in integral (rather than summation) form the objective
function of consumption problems, such as those encountered in Chapter 1: we shall
deal with such expressions in Chapter 4. Here, we will interpret the optimization
problem in terms of capital and investment. It is sensible to suppose that
$$ \frac{\partial f(t, I, K)}{\partial K} > 0, $$
i.e., a larger stock of capital must increase the cash flow;
$$ \frac{\partial f(t, I, K)}{\partial I} < 0, $$
i.e., investment expenditures reduce current cash flows; and
$$ \frac{\partial^2 f(t, I, K)}{\partial K^2} \le 0, \qquad \frac{\partial^2 f(t, I, K)}{\partial I^2} \le 0, $$
with at least one strict inequality; i.e., returns to capital must be decreasing, and/or
marginal investment costs are increasing. As usual, such concavity of the objective
function ensures that, subject to the linear constraint (2.A2), the optimization
problem has a unique internal solution, identified by first-order conditions, where the
second-order condition is surely satisfied.
A2.2. Constrained optimization
The problem is that of maximizing the objective function in (2.A3), while satisfying the
constraint (2.A2). The instruments of optimization are two functions, $I(t)$ and $K(t)$.
Hence an infinitely large set of choices is available: one needs to choose the flow
variable $I(t)$ for each of uncountably many time intervals $[t, t + dt)$, taking into
account its direct (negative) effects on $f(\cdot)$ and thus on the integral in (2.A3); its
(positive) effects on the $K(\cdot)$ accumulation path; and the (positive) effects of $K(\cdot)$
on $f(\cdot)$ and on the integral. “The” constraint (2.A2) is a functional constraint, that is,
a set of infinitely many constraints in the form
$$I(t) - \delta K(t) - \dot K(t) = 0,$$
each valid at an instant $t$.
From the economic point of view, the agent is faced by a clear trade-off: investment
is costly, but it makes it possible to increase the capital stock and enjoy additional
benefits in the future.
We recall at this point that, in order to maximize a function subject to one or
more constraints, one forms a Lagrangian as a linear combination of the objective
function and the constraints. To each constraint, the Lagrangian assigns a coefficient
(a Lagrange multiplier, or shadow price) measuring the variation of the optimized
objective function in response to a marginal loosening of the constraint. In the case
we are considering, loosening the accumulation constraint (2.A2) at time t means
granting additional capital at the margin without requiring the costs entailed by
additional investment. Thus, the shadow price is the marginal value of capital at time
t evaluated—like all of (2.A3)—at time zero.
In the case we are considering, a continuum of constraints is indexed by $t$, and the
Lagrange multipliers define a function of $t$, denoted $\mu(t)$ in what follows. In practice,
the Lagrangian linear combination has uncountably many terms and adds them up
giving infinitesimal weight “$dt$” to each; we may write it in integral form:
$$L = \int_0^\infty f(t, I(t), K(t))\, e^{-\rho t}\, dt + \int_0^\infty \mu(t)\,\bigl(I(t) - \delta K(t) - \dot K(t)\bigr)\, dt.$$
Using the integration by parts rule,
$$\int_a^b f'(x)\, g(x)\, dx = f(b)g(b) - f(a)g(a) - \int_a^b f(x)\, g'(x)\, dx,$$
we obtain
$$-\int_0^\infty \mu(t)\, \dot K(t)\, dt = -\Bigl(\lim_{t\to\infty} \mu(t)K(t) - \mu(0)K(0)\Bigr) + \int_0^\infty \dot\mu(t)\, K(t)\, dt.$$
The optimization problem is ill defined if the limit does not exist. If it exists, it must
be zero (as we shall see). Setting
$$\lim_{t\to\infty} \mu(t)K(t) = 0, \tag{2.A4}$$
we can rewrite the “Lagrangian” as
$$\tilde L = \int_0^\infty \bigl[f(t, I(t), K(t))\, e^{-\rho t} + \mu(t)\,(I(t) - \delta K(t)) + \dot\mu(t)\, K(t)\bigr]\, dt + \mu(0)K(0).$$
Given (2.A4), this form is completely equivalent to the previous one, and,
conveniently, it does not feature $\dot K$.
The necessary conditions for a constrained maximization problem are that all
derivatives of the Lagrangian with respect to instruments (here, $I(t)$ and $K(t)$) and
shadow prices (here, $\mu(t)$) be zero. If we were dealing with summations rather than
integrals of expressions, which depend on the various instruments and shadow prices,
we would equate to zero the derivative of each expression. By analogy, we can
differentiate the function being integrated with respect to $\mu(t)$ for each $t$ in $L$: comfortingly,
this procedure retrieves the functional accumulation constraint
$$I(t) - \delta K(t) - \dot K(t) = 0;$$
and we can differentiate the function being integrated in the equivalent expression $\tilde L$
with respect to $I(t)$ and $K(t)$ for each $t$, obtaining
$$\frac{\partial f(t, I(t), K(t))}{\partial I(t)}\, e^{-\rho t} + \mu(t) = 0, \tag{2.A5}$$
$$\frac{\partial f(t, I(t), K(t))}{\partial K(t)}\, e^{-\rho t} - \mu(t)\,\delta + \dot\mu(t) = 0. \tag{2.A6}$$
Note that we have disregarded the term $\mu(0)K(0)$ in $\tilde L$ when deriving (2.A6)
at $t = 0$. In fact, the initial stock of capital is a parameter rather than an endogenous
variable in the optimization problem: it would be nonsensical to impose a first-order
condition in the form $\mu(0) = 0$. Similar considerations also rationalize the assumption
made in (2.A4). Intuitively, if the limit of $K(t)\mu(t)$ were finite but different from zero,
then $\mu(\infty)K(\infty)$ should satisfy first-order conditions: differentiating with respect to
$\mu(\infty)$, we would need $K(\infty) = 0$, and differentiating with respect to $K(\infty)$, we would
need $\mu(\infty) = 0$. Either one of these conditions implies (2.A4). (Of course, this is a very
heuristic argument: it is not really rigorous to take such derivatives directly at the limit.
A more rigorous approach would consider a similar problem with a finite planning
horizon $T$, and take the limit for $T \to \infty$ of first-order conditions at $T$.)
A2.3. The Hamiltonian recipe
Conditions (2.A5) and (2.A6) can be derived directly from the Hamiltonian (rather
than Lagrangian) of the problem, defined as follows:
$$H(t) = \bigl[f(t, I(t), K(t)) + \lambda(t)\,(I(t) - \delta K(t))\bigr]\, e^{-\rho t}.$$
In this definition the shadow price $\mu(t)$ (which measures values “at time zero”) is
replaced by $\lambda(t) \equiv \mu(t)\, e^{\rho t}$ (which measures values “at time $t$,” without discounting
them back to zero). In all other respects, the Hamiltonian expression is similar to
the function integrated in the Lagrangians introduced above, and therefore measures
the flow of benefits offered by a dynamic policy.³² The expression proposed, in fact,
multiplies the accumulation constraint’s shadow price by $I(t) - \delta K(t)$, which is the
same as $\dot K(t)$, along any possible dynamic path. Thus, the benefit flow includes the
value (in terms of the objective function) of the increase in the stock variable $K(t)$.
This term makes it possible to take into account properly the problem’s intertemporal
linkages when maximizing the Hamiltonian expression.
In current-value terms, the optimality conditions encountered above read
$$\frac{\partial H}{\partial I} = 0, \tag{2.A7}$$
$$-\frac{\partial H}{\partial K} = \frac{d\,(\lambda(t)\, e^{-\rho t})}{dt}, \tag{2.A8}$$
$$\lim_{t\to\infty} \lambda(t)\, e^{-\rho t}\, K(t) = 0. \tag{2.A9}$$
The constraint, which in a static problem would be equivalent to the first-order
condition with respect to the shadow price, is imposed by the condition
$$\frac{\partial H(\cdot)}{\partial [\lambda e^{-\rho t}]} = \dot K.$$
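As a check on the recipe, the following worked step (a sketch added for illustration, not part of the original derivation) differentiates the Hamiltonian defined above and recovers conditions (2.A5) and (2.A6). It writes $\lambda(t)$ for the current-value shadow price, $\mu(t) = \lambda(t)e^{-\rho t}$ for its time-zero counterpart, $\rho$ for the discount rate, and $\delta$ for the depreciation rate; the last line simply restates (2.A8) as a law of motion for $\lambda(t)$.

```latex
\begin{align*}
\frac{\partial H}{\partial I} = 0
  &\;\Longrightarrow\;
  \left[\frac{\partial f}{\partial I} + \lambda(t)\right] e^{-\rho t} = 0
  \;\Longleftrightarrow\;
  \frac{\partial f}{\partial I}\, e^{-\rho t} + \mu(t) = 0, \\
-\frac{\partial H}{\partial K} = \frac{d\,(\lambda(t) e^{-\rho t})}{dt}
  &\;\Longrightarrow\;
  -\left[\frac{\partial f}{\partial K} - \delta\lambda(t)\right] e^{-\rho t} = \dot\mu(t)
  \;\Longleftrightarrow\;
  \frac{\partial f}{\partial K}\, e^{-\rho t} - \delta\mu(t) + \dot\mu(t) = 0, \\
\dot\mu(t) = \left[\dot\lambda(t) - \rho\lambda(t)\right] e^{-\rho t}
  &\;\Longrightarrow\;
  \dot\lambda(t) = (\rho + \delta)\,\lambda(t) - \frac{\partial f}{\partial K}.
\end{align*}
```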
A2.4. The general case
All the above derivations took the accumulation constraint to be linear, as in (2.A2),
and the rate of discount per unit time to be constant at $\rho$. More general specifications
can be treated similarly. If the problem given is
$$\max \left[\int_0^\infty f(t, z(t), y(t))\, e^{-\int_0^t r(s)\, ds}\, dt\right] \tag{2.A10}$$
³² Note that the benefit flow is discounted back to zero in the expression proposed. Since the
discount factor is always strictly positive, one may equivalently define a current value Hamiltonian, a
similar expression without the discount factor. The interpretation of the shadow value and the relevant
dynamic optimality conditions will be different, but jointly equivalent to those outlined here.
subject to
$$\dot y(t) = g(t, z(t), y(t)), \quad \text{for all } t \ge 0, \tag{2.A11}$$
then one forms the Hamiltonian
$$H(t) = \bigl[f(t, z(t), y(t)) + \lambda(t)\, g(t, z(t), y(t))\bigr]\, e^{-\int_0^t r(s)\, ds},$$
and imposes first-order conditions in the form
$$\frac{\partial H}{\partial z} = 0, \tag{2.A12}$$
$$-\frac{\partial H}{\partial y} = \frac{d\,\bigl(\lambda(t)\, e^{-\int_0^t r(s)\, ds}\bigr)}{dt}, \tag{2.A13}$$
$$\lim_{t\to\infty} \lambda(t)\, e^{-\int_0^t r(s)\, ds}\, y(t) = 0. \tag{2.A14}$$
If the constraint is not linear, then these conditions can identify an optimum even
when $f(\cdot)$ is not strictly concave. For example, one may have $\partial f/\partial z < 0$ constant
(a constant unit investment cost) if $g(\cdot)$ is increasing and convex in $z$.
REVIEW EXERCISES
Exercise 14 Consider a firm with capital as the only factor of production. Its revenues at
time $t$ are $R(K(t))$ if installed capital is $K(t)$. The accumulation constraint has the usual
form, $\dot K(t) = I(t) - \delta K(t)$, and the cost of investing $I(t)$ is a function $G(I(t))$ that does
not depend on installed capital (for simplicity, $P_k \equiv 1$).
(a) Suppose the firm aims at maximizing the present discounted value at rate $r$ of its
cash flows, $F(t)$. Express cash flows in terms of the functions $R(\cdot)$ and $G(\cdot)$,
derive the relevant first-order conditions, and characterize the solution graphically,
making specific assumptions as to the derivatives of $R(\cdot)$ and $G(\cdot)$.
(b) Characterize the solution under more specific assumptions: suppose revenues are
a linear function of installed capital, $R(K) = \alpha K$, and let the investment cost
function be quadratic, $G(I) = I + bI^2$. Derive and interpret an expression for the
steady-state capital stock. What happens if $\delta = 0$?
Exercise 15 A firm’s cash flows are
$$K^\alpha N^\beta - P_k\, G(\dot K, K) - wN,$$
where $K$ is the capital stock, $\dot K$ its rate of change, and $N$ is a perfectly flexible factor. Let $r$
be the rate of discount applied to future cash flows, over an infinite planning horizon.
(a) What needs to be assumed about $\alpha$, $\beta$, $G(\cdot, \cdot)$ to ensure that the Hamiltonian
first-order conditions identify the optimal solution?
(b) Let $\alpha + \beta < 1$. Draw a saddlepath diagram for given $P_k$, $w$, and $r$; be specific as
to what you assume about the form of $G(\cdot, \cdot)$. Show the effects of an unexpected,
permanent change of $P_k$, starting from the steady state.
(c) What does $P_k$ represent in the problem? Would it be a good idea to let
$$G(\dot K_t, K_t) = \left(\frac{\dot K_t}{K_t}\right)^2?$$
Or would it be preferable to let $G(x, y) = (x/y)^3$? What about $G(x, y) = (x)^3$?
(d) Suppose that $\alpha + \beta = 1$, and let $P_k\, G(\dot K, K) = g(\dot K)$. (Adjustment costs do not
depend on $K$.) The wage is constant at $w(t) = 1$ only for $0 \le t < T$: thereafter,
its level is random, and for $t \ge T$
$$w(t) = \begin{cases} 1 + \xi, & \text{with prob. } 1/2 \\ 1 - \xi, & \text{with prob. } 1/2. \end{cases}$$
Write the first-order condition for investment at $t = 0$. How does the investment
flow depend on $\xi$ for $t < 1$?
Exercise 16 A firm’s production function is
$$Y(t) = \alpha\sqrt{K(t)} + \beta\sqrt{L(t)},$$
and its product is sold at a given price, normalized to unity. Factor $L$ is not subject to
adjustment costs, and is paid $w$ per unit time. Factor $K$ obeys the accumulation constraint
$$\dot K(t) = I(t) - \delta K(t),$$
and the cost of investing $I$ is
$$G(I) = I + \frac{\gamma}{2} I^2$$
per unit time (letting $P_k = 1$). The firm maximizes the present discounted value at rate $r$
of its cash flows.
(a) Write the Hamiltonian for this problem, derive and discuss briefly the first-order
conditions, and draw a diagram to illustrate the solution.
(b) Analyze graphically the effects of an increase in $\delta$ (faster depreciation of installed
capital) and give an economic interpretation of the adjustment trajectory.
(c) If, instead of being constant, the cost of factor $L$ were a random variable, would
this matter for the firm’s investment policy? Explain.
Exercise 17 As a function of installed capital $K$, a firm’s revenues are given by
$$R(K) = K - \frac{1}{2} K^2.$$
The usual accumulation constraint has $\delta = 0.25$, so $\dot K = I - 0.25K$. Investing $I$ costs
$$P_k\, G(I) = P_k \left(I + \frac{1}{2} I^2\right).$$
The firm maximizes the present discounted value at rate $r = 0.25$ of its cash flows.
(a) Write the first-order conditions of the dynamic optimization problem, and
characterize the solution graphically supposing that $P_k = 1$ (constant).
(b) Starting from the steady state of the $P_k = 1$ case, show the effects of a 50% subsidy
of investment (so that $P_k$ is halved).
(c) Discuss the dynamics of optimal investment if at time $t = 0$, when $P_k$ is halved, it
is also announced that at some future time $T > 0$ the interest rate will be tripled,
so that $r(t) = 0.75$ for $t \ge T$.
Exercise 18 The revenue flow of a firm is given by
$$R(K, N) = 2K^{1/2} N^{1/2},$$
where $N$ is a freely adjustable factor, paid a wage $w(t)$ at time $t$; $K$ is accumulated
according to
$$\dot K = I - \delta K,$$
and an investment flow $I$ costs
$$G(I) = \left(I + \frac{1}{2} I^2\right).$$
(Note that $P_k = 1$, hence $q = \lambda$.)
(a) Write the first-order conditions for maximization of present discounted (at rate $r$)
value of cash flows over an infinite planning horizon.
(b) Taking $r$ and $\delta$ to be constant, write an expression for $\lambda(0)$ in terms of $w(t)$, the
function describing the time path of wages.
(c) Evaluate that expression in the case where $w(t) = \bar w$ is constant, and characterize
the solution graphically.
(d) How could the problem be modified so that investment is a function of the average
value of capital (that is, of Tobin’s average $q$)?
FURTHER READING
Nickell (1978) offers an early, very clear treatment of many issues dealt with in
this chapter. Section 2.5 follows Hayashi (1982). For a detailed and clear treatment
of saddlepath dynamics generated by anticipated and non-anticipated parameter
changes, see Abel (1982). The effects of uncertainty on optimal investment flows under
convex adjustment costs, sketched in Section 2.4, were originally studied by Hartman
(1972). A more detailed treatment of optimal inaction in a certainty setting may be
found in Bertola (1992).
Dixit (1993) offers a very clear treatment of optimization problems under
uncertainty in continuous time, introduced briefly in the last section of the chapter.
Dixit and Pindyck (1994) propose a more detailed and very accessible discussion of
the relevant issues. Bertola (1998) contains a more complete version of the irreversible
investment problem solved here. For a very complex model of irreversible investment
and dynamic aggregation, and for further references, see Bertola and Caballero (1994).
When discussing consumption in Chapter 1, we emphasized the empirical
implications of optimization-based theory, and outlined how theoretical refinements
were driven by the imperfect fit of optimality conditions and data. Of course,
theoretical relationships have also been tested and estimated on macroeconomic and
microeconomic investment data. These attempts have met with considerably less
success than in the case of consumption. While aggregate consumption changes are
remarkably close to the theory’s unpredictability implication, aggregate investment’s
relationship to empirical measures of $q$ is weak and leaves much to be explained by
output and by distributed lags of investment, and its relationship to empirical
measures of Jorgenson’s user cost is also empirically elusive. (For surveys, see Chirinko,
1993, and Hubbard, 1998.) The evidence does not necessarily deny the validity of
theoretical insights, but it certainly calls for more complex modeling efforts. Even
more than in the case of consumption, financial constraints and expectation formation
mechanisms play a crucial role in determining investment in an imperfect world.
Together with monetary and fiscal policy reactions, financial and expectational
mechanisms are relevant to more realistic models of macroeconomic dynamics of
the type studied in Section 2.5. As in the case of consumption, however, attention
to microeconomic detail (as regards heterogeneity of individual agents’ dynamic
environment, and adjustment-cost specifications leading to infrequent bursts of
investment) has proven empirically useful: aggregate cost-of-capital measures are
statistically significant in the long run, and short-run dynamics can be explained by
fluctuations of the distribution of individual firms within their inaction range (Bertola
and Caballero, 1994).
REFERENCES
Abel, A. B. (1982) “Dynamic Effects of Temporary and Permanent Tax Policies in a q model of
Investment,” Journal of Monetary Economics, 9, 353–373.
Barro, R. J., and X. Sala-i-Martin (1995) “Appendix on Mathematical Methods,” in Economic
Growth, New York: McGraw-Hill.
Bertola, G. (1992) “Labor Turnover Costs and Average Labor Demand,” Journal of Labor
Economics, 10, 389–411.
Bertola, G. (1998) “Irreversible Investment (1989),” Ricerche Economiche/Research in Economics, 52,
3–37.
Bertola, G., and R. J. Caballero (1994) “Irreversibility and Aggregate Investment,” Review of Economic
Studies, 61, 223–246.
Blanchard, O. J. (1981) “Output, the Stock Market and the Interest Rate,” American Economic
Review, 71(1), 132–143.
Chirinko, R. S. (1993) “Business Fixed Investment Spending: A Critical Survey of Modelling
Strategies, Empirical Results, and Policy Implications,” Journal of Economic Literature, 31,
1875–1911.
Dixit, A. K. (1990) Optimization in Economic Theory, Oxford: Oxford University Press.
Dixit, A. K. (1993) The Art of Smooth Pasting, London: Harcourt.
Dixit, A. K., and R. S. Pindyck (1994) Investment under Uncertainty, Princeton: Princeton University
Press.
Hartman, R. (1972) “The Effect of Price and Cost Uncertainty on Investment,” Journal of
Economic Theory, 5, 258–266.
Hayashi, F. (1982) “Tobin’s Marginal q and Average q: A Neoclassical Interpretation,”
Econometrica, 50, 213–224.
Hubbard, R. G. (1998) “Capital-Market Imperfections and Investment,” Journal of Economic
Literature, 36, 193–225.
Jorgenson, D. W. (1963) “Capital Theory and Investment Behavior,” American Economic Review
(Papers and Proceedings), 53, 247–259.
Jorgenson, D. W. (1971) “Econometric Studies of Investment Behavior,” Journal of Economic Literature, 9,
1111–1147.
Keynes, J. M. (1936) General Theory of Employment, Interest, and Money, London: Macmillan.
Nickell, S. J. (1978) The Investment Decisions of Firms, Cambridge: Cambridge University Press.
Tobin, J. (1969) “A General Equilibrium Approach to Monetary Theory,” Journal of Money,
Credit, and Banking, 1, 15–29.
3 Adjustment Costs in the Labor Market
In this chapter we use dynamic methods to study labor demand by a single
firm and the equilibrium dynamics of wages and employment. As in previous
chapters, we aim at familiarizing readers with methodological insights. Here
we focus on how uncertainty may be treated simply in an environment that
allows economic circumstances to change, with given probabilities, across a
well-defined and stable set of possible states (a Markov chain). We derive
some generally useful technical results from first principles and, again as in
previous chapters, we discuss their economic significance intuitively, with
reference to their empirical and practical relevance in a labor market context.
In reality, adjustment costs imposed on firms by job security legislation are
widely different across countries, sectors, and occupations, and the literature
has given them a prominent role when comparing European and American
labor market dynamics. (See Bertola, 1999, for a survey of theory and evidence.)
In most European countries, legislation imposes administrative and
legal costs on employers wishing to shed redundant workers. Together with
other institutional differences (reviewed briefly in the suggestions for further
reading at the end of the chapter), this has been found to be an important
factor in shaping the European experience of high unemployment in the last
three decades of the twentieth century.
Section 3.1 derives the optimal hiring and firing decisions of a firm that is
subject to adjustment costs of labor. The next two sections characterize the
implications of these optimal policies for the dynamics and the average level
of employment. Finally, in Section 3.4 we study the interactions between the
decisions of firms and workers when workers are subject to mobility costs,
focusing in particular on equilibrium wage differentials. The entire analysis of
this chapter is based on a simple model of uncertainty, characterized formally
in the appendix to the chapter.
Remember that in Chapter 2 we viewed the factor $N$, which was not subject
to adjustment costs, as labor. Hence we called its remuneration per unit of
time, $w$, the “wage rate.” In the absence of adjustment costs, the optimal
labor input had a simple and essentially static solution: that is, the optimal
employment level needed to satisfy the condition
$$\frac{\partial R(t, K(t), N)}{\partial N} = w(t). \tag{3.1}$$
LABOR MARKET 103
Figure 3.1. Static labor demand
This first-order condition is necessary and sufficient if the total revenues
$R(\cdot)$ are an increasing and concave function of $N$. Under this condition,
$\partial R(\cdot)/\partial N$ is a decreasing function of $N$ and (3.1) implicitly defines the
demand function for labor $N^*(t, K(t), w(t))$.
If the above condition holds, the employment level depends only on the
levels of $K$, of wages, and of the exogenous variables that, in the absence of
uncertainty, are denoted by $t$. This relationship between employment, wages,
and the value of the marginal product of labor is illustrated in Figure 3.1,
which is familiar from any elementary textbook. In fact, the same relation
can be obtained assuming that firms simply maximize the flow of profits in
a given period, rather than the discounted flow of profits over the entire time
horizon.
The fact that the static optimality condition remains valid in the potentially
more complex dynamic environment illustrates a general principle. In order
for the dynamic aspects of an economic problem to be relevant, the effects
of decisions taken today need to extend into the future; likewise, decisions
taken in the past must condition current decisions. Adjustment costs (linear
or strictly convex) introduced for investment in Chapter 2 make it costly for
firms to undo previous choices. As a result, when firms decide how much
to invest, they need to anticipate their future input of capital. But if labor is
simply compensated on the basis of its effective use, and if variations in $N(t)$
do not entail any cost, then forward-looking considerations are irrelevant.
Firms do not need to form expectations about the future because they know
that it will always be possible for them to react immediately, and without any
cost, to future events.³³
3.1. Hiring and Firing Costs
In Chapter 2, on investment, the presence of more than one state variable
would have complicated the analysis of the dynamic aspects of optimal investment
behavior. In particular, we would not have been able to use the simple
two-dimensional phase diagram. It was therefore helpful to assume that
no factors other than capital were subject to adjustment costs. Since in this
chapter we aim to analyze the dynamic behavior of employment, it would
not be very useful or realistic to retain the assumption that variations in
employment do not entail any costs for the firm. For example, as a result of the
technological and organizational specificity of labor, firms incur hiring costs
because they need to inform and instruct newly hired workers before they are
as productive as the incumbent workers. The creation and destruction of jobs
(turnover) often entails costs for the workers too, not only because they may
need to learn to perform new tasks, but also in terms of the opportunity cost
of unemployment and the costs of moving. The fact that mobility is costly
for workers affects the equilibrium dynamics of wages and employment, as
we will see below. In fact, it is in order to protect workers against these costs
of mobility that labor contracts and laws often impose firing costs, so that
firms incur costs both when they expand and when they reduce their labor
demand.
We start this chapter by considering the optimal hiring and firing policies
of a single firm that is subject to hiring and firing costs. As in the case of
investment, the solution described by (3.1), in which the marginal productivity
and the marginal costs of labor are equated in every period, is no longer
efficient with adjustment costs. Like the installation costs for machinery and
equipment, the costs of hiring and firing require a firm to adopt a
forward-looking employment policy.
The economic implications of such behavior could well be studied using the
continuous time optimization methods introduced in the previous chapter,
and some of the exercises below explore analogies with the methods used
in the study of investment there. We adopt a different approach, however, in
order to explore new aspects of the dynamic problems that we are dealing with
and to learn new techniques. As in Chapter 1, we assume that the decisions
³³ Even in the absence of adjustment costs, the consumption and savings decisions studied in
Chapter 1 have dynamic implications via the budget constraint of agents, since current consumption
affects the resources available for future consumption. Adjustment costs may also be relevant for the
consumption of non-durable goods if the utility of agents depends directly on variations (and not just
levels) of consumption. This could occur for instance as a result of habits or addiction.
are taken in discrete time and under uncertainty about the future. Since we
also want to take adjustment costs and equilibrium features into account, it is
useful to simplify the model.
In what follows, we assume that firms operate in an environment in which
one or more exogenous variables (like the retail price of the output, the
productive efficiency, or the costs of inputs other than labor) fluctuate so
that a firm is sometimes more and sometimes less inclined to hire workers.
In (3.1), the capital stock of a firm $K(t)$ (which we do not analyze explicitly
in this chapter) and the time index $t$ could represent these exogenous factors.
To simplify the analysis as much as possible, we assume that the complex of
factors that are relevant for the intensity of labor demand has only two states:
a strong state indexed by $g$, and a weak state indexed by $b$. If the alternation
between these two states were unambiguously determined by $t$, the firm would
be able to determine the evolution of the exogenous variables. Here we shall
assume that the evolution of demand is uncertain. In each period the demand
conditions change with probability $p$ from weak to strong or vice versa. Hence,
in each period the firm takes its decisions on how many workers to hire or
fire knowing that the prevailing demand conditions remain unchanged with
probability $(1 - p)$.
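The two-state process just described can be explored numerically. The sketch below is illustrative only: the function names, the seed, and the value of the switching probability are assumptions rather than part of the text. It iterates the transition rule to obtain the probability of being in the strong state after `n` periods, and simulates a long path to estimate the share of time spent in the strong state.

```python
import random

def prob_strong_after(n, p, start_strong=True):
    """Probability of the strong state g after n periods, given switch probability p."""
    q = 1.0 if start_strong else 0.0
    for _ in range(n):
        # Stay strong with probability 1 - p, or switch from weak with probability p.
        q = q * (1 - p) + (1 - q) * p
    return q

def simulated_share_strong(p, periods=200_000, seed=0):
    """Fraction of periods spent in state g along one simulated path."""
    rng = random.Random(seed)
    state = "g"
    strong = 0
    for _ in range(periods):
        if rng.random() < p:
            state = "b" if state == "g" else "g"
        strong += state == "g"
    return strong / periods

p = 0.2  # hypothetical switching probability
print(prob_strong_after(5, p), simulated_share_strong(p))
```

Because the switching probability is the same in both directions, the probability of the strong state converges to one half whatever the starting state, and a smaller `p` only makes the convergence slower.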
As in the analysis of investment, we assume that the firm maximizes the
current discounted value of future cash flows. Given that the variations of $Z$
are stochastic, the objective of the firm needs to be expressed in terms of the
expected value of future cash flows. To simplify the interpretation of the
transition probability $p$, it is convenient to adopt a discrete-time setup. Assuming
that firms are risk-neutral, we can then write
$$V_t = E_t\left[\sum_{i=0}^{\infty} \left(\frac{1}{1+r}\right)^i \bigl(R(Z_{t+i}, N_{t+i}) - wN_{t+i} - G(\Delta N_{t+i})\bigr)\right], \tag{3.2}$$
where:
• $E_t[\cdot]$ denotes the expected value conditional on the information available
at date $t$ (this concept is defined formally in the chapter’s appendix
within the context of the simple model studied here);
• $r$ is the discount rate of future cash flow, which we assume constant for
simplicity; likewise, $w$ denotes the constant wage that a worker receives
in any given period;
• the total revenues $R(\cdot)$ depend on employment $N$ and a variety of
exogenous factors indexed by $Z_{t+i}$: if the demand for labor is strong
in period $t + i$, then $Z_{t+i} = Z_g$, while if labor demand is weak, then
$Z_{t+i} = Z_b$;
• the function $G(\cdot)$ represents the costs of hiring and firing, or turnover,
which in any given period $t + i$ depends on the net variation
$\Delta N_{t+i} \equiv N_{t+i} - N_{t+i-1}$ of the employment level with respect to the preceding
period; this net variation of employment plays the same role as the
investment level $I(t)$ in the analysis of capital in the preceding chapter.
Exercise 19 To explore the analogy with the investment problem of the previous
chapter, rewrite the objective function of the firm assuming that the turnover costs
depend on the gross variations of employment, and that this does not coincide
with $\Delta N$ because a fraction $\delta$ of the workers employed in each period resign, for
personal reasons or because they reach retirement age, without costs for the firm.
Note also that (3.2) does not feature the price of capital, $P_k$: what could such a
parameter mean in the context of the problems we study in this chapter?
In order to solve the model, we need to specify the functional form of $G(\cdot)$.
As in the case of investment, the adjustment costs may be strictly convex. In
that case, the unit costs of turnover would be an increasing function of the
actual variation in the employment level. This would slow down the optimal
response to changes in the exogenous variables. However, there are also good
reasons to suppose that adjustment costs are concave. For instance, a single
instructor can train more than one recruit, and the administrative costs of a
firing procedure may well be at least partially independent of the number of
workers involved.
The case of linear adjustment costs that we consider here lies in between
these extremes. The simple proportionality between the cost and the amount
of turnover simplifies the characterization of the optimal labor demand
policies. We therefore assume that
$$G(\Delta N) = \begin{cases} (\Delta N)\, H & \text{if } \Delta N \ge 0, \\ -(\Delta N)\, F & \text{if } \Delta N < 0, \end{cases} \tag{3.3}$$
where the minus sign that appears in the $\Delta N < 0$ case ensures that any
variation in employment is costly for positive values of parameters $H$ and $F$.
By (3.3), the firm incurs a cost $H$ for each unit of labor hired, while any unit
of labor that is laid off entails a cost $F$. Both unit costs are independent of the
size of $\Delta N$, and, since $H$ is not necessarily equal to $F$, the model allows for a
separate analysis of hiring and firing costs.
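The piecewise-linear cost in (3.3) translates directly into code. The following is a small illustrative sketch; the function and argument names and the numerical values are assumptions, not notation from the text.

```python
def turnover_cost(delta_n, hiring_cost, firing_cost):
    """Linear adjustment cost G(dN): H per unit hired, F per unit fired."""
    if delta_n >= 0:
        return delta_n * hiring_cost
    # delta_n is negative here, so the minus sign makes the cost positive.
    return -delta_n * firing_cost

# Hypothetical unit costs H = 2 and F = 5.
print(turnover_cost(3, 2.0, 5.0))   # hiring three units
print(turnover_cost(-3, 2.0, 5.0))  # firing three units
print(turnover_cost(0, 2.0, 5.0))   # no turnover, no cost
```

The function is continuous at zero but has a kink there: its slope is the unit hiring cost for positive changes and minus the unit firing cost for negative changes.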
As in the analysis of investment, firms’ optimal actions are based on the
shadow value of labor, defined as the marginal increase in the discounted cash
flow of the firm if it hires one additional unit of labor. When a firm increases
the employment level by hiring an infinitesimally small unit of labor while
keeping the hiring and firing decisions unchanged, the objective function
defined in (3.2) varies by an amount of
$$\lambda_t = E_t\left[\sum_{i=0}^{\infty} \left(\frac{1}{1+r}\right)^i \left(\frac{\partial R(Z_{t+i}, N_{t+i})}{\partial N_{t+i}} - w\right)\right] \tag{3.4}$$
per unit of additional employment. If the employment levels $N_{t+i}$ on the
right-hand side of this equation are the optimal ones, (3.4) measures the marginal
contribution of an infinitesimally small labor input variation around the
optimally chosen one. This follows from the envelope theorem, which implies that
infinitesimally small variations in the employment level do not have first-order
effects on the value of the firm.
3.1.1. OPTIMAL HIRING AND FIRING
To characterize the optimal policies of the firm, we assume that the realization
of $Z_t$ is revealed at the beginning of period $t$, before a firm chooses
the employment level $N_t$ that remains valid for the entire time period.³⁴
Hence,
$$E_t\left[\frac{\partial R(Z_t, N_t)}{\partial N_t} - w\right] = \frac{\partial R(Z_t, N_t)}{\partial N_t} - w.$$
We can separate the first term of the summation in (3.4), whose discount
factor is equal to one, from the remaining terms. To simplify notation, we
define
$$\mu(Z_{t+i}, N_{t+i}) \equiv \frac{\partial R(Z_{t+i}, N_{t+i})}{\partial N_{t+i}},$$
and write
$$\begin{aligned}
\lambda_t &= \mu(Z_t, N_t) - w + E_t\left[\sum_{i=1}^{\infty} \left(\frac{1}{1+r}\right)^i \bigl(\mu(Z_{t+i}, N_{t+i}) - w\bigr)\right] \\
&= \mu(Z_t, N_t) - w + \left(\frac{1}{1+r}\right) E_t\left[\sum_{i=0}^{\infty} \left(\frac{1}{1+r}\right)^i \bigl(\mu(Z_{t+1+i}, N_{t+1+i}) - w\bigr)\right].
\end{aligned}$$
At date $t + 1$ agents know the realization of $Z_{t+1}$, while at $t$ they know only the
probability distribution of $Z_{t+1}$. The conditional expectation $E_{t+1}[\cdot]$ is
therefore based on a broader information set than $E_t[\cdot]$.
³⁴ We could have adopted other conventions for the timing of the exogenous and endogenous stock
variables. For example, retaining the assumption that $N_t$ is determined at the start of period $t$, we
could assume that the value of $Z_t$ is not yet observed when firms take their hiring and firing decisions;
it would be a useful exercise to repeat the preceding analysis under this alternative hypothesis. Such
timing conditions would be redundant in a continuous-time setting, but the elegance of a
reformulation in continuous time would come at the cost of additional analytical complexity in the presence of
uncertainty.
Applying the law of iterative expectations, which is discussed in detail in the
Appendix, we can then write
E t
[ ∞∑
i =0
(
1
1 + r
)i
(Ï( Zt +1+i , Nt +1+i ) − w)
]
= E t
[
E t +1
[ ∞∑
i =0
(
1
1 + r
)i
(Ï( Zt +1+i , Nt +1+i ) − w)
]]
.
Recognizing the definition of Ît +1 in the above expression, we obtain a recurs-
ive relation between the shadow value of labor in successive periods:
Ît = Ï( Zt , Nt ) − w +
1
1 + r
E t [Ît +1]. (3.5)
This relationship is similar to the expression that was obtained by differentiat-
ing the Bellman equation in the appendix to Chapter 1, and is thus equivalent
to the Euler equation that we have already encountered on various occasions
in the preceding chapters.
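The equivalence between the discounted sum and the recursive relation (3.5) can be checked numerically. The following is only a minimal sketch under stated assumptions: it borrows the two-state process with switching probability p used in this chapter, holds the per-period "dividend" (marginal revenue product minus wage) fixed at a hypothetical number in each state, and truncates the infinite sum at a long horizon. The function names and the sample values are illustrative, not part of the text.

```python
def shadow_values_by_sum(d_g, d_b, p, r, horizon=2000):
    """Truncated discounted sum defining the shadow value in each state.

    d_g, d_b: hypothetical per-period dividends (mu - w) in the good/bad state.
    p: probability of switching state each period; r: discount rate.
    Returns (lambda_g, lambda_b).
    """
    disc = 1.0 / (1.0 + r)
    # prob_g[s]: probability of being in the good state i periods ahead,
    # given the current state s (index 0 = good, index 1 = bad).
    prob_g = [1.0, 0.0]
    lam = [0.0, 0.0]
    weight = 1.0  # discount factor (1 / (1 + r)) ** i
    for _ in range(horizon):
        for s in (0, 1):
            lam[s] += weight * (prob_g[s] * d_g + (1.0 - prob_g[s]) * d_b)
        # One-step update of the state probabilities under symmetric switching.
        prob_g = [q * (1.0 - p) + (1.0 - q) * p for q in prob_g]
        weight *= disc
    return lam[0], lam[1]


def recursion_residuals(lam_g, lam_b, d_g, d_b, p, r):
    """Left side minus right side of the recursion (3.5) in each state."""
    disc = 1.0 / (1.0 + r)
    res_g = lam_g - (d_g + disc * ((1.0 - p) * lam_g + p * lam_b))
    res_b = lam_b - (d_b + disc * (p * lam_g + (1.0 - p) * lam_b))
    return res_g, res_b


if __name__ == "__main__":
    lam_g, lam_b = shadow_values_by_sum(0.3, -0.2, p=0.25, r=0.05)
    print(recursion_residuals(lam_g, lam_b, 0.3, -0.2, p=0.25, r=0.05))
```

With a positive discount rate the truncation error vanishes geometrically, so the residuals should be zero up to floating-point error.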
Exercise 20 Rewrite this equation in a way that highlights the analogy between
this expression and the condition rλ = ∂R(·)/∂K + λ̇, which was derived when
we solved the investment problem using the Hamiltonian method.
The optimal choices of the firm are obvious if we express them in terms
of the shadow value of labor. First of all, the marginal value of labor cannot
exceed the costs of hiring an additional unit of labor. Otherwise the firm
could increase profits by choosing a higher employment level, contradicting
the hypothesis that employment maximizes profits. Hence, given that the costs
of a unit increase in employment are equal to H, while the marginal value of
this additional unit is λt, we must have λt ≤ H.
Similarly, if λt < −F, the firm could increase profits immediately by firing
workers at the margin: the immediate cost of firing one unit of labor,
−F , would be more than compensated by an increase in the cash flow of
the firm. Again, this contradicts the assumption that firms maximize profits.
Hence, if the dynamic labor demand of a firm is such that it maximizes (3.2),
we must have
$$-F \le \lambda_t \le H \tag{3.6}$$
for each t. Moreover, either the first or the second inequality turns into an
equality if ΔNt ≠ 0: formally, at an interior optimum for the hiring and
firing policies of a firm, we have dG(ΔNt)/d(ΔNt) = λt.
Whenever the firm prefers to adjust the employment level rather than wait
for better or worse circumstances, the marginal cost and benefit of that action
need to equal each other. If the firm hires a worker we have λt = H, which
implies that the marginal benefit of an additional worker is equal to the hiring
costs. Similarly, if a firm fires workers, it must be true that λt = −F; that is,
the negative marginal value of a redundant worker needs to be compensated
exactly by the cost F of firing this worker. Notice also that the shadow value
of the marginal worker can be negative only if the wage exceeds the value of
marginal productivity.
As in the case of investment, the conditions based on the shadow value
defined in (3.4) are not in themselves sufficient to formulate a solution for
the dynamic optimization problem. In particular, if ∂ R(·)/∂ N depends on N,
then in order to calculate λt as in (3.4) we need to know the distribution of
{Nt +i , i = 0, 1, 2, . . . }, and thus we need to have already solved the optimal
demand for labor. It would be useful if we could study the case in which the
revenues of the firm are linear in N. This would be analogous to the model we
used to show that optimal investment (with convex adjustment costs) could
be based on the average q . However, in this case static labor demand is not
well defined. In Figure 3.1 the value of marginal productivity would give rise
to a horizontal line at the height of w and the optimality conditions would
be satisfied for a continuum of employment levels. In fact, in the case of
investment we saw that the value of capital stock and the size of firms were ill
defined when the average value of q was the only determinant of investment;
to characterize optimal investment decisions, we needed convex adjustment
costs.
These difficulties are familiar from the study of dynamic investment prob-
lems in an environment without uncertainty. In the presence of uncertainty,
even after solving the dynamic optimization problem, we could not assume
that firms know their future employment levels: the evolution of employment
{Nt +i , for i = 0, 1, 2, . . . } depends not only on the passing of time i , but also
on the stochastic realizations of {Zt +i }. To tackle this difficulty, we can use the
fact that a profit-maximizing firm will react optimally to each realization of
this random variable. Hence we can deduce the probability distribution of the
endogenous variable Nt +i from the probability distribution of {Zt +i }.
At this point, the advantage of restricting the state space to two realizations
becomes clear. In what follows we guess that the endogenous variables take
on only two different values depending on the realization of Zt. If Zt = Zg,
then Nt = Ng and λt = λg; on the contrary, if Zt = Zb, then the employment
level is given by Nt = Nb, and its shadow value is equal to λt = λb. When labor
demand is strong, equation (3.5) can therefore be written in the form
$$\lambda_g = \mu(Z_g, N_g) - w + \frac{1}{1+r}\bigl[(1-p)\lambda_g + p\lambda_b\bigr]. \tag{3.7}$$
The shadow value λg is given by the expected discounted shadow value in the
next period plus the “dividend” in the current period, which is equal to the
difference between the value of marginal productivity μ(·) and the wage w.
Given that λt+1 has only two possible values, the expected value in (3.7) is
simply the product of λg and the probability (1 − p) that the state remains
unchanged, plus λb times the probability p that the state changes from good
to bad. Similarly, when labor demand is weak, we can write
$$\lambda_b = \mu(N_b, Z_b) - w + \frac{1}{1+r}\bigl[p\lambda_g + (1-p)\lambda_b\bigr]. \tag{3.8}$$
If each transition from the “strong” to the “weak” state induces a firm to fire
workers, then in order to satisfy (3.6) we need to have λb = −F in bad states,
and λg = H in good states. Given that H and F are constants, λt indeed takes
only two values, as was guessed in order to derive (3.7) and (3.8). Substituting
λb = −F and λg = H in these expressions, we can solve the resulting system
of linear equations to obtain
$$\begin{aligned}
\mu(N_g, Z_g) &= w + p\,\frac{F}{1+r} + (r+p)\,\frac{H}{1+r}, \\
\mu(N_b, Z_b) &= w - (r+p)\,\frac{F}{1+r} - p\,\frac{H}{1+r}.
\end{aligned} \tag{3.9}$$
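The substitution that leads to (3.9) can be verified numerically by going in the opposite direction: start from the marginal revenue products given in (3.9), solve the two linear equations (3.7) and (3.8) for the shadow values, and check that they equal H and −F. This is a sketch under assumed parameter values; the function names are illustrative only.

```python
def marginal_products(w, H, F, p, r):
    """Marginal revenue products in the good and bad state, as in (3.9)."""
    mu_g = w + p * F / (1.0 + r) + (r + p) * H / (1.0 + r)
    mu_b = w - (r + p) * F / (1.0 + r) - p * H / (1.0 + r)
    return mu_g, mu_b


def shadow_values(mu_g, mu_b, w, p, r):
    """Solve (3.7)-(3.8) for (lambda_g, lambda_b) by Cramer's rule."""
    disc = 1.0 / (1.0 + r)
    # Coefficients of the system  A * [lambda_g, lambda_b] = [mu_g - w, mu_b - w].
    a11 = 1.0 - disc * (1.0 - p)
    a12 = -disc * p
    a21 = -disc * p
    a22 = 1.0 - disc * (1.0 - p)
    det = a11 * a22 - a12 * a21  # nonzero whenever r > 0
    d_g, d_b = mu_g - w, mu_b - w
    lam_g = (d_g * a22 - a12 * d_b) / det
    lam_b = (a11 * d_b - a21 * d_g) / det
    return lam_g, lam_b


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    w, H, F, p, r = 1.0, 0.2, 0.3, 0.25, 0.05
    mu_g, mu_b = marginal_products(w, H, F, p, r)
    print(shadow_values(mu_g, mu_b, w, p, r))  # should be close to (H, -F)
```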
3.2. The Dynamics of Employment
The character of the optimal labor demand policy is illustrated in Figure 3.2.
The weak case is associated with a demand curve that lies below the demand
curve in the strong case. Without hiring and firing costs, firms would equalize
the value of marginal productivity to the wage rate w in each of the two
states. Hence, with H = F = 0, the costs of labor are simply equal to w and
employment oscillates between the levels identified by vertical dashed lines
in the figure. If the cost of hiring H and/or the cost of firing F are posi-
tive, this equality no longer holds. If labor demand (Z = Zg ) is strong, the
marginal productivity of labor exceeds the wage rate. Symmetrically, when
labor demand is weak ( Z = Zb ), the value of marginal productivity is less
than the wage. Hence, it looks as if the optimal hiring decisions are based
on a wage that is higher than w, while the firing decisions seem to be based
on one that is lower. The dashed lines in Figure 3.2 illustrate a pair of “shadow
wages” and employment levels that may be compatible with this. The vertical
arrows indicate how these “shadow wages” differ from the actual wage, while
the horizontal arrows indicate the differences between the static and dynamic
employment levels in both states.
Exercise 21 In Figure 3.2 both demand curves for labor are decreasing functions
of employment. That is, we have assumed that ∂²R(·)/∂N² < 0. How would the
problem of optimal labor demand change if ∂²R(Zi, N)/∂N² = 0 for i = b, g?
And if this were true only for i = b?
Figure 3.2. Adjustment costs and dynamic labor demand
Hiring and firing costs reduce the size of fluctuations in the employment
level between good and bad states. As mentioned in the introduction to this
chapter, this very intuitive insight can be brought to bear on empirical evi-
dence from markets characterized by differently stringent employment pro-
tection legislation. In fact, the evidence unsurprisingly indicates that countries
with more stringent labor market regulations feature less pronounced cyclical
variations in employment. This is consistent with the simple model considered
here (which takes wages to be exogenously given and constant) if the “firm”
represents all employers in the economy, since wages in all countries are quite
insensitive to cyclical fluctuations at the aggregate level (see Bertola, 1990,
1999, and references therein).
It is certainly not surprising to find that turnover costs imply employment
stability. If a negative cash flow is associated with each variation in the employ-
ment level, firms optimally prefer to respond less than fully to fluctuations
in labor demand. As indicated by the term labor hoarding, the firm values
its labor force when considering the future as well as the current marginal
revenue product of labor.
Exercise 22 Show that it is optimal for the firm not to hire or fire any worker if
both H and F are large relative to the fluctuations in Z.
To illustrate the role of the various parameters and of the functional form of
R(·), it is useful to examine some limit cases. First of all, we consider the case in
which F = 0 and H > 0: firms can fire workers at no cost, but hiring workers
entails a cost over and above the wage. In order to evaluate how these costs
affect firms’ propensity to create jobs, we rewrite the first-order condition for
the strong labor demand case as
$$\mu(N_g, Z_g) - w = r\,\frac{H}{1+r} + p\,\frac{H}{1+r}. \tag{3.10}$$
The first term on the right-hand side of this expression can be interpreted
as a pure financial opportunity cost. If invested in an alternative asset with
interest rate r , the hiring cost would yield a perpetual flow of dividends equal
to r H from next period onwards, or, equivalently, a flow return of r H/(1 + r )
starting this period. Hence, if the good state lasts for ever and p = 0, the
presence of hiring costs simply corresponds to a higher wage rate. If, on
the contrary, the future evolution of labor demand is uncertain and p > 0,
the hiring costs also influence the employment level via the second term on the
right of (3.10). The higher is p, the less inclined are firms to hire workers. The
explanation is that firms might lose the resources invested in hiring a worker
if this worker is laid off when labor demand switches from the good to the
bad state. In the limit case with p = 1, labor demand oscillates permanently
between the two states and (3.10) simplifies to μ(Ng, Zg) = w + H: since the
marginal unit of labor that is hired in a good state is fired with probability one
in the next period, we need to add the entire hiring cost to the salary.
In periods with weak labor demand, the firm does not hire and hence does
not incur any hiring cost. Nonetheless, the firm’s choices are still influenced
by H : the employment level in the bad state needs to satisfy the following
condition:
$$\mu(N_b, Z_b) = w - p\,\frac{H}{1+r}. \tag{3.11}$$
In this equation a higher value of H is equivalent to a lower wage flow. This
may seem surprising, but is easily explained. Retaining one additional unit of
labor in the bad state costs the firm w, but the firm saves the cost of hiring an
additional unit of labor in the next period if the demand conditions improve,
which occurs with probability p.
The reasoning for the case H = 0 and F > 0 is similar. In periods with weak
labor demand,
$$\mu(N_b, Z_b) = w - (r+p)\,\frac{F}{1+r}. \tag{3.12}$$
The firing cost F —which is saved if the firm decides not to fire a marginal
worker—is equivalent to a lower wage in periods with weak labor demand.
Conversely, in periods of strong labor demand we have
$$\mu(N_g, Z_g) = w + p\,\frac{F}{1+r}, \tag{3.13}$$
and in this case the firing costs have the same effect as a wage increase: the
fear that the firm may have to pay the firing cost if (with probability p) labor
demand weakens in the next period deters the firm from hiring. Like hiring
costs, firing costs therefore induce labor hoarding on the part of firms. In the
case of firing costs, the firm values the units of labor it decides not to fire:
moreover, the fear that the firm may not be able to reduce employment levels
enough in periods with weak labor demand deters firms from hiring workers
in good states.
Before turning to further implications and applications of these simple
results, it is worth mentioning that qualitatively similar insights would of
course be valid in more formally sophisticated continuous-time models, such
as those introduced in Chapter 2’s treatment of investment. Convex adjust-
ment costs are not a particularly realistic representation of real-life employ-
ment protection legislation, but it is conceptually simple to let downward
adjustment be costly (rather than impossible, or never profitable) in the irre-
versible investment models introduced in Sections 2.6 and 2.7.
Readers familiar with that material may wish to try the following exercises,
which propose relatively simple versions of the models solved in the references
given. Such readers, however, should be warned that both settings only yield
a set of equations whose solutions have to be sought numerically, thus illus-
trating the advantages in terms of tractability of the Markov chain methods
discussed in this chapter.
Exercise 23 (Bertola, 1992) Let time be continuous. Suppose labor’s revenue is
given by
$$R(L, Z) = Z\,\frac{L^{1-\beta}}{1-\beta}, \qquad 0 < \beta < 1,$$
and let the cyclical index Z be the following trigonometric function of time:
$$Z(\tau) = K_1 + K_2 \sin\left(\frac{2\pi}{p}\,\tau\right), \qquad K_1 > K_2 > 0.$$
Discuss the possible realism of such perfectly predictable cycles, and outline the
optimality conditions that must be obeyed over each cycle by the optimal employ-
ment path if the wage is given at w and the employer faces adjustment costs
C(L̇(τ))L̇(τ), where C(x) = h if x > 0 and C(x) = −f if x < 0.
Exercise 24 (Bentolila and Bertola, 1990). Let the dynamics of the exogenous
variables relevant for labor demand be given by
$$dZ(t) = \theta Z(t)\,dt + \sigma Z(t)\,dW(t),$$
and let the marginal revenue product of labor be written in the form $Z L^{-\beta}$. The
wage is given at w, hiring is costless, firing costs f per unit of labor, and workers
quit costlessly at rate δ so that dL(t) = −δL(t) dt if the firm neither hires nor fires
at time t . Write the optimality conditions for the firm’s employment policy and
discuss how a solution may be found.
3.3. Average Long-Run Effects
We have seen that positive values of H and F reduce a firm’s propensity to
hire and fire workers. Adjustment costs therefore reduce fluctuations in the
employment level. Their effect on the average employment level is less clear-
cut. This depends essentially on the magnitude of the increase in employment
in periods with strong labor demand, relative to the decrease in employment
in periods with a weak labor demand. In general, either of the two effects
may dominate. The net effect on average employment is therefore a priori
ambiguous and depends, as we will see, on two specific elements of the model:
on the one hand, that firms discount future cash flows at a positive rate, and on
the other hand, that optimal static labor demand is often a non-linear function
of the wage and of aggregate labor market conditions denoted by Z .
Since transitions between strong and weak states are symmetric, the ergodic
distribution is very simple: as shown in the appendix to this chapter, the
probability that we observe weak labor demand in a period indefinitely far
away in the future is independent of the current state. Hence, in the long run,
both states have equal probability. Assigning a probability of one-half to each
of the two first-order conditions in (3.9), we can calculate the average value of
the marginal productivity of labor:
$$\frac{\mu(N_g, Z_g) + \mu(N_b, Z_b)}{2} = w + \frac{r}{2}\,\frac{H - F}{1+r}. \tag{3.14}$$
If r > 0, then the costs of hiring tend to increase the value of marginal productivity
in the long run: intuitively, the quantity ½ rH/(1 + r) is added to
the wage w, because in half of the periods the firm pays a cost H to hire the
marginal unit of labor. In doing so, the firm forgoes the flow proceeds rH that
would accrue from next period onwards if it had invested H in a financial
asset. The effects of firing costs F are similar, but perhaps less intuitive.
If F > 0 and the discount rate r is positive, then average marginal productivity is
reduced by an amount equal to ½ rF/(1 + r). To understand how a higher
cost F may reduce marginal productivity despite the increase in labor costs,
it is useful to note that this effect is absent if r = 0. Hence, the reduction
in marginal productivity is a dynamic feature. Because the firm discounts
future revenues, the cash flow in different periods is not equivalent: in periods
with weak labor demand, firing costs increase the willingness of a firm to pay
any given wage level by more than they reduce this willingness in periods with
strong labor demand, when only the smaller discounted value of the firing cost
is taken into consideration.
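The asymmetry described above can be made concrete with a small calculation of the gap between marginal revenue product and wage in each state, taken from (3.9), and of its long-run average, which should reproduce (3.14). This is an illustrative sketch; the function names and parameter values are assumptions made for the example.

```python
def wage_wedges(H, F, p, r):
    """Gap between marginal revenue product and wage in each state, from (3.9)."""
    wedge_g = (p * F + (r + p) * H) / (1.0 + r)    # strong demand: mu_g - w
    wedge_b = -((r + p) * F + p * H) / (1.0 + r)   # weak demand:   mu_b - w
    return wedge_g, wedge_b


def average_wedge(H, F, p, r):
    """Long-run average gap when both states are equally likely."""
    wedge_g, wedge_b = wage_wedges(H, F, p, r)
    return 0.5 * (wedge_g + wedge_b)


if __name__ == "__main__":
    # Hypothetical values: firing costs only, positive discount rate.
    H, F, p, r = 0.0, 0.3, 0.25, 0.05
    print(wage_wedges(H, F, p, r))
    print(average_wedge(H, F, p, r), 0.5 * r * (H - F) / (1.0 + r))
```

With H = 0 and r > 0 the downward gap in the weak state exceeds the upward gap in the strong state in absolute value, and the two gaps offset each other exactly when r = 0.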
Graphically, with a positive value of r , firing costs are more important
than hiring costs in the determination of the length of the arrow that points
downwards in Figure 3.2. Conversely, hiring costs are more important in the
determination of the length of the arrows that point upwards. Considering the
employment levels associated with each level of the (shadow) wage, we can
conclude that the positive impact of firing costs on low levels of employment
is more pronounced than their negative impact on the employment level in
the good state.
3.3.1. AVERAGE EMPLOYMENT
Figure 3.2 shows that variations in employment levels depend not only on
differences between marginal products in the two cases and the wage, but also
on the slope of the demand curve. If, as is the case in the figure, the slope
of the demand curve is much steeper in the good state than in the bad state,
the relative length of the two horizontal arrows can be such as to imply net
employment effects that differ from what is suggested by the shadow wages in
the two states. To isolate this effect, it is useful to set r = 0. In that case optimal
demand maximizes the average rather than the actual value of the cash flow,
and (3.14) then simplifies to
$$\frac{\mu(N_g, Z_g) + \mu(N_b, Z_b)}{2} = w. \tag{3.15}$$
The turnover costs no longer appear in this expression. This indicates that a
firm can maximize average profits by setting the average value of marginal
productivity equal to wages. The average equality does not imply that both
terms are necessarily the same. In fact, rewriting the conditions in (3.9) for
the case in which r = 0 gives
$$\begin{aligned}
\mu(N_g, Z_g) &= w + p(F + H), \\
\mu(N_b, Z_b) &= w - p(F + H).
\end{aligned} \tag{3.16}$$
Hence the firm imputes a share p of the total turnover costs that it incurs along
a completed cycle to the marginal unit on the hiring and the firing margin.
Exercise 25 Discuss the case in which the firm receives a payment each time it
hires a worker, for example because the state subsidizes employment creation, and
H = −F . What would happen if the cost of hiring were so strongly negative that
H + F < 0 even in the case of firing costs F ≥ 0?
Even in the case when r = 0 and hiring and firing costs do not affect the
expected marginal productivity of labor, the effect of adjustment costs on
average employment is zero only when the slope of the labor demand curve
is constant. In fact, if
$$\mu(N, Z_g) = f(Z_g) - \beta N, \qquad \mu(N, Z_b) = g(Z_b) - \beta N,$$
then, for any pair of functions f(·) and g(·), the relationships in (3.16) imply
$$N_g = \frac{f(Z_g) - w - p(F + H)}{\beta}, \qquad N_b = \frac{g(Z_b) - w + p(F + H)}{\beta}.$$
Hence, in this case average employment,
$$\frac{N_g + N_b}{2} = \frac{f(Z_g) + g(Z_b) - 2w}{2\beta},$$
coincides with the employment level that would be generated by the (wider)
fluctuations that would keep the marginal productivity of labor always equal
to the wage rate.
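A short numerical illustration of this neutrality result, assuming a common slope in both states and r = 0, is sketched below. The intercepts stand in for f(Zg) and g(Zb); all names and numbers are hypothetical.

```python
def employment_linear(intercept_g, intercept_b, w, beta, p, H, F):
    """Employment in each state implied by (3.16) when r = 0 and
    mu(N, Z) = intercept - beta * N with the same slope in both states."""
    turnover = p * (F + H)
    n_g = (intercept_g - w - turnover) / beta
    n_b = (intercept_b - w + turnover) / beta
    return n_g, n_b


if __name__ == "__main__":
    # Hypothetical intercepts f(Zg) = 3 and g(Zb) = 2, wage 1, slope 0.5.
    flexible = employment_linear(3.0, 2.0, 1.0, 0.5, p=0.25, H=0.0, F=0.0)
    rigid = employment_linear(3.0, 2.0, 1.0, 0.5, p=0.25, H=0.2, F=0.3)
    for n_g, n_b in (flexible, rigid):
        # Print the average level and the size of the fluctuation.
        print((n_g + n_b) / 2, n_g - n_b)
```

Turnover costs narrow the gap between Ng and Nb, while the average stays at the level given by the last displayed formula.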
Conversely, if the slope of the labor demand curve depends on the employ-
ment level and/or on Z , then the average of Ng and Nb that satisfies (3.16) for
H + F > 0, and thus (3.15), is not equal to the average of the employment
levels that satisfy the same relationships for H = F = 0. The mechanism by
which nonlinearities with respect to N generate mean effects, even in the
case where r = 0, is similar to the one encountered in the discussion of the
effects of uncertainty on investment in Chapter 2. If y = μ(N; Z) is a convex
function in its first argument, then the inverse μ⁻¹(y; Z) is also convex, where
N = μ⁻¹(y; Z). For each given value of Z, therefore, Jensen’s inequality implies
that
$$\frac{\mu(x; Z) + \mu(y; Z)}{2} > \mu\left(\frac{x + y}{2}; Z\right).$$
As illustrated in Figure 3.3, this means that, if deviations from the wage in
(3.16) occurred around a stable marginal revenue product of labor function,
that function’s convexity would imply that employment fluctuations average
to a lower level, because the lower N associated with a given productivity
increase is larger in absolute value than the employment increase associated
with a symmetric productivity decline.
Exercise 26 Suppose that r = 0, so that (3.15) holds, and that μ(N, Zg) =
f(Zg) + β(N) and μ(N, Zb) = g(Zb) + β(N) for a decreasing function β(·)
which does not depend on Z. Discuss the relationship between variations of
employment and its average level.
In general, the functional form of the labor demand function need not be
constant and may depend on the average conditions of the labor market. The
shape of labor demand may depend not only on N, but also on Z . Hence,
Jensen’s inequality does not suffice to pin down an unambiguous relationship
between the convexity of the demand function in each of the states and the
average level of employment. State dependency of the functional form of
labor demand is therefore an additional (and ambiguous) element in the
determination of average employment.
Figure 3.3. Nonlinearity of labor demand and the effect of turnover costs on average
employment, with r = 0
Exercise 27 Consider the case where μ(N, Zg) = Zg − βN and μ(N, Zb) =
Zb − γN, and where β and γ satisfy β > γ > 0. What is the general effect of
firing costs on the average employment level? And what is its effect in the limit
case with r = 0? Why can’t we analyze this effect in the limit case with γ = 0 as
in exercise 21?
3.3.2. AVERAGE PROFITS
In summary, average employment is very mildly and ambiguously related to
turnover costs and, in particular, to firing costs. This is consistent with empir-
ical evidence across countries characterized by differently stringent employ-
ment protection legislation, in that it is hard to find convincing effects of such
legislation on average long-run unemployment when other relevant factors
(such as the upward pressure on wages exercised by unions) are appropriately
taken into account (see Bertola 1990, 1999, and references therein).
If not for employment levels, one can obtain unambiguous results for the
average profits of the firm, or, more precisely, the average of the objective
function in (3.2). Defined in this chapter as the surplus of the revenues of
the firm over the total cost of labor, that function could obviously also include
costs that are not related to labor, like the compensation of other factors of
production. The negative slope of the demand curve for labor implies that
a firm’s revenues would exceed the costs of labor in a static environment if
all units of labor were paid according to marginal productivity (the striped
area in Figure 3.4). Since total revenues correspond to the area below the
marginal revenue curve, this surplus is given by the dotted area in Figure 3.4.
Figure 3.4. The employer’s surplus when marginal productivity is equal to the wage
The same negative slope guarantees that the dynamic optimization problem
studied above has a well defined solution, and that the firm’s surplus is smaller
when turnover costs are larger—not only when these costs are associated with
a lower average employment level, but also when the adjustment costs induce
an increase in the average employment level of the firm.
To illustrate these (general) results, we shall consider the simple case of a
linear demand curve for labor: with μ(N, Z) = Z − βN, the total revenues
associated with given values of Z and N are simply given by R(N, Z) = ZN −
½βN². Since the surplus R(N, Z) − wN is maximized when N = N* = (Z −
w)/β and the marginal return from labor coincides with the wage, the first-
order term is zero in a Taylor expansion of the surplus around the optimum.
In the case considered here, all terms of order three and above are also zero,
and from
$$R(N, Z) - wN = R(N^*, Z) - wN^* + \frac{1}{2}\left.\frac{\partial^2 [R(N, Z) - wN]}{\partial N^2}\right|_{N^*}(N - N^*)^2,$$
we can conclude that the choice of an employment level N ≠ N* implies a loss of
surplus equal to ½β(N − N*)².
As a result of hiring and firing costs, firms choose employment levels that
differ from those that satisfy the static optimality conditions and thus
accept lower flow returns. In the case examined here, the marginal produc-
tivity of labor is a linear function and optimal employment levels can easily be
derived from (3.9):
$$N_g = \left(Z_g - w - \frac{pF + (r+p)H}{1+r}\right)\frac{1}{\beta}, \qquad
N_b = \left(Z_b - w + \frac{(r+p)F + pH}{1+r}\right)\frac{1}{\beta}.$$
Hence, the surplus falls short of the static optimum by a quantity equal to
$$\left(\frac{pF + (r+p)H}{1+r}\right)^2 \frac{1}{2\beta}$$
in the strong case, and by
$$\left(\frac{(r+p)F + pH}{1+r}\right)^2 \frac{1}{2\beta}$$
in the weak case.
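These expressions can be checked directly by evaluating the static surplus ZN − ½βN² − wN at the employment levels chosen with and without turnover costs. The sketch below assumes the linear demand curve μ(N, Z) = Z − βN used in this subsection; function names and parameter values are hypothetical.

```python
def static_surplus(n, z, w, beta):
    """Revenue z*n - beta*n**2/2 minus the wage bill w*n (turnover costs excluded)."""
    return z * n - 0.5 * beta * n * n - w * n


def employment_with_turnover(z_g, z_b, w, beta, H, F, p, r):
    """Employment in the good and bad state implied by (3.9) with linear demand."""
    wedge_g = (p * F + (r + p) * H) / (1.0 + r)
    wedge_b = ((r + p) * F + p * H) / (1.0 + r)
    n_g = (z_g - w - wedge_g) / beta
    n_b = (z_b - w + wedge_b) / beta
    return n_g, n_b


def surplus_losses(z_g, z_b, w, beta, H, F, p, r):
    """Static surplus forgone in each state relative to N* = (Z - w) / beta."""
    n_g, n_b = employment_with_turnover(z_g, z_b, w, beta, H, F, p, r)
    loss_g = static_surplus((z_g - w) / beta, z_g, w, beta) - static_surplus(n_g, z_g, w, beta)
    loss_b = static_surplus((z_b - w) / beta, z_b, w, beta) - static_surplus(n_b, z_b, w, beta)
    return loss_g, loss_b


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(surplus_losses(3.0, 2.0, 1.0, 0.5, H=0.2, F=0.3, p=0.25, r=0.05))
```

Each loss should equal the squared wedge divided by 2β, in line with the loss ½β(N − N*)² derived from the Taylor expansion.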
Given the presence of turnover costs, it is rational for the firm to accept these
static losses, because the smaller variations in employment permit the firm
to save expenses on hiring and firing costs. But even though firms correctly
weigh the marginal loss of revenues and the costs of turnover, the firm does
experience the lower revenues and adjustment costs. Hence, both average
profits and the optimized value of the firm are necessarily lower in the presence
of turnover costs, and this can have adverse implications for the employers’
investment decisions.
3.4. Adjustment Costs and Labor Allocation
In this section we shift attention from the firms to workers, and we analyze
the factors that determine the equilibrium value of wages in this dynamic
environment. If the entire aggregate demand for labor came from a single firm,
then wages and aggregate employment should fluctuate along a curve that is
equally “representative” of the supply side of the labor market. Looking at the
implications of hiring and firing restrictions from this aggregate perspective
suggests that the increased stability of wages and employment around a more
or less stable average may or may not be desirable for workers. Moreover,
these costs reduce the surplus of firms, which in turn may have a negative
impact on investment and growth. Here readers should remember the results
of Chapter 2, which showed that a higher degree of uncertainty increased
firms’ willingness to invest as long as labor was flexible. Conversely, the rigidity
of employment due to turnover costs can therefore be expected to reduce
investment.
Obviously, however, it is not very realistic to interpret variations in aggre-
gate employment in terms of a more or less intense use of labor by a represen-
tative agent. In fact, real wages are more or less constant along the business
cycle, making it very difficult to interpret the dynamics of employment in
terms of the aggregate supply of labor. Moreover, unemployment is typically
concentrated within some subgroups of the population. Higher firing costs are
associated with a smaller risk of employment loss and therefore have impor-
tant implications when, as is realistic, losing one’s job is painful (because real
wages do not make agents indifferent to employment). In order to concentrate
on these disaggregate aspects, it is instructive to consider the implications of
adjustment costs for the flow of employment between firms subject to the type
of demand shocks analyzed above. To abstract from purely macroeconomic
phenomena, it is useful to assume that there is such a large number of firms
that the law of large numbers holds, so that exactly half of the firms are
in the good state in any period. The same arguments used to compute the
ergodic distribution of a single firm imply that, if the transition probability
is the same for all firms, and if transitions are independent events, then the
aggregate distribution of firms is stable over time. In fact, if we denote the
share of firms with a strong demand by Pt, then a share pPt of all firms
will move from this group to the state with a low demand. Hence, if the transitions
of firms are independent events, the realized share of firms that is hit by a decline
in demand approaches its expected value as the number of firms grows.³⁵
Symmetrically, we can expect that a share p of the 1 − Pt firms in the bad state
receive a positive shock. The inflow of firms into the ranks of the firms with strong
product demand is thus equal to a share p − pPt of the total number of firms
if the latter is infinitely large. Since Pt falls by pPt and
rises by p(1 − Pt), the variation in the fraction Pt of firms
with strong product demand is given by
$$P_{t+1} - P_t = p - pP_t - pP_t = p(1 - 2P_t). \tag{3.17}$$
This expression is positive if Pt < 0.5, negative if Pt > 0.5, and equal to zero if
Pt = P∞ = 0.5. Hence, the frequency distribution of a large number of firms
tends to stabilize at P = 0.5, as does the probability distribution of a single
firm (discussed in the chapter’s appendix).
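The convergence implied by (3.17) is easy to trace numerically. The following sketch iterates the difference equation from an arbitrary starting share; the function name, the starting value, and the switching probability are assumptions chosen for illustration.

```python
def strong_share_path(p0, p, periods):
    """Iterate P_{t+1} = P_t + p * (1 - 2 * P_t), i.e. equation (3.17).

    p0: initial share of firms with strong demand.
    p: probability that a firm switches state in a period.
    Returns the list [P_0, P_1, ..., P_periods].
    """
    path = [p0]
    for _ in range(periods):
        current = path[-1]
        path.append(current + p * (1.0 - 2.0 * current))
    return path


if __name__ == "__main__":
    # Hypothetical example: 90% of firms start in the good state, p = 0.1.
    for t, share in enumerate(strong_share_path(0.9, 0.1, 10)):
        print(t, round(share, 4))
```

Rearranging (3.17) gives P_{t+1} − 0.5 = (1 − 2p)(P_t − 0.5), so the distance from one-half shrinks by the factor (1 − 2p) each period.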
Exercise 28 What is the role of p in (3.17)? Discuss the case p = 0.5.
³⁵ Imagine that the “relevant states of nature” are represented by the outcome of a series of coin
tosses. Associate the value one with the outcome “heads” and zero with “tails,” so that the resulting
random variable X has expected value 1/2 and variance 1/4. The fraction of Xi = 1 with n tosses,
$P_n = \sum_{i=1}^{n} X_i / n$, has expected value 1/2, and, if the realizations are independent, its
variance (1/n²)(n/4) = 1/(4n) decreases with n. Hence, in the limit with n → ∞, the variation is zero
and P∞ = 0.5 with certainty.
Figure 3.5. Dynamic supply of labor from downsizing firms to expanding firms, without
adjustment costs
This analogy between the probability and frequency distributions is valid
whenever a large population of agents faces “idiosyncratic uncertainty,” and
not just in the simple case described above. The “idiosyncratic” character of
uncertainty means that individual agents are hit by independent events. With
a large enough number of agents, the flows into and out of a certain state will
then cancel each other out and the frequency distribution of these states will
tend to converge to a stable distribution.
Exercise 29 Assume that the probability of a transition from b to g is still given
by p, while the probability of a transition in the opposite direction is now allowed
to be q ≠ p. What is the steady-state proportion of firms in state g?
In the steady state with idiosyncratic uncertainty, in which Pt +1 = Pt = 0.5,
each time a firm incurs a negative shock, another firm will incur a positive
shock to labor productivity. Notice that this does not rely on a causal relation-
ship between these events. That is, given that the demand shocks are assumed
to be idiosyncratic, the above simultaneity does not refer to a particular other
firm. We do not know which particular firm is hit by a symmetric shock,
but we do know that there are as many firms with strong and weak product
demand. It is therefore the relative size of these two groups that is constant
over time, while the identity of individual firms belonging to each group
changes over time.
As before, the downward-sloping curves in Figure 3.5 correspond to the
two possible positions of the demand curve for labor. Owing to the linearity
of these curves, we can directly translate predictions in terms of wages into
predictions about employment, abstracting from relatively unimportant
effects deriving from Jensen’s inequality. The length of the horizontal axis
represents the total labor force that is available to firms. The workers who
are available for employment within a hiring firm are those who cannot find
employment elsewhere—and, in particular, those who decided to leave their
jobs in firms that are hit by a negative shock and are firing workers. The
dotted line in the figure represents the labor demand by one such firm which
is measured from right to left, that is in terms of residual employment after
accounting for employment generated by firms with a strong demand.
The workers who move from a shrinking firm to an expanding firm lose
their employment in the first firm. The alternative wage of workers who are
hired by expanding firms is therefore given by the demand curve for labor of
downsizing firms, which essentially plays the role of an aggregate supply curve
of labor. Hence, in the absence of firing costs, the equilibrium will be located
at point E ∗ in Figure 3.5, at which the marginal productivity is the same in all
firms and is equal to the common wage rate w.
3.4.1. DYNAMIC WAGE DIFFERENTIALS
As noted in the introduction to this chapter, it is certainly not very realistic to
assume that labor mobility is costless for workers. Therefore we shall assume
here that workers need to pay a cost κ each time they move to a new job.
In reality, these costs could correspond at least partly to the loss of income
(unemployment); however, for simplicity we shall assume that labor mobility
is instantaneous. The objective in the dynamic optimization program of work-
ers is to maximize the net expected income from work—given by the wage wt
in periods in which the worker remains with her current employer, and by
wt − κ in the other periods. Denoting the net expected value of labor income
(or “human capital”) of individual j by W^j_t, we have the following relation:
\[
W_t^j =
\begin{cases}
w_t^j + \dfrac{1}{1+r} E_t\left(W_{t+1}^j\right) & \text{if she does not move,} \\[4pt]
w_t^j - \kappa + \dfrac{1}{1+r} E_t\left(W_{t+1}^j\right) & \text{if she moves to a new job.}
\end{cases}
\tag{3.18}
\]
Notice that each individual worker can be in two states only. At the begin-
ning of a period a worker may be employed by a firm with a strong demand for
labor, in which case the worker can earn wg without having to incur mobility
costs. Since a firm in state g may receive a negative shock with probability p,
the human capital Wg of each of its workers satisfies the following recursive
relationship:
\[
W_g = w_g + \frac{1}{1+r}\left[p W_b + (1-p) W_g\right], \tag{3.19}
\]
where Wb denotes the human capital of a worker employed by a firm with
weak demand. The human capital of these workers satisfies the relationship
\[
W_b = w_b + \frac{1}{1+r}\left[p W_g + (1-p) W_b\right] \tag{3.20}
\]
if the worker chooses to remain with the same firm. In this case the worker
earns a wage wb , which, as we will see, is generally lower than wg . Because
a transition to the bad state is accompanied by a wage reduction, it pays
the worker to consider a move to a firm in the good state. In the long-run
equilibrium there is a constant fraction of these firms in the economy. Hence,
each time a firm incurs a negative shock, there is another firm that incurs
a positive shock and will be willing to hire the workers who choose to leave
their old firm. For these workers, (3.18) implies that
\[
W_b = w_g - \kappa + \frac{1}{1+r}\left[(1-p) W_g + p W_b\right]. \tag{3.21}
\]
Moving to a good firm g entails a cost κ, but, since the move is instantaneous,
it immediately entitles the worker to a wage wg and to consider the
future from the perspective of a firm with strong demand—which is different
from the firms considered in (3.20), since the probability is 1 − p rather
than p that state g will be realized next period. Since the option to move is
available to all workers, the two alternatives considered in (3.20) and (3.21)
need to be equivalent; otherwise there would be an arbitrage opportunity
inducing all or none of the workers to move. Both of these outcomes would be
inconsistent with equilibrium. From the equality between (3.20) and (3.21),
we can immediately obtain
\[
w_g - w_b = \kappa - \frac{1-2p}{1+r}\left(W_g - W_b\right). \tag{3.22}
\]
If p = 0.5, the wage differential between expanding and shrinking firms is
exactly equal to κ, the cost for a worker of moving between any two firms
in a period. But if p < 0.5, that is if shocks to demand are persistent, then
(3.22) takes into account the capital gains Wg − Wb from mobility. Subtract-
ing (3.20) from (3.19) and using (3.22), we obtain
\[
W_g - W_b = \kappa. \tag{3.23}
\]
In equilibrium, the cost of mobility needs to be equal to the gain in terms
of higher future income. Substituting (3.23) into (3.22), we obtain an explicit
expression for the difference between the flow salaries in the two states:
\[
w_g - w_b = \frac{2p + r}{1+r}\,\kappa. \tag{3.24}
\]
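The algebra leading to (3.23) and (3.24) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the parameter values and the helper name human_capital are illustrative assumptions. It sets the bad-state wage according to (3.24), solves (3.19) and (3.20) as a two-equation linear system, and then verifies that the human capital gap equals the mobility cost and that a mover, valued as in (3.21), is exactly as well off as a stayer.

```python
# Sketch: numerical check of the no-arbitrage conditions (3.19)-(3.24).
# Parameter values and the function name are illustrative assumptions.

def human_capital(w_g, kappa, p, r):
    """Return (w_b, W_g, W_b) implied by (3.19), (3.20) and (3.24)."""
    d = 1.0 / (1.0 + r)                              # one-period discount factor
    w_b = w_g - (2.0 * p + r) / (1.0 + r) * kappa    # bad-state wage from (3.24)
    # (3.19)-(3.20) rewritten as: a*W_g - b*W_b = w_g and -b*W_g + a*W_b = w_b
    a = 1.0 - d * (1.0 - p)
    b = d * p
    det = a * a - b * b
    W_g = (a * w_g + b * w_b) / det                  # Cramer's rule
    W_b = (b * w_g + a * w_b) / det
    return w_b, W_g, W_b

w_g, kappa, p, r = 1.0, 0.2, 0.3, 0.05
w_b, W_g, W_b = human_capital(w_g, kappa, p, r)

d = 1.0 / (1.0 + r)
value_if_moving = w_g - kappa + d * ((1.0 - p) * W_g + p * W_b)  # right side of (3.21)

print("W_g - W_b        :", W_g - W_b)               # should match kappa, as in (3.23)
print("mover minus stayer:", value_if_moving - W_b)  # should be zero up to rounding
```

Any other wage differential would make one of the two options strictly better, which is the arbitrage argument used in the text.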
As mentioned above, firms in the good state pay a higher wage if mobility
is voluntary and costly for workers. Equilibrium is illustrated in Figure 3.6:
Figure 3.6. Dynamic supply of labor from downsizing firms to expanding firms, without
employers’ adjustment costs, if mobility costs κ per unit of labor
in order to offer a higher salary, firms in state g employ fewer workers than
in Figure 3.5, where we assumed that labor mobility was costless. Intuitively,
workers are willing to bear the cost κ only if there are advantages associated
with mobility, and the market can offer this advantage in terms of a higher
wage. As in the case of the hiring and firing costs for firms, workers face a
trade-off between the maximization of the static flow income—which would
be obtained at point E ∗ in Figure 3.5—and minimizing the costs of mobility—
which obviously would be zero if employment at each firm were completely
stable, and if we considered a uniform allocation of labor across firms
without taking into account the differences in idiosyncratic productivity. The
equilibrium allocation illustrated in Figure 3.6, in which (3.24) is satisfied,
balances two requirements: the shaded area represents the loss of flow output
in each period, which is such that it exactly offsets the mobility costs that
would have to be incurred to move the economy closer to E ∗.
This modeling perspective has interesting empirical implications. Wage
dispersion should be more pronounced in situations of higher uncertainty
for given workers’ mobility costs, and when workers bear larger mobility
costs. Bertola and Ichino (1995) and Bertola and Rogerson (1997) find that
these implications offer useful insights when comparing disaggregate wage
and employment dispersion statistics from different countries and different
periods. From a methodological point of view, it is important to note that the
relationships between the various endogenous variables that are implied by the
optimal dynamic mobility choices of workers satisfy non-arbitrage conditions
of a financial nature. If a single worker intends to maximize the net expected
value of her future income, then the labor market needs to offer workers who
decide to move the appropriate increase in wages to make this “investment”
profitable.
APPENDIX A3: (TWO-STATE) MARKOV PROCESSES
We now illustrate some of the techniques that are applicable to stochastic processes in
the form
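\[
x_{t+1} =
\begin{cases}
x_b & \text{with prob. } p \text{ if } x_t = x_g, \text{ with prob. } 1-p \text{ if } x_t = x_b, \\
x_g & \text{with prob. } p \text{ if } x_t = x_b, \text{ with prob. } 1-p \text{ if } x_t = x_g,
\end{cases}
\tag{3.A1}
\]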
the simple Markov chain that describes the evolution of all of the endogenous and
exogenous variables in this chapter.
A3.1. Conditional probabilities
Let Pt,t+i = Probt (xt+i = xg |It ) denote the probability, based on all the information
available at time t, that the realization (or ‘the actual level’) of the process at t + i equals
xg. From (3.A1) it is clear that
\[
P_{t,t+1} =
\begin{cases}
p & \text{if } x_t = x_b, \\
1-p & \text{if } x_t = x_g.
\end{cases}
\tag{3.A2}
\]
Figure A3.1 illustrates how we can compute this probability for i > 1. If the process
starts from xg at t = 1, probability P1,3 is given by the sum of the two paths that are
consistent with x3 = xg : the first one, which is constant, has probability (1 − p)^2; the
second one, in which we observe two consecutive variations of opposite sign, has probability p^2.
Hence, P1,3 = (1 − p)^2 + p^2 = 1 − 2p(1 − p) if x1 = xg . Similar reasoning implies
that P1,3 = 2 p(1 − p) if x1 = xb .
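The path-counting argument can be reproduced by brute force. The sketch below is only an illustration under assumed values (p = 0.3 and the function name prob_state_g are not from the text): it enumerates every path of the chain in (3.A1), weights each transition by p if the state switches and by 1 − p if it stays, and sums the probabilities of the paths ending in state g.

```python
from itertools import product

def prob_state_g(start, steps, p):
    """Probability of being in state 'g' after `steps` transitions,
    computed by enumerating all paths of the two-state chain (3.A1)."""
    total = 0.0
    for path in product("gb", repeat=steps):
        prob, prev = 1.0, start
        for state in path:
            # a switch has probability p, staying put has probability 1 - p
            prob *= p if state != prev else 1.0 - p
            prev = state
        if path[-1] == "g":
            total += prob
    return total

p = 0.3  # illustrative value
print(prob_state_g("g", 2, p), 1 - 2 * p * (1 - p))  # P_{1,3} starting from x_g
print(prob_state_g("b", 2, p), 2 * p * (1 - p))      # P_{1,3} starting from x_b
```

The two numbers on each line should coincide, and the two starting states give probabilities that sum to one because the chain is symmetric.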
Figure A3.1. Possible time paths of a two-state Markov chain
Using similar techniques, we could calculate the probability of observing xg at each
date i > 2, starting from xb or xg . However, it is not necessary to do so in order to
understand that all conditional probabilities from the point of view of period t are
functions of xt . In fact,
Pt,t +1 ≡ Probt (xt +1 = xg |It ) = p if xt = xb ,
Pt,t +1 = 1 − p if xt = xg ,
and the two possible values of Pt,t+1 are different if p ≠ 1 − p, that is if p ≠ 0.5.
Conversely, any other information available at t is irrelevant for the evaluation of both
Probt (xt +1 = xg ) and Pt,t +i for i > 1. Since the transition probabilities in (3.A1) are
valid between t + 1 and t + 2,
Pt,t +2 = (1 − p) Pt,t +1 + p(1 − Pt,t +1) = (1 − 2 p) Pt,t +1 + p (3.A3)
depends only on Pt,t +1, which in turn depends only on xt (or is constant, and equal to
0.5, if p = 0.5). Equation (3.A3) can be generalized to any pair of dates, in the form
Pt,t +i +1 = (1 − p) Pt,t +i + p(1 − Pt,t +i ) = (1 − 2 p) Pt,t +i + p. (3.A4)
Even if we extend the length of the chain beyond time t + 2, all probabilities Pt,t +i are
still functions of xt only. (Thus, the process is a Markov process in levels.)
A3.2. The ergodic distribution
Using equation (3.A4), we can characterize the dynamics of the conditional probabil-
ities for any future period. We write
\[
P_{t,t+i+1} - P_{t,t+i} = (1 - 2P_{t,t+i})\,p
\begin{cases}
> 0 & \text{if } P_{t,t+i} < 0.5, \\
< 0 & \text{if } P_{t,t+i} > 0.5.
\end{cases}
\tag{3.A5}
\]
Evaluating the probability that the process is in state g for ever increasing values of i ,
that is for periods increasingly further away in the future, we find that this probability
decreases if it is above 0.5, and increases if it is below 0.5. Hence, with time the
probability Pt,t +i converges monotonically to its “ergodic” value Pt,∞ = 0.5.
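Iterating the recursion (3.A4) makes the convergence visible. The sketch below is illustrative only: the value p = 0.2, the 30-period horizon, and the function name conditional_probs are assumptions. It uses p < 0.5 (persistent shocks), for which the factor 1 − 2p is positive and each step closes a constant fraction of the gap to 0.5 without overshooting.

```python
def conditional_probs(p_next, p, horizon):
    """Iterate (3.A4) from P_{t,t+1} = p_next.
    Returns the list [P_{t,t+1}, ..., P_{t,t+horizon}]."""
    probs = [p_next]
    for _ in range(horizon - 1):
        probs.append((1.0 - 2.0 * p) * probs[-1] + p)
    return probs

p = 0.2  # illustrative; p < 0.5 means shocks are persistent
from_g = conditional_probs(1.0 - p, p, 30)  # x_t = x_g, so P_{t,t+1} = 1 - p
from_b = conditional_probs(p, p, 30)        # x_t = x_b, so P_{t,t+1} = p

print("from g:", from_g[:4], "...", from_g[-1])
print("from b:", from_b[:4], "...", from_b[-1])
```

Starting from either state, the sequence approaches 0.5, from above when the process starts in g and from below when it starts in b.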
A3.3. Iterated expectations
The conditional expectation of xt +i at date t for each i ≥ 0 is given by
Et [xt +i ] = xg Pt,t +i + xb (1 − Pt,t +i ) = xb + (xg − xb ) Pt,t +i , (3.A6)
which depends only on the current value of the process if (3.A1) is satisfied.
We can use (3.A1) again to obtain the relationship between Pt,t +i and Pt +1,t +i , that is
between the probabilities that are assigned at different moments in time to realizations
of xt +i within the same future period. As we saw above, the realization of xt +1 is in
general relevant for the probability of xt +i = xg from the viewpoint of t + 1. From
the viewpoint of period t , the probabilities of the same event can be written as
Pt,t +i = ( Pt +1,t +i |xt +1 = xb ) · P(xt +1 = xb |It )
+ ( Pt +1,t +i |xt +1 = xg ) · P(xt +1 = xg |It ) (3.A7)
(where P(xt +1 = xg |It ) = 1 − p if xt = xg , and so forth). This allows us to verify the
validity of the law of iterated expectations in this context. For i ≥ 2, we write
Et +1[xt +i ] = xb + (xg − xb ) Pt +1,t +i . (3.A8)
At date t + 1, the probability on the right-hand side of (3.A8) is given, while at time t
it is not possible to evaluate this probability with certainty: it could be ( Pt +1,t +i |xt +1 =
xb ), or ( Pt +1,t +i |xt +1 = xg ), depending on the realization of xt +1. Given the uncertainty
associated with this realization, from the point of view of time t the conditional
expectation E t +1[xt +i ] is itself a random variable, and we can therefore calculate its
expected value:
E t [ E t +1[xt +i ]] = P(xt +1 = xb |It ) E t +1[xt +i |xt +1 = xb ]
+ P(xt +1 = xg |It ) E t +1[xt +i |xt +1 = xg ].
Inserting (3.A8), using (3.A7), and recalling (3.A6), it follows that
E t [xt +i ] = xb + (xg − xb ) Pt,t +i = E t [ E t +1[xt +i ]].
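The law of iterated expectations derived above can also be confirmed numerically. In the sketch below, the values of p, xg, xb and the horizon i are arbitrary assumptions, and the helper names are hypothetical. The direct expectation comes from (3.A6); the nested one averages the two possible date t + 1 expectations from (3.A8) with the transition probabilities, as in (3.A7).

```python
def prob_g(start_is_g, i, p):
    """P(x_{t+i} = x_g | x_t), by iterating (3.A4); i = 0 returns the current state."""
    prob = 1.0 if start_is_g else 0.0
    for _ in range(i):
        prob = (1.0 - 2.0 * p) * prob + p
    return prob

def expectation(start_is_g, i, p, x_g, x_b):
    """E[x_{t+i} | x_t] from (3.A6)."""
    return x_b + (x_g - x_b) * prob_g(start_is_g, i, p)

p, x_g, x_b, i = 0.3, 4.0, 2.0, 5  # illustrative values
results = []
for start_is_g in (True, False):
    direct = expectation(start_is_g, i, p, x_g, x_b)   # E_t[x_{t+i}]
    prob_g_next = prob_g(start_is_g, 1, p)             # P(x_{t+1} = x_g | x_t)
    # E_t[E_{t+1}[x_{t+i}]]: from t + 1 the horizon is i - 1 periods
    nested = (prob_g_next * expectation(True, i - 1, p, x_g, x_b)
              + (1.0 - prob_g_next) * expectation(False, i - 1, p, x_g, x_b))
    results.append((direct, nested))
    print("start in g" if start_is_g else "start in b", direct, nested)
```

The two columns should agree for both starting states, because the recursion (3.A4) is linear in the conditional probability.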
EXERCISES
Exercise 30 Consider the production function
\[
F(k, l; \alpha) = (k + l)\alpha - \frac{\beta}{2}\, l^2 - \frac{\gamma}{2}\, k^2.
\]
(a) Suppose a firm with that production function has given capital k = 1, can hire l
costlessly, pays given wage w = 1, and must pay F = 1 for each unit of l fired. If αt
takes the values 4 or 2 with equal probability p = 0.5, and future cash flows are
discounted at rate r = 1, what is the optimal dynamic employment policy?
(b) Suppose capital depreciates at rate δ = 1 and can be costlessly adjusted to ensure
that its marginal product is equal to the cost of funds r + δ. Does capital adjustment
change the optimal employment pattern? What are the optimal levels of
capital when αt = 4 and when αt = 2?
Exercise 31 Consider a labor market in which firms have a linear demand curve for labor
subject to parallel oscillations, μ(N, Z ) = Z − βN. As in the main text, Z can take two
values, Zb and Zg > Zb , and oscillates between these values with transition probability
p. Also, the wage oscillates between two values, wb and wg > wb , and the oscillations of
the wage are synchronized with those of Z .
(a) Calculate the levels of employment Nb and Ng that maximize the expected dis-
counted value of the revenues of the firm if the discount rate is equal to r and if the
unit hiring and firing costs are given by H and F respectively.
(b) Compute the mobility cost κ at which the optimal mobility decisions are consistent
with a wage differential Δw = wg − wb when workers discount their future
expected income at rate r .
(c) Assume that the labor market is populated by 1,000 workers and 100 firms of
which exactly half are in a good state in each period. What levels of the wage wb
are compatible with full employment (with wg = wb + Δw as above), under the
hypothesis that labor mobility is instantaneous?
Exercise 32 Suppose that the marginal productivity of labor is given by μ(Z, N) = Z −
βN, and that the indicator Zt can assume three rather than two values {Zb , ZM , Zg },
with Zb < ZM < Zg , where the realizations of Zt are independent, while the wage rate
is constant and equal to w̄ in each period. Finally, hiring and firing costs are given by H
and F respectively. What form does the recursive relationship
\[
\lambda(Z_t, N_t) = \mu(Z_t, N_t) - \bar{w} + E_t\left[\lambda(Z_{t+1}, N_{t+1})\right]
\]
take if the parameters are such that only fluctuations from Zb to Zg or vice versa induce
the firm to adjust its labor force, while the employment level is unaffected for fluctuations
from and to the average level of labor demand (from Zb to Z M or vice versa, or from Z M
to Zg or vice versa)? Which are the two employment levels chosen by the firm?
FURTHER READING
Theoretical implications of employment protection legislation and firing costs
are potentially much wider than those illustrated in this chapter. For example,
Bertola (1994) discusses the implications of increased rigidity (and less efficiency) in
models of growth like the ones that will be discussed in the next chapter, using a two-
state Markov process similar to the one introduced in this chapter but specified in a
continuous-time setting where state transitions are described as Poisson events of the
type to be introduced in Chapter 5.
Economic theory can also explain why employment protection legislation is
imposed despite its apparently detrimental effects. Using models similar to those
discussed here, Saint-Paul (2000) considers how politico-economic interactions can
rationalize labor market regulation and resistance to reforms, and Bertola (2004)
shows that, if workers are risk-averse, then firing costs may have beneficial effects:
redundancy payments not only can remedy a lack of insurance but also can foster
efficiency if they allow forward-looking mobility decisions to be taken on a more
appropriate basis.
Of course, job security provisions are only one of the many institutional features
that help explain why European labor markets generate lower employment than Amer-
ican ones. Union behavior and taxation play important roles in determining high-
wage, low-employment outcomes. And macroeconomic shocks interact in interesting
ways with wage and employment rigidities in determining the dynamics of employ-
ment and unemployment across the Atlantic and within Europe. For economic and
empirical analyses of the European unemployment problem from an international
comparative perspective, see Bean (1994), Alogoskoufis et al. (1995), Nickell (1997),
Nickell and Layard (1999), Blanchard and Wolfers (2000), and Bertola, Blau, and Kahn
(2002), which all include extensive references.
REFERENCES
Alogoskoufis, G., C. Bean, G. Bertola, D. Cohen, J. Dolado, G. Saint-Paul (1995) Unemployment:
Choices for Europe, London: CEPR.
Bean, C. (1994) “European Unemployment: A Survey,” Journal of Economic Literature, 32,
573–619.
Bentolila, S., and G. Bertola (1990) “Firing Costs and Labor Demand: How Bad is Eurosclerosis?”
Review of Economic Studies, 57, 381–402.
Bertola, G. (1990) “Job Security, Employment and Wages,” European Economic Review, 34,
851–886.
Bertola, G. (1992) “Labor Turnover Costs and Average Labor Demand,” Journal of Labor Economics,
10, 389–411.
Bertola, G. (1994) “Flexibility, Investment, and Growth,” Journal of Monetary Economics, 34, 215–238.
Bertola, G. (1999) “Microeconomic Perspectives on Aggregate Labor Markets,” in O. Ashenfelter
and D. Card (eds.), Handbook of Labor Economics, vol. 3B, 2985–3028, Amsterdam: North-
Holland.
Bertola, G. (2004) “A Pure Theory of Job Security and Labor Income Risk,” Review of Economic
Studies, 71(1): 43–61.
Bertola, G., F. D. Blau, and L. M. Kahn (2002) “Comparative Analysis of Labor Market Outcomes:
Lessons for the US from International Long-Run Evidence,” in A. Krueger and R. Solow (eds.),
The Roaring Nineties: Can Full Employment Be Sustained? New York: Russell Sage, pp. 159–218.
Bertola, G., and A. Ichino (1995) “Wage Inequality and Unemployment: US vs Europe,” in B. Bernanke
and J. Rotemberg (eds.), NBER Macroeconomics Annual 1995, 13–54, Cambridge, Mass.: MIT
Press.
Bertola, G., and R. Rogerson (1997) “Institutions and Labor Reallocation,” European Economic Review,
41, 1147–1171.
Blanchard, O. J., and J. Wolfers (2000) “The Role of Shocks and Institutions in the Rise of
European Unemployment: The Aggregate Evidence,” Economic Journal, 110: C1–C33.
Nickell, S. (1997) “Unemployment and Labor Market Rigidities: Europe versus North America,”
Journal of Economic Perspectives, 11(3): 55–74.
Nickell, S., and R. Layard (1999) “Labor Market Institutions and Economic Performance,” in
O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3C, 3029–3084,
Amsterdam: North-Holland.
Saint-Paul, G. (2000) The Political Economy of Labour Market Institutions, Oxford: Oxford
University Press.
4 Growth in Dynamic
General Equilibrium
The previous chapters analyzed the optimal dynamic behavior of single
consumers, firms, and workers. The interactions between the decisions of
these agents were studied using a simple partial equilibrium model (for the
labor market). In this chapter, we consider general equilibrium in a dynamic
environment.
Specifically, we discuss how savings and investment decisions by individual
agents, mediated by more or less perfect markets as well as by institutions
and collective policies, determine the aggregate growth rate of an economy
from a long-run perspective. As in the previous chapters, we cannot review
all aspects of a very extensive theoretical and empirical literature. Rather, we
aim at familiarizing readers with technical approaches and economic insights
about the interplay of technology, preferences, market structure, and insti-
tutional features in determining dynamic equilibrium outcomes. We review
the relevant aspects in the context of long-run growth models, and a brief
concluding section discusses how the mechanisms we focus on are relevant
in the context of recent theoretical and empirical contributions in the field of
economic growth.
Section 4.1 introduces the basic structure of the model, and Section 4.2
applies the techniques of dynamic optimization to this base model. The next
two sections discuss how decentralized decisions may result in an optimal
growth path, and how one may assess the relevance of exogenous technological
progress in this case. Finally, in Section 4.5 we consider recent models of
endogenous growth. In these models the growth rate is determined endoge-
nously and need not coincide with the optimal growth rate.
The problem at hand is more interesting, but also more complex, than those
we have considered so far. To facilitate analysis we will therefore emphasize the
economic intuition that underlies the formal mathematical expressions, and
aim to keep the structure of the model as simple as possible. In what follows
we consider a closed economy. The national accounting relationship
Y (t ) = C (t ) + I (t ) (4.1)
between the flows of production (Y ), consumption (C ), and investment
therefore holds at the aggregate level. Furthermore, for simplicity, we do not
distinguish between flows that originate in the private and the public sectors.
The distinction between consumption and investment is based on the
concept of capital. Broadly speaking, this concept encompasses all durable
factors of production that can be reproduced. The supply of capital grows
in proportion with investments. At the same time, however, existing capital
stock is subject to depreciation, which tends to lower the supply of capital. As
in Chapter 2, we formalize the problem in continuous time. We can therefore
define the stock of capital, K (t ) at time t , without having to specify whether
it is measured at the beginning or the end of a period. In addition, we assume
that capital depreciates at a constant rate δ. The evolution of the supply of
capital is therefore given by
\[
\lim_{\Delta t \to 0} \frac{K(t + \Delta t) - K(t)}{\Delta t} \equiv \frac{dK(t)}{dt} \equiv \dot{K}(t) = I(t) - \delta K(t).
\]
The demand for capital stems from its role as an input in the productive
process, which we represent by an aggregate production function,
Y (t ) = F (K (t ), . . .).
This expression relates the flow of aggregate output between t and t + Δt
to the stocks of production factors that are available during this period. In
principle, these stocks can be measured for any infinitesimally small time
period Δt. However, a formal representation of the aggregate production
process in a single equation is normally not feasible. In reality, the capital
stock consists of many different durable goods, both public and private. At
the end of this chapter we will briefly discuss some simple models that make
this disaggregate structure explicit, but for the moment we shall assume that
investment and consumption can be expressed in terms of a single good as in
(4.1). Furthermore, for simplicity we assume that “capital” is combined with
only one non-accumulated factor of production, denoted L (t ).
In what follows, we will characterize the long-run behavior of the economy.
More precisely, we will consider the time period in which per capita income
grows at a non-decreasing rate and in which the ratio between aggregate
capital K and the flow of output Y tends to stabilize. The amount of capital
per worker therefore tends to increase steadily. The case in which the growth
rate of output and capital exceeds the growth rate of the population represents
an extremely important phenomenon: the steady increase in living standards.
But in this chapter our interest in this type of growth pattern stems more from
its simplicity than from reality. Even though simple models cannot capture all
features of world history, analyzing the economic mechanisms of a growing
economy may help us understand the role of capital accumulation in the real
world and, more generally, characterize the economic structure of growth
processes.
4.1. Production, Savings, and Growth
The dynamic models that we consider here aim to explain, in the simplest pos-
sible way, on the one hand the relationship between investments and growth,
and on the other hand the determinants of investments. The production
process is defined by
Y (t ) = F (K (t ), L (t )) = F (K (t ), A(t )N(t )), (4.2)
where N(t ) is the number of workers that participate in production in period
t and A(t ) denotes labor productivity; at time t each of the N(t ) workers
supplies A(t ) units of labor. Clearly, there are various ways to specify the
concept of productive efficiency in more detail. The amount of work of an
individual may depend on her physical strength, on the time and energy
invested in production, on the climate, and on a range of other factors. How-
ever, modeling these aspects not only complicates the analysis, but also forces
us to consider economic phenomena other than the ones that most interest
us.
To distinguish the role of capital accumulation (which by definition
depends endogenously on savings and investment decisions) from these other
factors, it is useful to assume that the latter are exogenous. The starting point
of our analysis is the Solow (1956) growth model. This model is familiar from
basic macroeconomics textbooks, but the analysis of this section is relatively
formal. We assume that L (t ) grows at a constant rate g ,
\[
\dot{L}(t) = g L(t), \qquad L(t) = L(0) e^{g t},
\]
and for the moment we abstract from any economic determinant for the level
or the growth rate of this factor of production. Furthermore, we assume that
the production function exhibits constant returns to scale, so that
\[
F(\lambda K, \lambda L) = \lambda F(K, L)
\]
for any λ. The validity of this assumption will be discussed below in the light
of its economic implications. Formally, the assumption of constant returns to
scale implies a direct relationship between the level of output and capital per
unit of the non-accumulated factor,
y(t ) ≡ Y (t )/L (t ) and k(t ) ≡ K (t )/L (t ).
Omitting the time index t , we can write
\[
y = \frac{F(K, L)}{L} = \frac{L\, F(K/L, 1)}{L} = f(k),
\]
which shows that the per capita production depends only on the capital/labor
ratio. The accumulation of the stock of capital per worker is given by
\[
\dot{k}(t) = \frac{d}{dt}\left(\frac{K(t)}{L(t)}\right)
= \frac{\dot{K}(t) L(t) - \dot{L}(t) K(t)}{L(t)^2}
= \frac{\dot{K}(t)}{L(t)} - \frac{\dot{L}(t)}{L(t)}\,\frac{K(t)}{L(t)}.
\]
Since K̇(t) = I(t) − δK(t) and L̇(t) = gL(t), we thus get
\[
\dot{k}(t) = \frac{I(t)}{L(t)} - (g + \delta)\, k(t).
\]
Assuming that the economy as a whole devotes a constant proportion s of
output to the accumulation of capital,
C (t ) = (1 − s )Y (t ), I (t ) = s Y (t ),
then I (t )/L (t ) = s Y (t )/L (t ) = s y(t ) = s f (k(t )), and thus
\[
\dot{k}(t) = s f(k(t)) - (g + \delta)\, k(t).
\]
The main advantage of this expression, which is valid only under the simpli-
fying assumptions above, is that it refers to a single variable. For any value of
k(t ), the model predicts whether the capital stock per worker tends to increase
or decrease, and using the intermediate steps described above one can fully
characterize the ensuing dynamics of the aggregate and per capita income.
The amount of capital per worker tends to increase when
\[
s f(k(t)) > (g + \delta)\, k(t), \tag{4.3}
\]
and to decrease when
\[
s f(k(t)) < (g + \delta)\, k(t). \tag{4.4}
\]
Having reduced the dynamics of the entire economy to the dynamics of
a single variable, we can illustrate the evolution of the economy in a simple
graph as shown in Figure 4.1. Clearly, the function s f (k) plays a crucial role
in these relationships. Since f (k) = F (k, 1) and F (·) has constant returns to
scale, we have
\[
f(\lambda k) = F(\lambda k, 1) \le F(\lambda k, \lambda) = \lambda F(k, 1) = \lambda f(k) \quad \text{for } \lambda > 1, \tag{4.5}
\]
where the inequality is valid under the hypothesis that increasing L , the
second argument of F (·, ·), cannot decrease production. Note, however, that
the inequality is weak, allowing for the possibility that using more L may leave
production unchanged for some values of λ and k.
If the inequality in (4.5) is strict, then income per capita tends to increase
with k, but at a decreasing rate, and f (k) takes the form illustrated in the
figure. If a steady state kss exists, it must satisfy
\[
s f(k_{ss}) = (g + \delta)\, k_{ss}. \tag{4.6}
\]
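For a concrete functional form, condition (4.6) can be solved explicitly. The sketch below assumes a Cobb–Douglas technology f(k) = k^α with α between 0 and 1, which is one example satisfying the strict version of (4.5) and is not imposed by the text at this point. The parameter values and the function name steady_state are likewise illustrative. With this form, s k^α = (g + δ)k gives kss = (s/(g + δ))^(1/(1−α)).

```python
def steady_state(s, g, delta, alpha):
    """Closed-form solution of s*k**alpha = (g + delta)*k for f(k) = k**alpha.
    The Cobb-Douglas form is an assumption made for this example."""
    return (s / (g + delta)) ** (1.0 / (1.0 - alpha))

s, g, delta, alpha = 0.2, 0.02, 0.05, 0.3  # illustrative values
k_ss = steady_state(s, g, delta, alpha)
residual = s * k_ss ** alpha - (g + delta) * k_ss  # left minus right side of (4.6)

print("k_ss     :", k_ss)
print("residual :", residual)  # should be zero up to rounding
```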
Figure 4.1. Decreasing marginal returns to capital
4.1.1. BALANCED GROWTH
The expression on the right in (4.3) defines a straight line with slope (g + δ).
In Figure 4.2, this straight line meets the function s f(k) at kss: for k < kss,
k̇ = s f(k) − (g + δ)k > 0, and the stock of capital tends to increase towards
kss; for k > kss, on the contrary, k̇ < 0, and in this case k tends to decrease
towards its steady-state value kss.
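The convergence from either side can be illustrated by integrating the law of motion for k numerically. This is a rough sketch under stated assumptions: a Cobb–Douglas f(k) = k^α, illustrative parameter values, and a simple Euler scheme with a small step dt, none of which come from the text.

```python
def simulate_k(k0, s, g, delta, alpha, dt=0.01, horizon=300.0):
    """Euler integration of dk/dt = s*k**alpha - (g + delta)*k.
    Returns the whole path of k, starting at k0."""
    path = [k0]
    for _ in range(int(horizon / dt)):
        k = path[-1]
        path.append(k + dt * (s * k ** alpha - (g + delta) * k))
    return path

s, g, delta, alpha = 0.2, 0.02, 0.05, 0.3          # illustrative values
k_ss = (s / (g + delta)) ** (1.0 / (1.0 - alpha))  # steady state for f(k) = k**alpha

low = simulate_k(0.5 * k_ss, s, g, delta, alpha)   # starts below the steady state
high = simulate_k(1.5 * k_ss, s, g, delta, alpha)  # starts above the steady state

print("k_ss:", k_ss)
print("from below:", low[0], "->", low[-1])
print("from above:", high[0], "->", high[-1])
```

Both paths should end close to kss, the first rising and the second falling along the way.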
Figure 4.2. Steady state of the Solow model
The speed of convergence is proportional to the vertical distance between
the two functions, and thus decreases in absolute value while k approaches its
steady-state value. In the long-run the economy will be very close to the steady
state. If k ≈ kss ≠ 0, then k = K/L is approximately constant; given that
\[
\frac{d}{dt}\,\frac{K(t)}{L(t)} = \left(\frac{\dot{K}(t)}{K(t)} - \frac{\dot{L}(t)}{L(t)}\right)\frac{K(t)}{L(t)} \approx 0
\;\Rightarrow\; \frac{\dot{K}(t)}{K(t)} \approx \frac{\dot{L}(t)}{L(t)},
\]
the long-run growth rate of K is close to the growth rate of L . Moreover, since
F (K , L ) has constant returns to scale, Y (t ) will grow in the same proportion.
Hence, in steady state the model follows a “balanced growth” path, in which
the ratio between production and capital is constant. For the per capita capital
stock and output, we can use the definition that L (t ) = A(t )N(t ). This yields
\[
\frac{Y(t)}{N(t)} = \frac{Y(t)}{L(t)}\,\frac{L(t)}{N(t)} = f(k_t)\, A(t), \qquad
\frac{K(t)}{N(t)} = k_t\, A(t).
\]
In terms of growth rates, therefore, we get the expression
\[
\frac{(d/dt)\,[Y(t)/N(t)]}{Y(t)/N(t)} = \frac{(d/dt)\, f(k_t)}{f(k_t)} + \frac{\dot{A}(t)}{A(t)}.
\]
When kt tends to a constant kss, as in the above figure, then d f(kt)/dt =
f ′(kt )k̇ tends to zero; only a positive growth rate Ȧ(t )/ A(t ) can allow a long-
run growth in the levels of per capita income and capital. In other words, the
model predicts a long-run growth of per capita income only when L grows
over time and whenever this growth is at least partly due to an increase in A
rather than an increase in the number of workers N.
If we assume that the effective productivity of labor A(t ) grows at a positive
rate gA, and that
\[
g \equiv \frac{\dot{L}}{L} = \frac{\dot{A}}{A} + \frac{\dot{N}}{N} = g_A + g_N,
\]
then the economy tends to settle in a balanced growth path with exogenous
growth rate g A: the only endogenous mechanism of the model, the accumula-
tion of capital, tends to accompany rather than determine the growth rate of
the economy. A once and for all increase in the savings ratio shifts the curve
s f (k) upwards, as in Figure 4.3. As a result, the economy will converge to a
steady state with a higher capital intensity, but the higher saving rate will have
no effect on the long-run growth rate.
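A short simulation illustrates this level-versus-growth distinction. The sketch below assumes a Cobb–Douglas f(k) = k^α, so that per capita income is Y/N = k^α A and its growth rate is α k̇/k + gA. The parameter values, the Euler step, and the function name long_run are all illustrative assumptions rather than anything specified in the text.

```python
def long_run(s, g_A, g_N, delta, alpha, k0=1.0, dt=0.01, horizon=400.0):
    """Integrate dk/dt = s*k**alpha - (g_A + g_N + delta)*k with Euler steps.
    Returns (k at the end, growth rate of per capita income Y/N at the end)."""
    g = g_A + g_N
    k = k0
    for _ in range(int(horizon / dt)):
        k += dt * (s * k ** alpha - (g + delta) * k)
    k_dot = s * k ** alpha - (g + delta) * k
    growth = alpha * k_dot / k + g_A   # growth rate of Y/N = k**alpha * A
    return k, growth

outcomes = {}
for s in (0.2, 0.3):  # low and high saving rates, illustrative
    outcomes[s] = long_run(s, g_A=0.02, g_N=0.01, delta=0.05, alpha=0.3)
    print("s =", s, "k =", outcomes[s][0], "growth of Y/N =", outcomes[s][1])
```

The higher saving rate should end with a larger k, while the per capita growth rate settles at gA in both cases.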
In particular, the accumulation of capital cannot sustain a constant growth
of income (whatever the value of s ) if g = 0 and f ′′(k) < 0. For simplicity,
consider the case in which L is constant and δ = 0. In that case,
\[
\frac{\dot{Y}}{Y} = \frac{f'(k)\,\dot{k}}{f(k)} = s f'(k), \tag{4.7}
\]
Figure 4.3. Effects of an increase in the savings rate
and an increase in k clearly reduces the growth rate of per capita income.
Asymptotically, the growth rate of the economy is zero if limk→∞ f ′(k) = 0,
or it reaches a positive limit if for k → ∞ the limit of f ′(k) = ∂ F (·)/∂ K is
strictly positive.
Exercise 33 Retaining the assumption that s is constant, let δ > 0. How does the
asymptotic behavior of Ẏ /Y depend on the value of limk→∞ f ′(k)?
4.1.2. UNLIMITED ACCUMULATION
Even if f ′(k) is decreasing in k, nothing prevents the expression on the left
of (4.3) from remaining above the line (g + δ)k for all values of k, implying
that no finite steady state exists (kss → ∞). For this to occur the following
condition needs to be satisfied:
\[
\lim_{k \to \infty} f'(k) \equiv f'(\infty) \ge \frac{g + \delta}{s}, \tag{4.8}
\]
so that the distance between the functions does not diminish any further when
k increases from a value that is already close to infinity.
Consider, for example, the case in which g = δ = 0: in this case the steady-state
capital stock k is infinite even if limk→∞ f ′(k) = 0. This does not imply
that the growth rate remains high, but only that the growth rate slows down
so much that it takes an infinite time period before the economy approaches
something like a steady state in which the ratio between capital and output
remains constant. In fact, given that the speed of convergence is determined
by the distance between the two curves in Figure 4.2, which tends to zero in the
neighborhood of a steady state, the economy always takes an infinite time
period to attain the steady state. The steady state is therefore more like a
theoretical reference point than an exact description of the final configuration
for an economy that departs from a different starting position.
Nevertheless, in the long-run a positive growth rate is sustainable if the
inequality in (4.8) holds strictly:
$$\lim_{k\to\infty} f'(k) \equiv f'(\infty) > \frac{g+\delta}{s}.$$
If L is constant, and if there is no depreciation (δ = 0), the long-run growth
rate is
$$\frac{\dot{Y}}{Y} = s f'(\infty) > 0,$$
and it is dependent on the savings ratio s and the form of the production
function.
Consider, for example, the case of a constant elasticity of substitution (CES)
production function:
$$F(K, L) = \left[\alpha K^{\lambda} + (1-\alpha) L^{\lambda}\right]^{1/\lambda}, \qquad \lambda \le 1. \tag{4.9}$$
In this case we have
$$f(k) = \left[\alpha k^{\lambda} + (1-\alpha)\right]^{1/\lambda}$$
and thus
$$f'(k) = \left[\alpha k^{\lambda} + (1-\alpha)\right]^{(1/\lambda)-1}\alpha k^{\lambda-1} = \alpha\left[\alpha + (1-\alpha)k^{-\lambda}\right]^{(1-\lambda)/\lambda}.$$
If λ is positive, the term k^{−λ} tends to zero as k approaches infinity, and
lim_{k→∞} f′(k) = α(α)^{(1/λ)−1} = α^{1/λ} > 0: hence, this production function
satisfies f′(∞) > 0 when 0 < λ ≤ 1.
The production function (4.9) is also well defined for λ < 0. In this case,
the term in square brackets tends to infinity and, since its exponent (1 − λ)/λ is
negative, lim_{k→∞} f′(k) = 0. For λ = 0 the functional form (4.9) raises unity
to an infinitely large exponent, but its limit is well defined. Taking logarithms, we get
$$\ln f(k) = \frac{1}{\lambda}\ln\left(\alpha k^{\lambda} + (1-\alpha)\right).$$
The limit of this expression as λ → 0 can be evaluated using l’Hôpital’s rule, and is equal
to the ratio of the limits of the derivatives with respect to λ of the numerator
and the denominator. Using the differentiation rules d ln(x)/dx = 1/x and
d y^x/dx = y^x ln y, the derivative of the numerator can be written as
$$\left(\alpha k^{\lambda} + (1-\alpha)\right)^{-1}\left(\alpha k^{\lambda}\ln k\right),$$
while the derivative of the denominator is equal to one. Since lim_{λ→0} k^λ = 1,
the limit of the logarithm of f(k) is thus equal to α ln k, which corresponds to
the logarithm of the Cobb–Douglas function k^α.
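The three cases for the CES function (4.9) can be checked numerically. The sketch below is only an illustration: the function names and parameter values are hypothetical, and the case λ = 0 is handled by substituting the Cobb–Douglas form derived above.

```python
# Intensive-form CES technology (4.9) and its marginal product.
# Names and parameter values are illustrative assumptions.


def ces_f(k, alpha, lam):
    """f(k) = [alpha*k**lam + (1 - alpha)]**(1/lam); Cobb-Douglas when lam == 0."""
    if lam == 0.0:
        return k ** alpha
    return (alpha * k ** lam + (1.0 - alpha)) ** (1.0 / lam)


def ces_mp(k, alpha, lam):
    """f'(k) = alpha*[alpha + (1 - alpha)*k**(-lam)]**((1 - lam)/lam)."""
    if lam == 0.0:
        return alpha * k ** (alpha - 1.0)
    return alpha * (alpha + (1.0 - alpha) * k ** (-lam)) ** ((1.0 - lam) / lam)


if __name__ == "__main__":
    alpha = 0.3
    # lam > 0: the marginal product approaches alpha**(1/lam) > 0.
    print(ces_mp(1e12, alpha, 0.5), alpha ** (1.0 / 0.5))
    # lam < 0: the marginal product vanishes.
    print(ces_mp(1e6, alpha, -1.0))
    # lam close to 0: f(k) is close to the Cobb-Douglas value k**alpha.
    print(ces_f(2.0, alpha, 1e-6), 2.0 ** alpha)
```

With a positive λ the marginal product stays bounded away from zero, with a negative λ it vanishes, and for λ near zero the level of output is close to k^α.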
Exercise 34 Interpret the limit condition in terms of the substitutability between
K and L. Assuming δ = g = 0, analyze the growth rate of capital and production
in the case where λ = 1, and in the case where α = 1.
4.2. Dynamic Optimization
The model that we discussed in the previous section treated the savings ratio
s as an exogenous variable. We therefore could not discuss the economic
motivation of agents to save (and invest) rather than to consume, nor could
we determine the optimality of the growth path of the economy. To introduce
these aspects into the analysis, we will now consider the welfare of a repre-
sentative agent who consumes an amount C (t )/N(t ) ≡ c (t ) in each period
t . Suppose that the welfare of this agent at date zero can be measured by the
following integral
$$U = \int_0^{\infty} u(c(t))\, e^{-\rho t}\, dt. \tag{4.10}$$
The parameter ρ is the discount rate of future consumption; given ρ > 0, the
agent prefers immediate consumption over future consumption. The function
u(·) is identical to the one introduced in Chapter 1: the positive first derivative
u′(·) > 0 implies that consumption is desirable in each period; however, the
marginal utility of consumption is decreasing in consumption, u′′(·) < 0,
which gives agents an incentive to smooth consumption over time.
The decision to invest rather than to consume now has a precise economic
interpretation. For simplicity, we assume that g = 0, so that normalizing by
population as in (4.10) is equivalent to normalizing by the labor force. Assuming
that δ = 0 too, the accumulation constraint,
$$f(k(t)) - c(t) - \dot{k}(t) = 0, \tag{4.11}$$
implies that higher consumption (for a given k(t )) slows down the accumula-
tion of capital and reduces future consumption opportunities. At each date t ,
agents thus have to decide whether to consume immediately, obtaining utility
u(c (t )), or to save, obtaining higher (discounted) utility in the future.
This problem is equivalent to the maximization of objective func-
tion (4.10) given the feasibility constraint (4.11). Consider the associated
Hamiltonian,
$$H(t) = \left[u(c(t)) + \lambda(t)\left(f(k(t)) - c(t)\right)\right] e^{-\rho t},$$
where the shadow price is defined in current values. This shadow price
measures the value of capital at date t and satisfies λ(t) = μ(t)e^{ρt}, where μ(t)
measures the value at date zero. The optimality conditions are given by
$$\frac{\partial H}{\partial c} = 0, \tag{4.12}$$
$$-\frac{\partial H}{\partial k} = \frac{d\left(\lambda(t)e^{-\rho t}\right)}{dt}, \tag{4.13}$$
$$\lim_{t\to\infty} \lambda(t)e^{-\rho t}k(t) = 0. \tag{4.14}$$
4.2.1. ECONOMIC INTERPRETATION AND OPTIMAL GROWTH
Equations (4.12) and (4.13) are the first-order conditions for the optimal
path of growth and accumulation. In this section we provide the economic
intuition for these conditions, which we shall use to characterize the dynamics
of the economy. The advantage of using the current-value shadow price λ(t) is
that we can draw a phase diagram in terms of λ (or c) and k, leaving the time
dependence of these variables implicit.
From (4.12), we have
$$u'(c) = \lambda. \tag{4.15}$$
λ(t) measures the value in terms of utility (valued at time t) of an infinitesimal
increase in k(t). Such an increase in capital can be obtained only by a reduction
of current consumption. The loss of utility resulting from lower current
consumption is measured by u′(c). For optimality, the two must be the same.
In addition, we also have the condition that
$$\dot{\lambda} = (\rho - f'(k))\,\lambda, \tag{4.16}$$
which has an interpretation in terms of the valuation of a financial asset:
the marginal unit of capital provides a “dividend” f′(k)λ, in terms of utility,
and a capital gain λ̇. Expression (4.16) implies that the sum of the “dividend”
and the capital gain is equal to the rate of return ρ multiplied by λ. This
relationship guarantees the equivalence of the flow utilities at different dates,
and we can interpret λ as the value of a financial asset (the marginal unit of
capital).
An economic interpretation is also available for the “transversality” condition
in (4.14): it requires that either the stock of capital, or its present value
λ(t)e^{−ρt} (or both), be equal to zero in the limit as the time horizon
extends to infinity.
Combining the relationships in (4.15) and (4.16), we derive the following
condition:
$$\frac{d}{dt}u'(c) = (\rho - f'(k))\,u'(c).$$
Along the optimal path of growth and accumulation, the proportional
growth rate of marginal utility is equal to ρ − f′(k), the difference between
the exponential discount rate of utility and the growth rate of the available
resources arising from the accumulation of capital. This condition is a Euler
equation, like that encountered in Chapter 1. (Exercise 36 asks you to show that
it is indeed the same condition, expressed in continuous rather than discrete
time.)
Making the time dependence explicit and differentiating the function on
the left of this equation with respect to t yields
$$\frac{d u'(c(t))}{dt} = u''(c(t))\,\frac{d c(t)}{dt}.$$
Thus, we can write (omitting the time argument)
$$\dot{c} = \left(\frac{u'(c)}{-u''(c)}\right)(f'(k) - \rho). \tag{4.17}$$
Since the law of motion for capital is given by
$$\dot{k} = f(k) - c, \tag{4.18}$$
we can therefore study the dynamics of the system in (c, k)-space.
4.2.2. STEADY STATE AND CONVERGENCE
The steady state of the system of equations (4.17) and (4.18) satisfies
$$f'(k_{ss}) = \rho, \qquad c_{ss} = f(k_{ss}),$$
if it exists. For the dynamics we make use of a phase diagram as in Chapter 2.
On the horizontal axis we measure the stock of capital k (which now refers
to the economy-wide capital stock rather than the capital stock of a single
firm). On the vertical axis we measure consumption, c , rather than the shadow
price of capital. (The two quantities are related one-to-one, as was the case
for q and investment in Chapter 2.) If f(·) has decreasing marginal returns
and in addition there exists a k_ss < ∞ such that f′(k_ss) = ρ, then we have the
situation illustrated in Figure 4.4.
Clearly, more than one initial consumption level c (0) can be associated with
a given initial capital stock k(0). However, only one of these consumption
levels leads the economy to the steady state: the dynamics are therefore of the
saddlepath type which we already encountered in Chapter 2. Any other path
Figure 4.4. Convergence and steady state with optimal savings
leads the economy towards points where c = 0, or where k = 0 (which in turn
implies that c = 0 if f (0) = 0 and if capital cannot become negative). Under
reasonable functional form restrictions the solution is unique, and one can
show that only the saddlepath satisfies (4.14).
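The saddlepath property can be checked locally for a specific example. The sketch below is not the argument used in the text: it assumes f(k) = k^α and CRRA utility with parameter σ (so that u′(c)/(−u′′(c)) = c/σ), computes the steady state of (4.17) and (4.18), and evaluates the eigenvalues of the system linearized around it. All names and parameter values are hypothetical.

```python
import math

# Steady state and local dynamics of (4.17)-(4.18) under two assumptions made
# only for this example: f(k) = k**alpha and CRRA utility with parameter sigma.


def steady_state(alpha, rho):
    """Solve f'(k_ss) = rho and c_ss = f(k_ss) for f(k) = k**alpha."""
    k_ss = (alpha / rho) ** (1.0 / (1.0 - alpha))
    c_ss = k_ss ** alpha
    return k_ss, c_ss


def linearized_eigenvalues(alpha, rho, sigma):
    """Eigenvalues of the Jacobian of (cdot, kdot) at the steady state.

    The Jacobian with respect to (c, k) is
        [[0, (c_ss / sigma) * f''(k_ss)],
         [-1, rho]],
    so its trace is rho and its determinant is (c_ss / sigma) * f''(k_ss) < 0.
    """
    k_ss, c_ss = steady_state(alpha, rho)
    f_second = alpha * (alpha - 1.0) * k_ss ** (alpha - 2.0)
    trace = rho
    det = (c_ss / sigma) * f_second
    root = math.sqrt(trace ** 2 - 4.0 * det)
    return (trace - root) / 2.0, (trace + root) / 2.0


if __name__ == "__main__":
    print(steady_state(0.3, 0.05))
    print(linearized_eigenvalues(0.3, 0.05, 2.0))
```

One eigenvalue is negative and one is positive, which is the local signature of a saddlepath: for a given k(0) only one value of c(0) places the economy on the stable arm.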
Exercise 35 Repeat the derivation, supposing that g_A = 0 but δ > 0, g_N > 0.
Show that the system does not converge to the capital stock associated with
maximum per capita consumption in steady state.
4.2.3. UNLIMITED OPTIMAL ACCUMULATION
In the above diagram the accumulation of capital cannot sustain an indefinite
increase of labor productivity and of per capita consumption. However, as in
the Solow model, the hypothesis that F (·) has constant returns to scale in
capital and labor does not necessarily imply that k_ss < ∞. In these cases one
cannot speak about a steady state in terms of the level of capital, consumption,
and production. However, it is still possible that there exists a steady state in
terms of the growth rates of these variables—that is, a situation in which the
economy has a positive and non-decreasing long-run growth rate even in the
absence of exogenous technological change. Suppose for instance that f′(k) =
b, which is constant and independent of k for all the relevant values of the
capital stock. If the elasticity of marginal utility is constant, so that
$$\frac{u''(c)\,c}{u'(c)} = -\sigma$$
for all values of c, then we can rewrite (4.17) as
$$\frac{\dot{c}(t)}{c(t)} = \frac{b-\rho}{\sigma}, \tag{4.19}$$
and consumption increases (or decreases, if b < ρ and agents can disinvest)
at a constant exponential rate. The utility function considered here is of the
constant relative risk aversion (CRRA) type, given by
$$u(c) = \frac{c^{1-\sigma}}{1-\sigma}, \qquad u'(c) = c^{-\sigma}, \qquad u''(c) = -\sigma c^{-\sigma-1}. \tag{4.20}$$
The conditions u′(·) > 0, u′′(·) < 0 are satisfied if σ > 0. If σ = 1, the functional
form (4.20) is not well defined, but the marginal utility function u′(x) =
x^{−1} (which completely characterizes preferences) coincides with the derivative
of log(x): hence, for σ = 1 we can write u(c) = log(c). Given f′(k) = b, we can
write f(k) = bk + ξ with ξ a constant of integration. From the law of motion
for capital,
$$\dot{k}(t) = f(k(t)) - c(t) = \xi + b\,k(t) - c(t),$$
we can derive
$$\frac{\dot{k}(t)}{k(t)} = \frac{\xi}{k(t)} + b - \frac{c(t)}{k(t)}.$$
If we focus on the case in which k(t) tends to infinity and ξ/k(t) to zero,
we have
$$\lim_{t\to\infty}\frac{\dot{k}(t)}{k(t)} = b - \lim_{t\to\infty}\frac{c(t)}{k(t)}. \tag{4.21}$$
The proportional growth rate of k then tends to a constant if k(t ) tends to
grow at the same (exponential) rate as c (t ).
One can show that this condition is necessarily true if the economy satisfies
the transversality condition (4.14). With equation (4.20) for u(·), we get
$$\lambda(t) = u'(c(t)) = (c(t))^{-\sigma}.$$
Given that c(t) grows at a constant exponential rate, μ(t) = λ(t)e^{−ρt} has exponential
dynamics.
Now, consider (4.21). If c(t)/k(t) diminishes over time, then k(t) grows
at a more than exponential rate and the limit in (4.14) does not exist. If, on
the contrary, c(t)/k(t) is growing, k̇(t)/k(t) becomes increasingly negative.
As a result, k(t ) will eventually equal zero, and production, consumption, and
accumulation will come to a halt—which is certainly not optimal, since for
the case of (4.20) we have u′(0) = ∞.
The first case corresponds to paths that hit or approach the horizontal axis in
Figure 4.4; the second corresponds to paths that hit or approach the vertical
axis. Hence, as in the case of the phase diagram, there is only one initial level
of consumption that satisfies the transversality condition. (In fact, the phase
diagram remains valid in a certain sense; however, the economy is always arbit-
rarily far from the steady state.) The consumption/capital ratio is therefore
constant over time under our assumptions. Imposing
$$\frac{\dot{k}}{k} = \frac{\dot{c}}{c}$$
in (4.19) and in (4.21), we get
$$c(t) = \frac{(\sigma-1)b + \rho}{\sigma}\,k(t). \tag{4.22}$$
Equation (4.22) implies that the initial consumption is an increasing function
of b, the intertemporal rate of transformation, if σ > 1. In this case the income
effect of a higher b dominates the substitution effect, which induces capital
accumulation and hence tends to reduce the level of consumption. For σ = 1,
equation (4.20) is replaced by u(c) = ln(c), and the ratio c/k is equal to ρ and
does not depend on b.
Since y(t ) = bk(t ), savings are a constant fraction of income as in the Solow
model:
$$s = 1 - \frac{(\sigma-1)b + \rho}{b\sigma}.$$
Nonetheless, in the model with optimization, the savings ratio s is constant
only if u(·) is given by (4.20) and if f(k) = bk, and not in more general cases.
Moreover, s is not a given constant as in the Solow model. The savings ratio
depends on the parameters that characterize utility (σ and ρ) and technology (b).
Having shown that capital grows at an exponential rate, we now return
to (4.14). In order to satisfy this transversality condition, the growth rate of
capital needs to be smaller than the rate at which the discounted marginal
utility diminishes along the growth path. We thus have
$$\frac{d}{dt}\left(\ln\left(c(t)^{-\sigma}e^{-\rho t}\right)\right) = -\sigma\,\frac{b-\rho}{\sigma} - \rho = -b.$$
Since in the case considered here μ(t) = e^{−ρt}c(t)^{−σ} and f′(k) = b, this is a
reformulation of condition (4.13).
In addition, we have k̇ = s y = s b k, where s is the savings ratio. The transversality
condition is therefore satisfied if
$$\frac{\dot{k}}{k} = s b < \left|\frac{d}{dt}\left(\ln\left(c(t)^{-\sigma}e^{-\rho t}\right)\right)\right| = b,$$
or equivalently if s < 1. Hence, the propensity to save s = 1 − C /Y , which is
implied by (4.22), must be smaller than one: this leads to the condition that
$$0 < 1 - s = \frac{c(t)}{y(t)} = \frac{c(t)}{b\,k(t)} = \frac{(\sigma-1)b + \rho}{b\sigma},$$
which is equivalent to
$$(1-\sigma)\,b < \rho. \tag{4.23}$$
If the parameters of the model violated (4.23), the steady state growth
path that we identified would not satisfy (4.14). But in that case the optimal
solution would not be well defined since the objective function (4.10) could
take an infinite value: although technically speaking consumption could grow
at rate b, the integral in (4.10) does not converge when (1 − σ)b − ρ > 0.
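The relationships (4.19), (4.22), and (4.23) are easy to tabulate for given parameters. The following sketch is a minimal illustration under the assumption f(k) = bk; the function name, the dictionary keys, and the parameter values are hypothetical.

```python
# Balanced growth with f(k) = b*k and CRRA utility: growth rate (4.19),
# consumption/capital ratio (4.22), implied savings ratio, and condition (4.23).
# Function name, keys, and parameter values are illustrative assumptions.


def ak_balanced_growth(b, rho, sigma):
    growth = (b - rho) / sigma                   # (4.19)
    c_over_k = ((sigma - 1.0) * b + rho) / sigma  # (4.22)
    saving_rate = 1.0 - c_over_k / b              # s = 1 - c/y with y = b*k
    well_defined = (1.0 - sigma) * b < rho        # (4.23)
    return {
        "growth": growth,
        "c_over_k": c_over_k,
        "saving_rate": saving_rate,
        "well_defined": well_defined,
    }


if __name__ == "__main__":
    print(ak_balanced_growth(b=0.08, rho=0.03, sigma=2.0))
    # Here (1 - sigma)*b exceeds rho, so (4.23) fails and c/k would be negative.
    print(ak_balanced_growth(b=0.08, rho=0.03, sigma=0.5))
```

In the first parameterization s·b coincides with the growth rate (b − ρ)/σ, as required by k̇/k = ċ/c; in the second, condition (4.23) is violated and the implied consumption/capital ratio is negative.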
The steady-state growth path describes the optimal dynamics of the economy
without any transitional dynamics if f(k) = bk for each 0 ≤ k ≤ ∞. We
should note, however, that the constant b is not allowed to be a function of
L if F(λK, λL) = λF(K, L). Hence, F(K, L) = bK = F̃(K): if the production
function has constant returns to scale in K and L together, the non-accumulated
factor L cannot be productive in an economy that grows at a constant rate in
the absence of any (exogenous) growth in L.
Alternatively, the economy may converge asymptotically to the steady-state
growth path if lim_{k→∞} f′(k) = b > 0 even though f′′(k) < 0 for any 0 ≤
k < ∞. In this case the marginal productivity of L can be positive for each
value of K and L, but the productive role of the non-accumulated factor
becomes asymptotically negligible (in a sense that we will make more precise
in Section 4.4). In both cases we have or are approaching a steady-state growth
path: the economy grows at a positive rate if b > ρ, and (less realistically) at a
negative rate if b < ρ. With δ > 0, it is not difficult to prove that the economy
can grow indefinitely if lim_{k→∞} f′(k) > δ.
4.3. Decentralized Production and
Investment Decisions
The analysis of the preceding section proceeded directly from the maximization
of the objective function (4.10) of a representative agent, subject to the
technological constraint represented by the production function. Under certain
conditions, the optimal solution coincides with the growth of an economy
in which the decisions to save and invest are decentralized to households
and firms. In order to study this decentralized economy, we need to define
the economic nature and the productive role of capital in greater detail. Let us
assume for now that K is a private factor of production. The property rights of
this factor are owned by individual agents who in the past saved part of their
disposable income.
The economy is populated by infinitely lived agents, or “households,” which
for the moment we assume to be identical. The typical household, indexed
by i , owns one unit of labor. For simplicity, we assume that the growth rate
of the population is zero. In addition, each household owns a_i(t) units of
financial wealth (measured in terms of output, consumption, or capital) at
date t . Moreover, individual agents or households take the wage rate w(t ) and
the interest rate r (t ) at which labor and capital are compensated as given. (In
other words, agents behave competitively on all markets.)
Family i maximizes
$$U = \int_0^{\infty} u(c_i(t))\, e^{-\rho t}\, dt, \tag{4.24}$$
subject to the budget constraint
$$w(t) + r(t)\,a_i(t) = c_i(t) + \dot{a}_i(t).$$
The flow income earned by capital and labor is either consumed, or added to
(subtracted from, when negative) the family’s financial wealth.
Production is organized in firms. Firms hire the production factors from
households and offer their goods on a competitive market. At each date t, the
firm indexed by j produces F(K_j(t), L_j(t)) using quantities K_j(t) and L_j(t)
of the two factors, in order to maximize the difference between its revenues
and costs. Since all prices are expressed in terms of the final good, firms solve
the following static problem:
$$\max_{K_j, L_j}\left(F(K_j, L_j) - rK_j - wL_j\right).$$
Given that F(·, ·) has constant returns to scale, we can write
$$\max_{K_j, L_j}\left[L_j f\!\left(\frac{K_j}{L_j}\right) - rK_j - wL_j\right],$$
where f (·) corresponds to the output per worker defined in the previous
section. The first-order conditions of the firm are therefore given by
$$f'\!\left(\frac{K_j(t)}{L_j(t)}\right) = r(t),$$
$$f\!\left(\frac{K_j(t)}{L_j(t)}\right) - \frac{K_j(t)}{L_j(t)}\, f'\!\left(\frac{K_j(t)}{L_j(t)}\right) = w(t),$$
which are valid for each t and each j.
Since all firms face the same unit costs of capital and labor, every firm
will choose the same capital/labor ratio, K_j/L_j ≡ k. In equilibrium firms
therefore can differ only as regards the scale of their operation: if L is the
aggregate stock of labor (or the number of households), we can index the scale
of individual firms by ξ_j so that Σ_j ξ_j = 1, and denote L_j = ξ_j L. Thanks to
the assumption of constant returns to scale, we can assume that F(·, ·) has
the same functional form as at the aggregate level. We can then immediately
derive a simple expression for the aggregate output of the economy:
$$Y \equiv \sum_j F(K_j, L_j) = \sum_j L_j f(K_j/L_j) = \left(\sum_j \xi_j\right) L f(k) = F(K, L).$$
Hence if the production function has constant returns to scale and if all
markets are competitive, the number of active firms and the scale of their
operation is irrelevant.³⁶
At this point we note that Σ_j L_j = L = AN = A(Σ_{j=1}^{N} 1). Hence, the
same factor of labor efficiency A is applied to each individual unit of labor
that is offered on the labor market. Moreover, we notice that in equilibrium
the profits of each firm are equal to zero. It is therefore irrelevant to know
which family owns a particular firm and at which scale this firm operates.
Let us now return to the household. The dynamic optimization problem of
the household is expressed by the following Hamiltonian:
$$H(t) = e^{-\rho t}\left[u(c_i(t)) + \lambda_i(t)\left(w(t) + r(t)a_i(t) - c_i(t)\right)\right].$$
The first-order conditions are analogous to (4.12)–(4.14), and can be rewritten as
$$\frac{d}{dt}c_i(t) = \frac{-u'(c_i(t))}{u''(c_i(t))}\left(r(t) - \rho\right),$$
$$\lim_{t\to\infty} e^{-\rho t}\,u'(c_i(t))\,a_i(t) = 0.$$
Exercise 36 Compare this optimality condition with
$$u'(c_t) = \frac{1+r}{1+\rho}\,u'(c_{t+1}),$$
also known as a Euler equation, which holds in a deterministic environment with
discrete time. Complete the parallel between the consumption problems studied
here and in Chapter 1 by deriving a version of the cumulated budget restriction
in continuous time.
³⁶ For simplicity, we suppose that the stock of capital may vary without adjustment costs. The
following derivations would remain valid if, as in some of the models studied in Chapter 2, returns
to scale were constant in adjustment as well as in production, implying that—at least in the long
run—the size of firms is irrelevant.
4.3.1. OPTIMAL GROWTH
We close the model by imposing the restriction that the total wealth of house-
holds must equal the aggregate stock of capital. Inter-family loans and debts
cancel out on aggregate, and in any case there is no reason why such loans and
debts should exist if households are identical and start with the same initial
wealth: a_i(t) = a(t). From
$$\sum_{i=1}^{L} a_i(t) = L\,a(t) = K(t),$$
we get
$$a_i(t) = a(t) = k(t).$$
Furthermore, given that
$$r = f'(k),$$
it is easy to verify that optimality conditions for the accumulation of financial
wealth coincide with those for the accumulation of capital along the path of
aggregate growth that maximizes (4.10). (This also remains true if g > 0, if
δ > 0, and even if n > 0, where we should note that, in the presence of
population growth, the per capita rate of return on capital a is given by r − n,
and that if capital depreciates we have r = f′(k) − δ.)
Hence, the growth path of a market economy will coincide with the optimal
growth path if the following conditions are satisfied.
(A) Production has constant returns to scale.
(B) Markets are competitive.
(C) Savings and consumption decisions are taken by agents who independ-
ently solve identical problems.
Conditions (A) and (B) guarantee that r (t ) = f ′(k). The savings of an
individual household are compensated according to the aggregate marginal
productivity of capital. Moreover, given conditions (A) and (B), the market
structure is very simple and the entire economy behaves as a “representa-
tive” firm.
Hypothesis (C) allows us to represent the savings decisions in terms of the
optimization of a single “representative agent.” Most differences between indi-
vidual agents on the market are made irrelevant by the presence of a perfectly
competitive capital market (as implicitly assumed above). For example, the
supply of labor may follow different dynamics across households, but access
to a perfectly competitive capital market may prevent this from having any
effect at the aggregate level: individuals or households whose labor income is
temporarily low can borrow from households that are in the opposite position,
with no aggregate effects as long as total labor supply in the economy is
fixed. This is an application of the permanent income hypothesis discussed
in Chapter 1.
It is also useful to note that differences in individual consumption have no
impact at the aggregate level if agents have a common utility function with a
constant elasticity of substitution as in (4.20). In this case the growth rate of
consumption is the same for all households, so that
$$\frac{\dot{C}}{C} = \frac{\sum_i \dot{c}_i}{\sum_i c_i} = \frac{\sum_i \frac{r(t)-\rho}{\sigma}\,c_i}{\sum_i c_i} = \frac{r(t)-\rho}{\sigma}.$$
Functional form (4.20) thus has two advantages. On the one hand, this func-
tional form is compatible with a steady-state growth path (as we saw above).
On the other hand, it allows us to aggregate the individual investment deci-
sions, even in the case in which agents consume different amounts, because
the interest rate r (t ) is the same for all agents.
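The aggregation result can be illustrated with a few hypothetical households whose consumption levels differ. The sketch below assumes the common utility function (4.20), so that each household's consumption changes at the rate ((r − ρ)/σ)·c_i; the function name and the numbers are illustrative only.

```python
# Aggregate consumption growth when every household has CRRA utility (4.20).
# Each household's consumption changes by ((r - rho) / sigma) * c_i per unit
# of time, so the distribution of consumption levels does not matter.
# Function name and numbers are illustrative assumptions.


def aggregate_growth(consumptions, r, rho, sigma):
    """Return Cdot/C given individual consumption levels c_i."""
    changes = [((r - rho) / sigma) * c for c in consumptions]  # cdot_i
    return sum(changes) / sum(consumptions)


if __name__ == "__main__":
    equal = aggregate_growth([10.0, 10.0, 10.0], r=0.06, rho=0.02, sigma=2.0)
    unequal = aggregate_growth([1.0, 5.0, 24.0], r=0.06, rho=0.02, sigma=2.0)
    print(equal, unequal)
```

Both distributions deliver the same aggregate growth rate (r − ρ)/σ, which is what permits the use of a representative agent.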
4.4. Measurement of “Progress”: The Solow Residual
The hypotheses of constant returns to scale and perfectly competitive markets
(realistic or not) not only are crucial for the equivalence between the opti-
mization at the aggregate and decentralized levels, but also make it possible to
measure the technological progress that may allow unlimited growth of labor
productivity when k_ss < ∞.
Differentiating the production function Y(t) = F(K(t), L(t)), we get
$$\dot{Y}(t) = F_K(\cdot)\dot{K}(t) + F_L(\cdot)\dot{L}(t) = F_K(\cdot)\dot{K}(t) + F_L(\cdot)\left[\dot{N}(t)A(t) + N(t)\dot{A}(t)\right],$$
where F_L(·) and F_K(·) denote the partial derivatives with respect to the production
factors, which are measured in current values. The second equality
exploits our definition of labor supply L(t) ≡ N(t)A(t). Rewriting the above
expression in terms of proportional growth yields
$$\frac{\dot{Y}}{Y} = \frac{F_K(\cdot)K}{Y}\,\frac{\dot{K}}{K} + \frac{F_L(\cdot)AN}{Y}\,\frac{\dot{N}}{N} + \frac{F_L(\cdot)N}{Y}\,\dot{A}, \tag{4.25}$$
where we have omitted the time dependence. Now, if labor markets are perfectly
competitive, we have w = ∂F(·)/∂N = A F_L(·). We can thus write
$$\frac{F_L(\cdot)AN}{Y} = \frac{wN}{Y} \equiv \gamma,$$
which expresses labor’s share of national income, which is in general observ-
able, in terms of a derivative of the aggregate production function. Moreover,
given that the production technology has constant returns to scale in K and
L , the entire value of output will be paid to the production factors if these
are paid according to their marginal productivity. In fact, for each F(·, ·) with
constant returns to scale,
$$F\!\left(\frac{K}{Y}, \frac{L}{Y}\right) = 1 \quad \text{with } Y = F(K, L).$$
Using Euler’s Theorem, we therefore have
$$1 = F\!\left(\frac{K}{Y}, \frac{L}{Y}\right) = \frac{\partial F(K, L)}{\partial K}\,\frac{K}{Y} + \frac{\partial F(K, L)}{\partial L}\,\frac{L}{Y}. \tag{4.26}$$
Hence,
$$\frac{F_K(\cdot)K}{Y} = 1 - \frac{A F_L(\cdot) N}{Y} = 1 - \gamma,$$
and (4.25) implies
$$\gamma\,\frac{\dot{A}}{A} = \frac{\dot{Y}}{Y} - (1-\gamma)\,\frac{\dot{K}}{K} - \gamma\,\frac{\dot{N}}{N}. \tag{4.27}$$
If accurate measures of γ (the income share of the non-accumulated factor N)
and the proportional growth rates of Y, K, and N are available, then (4.27)
provides a measure known as “Solow’s residual,” which indicates how much
of the growth in income is accounted for by an increase in the measure of
efficiency A(t ) (which as such is not measurable) rather than by an increase in
the supply of productive inputs.
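Equation (4.27) translates directly into a growth-accounting calculation. The sketch below is a minimal illustration: the function names are hypothetical, and the growth rates are synthetic numbers generated from a Cobb–Douglas technology with a capital exponent of 0.4, not data.

```python
# Growth accounting based on (4.27):
#   gamma * (Adot/A) = Ydot/Y - (1 - gamma) * Kdot/K - gamma * Ndot/N.
# Function names and numbers are illustrative assumptions, not data.


def solow_residual(g_y, g_k, g_n, labor_share):
    """Right-hand side of (4.27): growth not explained by K and N."""
    return g_y - (1.0 - labor_share) * g_k - labor_share * g_n


def implied_efficiency_growth(g_y, g_k, g_n, labor_share):
    """Adot/A implied by (4.27) for labor-augmenting efficiency A."""
    return solow_residual(g_y, g_k, g_n, labor_share) / labor_share


if __name__ == "__main__":
    gamma = 0.6                     # labor share (assumed)
    g_a, g_n, g_k = 0.02, 0.01, 0.03
    # Synthetic output growth from Y = K**0.4 * (A*N)**0.6.
    g_y = (1.0 - gamma) * g_k + gamma * (g_a + g_n)
    print(solow_residual(g_y, g_k, g_n, gamma))
    print(implied_efficiency_growth(g_y, g_k, g_n, gamma))
```

With these synthetic inputs the residual equals γ times the assumed growth rate of A, so dividing by the labor share recovers the efficiency growth used to generate the numbers.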
If the production function has the Cobb–Douglas form,
$$Y = F(K, L) = F(K, AN) = K^{\alpha}(AN)^{1-\alpha}, \tag{4.28}$$
or, equivalently, if
$$Y = \tilde{A}\tilde{F}(K, N) = \tilde{A}K^{\alpha}N^{1-\alpha}, \quad \text{where } \tilde{A} = A^{1-\alpha}, \tag{4.29}$$
then γ is constant and equal to 1 − α. The Cobb–Douglas function is therefore
convenient from an analytic point of view, and also because it does not attach
any practical relevance to the difference between a labor-augmenting technical
change as in (4.28) and a neutral technological change as in (4.29). In fact, the
Solow residual defined in (4.27) corresponds to the rate of growth of Ã.
Exercise 37 Verify that, if K̇ /K = Ȧ/ A + Ṅ/N, the income shares of capital
and labor are constant as long as the production function has constant returns to
scale, even if it does not have the Cobb–Douglas form.
Unfortunately, the functional form (4.28) implies that
$$\lim_{k\to\infty} f'(k) = \lim_{k\to\infty} \alpha k^{\alpha-1} = 0$$
if α < 1, that is if γ > 0 and labor realistically receives a positive share of
national income. Given that the labor share is approximately constant (around
60% in the long-run), the empirical evidence does not seem supportive of
unlimited growth with constant returns to scale.
More generally, for each case in which the aggregate production function
F (·, ·) has constant returns to scale and
$$\lim_{k\to\infty} f'(k) = \lim_{k\to\infty}\frac{\partial F(K, L)}{\partial K} = b > 0,$$
then F_L(·)L/F(·) tends to zero when K and k approach infinity for a constant
L. It suffices to take the limit of expression (4.26) with K → ∞ (and
thus L/Y = L/F(K, L) → 0), which yields
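$$1 = F\!\left(\lim_{K\to\infty}\frac{K}{Y},\, 0\right) = b\lim_{K\to\infty}\frac{K}{F(K, L)} + \lim_{K\to\infty}\left(\frac{\partial F(K, L)}{\partial L}\,\frac{L}{F(K, L)}\right). \tag{4.30}$$
L’Hôpital’s rule then implies (as in Exercise 33 above) that
$$\lim_{K\to\infty}\frac{K}{F(K, L)} = 1\Big/\left(\lim_{K\to\infty}\frac{\partial F(K, L)}{\partial K}\right) = \frac{1}{b}.$$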
Hence, the first term on the right-hand side of (4.30) tends to one, and the
second term (the income share of the non-accumulated factor) therefore has
to tend to zero.
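A simple example makes the vanishing income share concrete. The sketch below assumes the constant-returns technology F(K, L) = bK + K^α L^{1−α}, so that f(k) = bk + k^α and f′(k) tends to b; this functional form and the parameter values are assumptions chosen only for illustration.

```python
# Income shares under the assumed technology f(k) = B*k + k**ALPHA, for which
# f'(k) -> B > 0. With competitive factor payments the labor share is
# (f(k) - k*f'(k)) / f(k) and the capital share is k*f'(k) / f(k).
# Functional form and parameter values are illustrative assumptions.

ALPHA = 0.3
B = 0.05


def f(k):
    return B * k + k ** ALPHA


def f_prime(k):
    return B + ALPHA * k ** (ALPHA - 1.0)


def labor_share(k):
    return (f(k) - k * f_prime(k)) / f(k)


def capital_share(k):
    return k * f_prime(k) / f(k)


if __name__ == "__main__":
    for k in (1.0, 1e2, 1e4, 1e8):
        print(k, labor_share(k), capital_share(k))
```

The labor share declines monotonically towards zero as k grows while the capital share approaches one, even though the marginal product of labor remains positive at every finite k.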
In sum, the income share of the non-accumulated factor γ needs to decline
to zero with the accumulation of an infinite amount of capital if
(i) the accumulation of capital allows the economy to grow indefinitely,
and
(ii) the production function has constant returns to scale.
This conclusion is intuitive in light of the reasoning that led us to draw a
concave production function in Figure 4.1, and to identify a steady state in
Figure 4.2; if we have equality rather than a strict inequality in (4.5), that is if
$$f(\lambda k) = F(\lambda k, 1) = F(\lambda k, \lambda) = \lambda F(k, 1) = \lambda f(k)$$
for λ ≠ 1, then output is proportional to K and increasing L will not have
any effect on output. If the increase in the stock of capital tends to have proportional
effects on output, then both marginal productivity and the income
share of the non-accumulated factor must steadily decrease.
Exercise 38 Verify this result for the case of a function in the form (4.9).
Naturally, equation (4.27) and its implications are valid only under the twin
assumptions that the production technology exhibits constant returns to scale
and that production factors are paid according to their marginal productivity.
From a formal point of view, nothing would prevent us from considering
models in which either assumption is violated. As illustrated in the exercise
below, in that case it does not make much sense to measure Ȧ/A by inserting
labor’s income share γ in (4.27).
Exercise 39 Consider a Cobb–Douglas production function with increasing
returns to scale,
$$Y = A N^{\alpha} K^{\beta}, \qquad \alpha + \beta > 1.$$
Suppose, in addition, that wages are below the marginal productivity of labor,
$$A F_N(\cdot) = \frac{w}{1-\mu},$$
where μ > 0 can be interpreted as a monopolistic mark-up. What does the Solow
residual measured by (4.27) correspond to in this case?
The above hypotheses correspond to conditions (A) and (B) in the previous
section, which allowed us to connect the macroeconomic dynamics to the
savings and consumption decisions of individual agents. Constant returns to
scale allowed us simply to aggregate the production functions of the individual
firms. And the remuneration of production factors equal to their marginal
product (which in turn followed from the assumption that all markets are
characterized by free entry and perfect competition) ensured that the dynamic
path of the economy maximized the welfare of a hypothetical representative
agent. In the rest of this chapter we consider models for which the macro-
economic dynamics are well-defined (but not necessarily optimal from the
aggregate point of view) in the absence of perfectly competitive markets and
in the presence of increasing returns to scale.
4.5. Endogenous Growth and Market Imperfections
To obtain an income share for the non-accumulated factor that is not reduced
to zero in the long-run and at the same time allow for an endogenous growth
rate that is determined by the investment decisions of individual agents, we
need to reconsider the assumption of constant returns to scale. Henceforth
we will consider steady-state growth paths only in the absence of exogenous
technological change. We know that, in order to sustain long-run (propor-
tional) growth, the economy needs to exhibit constant returns to capital: from
now on we therefore assume that f ′(k) = b, with b independent of k. If that
condition is satisfied, and if the productivity of the non-accumulated factor L
is positive, aggregate production is characterized by increasing returns to scale.
Multiplying K and L by the same constant increases aggregate production
more than proportionally.
As shown above, constant returns to scale are a crucial condition for the
decentralization of the socially optimal savings and investment decisions.
Allowing for increasing returns to scale means (in general) that we lose this
result. It becomes important therefore to confront the optimal growth path of
the economy with the growth path that results from decentralized investment
decisions. In addition, we need to pay attention to the criteria for the distri-
bution of income: with increasing returns to scale, it is no longer possible to
remunerate all factors of production on the basis of their marginal product
because the sum of these payments would exceed the value of production.
Some factor of production needs to receive less than the value of its marginal
product, and it is obviously of interest to know how that may result.
4.5.1. PRODUCTION AND NON-RIVAL FACTORS
To understand the economic mechanisms behind the division of the value of
output within each productive unit and in the economy as a whole, it is useful
briefly to reconsider the hypothesis of constant returns to scale.
One possible microeconomic foundation for this assumption is based on
the idea that production processes can be replicated. If a firm or productive
unit j produces Y j using quantities K j and L j (and these are the only neces-
sary factors of production), then one can obviously obtain double the amount
of production by doubling the input of both factors, simply by organizing
these additional factors in an identical production unit. The same reason-
ing applies to different factors of proportionality as long as the factors of
production are perfectly divisible, as is implicit in the concept of marginal
productivity.
A model with constant returns to scale in production implies not only that
a doubling of inputs may lead to a doubling of output, but also that such a
doubling of inputs is necessary to obtain twice the amount of output. In reality,
however, there are factors of production whose input need not be doubled in
order to double output; for instance, to build a house one needs a blueprint,
a piece of land, and a certain quantity of materials, manual labor, and energy
(all inputs that can be expressed in units of labor and other primary inputs).
To build a second house one probably needs the same amounts of materials,
labor, and energy and an identical piece of land. However, nothing prevents
the use of the same plan. The same input can therefore be used to build several
houses. This is an example of a more general phenomenon: certain factors
of production (like the architectural plan) may be used contemporaneously
by one or more production processes, and their use in a production process
need not reduce its productivity in other processes. These factors are normally
referred to as non-rival inputs. It is not difficult to find other examples: every
factor that provides intangible (but necessary) input of know-how (or soft-
ware) is non-rival.
The presence of non-rival factors makes the assumption of increasing
returns plausible. It is still possible to build a second house using double the
amount of all inputs including a completely new plan. But this is no longer
necessary: since the product can be doubled without any work on the plan,
doubling all inputs makes it possible to improve its quality, or perhaps to build
a larger house.
As we know, the assumption of constant returns was useful to decentralize
production and to distribute its revenues. By contrast, with non-rival
factors and increasing returns to scale, it is no longer possible to pay all factors
according to their marginal productivity. The total productivity of the factors
used in design depends on the number of houses that are built with one and
the same design. This number can in principle be very high. Moreover, if each
additional house requires a constant amount of labor and material, then the
production technology has constant returns to these variable inputs, and if
these factors were paid according to their true marginal productivity, there
would be nothing left to pay the architect.
How can one decentralize production decisions under these circumstances?
Non-rival factors are mostly identified with intangible resources (know-how,
software) which, by their nature, are often also non-excludable. When a
productive input is non-rival it is often difficult to prevent other agents from
making an economic use of this factor. The regulation of property rights and
licenses is meant to resolve this type of problem. Nonetheless, the theft of intel-
lectual property remains difficult to prove and is also hard to punish, because
the knowledge (the stolen “object”) remains in the hands of the thief. In the
example of the house, the private property in the physical sense (calculations
and designs) can be guaranteed, and unauthorized duplication of the plan can
be punished legally. However, certain innovative aspects of the project may be
evident by simply observing the final product, and it is not easy to prevent or
punish reproduction of these aspects by third parties.
Many recent growth models allow for increasing rather than constant
returns to scale, and are therefore naturally forced to study markets and
productive structures characterized by non-rivalry and non-excludability of
certain factors.
4.5.2. INVOLUNTARY TECHNOLOGICAL PROGRESS
In the model outlined below the level of technology, A, is treated as an entirely
non-rival and non-excludable productive input in Solow's model of exogenous
growth. The production function therefore has three arguments, K, N, and A:
$$Y = F(K, L) = F(K, AN) \equiv \tilde{F}(K, N, A).$$
If F(·, ·) has constant returns to scale in K and L (both of which have strictly
positive marginal productivity), then F̃(·, ·, ·) has increasing returns to scale
in K, N, and A. In fact, doubling K, N, and A doubles K but quadruples
L, so aggregate production more than doubles if F_L(·) > 0. Since firms hire
capital and labor from households, we can interpret the situation in terms of
the non-rivalry and non-excludability of A: each unit of labor has free access
to the current level of A, which is the same for all.
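The increasing-returns claim can be checked numerically. The sketch below assumes a Cobb–Douglas form for F with an illustrative capital share of 0.3; the function names and parameter values are hypothetical and serve only as an example.

```python
# Assumption: F is Cobb-Douglas; alpha = 0.3 is an illustrative value.
def cobb_douglas(K, L, alpha=0.3):
    """Constant-returns technology F(K, L) = K**alpha * L**(1 - alpha)."""
    return K**alpha * L**(1 - alpha)

def f_tilde(K, N, A, alpha=0.3):
    """The same technology written in K, N and A, with L = A * N."""
    return cobb_douglas(K, A * N, alpha)

K, N, A = 4.0, 2.0, 1.5
base = f_tilde(K, N, A)

# Doubling K and L doubles output (constant returns in K and L).
crs_ratio = cobb_douglas(2 * K, 2 * A * N) / base
# Doubling K, N and A doubles K but quadruples L = A * N.
irs_ratio = f_tilde(2 * K, 2 * N, 2 * A) / base

print(crs_ratio, irs_ratio)
```

Under this functional form the second ratio equals 2 raised to the power 2 − α, which exceeds 2 whenever labor has positive marginal productivity (α < 1).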
As we know, growth in the Solow model is exogenous. More precisely,
the dynamics of the level of technological change or efficiency A(t ) is not
influenced by economic decisions: if one interprets A as a production factor
in the decentralized model, then this factor
(A1) is completely non-excludable from the viewpoint of production and
receives no remuneration;
(A2) is reproduced over time without any interaction with the production
system; in fact, if we have exponential technological change at a constant
rate g_A, the expression Ȧ(t) = g_A A(t) can be interpreted as an
expression of accumulation in which A(t) is used in the production
of further technological progress (besides its use in the production of
final goods).
To integrate technological change in the economic structure of the model,
we can preserve aspect (A1) (no remuneration for the “factor” technology)
and relax aspect (A2), assuming that the growth in efficiency is linked to
economic activity (and remunerated).
For example, one can specify a model in which technological change is a
by-product of production (learning by doing). One can for instance assume
that
$$A(t) = A\!\left(\frac{K(t)}{N}\right), \qquad A'(\cdot) > 0,$$
so that the effective productivity of labor is a function of the amount of capital
per worker. To interpret this assumption, one could assume that experience
makes workers more efficient. That is, while doing, workers learn from their
mistakes, and their additional experience thereby increases the productive
efficiency of the non-accumulated factor N.
The proposed functional form assumes that labor efficiency is a function
of the capital stock and thus of the total amount of past investments. It
may be more realistic to assume that total accumulated production, rather
than investments, determines the efficiency of N. However, such an extension
would complicate the analysis without providing substantially different
results.
Much more important is the implicit assumption that the efficiency of each
unit of labor does not depend on its own productive activity, but rather on
aggregate economic activity. Agents in this economy learn not only from their
own mistakes, so to speak, but also from the mistakes of others. When deciding
how much to invest, agents do not consider the fact that their actions affect the
productivity of the other agents in the economy; the economic interactions
are thus affected by externalities. These externalities are similar (albeit with an
opposite sign) to the externalities that one encounters in any basic textbook
treatment of pollution, or to those that we will discuss in Chapter 5 when we
consider coordination problems.
If we retain the assumptions that firms produce homogeneous goods with
the constant-returns-to-scale production technology F(K_j, AN_j), that A is
non-rival and non-excludable, and that all markets are perfectly competitive,
then output decisions can be decentralized as in Section 4.3. In particular, the
marginal productivity of capital needs to coincide with r (t ), the rate at which
it is remunerated in the market,
$$r(t) = \frac{\partial F(\cdot)}{\partial K} \equiv F_1(\cdot) = f'(K/L),$$
and the dynamic optimization problem of households implies a proportional
growth rate of consumption equal to (r(t) − ρ)/σ if the function of marginal
utility has constant elasticity. Hence, recalling that L = AN, it follows that
both individual and aggregate consumption grow at a rate
$$\frac{\dot{C}(t)}{C(t)} = \left[ f'\!\left(\frac{K(t)}{N A(t)}\right) - \rho \right] \Big/ \sigma.$$
If, as in the case of a Cobb–Douglas function, the economy distributes a
constant (or non-vanishing) share of national income to the non-accumulated
factor, then lim_{k→∞} f′(k) = 0 < ρ and consumption growth can remain positive
only if A and L grow together with K, which would prevent the marginal
productivity of capital from approaching zero. However, since A is a function
of k in the model of this section, the growth of A itself depends on the
accumulation of capital. If
$$\lim_{k \to \infty} \frac{A(k)}{k} = \frac{1}{a} > 0,$$
we have
$$\lim_{K/N \to \infty} f'\!\left(\frac{K}{A(K/N)\,N}\right) = \lim_{K/N \to \infty} F_1\!\left(\frac{K}{A(K/N)\,N},\, 1\right) = F_1(a, 1),$$
which may well be above ρ.
Exercise 40 Let F(K, L) = K^α L^{1−α}, and A(·) = aK/N: what is the growth
rate of the economy?
Hence, in the presence of learning by doing, the economy can con-
tinue to grow endogenously even if the non-accumulated factor receives a
non-vanishing share of national income. There is however an obvious prob-
lem. From the aggregate viewpoint, true marginal productivity is given by
$$\frac{d}{dK} F(K, A(K/N)N) = F_1(\cdot) + F_2(\cdot)\, A'(k) > F_1(\cdot), \qquad \text{for } F_2(\cdot) \equiv \frac{\partial F(\cdot)}{\partial L}.$$
Hence, growth that is induced by the optimal savings decisions of individuals
does not correspond to the growth rate that results if one optimizes (4.10)
directly. In fact, the decentralized growth rate is below the efficient growth
rate because individuals do not take the external effects of their actions into
account, and they disregard the share of investment benefits that accrues to
the economy as a whole rather than to their own private resources.
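The gap between the private and the aggregate return can be illustrated with a small numerical sketch. It assumes a Cobb–Douglas F, a learning function that is exactly linear, A(k) = k/a (one simple case satisfying the limit condition above), and constant-elasticity utility; the parameter values are purely illustrative.

```python
# Assumptions: F(K, L) = K**alpha * L**(1 - alpha) and A(k) = k / a,
# so that L = A * N = K / a and the ratio K / L equals a at all times.

def private_return(alpha, a):
    """F_1 evaluated at K/L = a: the return perceived by an individual investor."""
    return alpha * a ** (alpha - 1)

def social_return(alpha, a):
    """d/dK of F(K, K/a) = K * a**(alpha - 1): includes the learning spillover."""
    return a ** (alpha - 1)

def growth(r, rho, sigma):
    """Consumption growth (r - rho) / sigma under constant-elasticity utility."""
    return (r - rho) / sigma

alpha, a, rho, sigma = 0.3, 0.5, 0.03, 2.0
g_decentralized = growth(private_return(alpha, a), rho, sigma)
g_efficient = growth(social_return(alpha, a), rho, sigma)
print(g_decentralized, g_efficient)
```

In this special case the private return is a fraction α of the aggregate return, so the decentralized growth rate lies below the efficient one for any α < 1.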
4.5.3. SCIENTIFIC RESEARCH
It may well be the case that innovative activity has an economic character and
that it requires specific productive efforts rather than being an unintentional
by-product. For example, we may have
$$Y(t) = C(t) + \dot{K}(t) = F(K_y(t), L_y(t)), \tag{4.31}$$
$$\dot{A}(t) = F(K_A(t), L_A(t)), \tag{4.32}$$
with K_y(t) + K_A(t) = K(t), L_y(t) + L_A(t) = L(t) = A(t)N(t). In other
words, new and more efficient modes of production may be “produced” by
dedicating factors of production to research and development rather than to
the production of final goods.
If, as suggested by the notation, the production function is the same in both
sectors and has constant returns to scale, then we can write
$$\dot{A} = F(K_A, L_A) = \frac{\partial F(K_A, L_A)}{\partial K} K_A + \frac{\partial F(K_A, L_A)}{\partial L} L_A.$$
If the rewards r and w of the factors employed in research are
the same as the earnings in the production sector, then
$$\dot{A} = r K_A + w L_A \tag{4.33}$$
is a measure of research output in terms of goods. If A is (non-rival and)
non-excludable, then this output has no market value. Since it is impossible to
prevent others from using knowledge, private firms operating in the research
sector would not be able to pay any salary to the factors of production that
they employ.
Nonetheless, the increase in productive efficiency has value for society as a
whole, if not for single individuals. Like other non-rival and non-excludable
goods, such as national defense or justice, research may therefore be financed
by the government or other public bodies if the latter have the authority to
impose taxes on final output that has a market value. One could for example
tax the income of all private factors at rate τ, and use the revenue to finance
“firms” which (like universities or national research institutes, or like monas-
teries in the Middle Ages) produce only research which is of no market value.
Thanks to constant returns to scale, one can calculate national income in both
sectors by evaluating the output of the research sector at the cost of production
factors, as in (4.33). Moreover, the accumulation of tangible and intangible
assets obeys the following laws of motion:
$$\dot{K} = (1 - \tau) F(K, AN) - C,$$
$$\dot{A} = \tau F(K, AN).$$
The return on private investments is given by
$$r(t) = (1 - \tau) f'(k),$$
and if f (·) has decreasing returns the economy possesses a steady-state growth
path in which A, K , Y , and C all grow at the same rate. It is not difficult
to see that there is no unambiguous relation between this growth rate and
the tax τ (or the size of the public research sector). In fact, in the long-run
there is no growth if τ = 0, since in that case Ȧ(t) = 0; but neither is there
growth if τ is so high that r(t) = (1 − τ) f′(k) tends toward values below
the discount rate of utility, and prevents growth of private consumption and
capital. For intermediate values, however, growth can certainly be positive.
(We shall return to this issue in Section 4.5.5.)
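The non-monotonic relation between the tax rate and growth can be sketched numerically. The example below assumes a Cobb–Douglas technology with N = 1, constant-elasticity utility, and illustrative parameter values; the function name and the bisection routine are hypothetical and not part of the model as stated. On a balanced path with k = K/A constant, the growth rate of A, τk^α, must equal consumption growth, ((1 − τ)αk^(α−1) − ρ)/σ.

```python
def balanced_growth_rate(tau, alpha=0.3, rho=0.03, sigma=2.0):
    """Common growth rate of A, K, Y and C for a tax rate 0 <= tau < 1.

    Assumes F(K, AN) = K**alpha * (A*N)**(1 - alpha) with N = 1, so k = K / A.
    """
    if tau <= 0.0:
        return 0.0  # no research is financed, so A stays constant

    def excess(k):
        # Consumption growth minus growth of A, multiplied by sigma.
        # This expression is strictly decreasing in k.
        return (1 - tau) * alpha * k ** (alpha - 1) - rho - sigma * tau * k ** alpha

    lo, hi = 1e-12, 1e12
    for _ in range(200):          # bisection on a logarithmic scale
        mid = (lo * hi) ** 0.5
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = (lo * hi) ** 0.5
    return tau * k ** alpha

for tau in (0.0, 0.001, 0.3, 0.999):
    print(tau, balanced_growth_rate(tau))
```

With these illustrative numbers the growth rate is zero at τ = 0, rises for intermediate tax rates, and falls back toward zero as τ approaches one.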
4.5.4. HUMAN CAPITAL
Retaining assumptions (4.32) and (4.31), one can reconsider property (A1),
and allow A to be a private and excludable factor of production. In this case,
the problem of how to distribute income to the three factors A, K , and L
if there are increasing returns to scale can be resolved if one assumes that a
person (a unit of N) does not have productive value unless she owns a certain
amount of the measure of efficiency A. Reverting to the hypothesis implicit in
the Solow model, in which N is remunerated but not A, the presence of N is
thus completely irrelevant from a productive point of view.
The factor A, if remunerated, is not very different from K , and may be
dubbed human capital. In fact, for A to be excludable it should be embodied
in individuals, who have to be employed and paid in order to make productive
use of knowledge. One example of this is the case of privately funded profes-
sional education.
In the situation that we consider here, all the factors are accumulated. Given
constant returns to scale, we can therefore easily decentralize the decisions to
devote resources to any of these uses. If as in (4.31) and (4.32) the two factors
of production are produced with the same technology, and if one assumes that
all markets are competitive so that A and K are compensated at rates F_A(·)
and F_K(·) respectively, then the following laws of motion hold:
$$\dot{K} = F((1 - \tau)K, (1 - \tau)A) - C = (1 - \tau) F(K, A) - C,$$
$$\dot{A} = \tau F(K, A).$$
In these equations τ no longer denotes the tax on private income, but rather
more generally the overall share of income that is devoted to the accumulation
of human capital instead of physical capital (or consumption).
If technological change does indeed take the form suggested here, then
we need to reinterpret the empirical evidence that was advanced when we
discussed the Solow residual. Given that the worker’s income includes the
return on human capital, we need to refine the definition of labor stock, which
is no longer identical to the number of workers in any given period. The
accumulation of this factor may for example depend on the enrolment rates of
the youngest age cohorts in education more than on demographic changes as
such. However, the fact that agents have a finite life, and that they dedicate only
the first part of their life to education, implies that it is difficult to claim that
education is the only source of technological progress. Each process
of learning and transmission of knowledge uses knowledge that is generated
in the past and is not necessarily compensated. Hence also the accumulation
of human capital is subject to the type of externalities that we encountered in
the discussion of learning by doing.³⁷
4.5.5. GOVERNMENT EXPENDITURE AND GROWTH
Besides the capacity to finance the accumulation of non-excludable technolog-
ical change, government spending may provide the economy with those (non-
rival and non-excludable) factors that make the assumption of increasing
returns plausible. Non-rivalry and non-excludability are in fact main features
³⁷ Drafting and studying the present chapter, for example, would have been much more difficult if
Robert Solow, Paul Romer, and many others had not worked on growth issues. Yet, no royalty is paid
to them by the authors and readers of this book.
of pure public goods like defense or police, and of quasi-public goods like
roads, telecommunications, etc. To analyze these aspects, we assume that
Y (t ) = F̃ (K (t ), L (t ), G (t )),
where, besides the standard factors K and L (the latter constant in the absence
of exogenous technological change), the amount of public goods G appears
as a separate input. Since L and K are private factors of production, the
competitive equilibrium of the private sector requires that the production
function F̃(·, ·, ·) has constant returns to its first two arguments:
$$\tilde{F}(\lambda K, \lambda L, G) = \lambda \tilde{F}(K, L, G).$$
Hence, given ∂ F̃ (·)/∂ G > 0, a proportional change of G and of the private
factors L and K results in a more than proportional increase in production.
The function F̃ (·, ·, ·) therefore has increasing returns to scale, but this does
not prevent the existence of a competitive equilibrium as long as G is a non-
rival and non-excludable factor which is made available to all productive units
without any cost. If the provision of public goods is constant over time (G (t ) =
Ḡ for each t ) then, as in the preceding section, constant returns to K and L
would imply decreasing returns to K . With an increase in the stock of capital,
the growth rate that is implied by the optimization of (4.10) and (4.20), i.e.
$$\frac{\dot{C}(t)}{C(t)} = \left( \frac{\partial \tilde{F}(K(t), L(t), \bar{G})}{\partial K} - \rho \right) \Big/ \sigma,$$
can only decrease, and will fall to zero in the limit if L continues to receive a
positive share of aggregate income.
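A minimal numerical sketch of this point follows. It assumes the functional form F̃ = K^α L^(1−α) G^γ (constant returns to K and L for given G) and illustrative parameter values, neither of which is specified in the text.

```python
# Assumption: F(K, L, G) = K**alpha * L**(1 - alpha) * G**gamma with G held fixed.
def marginal_product_of_capital(K, L=1.0, G=1.0, alpha=0.3, gamma=0.2):
    """Partial derivative of F with respect to K."""
    return alpha * K ** (alpha - 1) * L ** (1 - alpha) * G ** gamma

def implied_consumption_growth(K, rho=0.03, sigma=2.0):
    """Growth rate (dF/dK - rho) / sigma implied by the current capital stock."""
    return (marginal_product_of_capital(K) - rho) / sigma

for K in (1.0, 5.0, 10.0, 20.0):
    print(K, round(marginal_product_of_capital(K), 4),
          round(implied_consumption_growth(K), 4))
```

With G constant, the return on capital declines monotonically as K rises, so the implied growth rate shrinks and reaches zero once the return has fallen to ρ.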
To allow indefinite growth, the provision of public goods needs to increase
exponentially. If, as seems realistic, a higher G (t ) has a positive effect on the
marginal productivity of capital, then Ġ(t) > 0 has a similar effect to the
(exogenous) growth of A(t) in the preceding sections. Hence, an ever increasing
supply of public goods may allow the return on savings to remain above the
discount rate ρ so that the economy as a whole can grow indefinitely.
As we saw in Section 4.5.2, the development of A(t ) could be made
endogenous by assuming that the accumulation of this index of efficiency
depended on the capital stock. Similarly, and even more obviously, the provi-
sion of public goods is a function of private economic activity if one assumes
that their provision is financed by the taxation of private income. If
$$G(t) = \tau \tilde{F}(K(t), L(t), G(t)), \tag{4.34}$$
then each increase in production will be shared in proportion between
consumption, investments and the increase of G(t), which can offset the secular
decrease in the marginal productivity of capital.
To obtain a balanced growth path, the production function needs to have
constant returns to K and G for any constant L . In fact, if
$$\tilde{F}(\lambda K, L, \lambda G) = \lambda \tilde{F}(K, L, G),$$
a constant increase of capital will imply proportional growth of income if G
grows at the same rate as K —this is in turn implied by the proportionality
of income, tax revenues, and the provision of public goods in (4.34). To
calculate the growth rate that is compatible with a balanced government budget
and with the resulting savings and investment decisions, we must take into
account the fact that the tax rate τ reduces the private return
on savings; hence, consumption grows at the rate
$$\frac{\dot{C}(t)}{C(t)} = \left( (1 - \tau) \frac{\partial \tilde{F}(K(t), L(t), G(t))}{\partial K} - \rho \right) \Big/ \sigma, \tag{4.35}$$
and the growth path of the economy will satisfy the above equation and (4.34).
Exercise 41 Consider the production function
$$\tilde{F}(K, L, G) = K^{\alpha} L^{\beta} G^{\gamma}.$$
Determine what relation α, β, and γ need to satisfy so that the economy has
a balanced growth path. What is the growth rate along this balanced growth
path?
4.5.6. MONOPOLY POWER AND PRIVATE INNOVATIONS
An important aspect of the models described above is the fact that the decen-
tralized growth path need not be optimal in the absence of a complete set of
competitive markets. The formal analysis of economic interactions that are
less than fully efficient plays an important role in modern macroeconomics,
and in this concluding section we briefly discuss how imperfectly competitive
markets may imply inefficient outcomes.
In order to decentralize production decisions, we have so far assumed that
markets are perfectly competitive (allowing only for the possibility of missing
markets in the case of non-excludable factors). However, it is realistic to
assume that there are firms that have monopoly power and that do not take
prices as given. From the viewpoint of the preceding sections, it is interesting
to note the relationship between monopoly power and increasing returns to
scale within firms. Returning to the example of a house, we assume that the
project is in fact excludable. That is, a given productive entity (a firm) can
legally prevent unauthorized use of the project by third parties. However,
within the firm the project is still non-rival, and the firm can use the same
blueprint to build any arbitrary number of houses. If we assume that the firm
is competitive, it will be willing to supply houses as long as the price of each
is above marginal cost. Hence for a price above marginal cost supply tends to
infinity, while for any price below marginal cost supply is zero. But if the price
is exactly equal to marginal cost, then revenues are just enough to recover the
variable cost (materials, labor, land)—and the fixed cost (the project) would
need to be paid by the firm, which should rationally refuse to enter the market.
A firm that bears a fixed cost but does not have increasing marginal costs
(or more generally has increasing returns) has to be able to charge a price
above marginal cost in order to exist. Formally, we assume that firm j needs
to pay a fixed cost κ_0 to be able to produce, and a variable cost (per unit of
output) equal to κ_1. In addition, we assume that the demand function has
constant elasticity, with p_j = x_j^{α−1}, where x_j is the number of units produced
and offered on the market. The total revenues are thus p_j x_j = x_j^α, and to
maximize profits,
$$\max_{x_j} \; x_j^{\alpha} - \kappa_0 - \kappa_1 x_j,$$
the firm chooses output level
$$x_j = \left(\frac{\kappa_1}{\alpha}\right)^{1/(\alpha - 1)}$$
and charges price
$$p_j = \frac{\kappa_1}{\alpha}.$$
With free entry of firms (that is, any firm that pays κ_0 can start production of
this item), profits will be zero in equilibrium:
$$(p_j - \kappa_1) x_j = \kappa_0 \;\Rightarrow\; x_j = \frac{\kappa_0}{\kappa_1} \frac{\alpha}{1 - \alpha}, \tag{4.36}$$
and the resulting price is equal to the average cost of production, rather than
the marginal cost, as in the case of perfect competition. The costs of each firm
are thus given by
$$\kappa_0 + \kappa_1 x_j = \kappa_0 + \kappa_1 \frac{\kappa_0}{\kappa_1} \frac{\alpha}{1 - \alpha} = \frac{\kappa_0}{1 - \alpha}. \tag{4.37}$$
This condition determines the scale of production, or in our example the
number of houses that are produced with each project.
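The pricing and free-entry conditions can be verified with a short numerical sketch. The parameter values are illustrative, and the fixed cost is chosen here (as an assumption made only for this example) so that the profit-maximizing firm earns exactly zero profits.

```python
def profit(x, alpha, kappa0, kappa1):
    """Revenue x**alpha minus fixed and variable costs."""
    return x ** alpha - kappa0 - kappa1 * x

def optimal_output(alpha, kappa1):
    """Output solving the first-order condition alpha * x**(alpha - 1) = kappa1."""
    return (kappa1 / alpha) ** (1.0 / (alpha - 1.0))

def zero_profit_output(alpha, kappa0, kappa1):
    """Scale of production implied by free entry, as in (4.36)."""
    return (kappa0 / kappa1) * alpha / (1.0 - alpha)

alpha, kappa1 = 0.5, 1.0
x_star = optimal_output(alpha, kappa1)
price = x_star ** (alpha - 1.0)          # demand curve p = x**(alpha - 1)
# Illustrative assumption: the fixed cost just absorbs the operating margin.
kappa0 = (price - kappa1) * x_star
total_cost = kappa0 + kappa1 * x_star    # should equal kappa0 / (1 - alpha), (4.37)

print(x_star, price, kappa0, total_cost)
```

The price exceeds marginal cost by the factor 1/α, and with this fixed cost the scale from (4.36) coincides with the profit-maximizing output while revenue equals total cost, so the price equals average cost.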
To incorporate this monopolistic behavior in a dynamic general equilib-
rium model, we consider the aggregate production (valued at market prices)
of N identical firms:
$$X = \sum_{j=1}^{N} p_j x_j = \sum_{j=1}^{N} x^{\alpha} = N x^{\alpha}.$$
If κ_0 and κ_1 are given and if N is an integer, then this measure of output can
only be a multiple of the scale of production calculated in (4.36). However,
nothing constrains us from indexing firms with a continuous variable and
replacing the summation sign by an integral.³⁸ Writing
$$X = \int_0^N x_j^{\alpha}\, dj = x^{\alpha} \int_0^N dj = N x^{\alpha},$$
and treating N as a continuous variable, the zero profit condition can be
exactly satisfied for any value of aggregate production. Given that profits are
zero, the value of production equals the cost of production, which in turn is
given by N times the quantity in (4.37). Assume for a moment that the costs
of a firm (both fixed and variable) are given by the quantity of K multiplied by
r (t ). For a given supply of productive factors, we can then determine the num-
ber of production processes that can be activated as well as the remuneration
of the production factors. The scale of production of each of the N identical
firms is proportional to K /N, and the constant of proportionality is given by
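κ_0/(1 − α).
We thus have
$$X = \int_0^N \left( \frac{\kappa_0}{1 - \alpha} \frac{K}{N} \right)^{\alpha} dj = \left( \frac{\kappa_0}{1 - \alpha} \right)^{\alpha} N^{1 - \alpha} K^{\alpha}. \tag{4.38}$$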
Because the goods are imperfect substitutes, the value of output increases with
the number of varieties N for any given value of K . In other words, for a given
value of income it is more satisfying to consume a wider variety of goods.
Suppose that the value of aggregate output is defined by
$$Y = L^{1 - \alpha} \left( \int_0^N x_j^{\alpha}\, dj \right) = L^{1 - \alpha} X.$$
That is, output (which can be consumed or invested in the form of capital) is
obtained by combining the market value X of the intermediate goods x j with
factor L which, as usual, is assumed to be exogenous and fixed.
Let us assume in addition that utility has the constant-elasticity form (4.20),
so that the optimal rate of growth of consumption is constant if the rate of
return on savings is constant. Given that, in equilibrium,
$$Y = L^{1 - \alpha} X = L^{1 - \alpha} \xi^{1 - \alpha} K,$$
³⁸ Approximating N by a continuous variable is substantially appropriate if the number of firms is
large. Formally, one would let the economic size of each firm go to zero as their number increases, and
keep the product of the number of firms by the distance between their indexes constant at N.
so that ∂Y/∂K is constant (non-decreasing), we find that equilibrium has a
growth path with a constant growth rate if
$$\frac{\partial Y}{\partial K} = L^{1 - \alpha} \xi^{1 - \alpha} > \rho.$$
In the decentralized equilibrium, the rate of growth is (r − ρ)/σ, where r
denotes the remuneration of capital in terms of the final good. To determine
r, we notice that each factor is paid according to its marginal productivity in
the final goods sector provided that this sector is competitive. Hence, the total
value of income that accrues to capital is equal to
$$rK = \alpha (Y/K) K = \alpha L^{1 - \alpha} \xi^{1 - \alpha} K$$
and
$$r = \alpha L^{1 - \alpha} \xi^{1 - \alpha} < L^{1 - \alpha} \xi^{1 - \alpha} = \frac{\partial Y}{\partial K}.$$
The private accumulation of capital is rewarded at a rate that is below its pro-
ductivity at the aggregate level. As before, the economy therefore grows below
the optimum growth rate. Intuitively, given that the production technology is
characterized by increasing returns at the level of an individual firm, firms can
make positive profits only if prices exceed marginal costs. The rate r which
determines marginal costs is therefore below the true aggregate return on
capital. The difference between private and social returns on capital is given
by the mark-up, which distorts savings decisions and implies that growth is
slower than optimal.
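The size of the distortion can be illustrated with a few lines of code. The sketch takes the aggregate return ∂Y/∂K as a given number rather than deriving it from the model's primitives; all values are illustrative assumptions.

```python
def growth_wedge(social_return, alpha, rho, sigma):
    """Return (decentralized, optimal) growth rates when r = alpha * dY/dK."""
    private_return = alpha * social_return
    decentralized = (private_return - rho) / sigma
    optimal = (social_return - rho) / sigma
    return decentralized, optimal

alpha = 0.5
markup = 1.0 / alpha   # price over marginal cost in the monopolistic sector
g_dec, g_opt = growth_wedge(social_return=0.12, alpha=alpha, rho=0.02, sigma=2.0)
print(markup, g_dec, g_opt)
```

The ratio of the social to the private return equals the mark-up 1/α, so a higher mark-up (lower α) widens the gap between the decentralized and the optimal growth rate.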
Admitting that prices may be above marginal cost, one can add further
realism to the model by assuming that monopolistic market power is of a
long-run nature. This requires that fixed flow costs be incurred once the
firm is created. Over time firms can therefore gradually recover fixed costs,
thanks to monopolistic rents. Obviously, this is the right way to formalize
the above house example: the fixed cost of designing the house is paid once,
but the resulting project can be used many times. We refer readers to the
bibliographical references at the end of this chapter for a complete treatment
of the resulting dynamic optimization problem and its implications for the
aggregate growth rate.
REVIEW EXERCISES
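Exercise 42 Consider the production function
$$Y = F(K) = \begin{cases} \alpha K - \frac{1}{2} K^2 & \text{if } K < \alpha, \\ \frac{1}{2} \alpha^2 & \text{otherwise.} \end{cases}$$
(a) Determine the optimality conditions for the problem
$$\max \int_0^{\infty} u(C(t))\, e^{-\rho t}\, dt$$
$$\text{s.t. } C(t) = F(K(t)) - \dot{K}(t), \qquad K(0) < \alpha \text{ given}$$
with utility function
$$u(x) = \begin{cases} \upsilon + \beta x - \frac{1}{2} x^2 & \text{if } x < \beta, \\ \upsilon + \frac{1}{2} \beta^2 & \text{otherwise.} \end{cases}$$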
(b) Calculate the steady-state value of capital, production, and consumption.
Draw the phase diagram in the capital–consumption space. (The formal
derivations can be limited to the region K < α, C < β assuming that the
parameters α, β, ρ satisfy appropriate conditions. You may also provide an
(informal) discussion of the optimal choices outside this region in which
the usual assumptions of convexity are not satisfied.)
(c) To draw the phase diagram, one needs to keep in mind the role of
parameters α and ρ. But what is the role of β?
(d) The production function does not have constant returns to scale. This is a
problem (why?) if one wants to interpret the solution as a dynamic equi-
librium of a market economy. Show that for a certain g (L ) the production
function
$$Y = F(K, L) = \alpha K - g(L) K^2$$
has constant returns to K and L in the relevant region. Also show that
the solution characterized above corresponds to the dynamic equilibrium
of an economy endowed with an amount L = 2 of a non-accumulated
factor.
Exercise 43 Consider an economy in which output and accumulation satisfy
Y (t ) = ln(L + K (t )),
K̇ (t ) = s Y (t ),
with L and s constant.
(a) Can this economy experience unlimited growth of consumption C (t ) =
(1 − s ) Y (t )? Explain why this may or may not be the case.
(b) Can the productive structure of this economy be decentralized to competitive
firms?
Exercise 44 Consider an economy with a production function and a law of
motion for capital given by
$$Y(t) = L + L^{1 - \alpha} K(t)^{\alpha}, \qquad \dot{K}(t) = Y(t) - C(t).$$
(a) Let 0 ≤ α ≤ 1. How are L and K(t) compensated if markets are competitive?
(b) Determine the growth rate of aggregate consumption C (t ) if there is a
fixed number of identical consumers that maximize the same objective
function,
$$U = \int_0^{\infty} \frac{c(t)^{1 - \sigma} - 1}{1 - \sigma}\, e^{-\rho t}\, dt,$$
where r (t ) denotes the real interest rate on savings. Provide a brief discus-
sion.
(c) Given the above assumptions, characterize graphically the dynamics of the
economy in the space (C, K) if α < 1, and calculate the steady state.
(d) What are the dynamics if α = 1? How do the income shares of the two
factors evolve? Discuss the realism of this model with reference to the
empirical plausibility of the balanced growth path.
Exercise 45 An economic system is endowed with a fixed amount of a production
factor L. Of this, L_Y units are employed in the production of final goods destined
for consumption and accumulation,
$$Y(t) = A(t) K^{\alpha} L_Y^{1 - \alpha}, \qquad \dot{K}(t) = Y(t) - C(t).$$
The remaining units of L are used to increase A(t) according to the following
technology:
$$\dot{A}(t) = (L - L_Y) A(t).$$
(a) Consider the case in which the propensity to save is equal to s . Characterize
the balanced growth path of this economy.
(b) What feature allows this economy to grow endogenously? What economic
interpretation can we give for the difference between K and A?
(c) Discuss the possibility of decentralizing production with the above technol-
ogy if A, K , and L are “rival” and “excludable” factors.
Exercise 46 Consider an economy in which output Y , capital K , and consump-
tion C are related as follows:
$$Y(t) = F(K(t), L) = \left( K(t)^{\gamma} + L^{\gamma} \right)^{1/\gamma}, \qquad \dot{K}(t) = Y(t) - C(t) - \delta K(t),$$
where L > 0, δ > 0, and γ ≤ 1 are fixed parameters.
(a) Show that the production function has constant returns to scale.
(b) Write the production function in the form y = f (k) for y ≡ Y /L and
k ≡ K /L .
(c) Calculate the net rate of return on capital, r = f′(k) − δ, and show that
in the limit with k approaching infinity this rate tends to −δ if γ ≤ 0, and
to 1 − δ if γ > 0.
(d) Denote the net production by Ỹ ≡ Y − δK = F(K, L) − δK, and
assume that C (t ) = 0.5Ỹ (t ) (aggregate consumption is equal to half the
net income). What happens to consumption if the economy approaches a
steady state?
(e) If on the contrary consumption is chosen to maximize
$$U = \int_0^\infty \log(c(t))\, e^{-\rho t}\, dt,$$
for which values of $\gamma$ and $\rho$ will there be endogenous growth?
Exercise 47 Consider an economy in which
$$Y(t) = K(t)^{\alpha}\bar{L}^{\beta}, \qquad \dot{K}(t) = P(t)\,s\,Y(t),$$
and in which the labor force is constant, and a fraction $s$ of $P(t)Y(t)$ is dedicated
to the accumulation of capital.
(a) Consider $P(t) = \bar{P}$ (constant). For which values of $\alpha$ and $\beta$ does there
exist a steady state in levels or in growth rates? For which values can we
decentralize the production decisions to competitive firms?
(b) Let $P(t) = e^{ht}$, where $h > 0$ is a constant. With $\alpha < 1$, at which rate can
$Y(t)$ grow?
(c) How does the economy grow if on the contrary $P(t) = K(t)^{1-\alpha}$?
(d) What does P (t ) represent in this economy? How can we interpret the
assumption made in (b) and (c)?
Exercise 48 Consider an economy in which all individuals maximize
$$U = \int_0^\infty U(c(t))\, e^{-\rho t}\, dt, \quad \text{with } U(c) = 1 - \frac{1}{c}$$
and $\rho = 1$.
(a) Let r denote the return in private savings and determine the rate of growth
of consumption.
(b) Suppose that production utilizes private capital and labor according to
Y (t ) = F (K , L , t ) = B (t )L + 3K .
Determine the per-unit income of L and K , denoted by w(t ) and r (t )
respectively, if capital and labor are paid their marginal productivity.
(c) Suppose that L is constant, that K̇ (t ) = Y (t ) − C (t ), and that Ḃ (t ) =
B (t ). Can capital and production grow for ever at the same rate as the
optimal consumption? Determine the relation between C (t ), K (t ), and
B (t ) along the balanced growth path.
(d) Suppose that at the aggregate level B (t ) = K (t ), but that factors are com-
pensated on the basis of their marginal productivity taking as given B (t ).
Show that the resulting decentralized growth rate is below the socially
efficient growth rate.
FURTHER READING
This chapter offers a concise introduction to key notions within a subject
treated much more exhaustively by Grossman and Helpman (1991), Barro and
Sala-i-Martin (1995), and Aghion and Howitt (1998). Models of endogenous
growth were originally formulated in Romer (1986, 1990), Rebelo (1991),
and other contributions that may be fruitfully read once familiar with the
technical aspects discussed here. Blanchard and Fischer (1989, section 2.2)
offers a concise discussion of how optimal growth paths may be decentral-
ized in competitive markets. For a discussion of general equilibrium in more
complex growth environments, readers are referred to Jones and Manuelli
(1990) and Rebelo (1991). These papers consider production technologies
that enable endogenous growth, and the optimal growth paths of these
economies can be decentralized as in the models of Sections 4.2.3 and 4.5.4.
The model of Rebelo allows for a distinction between investment goods and
consumption goods. As a result, the optimal production decisions may be
decentralized even in the presence of non-accumulated factors like L in this
chapter. However, this requires that non-accumulated factors be employed
in the production of consumption goods only, and not in the production
of investment goods. An extensive recent literature lets non-accumulated
factors be employed in a (labor-intensive) research and development sector,
where endogenous growth is sustained by learning by doing or informational
spillover mechanisms of the type discussed in Sections 4.2 and 4.3 above.
McGrattan and Schmitz (1999) offer a nice macro-oriented introduction to
the relevant insights. Romer (1990) and Grossman and Helpman (1991) are
key references in this literature. Grossman and Helpman (1991) offer fully
dynamic versions of the model with monopolistic competition, introduced
in the last section of this chapter. The role of research and development is also
treated in Barro and Sala-i-Martin (1995), who discuss the role of government
spending in the growth process, an issue that was originally dealt with in Barro
(1990).
As to empirical aspects, there is an extensive literature on the measurement
of the growth rate of the Solow residual; for a discussion of this issue see
e.g. Maddison (1987) or Barro and Sala-i-Martin (1995), chapter 10. Barro
and Sala-i-Martin (1995) and McGrattan and Schmitz (1999) offer extensive
reviews of recent empirical findings regarding long-run economic growth
phenomena. Briefly, the treatment of human capital as an accumulated factor
(as in Section 5.4 above) and careful measurement of government interference
with market interactions (as in Section 5.5 above) have both proven crucial
in interpreting cross-country income dynamics. More detailed and realistic
theoretical models than those offered by this chapter’s stylized treatment have
of course proved empirically useful, especially as regards the government’s role
in protecting investors’ legal rights to the fruits of their efforts, and open-
economy aspects. Theoretical and empirical contributions have also paid well-
deserved attention to politico-economic tensions regarding all relevant poli-
cies’ implications for growth and distribution (see Bertola, 2000, and refer-
ences therein), as well as to the role of finite lifetimes in determining aggregate
saving rates (see Blanchard and Fischer, 1989, and Heijdra and van der Ploeg,
2002).
More generally, treatment of policy influences and market imperfections
along the lines of this chapter’s argument is becoming more prominent in
macroeconomic equilibrium models. As noted by Solow (1999), much of
the recent methodological progress on such aspects was prompted by the
need to allow for increasing returns to scale in endogenous growth models,
but the relevant insights have much wider applicability, and need not play a
particularly crucial role in explaining long-run growth phenomena.
REFERENCES
Aghion, P., and P. Howitt (1998) Endogenous Growth Theory, Cambridge, Mass.: MIT Press.
Barro, R. J. (1990) “Government Spending in a Simple Model of Endogenous Growth,” Journal
of Political Economy, 98, S103–S125.
Barro, R. J., and X. Sala-i-Martin (1995) Economic Growth, New York: McGraw-Hill.
Bertola, G. (2000) “Macroeconomics of Income Distribution and Growth,” in A. B. Atkinson
and F. Bourguignon (eds.), Handbook of Income Distribution, vol. 1, 477–540, Amsterdam:
North-Holland.
Blanchard, O. J., and S. Fischer (1989) Lectures on Macroeconomics, Cambridge, Mass.: MIT
Press.
Grossman, G. M., and E. Helpman (1991) Innovation and Growth in the Global Economy,
Cambridge, Mass.: MIT Press.
Heijdra, B. J., and F. van der Ploeg (2002) Foundations of Modern Macroeconomics, Oxford:
Oxford University Press.
Jones, L. E., and R. Manuelli (1990) “A Model of Optimal Equilibrium Growth,” Journal of
Political Economy, 98, 1008–1038.
Maddison, A. (1987) “Growth and Slowdown in Advanced Capitalist Economies,” Journal of
Economic Literature, 25, 649–698.
McGrattan, E. R., and J. A. Schmitz, Jr (1999) “Explaining Cross-Country Income Differences,”
in J. B. Taylor and M. Woodford (eds.), Handbook of Macroeconomics, vol. 1A, 669–736,
Amsterdam: North-Holland.
Rebelo, S. (1991) “Long-Run Policy Analysis and Long-Run Growth,” Journal of Political Econ-
omy, 99, 500–521.
Romer, P. M. (1986) “Increasing Returns and Long-Run Growth,” Journal of Political Economy,
94, 1002–1037.
Romer, P. M. (1990) “Endogenous Technological Change,” Journal of Political Economy, 98, S71–S102.
Romer, P. M. (1987) “Growth Based on Increasing Returns Due to Specialization,” American Economic
Review (Papers and Proceedings), 77, 56–72.
Solow, R. M. (1956) “A Contribution to the Theory of Economic Growth,” Quarterly Journal of
Economics, 70, 65–94.
Solow, R. M. (1999) “Neoclassical Growth Theory,” in J. B. Taylor and M. Woodford (eds.), Handbook of
Macroeconomics, vol. 1A, 637–667, Amsterdam: North-Holland.
5 Coordination and Externalities in Macroeconomics
As we saw in Chapter 4, externalities play an important role in endogenous
growth theory. Many recent contributions have explored the relevance of
similar phenomena in other macroeconomic contexts. In general, aggregate
equilibria based on microeconomic interactions may differ from those medi-
ated by the equilibrium of a perfectly competitive market in which agents
take prices as given. If every agent correctly solves her own individual prob-
lem, taking into consideration the actions of all other agents rather than the
equilibrium price, then nothing guarantees that the resulting equilibrium is
efficient at the aggregate level. Uncoordinated “strategic” interactions may
thus play a crucial role in many modern macroeconomic models with micro
foundations.
In this chapter we begin by considering the relationship between the exter-
nalities that each agent imposes on other individuals in the same market
and the potential multiplicity of equilibria, first in an abstract trade setting
(Section 5.1) and then in a simple monetary economy (Section 5.2). (The
appendix to this chapter describes a general framework for the analysis of the
relationship between externalities, strategic interactions, and the properties
of multiplicity and efficiency of the aggregate equilibria.) Then we study a
labor market characterized by a (costly) process of search on the part of firms
and workers. This setting extends the analysis of the dynamic aspects of labor
markets of Chapter 3, focusing on the flows into and out of unemployment.
Attention to labor market flows is motivated by their empirical relevance: even
in the absence of changes in the unemployment rate, job creation and job
destruction occur continuously, and the reallocation of workers often involves
periods of frictional unemployment. The stylized “search and matching”
modeling framework introduced below is realistic enough to offer empirically
sensible insights, reviewed briefly in the “Further Readings” section at the
end of the chapter. We formally analyze determination of the steady state
equilibrium in Section 5.3 and the dynamic adjustment process in Section 5.4.
Finally, Section 5.5 characterizes the efficiency implications of externalities in
labor market search activity.
5.1. Trading Externalities and Multiple Equilibria
This section analyzes a basic model where the nature of interactions among
individuals creates a potential for multiple equilibria. These equilibria are
characterized by different levels of “activity” (employment, production) in
the economy. The model presented here is based on Diamond (1982a) and
features a particular type of externality among agents operating in a given
market: the larger the number of potential trading partners, the higher the
probability that an agent will make a profitable trade (trading externality).
Markets with a high number of participants thus attract even more agents,
which reinforces their characteristic as a “thick” market, while “thin” markets
with a low number of participants remain locked in an inferior equilibrium.
5.1.1. STRUCTURE OF THE MODEL
The economy is populated by a high number of identical and infinitely lived
individuals, who engage in production, trade, and consumption activities.
Production opportunities are created stochastically according to a Poisson
distribution, whose parameter a defines the instantaneous probability of the
creation of a production opportunity. At each date $t_0$, the probability that no
production opportunity is created before date $t$ is given by $e^{-a(t-t_0)}$ (and the
probability that at least one production opportunity is created within this
time interval is thus given by $1 - e^{-a(t-t_0)}$). This probability depends only on
the length of the time interval $t - t_0$ and not on the specific date $t_0$ chosen.
The probability that a given agent receives a production opportunity between
$t_0$ and $t$ is therefore independent of the distribution of production prior
to $t_0$.³⁹
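The memorylessness property described above can be checked numerically. The following is only an illustrative sketch: the arrival rate `a = 0.5`, the waiting times, and the function names are assumptions chosen for the example, not values from the text.

```python
import math

def prob_no_opportunity(a: float, elapsed: float) -> float:
    """Probability that no production opportunity arrives within `elapsed`
    units of time when arrivals are Poisson with rate a."""
    return math.exp(-a * elapsed)

def prob_at_least_one(a: float, elapsed: float) -> float:
    """Complementary probability: at least one opportunity arrives."""
    return 1.0 - prob_no_opportunity(a, elapsed)

a = 0.5           # assumed arrival rate (illustrative value)
s, t = 2.0, 3.0   # assumed waiting times (illustrative values)

# Memorylessness: having already waited s without an arrival does not
# change the probability of waiting a further t without one.
conditional = prob_no_opportunity(a, s + t) / prob_no_opportunity(a, s)
unconditional = prob_no_opportunity(a, t)
print(conditional, unconditional)
```

The two printed numbers coincide up to rounding, which is the sense in which only the length of the interval matters and not the starting date.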
All production opportunities yield the same quantity of output y, but they
differ according to the associated cost of production. This cost is defined by
a random variable $c$, with distribution function $G(c)$ defined on $c \ge \underline{c} > 0$,
where $\underline{c}$ represents the minimum cost of production. Trade is essential in the
model, because goods obtained from exploiting a production opportunity
cannot be consumed directly by the producer. This assumption captures in
a stylized way the high degree of specialization of actual production processes,
and it implies that agents need to engage in trade before they can consume. At
each moment in time, there are thus two types of agent in the market:
1. There are agents who have exploited a production opportunity and wish
to exchange its output for a consumption good: the fraction of agents
in this state is denoted by $e$, which can be interpreted as a “rate of
employment,” or equivalently as an index of the intensity of production
effort.
2. There are agents who are still searching for a production opportunity:
the corresponding fraction $1 - e$ can be interpreted as the “unemployment
rate.”
³⁹ The stochastic process therefore has the Markov property and is completely memoryless. In a
more general model, $a$ may be assumed to be variable. The function $a(t)$ is known as the hazard
function.
Like production opportunities, trade opportunities also occur stochasti-
cally, but their frequency depends on the share of “employed” agents: the
probability intensity of arrivals per unit of time is not a constant, like the a
parameter introduced above, but a function b(e ), with b(0) = 0 and b′(e ) > 0.
The presence of a larger number of employed agents in the market increases
the probability that each individual agent will find a trading opportunity. This
property of the trading technology is crucial for the results of the model and its
role will be highlighted below.
Consumption takes place immediately after agents exchange their goods.
The instantaneous utility of an agent is linear in consumption (y) and in the
cost of production (−c ), and the objective of maximizing behavior is
$$V = E\left[\sum_{i=1}^{\infty}\left(-e^{-rt_i}c + e^{-r(t_i+\tau_i)}y\right)\right],$$
where $r$ is the subjective discount rate of future consumption, the sequence
of times $\{t_i\}$ denotes dates when production takes place, and $\{\tau_i\}$ denotes the
interval between such dates and those when consumption and trade take place.
Since production and trade opportunities are random, both $\{t_i\}$ and $\{\tau_i\}$ are
uncertain, and the agent maximizes the expected value of discounted utility
flows.
To maximize V , the agent needs to adopt an optimal rule to decide whether
or not to exploit a production opportunity. This decision is based on the cost
that is associated with each production opportunity or, equivalently, on the
effort that a producer needs to exert to exploit the production opportunity.
The agent chooses a critical level for the cost, $c^*$, such that all opportunities
with a cost level $c \le c^*$ are exploited, while those with a cost level
$c > c^*$ are refused.
To solve the model, we need to determine this critical value c ∗ and the
dynamic path of the level of activity or “employment” e .
5.1.2. SOLUTION AND CHARACTERIZATION
To study the behavior of the economy outlined above, we first derive the
equations that describe the dynamics of the level of activity (employment) e
and the critical value of the costs c ∗ (the only choice variable of the model).
The evolution of employment is determined by the difference between the
flow into and out of employment. The first is equal to the fraction of the
unemployed agents that receive and exploit a production opportunity: this
fraction is equal to (1 − e )a G (c ∗). The flow out of employment is equal to
the fraction of employed agents who find a trading opportunity and who thus,
after consumption, return to the pool of unemployed. This fraction is equal to
e b(e ). The assumption b′(e ) > 0 that was introduced above now has a clear
interpretation in terms of the increasing returns to scale in the process of trade.
Calculating the elasticity of the flow out of employment e b(e ) with respect to
the rate of employment e , we get
$$\varepsilon = 1 + \frac{e\,b'(e)}{b(e)},$$
which is larger than one if b′(e ) > 0 (implying increasing returns in the
trading technology). In other words, a higher rate of activity increases the
probability that an employed agent will meet a potential trading partner.
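For completeness, the elasticity follows in one line from the product rule. The derivation below is a reconstruction that uses only the definition of an elasticity and the outflow $e\,b(e)$ given in the text:

```latex
\varepsilon \equiv \frac{d\,[e\,b(e)]}{de}\cdot\frac{e}{e\,b(e)}
  = \frac{b(e) + e\,b'(e)}{b(e)}
  = 1 + \frac{e\,b'(e)}{b(e)} .
```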
Given the expressions for the flows into and out of employment, we can
write the following law of motion for the employment rate:
ė = (1 − e )a G (c ∗) − e b(e ). (5.1)
In a steady state of the system the two flows exactly compensate each other,
leaving e constant. The following relation between the steady-state value of
employment and the critical cost level c ∗ therefore holds:
$$(1 - e)\,aG(c^*) = e\,b(e) \quad\Rightarrow\quad \left.\frac{de}{dc^*}\right|_{\dot{e}=0} = \frac{(1 - e)\,aG'(c^*)}{b(e) + e\,b'(e) + aG(c^*)} > 0. \tag{5.2}$$
A rise in c ∗ increases the flow into employment, since it raises the share of
production opportunities that agents find attractive, and thus determines a
higher steady-state value for e , as depicted in the left-hand panel of Figure 5.1.
For points that are not located on the locus of stationarity, the dynamics of
employment are determined by the effect of e on ė : according to (5.1), a higher
value for e reduces ė , as is also indicated by the direction of the arrows in the
figure.
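The positive slope of the employment locus can be illustrated numerically. This is a minimal sketch under assumed functional forms that are not in the text: a linear trading rate $b(e) = b_0 e$ and costs distributed uniformly on $[0.2, 1]$. All names and parameter values are hypothetical.

```python
def uniform_cdf(c, c_low=0.2, c_high=1.0):
    """Assumed cost distribution G: uniform on [c_low, c_high]."""
    return min(1.0, max(0.0, (c - c_low) / (c_high - c_low)))

def b(e, b0=1.0):
    """Assumed trading rate: b(0) = 0 and b'(e) = b0 > 0."""
    return b0 * e

def e_dot(e, c_star, a=1.0):
    """Right-hand side of (5.1): inflow minus outflow of employment."""
    return (1.0 - e) * a * uniform_cdf(c_star) - e * b(e)

def steady_state_e(c_star, a=1.0, tol=1e-12):
    """Solve e_dot = 0 for e by bisection; e_dot is decreasing in e on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if e_dot(mid, c_star, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Trace out the locus e-dot = 0 for a few threshold costs.
for c_star in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(c_star, round(steady_state_e(c_star), 4))
```

Under these assumptions the steady-state employment rate rises with the threshold, in line with (5.2); bisection is used because $\dot{e}$ is monotonically decreasing in $e$, as the arrows in the figure indicate.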
In order to determine the production cost below which it is optimal to
exploit the production opportunity, agents compare the expected discounted
value of utility in the two states: employment (the agent has produced the
good and is searching for a trading partner) and unemployment (the agent is
looking for a production opportunity with sufficiently low cost). The value of
the objective function in the two states is denoted by E and U , respectively.
These values depend on the path of employment $e$ and thus vary over time;
however, if we limit attention to steady states for a moment, then $E$ and
$U$ are constant over time ($\dot{E} = 0$ and $\dot{U} = 0$). The relationships that tie the
values of $E$ and $U$ can be derived by observing that the flow utility from
employment ($rE$) needs to be equal to the utility of consumption $y$, which occurs
with probability $b(e)$, plus the expected value of the ensuing change from
employment to unemployment:
$$rE = b(e)y + b(e)(U - E). \tag{5.3}$$
Figure 5.1. Stationarity loci for $e$ and $c^*$
There is a clear analogy with the pricing of financial assets (which yield
periodic dividends and whose value may change over time), if we interpret
the left-hand side of (5.3) as the flow return (opportunity cost) that a risk-
neutral investor demands if she invests an amount E in a risk-free asset with
return r . The right-hand side of the equation contains the two components
of the flow return on the alternative activity “employment”: the expected
dividend derived from consumption, and the expected change in the asset
value resulting from the change from employment to unemployment. This
interpretation justifies the term “asset equations” for expressions like (5.3)
and (5.4).
Similarly, the flow utility from unemployment comprises the expected value
from a change in the state (from unemployment to employment) which
occurs with probability a G (c ∗) whenever the agent decides to produce; and
the expected cost of production, equal to the rate of occurrence of a pro-
duction opportunity a times the average cost (with a negative sign) of the
production opportunities that have a cost below c ∗ and are thus realized.
The corresponding asset equation is therefore given by
$$rU = aG(c^*)(E - U) - a\int_{\underline{c}}^{c^*} c\,dG(c) = a\int_{\underline{c}}^{c^*} (E - U - c)\,dG(c), \tag{5.4}$$
where $G(c^*) \equiv \int_{\underline{c}}^{c^*} dG(c)$.
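For a given employment rate and a given (not necessarily optimal) threshold, the two asset equations form a linear system in $E$ and $U$. The sketch below solves it under assumed functional forms that are not in the text: $b(e) = b_0 e$ and a uniform cost distribution. The parameter values and function names are hypothetical.

```python
def uniform_cdf(c, c_low, c_high):
    """G(c) for costs uniform on [c_low, c_high] (assumed distribution)."""
    return min(1.0, max(0.0, (c - c_low) / (c_high - c_low)))

def accepted_cost_integral(c_star, c_low, c_high):
    """Integral of c dG(c) from c_low up to c_star under the uniform G."""
    upper = min(max(c_star, c_low), c_high)
    return (upper ** 2 - c_low ** 2) / (2.0 * (c_high - c_low))

def asset_values(e, c_star, r, a, y, b0, c_low, c_high):
    """Solve (5.3) and (5.4) for E and U, given e and a threshold c_star."""
    b = b0 * e
    G = uniform_cdf(c_star, c_low, c_high)
    I = accepted_cost_integral(c_star, c_low, c_high)
    gap = (b * y + a * I) / (r + b + a * G)   # E - U, from (5.3) minus (5.4)
    E = b * (y - gap) / r                     # (5.3) rearranged
    U = (a * G * gap - a * I) / r             # (5.4) rearranged
    return E, U

# Illustrative parameter values (assumptions).
r, a, y, b0, c_low, c_high = 0.1, 1.0, 1.0, 1.0, 0.2, 1.0
e, c_star = 0.5, 0.6
E, U = asset_values(e, c_star, r, a, y, b0, c_low, c_high)

# Residuals of the two asset equations; both should be numerically zero.
b = b0 * e
G = uniform_cdf(c_star, c_low, c_high)
I = accepted_cost_integral(c_star, c_low, c_high)
res_E = r * E - (b * y + b * (U - E))
res_U = r * U - (a * G * (E - U) - a * I)
print(E, U, res_E, res_U)
```

Subtracting the two equations first isolates the gap $E - U$, which is the only channel through which the two values interact.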
Equations (5.3) and (5.4) can be derived more rigorously using the prin-
ciple of dynamic programming which was introduced in Chapter 1. In the
following we consider a discrete time interval $\Delta t$, from $t = 0$ to $t = t_1$, and we
keep $e$ constant. Moreover, we assume that an agent who finds a trading
opportunity and returns to the pool of unemployed does not find a new
production opportunity in the remaining part of the interval $\Delta t$. Given these
assumptions, we can express the value of employment at the start of the
interval as follows:
$$E = \int_0^{t_1} b\,e^{-bt}e^{-rt}y\,dt + e^{-r\Delta t}\left[e^{-b\Delta t}E + (1 - e^{-b\Delta t})U\right], \tag{5.5}$$
where the dependence of b on e is suppressed to simplify notation. The first
term on the right-hand side of (5.5) is the expected utility from consumption
during the interval, which is discounted to $t = 0$. (Remember that $e^{-bt}$ defines
the probability that no trading opportunity arrives before date $t$.) The second
term defines the expected (discounted) utility that is obtained at the end of
the interval at $t = t_1$. At this date, the agent may be either still “employed,”
having not had a chance to exchange the produced good (which occurs with
probability $e^{-b\Delta t}$), or “unemployed,” after having traded the good (which
occurs with complementary probability $1 - e^{-b\Delta t}$).⁴⁰ Solving the integral in
(5.5) yields
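$$\begin{aligned} E &= \frac{b}{b+r}\left(1 - e^{-(r+b)\Delta t}\right)y + e^{-r\Delta t}\left[e^{-b\Delta t}E + (1 - e^{-b\Delta t})U\right] \\ &= \frac{b}{r+b}\,y + \frac{e^{-r\Delta t}(1 - e^{-b\Delta t})}{1 - e^{-(r+b)\Delta t}}\,U. \end{aligned} \tag{5.6}$$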
⁴⁰ Since we limit attention to steady-state outcomes in which $e$ is constant, $E$ and $U$ are also constant
over time. As a result, there is no difference between the values at the beginning and at the end of the
time interval.
Taking the limit of (5.6) for $\Delta t \to 0$ and applying l’Hôpital’s rule to the
second term, so that
$$\lim_{\Delta t \to 0} \frac{-r e^{-r\Delta t}(1 - e^{-b\Delta t}) + b\,e^{-r\Delta t}e^{-b\Delta t}}{(r+b)\,e^{-(r+b)\Delta t}} = \frac{b}{r+b},$$
we get the asset equation for $E$ which was already formulated in (5.3):
$$E = \frac{b}{r+b}\,y + \frac{b}{r+b}\,U \quad\Rightarrow\quad rE = by + b(U - E).$$
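The limit can also be checked numerically by evaluating the coefficient on $U$ in (5.6) for shrinking interval lengths. This is only an illustrative sketch; the values of `r`, `b`, `y`, and `U` and the function name are assumptions.

```python
import math

def weight_on_U(r, b, dt):
    """Coefficient multiplying U in (5.6) for an interval of length dt."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    numerator = math.exp(-r * dt) * (-math.expm1(-b * dt))
    denominator = -math.expm1(-(r + b) * dt)
    return numerator / denominator

r, b = 0.05, 0.8           # assumed discount rate and trading rate
limit = b / (r + b)

for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, weight_on_U(r, b, dt), limit)

# In the limit, E = (b / (r + b)) * (y + U); check it against (5.3).
y, U = 1.0, 2.0            # arbitrary illustrative values
E = limit * (y + U)
residual = r * E - (b * y + b * (U - E))
print(residual)
```

As the interval shrinks, the coefficient approaches $b/(r+b)$ from below, and the limiting expression for $E$ satisfies the asset equation up to rounding error.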
Similar arguments can be used to derive the second asset equation in (5.4).
The critical value c ∗ is set in order to maximize E and U .
In the optimum, therefore, the following first-order conditions hold:
$$\frac{\partial E}{\partial c^*} = \frac{\partial U}{\partial c^*} = 0.$$
The derivative of the value of “unemployment” with respect to the threshold
cost level $c^*$ can be obtained from (5.4) using Leibnitz’s rule,⁴¹
$$\frac{d}{db}\int_a^b f(z)\,dz = f(b).$$
In our case, $f(z) = (E - U - z)(dG/dz)$. Differentiating (5.4) with respect
to $c^*$ and equating the resulting expression to zero yields
$$r\frac{\partial U}{\partial c^*} = a(E - U - c^*)G'(c^*) = 0 \quad\Rightarrow\quad c^* = E - U. \tag{5.7}$$
In words, whoever is unemployed (searching for a production opportunity) is
willing to bear a cost of production that is at most equal to the gain, in terms
of expected utility, from exploiting a production opportunity to move from
unemployment to “employment.” Now, subtracting (5.4) from (5.3), we get
$$r(E - U) = b(e)y - b(e)(E - U) - aG(c^*)(E - U) + a\int_{\underline{c}}^{c^*} c\,dG(c). \tag{5.8}$$
⁴¹ In general, the definition of an integral implies
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(z; x)\,dz = \int_{a(x)}^{b(x)} \frac{\partial f(z; x)}{\partial x}\,dz + b'(x) f(b(x)) - a'(x) f(a(x))$$
(Leibnitz’s rule). Intuitively, a change in $x$ alters the area below the curve of $f(\cdot)$ between the
points $a(\cdot)$ and $b(\cdot)$ through three channels: the integral of the derivative of $f(\cdot)$ over the
interval; an increase in the upper limit, which increases this area in proportion to $f(b(x))$; and an
increase in the lower limit, which decreases it in proportion to $f(a(x))$.
Using (5.8) we can now derive the equation for the stationary value of $c^*$,
which expresses $c^*$ as a function of $e$. Writing
$$E - U = c^* = \frac{b(e)y + a\int_{\underline{c}}^{c^*} c\,dG(c)}{r + b(e) + aG(c^*)}, \tag{5.9}$$
rearranging to
$$b(e)y + a\int_{\underline{c}}^{c^*} c\,dG(c) = \bigl(r + b(e) + aG(c^*)\bigr)c^*,$$
and differentiating, we find that the slope of the locus of stationarity (5.9) is
$$\left.\frac{dc^*}{de}\right|_{\dot{c}^*=0} = \frac{b'(e)(y - c^*)}{r + b(e) + aG(c^*)}. \tag{5.10}$$
The sign of this derivative is positive since y > c ∗ (agents accept only those
production possibilities with a cost below the value of output) and b′(e ) > 0.
Notice also that if e = 0 no trade ever takes place. (There are no agents with
goods to offer.) In this case, agents are indifferent between employment and
unemployment and there is no incentive to produce: c ∗ = E − U = 0. Finally,
if we assume that $b''(e) < 0$, one can show that $d^2c^*/de^2 < 0$. Hence, the
function that represents the locus of stationarity is strictly concave, and the
locus of stationarity, which is drawn in the right-hand panel of Figure 5.1,
starts in the origin and increases at a decreasing rate. The positive sign of
$dc^*/de|_{\dot{c}^*=0}$ implies that there exists a strategic complementarity between
the actions of individual agents. The concept of strategic complementarity
is formally introduced in the appendix to this chapter. Intuitively, it implies
that the actions of one agent increase the payoffs from action for all other
agents; expressed in terms of the model studied here, the higher the fraction
of employed agents, the more likely each individual agent will find a trading
partner. This induces agents to increase the threshold for acceptance of pro-
duction opportunities. At the aggregate level, therefore, the optimal individual
response implies a more than proportional increase in the level of activity. To
determine the dynamics of c ∗, we need to remember that the equilibrium rela-
tions (5.3) and (5.4) are obtained on the basis of the assumption that E and
U are constant over time. In general, however, these values will depend on the
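path of employment $e$. In that case, we need to add the terms $\dot{E} = \dot{e}\,\partial E(\cdot)/\partial e$
and $\dot{U} = \dot{e}\,\partial U(\cdot)/\partial e$ to the right-hand sides of (5.3) and (5.4), respectively,
yielding:
$$rE(\cdot) = \frac{\partial E(\cdot)}{\partial e}\,\dot{e} + b(e)\bigl(y - E(\cdot) + U(\cdot)\bigr) \tag{5.11}$$
$$rU(\cdot) = \frac{\partial U(\cdot)}{\partial e}\,\dot{e} + a\int_{\underline{c}}^{c^*} \bigl(E(\cdot) - U(\cdot) - c\bigr)\,dG(c). \tag{5.12}$$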
In terms of asset equations, Ė and U̇ represent the “capital gains”
that, together with the flow utility, give the “total returns” r E and r U .
Now, subtracting (5.12) from (5.11), and noting from (5.7) that
$$\dot{c}^* = \dot{E} - \dot{U} = \left(\frac{\partial E(\cdot)}{\partial e} - \frac{\partial U(\cdot)}{\partial e}\right)\dot{e},$$
we can derive the expression for the dynamics of c ∗:
$$\dot{c}^* = rc^* - b(e)(y - c^*) + a\int_{\underline{c}}^{c^*} (c^* - c)\,dG(c). \tag{5.13}$$
Moreover, if we assume that ċ ∗ = 0, we obtain exactly (5.9). Since
$$\frac{\partial \dot{c}^*}{\partial c^*} = r + b(e) + aG(c^*) > 0,$$
the response of ċ ∗ to c ∗ is positive, as shown by the direction of the arrows in
Figure 5.1. We are now in a position to analyze the possible equilibria of the
economy, and we can make the interpretation of individual behavior in terms
of the strategic complementarity more explicit.
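The stationarity locus for $c^*$ has no closed form in general, but because the right-hand side of (5.13) is increasing in $c^*$, it can be traced by bisection. The sketch below does this under assumed functional forms that are not in the text ($b(e) = b_0 e$ and uniformly distributed costs) and compares a finite-difference slope with formula (5.10). All names and parameter values are hypothetical.

```python
R, A, Y, B0 = 0.1, 1.0, 1.0, 1.0   # assumed r, a, y and slope of b(e)
C_LOW, C_HIGH = 0.2, 1.0           # assumed support of the uniform cost distribution

def b(e):
    """Assumed trading rate with b(0) = 0 and b'(e) = B0 > 0."""
    return B0 * e

def G(c):
    """Assumed uniform distribution function of production costs."""
    return min(1.0, max(0.0, (c - C_LOW) / (C_HIGH - C_LOW)))

def surplus_integral(c_star):
    """Integral of (c_star - c) dG(c) over accepted costs, uniform G."""
    if c_star <= C_LOW:
        return 0.0
    upper = min(c_star, C_HIGH)
    return ((c_star - C_LOW) ** 2 - (c_star - upper) ** 2) / (2.0 * (C_HIGH - C_LOW))

def c_star_dot(e, c_star):
    """Right-hand side of (5.13)."""
    return R * c_star - b(e) * (Y - c_star) + A * surplus_integral(c_star)

def c_star_locus(e, tol=1e-12):
    """Solve c_star_dot(e, c) = 0 for c by bisection on [0, Y]."""
    lo, hi = 0.0, Y
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_star_dot(e, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Compare a central finite-difference slope with formula (5.10) at e0.
e0, h = 0.5, 1e-5
c0 = c_star_locus(e0)
fd_slope = (c_star_locus(e0 + h) - c_star_locus(e0 - h)) / (2.0 * h)
formula_slope = B0 * (Y - c0) / (R + b(e0) + A * G(c0))
print(c0, fd_slope, formula_slope)
```

With these assumed values the locus starts at the origin, stays below $y$, and its numerical slope agrees with (5.10) to several decimal places.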
First of all, given the shape of the two loci of stationarity, there may be mul-
tiple equilibria. The origin (c ∗ = e = 0) is always an equilibrium of the system.
In this case the economy has zero activity (shut-down equilibrium). If there are
more equilibria, then we may have the situation depicted in Figure 5.2. In this
case there are two additional equilibria: $E_1$, in which the economy has a low
level of activity, and $E_2$, with a high level of activity.
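A configuration with three equilibria can be reproduced numerically by intersecting the two stationarity loci. The sketch below is a hedged illustration, not the authors' calibration: it assumes $b(e) = b_0 e$, costs uniform on $[0.2, 1]$, and the parameter values shown, and all function names are hypothetical.

```python
R, A, Y, B0 = 0.1, 1.0, 1.0, 1.0   # assumed r, a, y and slope of b(e)
C_LOW, C_HIGH = 0.2, 1.0           # assumed support of the uniform cost distribution

def b(e):
    return B0 * e

def G(c):
    return min(1.0, max(0.0, (c - C_LOW) / (C_HIGH - C_LOW)))

def surplus_integral(c_star):
    """Integral of (c_star - c) dG(c) over accepted costs, uniform G."""
    if c_star <= C_LOW:
        return 0.0
    upper = min(c_star, C_HIGH)
    return ((c_star - C_LOW) ** 2 - (c_star - upper) ** 2) / (2.0 * (C_HIGH - C_LOW))

def bisect(f, lo, hi, increasing, tol=1e-12):
    """Root of a function f that is monotone on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0.0) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_star_locus(e):
    """c* on the locus c*-dot = 0, from (5.13)."""
    return bisect(lambda c: R * c - b(e) * (Y - c) + A * surplus_integral(c),
                  0.0, Y, True)

def e_locus(c_star):
    """e on the locus e-dot = 0, from (5.1)."""
    return bisect(lambda e: (1.0 - e) * A * G(c_star) - e * b(e), 0.0, 1.0, False)

def excess(e):
    """Zero exactly when (e, c_star_locus(e)) lies on both loci."""
    return e_locus(c_star_locus(e)) - e

def interior_equilibria(n_grid=1000):
    """Scan a grid on (0, 1] for sign changes of excess() and refine each."""
    found = []
    grid = [i / n_grid for i in range(1, n_grid + 1)]
    values = [excess(e) for e in grid]
    for k in range(len(grid) - 1):
        if values[k] * values[k + 1] < 0.0:
            e_eq = bisect(excess, grid[k], grid[k + 1], values[k] < 0.0)
            found.append((e_eq, c_star_locus(e_eq)))
    return found

# The origin is always an equilibrium; the others are found numerically.
equilibria = [(0.0, 0.0)] + interior_equilibria()
for e_eq, c_eq in equilibria:
    print(round(e_eq, 4), round(c_eq, 4))
```

Under these assumptions the scan finds the shut-down equilibrium at the origin, a low-activity equilibrium just above it, and a high-activity equilibrium with roughly half of the agents employed. Other functional forms may deliver only the origin.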
Figure 5.2. Equilibria of the economy
Graphically, the direction of the arrows in Figure 5.2 implies that the system
can settle in the equilibrium with a high level of activity only if it starts from
the regions to the north-east or the south-west of $E_2$. As in the continuous-time
models analyzed in Chapters 2 and 4, the dynamics are therefore characterized
by a saddlepath. Also drawn in the figure is a saddlepath that leads to
equilibrium in the origin; finally, there is an equilibrium with low (but non-
zero) activity. For a formal analysis of the dynamics we linearize the system of
dynamic equations (5.1) and (5.13) around a generic equilibrium (ē, c̄ ∗).
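In matrix notation, this linearized system can be expressed as follows:
$$\begin{pmatrix}\dot{e}\\ \dot{c}^*\end{pmatrix} = \begin{pmatrix} -\bigl(aG(\bar{c}^*) + b(\bar{e}) + \bar{e}\,b'(\bar{e})\bigr) & (1 - \bar{e})\,aG'(\bar{c}^*) \\ -b'(\bar{e})(y - \bar{c}^*) & r + b(\bar{e}) + aG(\bar{c}^*) \end{pmatrix}\begin{pmatrix} e - \bar{e}\\ c^* - \bar{c}^*\end{pmatrix} \equiv \begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix}\begin{pmatrix} e - \bar{e}\\ c^* - \bar{c}^*\end{pmatrix}, \tag{5.14}$$
where $\alpha, \gamma < 0$ and $\beta, \delta > 0$.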
If in a given equilibrium the curve $\dot{e} = 0$ is steeper than $\dot{c}^* = 0$, then this
equilibrium is a saddlepoint, as in the case of $E_2$. Formally, we need to verify
the following condition:
$$\det\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix} = \alpha\delta - \beta\gamma < 0.$$
This can be rewritten as
$$-\frac{\alpha}{\beta} > -\frac{\gamma}{\delta},$$
where $-\alpha/\beta$ is the slope of the curve $\dot{e} = 0$ and $-\gamma/\delta$ is the slope of the curve
$\dot{c}^* = 0$. In contrast, at $E_1$ the relationship between the steepness of the two
curves is reversed and the determinant of the matrix is positive. Such an
equilibrium is called a node. The trace of the matrix is $\alpha + \delta = r - \bar{e}\,b'(\bar{e})$:
the node is stable if the trace is negative and unstable if it is positive. This in turn
depends on the specific values of r and ē and on the properties of the function
b(·). The existence of a strategic complementarity, arising from the trading
externality implied by the assumption that b′(e ) > 0, has thus resulted in
multiple equilibria.
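The determinant and trace conditions can be evaluated directly once an equilibrium and the primitives are specified. The sketch below is illustrative only: it assumes $b(e) = e$, costs uniform on $[0.2, 1]$, $a = 1$, $r = 0.1$, and $y = 0.92$, values chosen so that $(\bar{e}, \bar{c}^*) = (0.5, 0.6)$ satisfies both stationarity conditions exactly. None of these values or function names come from the text.

```python
def jacobian(e_bar, c_bar, r, a, y, b, b_prime, G, G_prime):
    """Entries (alpha, beta, gamma, delta) of the matrix in (5.14)."""
    alpha = -(a * G(c_bar) + b(e_bar) + e_bar * b_prime(e_bar))
    beta = (1.0 - e_bar) * a * G_prime(c_bar)
    gamma = -b_prime(e_bar) * (y - c_bar)
    delta = r + b(e_bar) + a * G(c_bar)
    return alpha, beta, gamma, delta

def classify(alpha, beta, gamma, delta):
    """Saddle if the determinant is negative; otherwise use the trace sign."""
    det = alpha * delta - beta * gamma
    trace = alpha + delta
    if det < 0.0:
        return "saddle"
    return "stable" if trace < 0.0 else "unstable"

# Assumed primitives (hypothetical, chosen for a closed-form equilibrium).
r, a, y = 0.1, 1.0, 0.92
c_low, c_high = 0.2, 1.0
b = lambda e: e
b_prime = lambda e: 1.0
G = lambda c: (c - c_low) / (c_high - c_low)     # valid for c in [c_low, c_high]
G_prime = lambda c: 1.0 / (c_high - c_low)

e_bar, c_bar = 0.5, 0.6

# Check that the point is an equilibrium of (5.1) and (5.13).
e_dot_residual = (1.0 - e_bar) * a * G(c_bar) - e_bar * b(e_bar)
c_dot_residual = (r * c_bar - b(e_bar) * (y - c_bar)
                  + a * (c_bar - c_low) ** 2 / (2.0 * (c_high - c_low)))

alpha, beta, gamma, delta = jacobian(e_bar, c_bar, r, a, y, b, b_prime, G, G_prime)
det = alpha * delta - beta * gamma
trace = alpha + delta
kind = classify(alpha, beta, gamma, delta)
print(det, trace, kind)
```

In this example the determinant is negative, so the assumed equilibrium is a saddlepoint of the kind labelled $E_2$, and the trace equals $r - \bar{e}\,b'(\bar{e})$ as stated above.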
A low level of employment induces agents to accept only few production
opportunities (c ∗ is low) and in equilibrium the economy is characterized by
a low level of activity. If, on the contrary, employment is high, each agent
will accept many production opportunities and this allows the economy to
maintain an equilibrium with a high level of activity. Finally, it is important
to note that agents’ expectations play a crucial role in the selection of the
equilibrium. Looking at point $e_0$ in Figure 5.2, it is clear that there exist values
of $e$ for which the economy can either jump to the saddlepath that leads to the
“inferior” equilibrium (the origin), or to the one that leads to the equilibrium
with a high level of activity. Which of these two possibilities is actually realized
depends on the beliefs of agents. If agents are “optimistic” (i.e. if they expect
a high level of activity and thus a convergence to the equilibrium at $E_2$),
then they choose a value of c ∗ on the higher saddlepath, while if they are
“pessimistic” (and anticipate convergence to the origin), they choose a point
on the lower saddlepath.
5.2. A Search Model of Money
The stylized Diamond model of the previous section represents a situation
where heterogeneous tastes and specialization in production force agents to
trade in order to consume. Unlike Robinson Crusoe, the economic agents
of the model cannot consume their own production: in the original article,
Diamond (1982a) outlines how the economic decisions and interactions of his
model could be applicable to a tropical island where a religious taboo prevents
each of the natives from eating fruit he has picked. And, since trade occurs on
a bilateral basis, rather than in a competitive auctioneered market, the econ-
omy’s general equilibrium cannot be viewed as a representative-agent welfare
maximization problem of the type that is sometimes discussed in terms of
Robinson Crusoe’s activities in undergraduate microeconomics textbooks.
The insights are qualitatively relevant in many realistic settings. In particu-
lar, whenever trade does not occur simultaneously in a frictionless centralized
market, a potential role arises for a “medium of exchange”—an object that is
accepted in a trade not to be directly consumed or used in production, but
only to be exchanged in future trades. It would certainly be inconvenient
for the authors of this book to carry copies of it into stores selling groceries
they wish to consume, hoping that the owner might be interested in learn-
ing advanced macroeconomic techniques. In reality, of course, authors and
publishers exchange books for money, and money for groceries. So, money’s
medium-of-exchange role facilitates exchanges of goods and, ultimately, con-
sumption. The model presented in this section, a simplified version of that
in Kiyotaki and Wright (1993), formalizes the use of money as a medium
of exchange. As in the Diamond model of the previous section, strategic
interaction among individuals is crucial in determining the equilibrium out-
come. Moreover, different equilibria (characterized by different degrees of
acceptability of money in the exchange process) may arise, depending on the
particular traders’ beliefs: again, agents’ expectations are self-fulfilling.
5.2.1. THE STRUCTURE OF THE ECONOMY
Consider an economy populated by a large number of infinitely lived agents.
There is also a large number of differentiated and costlessly storable consump-
tion goods, called commodities, coming in indivisible units. Agents differ as to
their preferences for commodities: each individual “likes” (and can consume)
only a fraction 0 < x < 1 of the available commodities. The same exogenous
parameter x denotes the fraction of agents that like any given commodity.
Production occurs only jointly with consumption: when an agent consumes
one unit of a commodity in period t, he immediately produces one unit of
a different good, which becomes his endowment for the next period t + 1.
The utility obtained from consumption, net of any production cost, is U > 0.
As in Diamond’s model of Section 5.1, we assume that commodities cannot
be consumed directly by the producer: this motivates the need for agents to
engage in a trading activity before being able to consume.
In the economy, besides commodities, there is also a certain amount of
costlessly storable fiat money, which, like the commodities, comes in indivisible
units. Fiat money has two distinguishing features: it has no intrinsic value
(it does not yield any utility in consumption and cannot be used as a produc-
tion input), and it is inconvertible into commodities having intrinsic worth.
Initially, an exogenously given fraction 0 < M < 1 of the agents are each
endowed with one unit of money, whereas 1 − M are each endowed with one
unit of a commodity.
We can now describe how agents in the economy behave during any given
period t, in which a fraction M of them are money holders and a fraction
1 − M are commodity holders.
• A money holder will try to exchange money for a consumable commodity.
For this to happen, two conditions must jointly be fulfilled: (i) she must
meet an agent holding a commodity she “likes” (since only a fraction
x of all commodities can be consumed by each agent), and (ii) the
commodity holder must be willing to accept money in exchange for
the consumption good. Only when these two conditions are met does
trade take place: the money holder exchanges her unit of money for a
commodity that she consumes enjoying utility U ; she then immediately
produces one unit of a different commodity (that she “dislikes”), and will
start the next period as a commodity holder. If, on the contrary, trade
does not occur, she will carry money over to the next period.
• A commodity holder will also try to exchange his endowment for a
commodity he “likes.” For this to happen, he must meet another commodity
holder and both must be willing to trade (i.e. each agent must “like”
the commodity he would receive in the exchange). Exchanges of
commodities for commodities occur only if they are mutually agreeable, and
therefore both goods are consumed after trade.⁴² It is also possible that
a commodity holder meets a money holder who “likes” his particular
commodity; if trade takes place, then the agent starts the next period as
a money holder.
⁴² The introduction of an arbitrarily small transaction cost paid by the receiver can rule out the
possibility that an agent agrees to receive in a trade a commodity he cannot consume.
The artificial economy here described highlights the different degree of
acceptability of commodities and fiat money. Each consumption good will
always be accepted in exchange by some agents, whereas money will be
accepted only if agents expect to trade it in the future in exchange for con-
sumable goods.
A final assumption concerns the meeting technology generating the agents’
trading opportunities. Agents meet pairwise and at random; in each period an
agent meets another with a constant probability 0 < β ≤ 1.
5.2.2. OPTIMAL STRATEGIES AND EQUILIBRIA
Each agent chooses a trading strategy in order to maximize the expected
discounted utility from consumption. A trading strategy is a rule allowing
the agent to decide whether to accept a commodity or money in exchange
for what he is offering (either a commodity or money). The optimal trading
strategy is obtained by solving the utility maximization problem, taking as
given the strategies of other traders: this is the agent’s optimal response to
other traders’ strategies. When all optimal strategies are mutually consistent,
a Nash equilibrium configuration arises. We focus attention on symmetric
and stationary equilibria, that is, on situations where all agents follow the
same time-invariant strategies. In equilibrium, agents exchange commodities
for other commodities only when both traders can consume the good they
receive, whereas fiat money is used only if it has a “value.” Such a value
depends on its acceptability, which is not an intrinsic property of money but
is determined endogenously in equilibrium.
The agent’s strategy is defined by the following rule of behavior: when a
meeting occurs, the agent accepts a commodity only if he or she “likes” it (hence
with probability x), and he or she accepts money in exchange with probability
π when other agents accept money with probability Π. The agent must choose
π as the best response to the common strategy of other agents, Π. To this
end, at the beginning of period t he or she compares the payoffs (in terms of
expected utility) from holding money and from holding a commodity, which
we call $V_M(t)$ and $V_C(t)$ respectively.
For a money holder, the payoff is equal to
$$V_M(t) = \frac{1}{1+r}\Bigl\{(1-\beta)\,V_M(t+1) + \beta\bigl[(1-M)\,x\,\Pi\,\bigl(U + V_C(t+1)\bigr) + \bigl(1-(1-M)\,x\,\Pi\bigr)\,V_M(t+1)\bigr]\Bigr\}, \tag{5.15}$$
where r is the rate of time preference. If a meeting does not occur (with
probability 1 − β) the agent will end period t holding money with a value $V_M(t+1)$,
whereas if a meeting does occur (with probability β) she will end the period
with an expected payoff given by the term in square brackets on the right-hand
side of (5.15). If the agent meets a commodity holder who is offering a good
that she “likes” and is willing to accept money, the exchange can take place
and the payoff is the sum of the utility from consumption U and the value of
the newly produced commodity $V_C(t+1)$. This event occurs with probability
(1 − M)xΠ. With the remaining probability, 1 − (1 − M)xΠ, trade does not
take place and the agent’s payoff is simply $V_M(t+1)$.
For a commodity holder, the payoff is
$$V_C(t) = \frac{1}{1+r}\Bigl\{(1-\beta)\,V_C(t+1) + \beta\bigl[(1-M)\,x^2\,U + M\,x\,\pi\,V_M(t+1) + (1 - M\,x\,\pi)\,V_C(t+1)\bigr]\Bigr\}. \tag{5.16}$$
Again, the term in square brackets gives the expected payoff if a meeting occurs
and is the sum of three terms. The first is utility from consumption U , which is
enjoyed only if the agent meets a commodity holder and both like each other’s
commodity (a “double coincidence of wants” situation), so that a barter can
take place; the probability of this event is (1 − M)x². The second term is
the payoff from accepting money in exchange for the commodity, yielding a
value $V_M(t+1)$: this trade occurs only if the agent is willing to accept money
(with probability π) and meets a money holder who is willing to receive the
commodity he offers (with probability Mx). The third term is the payoff from
ending the period with a commodity, which happens in all cases except for
trade with a money holder, so occurs with probability 1 − Mxπ.
To derive the agent’s best response, we focus on equilibria in which all
agents choose the same strategy, whereby π = Π, and payoffs are stationary,
so that $V_M(t) = V_M(t+1) \equiv V_M$ and $V_C(t) = V_C(t+1) \equiv V_C$. Using these
properties in (5.15) and (5.16), multiplying both sides by 1 + r, and rearranging
terms we get
$$r V_M = \beta\,\{(1-M)\,x\,\Pi\,U + (1-M)\,x\,\Pi\,(V_C - V_M)\}, \tag{5.17}$$
$$r V_C = \beta\,\{(1-M)\,x^2\,U + M\,x\,\Pi\,(V_M - V_C)\}. \tag{5.18}$$
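The step from (5.15) to (5.17) can be spelled out as follows; this is only a worked sketch of the algebra, imposing stationarity and π = Π and multiplying both sides by 1 + r:

```latex
\begin{align*}
(1+r)\,V_M &= (1-\beta)\,V_M
   + \beta\bigl[(1-M)x\Pi\,(U + V_C) + \bigl(1-(1-M)x\Pi\bigr)V_M\bigr] \\
           &= V_M + \beta(1-M)x\Pi\,(U + V_C - V_M), \\
r\,V_M     &= \beta\bigl\{(1-M)x\Pi\,U + (1-M)x\Pi\,(V_C - V_M)\bigr\}.
\end{align*}
```

The same two steps applied to (5.16) give $(1+r)V_C = V_C + \beta(1-M)x^2 U + \beta M x \Pi\,(V_M - V_C)$, which is (5.18) once $V_C$ is subtracted from both sides.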
Expressed in this form, (5.17) and (5.18) are readily interpreted as asset val-
uation equations. The left-hand side represents the flow return from investing
in a risk-free asset. The right-hand side is the flow return from holding either
money or a commodity and includes the expected utility from consumption
(the “dividend” component) as well as the expected change in the value of the
asset held (the “capital gains” component). Finally, subtracting (5.17) from
(5.18), we obtain
$$V_C - V_M = \beta\,\frac{(1-M)\,x\,U}{r + \beta x \Pi}\,(x - \Pi). \tag{5.19}$$
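As a numerical cross-check of (5.19), the two asset equations (5.17) and (5.18) can be solved as a 2×2 linear system in the stationary values. The sketch below is illustrative only: the function names and all parameter values are assumptions chosen for the example, not taken from the text.

```python
# Sketch: solve the stationary asset equations (5.17)-(5.18) for (V_M, V_C)
# and compare V_C - V_M with the closed form (5.19).
# All parameter values below are illustrative assumptions.

def solve_values(Pi, beta=0.8, M=0.3, x=0.4, U=1.0, r=0.05):
    """Return (V_M, V_C) solving (5.17)-(5.18) by Cramer's rule."""
    a = beta * (1 - M) * x * Pi        # coefficient on (U + V_C - V_M) in (5.17)
    b = beta * M * x * Pi              # coefficient on (V_M - V_C) in (5.18)
    d = beta * (1 - M) * x ** 2 * U    # barter "dividend" in (5.18)
    # (5.17): (r + a) V_M - a V_C       = a U
    # (5.18): -b V_M      + (r + b) V_C = d
    det = (r + a) * (r + b) - a * b
    V_M = (a * U * (r + b) + a * d) / det
    V_C = ((r + a) * d + a * b * U) / det
    return V_M, V_C


def gap_closed_form(Pi, beta=0.8, M=0.3, x=0.4, U=1.0, r=0.05):
    """Right-hand side of (5.19)."""
    return beta * (1 - M) * x * U * (x - Pi) / (r + beta * x * Pi)


if __name__ == "__main__":
    for Pi in (0.0, 0.2, 0.4, 0.7, 1.0):
        V_M, V_C = solve_values(Pi)
        print(Pi, V_C - V_M, gap_closed_form(Pi))
```

With these assumed parameters x = 0.4, so the printed difference changes sign as Π crosses 0.4, which is the property the best-response argument relies on.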
The sign of VC − VM depends on the sign of the difference between the degree
of acceptability of commodities (parameterized by the fraction of agents that
“like” any given commodity x) and that of money (Π). Consequently, the
agents’ optimal strategy in accepting money in a trade depends solely on Π.
• If Π < x, money is being accepted with lower probability than
commodities. Then $V_C > V_M$, and the best response is never to accept money in
exchange for a commodity: π = 0.
• If Π > x, money is being accepted with higher probability than
commodities. In this case $V_C < V_M$, and the best response is to accept money
whenever possible: π = 1.
• Finally, if Π = x, money and commodities have the same degree of
acceptability. With $V_C = V_M$, agents are indifferent between holding
money and commodities: the best response then is any value of π
between 0 and 1.
The optimal strategy π = π(Π) is shown in Figure 5.3. Three (stationary
and symmetric) Nash equilibria, represented in the figure along the 45° line
where π = Π, are associated with the three best responses illustrated above:
(i) A non-monetary equilibrium (Π = 0): agents expect that money will
never be accepted in trade, so they never accept it. Money is valueless
($V_M = 0$) and barter is the only form of exchange (point A).
(ii) A pure monetary equilibrium (Π = 1): agents expect that money will
be universally acceptable, so they always accept it in exchange for goods
(point C).
(iii) A mixed monetary equilibrium (Π = x): agents are indifferent between
accepting and rejecting money, as long as other agents are expected
to accept it with probability x. In this equilibrium money is only
partially acceptable in exchanges (point B).
Figure 5.3. Optimal π(Π) response function
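The best-response correspondence and the three symmetric equilibria can be sketched in a few lines. This is an illustrative encoding only: representing the set of best responses as an interval `(lo, hi)` and the names `best_response` and `is_symmetric_equilibrium` are assumptions made for the example.

```python
# Sketch: best response pi(Pi) and the symmetric equilibria pi = Pi.
# The set of optimal acceptance probabilities is returned as an interval.

def best_response(Pi, x):
    """Interval (lo, hi) of optimal pi when others accept money with prob. Pi."""
    if Pi < x:          # V_C > V_M: never accept money
        return (0.0, 0.0)
    if Pi > x:          # V_C < V_M: always accept money
        return (1.0, 1.0)
    return (0.0, 1.0)   # V_C = V_M: any pi in [0, 1] is optimal


def is_symmetric_equilibrium(Pi, x):
    """True if Pi is itself a best response to Pi (a point on the 45-degree line)."""
    lo, hi = best_response(Pi, x)
    return lo <= Pi <= hi


if __name__ == "__main__":
    x = 0.4  # assumed fraction of commodities each agent likes
    candidates = [0.0, 0.2, 0.4, 0.7, 1.0]
    print([Pi for Pi in candidates if is_symmetric_equilibrium(Pi, x)])
```

Only Π = 0, Π = x, and Π = 1 survive the fixed-point test, whatever the assumed value of x in (0, 1).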
The main insight of the Kiyotaki–Wright search model of money is that
acceptability is not an intrinsic property of money, which is indeed worthless.
Rather, it can emerge endogenously as a property of the equilibrium. More-
over, as in Diamond’s model, multiple equilibria can arise. Which of the possi-
ble equilibria is actually realized depends on the agents’ beliefs: if they expect a
certain degree of acceptability of money (zero, partial or universal) and choose
their optimal trading strategy accordingly, money will display the expected
acceptability in equilibrium. Again, as in Diamond’s model, expectations are
self-fulfilling.
5.2.3. IMPLICATIONS
The above search model can be used to derive some implications concerning
the agents’ welfare and the optimal quantity of money.
Welfare
We can now compare the values of expected utility for a commodity holder
and a money holder in the three possible equilibria. Solving (5.17) and (5.18)
with Π = 0, x, and 1 in turn, we find the values of $V_C^i$ and $V_M^i$, where
the superscript i = n, m, p denotes the non-monetary, the mixed monetary,
and the pure monetary equilibria associated with Π = 0, x, 1 respectively.
The resulting expected utilities are reported in Table 5.1, where
K ≡ β(1 − M)xU/r > 0.
Some welfare implications can be easily drawn from the table. First of
all, the welfare of a money holder intuitively increases with the degree of
acceptability of money. In fact, comparing the expected utilities in column
(3), we find that $V_M^n < V_M^m < V_M^p$.
Further, in the pure monetary equilibrium (third row of the table) money
holders are better off than commodity holders: $V_C^p < V_M^p$. Holding universally
acceptable money guarantees consumption when the money holder meets a
commodity holder with a good that she “likes”: trade increases the welfare of
both agents and occurs with certainty. On the contrary, a commodity holder
can consume only if another commodity holder is met and both like each
other’s commodity: a “double coincidence of wants” is necessary, and this
reduces the probability of consumption with respect to a money holder.
Table 5.1.
Π      $V_C^i$                                                    $V_M^i$
(1)    (2)                                                        (3)
0      $Kx$                                                       $0$
x      $Kx$                                                       $Kx$
1      $Kx\,\frac{r+\beta((1-M)x+M)}{r+\beta x} > Kx$             $K\,\frac{r+\beta x((1-M)x+M)}{r+\beta x} > Kx$
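The third row of Table 5.1 can be checked with a short worked derivation. The sketch below uses only (5.17) and (5.19) evaluated at Π = 1 and the definition of K:

```latex
\begin{align*}
V_C^p - V_M^p &= -\,\beta\,\frac{(1-M)xU\,(1-x)}{r+\beta x}
               = -\,K\,\frac{r(1-x)}{r+\beta x}, \\
V_M^p &= K + \frac{\beta(1-M)x}{r}\,\bigl(V_C^p - V_M^p\bigr)
       = K\,\frac{r + \beta x - \beta(1-M)x(1-x)}{r+\beta x}
       = K\,\frac{r + \beta x\,[(1-M)x + M]}{r+\beta x}, \\
V_C^p &= V_M^p - K\,\frac{r(1-x)}{r+\beta x}
       = K x\,\frac{r + \beta\,[(1-M)x + M]}{r+\beta x}.
\end{align*}
```

Both values exceed $Kx$ because $(1-M)x + M > x$ whenever $M > 0$ and $x < 1$, and the first line shows directly that $V_C^p < V_M^p$.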
Exercise 49 Check that, in a pure monetary equilibrium, when a money holder
meets a commodity holder with a good she “likes” both agents are willing to trade.
Finally, looking at column (2) of the table, we note that a commodity holder
is indifferent between a non-monetary and a mixed monetary equilibrium,
but is better off if money is universally acceptable, as in the pure monetary
equilibrium:
$$V_C^n = V_C^m < V_C^p.$$
Summarizing, the existence of universally accepted fiat money makes all
agents better off. Moreover, moving from a non-monetary to a mixed mon-
etary equilibrium increases the welfare of money holders without harming
commodity holders. Thus, in general, an increase in the acceptability of money
(Π) makes at least some agents better off and none worse off (a Pareto
improvement).
Optimal quantity of money
We now address the issue of the optimal quantity of money from the social
welfare perspective. The amount of money in circulation is directly related
to the fraction of agents endowed with money M; we therefore consider the
possibility of choosing M so as to maximize some measure of social welfare.
A reasonable such measure is an agent’s ex ante expected utility, that is the
expected utility of each agent before the initial endowment of money and
commodities is randomly distributed among them. The social welfare crite-
rion is then
$$W = (1-M)\,V_C + M\,V_M. \tag{5.20}$$
The fraction of agents endowed with money can be optimally chosen in the
three possible equilibria of the economy. First, we note that, in both the
non-monetary and the mixed monetary equilibria, money does not facilitate the
exchange process and so does not make consumption more likely; it is then optimal
to endow all agents with commodities, thereby setting M = 0. In the pure
monetary equilibrium, social welfare $W^p$ can be expressed as
$$\begin{aligned}
W^p &= (1-M)\,V_C^p + M\,V_M^p \\
    &= K\,[M + x(1-M)] \\
    &= \beta\,\frac{U}{r}\,(1-M)\,[Mx + (1-M)x^2],
\end{aligned} \tag{5.21}$$
where we used the definition of K given above. Maximization of $W^p$ with
respect to M yields the optimal quantity of money M∗:
$$\frac{\partial W^p}{\partial M} = \beta\,\frac{U}{r}\,x\,[(1-2x) - 2M^*(1-x)] = 0
\;\Rightarrow\; 1 - 2x = 2M^*(1-x)
\;\Rightarrow\; M^* = \frac{1-2x}{2-2x}. \tag{5.22}$$
Since 0 ≤ M∗ ≤ 1, for x ≥ 1/2 we get M∗ = 0. When each agent is willing to
consume at least half of the commodities, exchanges are not very difficult
and money does not play a crucial role in facilitating trade: in this case it is
optimal to endow all agents with consumable commodities. Instead, if x < 1/2,
fiat money plays a useful role in facilitating trade and consumption, and the
introduction of some amount of money improves social welfare (even though
fewer consumable commodities will be circulating in the economy). From
(5.22) we see that, as x → 0, M∗ → 1/2, as shown in the left-hand panel of
Figure 5.4.
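The closed form (5.22), truncated at zero, can be compared with a brute-force maximization of the welfare expression in (5.21). The sketch below is illustrative: the function names, the grid size, and the normalization β = U = 1 are assumptions made for the example.

```python
# Sketch: optimal quantity of money from (5.22) versus a grid search on r * W^p.

def welfare_flow(M, x, beta=1.0, U=1.0):
    """r * W^p from (5.21): beta * U * (1 - M) * [M x + (1 - M) x^2]."""
    return beta * U * (1 - M) * (M * x + (1 - M) * x ** 2)


def optimal_money(x):
    """Closed form (5.22), set to zero when the interior solution is negative."""
    return max(0.0, (1 - 2 * x) / (2 - 2 * x))


def grid_argmax(x, n=10001):
    """Value of M on an evenly spaced grid over [0, 1] that maximizes welfare_flow."""
    grid = [i / (n - 1) for i in range(n)]
    return max(grid, key=lambda M: welfare_flow(M, x))


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.8):
        print(x, optimal_money(x), grid_argmax(x))
```

The grid search is only a sanity check; it agrees with the formula up to the grid spacing, including the corner solution M∗ = 0 for x ≥ 1/2.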
Figure 5.4. Optimal quantity of money M∗ and ex ante probability of consumption P
To further develop the intuition for this result, we can rewrite the last
expression in (5.21) as follows:
$$r\,W^p = U \cdot \beta(1-M)\,[Mx + (1-M)x^2], \tag{5.23}$$
where the left-hand side is the “flow” of social welfare per period and the
right-hand side is the utility from consumption U multiplied by the agent’s ex
ante consumption probability. The latter is given by the probability of meeting
an agent endowed with a commodity, β(1 − M), times the probability that a
trade will occur, given by the term in square brackets. Trade occurs in two
cases: either the agent is a money holder and the potential counterpart in the
trade offers a desirable commodity (which happens with probability Mx), or
the agent is endowed with a commodity and a “double coincidence of wants”
occurs (which happens with probability (1 − M)x²). The sum of these two
probabilities yields the probability that, after a meeting with a commodity
holder, trade will take place. The optimal quantity of money is the value of
M that maximizes the agent’s ex ante consumption probability in (5.23). As
M increases, there is a trade-off between a lower probability of encountering
a commodity holder and a higher probability that, should a meeting occur,
trade takes place. The amount of money M∗ optimally weights these two
opposite effects. The behavior of the consumption probability (P) as a
function of M is shown in the right-hand panel of Figure 5.4 for two values of x
(0.5 and 0.25) in the case where β = 1. The corresponding optimal quantities
of money M∗ are 0 and 0.33 respectively.
5.3. Search Externalities in the Labor Market
We now proceed to apply some of the insights discussed in this chapter to labor
market phenomena. While introducing the models of Chapter 3, we already
noted that the simultaneous processes of job creation and job destruction
are typically very intense, even in the absence of marked changes in overall
employment. In that chapter we assumed that workers’ relocation was costly,
but we did not analyze the level or the dynamics of the unemployment rate.
Here, we review the modeling approach of an important strand of labor
economics focused exactly on the determinants of the flows into and out of
(frictional) unemployment. The agents of these models, unlike those of the
models discussed in the previous sections, are not ex ante symmetric: workers
do not trade with each other, but need to be employed by firms. Unemployed
workers and firms willing to employ them are inputs in a “productive” process
that generates employment, a process that is given a stylized and very tractable
representation by the model we study below. Unlike the abstract trade and
monetary exchange frameworks of the previous sections, the “search and
matching” framework below is qualitatively realistic enough to offer practical
implications for the dynamics of labor market flows, for the steady state of
the economy, and for the dynamic adjustment process towards the steady
state.
5.3.1. FRICTIONAL UNEMPLOYMENT
The importance of gross flows justifies the fundamental economic mechanism
on which the model is based: the matching process between firms and workers.
Firms create job openings (vacancies) and unemployed workers search for
jobs, and the outcome of a match between a vacant job and an unemployed
worker is a productive job. Moreover, the matching process does not take
place in a coordinated manner, as in the traditional neoclassical model. In
the neoclassical model the labor market is perfectly competitive and supply
and demand of labor are balanced instantaneously through an adjustment of
the wage. On the contrary, in the model considered here firms and workers
operate in a decentralized and uncoordinated manner, dedicating time and
resources to the search for a partner. The probability that a firm or a worker
will meet a partner depends on the relative number of vacant jobs and
unemployed workers: for example, a scarcity of unemployed workers relative
to vacancies will make it difficult for a firm to fill its vacancy, while workers
will find jobs easily. Hence there exists an externality between agents in the
same market which is of the same “trading” type as the one encountered in
the previous section. Since this externality is generated by the search activity
of the agents on the market, it is normally referred to as a search externality.
Formally, we define the labor force as the sum of the “employed” workers plus
the “unemployed” workers which we assume to be constant and equal to L
units. Similarly, the total demand for labor is equal to the number of filled
jobs plus the number of vacancies. The total number of unemployed workers
and vacancies can therefore be expressed as uL and vL, respectively, where u
denotes the unemployment rate and v denotes the ratio between the number
of vacancies and the total labor force. In each unit of time, the total number
of matches between an unemployed worker and a vacant firm is equal to mL
(where m denotes the ratio between the newly filled jobs and the total labor
force). The process of matching is summarized by a matching function, which
expresses the number of newly created jobs (mL) as a function of the number
of unemployed workers (uL) and vacancies (vL):
$$mL = m(uL, vL). \tag{5.24}$$
The function m(·), supposed increasing in both arguments, is conceptually
similar to the aggregate production function that we encountered, for exam-
ple, in Chapter 4. The creation of employment is seen as the outcome of
a “productive process” and the unemployed workers and vacant jobs are
the “productive inputs.” Obviously, both the number of unemployed work-
ers and the number of vacancies have a positive effect on the number of
matches within each time period (mu > 0, mv > 0). Moreover, the creation of
employment requires the presence of agents on both sides of the labor market
(m(0, 0) = m(0, vL) = m(uL, 0) = 0). Additional properties of the function
m(·) are needed to determine the character of the unemployment rate in
a steady-state equilibrium. In particular, for the unemployment rate to be
constant in a growing economy, m(·) needs to have constant returns to scale.⁴³
In that case, we can write
$$m = \frac{m(uL, vL)}{L} = m(u, v). \tag{5.25}$$
The function m(·) determines the flow of workers who find a job and who
exit the unemployment pool within each time interval. Consider the case of
an unemployed worker: at each moment in time, the worker will find a job
with probability p = m(·)/u. With constant returns to scale for m(·), we may
thus write
$$\frac{m(u, v)}{u} = m\!\left(1, \frac{v}{u}\right) \equiv p(\theta), \quad \text{an increasing function of } \theta \equiv \frac{v}{u}. \tag{5.26}$$
The instantaneous probability p that a worker finds a job is thus positively
related to the tightness of the labor market, which is measured by θ, the ratio of
the number of vacancies to unemployed workers.⁴⁴ An increase in θ, reflecting
an abundance of vacant jobs relative to unemployed workers, leads to
an increase in p. (Moreover, given the properties of m, p′′(θ) < 0.) Finally,
the average length of an unemployment spell is given by 1/p(θ), and thus is
inversely related to θ. Similarly, the rate at which a vacant job is matched to a
worker may be expressed as
$$\frac{m(u, v)}{v} = m\!\left(1, \frac{v}{u}\right)\frac{u}{v} = \frac{p(\theta)}{\theta} \equiv q(\theta), \tag{5.27}$$
a decreasing function of the vacancy/unemployment ratio. An increase in
θ reduces the probability that a vacancy is filled, and 1/q(θ) measures the
average time that elapses before a vacancy is filled.⁴⁵ The dependence of p and
q on θ captures the dual externality between agents in the labor market: an
increase in the number of vacancies relative to unemployed workers increases
the probability that a worker finds a job (∂ p(·)/∂v > 0), but at the same time
it reduces the probability that a vacancy is filled (∂q (·)/∂v < 0).
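To make the properties of p(θ) and q(θ) concrete, the sketch below uses a Cobb-Douglas matching function, $m(u, v) = A\,u^{\alpha} v^{1-\alpha}$. This functional form and the values of `A` and `alpha` are assumptions introduced only for illustration; the text requires just constant returns to scale and monotonicity.

```python
# Sketch: job-finding and vacancy-filling rates under an assumed
# Cobb-Douglas matching function m(u, v) = A * u**alpha * v**(1 - alpha).

def matches(u, v, A=0.5, alpha=0.5):
    """Matches per unit of labor force, m(u, v); constant returns to scale."""
    return A * u ** alpha * v ** (1 - alpha)


def job_finding_rate(theta, A=0.5, alpha=0.5):
    """p(theta) = m(1, theta), as in (5.26)."""
    return matches(1.0, theta, A, alpha)


def vacancy_filling_rate(theta, A=0.5, alpha=0.5):
    """q(theta) = p(theta) / theta, as in (5.27)."""
    return job_finding_rate(theta, A, alpha) / theta


if __name__ == "__main__":
    for theta in (0.25, 0.5, 1.0, 2.0):
        print(theta, job_finding_rate(theta), vacancy_filling_rate(theta))
```

Under this assumed form the elasticity of p with respect to θ is the constant 1 − α, so p rises and q falls with θ, which is the dual externality described above.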
⁴³ Empirical studies of the matching technology confirm that the assumption of constant returns to
scale is realistic (see Blanchard and Diamond, 1989, 1990, for estimates for the USA).
⁴⁴ As in the previous section, the matching process is modeled as a Poisson process. The probability
that an unemployed worker does not find employment within a time interval dt is thus given by
$e^{-p(\theta)\,dt}$. For a small time interval, this probability can be approximated by 1 − p(θ) dt. Similarly,
the probability that the worker does find employment is $1 - e^{-p(\theta)\,dt}$, which can be approximated by
p(θ) dt.
⁴⁵ To complete the description of the functions p and q, we define the elasticity of p with respect
to θ as η(θ). We thus have: η(θ) = p′(θ)θ/p(θ). From the assumption of constant returns to scale, we
know that 0 ≤ η(θ) ≤ 1. Moreover, the elasticity of q with respect to θ is equal to η(θ) − 1.
5.3.2. THE DYNAMICS OF UNEMPLOYMENT
Changes in unemployment result from a difference between the flow of work-
ers who lose their job and become unemployed, and the flow of workers who
find a job. The inflow into unemployment is determined by the “separation
rate” which we take as given for simplicity: at each moment in time a fraction
s of jobs (corresponding to a fraction 1 − u of the labor force) is hit by a shock
that reduces the productivity of the match to zero: in this case the worker loses
her job and returns to the pool of unemployed, while the firm is free to open
up a vacancy in order to bring employment back to its original level. Given
the match destruction rate s , jobs therefore remain productive for an average
period 1/s . Given these assumptions, we can now describe the dynamics of
the number of unemployed workers. Since L is constant, d(uL)/dt = u̇L and
hence
$$\dot{u}\,L = s(1-u)L - p(\theta)\,uL \;\Rightarrow\; \dot{u} = s(1-u) - p(\theta)\,u, \tag{5.28}$$
which is similar to the difference equation for employment (5.1) derived in
the previous section. The dynamics of the unemployment rate depend on the
tightness of the labor market θ: at a high ratio of vacancies to unemployed
workers, workers easily find jobs, leading to a large flow out of
unemployment.⁴⁶ From equation (5.28) we can immediately derive the steady-state
relationship between the unemployment rate and θ:
$$u = \frac{s}{s + p(\theta)}. \tag{5.29}$$
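The link between the law of motion (5.28) and the steady state (5.29) can be illustrated by integrating (5.28) forward with θ, and hence p(θ), held fixed. The sketch below uses a simple Euler scheme; the step size, horizon, and the values of `s` and `p` are assumptions chosen for the example.

```python
# Sketch: Euler integration of du/dt = s (1 - u) - p u for a fixed job-finding
# rate p, compared with the steady state u = s / (s + p) from (5.29).

def simulate_unemployment(u0, s, p, dt=0.01, steps=20000):
    """Unemployment rate after `steps` Euler steps of size dt, starting at u0."""
    u = u0
    for _ in range(steps):
        u += dt * (s * (1 - u) - p * u)
    return u


def steady_state_unemployment(s, p):
    """Steady-state unemployment rate from (5.29)."""
    return s / (s + p)


if __name__ == "__main__":
    s, p = 0.02, 0.3  # assumed separation and job-finding rates
    for u0 in (0.0, 0.05, 0.5):
        print(u0, simulate_unemployment(u0, s, p), steady_state_unemployment(s, p))
```

Whatever the starting point, the simulated path settles on the same value, reflecting the fact that (5.28) is linear in u with adjustment speed s + p(θ).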
Since p′(·) > 0, the properties of the matching function determine a negative
relation between θ and u: at any given unemployment rate, a higher value of θ
corresponds to a larger flow of newly created jobs. For unemployment to remain
constant, the unemployment rate must therefore be lower, which reduces the flow
of new matches and raises the flow of destroyed jobs, s(1 − u), until the two
flows are equal. The steady-state relationship (5.29) is illustrated
graphically in the left-hand panel of Figure 5.5: to each value of θ corresponds
a unique value for the unemployment rate. Moreover, the same properties of
m(·) ensure that this curve is convex. For points above or below u̇ = 0, the
unemployment rate tends to move towards the stationary relationship:
keeping θ constant at θ0, a value u > u0 causes an increase in the flow out of
unemployment and a decrease in the flow into unemployment, bringing u back to
u0. Moreover, given u and θ, the number of vacancies is uniquely determined
by v = θu, where v denotes the number of vacancies as a proportion of the
labor force. The picture on the right-hand side of the figure shows the curve
⁴⁶ To obtain job creation and destruction “rates,” we may divide the flows into and out of employ-
ment by the total number of employed workers, (1 − u)L . The rate of destruction is simply equal to s ,
while the rate of job creation is given by p(θ)[u/(1 − u)].
Figure 5.5. Dynamics of the unemployment rate
u̇ = 0 in (v, u)-space. This locus is known as the Beveridge curve, and identifies
the level of vacancies v0 that corresponds to the pair (θ0, u0) in the left-hand
panel. In the sequel we will use both graphs to illustrate the dynamics and
the comparative statics of the model. At this stage it is important to note that
variations in the labor market tightness are associated with a movement along
the curve u̇ = 0, while changes in the separation rate s or the efficiency of
the matching process (captured by the properties of the matching function)
correspond to movements of the curve u̇ = 0. For example, an increase in
s or a decrease in the matching efficiency causes an upward shift of u̇ = 0.
Equation (5.29) describes a first steady-state relationship between u and θ. To
find the actual equilibrium values, we need to specify a second relationship
between these variables. This second relationship can be derived from the
behavior of firms and workers on the labor market.
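The Beveridge curve and its shifts can be traced numerically once a matching function is specified. The sketch below again assumes the Cobb-Douglas form $m(u, v) = A\,u^{\alpha} v^{1-\alpha}$, which is not imposed by the text; `A` stands in for matching efficiency, and all numbers are illustrative.

```python
# Sketch: the locus du/dt = 0 in (v, u)-space under an assumed Cobb-Douglas
# matching function, solving s (1 - u) = A u**alpha v**(1 - alpha) for v.

def beveridge_vacancies(u, s=0.02, A=0.5, alpha=0.5):
    """Vacancy rate v that keeps unemployment constant at rate u."""
    if not 0 < u < 1:
        raise ValueError("u must lie strictly between 0 and 1")
    return (s * (1 - u) / (A * u ** alpha)) ** (1 / (1 - alpha))


if __name__ == "__main__":
    for u in (0.04, 0.06, 0.08, 0.10):
        base = beveridge_vacancies(u)
        high_s = beveridge_vacancies(u, s=0.03)   # higher separation rate
        low_A = beveridge_vacancies(u, A=0.4)     # lower matching efficiency
        print(u, base, high_s, low_A)
```

Along the curve v falls as u rises, while a higher s or a lower A raises the v required at every u, which is the shift of the u̇ = 0 locus described in the text.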
5.3.3. JOB AVAILABILITY
The crucial decision of firms concerns the supply of jobs on the labor market.
The decision of a firm about whether to create a vacancy depends on the
expected future profits over the entire time horizon of the firm, which we
assume is infinite. Formally, each individual firm solves an intertemporal
optimization problem taking as given the aggregate labor market conditions
which are summarized by θ, the labor market tightness. Individual firms
therefore disregard the effect of their decisions on θ, and consequently on
the matching rates p(θ) and q(θ) (the external effects referred to above). To
simplify the analysis, we assume that each firm can offer at most one job. If the
job is filled, the firm receives a constant flow of output equal to y. Moreover, it
pays a wage w to the worker and it takes this wage as given. The determination
of this wage is described below. On the other hand, if the job is not filled the
firm incurs a flow cost c , which reflects the time and resources invested in
the search for suitable workers. Firms therefore find it attractive to create a
vacancy as long as its value, measured in terms of expected profits, is positive;
if it is not, the firm will not find it attractive to offer a vacancy and will exit
the labor market. The value that a firm attributes to a vacancy (denoted by V)
and to a filled job (J) can be expressed using the asset equations encountered
above. Given a constant real interest rate r, we can express these values as
$$r\,V(t) = -c + q(\theta(t))\,\bigl(J(t) - V(t)\bigr) + \dot{V}(t), \tag{5.30}$$
$$r\,J(t) = \bigl(y - w(t)\bigr) + s\,\bigl(V(t) - J(t)\bigr) + \dot{J}(t), \tag{5.31}$$
which are explicit functions of time. The flow return of a vacancy is equal
to a negative cost component (−c), plus the capital gain in case the job is
filled with a worker (J − V), which occurs with probability q(θ), plus the
change in the value of the vacancy itself (V̇). Similarly, (5.31) defines the flow
return of a filled job as the value of the flow output minus the wage (y − w),
plus the capital loss (V − J) in case the job is destroyed, which occurs with
probability s, plus the change in the value of the job (J̇).
Exercise 50 Derive equation (5.31) with dynamic programming arguments,
supposing that J̇ = 0 and following the argument outlined in Section 5.1 to
obtain equations (5.3) and (5.4).
Subtracting (5.30) from (5.31) yields the following expression for the dif-
ference in value between a filled job and a vacancy:
$$r\,\bigl(J(t) - V(t)\bigr) = \bigl(y - w(t) + c\bigr) - \bigl[s + q(\theta(t))\bigr]\bigl(J(t) - V(t)\bigr) + \bigl(\dot{J}(t) - \dot{V}(t)\bigr). \tag{5.32}$$
Solving equation (5.32) at date t0 for the entire infinite planning horizon of
the firm, we get
$$J(t_0) - V(t_0) = \int_{t_0}^{\infty} \bigl(y - w(t) + c\bigr)\, e^{-\int_{t_0}^{t} [r + s + q(\theta(\tau))]\,d\tau}\, dt, \tag{5.33}$$
where we need to impose the following transversality condition:
$$\lim_{T \to \infty}\, \bigl[J(T) - V(T)\bigr]\, e^{-\int_{t_0}^{T} (r + s + q(\theta(\tau)))\,d\tau} = 0.$$
Equation (5.33) expresses the difference between the value of a job and
the value of a vacancy as the value of the difference between the flow return
of a job (y − w) and that of a vacancy (−c) over the entire time horizon,
which is discounted to t0 using the appropriate “discount rate.” Besides
the real interest rate, this discount rate also depends on the separation rate s
and on the tightness of the labor market via q(θ). Intuitively, a higher number
of vacancies relative to unemployed workers decreases the probability that a
vacant firm will meet a worker. This reduces the effective discount rate and
leads to an increase in the difference between the value of a filled job and a
vacancy. Moreover, θ may also have an indirect effect on the flow return of a
filled job via its impact on the wage w, as we will see in the next section.
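As an illustration of (5.33), consider the special case, assumed here only for the example, in which the wage and labor market tightness are constant over time, w(t) = w and θ(t) = θ. The integral can then be evaluated explicitly:

```latex
\begin{align*}
J - V &= (y - w + c)\int_{t_0}^{\infty} e^{-[r + s + q(\theta)](t - t_0)}\,dt
       = \frac{y - w + c}{r + s + q(\theta)}.
\end{align*}
```

Since q(θ) is decreasing in θ, a tighter labor market lowers the denominator and, for a given wage, raises the value of a filled job relative to a vacancy, in line with the intuition given above.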
Now, if we focus on steady-state equilibria, we can impose V̇ = J̇ = 0 in
equations (5.30) and (5.31). Moreover, we assume free entry of firms and as
a result V = 0: new firms continue to offer vacant jobs until the value of the
marginal vacancy is reduced to zero. Substituting V = 0 in (5.30) and (5.31)
and combining the resulting expressions for J , we get
$$
\left.
\begin{aligned}
J &= c/q(\theta) \\
J &= (y - w)/(r + s)
\end{aligned}
\right\}
\;\Rightarrow\; y - w = (r + s)\,\frac{c}{q(\theta)}. \tag{5.34}
$$
Equation (5.30) gives us the first expression for J . According to this con-
dition, the equilibrium value of a filled job is equal to the expected costs of a
vacancy, that is, the flow cost of a vacancy c times the average duration of a
vacancy 1/q(θ). The second condition for J can be derived from (5.31): the
value of a filled job is equal to the value of the constant profit flow y − w.
These flow returns are discounted at rate r + s to account for both impatience
and the risk that the match breaks down. Equating these two expressions yields
the final solution (5.34), which gives the marginal condition for employment
in a steady-state equilibrium: the marginal productivity of the worker ( y)
needs to compensate the firm for the wage w paid to the worker and for
the flow cost of opening a vacancy. The latter is equal to the product of the
discount rate r + s and the expected costs of a vacancy c/q(θ).
This last term is just like an adjustment cost for the firm’s employment
level. It introduces a wedge between the marginal productivity of labor and
the wage rate, which is similar to the effect of the hiring costs studied in
Chapter 3. However, in the model of this section the size of the adjustment
cost is endogenous and depends on the aggregate conditions on the labor
market. In equilibrium, the size of the adjustment costs depends on the
unemployment rate and on the number of vacancies, which are summarized
at the aggregate level by the value of θ. If, for example, the value of output
minus wages (y − w) increases, then vacancy creation will become profitable
(V > 0) and more firms will offer jobs. As a result, θ will increase, leading to
a reduction in the matching rate for firms and an increase in the average cost
of a vacancy, and both these effects tend to bring the value of a vacancy back
to zero.
Finally, notice that equation (5.34) still contains the wage rate w. This is an
endogenous variable. Hence the “job creation condition” (5.34) is not yet the
steady-state condition which together with (5.29) would allow us to solve for
the equilibrium values of u and θ. To complete the model, we need to analyze
the process of wage determination.
5.3.4. WAGE DETERMINATION AND THE STEADY STATE
The process of wage determination that we adopt here is based on the fact
that the successful creation of a match generates a surplus. That is, the value
of a pair of agents that have agreed to match (the value of a filled job and an
employed worker) is larger than the value of these agents before the match
(the value of a vacancy and an unemployed worker). This surplus has the
nature of a monopolistic rent and needs to be shared between the firm and
the worker during the wage negotiations. Here we shall assume that wages are
negotiated at a decentralized level between each individual worker and her
employer. Since workers and firms are identical, all jobs will therefore pay the
same wage.
Let E and U denote the value that a worker attributes to employment and
unemployment, respectively. The joint value of a match (given by the value of
a filled job for the firm and the value of employment for the worker) can then
be expressed as J + E , while the joint value in case the match opportunity
is not exploited (given by the value of a vacancy for a firm and the value
of unemployment for a worker) is equal to V + U . The total surplus of the
match is thus equal to the sum of the firm’s surplus, J − V , and the worker’s
surplus, E − U :
( J + E ) − (V + U ) ≡ ( J − V ) + (E − U ). (5.35)
The match surplus is divided between the firm and the worker through a
wage bargaining process. We take their relative bargaining strength to be
exogenously given. Formally, we adopt the assumption of Nash bargaining.
This assumption is common in models of bilateral negotiations. It implies
that the bargained wage maximizes a geometric average of the surplus of the
firm and the worker, each weighted by a measure of their relative bargaining
strength. In our case the assumption of Nash bargaining gives rise to the
following optimization problem:
$$
\max_{w}\; (J - V)^{1-\beta}\,(E - U)^{\beta}, \tag{5.36}
$$
where 0 ≤ β ≤ 1 denotes the relative bargaining strength of the worker. Given
that the objective function is a Cobb–Douglas one, we can immediately
express the solution (the first-order condition) of the problem as
$$
E - U = \frac{\beta}{1-\beta}\,(J - V) \;\Rightarrow\; E - U = \beta\,[(J - V) + (E - U)]. \tag{5.37}
$$
The surplus that the worker appropriates in the wage negotiations (E − U) is
thus equal to a fraction β of the total surplus of the job.
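The sharing rule (5.37) can be checked with a small grid search. The sketch below makes the simplifying assumption that the total surplus S is fixed and that bargaining merely transfers an amount x of it to the worker; the function name and the numbers are hypothetical.

```python
def worker_share(total_surplus, beta, grid=100_000):
    """Grid search for the worker's surplus x maximizing (S - x)^(1-beta) * x^beta."""
    best_x, best_value = 0.0, float("-inf")
    for i in range(1, grid):
        x = total_surplus * i / grid
        value = (total_surplus - x) ** (1.0 - beta) * x ** beta
        if value > best_value:
            best_x, best_value = x, value
    return best_x

S, beta = 2.0, 0.3  # hypothetical total surplus and bargaining strength
x_star = worker_share(S, beta)
print(x_star, beta * S)  # the maximizer should be close to beta * S
```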
Similar to what is done for V and J in (5.30) and (5.31), we can express
the values E and U using the relevant asset equations (reintroducing the
dependence on time t ):
$$
r\,E(t) = w(t) + s\,[U(t) - E(t)] + \dot E(t) \tag{5.38}
$$
$$
r\,U(t) = z + p(\theta)\,[E(t) - U(t)] + \dot U(t). \tag{5.39}
$$
For the worker, the flow return on employment is equal to the wage plus
the loss in value if the worker and the firm separate, which occurs with
probability s , plus any change in the value of E itself; while the return on
unemployment is given by the imputed value of the time that a worker does
not spend working, denoted by z, plus the gain if she finds a job plus the
change in the value of U . Parameter z includes the value of leisure and/or
the value of alternative sources of income including possible unemployment
benefits. This parameter is assumed to be exogenous and fixed. Subtracting
(5.39) from (5.38), and solving the resulting expression for the entire future
time horizon, we can express the difference between the value of employment
and unemployment at date t0 as
$$
E(t_0) - U(t_0) = \int_{t_0}^{\infty} (w(t) - z)\, e^{-\int_{t_0}^{t} [r + s + p(\theta(\tau))]\, d\tau}\, dt. \tag{5.40}
$$
As in the case of firms, apart from the real interest rate r and the rate of separ-
ation s , the discount rate for the flow return of workers depends on the degree
of labor market tightness via its effect on p(θ). A relative abundance of vacant
jobs implies a high matching rate for workers, and this tends to reduce the
difference between the value of employment and unemployment for a given
wage value.
There are two ways to obtain the effect of variations in θ on the wage. Restricting
attention to steady-state equilibria, so that Ė = U̇ = 0, we can either derive
the surplus of the worker E − U directly from (5.38) and (5.39), or we can
solve equation (5.40) keeping w and θ constant over time:
$$
E - U = \frac{w - z}{r + s + p(\theta)}. \tag{5.41}
$$
According to (5.41), the surplus of a worker depends positively on the differ-
ence between the flow return during employment and unemployment (w − z)
and negatively on the separation rate s and on θ: an increase in the ratio of
vacancies to unemployed workers increases the exit rate out of unemployment
and reduces the average length of an unemployment spell. Using (5.41), and
noting that in steady-state equilibrium
$$
J - V = J = \frac{y - w}{r + s},
$$
we can solve the expression for the outcome of the wage negotiations given by
(5.37) as
$$
\frac{w - z}{r + s + p(\theta)} = \frac{\beta}{1-\beta}\,\frac{y - w}{r + s}.
$$
Rearranging terms, and using (5.34), we obtain the following equivalent
expressions for the wage:
$$
w - z = \beta\,[(y + c\,\theta - w) + (w - z)] \tag{5.42}
$$
$$
\Rightarrow\; w = z + \beta\,(y + c\,\theta - z). \tag{5.43}
$$
Equation (5.42) is the version in terms of flows of equation (5.37): the flow
value of the worker’s surplus, i.e. the difference between the wage and the
alternative income z, is a fraction β of the total flow surplus. The term y − w + cθ
represents the flow surplus of the firm, where cθ denotes the expected cost
savings if the firm fills a job. Moreover, the wage is a pure redistribution from
the firm to the worker. If we eliminate the wage payments in (5.42), we obtain
the flow value of the total surplus of a filled job, y + cθ − z, which is equal
to the sum of the value of output and the cost saving of the firm minus the
alternative income of the worker. Finally, equation (5.43) expresses the wage as
the sum of the alternative income and the fraction of the surplus that accrues
to the worker.
It can easily be verified that the only influence of aggregate labor market
conditions on the wage occurs via θ, the ratio of vacancies to unemployed
workers. The unemployment rate u does not have any independent effect on
wages. The explanation is that wages are negotiated after a firm and a worker
meet. In this situation the match surplus depends on θ, as we saw above. This
variable determines the average duration of a vacancy, and hence the expected
costs for the firm if it continued to search.
The determination of the equilibrium wage completes the description of
the steady-state equilibrium. The equilibrium can be summarized by equations
(5.29), (5.34), and (5.43), which we shall refer to as BC (Beveridge
curve), JC (job creation condition), and W (wage equation):
$$
u = \frac{s}{s + p(\theta)} \qquad (BC) \tag{5.44}
$$
$$
y - w = (r + s)\,\frac{c}{q(\theta)} \qquad (JC) \tag{5.45}
$$
$$
w = (1 - \beta)\,z + \beta\,(y + c\,\theta) \qquad (W) \tag{5.46}
$$
For a given value of θ, the wage is independent of the unemployment rate.
The system can therefore be solved recursively for the endogenous variables
u, θ, and w. Using the definition of θ, we can then solve for v. The last
two equations jointly determine the equilibrium wage w and the ratio of
vacancies to unemployed θ, as is shown in the left-hand panel of Figure 5.6.
Given θ, we can then determine the unemployment rate u, and consequently
also v, which equates the flows into and out of unemployment (the right-hand
panel of the figure).
Figure 5.6. Equilibrium of the labor market with frictional unemployment
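The recursive solution just described can be sketched numerically. The code below assumes a Cobb–Douglas matching function, so that p(θ) = Aθ^η and q(θ) = p(θ)/θ; the scale parameter A, all parameter values, and the function names are hypothetical. It first solves JC and W for θ by bisection and then recovers w, u (from BC), and v = θu.

```python
def p(theta, A, eta):
    """Job-finding rate of workers, assumed Cobb-Douglas: p = A * theta**eta."""
    return A * theta ** eta

def q(theta, A, eta):
    """Vacancy-filling rate of firms: q = p / theta."""
    return p(theta, A, eta) / theta

def steady_state(y, z, c, r, s, beta, A, eta):
    """Solve JC and W for theta by bisection, then recover w, u (from BC) and v."""
    def gap(theta):
        # (y - w) implied by W minus the (y - w) required by JC; decreasing in theta
        return (1 - beta) * (y - z) - beta * c * theta - (r + s) * c / q(theta, A, eta)
    lo, hi = 1e-12, 1.0
    while gap(hi) > 0:          # expand the bracket until the gap changes sign
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    theta = 0.5 * (lo + hi)
    w = (1 - beta) * z + beta * (y + c * theta)
    u = s / (s + p(theta, A, eta))
    return theta, w, u, theta * u

# Hypothetical parameter values (assumptions, not taken from the text).
par = dict(y=1.0, z=0.4, c=0.3, r=0.04, s=0.10, beta=0.5, A=1.0, eta=0.5)
theta, w, u, v = steady_state(**par)
print(theta, w, u, v)
```

Bisection is used only because the gap between the two curves is monotonically decreasing in θ under these assumptions; any root finder would do.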
This dual representation facilitates the comparative static analysis, which
is intended to analyze the effect of changes in the parameters on the steady-
state equilibrium. (Analysis of transitional dynamics is the subject of the next
section.) In some cases, parameter changes have an unambiguous effect on
all of the endogenous variables. This is true for instance in the case of an
increase in unemployment benefits, a component of z, or an increase in the
relative bargaining strength of workers β: the only effect of these changes is an
upward shift of W which causes an increase in the wage and a reduction in θ.
This reduction, along the curve BC, is accompanied by an increase in u and a
reduction in v.
In other cases the effects are more complex and not always of unambiguous
sign. Consider, for example, the effects of the following two types of shock
which may be at the root of cyclical variations in overall unemployment. The
first is an “aggregate” disturbance. This is represented by a variation in the pro-
ductivity of labor y which affects all firms at the same time and with the same
intensity. The second shock is a “reallocative” disturbance, represented by a
change in the separation rate s . This shock hits individual firms independently
of the aggregate state of the economy (captured by labor productivity y).
A reduction in y moves both JC and W downwards. This results in a reduction
of the wage but has an ambiguous effect on θ. However, formal analysis
(which is required in the exercise below) shows that in a stationary equilibrium
θ also decreases; since the curve BC does not shift, the unemployment rate
must increase while the number of vacancies v is reduced. In the case of a
reallocative shock, we observe an inward shift of JC along W. This results in
a joint decrease of the wage and of labor market tightness θ, as in the case
of the aggregate shock. At the same time, however, the curve BC shifts to the
right. Hence, while the unemployment rate increases unambiguously, it is in
general not possible to determine the effect on the number of vacancies. In
reality, however, v appears to be procyclical, and this suggests that aggregate
shocks are a more important source of cyclical movements in the labor market
than reallocative shocks.
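The two comparative static experiments discussed above can be illustrated numerically. This is only a sketch under the assumption of a Cobb–Douglas matching function, p(θ) = Aθ^η; the baseline parameter values and the function name `solve` are hypothetical, and the numbers illustrate rather than replace the formal derivation.

```python
def solve(y=1.0, z=0.4, c=0.3, r=0.04, s=0.10, beta=0.5, A=1.0, eta=0.5):
    """Steady state of BC, JC and W with p(theta) = A * theta**eta (an assumption)."""
    def gap(theta):
        # W substituted into JC; the expression is decreasing in theta
        return ((1 - beta) * (y - z) - beta * c * theta
                - (r + s) * c * theta ** (1 - eta) / A)
    lo, hi = 1e-12, 1.0
    while gap(hi) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    w = (1 - beta) * z + beta * (y + c * theta)
    u = s / (s + A * theta ** eta)
    return {"w": w, "theta": theta, "u": u, "v": theta * u}

base = solve()
low_y = solve(y=0.95)    # aggregate shock: lower labor productivity
high_s = solve(s=0.12)   # reallocative shock: higher separation rate
for name, case in (("base", base), ("low y", low_y), ("high s", high_s)):
    print(name, {k: round(val, 4) for k, val in case.items()})
```

With these hypothetical numbers, lower productivity reduces w, θ, and v and raises u, while a higher separation rate reduces w and θ and raises u; the sign of the change in v after the reallocative shock depends on the parameters.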
Exercise 51 Derive formally, using the system of equations formed by (5.44),
(5.45), and (5.46), the effects on the steady-state levels of w, θ, u, and v
of a smaller labor productivity (Δy < 0) and of a higher separation rate
(Δs > 0).
5.4. Dynamics
Until now, all the relationships we derived referred to the steady-state equi-
librium of the system. In this section we will analyze the evolution of unem-
ployment, vacancies, and the wage rate along the adjustment path toward the
steady-state equilibrium.
The discussion of the flows into and out of unemployment in the previ-
ous section has already delivered the law of motion for unemployment. This
equation is repeated here (stressing the time dependence of the endogenous
variables):
$$
\dot u(t) = s\,(1 - u(t)) - p(\theta(t))\,u(t). \tag{5.47}
$$
The dynamics of u are due to the flow of separations and the flow of newly
created jobs resulting from the matches between firms and workers. The
magnitude of the flow out of unemployment depends on aggregate labor
market conditions, captured by θ, via its effect on p(·). Outside a steady-state
equilibrium, the path of θ will influence unemployment dynamics in
the economy. Moreover, given the definition of θ as the ratio of vacancies
to unemployed workers, this will also affect the value of the labor market
tightness. In order to give a complete description of the adjustment process
toward a steady-state equilibrium, we therefore need to study the dynamics of
θ. This requires an analysis of the job creation decisions of firms.
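For a given, constant value of θ the law of motion (5.47) is a linear differential equation, so its path can be simulated and compared with the exact solution. The sketch below assumes a constant job-finding rate p; the step size, the numbers, and the function names are hypothetical.

```python
import math

def simulate_u(u0, s, p, horizon, dt=0.001):
    """Euler integration of (5.47) with a constant job-finding rate p."""
    u = u0
    for _ in range(int(round(horizon / dt))):
        u += dt * (s * (1.0 - u) - p * u)
    return u

def u_exact(u0, s, p, t):
    """Exact solution of the linear ODE: exponential convergence to s / (s + p)."""
    u_bar = s / (s + p)
    return u_bar + (u0 - u_bar) * math.exp(-(s + p) * t)

# Hypothetical values: initial unemployment 5%, s = 0.1, p = 0.9.
u_sim = simulate_u(0.05, 0.1, 0.9, horizon=5.0)
u_ref = u_exact(0.05, 0.1, 0.9, t=5.0)
print(u_sim, u_ref, 0.1 / (0.1 + 0.9))
```

The speed of convergence is s + p(θ), which is why changes in market tightness alter the speed of adjustment of unemployment.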
5.4.1. MARKET TIGHTNESS
At each moment in time firms exploit all opportunities for the profitable
creation of jobs. Hence V(t) = 0 for all t, both in a steady-state equilibrium
and along the adjustment path, which in turn implies V̇(t) = 0 for all t, also
outside a steady-state equilibrium.
The value of a filled job for the firm can be derived from (5.30) and (5.31).
From the first equation, setting V(t) = V̇(t) = 0, we get
$$
J(t) = \frac{c}{q(\theta(t))}. \tag{5.48}
$$
Equation (5.48) is identical to the steady-state expression derived before.
Firms continue to create new vacancies, thereby influencing θ, until the value
of a filled job equals the expected cost of a vacancy. Since entry into the
labor market is costless for firms (the resources are used to maintain open
vacancies), equation (5.48) will hold at each instant during the adjustment
process. Outside the steady state, the dynamics of J need to satisfy the
differential equation (5.31), with V(t) = 0:
$$
\dot J(t) = (r + s)\,J(t) - (y - w(t)). \tag{5.49}
$$
The solution of (5.49) shows that the value J(t) depends on the future path
of (expected) wages. Besides that, J(t) also depends on labor productivity, the
real interest rate, and the rate of separation, but all these variables are assumed
to be constant:
$$
J(t) = \int_{t}^{\infty} (y - w(\tau))\, e^{-(r+s)(\tau - t)}\, d\tau. \tag{5.50}
$$
Wages are continuously renegotiated. Outside steady-state equilibrium the
surplus sharing rule (5.37) with V (t ) = 0 therefore remains valid:
$$
E(t) - U(t) = \frac{\beta}{1-\beta}\,J(t). \tag{5.51}
$$
Outside the steady-state E , U , and J may vary over time, but these variations
need to ensure that (5.51) is satisfied. Hence, we have
$$
\dot E(t) - \dot U(t) = \frac{\beta}{1-\beta}\,\dot J(t). \tag{5.52}
$$
The dynamics of J are given by (5.49), while the dynamics of E and U can be
derived by subtracting (5.39) from (5.38):
$$
\dot E(t) - \dot U(t) = [r + s + p(\theta(t))]\,(E(t) - U(t)) - (w(t) - z). \tag{5.53}
$$
Equating (5.52) to (5.53), and using (5.49) and (5.48) to replace J̇ and J , we
can solve for the level of wages outside steady-state equilibrium as:
$$
w(t) = z + \beta\,(y + c\,\theta(t) - z). \tag{5.54}
$$
The wage is thus determined in the same way both in a steady-state equilib-
rium and during the adjustment process. Moreover, given the values for the
exogenous variables, the wage dynamics depends exclusively on changes in the
degree of labor market tightness, which affects the joint value of a productive
match.
We are now in possession of all the elements that are needed to determine
the dynamics of θ. Differentiating (5.48) with respect to time, where by definition
q(θ) ≡ p(θ)/θ, we have
$$
\dot J(t) = \frac{c\,p(\theta(t)) - c\,\theta(t)\,p'(\theta(t))}{p(\theta(t))^{2}}\,\dot\theta(t)
= \frac{c}{p(\theta(t))}\,[1 - \eta(\theta(t))]\,\dot\theta(t), \tag{5.55}
$$
where 0 < η(θ) < 1 (defined above) denotes the elasticity of p(θ) with respect
to θ. To simplify the derivations, we henceforth assume that η(θ) = η is constant
(which is true if the matching function m(·) is of the Cobb–Douglas
type). Substituting (5.49) for J̇ and using the expression J = c(θ/p(θ)), we
can rewrite equation (5.55) as
$$
\dot\theta(t)\,\frac{c}{p(\theta(t))}\,(1 - \eta) = (r + s)\,\frac{c\,\theta(t)}{p(\theta(t))} - (y - w(t)). \tag{5.56}
$$
Finally, substituting the expression for the wage as a function of θ from (5.54),
the above law of motion for θ can be written as
$$
\dot\theta(t) = \frac{r + s}{1 - \eta}\,\theta(t) - \frac{p(\theta(t))}{c\,(1 - \eta)}\,[(1 - \beta)(y - z) - \beta\,c\,\theta(t)]. \tag{5.57}
$$
Changes in θ depend on (in addition to all the parameters of the model)
only the value of θ itself. Labor market tightness does not depend in any
independent way on the unemployment rate u. In (θ, u)-space the curve θ̇ = 0
can thus be represented by a horizontal line at θ̄, which defines the unique
steady-state equilibrium value for the ratio between vacancies and unemployed.
This is illustrated in the left-hand panel of Figure 5.7. Once we have
determined θ̄, we can determine, for each value of the unemployment rate, the
level of v that is compatible with a stationary equilibrium. For instance, in the
case of u0 this is equal to v0.
Figure 5.7. Dynamics of the supply of jobs
In addition, equation (5.57) indicates that, for points above or below
the curve θ̇ = 0, θ tends to move away from its equilibrium value. Formally,
one can show this by calculating⁴⁷
$$
\left.\frac{\partial \dot\theta}{\partial \theta}\right|_{\dot\theta = 0} = (r + s) + \frac{p(\theta)}{1 - \eta}\,\beta > 0
$$
from (5.57). The apparently “unstable” behavior of θ is due to the nature of
the job creation decision of firms. Looking at the future, firms’ decisions on
whether to open a vacancy today are based on expected future values of θ. For
example, if firms expect a future increase in θ resulting from an increase in the
number of vacant jobs, they will anticipate an increase in future costs to fill
a vacancy. As a result, firms have an incentive to open vacancies immediately
in anticipation of this increase in cost. At the aggregate level, this induces an
immediate increase in v (and in θ) in anticipation of further increases in the
future. Hence, there is an obvious analogy between the variations of v and the
movement of asset prices which we already alluded to when we interpreted
(5.30) and (5.31) as asset equations: expectations of a future increase in price
cause an increase in current prices.
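The sign and size of the derivative of the law of motion (5.57) at the steady state can be checked by finite differences. The sketch below assumes p(θ) = Aθ^η with constant η and picks y − z so that the steady-state tightness equals one; all values and names are hypothetical.

```python
# Hypothetical parameter values (assumptions, not taken from the text).
r, s, beta, c, A, eta = 0.04, 0.10, 0.5, 0.3, 1.0, 0.5
theta_bar = 1.0
# Choose y - z so that theta_bar is the steady state of (5.57).
y_minus_z = (beta * c * theta_bar
             + (r + s) * c * theta_bar ** (1 - eta) / A) / (1 - beta)

def p(theta):
    """Job-finding rate, assumed Cobb-Douglas."""
    return A * theta ** eta

def theta_dot(theta):
    """Right-hand side of the law of motion (5.57)."""
    return ((r + s) / (1 - eta) * theta
            - p(theta) / (c * (1 - eta)) * ((1 - beta) * y_minus_z - beta * c * theta))

h = 1e-5
numeric = (theta_dot(theta_bar + h) - theta_dot(theta_bar - h)) / (2 * h)
formula = (r + s) + p(theta_bar) * beta / (1 - eta)
print(theta_dot(theta_bar), numeric, formula)
```

A positive derivative means that any deviation of θ from its steady-state value grows over time, which is the "unstable" behavior discussed above.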
As a result of forward-looking behavior on the part of firms, both v and
θ are “jump” variables. Their value is not predetermined: in response to
changes in the exogenous parameters (even if these changes are expected in the
future and have not yet materialized), v and θ may exhibit discrete changes.
The unemployment rate, on the other hand, is a “predetermined” or state
variable. The dynamics of the unemployment rate are governed by (5.47),
and u adjusts gradually to changes in θ, even in the case of a discrete change
in labor market tightness. An unanticipated increase in v and θ leads to
an increase in the flow out of unemployment, resulting in a reduction of u.
However, the positive effect of the number of vacancies on unemployment
is mediated via the stochastic matching process on the labor market. The
immediate effect of an increase in θ is an increase in the matching rate for
workers p(θ), and this translates only gradually into an increase in the number
of filled jobs. The unemployment rate therefore will start to decrease only after
some time.
The aggregate effect of the decentralized decisions of firms (each of which
disregards the externalities of its own decision on aggregate variables) consists
of changes in the degree of labor market tightness θ and, as a result, in
changes in the speed of adjustment of the unemployment rate. The dynamics
of u are therefore intimately linked to the presence of the externalities that
characterize the functioning of the labor market in the search and matching
literature.
⁴⁷ Note that this derivative is computed at a steady-state equilibrium point (on the θ̇ = 0 locus).
Hence, we may use (5.34), and replace y − w with (r + s)(cθ/p(θ)) to obtain the expression in the text.
Figure 5.8. Dynamics of unemployment and vacancies
5.4.2. THE STEADY STATE AND DYNAMICS
We are now in a position to characterize the system graphically, using the
differential equations (5.47) and (5.57) for u and θ. In both panels of Figure 5.8
we have drawn the curves for θ̇ = 0 and u̇ = 0. Moreover, for each point
outside the unique steady-state equilibrium we have indicated the movement
of θ and u.
As we have seen in the analysis of dynamic models of investment and
growth theory, the combination of a single state variable (u) and a single
jump variable (θ) implies that there is only one saddlepath that converges
to the steady-state equilibrium (saddlepoint).⁴⁸ Since the expression for θ̇ = 0
does not depend on u, the saddlepath coincides with the curve for θ̇ = 0: all
the other points are located on paths that diverge from the curve θ̇ = 0 and
never reach the steady state, violating the transversality conditions. Hence, as
a result of the forward-looking nature of the vacancy creation decisions of
firms, the labor market tightness θ will jump immediately to its long-run value
and remain there during the entire adjustment process.
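The saddlepoint property can be illustrated by evaluating the Jacobian of the linearized system in (u, θ) at a steady state. The sketch assumes p(θ) = Aθ^η and a hypothetical steady-state tightness of one; because the lower-left entry is zero, the eigenvalues are simply the diagonal entries.

```python
# Hypothetical parameter values (assumptions, not taken from the text).
r, s, beta, A, eta = 0.04, 0.10, 0.5, 1.0, 0.5
theta_bar = 1.0                            # assumed steady-state tightness
p_bar = A * theta_bar ** eta               # p(theta_bar)
p_prime = eta * A * theta_bar ** (eta - 1)  # p'(theta_bar)
u_bar = s / (s + p_bar)                    # steady-state unemployment from (5.47)

# Rows: derivatives of (u_dot, theta_dot) with respect to (u, theta).
jacobian = [
    [-(s + p_bar), -u_bar * p_prime],
    [0.0, (r + s) + p_bar * beta / (1 - eta)],
]
det = jacobian[0][0] * jacobian[1][1] - jacobian[0][1] * jacobian[1][0]
eigenvalues = (jacobian[0][0], jacobian[1][1])  # triangular matrix
print(det, eigenvalues)
```

One negative and one positive eigenvalue, and hence a negative determinant, is the pattern associated with a saddlepoint.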
Let us now analyze the adjustment process in response first to a reduction
in labor productivity y (an aggregate shock) and then to an increase in the rate
of separation s (a reallocative shock). Figure 5.9 illustrates the dynamics following
an unanticipated permanent reduction in productivity (Δy < 0) at date
t0. In the left-hand graph, the curve θ̇ = 0 shifts downward while u̇ = 0 does
not change. In the new steady-state equilibrium (point C) the unemployment
rate is higher and labor market tightness is lower. Moreover, from the right-hand
graph, it follows that the number of vacancies has also decreased. The
figure also illustrates the dynamics of the variables: at date t0 the economy
jumps to the new saddlepath which coincides with the new curve θ̇ = 0. Given
the predetermined nature of the unemployment rate, the whole adjustment is
performed by v and θ, which make a discrete jump downwards as shown by
B in the two graphs. From t0 onwards both unemployment and the number
of vacancies increase gradually, keeping θ fixed until the new steady-state
equilibrium is reached.
Figure 5.9. Permanent reduction in productivity
⁴⁸ Formally, we can determine the saddlepoint nature of the equilibrium by evaluating the linearized
system (5.47) and (5.57) around the steady-state equilibrium point (ū, θ̄), yielding
$$
\begin{pmatrix} \dot u \\ \dot\theta \end{pmatrix}
=
\begin{pmatrix}
-(s + p(\bar\theta)) & -\bar u\, p'(\bar\theta) \\
0 & (r + s) + \dfrac{p(\bar\theta)}{1 - \eta}\,\beta
\end{pmatrix}
\begin{pmatrix} u - \bar u \\ \theta - \bar\theta \end{pmatrix}.
$$
The pattern of signs in the matrix is $\begin{bmatrix} - & - \\ 0 & + \end{bmatrix}$. Thus, the determinant is negative, confirming that the
equilibrium is a saddlepoint.
The permanent reduction in labor productivity reduces the expected profits
of a filled job. Hence, from t0 onwards firms have an incentive to create fewer
vacancies. Moreover, initially the number of vacancies v falls below its new
equilibrium level because firms anticipate that the unemployment rate will
rise. In the future it will therefore be easier to fill a vacancy. As a result, firms
prefer to reduce the number of vacancies at the beginning of the adjustment
process, increasing their number gradually as the unemployment rate starts to
rise.
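The undershooting of vacancies described above can be traced numerically. The sketch assumes p(θ) = Aθ^η and takes the pre-shock and post-shock values of tightness as given, hypothetical numbers rather than solving for them; after the jump in θ, unemployment follows (5.47) and vacancies are v = θu.

```python
import math

# Hypothetical values (assumptions, not taken from the text).
s, A, eta = 0.10, 1.0, 0.5
theta_old, theta_new = 1.0, 0.6   # tightness before and after the shock

def p(theta):
    """Job-finding rate, assumed Cobb-Douglas."""
    return A * theta ** eta

def u_bar(theta):
    """Steady-state unemployment on the Beveridge curve."""
    return s / (s + p(theta))

def path(t):
    """(u, v) at time t after the shock: theta jumps at once, u follows (5.47)."""
    u_old, u_new = u_bar(theta_old), u_bar(theta_new)
    u = u_new + (u_old - u_new) * math.exp(-(s + p(theta_new)) * t)
    return u, theta_new * u

v_old = theta_old * u_bar(theta_old)
v_new = theta_new * u_bar(theta_new)
u_impact, v_impact = path(0.0)
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, path(t))
```

On impact vacancies fall below their new long-run level, because unemployment is still at its old value, and then rise together with unemployment while θ stays fixed.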
Finally, the reduction in labor productivity also reduces wages, but this
reduction is smaller than the decrease in y. Since the labor market immediately
jumps to a saddlepath along which θ(t) is constant, equation (5.54) implies
that the wage w(t) is constant along the whole adjustment process. The short-run
response of the wage is thus equal to the long-run response, which is
governed by (5.34). According to this equation, the difference y − w is proportional
to the expected cost of a vacancy for the firm. This cost depends on
the average time that is needed to fill a vacancy, which diminishes when v and
θ fall in response to a productivity shock. Hence, in this version of the model
productivity changes do not imply proportional wage changes. (On this point
see exercise 51 at the end of the chapter.)
Figure 5.10. Increase in the separation rate
A similar adjustment process takes place in the case of a (unanticipated and
permanent) reallocative shock Δs > 0, as shown in Figure 5.10. However, in
this case u̇ = 0 is also affected. This curve shifts to the right, which reinforces
the increase in the unemployment rate, but has an ambiguous effect on the
number of vacancies. (The figure illustrates the case of a reduction in v.)
Finally, let us consider the case of a temporary reduction of productivity:
agents now anticipate at t0 that productivity will return to its higher initial
value at some future date t1. Given the temporary nature of the shock, the
new steady-state equilibrium coincides with the initial equilibrium (point A
in the graphs of Figure 5.11). At the time of the change in productivity, t0,
the immediate effect is a reduction in the number of vacancies which causes a
discrete fall in θ. However, this reduction is smaller than the one that resulted
from a permanent change, and it moves the economy from the previous
equilibrium A to a new point B′. From t0 onwards, the unemployment rate
and the number of vacancies increase gradually but not at the same rate: as
a result, their ratio θ increases, following the diverging dynamics associated
with the new and lower stationary curve θ̇ = 0. To obtain convergence to
the steady-state equilibrium at A, the dynamics of the adjustment need to
bring θ to its equilibrium level at t1, when the shock ceases and productivity
returns to its previous level (point B′′). In fact, convergence to the final
equilibrium can occur only if the system is located on the saddlepath, which
coincides with the stationary curve for θ, at date t1. After t1 the dynamics
concern only the unemployment rate u and the number of vacancies v, which
decrease in the same proportion until the system reaches its initial starting
point A.
The graph on the right-hand side of Figure 5.11 also illustrates that cyclical
variations in productivity give rise to a counter-clockwise movement of
unemployment and vacancies around the Beveridge curve. This is consistent with
empirical data for changes in unemployment and vacancies during recessions,
which are approximated here by a temporary reduction in productivity.
Figure 5.11. A temporary reduction in productivity
5.5. Externalities and efficiency
The presence of externalities immediately poses the question of whether the
decentralized equilibrium allocation is efficient. In particular, in the previ-
ous sections it was shown that firms disregard the effect of their private
decisions on the aggregate labor market conditions when they are deciding
whether or not to create a vacancy. In this section we analyze the implica-
tions of these external effects for the efficiency of the market equilibrium and
compare the decentralized equilibrium allocation with the socially efficient
allocation.
To simplify the comparison between individual and socially optimal
choices, we reformulate the problem of the firm, so far identified with a single
job, allowing firms to open many vacancies and employ many workers.
Moreover, we also modify the production technology and replace the previous
linear production technology with a standard production function with
decreasing marginal returns to labor. Let N_i denote the number of workers
of firm i. The production function is then given by F(N_i), with F′(·) > 0
and F′′(·) < 0. The case F′′(·) = 0 corresponds to the analysis in the previous
sections, while the case of decreasing marginal returns to labor corresponds to
our analysis in Chapter 3.
The employment level of a firm varies over time as a result of vacancies
that are filled and because of shocks that hit the firm and destroy jobs at rate
s . The evolution of Ni is described by the following equation, where we have
suppressed the time dependence of the variables to simplify the notation:
$$
\dot N_i = q(\theta)\,X_i - s\,N_i, \tag{5.58}
$$
where X_i represents the number of vacancies of a firm and is the control
variable of the firm. Each vacancy is transformed into a filled job with instantaneous
probability q(θ), which is a function of the aggregate tightness of the
labor market. In deciding X_i firms take θ as given, disregarding the effect
of their decisions on the aggregate ratio of vacancies to unemployed. More
specifically, we assume that the number of firms is sufficiently high to justify
the assumption that a single firm takes the level of θ as an exogenous variable.
The problem of the representative firm is therefore
$$
\max_{X_i} \int_{0}^{\infty} [F(N_i) - w\,N_i - c\,X_i]\, e^{-rt}\, dt, \tag{5.59}
$$
subject to the law of motion for employment given by (5.58). Moreover, we
assume that the firm takes the wage w as given and independent of the number
of workers that it employs.
The solution can be found by writing the associated Hamiltonian,
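$$
H(t) = [F(N_i) - w\,N_i - c\,X_i + \lambda\,(q(\theta)\,X_i - s\,N_i)]\, e^{-rt} \tag{5.60}
$$
(where λ is the Lagrange multiplier associated with the law of motion for N),
and by deriving the first-order conditions:
$$
\frac{\partial H}{\partial X_i} = 0 \;\Rightarrow\; \lambda = \frac{c}{q(\theta)}, \tag{5.61}
$$
$$
-\frac{\partial H}{\partial N_i} = \frac{d\,(\lambda(t)\,e^{-rt})}{dt} \;\Rightarrow\; F'(N_i) - w = (r + s)\,\lambda - \dot\lambda, \tag{5.62}
$$
$$
\lim_{t \to \infty} e^{-rt}\,\lambda\,N_i = 0. \tag{5.63}
$$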
Equation (5.61) implies that firms continue to create vacancies until the
marginal profits of a job equal the marginal cost of a vacancy (c/q(θ)). This
condition holds at any moment in time and is similar to the condition for job
creation (5.48) derived in Section ??. The Lagrange multiplier λ can therefore
be interpreted as the marginal value of a filled job for the firm, which we
denoted by J in the previous sections. The dynamics of λ are given by (5.62),
which in turn corresponds to equation (5.49) for J̇. Finally, equation (5.63)
defines the appropriate transversality condition for the firm’s problem.
In what follows we consider only steady-state equilibria. Combining (5.61)
and (5.62) and imposing λ̇ = 0, we get
$$
F'(N_i^{*}) - w = (r + s)\,\frac{c}{q(\theta)}, \tag{5.64}
$$
where N_i* denotes the steady-state equilibrium employment level of the firm.
The optimal number of vacancies, X_i*, can be derived from constraint (5.58),
with Ṅ = 0:
$$
q(\theta)\,X_i^{*} = s\,N_i^{*} \;\Rightarrow\; X_i^{*} = \frac{s}{q(\theta)}\,N_i^{*}. \tag{5.65}
$$
Hence, if all firms have the same production function and start from the same
initial conditions, then each firm will choose the same optimal solution and
the ratio of vacancies to filled jobs for each firm will be equal to the aggregate
ratio:
$$
\frac{X_i^{*}}{N_i^{*}} = \frac{v}{1 - u}. \tag{5.66}
$$
This completes the characterization of the decentralized equilibrium.
We now proceed with a characterization of the socially efficient solution. For
simplicity we normalize the mass of firms to one. X and N therefore denote
the stock of vacancies and of filled jobs, at both the aggregate level and the
level of an individual firm. Since the relations
$$
\theta \equiv \frac{v}{u} = \frac{v\,L}{L - N} = \frac{X}{L - N}
$$
hold true, aggregate labor market conditions as captured by θ are endogenous
in the determination of the socially efficient allocation. The efficient allocation
can be found by solving the following maximization problem:
$$\max_{X}\int_0^\infty \left[F(N) - zN - cX\right]e^{-rt}\,dt, \tag{5.67}$$
subject to the condition
$$\dot{N} = q\!\left(\frac{X}{L-N}\right)X - sN. \tag{5.68}$$
The bracketed expression in (5.67) denotes aggregate net output. The first term
(F(N)) is equal to the output of employed workers. From this we have to subtract
the flow utility of employed workers (zN), and the costs of maintaining
the vacancies (cX). The wage rate does not appear in this expression because it
is a pure redistribution from firms to workers: in the model considered here,
distributional issues are irrelevant for social efficiency. The important point
to note is that the effect of the choice of X on the aggregate conditions on
the labor market is explicitly taken into account: the ratio θ is expressed as
X/(L − N) and is not taken as given in the maximization of social welfare.
The problem is solved using similar methods as for the case of the problem
of individual firms. Constructing the associated Hamiltonian and deriving the
first-order conditions for X and N (with μ as the Lagrange multiplier for the
dynamic constraint) yields
$$\frac{\partial H}{\partial X} = 0 \;\Rightarrow\; \mu = \frac{c}{q'(\theta)\theta + q(\theta)}, \tag{5.69}$$
$$-\frac{\partial H}{\partial N} = \frac{d\left(\mu(t)e^{-rt}\right)}{dt} \;\Rightarrow\; F'(N) - z = \left(r + s - q'(\theta)\theta^2\right)\mu - \dot{\mu}. \tag{5.70}$$
Explicit consideration of the effects on θ introduces various differences
between the above optimality conditions and the first-order conditions of
the individual firm, (5.61) and (5.62). First of all, comparing (5.69) with the
corresponding condition (5.61) shows that individual firms tend to offer an
excessive number of vacant jobs compared with what is socially efficient (recall
that q′ < 0). The reason for this discrepancy is that firms disregard the effect
of their decisions on the aggregate labor market conditions.
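The comparison between (5.61) and (5.69) can be made concrete with a small sketch. It assumes a Cobb–Douglas form q(θ) = m0·θ^(η−1), for which q′(θ)θ + q(θ) = η·q(θ), and it holds the marginal value of a filled job fixed at the same number in both conditions to isolate this first effect. The function names and parameter values are hypothetical.

```python
def q(theta, m0=1.0, eta=0.5):
    """Assumed vacancy-filling rate q(theta) = m0 * theta**(eta - 1)."""
    return m0 * theta ** (eta - 1.0)


def q_prime(theta, m0=1.0, eta=0.5):
    """Derivative of the assumed q(theta); negative for 0 < eta < 1."""
    return m0 * (eta - 1.0) * theta ** (eta - 2.0)


def private_tightness(job_value, c=0.3, m0=1.0, eta=0.5):
    """theta solving the firm's condition (5.61): job_value = c / q(theta)."""
    return (job_value * m0 / c) ** (1.0 / (1.0 - eta))


def social_tightness(job_value, c=0.3, m0=1.0, eta=0.5):
    """theta solving the planner's condition (5.69):
    job_value = c / (q'(theta) * theta + q(theta)) = c / (eta * q(theta))."""
    return (eta * job_value * m0 / c) ** (1.0 / (1.0 - eta))


if __name__ == "__main__":
    print("private theta:", private_tightness(1.0))
    print("social theta: ", social_tightness(1.0))
```

For any η between 0 and 1 the socially chosen θ is smaller than the private one at the same job value, which is the sense in which firms post too many vacancies when they ignore their effect on θ.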
Moreover, from the marginal condition for N (5.70), it follows that the
“social” discount rate associated with the marginal value of a filled job μ
contains an additional term, −q′(θ)θ² > 0, which does not appear in the
analogous condition for the individual firm (5.62). That is, an increase in
the number of employed workers diminishes the probability that the firm
will hire additional workers in the future. Equation (5.70) correctly reflects
this dynamic aspect of labor demand, which tends to reduce the marginal
value of a filled job in a steady-state equilibrium (in which μ̇ = 0). Hence,
also from this perspective the decentralized decisions of firms result in an
excessive number of vacancies compared with the social optimum. Finally,
comparing the left-hand side of equations (5.62) and (5.70) reveals that the
individual conditions contain the value of productivity net of the wage w,
while in the condition for social efficiency the value of productivity is net of the
opportunity cost z. Hence, for the same value of θ, individual firms attribute a
lower “dividend” to filled jobs since w ≥ z, and firms thus tend to generate an
insufficient number of vacancies. This last effect runs in the opposite direction
to the two effects discussed above, and this makes a comparison between
the two solutions—the individual and the social—interesting. The socially
optimal solution may coincide with the corresponding decentralized equi-
librium if the wage determination mechanism “internalizes” the externalities
that private agents ignore. However, in the model that we have constructed,
wages are determined after a firm and a worker meet. Hence, although the
wage is perfectly flexible, it cannot perform any allocative function.
Nonetheless, we can determine the conditions that the wage determination
mechanism needs to satisfy for the decentralized equilibrium to coincide with
the efficient solution. For this to occur, the marginal value of a filled job in the
social optimum, which is given by (5.69), needs to be equal to the marginal
value that the firm and the worker attribute to this job in the decentralized
equilibrium. The latter is equal to the value that a firm and a worker attribute
to the joint surplus that is created by a match. Since firms continue to offer
vacancies until their marginal value is reduced to zero (V = 0), the condition
for efficiency of a decentralized equilibrium is
$$\mu = J + E - U, \tag{5.71}$$
where E and U , introduced in Section 5.3, denote the value that a worker
attributes to the state of employment and unemployment, respectively.
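Using (5.69), (5.48), and (5.51) we can rewrite (5.71) as:
$$\frac{c}{q'(\theta)\theta + q(\theta)} = \frac{1}{1-\beta}\,\frac{c}{q(\theta)}, \tag{5.72}$$
where β denotes the relative bargaining strength of the worker. From (5.72),
we obtain
$$\beta = -\frac{q'(\theta)\theta}{q(\theta)} \equiv -\left[\eta(\theta) - 1\right] \;\Rightarrow\; \beta = 1 - \eta(\theta), \tag{5.73}$$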
where η(θ) and 1 − η(θ) denote the elasticity of the matching probability of
a worker p(θ) and the average duration of a vacancy 1/q(θ) with respect to
θ. Since β is constant, condition (5.73) can be satisfied only if the matching
function has constant returns to scale with respect to its arguments v and u.
This condition is satisfied for a matching function of the Cobb–Douglas type:
$$m = m_0 v^{\eta}u^{1-\eta}, \quad 0 < \eta < 1. \tag{5.74}$$
It is easy to verify that (5.74) has the following properties:
$$p(\theta) = m_0\theta^{\eta}, \quad q(\theta) = m_0\theta^{\eta-1}, \quad \frac{1}{q(\theta)} = \frac{1}{m_0}\theta^{1-\eta}. \tag{5.75}$$
The constant parameter η represents both the elasticity of the number of
matches m with respect to the number of vacancies v, and the elasticity of
p(θ) with respect to θ, while 1 − η denotes the elasticity of m with respect to
u and also the elasticity of the average duration of a vacancy, 1/q(θ), with
respect to θ.
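The properties in (5.75) and the elasticities just described can be checked numerically. The sketch below is only an illustration: the values of m0 and η and the helper names are assumptions, and elasticities are computed as ratios of log differences, which are exact for power functions up to rounding error.

```python
import math

M0, ETA = 0.8, 0.4  # illustrative parameter values, not from the text


def matches(v, u):
    """Cobb-Douglas matching function (5.74)."""
    return M0 * v ** ETA * u ** (1.0 - ETA)


def p(theta):
    """Matching probability of an unemployed worker, m / u."""
    return M0 * theta ** ETA


def q(theta):
    """Matching probability of a vacancy, m / v."""
    return M0 * theta ** (ETA - 1.0)


def elasticity(f, x, h=1e-6):
    """Elasticity of f at x, computed as d log f / d log x over a small step."""
    return (math.log(f(x * (1.0 + h))) - math.log(f(x))) / math.log(1.0 + h)


if __name__ == "__main__":
    theta = 0.5
    print("elasticity of p(theta):  ", elasticity(p, theta))
    print("elasticity of 1/q(theta):", elasticity(lambda t: 1.0 / q(t), theta))
```

The two printed elasticities should be close to η and 1 − η respectively, whatever value of θ is chosen.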
Returning to efficiency condition (5.73), we can thus deduce that, if the
average duration of a vacancy strongly increases with an increase in the number
of vacancies (i.e. if 1 − η is relatively high), there is a strong tendency
for firms to exceed the efficient number of vacancies. Only a relatively high
value of β, which implies high wage levels, can counterbalance this effect and
induce firms to reduce the number of vacancies. When β = 1 − η, these two
opposing tendencies exactly offset each other and the decentralized equilibrium
allocation is efficient. For cases in which β ≠ 1 − η, there are two types
of inefficiency:
1. if β < 1 − η firms offer an excessive number of vacancies and the equilibrium
unemployment rate is below the socially optimal level;
2. if β > 1 − η wages are excessively high because of the strong bargaining
power of workers and this results in an unemployment rate that is above
the socially efficient level.
In sum, in the model of the labor market that we have described here
we cannot make a priori conclusions about the efficiency of the equilibrium
unemployment rate. Given the complex externalities between the actions of
firms and workers, the properties of the matching function and the wage deter-
mination mechanism are crucial to determine whether the unemployment
rate will be above or below the socially efficient level.
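The efficiency condition can be summarized in a short sketch. It evaluates the two sides of (5.72) under an assumed Cobb–Douglas q(θ) = m0·θ^(η−1) and classifies the equilibrium according to the two cases listed above. The function names, labels, and parameter values are hypothetical and serve only as an illustration.

```python
def efficiency_gap(beta, theta, c=0.3, m0=1.0, eta=0.5):
    """Left-hand side minus right-hand side of (5.72) for the assumed q(theta)."""
    q_val = m0 * theta ** (eta - 1.0)
    q_prime = m0 * (eta - 1.0) * theta ** (eta - 2.0)
    social_value = c / (q_prime * theta + q_val)      # mu from (5.69)
    private_surplus = c / ((1.0 - beta) * q_val)      # joint surplus J + E - U
    return social_value - private_surplus


def efficiency_regime(beta, eta, tol=1e-12):
    """Classify the decentralized equilibrium by comparing beta with 1 - eta."""
    if abs(beta - (1.0 - eta)) <= tol:
        return "efficient"
    if beta < 1.0 - eta:
        return "excess vacancies, unemployment below the efficient level"
    return "excessive wages, unemployment above the efficient level"


if __name__ == "__main__":
    for beta in (0.2, 0.5, 0.8):
        print(beta, efficiency_regime(beta, eta=0.5), efficiency_gap(beta, theta=1.7))
```

With η = 0.5 the gap vanishes only at β = 0.5, independently of θ, which mirrors the statement that (5.73) pins down the efficient bargaining share.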
APPENDIX A5: STRATEGIC INTERACTIONS AND MULTIPLIERS
This appendix presents a general theoretical structure, based on Cooper and John
(1988), which captures the essential elements of the strategic interactions in the models
discussed in this chapter. We will discuss the implications of strategic interactions
in terms of the multiplicity of equilibria and analyze the welfare properties of these
equilibria.
Consider a number I of economic agents (i = 1, …, I), each of which chooses a
value for a variable $e_i \in [0, E]$ which represents the agent’s “activity level,” with the
objective of maximizing her own payoff $\sigma(e_i, e_{-i}, \lambda_i)$, where $e_{-i}$ represents (the vector
of) activity levels of the other agents and $\lambda_i$ is an exogenous parameter which influences
the payoff of agent i. Payoff function σ(·) satisfies the properties $\sigma_{ii} < 0$ and $\sigma_{i\lambda} > 0$.
(This last assumption implies that an increase in λ raises the marginal return of activity
for the agent.)
If all other agents choose a level of activity ē, the payoff of agent i can be expressed
as $\sigma(e_i, \bar{e}, \lambda_i) \equiv V(e_i, \bar{e})$. In this case the optimization problem becomes
$$\max_{e_i} V(e_i, \bar{e}), \tag{5.A1}$$
from which we derive
$$V_1(e_i^*, \bar{e}) = 0, \tag{5.A2}$$
where $V_1$ denotes the derivative of V with respect to its first argument, $e_i$. First-order
condition (5.A2) defines the optimal response of agent i to the activity level of all
other agents: $e_i^* = e_i^*(\bar{e})$. Moreover, totally differentiating (5.A2), we can also calculate the slope of the
reaction curve of agent i:
$$\frac{de_i^*}{d\bar{e}} = -\frac{V_{12}}{V_{11}} \lessgtr 0, \quad \text{if } V_{12} \lessgtr 0. \tag{5.A3}$$
By the second-order condition for maximization, we know that $V_{11} < 0$; the sign
of the slope is thus determined by the sign of $V_{12}(e_i, \bar{e})$. In case $V_{12} > 0$, we can
make a graphical representation of the marginal payoff function $V_1(e_i, \bar{e})$ and of the
resulting reaction function $e_i^*(\bar{e})$. The left-hand graph in Figure 5.12 illustrates various
functions $V_1$, corresponding to three different activity levels for the other agents: ē = 0,
ē = e, and ē = E.
Assuming $V_1(0, 0) > 0$ and $V_1(E, E) < 0$ (points A and B) guarantees the existence
of at least one symmetric decentralized equilibrium in which $e = e_i^*(e)$, and agent
i chooses exactly the same level of activity as all other agents (in this case $V_1(e, e) = 0$
and $V_{11}(e, e) < 0$). In Figure 5.12 we illustrate the case in which the reaction function has a
positive slope, and hence $V_{12} > 0$, and in which there is a unique symmetric equilibrium.
In general, if V12(ei , ē ) > 0 there exists a strategic complementarity between agents:
an increase in the activity level of the others increases the marginal return of activity
for agent i , who will respond to this by raising her activity level. If, on the other hand,
V12(ei , ē ) < 0, then agents’ actions are strategic substitutes. In this case agent i chooses
a lower activity in response to an increase in the activity level of others (as in the case
of a Cournot duopoly situation in which producers choose output levels). In the latter
case there exists a unique equilibrium, while in the case of strategic complementarity
there may be multiple equilibria.
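The slope formula (5.A3) can be illustrated with a simple example. The quadratic payoff V(e, ē) = e(a + bē) − e²/2 used below is a hypothetical choice, not one from the text; it gives V11 = −1 and V12 = b, so actions are strategic complements when b > 0 and strategic substitutes when b < 0. The sketch recovers the slope −V12/V11 by central finite differences.

```python
def payoff(e, e_bar, a=1.0, b=0.5):
    """Hypothetical payoff V(e, e_bar) = e * (a + b * e_bar) - e**2 / 2."""
    return e * (a + b * e_bar) - 0.5 * e ** 2


def best_response(e_bar, a=1.0, b=0.5):
    """Solves the first-order condition V1 = a + b * e_bar - e = 0 for e."""
    return a + b * e_bar


def reaction_slope(f, e, e_bar, h=1e-3):
    """Slope -V12 / V11 of the reaction curve, as in (5.A3), by finite differences."""
    v11 = (f(e + h, e_bar) - 2.0 * f(e, e_bar) + f(e - h, e_bar)) / h ** 2
    v12 = (f(e + h, e_bar + h) - f(e + h, e_bar - h)
           - f(e - h, e_bar + h) + f(e - h, e_bar - h)) / (4.0 * h ** 2)
    return -v12 / v11


if __name__ == "__main__":
    e_bar = 1.0
    e_star = best_response(e_bar)
    print("slope with b = 0.5:", reaction_slope(payoff, e_star, e_bar))
```

In this linear example the reaction curve has slope b everywhere, so with |b| < 1 there is a single symmetric equilibrium at e = a/(1 − b); multiple equilibria would require a reaction curve whose slope exceeds one somewhere.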
Before analyzing the conditions under which this may occur, and before discussing
the role of strategic complementarity or substitutability in determining the character-
istics of the equilibrium, we must evaluate the problem from the viewpoint of a social
planner who implements a Pareto-efficient equilibrium.
Figure 5.12. Strategic interactions
The planner’s problem may be expressed as the maximization of a representative
agent’s welfare with respect to the common strategy (activity level) of all agents: the
optimum that we are looking for is therefore the symmetric outcome corresponding
to a hypothetical cooperative equilibrium. Formally,
$$\max_{e} V(e, e), \tag{5.A4}$$
from which we obtain
$$V_1(e^*, e^*) + V_2(e^*, e^*) = 0. \tag{5.A5}$$
Comparing this first-order condition⁴⁹ with the condition that is valid in a symmetric
decentralized equilibrium (5.A2), we see that the solutions for $e^*$ are different if
$V_2(e^*, e^*) \neq 0$. In general, if $V_2(e_i, \bar{e}) > (<)\,0$, there are positive (negative) spillovers.
The externalities are therefore defined as the impact of another agent’s activity level on
the payoff of an individual.
A number of important implications for different features of the possible equilibria
follow from this general formulation.
1. Efficiency Whenever there are externalities that affect the symmetric decentralized
equilibrium, that is when $V_2(e, e) \neq 0$, the decentralized equilibrium
is inefficient. In particular, with a positive externality ($V_2(e, e) > 0$), there exists
a symmetric cooperative equilibrium characterized by a common activity level
e′ > e.
2. Multiplicity of equilibria As already mentioned, in the case of strategic complementarity
($V_{12} > 0$), an increase in the activity level of the other agents increases
the marginal return of activity for agent i, which induces agent i to raise her
own activity level. As a result, the reaction function of agents has a positive slope
(as in Figure 5.12). Strategic complementarity is a necessary but not a sufficient
condition for the existence of multiple (non-cooperative) equilibria. The sufficient
condition is that $de_i^*/d\bar{e} > 1$ in a symmetric decentralized equilibrium.
If this condition is satisfied, we may have the situation depicted in Figure 5.13,
in which there exist three symmetric equilibria. Two of these equilibria (with
activity levels $e_1$ and $e_3$) are stable, since the slope of the reaction curves is less
than one at the equilibrium activity levels, while at $e_2$ the slope of the reaction
curve is greater than one. This equilibrium is therefore unstable.
3. Welfare If there exist multiple equilibria, and if at each activity level there are
positive externalities (V2(ei , ē ) > 0 ∀ē ), then the equilibria can be ranked. Those
with a higher activity level are associated with a higher level of welfare. Hence,
agents may be in an equilibrium in which their welfare is below the level that
may be obtained in other equilibria. However, since agents choose the optimal
strategy in each of the equilibria, there is no incentive for agents to change
their level of activity. The absence of a mechanism to coordinate the actions of
individual agents may thus give rise to a “coordination failure,” in which potential
welfare gains are not realized because of a lack of private incentives to raise the
activity levels.
Figure 5.13. Multiplicity of equilibria
⁴⁹ The second-order condition that we assume to be satisfied is given by
$V_{11}(e^*, e^*) + 2V_{12}(e^*, e^*) + V_{22}(e^*, e^*) < 0$. Furthermore, in order to ensure the existence of a cooperative equilibrium, we
assume that $V_1(0, 0) + V_2(0, 0) > 0$, $V_1(E, E) + V_2(E, E) < 0$, which is analogous to the restrictions
imposed in the decentralized optimization above.
Exercise 52 Show formally that equilibria with a higher ē are associated with a higher
level of welfare if V2(ei , ē ) > 0. (Use the total derivative of function V (·) to derive this
result.)
4. Multipliers Strategic complementarity is necessary and sufficient to guarantee
that the aggregate response to an exogenous shock exceeds the response at
the individual level; in this case the economy exhibits “multiplier” effects. To
clarify this last point, which is of particular relevance for Keynesian models,
we will consider the simplified case of two agents with payoff functions defined
as $V^1 \equiv \sigma^1(e_1, e_2, \lambda_1)$ and $V^2 \equiv \sigma^2(e_1, e_2, \lambda_2)$, respectively. All the assumptions
about these payoff functions remain valid (in particular, $V^1_{13} \equiv \sigma^1_{13} > 0$). The
reaction curves of the two agents are derived from the following first-order
conditions:
$$V^1_1(e_1^*, e_2^*, \lambda_1) = 0, \tag{5.A6}$$
$$V^2_2(e_1^*, e_2^*, \lambda_2) = 0. \tag{5.A7}$$
We now consider a “shock” to the payoff function of agent 1, namely $d\lambda_1 > 0$, and
we derive the effect of this shock on the equilibrium activity levels of the two agents, $e_1^*$
and $e_2^*$, and on the aggregate level of activity, $e_1^* + e_2^*$. Taking the total derivative of the
above system of first-order conditions (5.A6) and (5.A7), with $d\lambda_2 = 0$, and dividing
the first equation by $V^1_{11}$ and the second by $V^2_{22}$, we have:
$$de_1^* + \left(\frac{V^1_{12}}{V^1_{11}}\right)de_2^* + \left(\frac{V^1_{13}}{V^1_{11}}\right)d\lambda_1 = 0,$$
$$\left(\frac{V^2_{21}}{V^2_{22}}\right)de_1^* + de_2^* = 0.$$
The terms $V^1_{12}/V^1_{11}$ and $V^2_{21}/V^2_{22}$ represent the slopes, with opposite signs, of the
reaction curves of the agents, which we denote by ρ (given that the payoff functions
are assumed to be identical, the slope of the reaction curves is also the same). The
term $V^1_{13}/V^1_{11}$ represents the response (again with opposite sign) of the optimal
equilibrium level of agent 1 to a shock $\lambda_1$. In particular, keeping $e_2^*$ constant, we have
$$V^1_1(e_1^*, e_2^*, \lambda_1) = 0 \;\Rightarrow\; \frac{\partial e_1^*}{\partial\lambda_1} = -\frac{V^1_{13}}{V^1_{11}} > 0.$$
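We can thus rewrite the system as follows:
$$\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}\begin{pmatrix} de_1^* \\ de_2^* \end{pmatrix} = \begin{pmatrix} \dfrac{\partial e_1^*}{\partial\lambda_1} \\ 0 \end{pmatrix} d\lambda_1,$$
which yields the following solution:
$$\frac{de_1^*}{d\lambda_1} = \frac{1}{1-\rho^2}\,\frac{\partial e_1^*}{\partial\lambda_1}, \tag{5.A8}$$
$$\frac{de_2^*}{d\lambda_1} = \frac{\rho}{1-\rho^2}\,\frac{\partial e_1^*}{\partial\lambda_1} = \rho\,\frac{de_1^*}{d\lambda_1}. \tag{5.A9}$$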
Equation (5.A8) gives the total response of agent 1 to a shock $\lambda_1$. This response can
also be expressed as
$$\frac{de_1^*}{d\lambda_1} = \frac{\partial e_1^*}{\partial\lambda_1} + \rho\,\frac{de_2^*}{d\lambda_1}. \tag{5.A10}$$
The first term is the “impact” (and thus only partial) response of agent 1 to a shock
affecting her payoff function; the second term gives the response of agent 1 that is
“induced” by the reaction of the other agent. The condition for the additional induced
effect is simply ρ ≠ 0. Moreover, the actual induced effect depends on ρ and $de_2^*/d\lambda_1$,
as in (5.A9), where $de_2^*/d\lambda_1$ has the same sign as ρ: positive in case of strategic
complementarity and negative in case of substitutability. The induced response of agent 1 is
therefore always positive.
This leads to a first important conclusion: the interactions between the agents always
induce a total (or equilibrium) response that is larger than the impact response. In
particular, for each ρ ≠ 0, we have
$$\frac{de_1^*}{d\lambda_1} > \frac{\partial e_1^*}{\partial\lambda_1}.$$
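For the economy as a whole, the effect of the disturbance is given by
$$\frac{d(e_1^* + e_2^*)}{d\lambda_1} = \left(\frac{1}{1-\rho^2} + \frac{\rho}{1-\rho^2}\right)\frac{\partial e_1^*}{\partial\lambda_1} = \frac{1}{1-\rho}\,\frac{\partial e_1^*}{\partial\lambda_1} = (1+\rho)\,\frac{de_1^*}{d\lambda_1}. \tag{5.A11}$$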
The relative size of the aggregate response compared with the size of the individual
response depends on the sign of ρ: if ρ > 0 (and limiting attention to stable equilibria
for which ρ < 1), then aggregate response is bigger than individual response. Strategic
complementarity is thus a necessary and sufficient condition for Keynesian multiplier
effects.
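The multiplier results (5.A8) to (5.A11) can be reproduced numerically. The sketch below solves the two-equation linear system by Cramer's rule for a given reaction-curve slope ρ and impact response ∂e1*/∂λ1; the function name and the numerical values are illustrative assumptions.

```python
def equilibrium_responses(rho, impact):
    """Solve [[1, -rho], [-rho, 1]] (de1, de2)' = (impact, 0)' by Cramer's rule.

    rho    : common slope of the two reaction curves
    impact : partial response of agent 1 to her own shock
    Returns (de1, de2, de1 + de2) per unit of the shock.
    """
    det = 1.0 - rho ** 2
    if det == 0.0:
        raise ValueError("rho = +/-1: the system is singular")
    de1 = (impact * 1.0 - (-rho) * 0.0) / det   # total response of agent 1
    de2 = (1.0 * 0.0 - (-rho) * impact) / det   # induced response of agent 2
    return de1, de2, de1 + de2


if __name__ == "__main__":
    for rho in (0.5, -0.5):
        de1, de2, total = equilibrium_responses(rho, impact=1.0)
        print(f"rho = {rho:+.1f}: de1 = {de1:.4f}, de2 = {de2:.4f}, total = {total:.4f}")
```

In both cases the total response of agent 1 exceeds the impact response, but the aggregate response exceeds agent 1's response only when ρ > 0, in line with the conclusion on strategic complementarity.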
Exercise 53 Determine the type of externality and the nature of the strategic interactions
for the simplified case of two agents with payoff function (here expressed for agent 1)
$V^1(e_1, e_2) = e_1^{\alpha}e_2^{\alpha} - e_1$ (with 0 < 2α < 1). Furthermore, derive the (symmetric)
decentralized equilibria and compare these with the cooperative (symmetric) equilibrium.
REVIEW EXERCISES
Exercise 54 Introduce the following assumptions into the model analyzed in Section 5.1:
(i) The (stochastic) cost of production c has a uniform distribution defined on [0, 1],
so that G (c ) = c for 0 ≤ c ≤ 1.
(ii) The matching probability is equal to b(e ) = b · e , with parameter b > 0.
(a) Determine the dynamic expressions for e and c ∗ (repeating the derivation in
the main text) under the assumption that y < 1.
(b) Find the equilibria for this economy and derive the stability properties of all
equilibria with a positive activity level.
Exercise 55 Starting from the search model of money analyzed in Section 5.2, suppose
that carrying over money from one period to the next now entails a storage cost, c > 0.
Under this new assumption,
(a) Derive the expected utility for an agent holding a commodity (VC ) and for an
agent holding money (VM ), and find the equilibria of the economy.
(b) Which of the three equilibria described in the model of Section 5.2 (with c = 0)
always exists even with c > 0? Under what condition does a pure monetary
equilibrium exist?
Exercise 56 Assume that the flow cost of a vacancy c and the imputed value of free time z
in the model of Section 5.3 are now functions of the wage w (instead of being exogenous).
In particular, assume that the following linear relations hold:
c = c 0 w, z = z0 w.
Determine the effect of an increase in productivity (Δy > 0) on the steady-state equilibrium.
Exercise 57 Consider a permanent negative productivity shock (Δy < 0) in the matching
model of Sections 5.3 and 5.4. The shock is realized at date t1, but is anticipated by
the agents from date t0 < t1 onwards. Derive the effect of this shock on the steady-state
equilibrium and describe the transitional dynamics of u, v, and θ.
Exercise 58 Consider the effect of an aggregate shock in the model of strategic interactions
for two agents introduced in Appendix A5. That is, consider a variation in the exogenous
terms of the payoff functions, so that $d\lambda_1 = d\lambda_2 = d\lambda > 0$, and derive the effect of this
shock on the individual and aggregate activity level.
FURTHER READING
The role of externalities between agents that operate in the same market as a source
of multiplicity of equilibria is the principal theme in Diamond (1982a). This article
develops the economic implications of the multiplicity of equilibria that have a
Keynesian spirit. The monograph by Diamond (1984) analyzes this theme in greater
depth, while Diamond and Fudenberg (1989) concentrate on the dynamic aspects
of the model. Blanchard and Fischer (1989, chapter 9) offer a compact version
of the model that we studied in the first section of this chapter. Moreover, after
elaborating on the general theoretical structure to analyze the links between strate-
gic interactions, externalities, and multiplicity of equilibria, which we discussed in
Appendix A5, Cooper and John (1988) offer an application of Diamond’s model.
Rupert et al. (2000) survey the literature on search models of money as a medium of
exchange and present extensions of the basic Kiyotaki–Wright framework discussed in
Section 5.2.
The theory of the decentralized functioning of labor markets, which is based on
search externalities and on the process of stochastic matching of workers and firms,
reinvestigates a theme that was first developed in the contributions collected in Phelps
(1970), namely the process of search and information gathering by workers and its
effects on wages. Mortensen (1986) offers an exhaustive review of the contributions in
this early strand of literature.
Compared with these early contributions, the theory developed in Section 5.3 and
onwards concentrates more on the frictions in the matching process. Pissarides (2000)
offers a thorough analysis of this strand of the literature. In this literature the base
model is extended to include a specification of aggregate demand, which makes the
interest rate endogenous, and allows for growth of the labor force, two elements that
are not considered in this chapter. Mortensen and Pissarides (1999a, 1999b) provide
an up-to-date review of the theoretical contributions and of the relevant empirical
evidence.
In addition to the assumption of bilateral bargaining, which we adopted in Section
5.3, Mortensen and Pissarides (1998a) consider a number of alternative assumptions
about wage determination. Moreover, Pissarides (1994) explicitly considers the case of
on-the-job search which we excluded from our analysis. Pissarides (1987) develops the
dynamics of the search model, studying the path of unemployment and vacancies in
the different stages of the business cycle. The paper devotes particular attention to the
cyclical variations of u and v around their long-run relationship, illustrated here by the
dynamics displayed in Figure 5.11. Bertola and Caballero (1994) and Mortensen and
Pissarides (1994) extend the structure of the base model to account for an endogenous
job separation rate s . In these contributions job destruction is a conscious decision
of employers, and it occurs only if a shock reduces the productivity of a match below
some endogenously determined level. This induces an increase in the job destruction
rate in cyclical downturns, which is coherent with empirical evidence.
The simple Cobb–Douglas formulation for the aggregate matching function with
constant returns to scale introduced in Section 5.3 has proved quite useful in interpret-
ing the evidence on unemployment and vacancies. Careful empirical analyses of flows
in the (American) labor market can be found in Blanchard and Diamond (1989, 1990),
Davis and Haltiwanger (1991, 1992) and Davis, Haltiwanger, and Schuh (1996), while
Contini et al. (1995) offer a comparative analysis for the European countries. Cross-
country empirical estimates of the Beveridge curve have been used by Nickell et al.
(2002) to provide a description of the developments of the matching process over the
1960–99 period in the main OECD economies. They find that the Beveridge curve
gradually drifted rightwards in all countries from the 1960s to the mid-1980s. In some
countries, such as France and Germany, the shift continued in the same direction in
the 1990s, whereas in the UK and the USA the curve shifted back towards its original
position. Institutional factors affecting search and matching efficiency are responsible
for a relevant part of the Beveridge curve shifts. The Beveridge curve for the Euro area
in the 1980s and 1990s is analysed in European Central Bank (2002). Both counter-
clockwise cyclical swings around the curve of the type discussed in Section 5.4 and
shifts of the unemployment–vacancies relation occurred in this period. For example,
over 1990–3 unemployment rose and the vacancy rate declined, reflecting the influ-
ence of cyclical factors; from 1994 to 1997 the unemployment rate was quite stable
in the face of a rising vacancy rate, a shift of the Euro area Beveridge curve that is
attributable to structural factors.
Not only empirically, but also theoretically, the structure of the labor force, the
geographical dispersion of unemployed workers and vacant jobs, and the relevance
of long-term unemployment determine the efficiency of a labor market’s matching
process. Petrongolo and Pissarides (2001) discuss the theoretical foundations of the
matching function and provide an up-to-date survey of the empirical estimates for
several countries, and of recent contributions focused on various factors influencing
the matching rate.
The analysis of the efficiency of decentralized equilibrium in search models is first
developed in Diamond (1982b) and Hosios (1990), who derive the efficiency condi-
tions obtained in Section 5.5; it is also discussed in Pissarides (2000). In contrast, in a
classic paper Lucas and Prescott (1974) develop a competitive search model where the
decentralized equilibrium is efficient.
REFERENCES
Bertola, G., and R. J. Caballero (1994) “Cross-Sectional Efficiency and Labour
Hoarding in a Matching Model of Unemployment,” Review of Economic Studies, 61,
435–456.
Blanchard, O. J., and P. Diamond (1989) “The Beveridge Curve,” Brookings Papers on Economic
Activity, no. 1, 1–60.
Blanchard, O. J., and P. Diamond (1990) “The Aggregate Matching Function,” in P. Diamond (ed.), Growth,
Productivity, Unemployment, Cambridge, Mass.: MIT Press, 159–201.
Blanchard, O. J., and S. Fischer (1989) Lectures on Macroeconomics, Cambridge, Mass.: MIT Press.
Contini, B., L. Pacelli, M. Filippi, G. Lioni, and R. Revelli (1995) A Study of Job Creation and Job
Destruction in Europe, Brussels: Commission of the European Communities.
Cooper, R., and A. John (1988) “Coordinating Coordination Failures in Keynesian Models,”
Quarterly Journal of Economics, 103, 441–463.
Davis, S., and J. Haltiwanger (1991) “Wage Dispersion between and within US Manufacturing
Plants, 1963–86,” Brookings Papers on Economic Activity, no. 1, 115–200.
Davis, S., and J. Haltiwanger (1992) “Gross Job Creation, Gross Job Destruction and Employment Reallocation,”
Quarterly Journal of Economics, 107, 819–864.
Davis, S., J. Haltiwanger, and S. Schuh (1996) Job Creation and Destruction, Cambridge, Mass.: MIT Press.
Diamond, P. (1982a) “Aggregate Demand Management in Search Equilibrium,” Journal of Political
Economy, 90, 881–894.
Diamond, P. (1982b) “Wage Determination and Efficiency in Search Equilibrium,” Review of Economic
Studies, 49, 227–247.
Diamond, P. (1984) A Search-Equilibrium Approach to the Micro Foundations of Macroeconomics,
Cambridge, Mass.: MIT Press.
Diamond, P., and D. Fudenberg (1989) “Rational Expectations Business Cycles in Search Equilibrium,”
Journal of Political Economy, 97, 606–619.
European Central Bank (2002) “Labour Market Mismatches in Euro Area Countries,” Frankfurt:
European Central Bank.
Hosios, A. J. (1990) “On the Efficiency of Matching and Related Models of Search and Unem-
ployment,” Review of Economic Studies, 57, 279–298.
Kiyotaki, N., and R. Wright (1993) “A Search-Theoretic Approach to Monetary Economics,”
American Economic Review, 83, 63–77.
Lucas, R. E., and E. C. Prescott (1974) “Equilibrium Search and Unemployment,” Journal of
Economic Theory, 7, 188–209.
Mortensen, D. T. (1986) “Job Search and Labor Market Analysis,” in O. Ashenfelter and R. Layard
(eds.), Handbook of Labor Economics, Amsterdam: North-Holland.
Mortensen, D. T., and C. A. Pissarides (1994) “Job Creation and Job Destruction in the Theory of Unemployment,”
Review of Economic Studies, 61, 397–415.
Mortensen, D. T., and C. A. Pissarides (1999a) “New Developments in Models of Search in the Labor Market,” in O. Ashenfelter
and D. Card (eds.), Handbook of Labor Economics, vol. 3, Amsterdam: North-Holland.
Mortensen, D. T., and C. A. Pissarides (1999b) “Job Reallocation, Employment Fluctuations and Unemployment,” in J. B.
Taylor and M. Woodford (eds.), Handbook of Macroeconomics, Amsterdam: North-Holland.
Nickell, S., L. Nunziata, W. Ochel, and G. Quintini (2002) “The Beveridge Curve, Unemployment
and Wages in the OECD from the 1960s to the 1990s,” Centre for Economic Performance Discussion
Paper 502; forthcoming in P. Aghion, R. Frydman, J. Stiglitz, and M. Woodford (eds.),
Knowledge, Information and Expectations in Modern Macroeconomics: In Honor of Edmund S.
Phelps, Princeton: Princeton University Press.
Petrongolo, B., and C. A. Pissarides (2001) “Looking into the Black Box: A Survey of the Matching
Function,” Journal of Economic Literature, 39, 390–431.
Phelps, E. S. (ed.) (1970) Macroeconomic Foundations of Employment and Inflation Theory, New
York: W. W. Norton.
Pissarides, C. A. (1987) “Search, Wage Bargains and Cycles,” Review of Economic Studies, 54,
473–483.
Pissarides, C. A. (1994) “Search Unemployment and On-the-Job Search,” Review of Economic Studies, 61,
457–475.
Pissarides, C. A. (2000) Equilibrium Unemployment Theory, 2nd edn. Cambridge, Mass.: MIT Press.
Rupert, P., M. Schindler, A. Shevchenko, and R. Wright (2000) “The Search-Theoretic Approach
to Monetary Economics: A Primer,” Federal Reserve Bank of Cleveland Economic Review, 36(4),
10–28.
ANSWERS TO EXERCISES
Solution to exercise 1
When λ = 0 (assuming for simplicity that $y_{t-i} = \bar{y}$ ∀i ≥ 0) the agent has an
initial consumption level $c_t = \bar{y}$ and a stock of financial assets at the beginning
of period t + 1 equal to zero: $A_{t+1} = 0$. In period t + 1, we have
$$c_{t+1} = \bar{y} + \frac{r}{1+r}\varepsilon_{t+1}, \qquad s_{t+1} = y_{t+1} - c_{t+1} = \frac{1}{1+r}\varepsilon_{t+1} = A_{t+2}.$$
In subsequent periods (with no further innovations) current income will go
back to its mean value ȳ, and consumption will remain at the higher level
computed for t + 1. The return on financial wealth accumulated in t + 1 allows
the consumer to maintain such higher consumption level over the entire
future horizon:
$$y^D_{t+2} = y_{t+2} + rA_{t+2} = \bar{y} + \frac{r}{1+r}\varepsilon_{t+1} = c_{t+2} \;\Rightarrow\; s_{t+2} = 0.$$
The same is true for all periods t + i with i > 2. There is no saving, and the
level of A remains equal to $A_{t+2}$. When λ = 1, the whole increase in income is
permanent and is entirely consumed. There is no need to save in order to keep
the higher level of consumption in the future.
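The λ = 0 case can be traced out period by period with a short simulation. It is only a sketch: it uses the consumption rule for purely temporary innovations, c = rA + ȳ + r/(1+r)·(y − ȳ), defines saving as disposable income y + rA minus consumption, and hits income with a one-off innovation in the second simulated period. The function name and parameter values are assumptions.

```python
def simulate_transitory_shock(eps, r=0.05, y_bar=1.0, periods=6):
    """Return a list of (consumption, saving, assets at start of period).

    Income equals y_bar in every period except period index 1, where it is
    y_bar + eps. Assets start at zero.
    """
    assets = 0.0
    path = []
    for i in range(periods):
        income = y_bar + (eps if i == 1 else 0.0)
        # Consumption rule for purely temporary income innovations.
        consumption = r * assets + y_bar + r / (1.0 + r) * (income - y_bar)
        saving = r * assets + income - consumption   # disposable income minus consumption
        path.append((consumption, saving, assets))
        assets += saving                             # assets carried into the next period
    return path


if __name__ == "__main__":
    for i, (c, s, a) in enumerate(simulate_transitory_shock(eps=1.0)):
        print(f"period {i}: c = {c:.4f}, s = {s:.4f}, A = {a:.4f}")
```

After the shock period, consumption stays at its higher level, saving returns to zero, and assets remain at ε/(1 + r), as described in the solution.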
Solution to exercise 2
We look for a consumption function of the general form
$$c_t = r(A_t + H_t) = rA_t + \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i E_t y_{t+i},$$
as in (1.12) in the main text. Given the assumed stochastic process for income,
we can compute expectations of future incomes and then the value of human
wealth $H_t$. We have
$$\begin{aligned}
E_t y_{t+1} &= \lambda y_t + (1-\lambda)\bar{y} \\
E_t y_{t+2} &= \lambda^2 y_t + (1+\lambda)(1-\lambda)\bar{y} \\
&\;\;\vdots \\
E_t y_{t+i} &= \lambda^i y_t + (1 + \lambda + \dots + \lambda^{i-1})(1-\lambda)\bar{y} = \lambda^i y_t + (1-\lambda^i)\bar{y}.
\end{aligned}$$
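Plugging the last expression above into the definition of $H_t$, we get
$$\begin{aligned}
H_t &= \frac{1}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i\left(\lambda^i y_t + (1-\lambda^i)\bar{y}\right) \\
&= \frac{1}{1+r}\left[y_t\sum_{i=0}^{\infty}\left(\frac{\lambda}{1+r}\right)^i + \bar{y}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i - \bar{y}\sum_{i=0}^{\infty}\left(\frac{\lambda}{1+r}\right)^i\right] \\
&= \frac{1}{1+r}\left[y_t\,\frac{1+r}{1+r-\lambda} + \bar{y}\left(\frac{1+r}{r} - \frac{1+r}{1+r-\lambda}\right)\right] \\
&= \frac{1}{1+r-\lambda}\,y_t + \frac{1-\lambda}{r(1+r-\lambda)}\,\bar{y}.
\end{aligned}$$
The consumption function is then
$$c_t = r(A_t + H_t) = rA_t + \frac{r}{1+r-\lambda}\,y_t + \frac{1-\lambda}{1+r-\lambda}\,\bar{y}.$$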
If Î = 1, income innovations are permanent and the best forecast of all future
incomes is simply current income yt . Thus, consumption will be equal to total
income (interest income and labor income):
c t = r At + yt .
If Î = 0, income innovations are purely temporary and the best forecast of
future incomes is mean income ȳ. Consumption will then be
c t = r At + ȳ +
r
1 + r
( yt − ȳ).
The last term measures the annuity value (at the beginning of period t ) of
the income innovation that occurred in period t and therefore known by the
consumer (indeed, yt − ȳ = εt ).
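The closed form for human wealth can be checked against a direct, truncated evaluation of the discounted sum of expected incomes. This is only a numerical sketch: the function names, the truncation horizon, and the parameter values used when calling it are assumptions made for illustration.

```python
def human_wealth_truncated(r, lam, y_t, y_bar, horizon=5000):
    """Discounted sum of expected future incomes, truncated at `horizon` terms."""
    total = 0.0
    for i in range(horizon):
        # Expected income i periods ahead for persistence parameter lam.
        expected_income = lam ** i * y_t + (1 - lam ** i) * y_bar
        total += expected_income / (1 + r) ** i
    return total / (1 + r)


def human_wealth_closed_form(r, lam, y_t, y_bar):
    """Closed-form expression for human wealth derived in the solution."""
    return y_t / (1 + r - lam) + (1 - lam) * y_bar / (r * (1 + r - lam))


def consumption(r, lam, assets, y_t, y_bar):
    """Consumption as the annuity value of financial plus human wealth."""
    return r * (assets + human_wealth_closed_form(r, lam, y_t, y_bar))
```

With, say, `r = 0.05` and persistence `0.5`, the two human-wealth functions should agree up to truncation error, and setting the persistence parameter to one or zero reproduces the two special cases discussed above.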
Solution to exercise 3
Since $c_2 = w_1 - c_1 + w_2$, from the first-order condition
$$\frac{1}{c_1} = E\left(\frac{1}{c_2}\right)$$
we get
$$\frac{1}{c_1} = \frac{p}{w_1 - c_1 + x} + \frac{1-p}{w_1 - c_1 + y}.$$
Rearranging and writing $px + (1-p)y = z$, we get
$$(w_1 - c_1 - z + y + x)\,c_1 = (w_1 - c_1 + x)(w_1 - c_1 + y).$$
This is a quadratic equation for $c_1$, so a closed-form solution is available.
Writing $x = z + \Delta$, $y = z - \Delta$ (which, given the definition of $z$, corresponds to
$p = 1/2$), the first-order condition reads
$$(w_1 - c_1 + z)\,c_1 = (w_1 - c_1 + z + \Delta)(w_1 - c_1 + z - \Delta).$$
In the absence of uncertainty ($\Delta = 0$), the solution is $c_1 = (w_1 + z)/2$. (With
discount and return rates both equal to zero, the agent consumes half of the
available resources in each period.) For general $\Delta$ the optimality condition is
solved by
$$c_1 = \frac{3}{4}(w_1 + z) \pm \frac{1}{4}\sqrt{(w_1 + z)^2 + 8\Delta^2}.$$
Selecting the negative square root ensures that the solution approaches the
appropriate limit when $\Delta \to 0$, and implies that uncertainty reduces first-period
consumption (for precautionary motives). An analytic solution would
be impossible for even slightly more complicated maximization problems.
This is why studies of precautionary savings prefer to specify the utility function
in exponential form, rather than logarithmic or other CRRA.
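The closed-form solution with the negative root can be verified by plugging it back into the first-order condition. The sketch below assumes the two second-period income realizations are equally likely and symmetric around their mean; the function names are illustrative.

```python
import math


def first_period_consumption(w1, z, spread):
    """Negative-root solution of the quadratic first-order condition."""
    resources = w1 + z
    return 0.75 * resources - 0.25 * math.sqrt(resources ** 2 + 8 * spread ** 2)


def euler_residual(c1, w1, z, spread):
    """1/c1 minus expected 1/c2 with two equally likely income realizations."""
    c2_high = w1 - c1 + z + spread
    c2_low = w1 - c1 + z - spread
    return 1 / c1 - 0.5 / c2_high - 0.5 / c2_low
```

For example, with `w1 = 1`, `z = 1` and a spread of `0.5`, the residual should be zero up to rounding, and first-period consumption should fall below the certainty level `(w1 + z) / 2`.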
Solution to exercise 4
Solving the consumer's problem, we get the following first-order condition
(see the main text for the solution in the certainty case):
$$\frac{1+r}{1+\rho}\,E_t\left[\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right] = 1.$$
The assumption $\Delta\log c_{t+1} \sim N\left(E_t(\Delta\log c_{t+1}),\, \sigma^2\right)$ yields
$$-\gamma\,\Delta\log c_{t+1} \sim N\left(-\gamma E_t(\Delta\log c_{t+1}),\, \gamma^2\sigma^2\right).$$
Using the properties of the lognormal distribution, we can write the Euler
equation as
$$\frac{1+r}{1+\rho}\,e^{-\gamma E_t(\Delta\log c_{t+1}) + (\gamma^2/2)\sigma^2} = 1.$$
Taking logarithms (and using the approximations $\log(1+r) \approx r$ and
$\log(1+\rho) \approx \rho$), the following expression for the expected rate of change of
consumption is obtained:
$$E_t(\Delta\log c_{t+1}) = \frac{1}{\gamma}(r - \rho) + \frac{\gamma}{2}\sigma^2.$$
The uncertainty on future consumption levels, captured by the variance $\sigma^2$,
induces the (prudent) consumer to transfer resources from the present to the
future, determining an increasing path of consumption over time.
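The lognormal step can be checked numerically by integrating the Euler equation against a normal density for consumption growth. This is a sketch under stated assumptions: the trapezoid rule, the grid size, and the function names are illustrative choices, and the exact version keeps the logarithms of the gross rates instead of approximating them.

```python
import math


def exact_expected_growth(r, rho, gamma, sigma):
    """Mean growth that satisfies the Euler equation exactly under lognormality."""
    return math.log((1 + r) / (1 + rho)) / gamma + 0.5 * gamma * sigma ** 2


def approx_expected_growth(r, rho, gamma, sigma):
    """Same expression with log(1+r) ~ r and log(1+rho) ~ rho."""
    return (r - rho) / gamma + 0.5 * gamma * sigma ** 2


def euler_left_hand_side(r, rho, gamma, mu, sigma, n=4001, width=10.0):
    """(1+r)/(1+rho) * E[exp(-gamma*x)] for x ~ N(mu, sigma^2), trapezoid rule."""
    lo = mu - width * sigma
    step = 2 * width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * step
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(-gamma * x) * density
    return (1 + r) / (1 + rho) * total * step
```

Evaluating `euler_left_hand_side` at the mean returned by `exact_expected_growth` should give a value very close to one, and for small interest and discount rates the approximate and exact growth rates differ only slightly.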
Solution to exercise 5
(a) The increase of mean income changes the consumer's permanent
income. Both permanent income and consumption increase by $\Delta\bar{y}$.
Formally,
$$\begin{aligned}
\Delta c_{t+1} = \Delta y^P_{t+1} &= \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i (E_{t+1} - E_t)\,y_{t+1+i}\\
&= \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i \Delta\bar{y} = \Delta\bar{y}.
\end{aligned}$$
Since the income change is entirely permanent, saving is not affected.
(b) In order to find the change in consumption following an innovation in
income, it is necessary to compute the revision in expectations of future
incomes caused by $\varepsilon_{t+1}$. Given the stochastic process for labor income,
we have
$$\begin{aligned}
(E_{t+1} - E_t)\,y_{t+1} &= \varepsilon_{t+1},\\
(E_{t+1} - E_t)\,y_{t+2} &= -\delta\varepsilon_{t+1},\\
(E_{t+1} - E_t)\,y_{t+i} &= 0 \quad \text{for } i > 2.
\end{aligned}$$
Applying the general formula for the change in consumption, we get
$$\begin{aligned}
\Delta c_{t+1} &= r(H_{t+1} - E_t H_{t+1})\\
&= \frac{r}{1+r}\left(\varepsilon_{t+1} - \frac{1}{1+r}\,\delta\varepsilon_{t+1}\right)\\
&= \frac{r(1+r-\delta)}{(1+r)^2}\,\varepsilon_{t+1}.
\end{aligned}$$
The increase in consumption is lower than the increase in income since
the latter is only temporary. The higher is $\delta$, the lower is the change in
consumption, because a positive income innovation in $t+1$ ($\varepsilon_{t+1}$) is
offset by a negative income change ($-\delta\varepsilon_{t+1}$) in the following period.
(c) The behavior of saving reflects the expectation of future income
changes. Given $\varepsilon_{t+1}$ and using the stochastic process for income, we
obtain
$$\begin{aligned}
y_{t+1} &= \bar{y} + \varepsilon_{t+1},\\
y_{t+2} &= \bar{y} - \delta\varepsilon_{t+1} \;\Rightarrow\; \Delta y_{t+2} = -(1+\delta)\varepsilon_{t+1},\\
y_{t+3} &= \bar{y} \;\Rightarrow\; \Delta y_{t+3} = \delta\varepsilon_{t+1}.
\end{aligned}$$
(No income changes are foreseen for subsequent periods.) Saving in
$t+1$ and $t+2$ is then
$$\begin{aligned}
s_{t+1} &= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^i E_{t+1}\Delta y_{t+1+i} = -\left[-\frac{1+\delta}{1+r} + \frac{\delta}{(1+r)^2}\right]\varepsilon_{t+1}\\
&= \frac{1 + r(1+\delta)}{(1+r)^2}\,\varepsilon_{t+1} > 0,\\
s_{t+2} &= -\sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^i E_{t+2}\Delta y_{t+2+i} = -\frac{\delta}{1+r}\,\varepsilon_{t+1} < 0.
\end{aligned}$$
In $t+1$ a portion of the higher income is saved, since the consumer
knows its transitory nature and anticipates further income
changes in the two following periods. In $t+2$ income is temporarily
lower than average and the agent finances consumption with the
income saved in the previous period: in $t+2$, then, saving is negative.
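The saving formula for the first period after the shock can be cross-checked in two ways: from the discounted sum of expected income changes, and as income minus consumption. The sketch assumes zero financial assets when the shock arrives; the function names are illustrative.

```python
def saving_from_expected_changes(r, delta, eps):
    """Minus the discounted sum of expected future income changes."""
    # Only the next two income changes are non-zero after the shock.
    expected_changes = [-(1 + delta) * eps, delta * eps]
    return -sum(dy / (1 + r) ** i for i, dy in enumerate(expected_changes, start=1))


def saving_closed_form(r, delta, eps):
    """Closed-form saving in the period of the shock."""
    return (1 + r * (1 + delta)) / (1 + r) ** 2 * eps


def consumption_change(r, delta, eps):
    """Change in consumption caused by the income innovation."""
    return r * (1 + r - delta) / (1 + r) ** 2 * eps
```

With no initial assets, the income innovation minus `consumption_change` should coincide with both saving expressions.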
Solution to exercise 6
For each period from $t$ onwards, the consumer must choose both the consumption
of non-durable goods $c_{t+i}$ (which coincides with expenditure), and
the expenditure on durable goods $d_{t+i}$ (which adds to the stock and starts
to provide utility in the period after the purchase). The utility maximization
problem is then solved for $c_{t+i}$ and $d_{t+i}$. Besides the constraints in the main
text, we must consider the transversality condition on financial wealth $A$
and the non-negativity constraint on the stock of durable goods $S$ and on
consumption $c$ (though we will not explicitly use these additional constraints
in the solution below). Following the solution procedure already used in the
main text, we substitute the two constraints into the utility function to be
maximized. Combining the constraints, we can write consumption as
$$c_{t+i} = (1+r)A_{t+i} - A_{t+i+1} + y_{t+i} - p_{t+i}\left[S_{t+i+1} - (1-\delta)S_{t+i}\right].$$
Plugging the above expression into the objective function, we get the following
optimization problem:
$$\max_{A_{t+i},\,S_{t+i}} U_t = \sum_{i=0}^{\infty}\left(\frac{1}{1+\rho}\right)^i u\Big((1+r)A_{t+i} - A_{t+i+1} + y_{t+i} - p_{t+i}\left(S_{t+i+1} - (1-\delta)S_{t+i}\right),\; S_{t+i}\Big).$$
Expanding the first two terms of the summation (for $i = 0, 1$) and differentiating
with respect to $A_{t+1}$ and $S_{t+1}$, we obtain the following first-order
conditions:
$$\begin{aligned}
\frac{\partial U_t}{\partial A_{t+1}} &= -u_c(c_t, S_t) + \frac{1+r}{1+\rho}\,u_c(c_{t+1}, S_{t+1}) = 0,\\
\frac{\partial U_t}{\partial S_{t+1}} &= -p_t\,u_c(c_t, S_t) + \frac{1}{1+\rho}\,p_{t+1}(1-\delta)\,u_c(c_{t+1}, S_{t+1}) + \frac{1}{1+\rho}\,u_S(c_{t+1}, S_{t+1}) = 0.
\end{aligned}$$
The consumer makes two decisions. First, he chooses between consumption
in the current period and in the next period (and then between consumption
and saving). Second, he chooses between spending on non-durable goods
and spending on durable goods, which yield deferred utility. The first-order
conditions above illustrate these two choices. The first condition captures the
choice between consumption and saving, as in (1.5) in the main text,
$$u_c(c_t, S_t) = \frac{1+r}{1+\rho}\,u_c(c_{t+1}, S_{t+1}),$$
and bears the usual interpretation: the loss of marginal utility arising from
the decrease in consumption at time $t$ must be offset by the marginal utility
(discounted with rate $\rho$) obtained by accumulating financial assets with gross
return $1 + r$. The choice between spending on non-durable goods and purchasing
durables is illustrated by the second condition, rewritten as
$$p_t\,u_c(c_t, S_t) = \frac{1}{1+\rho}\left[u_S(c_{t+1}, S_{t+1}) + (1-\delta)\,p_{t+1}\,u_c(c_{t+1}, S_{t+1})\right].$$
One unit of the durable good purchased at time $t$ entails a decrease of spending
on (and consumption of) $p_t$ units of non-durable goods, with a utility loss
measured by $p_t\,u_c(c_t, S_t)$ on the left-hand side of the above equation. In
equilibrium, this loss must be offset, in the following period, by the higher
utility stemming from the unitary increase of the stock of durables. This
increase in utility, measured on the right-hand side of the equation, has two
components (both discounted at rate $\rho$). The first is the marginal utility of
the stock of durables at the beginning of period $t+1$. The second accounts
for the additional resources that an increase in the stock of durables makes
available for consumption in $t+1$ by reducing the need for further purchases,
$d_{t+1}$. These additional resources are measured by $(1-\delta)p_{t+1}$, yielding utility
$(1-\delta)\,p_{t+1}\,u_c(c_{t+1}, S_{t+1})$.
Solution to exercise 7
(a) In each period, utility is affected positively by consumption in the
current period and negatively by consumption in the previous period.
This formulation of utility may capture habit formation behavior: a
high level of consumption in period t decreases utility in period t + 1
(but increases period t + 1 marginal utility). Therefore, the agent is
induced to increase consumption in period t + 1. This effect is due to a
consumption “habit” (related to the last period level of c ) making the
agent increase consumption over time.
(b) Substituting $c_{t+i}$ and $c_{t+i-1}$ from the budget constraints of two subsequent
periods into the objective function and differentiating with
respect to financial wealth, we obtain the following first-order condition
(Euler equation):
$$E_t u'(c_{t+i-1}, c_{t+i-2}) = \frac{1+r+\gamma}{1+\rho}\,E_t u'(c_{t+i}, c_{t+i-1}) - \frac{(1+r)\gamma}{(1+\rho)^2}\,E_t u'(c_{t+i+1}, c_{t+i}).$$
Setting $i = 0$ and $\rho = r$, and assuming quadratic utility so that
$u'(c_{t+i}, c_{t+i-1}) = 1 - b(c_{t+i} - \gamma c_{t+i-1})$, we get
$$1 - b(c_{t-1} - \gamma c_{t-2}) = \frac{1+r+\gamma}{1+r}\left[1 - b(c_t - \gamma c_{t-1})\right] - \frac{\gamma}{1+r}\left[1 - b(E_t c_{t+1} - \gamma c_t)\right],$$
or
$$\gamma E_t c_{t+1} = (1 + r + \gamma + \gamma^2)\,c_t - \left[(1+r+\gamma)\gamma + (1+r)\right]c_{t-1} + \gamma(1+r)\,c_{t-2}.$$
Using first differences of consumption,
$$\gamma E_t \Delta c_{t+1} = \left(1 + r + \gamma^2\right)\Delta c_t - (1+r)\gamma\,\Delta c_{t-1}.$$
The change in consumption between $t$ and $t+1$ depends on past values
of $\Delta c$ and therefore is not orthogonal to all variables dated $t$. If in
each period utility depends on consumption in the current and the last
periods, in choosing between $c_t$ and $c_{t+1}$, the agent considers the effects
on utility not only at $t$ and $t+1$ (as in the case of a time-separable
utility function), but also at $t+2$. This creates an intertemporal link
between the marginal utility in three subsequent periods and then,
with quadratic utility, between the consumption levels in subsequent
periods. In this case there is a dynamic relation between $c_{t+1}$ and $c_t$, $c_{t-1}$
and $c_{t-2}$, which makes the consumption change $\Delta c_{t+1}$ dependent on
lagged values $\Delta c_t$ and $\Delta c_{t-1}$. Therefore, the orthogonality conditions
that hold with separable utility are not valid here.
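The algebra that leads from the Euler equation with habits to its first-difference form can be checked numerically. The sketch below assumes quadratic utility with the discount rate equal to the interest rate, as in the solution; the function names and the use of `gamma` and `b` for the habit and curvature parameters are illustrative.

```python
def marginal_utility(c, c_lag, b, gamma):
    """Marginal utility under quadratic utility with habit parameter gamma."""
    return 1 - b * (c - gamma * c_lag)


def expected_next_consumption(c_t, c_tm1, c_tm2, r, gamma):
    """Expected next-period consumption implied by the level form of the Euler equation."""
    rhs = ((1 + r + gamma + gamma ** 2) * c_t
           - ((1 + r + gamma) * gamma + (1 + r)) * c_tm1
           + gamma * (1 + r) * c_tm2)
    return rhs / gamma


def expected_next_change(c_t, c_tm1, c_tm2, r, gamma):
    """Expected consumption change implied by the first-difference form."""
    return ((1 + r + gamma ** 2) * (c_t - c_tm1)
            - (1 + r) * gamma * (c_tm1 - c_tm2)) / gamma


def euler_residual(c_next, c_t, c_tm1, c_tm2, r, gamma, b):
    """Left side minus right side of the Euler equation (discount rate equal to r)."""
    lhs = marginal_utility(c_tm1, c_tm2, b, gamma)
    rhs = ((1 + r + gamma) / (1 + r) * marginal_utility(c_t, c_tm1, b, gamma)
           - gamma / (1 + r) * marginal_utility(c_next, c_t, b, gamma))
    return lhs - rhs
```

For arbitrary past consumption levels, the level and first-difference forms should imply the same expected change, and the implied next-period consumption should make the Euler residual vanish.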
Solution to exercise 8
(a) The change in permanent income for agents, $\Delta y^P_t$, is found from the
following version of equation (1.6):
$$\Delta y^P_t = \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i\left[E(y_{t+i} \mid I_t) - E(y_{t+i} \mid I_{t-1})\right],$$
where the information set used by agents ($I$) has been made explicit.
It is then necessary to compute the "surprises": $y_t - E(y_t \mid I_{t-1})$,
$E(y_{t+1} \mid I_t) - E(y_{t+1} \mid I_{t-1})$, etc. Since agents in each period observe
the realization of $x$, using the stochastic process for income, we have
$$E(y_t \mid I_{t-1}) = \lambda y_{t-1} + x_{t-1},$$
from which we obtain
$$y_t - E(y_t \mid I_{t-1}) = \varepsilon_{1t}.$$
Recalling that the properties of $x$ imply that $E(x_t \mid I_{t-1}) = 0$, to compute
the second "surprise" we use the following expressions:
$$E(y_{t+1} \mid I_t) = \lambda y_t + x_t, \qquad E(y_{t+1} \mid I_{t-1}) = \lambda E(y_t \mid I_{t-1}),$$
from which
$$E(y_{t+1} \mid I_t) - E(y_{t+1} \mid I_{t-1}) = \lambda\left(y_t - E(y_t \mid I_{t-1})\right) + x_t = \lambda\varepsilon_{1t} + x_t.$$
Iterating the same procedure, we find, for $i \geq 1$,
$$E(y_{t+i} \mid I_t) - E(y_{t+i} \mid I_{t-1}) = \lambda^{i-1}(\lambda\varepsilon_{1t} + x_t).$$
The change in permanent income is then given by
$$\begin{aligned}
\Delta y^P_t &= \frac{r}{1+r}\left[\varepsilon_{1t} + \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^i \lambda^{i-1}(\lambda\varepsilon_{1t} + x_t)\right]\\
&= \frac{r}{1+r}\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i \lambda^i \varepsilon_{1t} + \sum_{i=1}^{\infty}\left(\frac{1}{1+r}\right)^i \lambda^{i-1} x_t\right]\\
&= \frac{r}{1+r}\left(\frac{1+r}{1+r-\lambda}\,\varepsilon_{1t} + \frac{1}{1+r-\lambda}\,x_t\right)\\
&= \frac{r}{1+r-\lambda}\left(\varepsilon_{1t} + \frac{1}{1+r}\,x_t\right).
\end{aligned}$$
Now consider the change in permanent income ($\Delta\tilde{y}^P_t$) computed
by the econometrician, who does not observe the realization of $x$.
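The relevant "surprises" are then: $y_t - E(y_t \mid \Omega_{t-1})$, $E(y_{t+1} \mid \Omega_t) - E(y_{t+1} \mid \Omega_{t-1})$, etc.
As in the previous case, we get
$$\begin{aligned}
E(y_t \mid \Omega_{t-1}) &= \lambda y_{t-1},\\
E(y_{t+1} \mid \Omega_t) &= \lambda y_t,\\
E(y_{t+1} \mid \Omega_{t-1}) &= \lambda E(y_t \mid \Omega_{t-1}),
\end{aligned}$$
from which we compute the "surprises":
$$\begin{aligned}
y_t - E(y_t \mid \Omega_{t-1}) &= \varepsilon_{1t} + x_{t-1},\\
E(y_{t+1} \mid \Omega_t) - E(y_{t+1} \mid \Omega_{t-1}) &= \lambda(\varepsilon_{1t} + x_{t-1}),\\
&\;\;\vdots\\
E(y_{t+i} \mid \Omega_t) - E(y_{t+i} \mid \Omega_{t-1}) &= \lambda^i(\varepsilon_{1t} + x_{t-1}).
\end{aligned}$$
Finally, using equation (1.7), we obtain
$$\Delta\tilde{y}^P_t = \frac{r}{1+r}\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i \lambda^i(\varepsilon_{1t} + x_{t-1}) = \frac{r}{1+r-\lambda}\,(\varepsilon_{1t} + x_{t-1}).$$
(b) The variability of permanent income, measured by the variance of $\Delta y^P_t$
and $\Delta\tilde{y}^P_t$, is
$$\operatorname{var}(\Delta y^P_t) = \psi^2\left(\sigma_1^2 + \left(\frac{1}{1+r}\right)^2\sigma_x^2\right), \qquad \operatorname{var}(\Delta\tilde{y}^P_t) = \psi^2(\sigma_1^2 + \sigma_x^2),$$
where $\psi \equiv r/(1+r-\lambda)$, $\sigma_1^2 \equiv \operatorname{var}(\varepsilon_1)$, and $\sigma_x^2 \equiv \operatorname{var}(x)$. We find then
that $\operatorname{var}(\Delta y^P_t) < \operatorname{var}(\Delta\tilde{y}^P_t)$. The variability of permanent income
estimated by the econometrician is higher than the variability perceived
by agents. Overestimating the unforeseen changes in income may lead
to the conclusion that consumption is excessively smooth, even though
agents behave as predicted by the rational expectations–permanent
income theory.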
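The agents' permanent-income revision and the two variances can be checked numerically. The sketch assumes that the income innovation and the signal are uncorrelated with each other and over time; the function names, the truncation horizon, and the name `psi` for the common multiplier are illustrative.

```python
def psi(r, lam):
    """Common multiplier r / (1 + r - lam) in both permanent-income revisions."""
    return r / (1 + r - lam)


def revision_agents_truncated(eps1, x, r, lam, horizon=5000):
    """Agents' revision as a truncated discounted sum of expectation surprises."""
    total = eps1  # surprise about current income
    for i in range(1, horizon):
        # Surprise about income i periods ahead, discounted i periods.
        total += lam ** (i - 1) * (lam * eps1 + x) / (1 + r) ** i
    return r / (1 + r) * total


def revision_agents_closed_form(eps1, x, r, lam):
    """Closed form of the agents' revision."""
    return psi(r, lam) * (eps1 + x / (1 + r))


def variance_agents(var_eps1, var_x, r, lam):
    """Variance of the revision perceived by agents."""
    return psi(r, lam) ** 2 * (var_eps1 + var_x / (1 + r) ** 2)


def variance_econometrician(var_eps1, var_x, r, lam):
    """Variance of the revision computed by the econometrician."""
    return psi(r, lam) ** 2 * (var_eps1 + var_x)
```

For any positive interest rate and positive signal variance, `variance_agents` should be strictly below `variance_econometrician`, which is the ranking stated above.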
Solution to exercise 9
(a) For the assumed utility function, marginal utility is
$$u'(c) = \begin{cases} a - bc & \text{for } c < a/b;\\ 0 & \text{for } c \geq a/b. \end{cases}$$
As shown in the figure, marginal utility is convex in the neighborhood
of $c = a/b$ where it becomes zero. Therefore, there exists a precautionary
saving motive.
(b) The optimality condition for $c_1$ is
$$u'(c_1) = E_1[u'(c_2)].$$
If $\sigma = 0$, we get $c_1 = c_2 = a/b$: in each period income is entirely consumed,
there is no saving, and marginal utility is zero. If $\sigma > 0$, with
$c_1 = a/b$, in the second period the agent consumes either $a/b + \sigma$ (with
zero marginal utility) or $a/b - \sigma$ (with positive marginal utility) with
equal probability. The expected value of the second-period marginal
utility will then be positive, violating the optimality condition. Therefore,
when $\sigma > 0$ the agent is induced to consume less than $a/b$ in
the first period. Writing the realizations of second-period income and
consumption, i.e.
$$c_2 = y_2 + (y_1 - c_1) = \begin{cases} 2a/b - c_1 + \sigma \equiv c_2^H(c_1) & \text{with probability } 0.5\\ 2a/b - c_1 - \sigma \equiv c_2^L(c_1) & \text{with probability } 0.5 \end{cases}$$
and noting that marginal utility is zero in the first case, the optimality
condition becomes
$$a - bc_1 = \frac{1}{2}\left(a - b\,c_2^L(c_1)\right)$$
and the value of $c_1$ is computed as
$$c_1 = \frac{a}{b} - \frac{\sigma}{3}.$$
First-period consumption is decreasing in $\sigma$: income uncertainty gives
rise to a precautionary saving motive.
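The solution for first-period consumption can be verified by evaluating the optimality condition with the kinked marginal utility. The sketch assumes first-period income equal to the satiation level and second-period income equal to that level plus or minus the shock with equal probability; the function names are illustrative.

```python
def marginal_utility(c, a, b):
    """Linear marginal utility that is truncated at zero beyond the satiation point a/b."""
    return max(a - b * c, 0.0)


def first_period_consumption(a, b, sigma):
    """Candidate solution derived above."""
    return a / b - sigma / 3


def optimality_residual(c1, a, b, sigma):
    """Marginal utility today minus expected marginal utility tomorrow."""
    c2_high = 2 * a / b - c1 + sigma
    c2_low = 2 * a / b - c1 - sigma
    expected = 0.5 * (marginal_utility(c2_high, a, b) + marginal_utility(c2_low, a, b))
    return marginal_utility(c1, a, b) - expected
```

At the candidate solution the residual should be zero, whereas consuming the satiation level in the first period leaves a negative residual whenever the shock has positive spread.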
Solution to exercise 10
If $G(\cdot)$ has the quadratic form proposed in the exercise, then the marginal
investment cost $\partial G(K, I)/\partial I = x \cdot 2I$ has the same sign as the investment
flow $I$. Since the optimal investment flow $I^*$ must satisfy the condition
$x \cdot 2I^* = \lambda$, where $\lambda$ is the marginal value of capital, $\lambda > 0$ implies $I^* > 0$.
Intuitively, this functional form (whose slope at the origin is zero, rather than
unity) implies costs for the firm not only when $I > 0$, but also when $I < 0$.
As long as installed capital has a positive value, it cannot be optimal for the
firm to pay costs in order to scrap it, and the optimal investment flow is never
negative.
The slope at the origin of functions in the form $I^\beta$ is zero for all $\beta > 1$, and
such functions are well defined for $I < 0$ only when $\beta$ is an integer. If $\beta$ is an
even number, then the sign of $\partial G(K, I)/\partial I = x \cdot \beta I^{\beta-1}$ coincides with that of
$I$ and, as in the case where $\beta = 2$, negative gross investment is never optimal.
If $\beta$ is an odd integer then, as in the figure, the derivative of adjustment costs
is always positive.
Thus, negative investment yields positive cash flows, and may be optimal. The
second derivative $\partial^2 G(K, I)/\partial I^2 = x \cdot \beta(\beta - 1) I^{\beta-2}$, however, is not always
positive as assumed in (2.4). Rather, it is negative for $I < 0$. This implies that
the unit cash flow yielded by negative gross investment is increasingly large
when increasingly negative values of $I$ are considered. Hence, the firm would
profit from mixing periods of gradual positive investment (of arbitrarily small
cost, since the function $G(\cdot)$ is flat for $I$ near zero) with sudden spurts of
negative investment. Such functional forms make no economic sense, and
also make it impossible to obtain a unique formal characterization of optimal
investment. If the adjustment cost function had increasing returns to (negative)
investment, the first-order conditions would not characterize optimal
policies, and many different intermittent investment policies could yield an
infinitely large firm value.
Solution to exercise 11
Employment of the flexible factor $N$ must satisfy in steady state, as always,
the familiar first-order condition $\partial R(\cdot)/\partial N = w$. As mentioned in the text,
if capital does not depreciate, its steady-state stock must satisfy the similarly
familiar condition $\partial F(\cdot)/\partial K = rP_k$. Equivalently, since
$\partial F(\cdot)/\partial K = \partial R(\cdot)/\partial K - P_k\,\partial G(\cdot)/\partial K$ and $\partial G(\cdot)/\partial K = 0$ in this exercise,
$$\partial R(\cdot)/\partial K = rP_k.$$
Thus, we need to characterize the effects of a smaller $w$ on the pair $(K_{ss}, N_{ss})$
that satisfies the two conditions. If revenues have the Cobb–Douglas form, the
conditions
$$\frac{\alpha}{K_{ss}}\,K_{ss}^{\alpha}N_{ss}^{\beta} = rP_k, \qquad \frac{\beta}{N_{ss}}\,K_{ss}^{\alpha}N_{ss}^{\beta} = w,$$
can be solved if $\alpha + \beta < 1$ and the firm has decreasing returns in production.
Then, we have
$$K_{ss} = w^{\frac{\beta}{\alpha+\beta-1}}\,(rP_k)^{\frac{1-\beta}{\alpha+\beta-1}}\,\alpha^{\frac{\beta-1}{\alpha+\beta-1}}\,\beta^{\frac{-\beta}{\alpha+\beta-1}}$$
and a smaller wage is associated with a higher steady-state capital stock.
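The steady-state capital stock can be checked by substituting it back into the two first-order conditions. This is a sketch assuming Cobb–Douglas revenues with decreasing returns; the function names and the parameter values used when calling them are illustrative.

```python
def steady_state_capital(w, r, p_k, alpha, beta):
    """Closed-form steady-state capital under Cobb-Douglas revenues."""
    e = alpha + beta - 1  # negative under decreasing returns
    return (w ** (beta / e)
            * (r * p_k) ** ((1 - beta) / e)
            * alpha ** ((beta - 1) / e)
            * beta ** (-beta / e))


def steady_state_labor(capital, w, alpha, beta):
    """Employment that equates the marginal revenue product of labor to the wage."""
    return (beta * capital ** alpha / w) ** (1 / (1 - beta))


def marginal_revenue_product_of_capital(capital, labor, alpha, beta):
    """Derivative of Cobb-Douglas revenues with respect to capital."""
    return alpha * capital ** (alpha - 1) * labor ** beta
```

With, for instance, `alpha = 0.3`, `beta = 0.5`, `r = 0.05` and `p_k = 1`, the marginal revenue product of capital at the computed pair should equal `r * p_k`, and lowering the wage should raise the steady-state capital stock.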
Solution to exercise 12
If $G(\cdot)$ has constant returns to $K$ and $I$, we may write
$$G(I, K) = g\left(\frac{I}{K}\right)K$$
and note that, by the investment first-order condition
$$g'\left(\frac{I}{K}\right) = q,$$
optimal investment is proportional to $K$ for given $q$:
$$I = \tilde{\iota}(q)K.$$
The portion of the firm's cash flows that pertains to investment costs,
$$P_k\,G(I, K) = P_k\,g(\tilde{\iota}(q))K,$$
therefore has zero second derivative with respect to $K$. Since revenues (once
optimized with respect to $N$) are also linear in $K$, $\partial F(\cdot)/\partial K$ does not depend
on $K$, and the $\dot{q} = 0$ locus is horizontal. As for the $\dot{K} = 0$ locus, we noted
when tracing phase diagrams that its slope tends to be positive when $\delta > 0$,
since a higher $q$ and more intense investment flows are needed to keep a larger
capital stock constant. To determine the slope of the $\dot{K} = 0$ locus, however, the
derivative of $G(\cdot)$ with respect to $K$ is also relevant when it is not zero (as was
convenient to assume when drawing phase diagrams). In the case where $G(\cdot)$
has constant returns, we can write
$$\dot{K} = \tilde{\iota}(q)K - \delta K = (\tilde{\iota}(q) - \delta)K$$
and find that, even when $\delta > 0$, the locus identified by setting this expression
equal to zero is horizontal. As is the case in a static environment, the optimal
size of a competitive firm with constant returns to scale is undetermined
(if the two stationarity loci coincide), or tends to be infinitely large or small
(if one locus lies above the other).
Solution to exercise 13
(a) As shown in the text, an increase in $y$ has two effects on the steady-state
value of $q$: a positive "dividend effect" and a negative "interest rate
effect." If the former dominates, the $\dot{q} = 0$ schedule slopes upwards in
the $(q, y)$ phase diagram, as in the figure.
Formally, from (2.35) we get
$$\left.\frac{dq}{dy}\right|_{\dot{q}=0} = \frac{a_1 r - (a_0 + a_1 y)\,h_1/h_2}{r^2} > 0 \;\Leftrightarrow\; a_1 > q\,\frac{h_1}{h_2},$$
where we used the expression $q = \pi/r$, with $\pi = a_0 + a_1 y$, which applies along the $\dot{q} = 0$
locus. This schedule crosses the stationary locus for $y$ from above, since
$$\lim_{y \to \infty}\left.\frac{dq}{dy}\right|_{\dot{q}=0} = 0$$
and $q$ approaches the value $a_1 h_2/h_1$ asymptotically from below (for
$y \to \infty$). Outside its stationary locus, $q$ retains the same dynamic
properties illustrated in the main text: $\dot{q} > 0$ at all points above the
curve and $\dot{q} < 0$ below the curve. In this case the saddlepath slopes
upwards, reflecting the fact that, when output increases towards the
steady state of the system, the stronger influence on $q$ is given by
dividends, which are also rising.
(b) Under the new assumption, the effects of the fiscal restriction on the
steady-state values of output and the interest rate are similar to those
reported in the text: both y and r decrease. However, the effect on
the steady-state value of q is different: here q is affected mainly by
lower dividends, and attains a lower level in the final steady state. The
permanent reduction in output (and dividends) is foreseen by agents
at t = 0, when the future fiscal restriction is announced. The ensuing
portfolio reallocation away from shares and toward bonds determines
an immediate decrease in stock market prices, with a depressing effect
on private investment, aggregate demand, and (starting gradually from
t = 0) output. At the implementation date t = T the economy is on
the saddlepath converging to the new steady-state position. In contrast
with the case of a dominant “interest rate effect,” here in the final
steady state there is less public spending and less private investment;
moreover, the (apparently) perverse temporary effect of fiscal policy
on output does not occur.
Solution to exercise 14
(a) With $F(t) = R(K(t)) - G(I)$, the dynamic optimality conditions
$$G'(I) = \lambda, \qquad \dot{\lambda} - r\lambda = -F'(K) + \delta\lambda,$$
are necessary and sufficient if $G'(I) > 0$, $F'(K) > 0$, $F''(K) \leq 0$, and
$G''(I) > 0$. The optimal investment flow is a function $\iota(\cdot)$ of $q$ (or,
since $P_k = 1$, of $\lambda$), where $\iota(\cdot)$ is the inverse of $G'(\cdot)$. Inserting $I = \iota(q)$
in the accumulation constraint, using the second optimality condition,
and noting that $\dot{q} = \dot{\lambda}$, we obtain a system of two differential
equations:
$$\dot{K} = \iota(q) - \delta K, \qquad \dot{q} = (r + \delta)q - F'(K).$$
The dynamics of $K$ and $q$ can be studied by a phase diagram, with $q$ on
the vertical axis and $K$ on the horizontal axis. The locus where $\dot{q} = 0$
is negatively sloped if $F''(K) < 0$; the locus where $\dot{K} = 0$ is positively
sloped if $\delta > 0$. The point where the two meet identifies the steady
state, and the system converges toward it along a negatively sloped
saddlepath.
(b) For these functional forms, $F'(K) = \alpha$, $F''(K) = 0$, $G'(I) = 1 + 2bI$.
Hence, $\iota(q) = (q - 1)/(2b)$, and the dynamic equations are
$$\dot{K} = \frac{q - 1}{2b} - \delta K, \qquad \dot{q} = (r + \delta)q - \alpha.$$
The locus along which capital is constant,
$$(\dot{K} = 0) \;\Rightarrow\; q = 1 + 2b\delta K,$$
is positively sloped if $\delta > 0$, while
$$(\dot{q} = 0) \;\Rightarrow\; q = \frac{\alpha}{r + \delta}$$
identifies a horizontal line: the shadow price of capital, given by
the marginal present discounted (at rate $r + \delta$) contribution of capital
to the firm's cash flow, is constant if $\partial^2 F(\cdot)/\partial K^2 = 0$, as is the case
here. The saddlepath coincides with the $\dot{q} = 0$ locus, on which the
system must stay throughout its convergent trajectory. In steady state,
imposing $\dot{K} = 0$, we have
$$K_{ss} = \frac{1}{\delta}\,\frac{q - 1}{2b} = \frac{\alpha - (r + \delta)}{(r + \delta)\,2b\delta}.$$
The firm's capital stock is an increasing function of the difference
between $\alpha$ (the marginal revenue product of capital) and $r + \delta$ (the
financial and depreciation cost of each installed unit of capital). If
$\alpha > r + \delta$, the steady-state capital stock is finite provided that $b\delta > 0$.
As the capital stock increases, in fact, an increasingly large investment
flow per unit time is needed to offset depreciation. Since unit gross
investment costs are increasing, in the long run the optimal capital
stock is such that the benefits $\alpha - (r + \delta)$ of an additional unit will
be exactly offset by the higher marginal cost of investment needed to
keep it constant. If $\alpha < r + \delta$, revenues afforded by capital are smaller
than its opportunity cost, and it is never optimal to invest. If $\delta \to 0$
(and also if $b \to 0$) the $\dot{K} = 0$ locus is horizontal, like the one where $\dot{q} = 0$,
and the steady state is ill-defined: the expression above implies that $K_{ss}$
tends to infinity if $\alpha > r$, tends to minus infinity (or zero, in light of
an obvious non-negativity constraint on capital) if $\alpha < r$, and is not
determined if $\alpha = r$.
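The steady state and the convergence of capital along the horizontal saddlepath can be illustrated with a small simulation. The sketch assumes a constant marginal revenue product of capital and quadratic adjustment costs, and uses a simple forward-Euler scheme; the step size, the horizon, and the function names are illustrative choices.

```python
def steady_state(alpha, r, delta, b):
    """Steady-state capital and shadow price for the linear-revenue case."""
    q = alpha / (r + delta)             # level of the horizontal q-dot = 0 locus
    capital = (q - 1) / (2 * b * delta)
    return capital, q


def capital_after(k0, alpha, r, delta, b, dt=0.01, steps=20000):
    """Forward-Euler integration of capital accumulation with q on its stationary locus."""
    _, q = steady_state(alpha, r, delta, b)
    capital = k0
    for _ in range(steps):
        capital += dt * ((q - 1) / (2 * b) - delta * capital)
    return capital
```

Starting from any initial stock, `capital_after` should approach the steady-state value returned by `steady_state` when the depreciation rate and the adjustment-cost parameter are both positive.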
Solution to exercise 15
(a) It must be the case that cash flows are concave with respect to endogenous
variables: $\alpha > 0$, $\beta > 0$, $\alpha + \beta \leq 1$, $G(\cdot)$ convex.
(b) The diagram is similar to that of Figure 2.5. Since there is no depreciation,
the slope of the $\dot{K} = 0$ locus depends on how the capital stock
affects the marginal cost of investment: if a given investment flow is less
expensive when more capital is already installed, that is if
$$\frac{\partial^2 G(x, y)}{\partial x\,\partial y} < 0,$$
then the $\dot{K} = 0$ locus is negatively sloped. If it is steeper than the
$\dot{q} = 0$ locus, then the system's dynamics will be globally unstable: when
investing, the firm will reduce the cost of further investment so strongly
as to more than offset the decline of capital's marginal revenue product.
Dynamics are well-behaved if, instead, the $\dot{K} = 0$ schedule meets the
$\dot{q} = 0$ schedule from above, in which case a change of $P_k$ relocates the $\dot{q} = 0$
schedule and $q$ jumps on the new saddlepath. A higher $P_k$ decreases
$K$ in the new steady state.
(c) As in exercise 10, a quadratic form for $G(\cdot)$ implies that investment
is almost costless when it is very small. This is not realistic, and $P_k$
represents the market price of capital net of adjustment costs only if
the derivative of adjustment costs is unity at $\dot{K} = 0$. Cubic functional
forms are not convex for $\dot{K} < 0$, implying that first-order conditions
do not identify an optimum.
(d) As usual, the first-order condition is $g'(\dot{K}(0)) = \lambda(0)$. Since $R(\cdot)$ is linearly
homogeneous and $G(\cdot)$ is independent of $K$, the shadow price
of capital does not depend on future capital stocks, and is a convex
function of the exogenous wage $w$:
$$\lambda(0) = \text{constant} \cdot \int_0^{\infty} e^{-rt}\,E_0\left[(w(t))^{\frac{\beta}{\beta - 1}}\right]dt.$$
Larger values of $\xi$ increase the variance of $w$ over the period (from
$T$ to infinity) when it is positive from the standpoint of time 0.
Thus, Jensen's inequality (as in Figure 2.11) associates a larger $\xi$ with
higher shadow values, and with larger investment flows between $t = 0$
and $t = T$.
Solution to exercise 16
(a) Cash flows are given by
$$F(t) = \alpha\sqrt{K(t)} + \beta\sqrt{L(t)} - wL(t) - I - \frac{\gamma}{2}I^2,$$
and the optimality conditions are
$$1 + \gamma I = \lambda \quad \text{(marginal investment cost = shadow price of capital)},$$
$$\frac{\beta}{2\sqrt{L}} = w \quad \text{(marginal revenue product of labor = wage)},$$
and
$$\dot{\lambda} - r\lambda = -\frac{\alpha}{2\sqrt{K}} + \delta\lambda.$$
(Capital gains minus the opportunity cost of funds = depreciation costs
minus the marginal revenue product of capital.)
Hence, dynamics are described by the system
$$\dot{K} = \frac{\lambda - 1}{\gamma} - \delta K, \qquad \dot{\lambda} = (r + \delta)\lambda - \frac{\alpha}{2\sqrt{K}}.$$
Graphically, this can be shown as follows:
(b) Both $\dot{\lambda} = 0$ and $\dot{K} = 0$ move to the left, as shown in the figure.
In the new steady state the capital stock is unambiguously smaller;
intuitively, a higher marginal product is needed to offset the larger
cost of a higher replacement investment flow. The effect on capital's
shadow price and on the gross investment flow is ambiguous: in the
graph, it depends on the slope of the two curves in the relevant region.
Recall that $\lambda$ is the present discounted value of capital's contribution to
the firm's revenues: in the new steady state, the latter is larger but it is
more heavily discounted at rate $(r + \delta)$.
(c) For the functional form proposed, capital's marginal productivity is
independent of $L$:
$$\frac{\partial^2 Y}{\partial K\,\partial L} = \frac{\partial}{\partial L}\left(\frac{\alpha}{2\sqrt{K}}\right) = 0,$$
and therefore the cost $w$ of factor $L$ has no implications for the firm's
investment policy. If instead the mixed second derivative is not zero
then, as in Figure 2.8, capital's marginal productivity evaluated at the
optimal employment $L^*(w)$ of factor $L$ is a convex function of $w$,
implying that variability of $w$ will lead the firm to invest more.
Solution to exercise 17
(a) The dynamic first-order condition is
$$(r + \delta)q = \frac{dR(\cdot)}{dK}\,\frac{1}{P_k} + \dot{q},$$
or, with $(r + \delta) = 0.5$ and $dR(\cdot)/dK = 1 - K$,
$$0.5q = \frac{1 - K}{P_k} + \dot{q}.$$
The optimality condition for investment flows is $G'(I) = q$. In this
exercise, $G'(I) = 1 + I$. Hence, $I = q - 1$, and optimal capital dynamics
are described by
$$\dot{K} = q - 1 - 0.25K,$$
or graphically:
(b) If the price of capital is halved, the q̇ = 0 schedule rotates clockwise
around its intersection with the horizontal axis, and q jumps onto the
new saddlepath:
(c) From T onwards, the q̇ = 0 locus returns to its original position. (The
combination of the subsidy and higher interest rate is exactly offset in
the user cost of capital, and the marginal revenue product of capital
is unaffected throughout.) Investment is initially lower than in the
previous case: q jumps, but does not reach the saddlepath; its trajectory
reaches and crosses the K̇ = 0 locus, and would diverge if parameters
did not change again at T . At time T the original saddlepath is met,
and the trajectory converges back to its starting point. The farther in
the future is T , the longer-lasting is the investment increase; in the
limit, as T goes to infinity the initial portion of the trajectory tends
to coincide with the saddlepath:
Solution to exercise 18
(a) The conditions requested are
$$K^{1/2}N^{-1/2} = w, \qquad 1 + I = \lambda, \qquad -K^{-1/2}N^{1/2} + \delta\lambda = -r\lambda + \dot{\lambda}.$$
(b) From $K^{1/2}N^{-1/2} = w$, we have $N = K/w^2$, hence
$$F(t) = 2K^{1/2}N^{1/2} - G(I) - wN = \frac{2}{w}K - G(I) - \frac{1}{w}K,$$
$$\lambda(0) = \int_0^{\infty} e^{-(r+\delta)t}\,\frac{\partial F(\cdot)}{\partial K(t)}\,dt = \int_0^{\infty} e^{-(r+\delta)t}\,\frac{1}{w(t)}\,dt.$$
(c) $\lambda = 1/[(r + \delta)\bar{w}]$ is constant with respect to $K$. The form of adjustment
costs and of the accumulation constraint imply that $I = \lambda - 1$ and that
$\dot{K} = 0$ if $I = \delta K$, that is, if $\lambda = 1 + \delta K$ as shown in the figure.
(d) One would need to ensure that $G(\cdot)$ is linearly homogeneous in $I$ and
$K$. For example, one could assume that
$$G(I, K) = I + \frac{1}{2K}I^2.$$
Solution to exercise 19
Denote gross employment variations in period $t$ by $\tilde{\Delta}N_t$: positive values of
$\tilde{\Delta}N_t$ represent hiring at the beginning of period $t$, while negative values
of $\tilde{\Delta}N_t$ represent firings at the end of period $t - 1$. Noting that effective
employment at date $t$ is given by $N_t = N_{t-1} + \tilde{\Delta}N_t - \delta N_{t-1}$, we have
$\tilde{\Delta}N_t = \Delta N_t + \delta N_{t-1}$ for each $t$.
If turnover costs depend on hiring and layoffs but not on voluntary quits,
we can rewrite the firm's objective function as
$$V_t = E_t\left[\sum_{i=0}^{\infty}\left(\frac{1}{1+r}\right)^i\left(R(Z_{t+i}, N_{t+i}) - wN_{t+i} - G(\Delta N_{t+i} + \delta N_{t+i-1})\right)\right].$$
Introducing a parameter with the same role as $P_k$, that is multiplying $G(\cdot)$ by a
constant, influences the magnitude of the hiring and firing costs in relation to
the flow revenue $R(\cdot)$ and the salary $w_t N_t$. Such a constant of proportionality
is not interpretable like the "price" of labor. Each unit of the factor $N$ is in fact
paid a flow wage $w_t$, rather than a stock payment; for this reason, the slope of
the original function $G(\cdot)$ is zero rather than one, as in the preceding chapter.
In the problem we consider here, the wage plays a role similar to that of the
user cost of capital in Chapter 2. To formulate these two problems in a similar
fashion, we need to assume that workers can be bought and sold at a unique
price which is equivalent to the present discounted value of future earnings
of each worker. One case in which it is easy to verify the equivalence between
the flow and the stock payments is when the salary, the discount rate, and the
layoff rate are constant: since only a fraction equal to $e^{-\delta(\tau - t)}$ of the labor
force employed at date $t$ is not yet laid off at date $\tau$, and payments at date $\tau$
are discounted by $e^{-r(\tau - t)}$, the present value of the
wage paid to each worker is given by
$$\int_t^{\infty} w\,e^{-(r+\delta)(\tau - t)}\,d\tau = \frac{w}{r + \delta}.$$
The role of this quantity is the same as the price of capital $P_k$ in the study
of investments, and, as we mentioned, the wage $w$ coincides with the user
cost of capital $(r + \delta)P_k$. The formal analogy between investments and the
"purchase" and "sale" of workers, which remains valid if the salary and the
other variables are time-varying, obviously does not have practical relevance
except in the case of slavery.
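The equivalence between the flow wage and its stock value can be checked by numerical integration. The sketch assumes a constant wage, discount rate, and separation rate, and approximates the infinite-horizon integral with a trapezoid rule over a long finite horizon; the function name and grid settings are illustrative.

```python
import math


def present_value_of_wage(w, r, delta, horizon=200.0, n=200001):
    """Trapezoid approximation of the discounted, attrition-adjusted wage stream."""
    step = horizon / (n - 1)
    total = 0.0
    for k in range(n):
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        # Wage at elapsed time k*step, scaled by survival and discounting.
        total += weight * w * math.exp(-(r + delta) * k * step)
    return total * step
```

For a wage of `2`, a discount rate of `0.05`, and a separation rate of `0.1`, the result should be close to `2 / 0.15`, the stock "price" that plays the role of the price of capital in the analogy above.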
Solution to exercise 20
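To compare these two expressions, remember that
$$\dot{\lambda} = [\lambda(t + dt) - \lambda(t)]/dt \approx [\lambda(t + \Delta t) - \lambda(t)]/\Delta t$$
for a finite $\Delta t$. Assuming $\Delta t = 1$, we get a discrete-time version of the optimality
condition for the case of the Hamiltonian method,
$$r\lambda_t = \frac{\partial R(\cdot)}{\partial K} + \lambda_{t+1} - \lambda_t,$$
or alternatively
$$\lambda_t = \frac{1}{1+r}\,\frac{\partial R(\cdot)}{\partial K} + \frac{1}{1+r}\,\lambda_{t+1}.$$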
This expression is very similar to (3.5). It differs in three aspects that are easy to
interpret. First of all, the operator $E_t[\cdot]$ that appears in (3.5) is redundant
here, since by assumption there is no uncertainty. Secondly, the discrete-time
expression applies a discount factor to the marginal cash flow, but this factor
is arbitrarily close to one in continuous time (where $dt \to 0$ would replace
$\Delta t = 1$). Finally, the two relationships differ also as regards the specification of
the cash flow itself, in that only (3.5) deducts the salary $w$ from the marginal
revenue. This difference occurs because labor is rewarded in flow terms. (The
shadow value of labor therefore does not contain any resale value, as is the case
with capital.)
Solution to exercise 21
If both functions are horizontal lines, the shadow value of labor will not
depend on the employment level. Without loss of generality, we can then write
$$\mu(N, Z_g) = Z_g, \qquad \mu(N, Z_b) = Z_b,$$
and calculate the shadow values in the two possible situations. In the case
considered here, (3.5) implies that
$$\lambda_g = \mu_g - w + \frac{1}{1+r}\,\bigl((1-p)\lambda_g + p\lambda_b\bigr),$$
$$\lambda_b = \mu_b - w + \frac{1}{1+r}\,\bigl((1-p)\lambda_b + p\lambda_g\bigr),$$
a system of two linear equations in two unknowns whose solution is
$$\lambda_b = \frac{1+r}{r}\left[\frac{(r+p)\mu_b + p\mu_g}{r+2p} - w\right], \qquad
\lambda_g = \frac{1+r}{r}\left[\frac{(r+p)\mu_g + p\mu_b}{r+2p} - w\right].$$
These two expressions are simply the expected discounted values of the excess
of productivity (marginal and average) over the wage rate of each worker. In
the absence of hiring and firing costs, the firm will choose either an infinitely
large or a zero employment level, depending on which of the two shadow
values is non-zero. On the contrary, if the costs of hiring and firing are positive,
it is possible that
$$-F < \lambda_b < \lambda_g < H,$$
and thus that, as a result of (3.6), the firm will find it optimal not to vary
the employment level. If only one marginal productivity is constant, then it
may be optimal for the firm to hire and fire workers in such a way that the
first-order conditions hold with equality. However,
$$\mu(N_g, Z_g) = w + p\,\frac{F}{1+r}$$
and
$$Z_b = w - (r+p)\,\frac{F}{1+r}$$
can be satisfied simultaneously only if the second condition (in which all
variables are exogenous) holds by assumption. In this case, with the linear
form $\mu(N_g, Z_g) = Z_g - \beta N_g$, the first condition can be solved as
$$N_g = \frac{1}{\beta}\left(Z_g - w - p\,\frac{F}{1+r}\right).$$
As in many other economic applications, strict concavity of the objective
function is essential to obtain an interior solution.
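As a sanity check on the two-state shadow value system of this solution, the sketch below plugs a closed form back into the two linear equations. The closed form used is the one obtained by solving the system directly, with the wage term inside the scaling by (1 + r)/r; the function names and parameter values are illustrative assumptions.

```python
def shadow_values(mu_g, mu_b, w, r, p):
    """Closed-form solution of the two-state system for (lambda_g, lambda_b)."""
    scale = (1 + r) / r
    lam_g = scale * (((r + p) * mu_g + p * mu_b) / (r + 2 * p) - w)
    lam_b = scale * (((r + p) * mu_b + p * mu_g) / (r + 2 * p) - w)
    return lam_g, lam_b

def residuals(lam_g, lam_b, mu_g, mu_b, w, r, p):
    """Left-hand side minus right-hand side of each recursive equation."""
    res_g = lam_g - (mu_g - w + ((1 - p) * lam_g + p * lam_b) / (1 + r))
    res_b = lam_b - (mu_b - w + ((1 - p) * lam_b + p * lam_g) / (1 + r))
    return res_g, res_b

# Assumed illustrative parameters (not from the text).
params = dict(mu_g=1.5, mu_b=0.8, w=1.0, r=0.05, p=0.2)
lam_g, lam_b = shadow_values(**params)
res_g, res_b = residuals(lam_g, lam_b, **params)
print(lam_g, lam_b, res_g, res_b)
```

Both residuals should be zero up to floating-point error, and the good-state shadow value exceeds the bad-state one whenever the good-state productivity is higher.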
Solution to exercise 22
Subtracting the two equations in (3.9) term by term yields an expression for
the difference between the two possible marginal productivities of labor:
$$\mu(N_g, Z_g) - \mu(N_b, Z_b) = (r+2p)\,\frac{H+F}{1+r}.$$
This expression is valid under the assumption that the firm hires and fires
workers upon every change of the exogenous conditions represented by Zt .
However, H and F can be so large, relative to variations in demand for labor,
that the expression is satisfied only when Nb > Ng , as in the figure.
Such an allocation is clearly not feasible: if Nb > Ng , the firm will need to fire
workers whenever it faces an increase in demand, violating the assumptions
under which we derived (3.9) and the equation above. (In fact, the formal
solution involves the paradoxical cases of “negative firing,” and “negative hir-
ing,” with the receipt rather than the payment of turnover costs!). Hence, the
firm is willing to remain completely inactive, with employment equal to any
level within the inaction region in the figure. It is still true that employment
takes only two values, but, these values coincide and they are completely
determined by the initial conditions.
Solution to exercise 23
A trigonometric function, such as $\sin(\cdot)$, repeats itself every $2\pi$ units of its argument (with $\pi = 3.1415\ldots$);
hence, the $Z(\tau)$ process, whose argument is $2\pi\tau/p$, has a cycle lasting $p$ periods. If $p$ = one
year, the proposed perfectly cyclical behavior of revenues might be a stylized
model of a firm in a seasonal industry, for example a ski resort. If the firm
aims at maximizing its value, then
$$V_t = \int_t^\infty \Bigl( R(L(\tau), Z(\tau)) - wL(\tau) - C(\dot X(\tau))\,\dot X(\tau) \Bigr) e^{-r(\tau-t)}\,d\tau,$$
where $r > 0$ is the rate of discount and $R(\cdot)$ is the given revenue function.
Then with $\partial R(\cdot)/\partial L = M(\cdot)$ as given in the exercise, optimality requires that
$$-f \le \int_t^\infty \bigl(M(L(\tau), Z(\tau)) - w\bigr)\,e^{-r(\tau-t)}\,d\tau \le h$$
for all t : as in the model discussed in the chapter, the value of marginal
changes in employment can never be larger than the cost of hiring, or more
negative than the cost of firing. Further, and again in complete analogy to the
discussion in the text, if the firm is hiring or firing, equality must obtain in
that relationship: if $\dot X_t < 0$,
$$-f = \int_t^\infty \bigl(M(L(\tau), Z(\tau)) - w\bigr)\,e^{-r(\tau-t)}\,d\tau, \qquad (*)$$
and if $\dot X_t > 0$,
$$\int_t^\infty \bigl(M(L(\tau), Z(\tau)) - w\bigr)\,e^{-r(\tau-t)}\,d\tau = h. \qquad (**)$$
Each complete cycle goes through a segment of time when the firm is hiring
and a segment of time when the firm is firing (unless turnover costs are so
large, relative to the amplitude of labor demand fluctuations, as to make inac-
tion optimal at all times). Within each such interval the optimality equations
hold with equality, and using Leibniz’s rule to differentiate the relevant integral
with respect to the lower limit of integration yields local Euler equations
in the form
$$M(L(t), Z(t)) - w = r\,C(\dot L(t)).$$
Inverting the functional form given in the exercise, the level of employment is
$$\left(\frac{K_1 + K_2 \sin\left(\frac{2\pi}{p}\tau\right)}{w - rf}\right)^{1/\beta}$$
whenever $\tau$ is such that the firm is firing, and
$$\left(\frac{K_1 + K_2 \sin\left(\frac{2\pi}{p}\tau\right)}{w + rh}\right)^{1/\beta}$$
whenever $\tau$ is such that the firm is hiring. If $h + f > 0$, however, there must
also be periods when the firm neither hires nor fires: specifically, inaction
must be optimal around both the peaks and troughs of the sine function.
(Otherwise, some labor would be hired and immediately fired, or fired and
immediately hired, and h + f per unit would be paid with no counteracting
benefits in continuous time.) To determine the optimal length of the inaction
period following the hiring period, suppose time $t$ is the last instant in the
hiring period, and denote with $T$ the first time after $t$ that firing is optimal at
that same employment level: then, it must be the case that
$$L(t) = \left(\frac{K_1 + K_2 \sin\left(\frac{2\pi}{p}t\right)}{w + rh}\right)^{1/\beta}
= \left(\frac{K_1 + K_2 \sin\left(\frac{2\pi}{p}T\right)}{w - rf}\right)^{1/\beta}.$$
This is one equation in $T$ and $t$. Another can be obtained inserting the given
functional forms into equations (*) and (**), recognizing that the former
applies at $T$ and the latter at $t$, and rearranging:
$$\int_t^T e^{-r(\tau-t)}\left[\left(K_1 + K_2 \sin\left(\frac{2\pi}{p}\tau\right)\right)(L(t))^{-\beta} - w\right]d\tau = h + f\,e^{-r(T-t)}.$$
The integral can be solved using the formula
$$\int e^{\lambda x}\sin(\gamma x)\,dx = \frac{\lambda e^{\lambda x}}{\gamma^2 + \lambda^2}\left(\sin(\gamma x) - \frac{\gamma}{\lambda}\cos(\gamma x)\right),$$
but both the resulting expression and the other relevant equation are highly
nonlinear in $t$ and $T$, which therefore can be determined only numerically.
See Bertola (1992) for a similar discussion of optimality around the cyclical
trough, expressions allowing for labor “depreciation” (costless quits), sample
numerical solutions, and analytical results and qualitative discussion for more
general specifications.
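The integration formula for the product of an exponential and a sine quoted in this solution can be verified numerically before it is used. The following sketch compares the difference of the antiderivative at two endpoints with a midpoint-rule quadrature. The numerical values (a discount rate of 5 percent, a cycle of length one) and the function names are assumptions chosen only for illustration.

```python
import math

def antiderivative(x, lam, gamma):
    """Closed-form antiderivative of exp(lam*x) * sin(gamma*x)."""
    return (lam * math.exp(lam * x) / (gamma**2 + lam**2)) * (
        math.sin(gamma * x) - (gamma / lam) * math.cos(gamma * x)
    )

def midpoint_integral(a, b, lam, gamma, steps=100_000):
    """Midpoint-rule approximation of the same integral on [a, b]."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += math.exp(lam * x) * math.sin(gamma * x)
    return total * dx

lam = -0.05                  # plays the role of -r (assumed r = 0.05)
gamma = 2 * math.pi / 1.0    # 2*pi/p with an assumed cycle length p = 1
a, b = 0.0, 0.7
exact = antiderivative(b, lam, gamma) - antiderivative(a, lam, gamma)
approx = midpoint_integral(a, b, lam, gamma)
print(exact, approx)
```

Solving for the two switching dates themselves would still require a nonlinear root finder, as the text notes; this check only covers the integration step.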
Solution to exercise 24
Denoting by $\eta(t) \equiv Z(t)L(t)^{-\beta}$ labor’s marginal revenue product, the shadow
value of employment (the expected discounted cash flow contribution of a
marginal unit of labor) may be written
$$\lambda(t) = \int_t^\infty E_t[\eta(\tau) - w]\,e^{-(r+\delta)(\tau-t)}\,d\tau,$$
and, by the usual argument, an optimal employment policy should never let
it exceed zero (since hiring is costless) or fall short of −F (the cost of firing a
unit of labor). Hence, the optimality conditions have the form $-F \le \lambda(t) \le 0$
for all $t$, $-F = \lambda(t)$ if the firm fires at $t$, $\lambda(t) = 0$ if the firm hires at $t$.
In order to make the solution explicit, it is useful to define a function
returning the discounted expectation of future marginal revenue products
along the optimal employment path,
$$v(\eta(t)) \equiv \int_t^\infty E_t[\eta(\tau)]\,e^{-(r+\delta)(\tau-t)}\,d\tau = \lambda(t) + \frac{w}{r+\delta}.$$
This function depends on $\eta(t)$, as written, only if the marginal revenue
product process is Markov in levels. Here this is indeed the case, because in
the absence of hiring or firing we can use the stochastic differentiation rule
introduced in Section 2.7 to establish that, at all times when the firm is neither
hiring nor firing,
$$\begin{aligned}
d\eta(t) &= d[Z(t)L(t)^{-\beta}] \\
&= L(t)^{-\beta}\,dZ(t) - \beta Z(t)L(t)^{-\beta-1}\,dL(t) \\
&= L(t)^{-\beta}[\theta Z(t)\,dt + \sigma Z(t)\,dW(t)] + \beta Z(t)L(t)^{-\beta-1}\delta L(t)\,dt \\
&= \eta(t)(\theta + \beta\delta)\,dt + \eta(t)\sigma\,dW(t).
\end{aligned}$$
Hence $\eta(t)$ is Markov in levels (a geometric Brownian motion), and we can proceed to
show that optimal hiring and firing depend only on the current level of $\eta(t)$,
hence preserving the Markov character of the process. In fact, we can use
the stochastic differentiation rule again and apply it to the integral in the
definition of $v(\cdot)$ to obtain a differential equation,
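$$\begin{aligned}
(r+\delta)v(\eta) &= \eta + \frac{1}{dt}\left(\frac{\partial v(\cdot)}{\partial\eta}E(d\eta) + \frac{1}{2}\frac{\partial^2 v(\cdot)}{\partial\eta^2}(d\eta)^2\right) \\
&= \eta + \frac{\partial v(\cdot)}{\partial\eta}\,\eta(\theta+\beta\delta) + \frac{1}{2}\frac{\partial^2 v(\cdot)}{\partial\eta^2}\,\eta^2\sigma^2,
\end{aligned}$$
with solutions in the form
$$v(\eta) = \frac{\eta}{r + \delta - \theta - \delta\beta} + K_1\eta^{\alpha_1} + K_2\eta^{\alpha_2},$$
where $\alpha_1$ and $\alpha_2$ are the two solutions of the quadratic characteristic equation
(see Section 2.7 for its derivation in a similar context) and $K_1$, $K_2$ are constants
of integration. These two constants, and the critical levels of the $\eta(t)$
process that trigger hiring and firing, can be determined by inserting the $v(\cdot)$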
function in the two first-order and two smooth-pasting conditions that must
be satisfied at all times when the firm is hiring or firing. (See Section 2.7 for a
definition and interpretation of the smooth-pasting conditions, and Bentolila
and Bertola (1990) for further and more detailed derivations and numerical
solutions.)
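The form of the solution for v can be checked by substitution. The sketch below assumes the standard Itô second-order term (with factor 1/2) in the differential equation for v, so that the particular solution has denominator r + delta - theta - beta*delta and the exponents solve the quadratic 0.5*sigma^2*a*(a-1) + (theta + beta*delta)*a - (r + delta) = 0. Parameter values, the constants K1 and K2, and all function names are illustrative assumptions, not values from the text.

```python
import math

# Assumed illustrative parameters.
r, delta, theta, beta, sigma = 0.05, 0.10, 0.02, 0.5, 0.2
drift = theta + beta * delta

def characteristic_roots():
    """Roots of 0.5*sigma^2*a*(a-1) + drift*a - (r+delta) = 0."""
    qa = 0.5 * sigma**2
    qb = drift - 0.5 * sigma**2
    qc = -(r + delta)
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    return (-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)

a1, a2 = characteristic_roots()
denom = r + delta - drift   # denominator of the particular solution

def v(eta, K1, K2):
    return eta / denom + K1 * eta**a1 + K2 * eta**a2

def v_prime(eta, K1, K2):
    return 1 / denom + K1 * a1 * eta**(a1 - 1) + K2 * a2 * eta**(a2 - 1)

def v_second(eta, K1, K2):
    return K1 * a1 * (a1 - 1) * eta**(a1 - 2) + K2 * a2 * (a2 - 1) * eta**(a2 - 2)

def ode_residual(eta, K1, K2):
    """(r+delta)*v minus the right-hand side of the differential equation."""
    rhs = eta + v_prime(eta, K1, K2) * eta * drift \
        + 0.5 * v_second(eta, K1, K2) * eta**2 * sigma**2
    return (r + delta) * v(eta, K1, K2) - rhs

residual = ode_residual(1.3, 0.3, 0.1)
print(a1, a2, residual)
```

The residual vanishes for any choice of the two constants, which is why the boundary (first-order and smooth-pasting) conditions are needed to pin them down.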
Solution to exercise 25
It is again useful to consider the case where r = 0, so that (3.16) holds: if H =
−F , and thus H + F = 0, then wages and marginal productivity are equal in
every period, and the optimal hiring and firing policies of the firm coincide
with those that are valid if there are no adjustment costs. The combination
of firing costs and identical hiring subsidies does have an effect when r > 0.
Using the condition H + F = 0 in (3.9), we find that the marginal productiv-
ity of labor in each period is set equal to w + r H/(1 + r ) = w − r F /(1 + r ).
Intuitively, the moment a firm hires a worker, it deducts r H/(1 + r ) from the
flow wage, which is equivalent to the return if it invests the subsidy H in an
alternative asset, and which the firm needs to pay if it decides to fire the worker
at some future time.
If H + F < 0, then turnover generates income rather than costs, and the
optimal solution will degenerate: a firm can earn infinite profits by hiring and
firing infinite amounts of labor in each period.
Solution to exercise 26
Specializing equation (3.15) to the case proposed, we obtain
$$\tfrac{1}{2}\bigl(f(Z_g) + \beta(N_g) + g(Z_b) + \beta(N_b)\bigr) = w,$$
or, alternatively,
$$\tfrac{1}{2}\bigl(\beta(N_g) + \beta(N_b)\bigr) = w - \tfrac{1}{2}\bigl(f(Z_g) + g(Z_b)\bigr).$$
The term on the right does not depend on $N_g$ and $N_b$, and hence is independent
of the magnitude of the employment fluctuations (which in turn are
determined by the optimal choices of the firm in the presence of hiring and
firing costs). We can therefore write
$$E[\beta(N)] = \text{constant} = \beta(E[N]) + \xi,$$
where, by Jensen’s inequality, $\xi$ is positive if $\beta(\cdot)$ is a convex function, and negative
if $\beta(\cdot)$ is a concave function. In both cases $\xi$ is larger in absolute value the more $N$ varies.
Combining the last two equations to find the expected value of employment,
we have
$$E[N] = \beta^{-1}\left(w - \tfrac{1}{2}\bigl(f(Z_g) + g(Z_b) + 2\xi\bigr)\right),$$
where $\beta^{-1}(\cdot)$, the inverse of $\beta(\cdot)$, is decreasing. We can therefore conclude that,
if $\beta(\cdot)$ is a convex function, the less pronounced variation of employment
when hiring and firing costs are larger is associated with a lower average
employment level. The reverse is true if $\beta(\cdot)$ is concave.
Solution to exercise 27
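Since we are not interested in the effects of $H$, we assume that $H = 0$. The
optimality conditions
$$Z_g - \beta N_g = w + p\,\frac{F}{1+r},$$
$$Z_b - \gamma N_b = w - (r+p)\,\frac{F}{1+r},$$
imply
$$N_g = \frac{1}{\beta}\left(Z_g - w - p\,\frac{F}{1+r}\right),$$
$$N_b = \frac{1}{\gamma}\left(Z_b - w + (r+p)\,\frac{F}{1+r}\right),$$
and thus
$$\begin{aligned}
\frac{N_g + N_b}{2} &= \frac{1}{2\beta\gamma}\left[\gamma\left(Z_g - w - p\,\frac{F}{1+r}\right) + \beta\left(Z_b - w + (r+p)\,\frac{F}{1+r}\right)\right] \\
&= \frac{\gamma Z_g + \beta Z_b - (\gamma+\beta)w}{2\beta\gamma} + \frac{\beta-\gamma}{2\beta\gamma}\,\frac{pF}{1+r} + \frac{\beta}{2\beta\gamma}\,\frac{rF}{1+r}.
\end{aligned}$$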
The first term on the right-hand side of the last expression denotes the average
employment level if $F = 0$; the effect of $F > 0$ is positive in the last term if
$r > 0$, but since $\beta < \gamma$ the second term is negative. As we saw in exercise 21,
the limit case with $\gamma = 0$ is not well defined unless the exogenous variables
satisfy a certain condition. It is therefore not possible to analyze the effects of
a variation of $F$ that is not associated with variations in other parameters.
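The decomposition of average employment into a no-firing-cost term, a turnover term, and a discounting term can be checked with a few lines of arithmetic. In the sketch below the firing cost is written F in both optimality conditions, and all numerical values and function names are assumptions made for illustration.

```python
def average_employment_direct(Zg, Zb, w, F, r, p, beta, gamma):
    """Average of N_g and N_b computed from the two optimality conditions (H = 0)."""
    Ng = (Zg - w - p * F / (1 + r)) / beta
    Nb = (Zb - w + (r + p) * F / (1 + r)) / gamma
    return 0.5 * (Ng + Nb)

def average_employment_terms(Zg, Zb, w, F, r, p, beta, gamma):
    """The three terms of the decomposition: baseline, turnover, discounting."""
    base = (gamma * Zg + beta * Zb - (gamma + beta) * w) / (2 * beta * gamma)
    turnover = (beta - gamma) / (2 * beta * gamma) * p * F / (1 + r)
    discounting = beta / (2 * beta * gamma) * r * F / (1 + r)
    return base, turnover, discounting

# Assumed illustrative values with beta < gamma, as in the text.
args = dict(Zg=3.0, Zb=2.0, w=1.0, F=0.5, r=0.05, p=0.2, beta=0.5, gamma=1.0)
direct = average_employment_direct(**args)
base, turnover, discounting = average_employment_terms(**args)
print(direct, base + turnover + discounting)
```

With beta below gamma the turnover term comes out negative and the discounting term positive, which matches the signs discussed in the solution.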
Solution to exercise 28
In (3.17), p determines the speed of convergence of the current value of P
to its long-run value. If p = 0, there is no convergence. (In fact, the initial
conditions remain valid indefinitely.) Writing
$$P_{t+1} = p + (1 - 2p)P_t,$$
we see that the initial distribution is completely irrelevant if $p = 0.5$; the
probability distribution of each firm is immediately equal to $P_\infty$, and also the
frequency distribution of a large group of firms converges immediately to its
long-run stable equivalent.
Solution to exercise 29
As in the symmetric case, we consider the variation of the proportion $P$ of
firms in state $F$:
$$P_{t+1} - P_t = p(1 - P_t) - qP_t = p - (q+p)P_t = p\left(1 - \frac{q+p}{p}P_t\right).$$
This expression is positive if $P_t < p/(q+p)$, negative if $P_t > p/(q+p)$, and
zero if $P_t$ corresponds to $P_\infty = p/(q+p)$, the stable proportion of firms in
state $F$. Intuitively, if $p > q$ (if the entry rate into the strong state is higher
than the exit rate out of this state), then in the long run the strong state is
more likely than the weak state.
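The convergence of the proportion of firms in the strong state can be illustrated by iterating the difference equation directly. The sketch below is a minimal illustration; the transition probabilities, the starting value, and the function name `iterate_share` are assumptions.

```python
def iterate_share(P0, p, q, periods):
    """Iterate P_{t+1} = P_t + p*(1 - P_t) - q*P_t and return the whole path."""
    path = [P0]
    for _ in range(periods):
        P = path[-1]
        path.append(P + p * (1 - P) - q * P)
    return path

# Asymmetric case: the share converges geometrically to p/(p+q).
asymmetric = iterate_share(P0=0.1, p=0.3, q=0.1, periods=60)
# Symmetric case with p = q = 0.5: the long-run share is reached in one step.
symmetric = iterate_share(P0=0.9, p=0.5, q=0.5, periods=3)
print(asymmetric[-1], 0.3 / (0.3 + 0.1))
print(symmetric)
```

The distance from the long-run value shrinks by the factor 1 - p - q each period, which is zero in the symmetric case with p = 0.5.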
Solution to exercise 30
(a) Marginal productivity of labor is
$$\frac{\partial}{\partial l}F(k, l; \alpha) = \alpha - \beta l.$$
When $\alpha = 4$ and the firm is hiring, employment is the solution $x$ of
$$4 - \beta x = 1 + \frac{pF}{1+r};$$
therefore, with $r = F = 1$ and $p = 0.5$, the solution is $11/(4\beta) = 2.75/\beta$.
When $\alpha = 2$ and the firm fires, it employs $x$ such that
$$2 - \beta x = 1 - \frac{(p+r)F}{1+r},$$
so employment is $7/(4\beta) = 1.75/\beta$.
(b) Employment is not affected by capital adjustment for this production
function because it is separable; i.e., the marginal product of (and
demand for) one factor does not depend on the level of the other. The
marginal product of capital is $\alpha - \gamma k$, so setting it equal to $r + \delta = 2$
yields
$$4 - \gamma k = 2 \Rightarrow k = 2/\gamma$$
when $\alpha_t = 4$ and
$$2 - \gamma k = 2 \Rightarrow k = 0$$
when $\alpha_t = 2$.
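The arithmetic in parts (a) and (b) can be reproduced exactly with rational numbers. The sketch below reports employment and capital multiplied by the slope parameters (beta and gamma respectively), since those parameters are left unspecified in the solution; the variable names are illustrative.

```python
from fractions import Fraction

r, F, p = Fraction(1), Fraction(1), Fraction(1, 2)

# Part (a): right-hand sides of the hiring and firing conditions.
hiring_rhs = 1 + p * F / (1 + r)           # wage plus expected firing cost
firing_rhs = 1 - (p + r) * F / (1 + r)     # wage minus saved firing cost
beta_x_hiring = 4 - hiring_rhs             # beta * employment when alpha = 4
beta_x_firing = 2 - firing_rhs             # beta * employment when alpha = 2

# Part (b): capital sets alpha - gamma*k equal to the user cost r + delta = 2.
user_cost = 2
gamma_k_good = 4 - user_cost               # gamma * capital when alpha = 4
gamma_k_bad = 2 - user_cost                # gamma * capital when alpha = 2

print(beta_x_hiring, beta_x_firing, gamma_k_good, gamma_k_bad)
```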
Solution to exercise 31
(a) The optimality conditions of the firm, analogous to (3.9), are
$$Z_g - \beta N_g = w_g + p\,\frac{F}{1+r} + (r+p)\,\frac{H}{1+r},$$
$$Z_b - \beta N_b = w_b - (r+p)\,\frac{F}{1+r} - p\,\frac{H}{1+r},$$
from which we obtain
$$N_g = \left(Z_g - w_g - \frac{pF + (r+p)H}{1+r}\right)\frac{1}{\beta},$$
$$N_b = \left(Z_b - w_b + \frac{(r+p)F + pH}{1+r}\right)\frac{1}{\beta}.$$
(b) We know that workers are indifferent between moving and staying if
(3.24) holds, that is, if $w_g - w_b = \kappa(2p+r)/(1+r)$. Hence, the given
wage differential is an equilibrium phenomenon if the mobility costs
for workers are equal to
$$\kappa = \frac{1+r}{2p+r}\,\Delta w.$$
(c) Given that $\Delta w = \kappa(2p+r)/(1+r)$, and that $w_g = w_b + \Delta w$, we have
$$N_g = \left(Z_g - w_b - \kappa\,\frac{2p+r}{1+r} - \frac{pF + (r+p)H}{1+r}\right)\frac{1}{\beta},$$
and the full-employment condition $50N_g + 50N_b = 1000$ can therefore
be written
$$50\left(Z_g - w_b - \kappa\,\frac{2p+r}{1+r} - \frac{pF + (r+p)H}{1+r}\right)\frac{1}{\beta}
+ 50\left(Z_b - w_b + \frac{(r+p)F + pH}{1+r}\right)\frac{1}{\beta} = 1000.$$
Hence the wage rate needs to be
$$w_b = \frac{1}{2}(Z_g + Z_b) - 10\beta - \frac{1}{2}\,\frac{\kappa(2p+r) + (H-F)r}{1+r}.$$
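The market-clearing wage in part (c) can be verified by substituting it back into the two labor demand functions and checking the full-employment condition. The sketch below does this for one set of assumed parameter values; the function names and numbers are illustrative only.

```python
def employment(Zg, Zb, wb, kappa, F, H, r, p, beta):
    """Labor demand of a good-state and a bad-state firm, given the bad-state wage."""
    wg = wb + kappa * (2 * p + r) / (1 + r)    # equilibrium wage differential
    Ng = (Zg - wg - (p * F + (r + p) * H) / (1 + r)) / beta
    Nb = (Zb - wb + ((r + p) * F + p * H) / (1 + r)) / beta
    return Ng, Nb

def market_clearing_wage(Zg, Zb, kappa, F, H, r, p, beta):
    """Bad-state wage implied by 50*Ng + 50*Nb = 1000."""
    return (0.5 * (Zg + Zb) - 10 * beta
            - 0.5 * (kappa * (2 * p + r) + (H - F) * r) / (1 + r))

# Assumed illustrative parameter values.
pars = dict(Zg=30.0, Zb=25.0, kappa=2.0, F=1.0, H=0.5, r=0.05, p=0.2, beta=1.0)
wb = market_clearing_wage(**pars)
Ng, Nb = employment(wb=wb, **pars)
total_employment = 50 * Ng + 50 * Nb
print(wb, Ng, Nb, total_employment)
```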
Solution to exercise 32
Denote the optimal employment levels by $N_b$ and $N_g$. Noting that $\lambda(Z_g, N_g) =
H$ and $\lambda(Z_b, N_b) = -F$, the dynamic optimality conditions are given by
$$H = \mu(Z_g, N_g) - \bar w + \frac{1}{1+r}\,\frac{H - F + \lambda_{(M,G)}}{3},$$
$$-F = \mu(Z_b, N_b) - \bar w + \frac{1}{1+r}\,\frac{H - F + \lambda_{(M,B)}}{3}.$$
In both cases the shadow value of labor is equal to the current marginal cash
flow plus the expected discounted shadow value in the next period. The latter
is equal to H or to −F in the two cases in which the firm decides to hire or fire
workers; and it will be equal to $\lambda(\cdot)$ such that it is optimal not to react if labor
demand in the next period takes the mean value. To characterize this shadow
value, consider that if $Z_{t+1} = Z_M$ (so that inactivity is effectively optimal)
then the shadow value $\lambda_{(M,G)}$ satisfies
$$\lambda_{(M,G)} = \mu(Z_M, N_g) - \bar w + \frac{1}{1+r}\,\frac{H - F + \lambda_{(M,G)}}{3}$$
if the last action of the firm was to hire a worker, while the shadow value $\lambda_{(M,B)}$
satisfies
$$\lambda_{(M,B)} = \mu(Z_M, N_b) - \bar w + \frac{1}{1+r}\,\frac{H - F + \lambda_{(M,B)}}{3}$$
if the last action of the firm was to fire workers. The last four equations can
be solved for $N_g$, $N_b$, $\lambda_{(M,G)}$, and $\lambda_{(M,B)}$. Under the hypothesis that $\mu(Z, N)$
is linear, we obtain
$$N_b = \frac{1}{\beta}\left(Z_b - \bar w + F + \frac{1}{1+r}\,\frac{Z_M - Z_b + H - 2F}{3}\right),$$
$$N_g = \frac{1}{\beta}\left(Z_g - \bar w - H + \frac{1}{1+r}\,\frac{Z_M - Z_g + 2H - F}{3}\right),$$
and the solutions for the two shadow values, which need to satisfy
$$-F < \lambda_{(M,B)} < H, \qquad -F < \lambda_{(M,G)} < H$$
if, as we assumed, the parameters are such that it is optimal for the firm not to
react if the realization of labor demand is at the intermediate value.
Solution to exercise 33
Since $\dot k = sf(k) - \delta k$,
$$\frac{\dot Y}{Y} = \frac{f'(k)\dot k}{f(k)} = f'(k)\left(s - \delta\,\frac{k}{f(k)}\right).$$
The condition $\lim_{k\to\infty} f'(k) > 0$ is no longer sufficient to allow a positive
growth rate: the limit of the second factor in the expression for the proportional
growth rate of output also needs to be strictly positive. This is the case if
$\delta \lim_{k\to\infty}(k/f(k)) < s$. If both capital and output grow indefinitely, the limit
required is a ratio between two infinitely large quantities. Provided that the
limit is well defined, it can be calculated, by l’Hôpital’s rule, as the ratio of the
limits of the numerator’s derivative (which is unity) and of the denominator’s
derivative (which is $f'(k)$, and tends to $b$). Hence, for positive growth in
the limit it is necessary that
$$\lim_{k\to\infty} f'(k) = b > \frac{\delta}{s} > 0.$$
When a fraction $s$ of income is saved and capital depreciates at rate $\delta$, we get
$$\lim_{k\to\infty}\frac{\dot Y}{Y} = b\left(s - \frac{\delta}{b}\right) = bs - \delta.$$
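The limiting growth rate can be illustrated with a specific production function whose marginal product tends to a positive constant. The sketch below assumes f(k) = b*k + sqrt(k), which is not taken from the text, and evaluates the growth rate formula at increasingly large capital stocks; parameter values are likewise illustrative.

```python
import math

def output_growth_rate(k, s, delta, b):
    """Growth rate f'(k) * (s - delta*k/f(k)) for the assumed f(k) = b*k + sqrt(k)."""
    f = b * k + math.sqrt(k)
    f_prime = b + 0.5 / math.sqrt(k)
    return f_prime * (s - delta * k / f)

s, delta, b = 0.3, 0.05, 0.5          # assumed values with b > delta/s
limit_rate = b * s - delta
rates = [output_growth_rate(10.0**e, s, delta, b) for e in (2, 4, 8, 12)]
print(rates, limit_rate)
```

The computed rates decline towards b*s - delta as capital grows, and the limit is positive because b exceeds delta/s in this parameterization.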
Solution to exercise 34
If $\lambda \le 0$, capital and labor cannot be substituted easily: no output can be
produced without an input of $L$. In fact, the equation that defines factor
combinations yielding a given output level,
$$\bar Y = (\alpha K^\lambda + (1-\alpha)L^\lambda)^{1/\lambda},$$
allows $\bar Y > 0$ for $L = 0$ only if $\lambda > 0$. In that case, the accumulation of capital
can sustain indefinite growth of the economy: the non-accumulated factor $L$
may substitute capital, but output can continue to grow even if the ratio $L/K$
tends to zero.
These particular examples both assume that $\delta = g = 0$, and we know
already that indefinite growth is feasible if the marginal product of capital has
a strictly positive limit. If $\lambda = 1$, the production function is linear, i.e.
$$F(K, L) = \alpha K + (1-\alpha)L, \qquad f(k) = \alpha k + (1-\alpha),$$
and the requested growth rates are
$$\frac{\dot y}{y} = \frac{\alpha\dot k}{y} = \alpha s, \qquad
\frac{\dot k}{k} = s\,\frac{\alpha k + (1-\alpha)}{k} = s\left(\alpha + \frac{1-\alpha}{k}\right).$$
The growth rate of output equals $\alpha s$, which is constant if agents save a
constant fraction $s$ of income. Capital, on the other hand, grows at a decreasing
rate which approaches the same value $\alpha s$ only asymptotically.
The case in which $\alpha = 1$ is even simpler: since $y = k$, the growth rate of both
capital and output is always equal to $s$.
Solution to exercise 35
As in the main text, we continue to assume that the welfare of an individual
depends on per capita consumption, $c(t) \equiv C(t)/N(t)$. However, when the
population grows at rate $g_N$ we need to consider the welfare of a representative
household rather than that of a representative individual. If welfare is given by
the sum of the utility function of the $N(t) = N(0)e^{g_N t}$ individuals alive at date
$t$, objective function (4.10) becomes
$$U' = \int_0^\infty u(c(t))N(t)e^{-\rho t}\,dt = \int_0^\infty u(c(t))N(0)e^{-\rho' t}\,dt,$$
where $\rho' \equiv \rho - g_N$: a higher growth rate of the population reduces the impatience
of the representative agent. With $g_A = 0$, and normalizing $A(t) =
A(0) = 1$, the law of motion for per capita capital $k(t)$ is
$$\frac{d}{dt}\frac{K(t)}{N(t)} = \frac{Y(t) - C(t) - \delta K(t)}{N(t)} - \frac{K(t)\dot N(t)}{N(t)^2}
= f(k(t)) - c(t) - (\delta + g_N)k(t).$$
The Hamiltonian associated with this problem is
$$H(t) = [u(c(t)) + \lambda(t)(f(k(t)) - c(t) - (\delta + g_N)k(t))]e^{-\rho' t}.$$
Using similar techniques as in the main text, we obtain
$$\dot c = \left(\frac{u'(c)}{-u''(c)}\right)(f'(k) - (\delta + g_N) - \rho') = \left(\frac{u'(c)}{-u''(c)}\right)(f'(k) - \delta - \rho).$$
The dynamics of the system are similar to those studied in the main text, and
tend to a steady state where
$$f'(k_{ss}) = \rho + \delta, \qquad 0 = f(k_{ss}) - c_{ss} - (\delta + g_N)k_{ss}.$$
The capital stock does not maximize per capita consumption in the steady
state: in each possible steady state $\dot k = 0$ needs to be satisfied; that is,
$$c_{ss} = f(k_{ss}) - (\delta + g_N)k_{ss}.$$
The second derivative of the right-hand expression is $f''(\cdot) < 0$. The maximum
of steady-state per capita consumption is therefore obtained at a
value $k^*$ at which the first derivative is equal to zero so that
$$f'(k^*) = \delta + g_N.$$
Hence $f'(k_{ss}) > f'(k^*)$ if $g_N < \rho$, which is a necessary condition to have
$\rho' > 0$ and to have a well defined optimization problem. From this, and from
the fact that $f''(\cdot) < 0$, we have $k^* > k_{ss}$. The economy evolves not toward
the capital stock that maximizes per capita consumption (the so-called golden
rule), but to a steady state with a lower consumption level. In fact, given that
the economy needs an indefinite time period to reach the steady state, it would
make sense to maximize consumption only if $\rho'$ were equal to zero, that is, if
a delay of consumption to the future were not costly in itself. On the other
hand, when agents have a positive rate of time preference, which is needed
for the problem to be meaningful, then the optimal path is characterized by a
higher level of consumption in the immediate future and a convergence to a
steady state with $k_{ss} < k^*$.
Solution to exercise 36
Denote the length of a period by $\Delta t$ (which was normalized to one in Chapter 1),
and refer to time via a subscript rather than an argument between
parentheses: let $r_t$ denote the interest rate per time period (for instance on an
annual basis) valid in the period between $t$ and $t + \Delta t$; moreover, let $y_t$ and
$c_t$ denote the flows of income and consumption in the same period but again
measured on an annual basis. Finally let $A_t$ be the wealth at the beginning of
the period $[t, t + \Delta t]$. Hence, we have the discrete-time budget constraint
$$A_{t+\Delta t} = \left(1 + r_t\,\frac{\Delta t}{n}\right)^n A_t + (y_t - c_t)\Delta t.$$
Interest payments are made in each of the $n$ subperiods of $\Delta t$. Moreover,
in each of the subperiods of length $\Delta t/n$, an amount $r_t\Delta t/n$ of interest is
received which immediately starts to earn interest. If $n$ tends to infinity,
$$\lim_{n\to\infty}\left(1 + \frac{r_t\Delta t}{n}\right)^n = e^{r_t\Delta t}.$$
Therefore
$$A_{t+\Delta t} = e^{r_t\Delta t}A_t + (y_t - c_t)\Delta t.$$
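The compounding limit used here is easy to see numerically. The sketch below, with an assumed annual rate of 5 percent and a period length of one, shows the gross return over the period approaching the exponential as the number of subperiods grows; the function name is illustrative.

```python
import math

def compound_factor(r, dt, n):
    """Gross return over a period of length dt when interest is paid n times."""
    return (1 + r * dt / n) ** n

r, dt = 0.05, 1.0    # assumed illustrative values
for n in (1, 12, 365, 10**6):
    print(n, compound_factor(r, dt, n))
continuous = math.exp(r * dt)
print("continuous compounding:", continuous)
```

The gap between discrete and continuous compounding shrinks roughly in proportion to 1/n.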
Rewriting the first-order condition in discrete time denoting the length of the
discrete period by $\Delta t > 0$, we have
$$u'(c_t) = \left(\frac{1+r}{1+\rho}\right)^{\Delta t} u'(c_{t+\Delta t}).$$
Recognizing that $(1+r)^{\Delta t} \approx e^{r\Delta t}$ and $(1+\rho)^{\Delta t} \approx e^{\rho\Delta t}$, and imposing $s = t + \Delta t$, we get
$$u'(c_t) = e^{r(s-t)}e^{-\rho(s-t)}u'(c_s).$$
We can rewrite this expression as
$$\frac{u'(c_t)}{e^{-\rho(s-t)}u'(c_s)} = e^{r(s-t)},$$
which equates the marginal rate of substitution, the left-hand side of the
expression, to the marginal rate of transformation between the resources available
at times $t$ and $s$. Isolating any two periods, we obtain the familiar conditions
for the optimality of consumption and savings, that is the equality between the
slope of the indifference curve and of the budget restriction. In continuous
time, this condition needs to be satisfied for any $t$ and $s$: hence, along the
optimal consumption path we have (differentiating with respect to $s$)
$$-\frac{u'(c_t)}{(e^{-\rho(s-t)}u'(c_s))^2}\left(e^{-\rho(s-t)}\frac{du'(c_s)}{ds} - \rho e^{-\rho(s-t)}u'(c_s)\right) = re^{r(s-t)}.$$
In the limit, with $s \to t$, we get
$$-\frac{1}{u'(c_t)}\left(\frac{du'(c_t)}{dt} - \rho u'(c_t)\right) = r,$$
or, equivalently,
$$-\left(\frac{du'(c_t)}{dt}\right) = (r - \rho)u'(c_t).$$
Given that the marginal utility of consumption $u'(c_t)$ equals the shadow value
of wealth $\lambda_t$, this relation corresponds to the Hamiltonian conditions for
dynamic optimality. Since $du'(c_t)/dt = u''(c_t)\,dc_t/dt$, we get
$$\frac{dc_t}{dt} = \left(-\frac{u'(c_t)}{u''(c_t)}\right)(r - \rho).$$
In the presence of a variation of the interest rate $r$ (or, more precisely, in the
differential $r - \rho$), the consumer changes the intertemporal path of her consumption
by an amount equal to the (positive) quantity in large parentheses:
this is the reciprocal of the well-known Arrow–Pratt measure of absolute risk
aversion. As we noted in Chapter 1, the more concave the utility function,
the less willing the consumer will be to alter the intertemporal pattern of
consumption. With regard to the cumulative budget constraint, we can write
$$\frac{A_{t+\Delta t} - A_t}{\Delta t} = \frac{(e^{r_t\Delta t} - 1)}{\Delta t}A_t + (y_t - c_t)$$
and evaluate the limit of this expression for $\Delta t \to 0$:
$$\lim_{\Delta t\to 0}\frac{A_{t+\Delta t} - A_t}{\Delta t} = \lim_{\Delta t\to 0}\frac{(e^{r_t\Delta t} - 1)}{\Delta t}A_t + (y_t - c_t).$$
On the left we have the definition of the derivative of $A_t$ with respect to time.
Since both the denominator and the numerator in the first term on the right
are zero in $\Delta t = 0$, we need to apply l’Hôpital’s rule to evaluate this limit. This
gives
$$\frac{d}{dt}A_t = \lim_{\Delta t\to 0}\frac{(r_t e^{r_t\Delta t})}{1}A_t + (y_t - c_t)$$
or, in the notation in continuous time adopted in this chapter,
$$\dot A(t) = r(t)A(t) + y(t) - c(t),$$
which is a constraint, in flow terms, that needs to be satisfied for each $t$. This
law of motion for wealth relates $A(t)$, $r(t)$, $c(t)$, $y(t)$ which are all functions
of the continuous variable $t$. The summation of (??) obviously corresponds
to an integral in continuous time. Suppose for simplicity that the interest
rate is constant, i.e. $r(t) = r$ for each $t$, and multiply both terms in the above
expression by $e^{-rt}$; we then get
$$e^{-rt}\dot A(t) - re^{-rt}A(t) = e^{-rt}(y(t) - c(t)).$$
Since the term on the left-hand side is the derivative of the product of $e^{-rt}$ and
$A(t)$, we can write
$$\frac{d}{dt}(e^{-rt}A(t)) = e^{-rt}(y(t) - c(t)).$$
It is therefore easy to evaluate the integral of the term on the left:
$$\int_0^T \frac{d}{dt}(e^{-rt}A(t))\,dt = [e^{-rt}A(t)]_0^T = e^{-rT}A(T) - A(0).$$
Equating this to the integral of the term on the right, we get
$$e^{-rT}A(T) = A(0) + \int_0^T e^{-rt}(y(t) - c(t))\,dt. \qquad (5.A1)$$
If we let $T$ tend to infinity, and if we impose the continuous-time version of
the no-Ponzi-game condition (1.3), i.e.
$$\lim_{T\to\infty} e^{-rT}A(T) = 0,$$
we finally arrive at the budget condition for an infinitely lived consumer who
takes consumption and savings decisions in each infinitesimally small time
period:
$$\int_0^\infty e^{-rt}c(t)\,dt = A(0) + \int_0^\infty e^{-rt}y(t)\,dt.$$
Solution to exercise 37
If $\dot K/K = \dot A/A + \dot N/N = \dot L/L$, then $k \equiv K/L$ is constant. The rate $r$ at which
capital is remunerated is given by
$$\frac{\partial F(K, L)}{\partial K} = \frac{\partial[LF(K/L, 1)]}{\partial K} = f'(K/L),$$
and is constant if $K$ and $L$ grow at the same rate. Moreover, because of
constant returns to scale, production grows at the same rate as K (and L ), and
the income share of capital r K /Y is thus constant along a balanced growth
path, even if the production function is not Cobb–Douglas.
Solution to exercise 38
The production function
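$$F(K, L) = (\alpha K^\lambda + (1-\alpha)L^\lambda)^{1/\lambda}$$
exhibits constant returns to scale and the marginal productivity of capital has
a strictly positive limit if $\lambda > 0$, as we saw on page 150. The income share of
labor $L$ is given by
$$\begin{aligned}
\frac{\partial F(K, L)}{\partial L}\,\frac{L}{F(K, L)}
&= \frac{[\alpha K^\lambda + (1-\alpha)L^\lambda]^{(1-\lambda)/\lambda}(1-\alpha)L^{\lambda-1}L}{[\alpha K^\lambda + (1-\alpha)L^\lambda]^{1/\lambda}} \\
&= [\alpha K^\lambda + (1-\alpha)L^\lambda]^{-1}(1-\alpha)L^\lambda \\
&= \left[\alpha\left(\frac{K}{L}\right)^\lambda + (1-\alpha)\right]^{-1}(1-\alpha), \qquad (5.A2)
\end{aligned}$$
which tends to zero with the growth of $K/L$ if $\lambda > 0$.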
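The behavior of labor's income share under the CES function can be illustrated numerically. The sketch below computes the share twice, once from the marginal product of labor and once from the compact expression in terms of K/L, and evaluates it for growing capital-labor ratios. The parameter values and function names are assumptions chosen for illustration.

```python
def ces_output(K, L, alpha, lam):
    return (alpha * K**lam + (1 - alpha) * L**lam) ** (1 / lam)

def labor_share_from_marginal_product(K, L, alpha, lam):
    """(dF/dL) * L / F, using dF/dL = F^(1-lam) * (1-alpha) * L^(lam-1)."""
    F = ces_output(K, L, alpha, lam)
    marginal_product = F ** (1 - lam) * (1 - alpha) * L ** (lam - 1)
    return marginal_product * L / F

def labor_share_compact(K, L, alpha, lam):
    """(1-alpha) / (alpha*(K/L)^lam + (1-alpha))."""
    return (1 - alpha) / (alpha * (K / L) ** lam + (1 - alpha))

alpha, lam = 0.3, 0.5     # assumed values with lam > 0
shares = [labor_share_compact(K, 1.0, alpha, lam) for K in (1.0, 1e2, 1e4, 1e6)]
check = labor_share_from_marginal_product(1e4, 1.0, alpha, lam)
print(shares, check)
```

With a positive substitution parameter the share falls monotonically towards zero as capital accumulates relative to labor.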
Solution to exercise 39
In terms of actual parameters, the Solow residual may be expressed as
$$\frac{\dot A}{A} + \mu\alpha\left(\frac{\dot N}{N} - \frac{\dot K}{K}\right) + (\alpha + \beta - 1)\frac{\dot K}{K}.$$
This measure may therefore be an overestimate or an underestimate of “true”
technological progress.
Solution to exercise 40
The return on savings and investments is
$$r = \frac{\partial F(K, L)}{\partial K} = \alpha K^{\alpha-1}L^{1-\alpha}.$$
Hence, recognizing that $A = aK/N$, so that $L = NA = aK$,
$$r = \alpha K^{\alpha-1}K^{1-\alpha}a^{1-\alpha} = \alpha a^{1-\alpha},$$
which does not depend on $K$ and thus remains constant during the process
of accumulation. If this $r$ is above the discount rate of utility $\rho$, the rate of
aggregate consumption growth is
$$\frac{\dot C}{C} = \frac{\alpha a^{1-\alpha} - \rho}{\sigma},$$
where, as usual, $\sigma$ denotes the elasticity of marginal utility. Since $A(\cdot)N/K$ is
constant and production,
$$F(K, L) = K^\alpha N^{1-\alpha}A^{1-\alpha} = K^\alpha N^{1-\alpha}(aK/N)^{1-\alpha} = Ka^{1-\alpha},$$
is proportional to $K$, the economy moves immediately (and not just in the
limit) to a balanced growth path.
Solution to exercise 41
Since the production function needs to have constant returns to $K$ and $L$,
it must be the case that $\beta = 1 - \alpha$; moreover, since the returns need to be
constant with respect to $K$ and $G$, we need to have $\gamma = 1 - \alpha$. Hence, writing
$$\tilde F(K, L, G) = K^\alpha L^{1-\alpha}G^{1-\alpha},$$
and substituting fiscal policy parameters from (4.34) we get
$$G = \tau K^\alpha L^{1-\alpha}G^{1-\alpha} \Rightarrow G = (\tau K^\alpha L^{1-\alpha})^{1/\alpha} = (\tau L^{1-\alpha})^{1/\alpha}K.$$
Given that $G$ and $K$ are proportional, the net return on private savings is
constant:
$$\begin{aligned}
(1-\tau)\frac{\partial\tilde F(K, L, G)}{\partial K} &= (1-\tau)\alpha K^{\alpha-1}L^{1-\alpha}G^{1-\alpha} \\
&= (1-\tau)\alpha\left(\frac{G}{K}\right)^{1-\alpha}L^{1-\alpha} \\
&= (1-\tau)\alpha(\tau L^{1-\alpha})^{(1-\alpha)/\alpha}L^{1-\alpha}.
\end{aligned}$$
The growth rate of consumption, which can be obtained by substituting the
above expression into (4.35), and that of capital and aggregate production are
also constant.
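The proportionality between public spending and capital, and the resulting constancy of the net return, can be checked numerically. In the sketch below the closed form for the net return is the one implied by substituting the ratio G/K, which equals (tau*L^(1-alpha))^(1/alpha), into the marginal product; all parameter values and function names are illustrative assumptions.

```python
def public_spending(K, L, tau, alpha):
    """Balanced-budget G solving G = tau * K^alpha * L^(1-alpha) * G^(1-alpha)."""
    return (tau * L ** (1 - alpha)) ** (1 / alpha) * K

def output(K, L, G, alpha):
    return K**alpha * L ** (1 - alpha) * G ** (1 - alpha)

def net_return(K, L, tau, alpha):
    """After-tax marginal product of private capital, holding G fixed."""
    G = public_spending(K, L, tau, alpha)
    return (1 - tau) * alpha * K ** (alpha - 1) * L ** (1 - alpha) * G ** (1 - alpha)

def net_return_closed_form(L, tau, alpha):
    return ((1 - tau) * alpha
            * (tau * L ** (1 - alpha)) ** ((1 - alpha) / alpha)
            * L ** (1 - alpha))

L, tau, alpha = 2.0, 0.25, 0.4      # assumed illustrative values
G = public_spending(5.0, L, tau, alpha)
budget_gap = G - tau * output(5.0, L, G, alpha)
returns = [net_return(K, L, tau, alpha) for K in (1.0, 5.0, 50.0)]
closed = net_return_closed_form(L, tau, alpha)
print(budget_gap, returns, closed)
```

The return is the same at every capital stock, which is what makes the growth rate of consumption constant along the whole path.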
Solution to exercise 42
(a) We know that along the optimum path of consumption the following
Euler condition holds:
$$-u''(C)\dot C = (F'(K) - \rho)u'(C),$$
which is necessary and sufficient if $u''(C) < 0$, $F''(K) \le 0$. These regularity
conditions are satisfied if, respectively, $C < \beta$ and $K < \alpha$. In this
case the derivatives are given by $u'(C) = \beta - C$, $u''(C) = -1$, $F'(K) =
\alpha - K$, and we can write
$$\dot C = (\alpha - K - \rho)(\beta - C).$$
(b) In the steady state,
$$(\dot C = 0) \iff ((\alpha - K_{ss} - \rho)(\beta - C_{ss}) = 0).$$
If $C_{ss} < \beta$ and $K_{ss} < \alpha$ then necessarily $K_{ss} = \alpha - \rho$ (or, as is usual,
$F'(K_{ss}) = \rho$). Since
$$(\dot K = 0) \iff (Y_{ss} = F(K_{ss}) = C_{ss}),$$
we have
$$Y_{ss} = C_{ss} = F(K_{ss}) = \alpha(\alpha - \rho) - \tfrac{1}{2}(\alpha - \rho)^2 = \tfrac{1}{2}(\alpha^2 - \rho^2).$$
For all this to be valid, the parameters need to be such that $K_{ss} < \alpha$,
which is true if $\rho > 0$, and $C_{ss} < \beta$, which in turn requires $(\alpha^2 - \rho^2) < 2\beta$.
In the diagram, optimal consumption can never be in the region where
$C > \beta$, since this would provide the same flow utility as $C = \beta$. If
$K > \alpha$, it is optimal to consume the surplus as soon as possible, given
that production is independent of $K$ in this region. Hence, the flow
consumption needs to be set at the level that yields maximum utility, $C = \beta$. If
that implies that $\dot K < 0$, then the system moves to the region studied
above. But if the parameters do not satisfy the above conditions, then
consumption may remain the same at $\beta$ with capital above $\alpha$ forever.
In this case the maximization problem does not have economic significance.
(There is no scarcity.)
(c) Writing
$$\dot C = (\alpha - K - \rho)\beta - (\alpha - K - \rho)C,$$
we see that $\beta$ determines the speed of convergence towards the steady
state for given $C$ and $K$, that is (so to speak) the strength of vertical
arrows drawn in the phase diagram, and the slope of the saddlepath.
(d) If returns to scale were decreasing in the only production factor, then,
setting $F'(K) = r$ (as in a competitive economy), total income $rK$
would be less than production, $F(K)$. Thus, an additional factor must
implicitly be present, and must earn income $F(K) - F'(K)K$. For the
functional form proposed in this exercise, we have
$$F(\lambda K, \lambda L) = \alpha\lambda K - g(\lambda L)\lambda^2K^2, \qquad \lambda F(K, L) = \alpha\lambda K - g(L)\lambda K^2,$$
hence returns to scale are constant if $g(\lambda L)\lambda = g(L)$, i.e. if $g(x) = \mu/x$
for $\mu$ a constant (larger than zero, to ensure that $L$ has positive productivity).
Setting $\mu = 1$ and $L = 2$, production depends on capital according
to the functional form proposed in the exercise, and the solution
can be interpreted as the optimal path followed by a competitive market
economy.
Solution to exercise 43
(a) For the production function proposed,
$$\dot Y(t) = \left(\frac{1}{L + K(t)}\right)\dot K(t) = \left(\frac{1}{L + K(t)}\right) s Y(t),$$
and the proportional growth rate of income tends to $s/L > 0$. Since
consumption is proportional to income, consumption can also grow
without limit.
(b) The returns to scale of this production function are non-constant:
$$\ln(\lambda L + \lambda K) = \ln\lambda + \ln(L + K) \neq \lambda\ln(L + K).$$
If both factors were compensated according to their marginal
productivity, total costs would be equal to
$$\left(\frac{1}{L + K}\right)L + \left(\frac{1}{L + K}\right)K = 1,$$
while the value of output may be above one (in which case there will
be pure profits) or below one (in which case profits are negative if
$L + K < 1$). Hence, this function is inadequate to represent an
economy in which output decisions are decentralized to competitive
firms.
Solution to exercise 44
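(a) The returns to scale are constant. Each unit of $L$ earns a flow
income
$$w(t) = \frac{\partial Y(t)}{\partial L} = 1 + (1 - \alpha)\left(\frac{K(t)}{L}\right)^{\alpha},$$
and each unit of $K$ earns
$$r(t) = \frac{\partial Y(t)}{\partial K(t)} = \alpha\left(\frac{K(t)}{L}\right)^{\alpha - 1}.$$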
(b) From the optimality conditions associated with the Hamiltonian, we
obtain
$$\frac{\dot C(t)}{C(t)} = \frac{r(t) - \rho}{\sigma}.$$
Hence, if consumers have the same constant elasticity utility function,
the growth rate will not depend on the distribution of consumption
levels. Moreover, the growth rate increases with the difference between
the interest rate and the rate of time preference and is higher if
agents are more inclined to intertemporal substitution (a low
$\sigma$).
(c) Production starts from $L$ for $K = 0$, is an increasing and concave
function of $K$, and coincides with the locus along which
$\dot K = 0$. The locus where $\dot C = 0$ is vertical above $K_{ss}$,
such that
$$r = f'(K_{ss}) = \rho \;\Rightarrow\; K_{ss} = \left(\frac{\alpha}{\rho}\right)^{1/(1-\alpha)} L.$$
The saddlepath converges in the usual way to the steady state, where
$$C_{ss} = L + L^{1-\alpha}K_{ss}^{\alpha} = L + \left(\frac{\alpha}{\rho}\right)^{\alpha/(1-\alpha)} L.$$
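As a quick sanity check of these closed forms, the following sketch evaluates the steady state for illustrative parameter values (the numbers and the function names are assumptions, not part of the exercise) and confirms that the marginal product of capital equals the discount rate there.

```python
def marginal_product(K, L, alpha):
    """f'(K) = alpha * (K / L)**(alpha - 1) for Y = L + L**(1-alpha) * K**alpha."""
    return alpha * (K / L) ** (alpha - 1.0)


def steady_state(L, alpha, rho):
    """Return (K_ss, C_ss) from the closed-form expressions."""
    K_ss = (alpha / rho) ** (1.0 / (1.0 - alpha)) * L
    C_ss = L + (alpha / rho) ** (alpha / (1.0 - alpha)) * L
    return K_ss, C_ss


if __name__ == "__main__":
    L, alpha, rho = 1.0, 0.5, 0.25      # illustrative values only
    K_ss, C_ss = steady_state(L, alpha, rho)
    output = L + L ** (1.0 - alpha) * K_ss ** alpha
    # C_ss should coincide with output because K_dot = 0 at the steady state.
    print(K_ss, C_ss, output, marginal_product(K_ss, L, alpha))
```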
(d) The return on investments is constant and equal to one, and so aggre-
gate consumption grows at a constant rate. However, the income share
of capital is growing and approaches one asymptotically. Except in
the long run, when labor’s income share is zero, the growth rate of
production is therefore not constant and we do not have a balanced
growth path.
Solution to exercise 45
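(a) Calculating the total derivative, and using $\dot K = sY$ and the
equation for $\dot A$, we get
$$\frac{\dot Y}{Y} = \frac{\dot A}{A} + \alpha\frac{\dot K}{K} = L - L_Y + \alpha s\frac{Y}{K}.$$
Hence, when the growth rates are constant, $Y/K$ needs to be constant
and
$$\frac{\dot Y}{Y} = \frac{\dot K}{K} = \frac{L - L_Y}{1 - \alpha}.$$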
(b) The growth rate of the economy does not depend on s (which deter-
mines Y /K ) but is instead endogenously determined by the allocation
of resources to the sector in which A can be reproduced with constant
returns to scale. A can be interpreted as a stock of knowledge (or
instructions), produced in a research and development sector.
(c) The sector that produces material goods has increasing returns in the
three factors; thus, no decentralized production structure could com-
pensate all three factors according to their marginal productivity.
Solution to exercise 46
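(a)
$$F(\lambda K, \lambda L) = [(\lambda K)^{\gamma} + (\lambda L)^{\gamma}]^{1/\gamma} = [\lambda^{\gamma}(K^{\gamma} + L^{\gamma})]^{1/\gamma} = \lambda(K^{\gamma} + L^{\gamma})^{1/\gamma} = \lambda F(K, L).$$
(b)
$$y = \frac{1}{L}F(K, L) = [L^{-\gamma}(K^{\gamma} + L^{\gamma})]^{1/\gamma} = \left[\left(\frac{K}{L}\right)^{\gamma} + 1^{\gamma}\right]^{1/\gamma} = (k^{\gamma} + 1)^{1/\gamma} \equiv f(k).$$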
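(c)
$$f'(k) = (k^{\gamma} + 1)^{(1-\gamma)/\gamma}k^{\gamma-1} = [(k^{\gamma} + 1)k^{-\gamma}]^{(1-\gamma)/\gamma} = (1 + k^{-\gamma})^{(1-\gamma)/\gamma}.$$
Taking the required limit,
$$\lim_{k\to\infty}(1 + k^{-\gamma})^{(1-\gamma)/\gamma} = \left(1 + \lim_{k\to\infty}k^{-\gamma}\right)^{(1-\gamma)/\gamma}.$$
If $\gamma < 0$, then $k^{-\gamma}$ tends to infinity and the exponent
$(1 - \gamma)/\gamma$ is negative; thus, $f'(k)$ tends to zero and
$r = f'(k) - \delta$ tends to $-\delta$. If $\gamma > 0$ then
$k^{-\gamma}$ tends to zero, and in the limit unity is raised to the
power of $(1 - \gamma)/\gamma > 0$. Hence, $f'(k)$ tends to unity, and
$r = f'(k) - \delta$ tends to $1 - \delta$.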
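The two limiting cases can be illustrated numerically. This is only a sketch: the values of the substitution parameter (written `gamma` here) are arbitrary examples with one negative and one positive case.

```python
def f_prime(k, gamma):
    """Marginal product f'(k) = (1 + k**(-gamma)) ** ((1 - gamma) / gamma)."""
    return (1.0 + k ** (-gamma)) ** ((1.0 - gamma) / gamma)


if __name__ == "__main__":
    for gamma in (-1.0, 0.5):            # illustrative values only
        for k in (1e2, 1e4, 1e6):
            # gamma < 0: values shrink towards 0; gamma > 0: values approach 1.
            print(gamma, k, f_prime(k, gamma))
```

For the negative parameter the marginal product vanishes as capital per worker grows, so accumulation eventually stops paying; for the positive one it stays bounded away from zero.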
(d) The economy converges to a steady state if
$\lim_{k\to\infty}\dot k(t) = 0$. That is (given that a constant
fraction of income is dedicated to accumulation), the economy converges
to a steady state if net output tends to zero.
(e) For a logarithmic utility function the growth rate of consumption is
given by the difference between the net return on savings and the
discount rate of future utility: $\dot C(t)/C(t) = r(t) - \rho$. In
order to have perpetual endogenous growth, this rate needs to have a
positive limit if $k$ approaches infinity:
$\lim_{k\to\infty} r(t) = 1 - \delta$ if $\gamma > 0$; in addition,
$1 - \delta - \rho > 0$ or equivalently $\rho < 1 - \delta$ must hold.
(Naturally, $\rho$ needs to be positive, otherwise the optimization
problem does not have economic significance.)
Solution to exercise 47
(a) Since capital has a constant price and does not depreciate, there
does not exist a steady state in levels: in fact, no positive value of
$K(t)$ makes
$$\dot K(t) = \bar P s Y(t) = \bar P s K(t)^{\alpha} L(t)^{\beta}$$
equal to zero. If $\alpha = 1$ a balanced growth path exists, where
$$\frac{\dot K(t)}{K(t)} = \frac{\dot Y(t)}{Y(t)} = \bar P s L(t)^{\beta}.$$
The economy can be decentralized if the production function has
constant returns to scale, that is if $\alpha + \beta = 1$.
(b) The proportional growth rate of capital is
$$\frac{\dot K(t)}{K(t)} = sP(t)\frac{Y(t)}{K(t)} = sP(t)K(t)^{\alpha-1}\bar L^{\beta}.$$
Hence $\dot K(t)/K(t) = g_k$ is constant if
$$\frac{\dot P(t)}{P(t)} + (\alpha - 1)\frac{\dot K(t)}{K(t)} = 0.$$
The balanced growth rate of the stock of capital is
$$g_k = \frac{1}{1-\alpha}\frac{\dot P(t)}{P(t)} = \frac{h}{1-\alpha},$$
and the constant growth rate of output is given by
$$\frac{\dot Y(t)}{Y(t)} = \alpha\frac{\dot K(t)}{K(t)} = \frac{\alpha}{1-\alpha}h.$$
(c) If $P(t) = K(t)^{1-\alpha}$, the accumulation of capital is governed
by
$$\dot K(t) = K(t)^{1-\alpha}sY(t) = K(t)^{1-\alpha}sK(t)^{\alpha}\bar L^{\beta} = K(t)s\bar L^{\beta}.$$
Hence $\dot K(t)/K(t) = s\bar L^{\beta}$ is constant (and depends
endogenously on the savings rate, $s$).
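A small numerical illustration of part (c): when the price of savings rises with the capital stock as $K^{1-\alpha}$, the growth rate of capital no longer depends on the level of $K$. The function name and all parameter values below are assumptions chosen for the example.

```python
def capital_growth_rate(K, s, L_bar, alpha, beta):
    """Return K_dot / K when P = K**(1 - alpha) and Y = K**alpha * L_bar**beta."""
    P = K ** (1.0 - alpha)              # endogenous price of savings in units of capital
    Y = K ** alpha * L_bar ** beta      # production function
    K_dot = P * s * Y                   # accumulation equation K_dot = P * s * Y
    return K_dot / K


if __name__ == "__main__":
    s, L_bar, alpha, beta = 0.2, 4.0, 0.3, 0.5   # illustrative values only
    for K in (0.5, 2.0, 50.0):
        # Each line should show the same rate, s * L_bar**beta.
        print(K, capital_growth_rate(K, s, L_bar, alpha, beta))
```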
(d) P (t ) is the price of output (of savings) in terms of units of capital: if
P (t ) increases, a given flow of savings can be used to buy more units
of capital. In part (b) this increase is exogenous and, like the dynamics
of A in the Solow model, allows for perpetual growth even in the case
of decreasing marginal returns to capital. In part (c) the price of invest-
ment goods depends endogenously on the accumulation of capital: as
in models of learning by doing, this can be interpreted as assuming that
investment is more productive if the economy is endowed with a larger
capital stock.
Solution to exercise 48
(a) Inserting $u'(c) = 1/c^2$, $u''(c) = -2/c^3$, and $\rho = 1$ in the
Euler equation, we obtain
$$\dot c(t) = \frac{r - 1}{2}c(t).$$
In words, the growth rate of consumption is independent of its level
(since the utility function has CRRA form).
(b) w(t ) = B (t ), r (t ) = 3.
(c) The proportional rate of growth of capital is
$$\frac{\dot K(t)}{K(t)} = \frac{B(t)}{K(t)}L + 3 - \frac{C(t)}{K(t)},$$
and it is constant if $C(t)/K(t)$ and $B(t)/K(t)$ are constant and
capital grows at the same rate as consumption and $B(t)$. In fact, if
$r = 3$, consumption does grow at the same rate as $B(t)$:
$\dot c(t)/c(t) = \dot C(t)/C(t) = 1 = \dot B(t)/B(t)$. The level of
consumption can be such as to ensure also that $\dot K(t)/K(t) = 1$:
$$1 = \frac{B(t)}{K(t)}L + 3 - \frac{C(t)}{K(t)} \;\Leftrightarrow\; C(t) = B(t)L + 2K(t).$$
(d) The aggregate production function is $Y(t) = (L + 3)K(t)$. Hence,
the return to capital is $L + 3 > 3 = r$ for the aggregate economy, and
it would be optimal for growth to proceed at rate
$$\frac{\dot C(t)}{C(t)} = \frac{L + 3 - 1}{2} = \frac{L}{2} + 1.$$
Solution to exercise 49
From the third row of the table of expected utilities in the text, we
can easily see that $V^p_M > V^p_C$, since $x < 1$; then, the commodity
holder is willing to trade with a money holder. To check that the latter
is also willing to trade, we must show that
$U - (V^p_M - V^p_C) > 0$, since after the exchange the agent initially
endowed with money enjoys utility from consumption but becomes a
commodity holder. Using the appropriate entries of the table and the
definition of $K$ given in the text, we have
$$U - (V^p_M - V^p_C) = U - \frac{K}{r + \beta x}\,r(1 - x) = U\left(1 - \frac{\beta x(1 - M)(1 - x)}{r + \beta x}\right).$$
This expression is positive, because the fraction in the large
parentheses is less than unity. Hence, a money holder is willing to
exchange money for a commodity she can consume.
Solution to exercise 50
Consider a discrete time interval $\Delta t$, from $t = 0$ to
$t = t_1$, during which $\theta$ is constant and therefore
$\dot J = \dot V = 0$. Retracing the argument of Section 5.1, suppose
that, when a firm and a worker experience a separation event, the
resulting vacant job is not filled again during such an interval. (This
assumption is valid, of course, in the $\Delta t \to 0$ limit.) The
value of a filled job at the beginning of the interval is thus given by
$$J = \int_0^{t_1} e^{-st}e^{-rt}(y - w)\,dt + e^{-r\Delta t}\left[e^{-s\Delta t}J + (1 - e^{-s\Delta t})V\right]. \tag{0.1}$$
The first term on the right-hand side denotes the expected production
flow during the time interval, net of the wage paid to the worker,
discounted back to $t = 0$. ($e^{-st}$ represents the probability that
the job is still filled and productive at time $t$.) The second term
represents the (discounted) expected value of the job at $t = t_1$, the
end of the interval. (If a separation occurs, with probability
$1 - e^{-s\Delta t}$, the job becomes a vacancy, valued at $V$.) Solving
the integral yields
$$J = \frac{1}{r + s}(y - w) + \frac{e^{-r\Delta t}(1 - e^{-s\Delta t})}{1 - e^{-(r+s)\Delta t}}V.$$
The limit as $\Delta t \to 0$ of the second term is $s/(r + s)$, by
l’Hôpital’s rule. Thus,
$$J = \frac{1}{r + s}(y - w) + \frac{s}{r + s}V \;\Rightarrow\; rJ = (y - w) + s(V - J).$$
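The limit invoked above can also be checked numerically. The sketch below evaluates the coefficient on $V$ for shrinking interval lengths and verifies the resulting asset equation; the parameter values and function names are illustrative assumptions.

```python
import math


def vacancy_weight(r, s, dt):
    """Coefficient on V: e^{-r dt} (1 - e^{-s dt}) / (1 - e^{-(r+s) dt})."""
    # -expm1(-x) equals 1 - e^{-x} and stays accurate for very small x.
    return math.exp(-r * dt) * (-math.expm1(-s * dt)) / (-math.expm1(-(r + s) * dt))


def job_value(y, w, V, r, s):
    """Limiting value J = (y - w)/(r + s) + s/(r + s) * V."""
    return (y - w) / (r + s) + s / (r + s) * V


if __name__ == "__main__":
    r, s = 0.05, 0.10                   # illustrative values only
    for dt in (1.0, 0.1, 0.001):
        print(dt, vacancy_weight(r, s, dt), s / (r + s))
```

As the interval shrinks the weight approaches $s/(r+s)$, and the value returned by `job_value` satisfies $rJ = (y - w) + s(V - J)$.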
Solution to exercise 51
Totally differentiating (5.45) and (5.46) around a steady state
equilibrium point, we obtain
$$\begin{pmatrix} 1 & -\dfrac{(r+s)c\,q'}{q^2} \\ 1 & -\beta c \end{pmatrix}\begin{pmatrix} dw \\ d\theta \end{pmatrix} = \begin{pmatrix} -\dfrac{c}{q} & 1 \\ 0 & \beta \end{pmatrix}\begin{pmatrix} ds \\ dy \end{pmatrix},$$
and we can compute the following effects of an aggregate productivity
shock (recall that $q' < 0$):
$$\frac{dw}{dy} = \frac{\overbrace{-\beta c + \beta(r+s)c\,q'/q^2}^{-}}{\Delta} > 0, \qquad \frac{d\theta}{dy} = \frac{\overbrace{\beta - 1}^{-}}{\Delta} > 0,$$
where the determinant
$$\Delta = -\beta c + \frac{(r+s)c\,q'}{q^2}$$
is negative. An aggregate productivity shock moves the equilibrium wage
in the same direction but, barring extreme cases, by a smaller amount
($0 \le dw/dy \le 1$); it also induces a change in the same direction of
labor market tightness. (Lower productivity is associated with a lower
vacancy/unemployment ratio in the new steady state.) As $\beta \to 1$
(and all the matching surplus is captured by workers), different
productivity levels affect only the wage, with no effect on $\theta$.
For intermediate values of $\beta$, the effects of a negative
productivity shock are shown in parts (a) and (b) of the figure. The
effects on unemployment and vacancy rates are uniquely determined, since
productivity variations do not affect the Beveridge curve and only cause
the equilibrium point to move along it.
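The comparative statics for a productivity shock amount to solving a two-equation linear system by Cramer's rule. The sketch below does this for hypothetical parameter values; `q` and `q_prime` stand for the matching rate $q(\theta)$ and its derivative at the steady state and are simply assumed numbers here.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] [x1, x2]' = [b1, b2]' by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


def productivity_effects(beta, c, r, s, q, q_prime):
    """Return (dw/dy, dtheta/dy) for a unit change in y with ds = 0."""
    a12 = -(r + s) * c * q_prime / q ** 2
    # With (ds, dy) = (0, 1) the right-hand side is (1, beta).
    return solve_2x2(1.0, a12, 1.0, -beta * c, 1.0, beta)


if __name__ == "__main__":
    # Illustrative values only; q_prime < 0 as required in the text.
    dw_dy, dtheta_dy = productivity_effects(beta=0.5, c=1.0, r=0.05, s=0.10,
                                            q=1.0, q_prime=-0.5)
    print(dw_dy, dtheta_dy)
```

With these numbers the wage response lies between zero and one and tightness moves in the same direction as productivity, in line with the signs derived above.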
[Figure: panels (a) and (b)]
For the case of a reallocative shock, we have
$$\frac{dw}{ds} = \frac{\overbrace{\beta c^2/q}^{+}}{\Delta} < 0, \qquad \frac{d\theta}{ds} = \frac{\overbrace{c/q}^{+}}{\Delta} < 0. \quad (*)$$
An increase in $s$ leads to a reduction in both the equilibrium wage and
the labor market tightness through a leftward shift of the curve $JC$
(which is more pronounced for higher values of $\beta$). However, at the
same time the Beveridge curve shifts to the right, and the effect of $s$
on the number of vacancies $v$ is therefore ambiguous, while the
unemployment rate increases with certainty (see parts (c) and (d) of the
figure). Totally differentiating the expression for the Beveridge curve
and using the result obtained above for $\theta$, we obtain (with
$p' > 0$ and $0 \le \eta \equiv p'\theta/p \le 1$)
$$du = \frac{p}{(s+p)^2}\,ds - \frac{sp'}{(s+p)^2}\,d\theta \;\Rightarrow\; \frac{du}{ds} = \frac{p}{(s+p)^2}\left(1 - \frac{\eta s}{(r+s)(\eta-1) - \beta p}\right) > 0. \quad (**)$$
(The denominator of the last fraction is negative.) Moreover, using the
definition for $\theta$, we can express the effect on the mass of
vacancies as
$$\frac{dv}{ds} = u\frac{d\theta}{ds} + \theta\frac{du}{ds},$$
which, together with $(*)$ and $(**)$, yields
$$\frac{dv}{ds} = \frac{s}{s+p}\,\frac{\theta}{(r+s)(\eta-1) - \beta p} + \theta\frac{p}{(s+p)^2}\left(1 - \frac{\eta s}{(r+s)(\eta-1) - \beta p}\right) = \frac{\theta}{(s+p)^2}\,\frac{s^2 + pr(\eta-1) - \beta p^2}{(r+s)(\eta-1) - \beta p}.$$
From these equations one can deduce that, starting from a relatively low
level of $s$, a reallocative shock will lead to an increase in the mass
of vacancies.
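The simplification of the vacancy response can be verified numerically. In this sketch the simplified numerator is written as $s^2 + pr(\eta-1) - \beta p^2$, which is what collecting the two terms over the common denominator gives; the parameter values are illustrative assumptions.

```python
def dv_ds_two_terms(theta, s, p, r, eta, beta):
    """Vacancy response written as u * dtheta/ds + theta * du/ds."""
    D = (r + s) * (eta - 1.0) - beta * p           # common (negative) denominator
    first = s / (s + p) * theta / D
    second = theta * p / (s + p) ** 2 * (1.0 - eta * s / D)
    return first + second


def dv_ds_closed_form(theta, s, p, r, eta, beta):
    """Same response after collecting terms over (s + p)**2 * D."""
    D = (r + s) * (eta - 1.0) - beta * p
    numerator = s ** 2 + p * r * (eta - 1.0) - beta * p ** 2
    return theta / (s + p) ** 2 * numerator / D


if __name__ == "__main__":
    params = dict(theta=1.0, p=0.5, r=0.05, eta=0.5, beta=0.5)  # illustrative
    for s in (0.02, 0.2, 1.0):
        print(s, dv_ds_two_terms(s=s, **params), dv_ds_closed_form(s=s, **params))
```

For the low separation rate the response is positive, and it turns negative once $s$ is large enough, matching the qualitative statement in the text.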
[Figure: panels (c) and (d)]
Solution to exercise 52
Totally differentiating the payoff function with respect to $\bar e$
yields
$$\frac{dV(e_i(\bar e), \bar e)}{d\bar e} = V_1(e_i, \bar e)\frac{de_i}{d\bar e} + V_2(e_i, \bar e) = V_2(e_i, \bar e) > 0,$$
since $V_1(\cdot) = 0$ in an optimum. (This is an application of the
envelope theorem.)
Solution to exercise 53
There is a positive externality between the actions of agents, since
$$\frac{\partial V^1}{\partial e_2} = \alpha e_1^{\alpha}e_2^{\alpha-1} > 0.$$
The reaction function of agent 1 is obtained by maximizing this agent's
payoff:
$$\max_{e_1} V^1(e_1, e_2) \;\Rightarrow\; \alpha e_1^{\alpha-1}e_2^{\alpha} - 1 = 0 \;\Rightarrow\; e_1 = \alpha^{1/(1-\alpha)}e_2^{\alpha/(1-\alpha)},$$
from which we obtain
$$\frac{de_1}{de_2} = \frac{\alpha}{1-\alpha}\,\alpha^{1/(1-\alpha)}e_2^{(2\alpha-1)/(1-\alpha)} > 0.$$
Hence, there is a strategic complementarity between the actions of
agents. The symmetric decentralized equilibria are obtained by combining
the two (identical) reaction functions with $e_1 = e_2 = e$. There are
two equilibria: one with zero activity ($e_1 = 0$) and one with a
positive activity level ($\bar{\bar e} = \alpha^{1/(1-2\alpha)}$). The
cooperative equilibrium ($e^*$) is obtained by maximizing the payoff of
the representative agent with respect to the common activity level $e$:
$$\max_{e} V(e, e) = e^{2\alpha} - e \;\Rightarrow\; 2\alpha(e^*)^{2\alpha-1} - 1 = 0 \;\Rightarrow\; e^* = (2\alpha)^{1/(1-2\alpha)}.$$
The fact that $e^* > \bar{\bar e}$ confirms that the decentralized
equilibria are inefficient in the presence of externalities.
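A numerical illustration of the two activity levels follows. It assumes a value of $\alpha$ below one half, so that the exponents in the closed forms are positive; the specific number and the function names are arbitrary choices for the example.

```python
def reaction(e_other, alpha):
    """Best response e1 = alpha**(1/(1-alpha)) * e2**(alpha/(1-alpha))."""
    return alpha ** (1.0 / (1.0 - alpha)) * e_other ** (alpha / (1.0 - alpha))


def decentralized_level(alpha):
    """Positive symmetric equilibrium, alpha**(1/(1-2*alpha))."""
    return alpha ** (1.0 / (1.0 - 2.0 * alpha))


def cooperative_level(alpha):
    """Jointly optimal common activity level, (2*alpha)**(1/(1-2*alpha))."""
    return (2.0 * alpha) ** (1.0 / (1.0 - 2.0 * alpha))


if __name__ == "__main__":
    alpha = 0.25                        # illustrative value, assumed below 1/2
    e_bar = decentralized_level(alpha)
    # e_bar should be a fixed point of the reaction function.
    print(e_bar, reaction(e_bar, alpha), cooperative_level(alpha))
```

The decentralized level reproduces itself under the reaction function, and the cooperative level is larger, which is the inefficiency described above.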
Solution to exercise 54
(a) Given the assumptions, the expression for the dynamics of $e$ is
given by
$$\dot e = (1 - e)ac^* - e^2 b,$$
from which, setting $\dot e = 0$, an expression for the locus of
stationary points is obtained:
$$\dot e = 0 \;\Rightarrow\; c^* = \frac{e^2 b}{(1 - e)a} \;\Rightarrow\; \left.\frac{dc^*}{de}\right|_{\dot e = 0} = \frac{2eb + ac^*}{(1 - e)a} > 0, \qquad \left.\frac{d^2c^*}{de^2}\right|_{\dot e = 0} > 0.$$
The production cost $c$ has an upper limit equal to 1. Hence if $c^*$
exceeds this upper limit there is no need for a further increase in $e$
to maintain a constant level of employment, because for $c^* \ge 1$ all
production opportunities are accepted. The locus of stationary points is
thus vertical for $c^* > 1$. The dynamic expression for $c^*$ is given
by
$$\dot c^* = rc^* - be(y - c^*) + \frac{a c^{*2}}{2}.$$
Setting $\dot c^* = 0$, one obtains a (quadratic) expression in $c^*$.
This expression is drawn in the figure and along this curve:
$$\left.\frac{dc^*}{de}\right|_{\dot c^* = 0} = \frac{b(y - c^*)}{ac^* + r + be} > 0 \quad\text{and}\quad \left.\frac{d^2c^*}{de^2}\right|_{\dot c^* = 0} = -\frac{b^2(y - c^*)}{(ac^* + r + be)^2} < 0.$$
(b) There are two possible equilibria: $E_0$, in which $c^* = e = 0$,
and $E_1$, which is the only equilibrium with a positive activity level.
The stability properties of $E_1$ can be studied by linearizing the two
differential equations around the equilibrium point $(e_1, c_1^*)$ and
by determining the sign of the determinant of the resulting matrix:
$$\begin{pmatrix} -(2e_1 b + ac_1^*) & (1 - e_1)a \\ -b(y - c_1^*) & ac_1^* + r + be_1 \end{pmatrix}.$$
If the determinant is negative, the equilibrium has the nature of a
saddlepoint. As in the general case discussed in the main text, this
occurs if in equilibrium the curve $\dot e = 0$ is steeper than the
curve $\dot c^* = 0$, as is the case in $E_1$.
Solution to exercise 55
(a) Restricting our attention to symmetric and stationary equilibria (in
which $\pi = \Pi$ and $V_C$ and $V_M$ are constant over time), with
$c > 0$, the values of expected utility from holding a commodity and
holding money are:
$$V_C = \frac{1}{1+r}\left\{(1-\beta)V_C + \beta\left[(1-M)x^2U + (1 - M\Pi x)V_C + M\Pi x(V_M - c)\right]\right\}$$
$$V_M = \frac{1}{1+r}\left\{(1-\beta)(V_M - c) + \beta\left[(1-M)\Pi x(U + V_C) + (1 - (1-M)\Pi x)(V_M - c)\right]\right\},$$
where $c$ is subtracted from $V_M$ whenever the agent ends the period
with money. Using the two equations above, we get
$$V_C - V_M = \frac{\beta(1-M)xU(x - \Pi) + (1 - \beta\Pi x)c}{r + \beta\Pi x}.$$
Setting $V_C = V_M$, we find the value of $\Pi$, dubbed $\Pi^M$, that
makes agents indifferent between holding commodities and money:
$$\Pi^M = \frac{\beta(1-M)x^2U + c}{\beta(1-M)xU + \beta xc} = x + \frac{(1 - \beta x^2)c}{\beta(1-M)xU + \beta xc} > x.$$
To make agents indifferent, money must be more acceptable than
commodities: $\Pi^M > x$. The greater acceptability of money compensates
money holders for the storage cost they incur in the event of their
ending the period still holding money. The agents’ optimal strategy and
the corresponding equilibria (shown in the figure) are then as follows:
• $\Pi < \Pi^M$: with $V_C > V_M$, agents never accept money and a
  non-monetary equilibrium arises ($\Pi = 0$).
• $\Pi > \Pi^M$: agents always accept money since $V_C < V_M$, and the
  resulting equilibrium is pure monetary ($\Pi = 1$).
• $\Pi = \Pi^M$: in this case $V_C = V_M$ and agents are indifferent
  between holding commodities and money; the corresponding equilibrium
  is mixed monetary ($\Pi = \Pi^M$).
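The indifference threshold and the sign pattern of $V_C - V_M$ can be illustrated with a short numerical sketch. All parameter values below are assumptions chosen for the example, with a small storage cost so that a pure monetary equilibrium exists; `Pi` denotes the aggregate acceptance probability of money.

```python
def value_gap(Pi, beta, M, x, U, c, r):
    """V_C - V_M from the closed-form expression."""
    numerator = beta * (1 - M) * x * U * (x - Pi) + (1 - beta * Pi * x) * c
    return numerator / (r + beta * Pi * x)


def indifference_threshold(beta, M, x, U, c):
    """Acceptance probability that makes V_C equal to V_M."""
    return (beta * (1 - M) * x ** 2 * U + c) / (beta * (1 - M) * x * U + beta * x * c)


if __name__ == "__main__":
    beta, M, x, U, c, r = 0.5, 0.5, 0.5, 1.0, 0.01, 0.05   # illustrative values
    Pi_M = indifference_threshold(beta, M, x, U, c)
    print("threshold:", Pi_M)
    for Pi in (0.0, Pi_M, 1.0):
        # Positive gap: commodities preferred; negative gap: money preferred.
        print(Pi, value_gap(Pi, beta, M, x, U, c, r))
```

With these numbers the threshold exceeds $x$, the gap is positive at zero acceptance, zero at the threshold, and negative at full acceptance.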
(b) With storage costs for money the non-monetary equilibrium always
exists, whereas the existence of the other two possible equilibria
depends on the magnitude of $c$. Even when money is accepted with
certainty ($\Pi = 1$), agents may not be fully compensated for the
storage cost if $c$ is very large. To find the values of $c$ for which a
pure monetary equilibrium exists, we consider $V_C < V_M$ when
$\Pi = 1$:
$$V_C < V_M \;\Rightarrow\; \beta(1-M)xU(x - 1) + (1 - \beta x)c < 0 \;\Rightarrow\; c < \frac{\beta(1-M)xU(1 - x)}{1 - \beta x}.$$
To ease the interpretation of this condition, consider the case of
$\beta = 1$. (Agents meet pairwise with certainty each period.) The
above condition then simplifies to $c < (1-M)xU$. The right-hand term is
the expected utility from consumption for a money holder (utility $U$
times the probability of meeting a trader offering an acceptable
commodity, $(1-M)x$). Only if the storage cost of money $c$ is lower
than the expected utility from consumption does a pure monetary (and a
fortiori a mixed monetary) equilibrium exist, as in the case portrayed
in the figure.
Solution to exercise 56
The fact that $c$ and $z$ depend on the wage alters the shape of the
$JC$ and $W$ curves, as can be seen in the figure. In steady-state
equilibrium, using (5.34), we have
$$y - w = (r + s)\frac{c_0 w}{q(\theta)} \;\Rightarrow\; w = \frac{1}{1 + (r+s)c_0/q(\theta)}\,y \qquad (JC)$$
and
$$w = z_0 w + \beta(y + c_0 w\theta - z_0 w) \;\Rightarrow\; w = \frac{\beta}{1 - (1-\beta)z_0 - \beta c_0\theta}\,y. \qquad (W)$$
From $(W)$, which holds in steady-state equilibrium and along the
adjustment path, it follows that the wage is proportional to
productivity and the factor of proportionality is positively correlated
with the measure of labor market tightness $\theta$. Combining both
equations, we obtain a result that is different from the one in exercise
51: here $\theta$ is independent of the value of productivity in the
steady-state equilibrium. Variations in $y$ lead to proportional
adjustments in the wage but have no effect on $\theta$ or on the
unemployment rate $u$. Hence, in this version of the model, a continuous
increase in productivity (technological progress) does not lead to a
decrease in the long-run unemployment rate. The unemployment rate is
determined by the properties of the matching technology (the efficiency
of the “technology” that governs the process of meetings between
unemployed workers and vacancies) and the exogenous separation rate $s$.
Solution to exercise 57
At $t_0$ firms anticipate the future reduction in productivity and
immediately reduce the number of vacancies: $v$ and $\theta$ fall by a
discrete amount (see figure). Between $t_0$ and $t_1$, the dynamics are
governed by the differential equations associated with the initial
steady state: $v$ and $\theta$ continue to decrease (while the
unemployment rate increases) until they reach the new saddlepath at
$t_1$. From $t_1$ onwards $u$ and $v$ increase in the same proportion,
leaving the labor market tightness $\theta$ unchanged.
Solution to exercise 58
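Following the procedure outlined in the main text, we calculate the
total differential of the two first-order conditions, which yields
$$V^1_{11}\,de_1^* + V^1_{12}\,de_2^* + V^1_{13}\,d\lambda = 0,$$
$$V^2_{21}\,de_1^* + V^2_{22}\,de_2^* + V^2_{23}\,d\lambda = 0.$$
Using the same definitions as in the main text, we can rewrite this
system of equations as
$$\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}\begin{pmatrix} de_1^* \\ de_2^* \end{pmatrix} = \begin{pmatrix} \partial e_1^*(\cdot)/\partial\lambda \\ \partial e_2^*(\cdot)/\partial\lambda \end{pmatrix} d\lambda,$$
from which we obtain the following results:
$$\frac{de_1^*}{d\lambda} = \frac{\partial e_1^*/\partial\lambda + \rho(\partial e_2^*/\partial\lambda)}{1 - \rho^2} = \frac{1}{1 - \rho^2}(1 + \rho)\frac{\partial e_1^*}{\partial\lambda} = \frac{1}{1 - \rho}\frac{\partial e_1^*}{\partial\lambda}$$
$$\frac{de_2^*}{d\lambda} = \frac{\partial e_2^*/\partial\lambda + \rho(\partial e_1^*/\partial\lambda)}{1 - \rho^2} = \frac{1}{1 - \rho}\frac{\partial e_2^*}{\partial\lambda}$$
$$\frac{d(e_1^* + e_2^*)}{d\lambda} = \frac{2}{1 - \rho}\frac{\partial e_1^*}{\partial\lambda} = 2\frac{de_1^*}{d\lambda}.$$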
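The multiplier result can be reproduced with a few lines of arithmetic. The sketch below solves the two-equation system for a unit change in $\lambda$, assuming symmetric direct effects; the slope `rho` of the reaction functions and the direct effects are illustrative numbers.

```python
def total_effects(partial_1, partial_2, rho):
    """Solve [[1, -rho], [-rho, 1]] [de1, de2]' = [partial_1, partial_2]'."""
    det = 1.0 - rho ** 2
    de1 = (partial_1 + rho * partial_2) / det
    de2 = (partial_2 + rho * partial_1) / det
    return de1, de2


if __name__ == "__main__":
    rho = 0.5                           # illustrative reaction-function slope
    de1, de2 = total_effects(1.0, 1.0, rho)
    # With equal direct effects each total effect equals 1 / (1 - rho).
    print(de1, de2, 1.0 / (1.0 - rho), de1 + de2)
```

The total effect on each agent exceeds the direct effect whenever the reaction functions slope upward, which is the multiplier generated by strategic complementarity.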
INDEX
adjustment costs
convex, 49
linear, 76, 106
balanced growth, 134
Bellman equation, 38, 41, 108
Brownian motion, 82
geometric, 86
bubble, 54
capital, 131
human, 37, 122, 157
CCAPM (consumption based capital asset
pricing model), 30
Cobb–Douglas, 39, 138, 149, 151, 210
constant absolute risk aversion (CARA),
25
constant elasticity of substitution (C.E.S.),
137, 148
constant relative risk aversion (CRRA), 6, 31,
39, 142
consumption function, 8, 12, 15, 25, 40
discounting
exponential vs hyperbolic, 3
elasticity
of substitution, 137
envelope theorem, 38, 66, 107
ergodic distribution, 114, 126
Euler equation, 108, 140, 146
Euler’s theorem, 69, 149
excess smoothness, 13, 17, 19, 29
expectations, 1, 4, 44, 65, 71
iterated, 18, 126
externalities, 155, 170, 188, 206,
213
Hamiltonian, 53, 91, 96, 98, 138,
146
idiosyncratic uncertainty, 91, 121
income
permanent, 1, 8, 13, 41
innovation (in stochastic process),
28
integral, 92
integration
by parts, 84, 94
by substitution, 84
intertemporal budget constraint, 5
irreversibility, 78, 85
Jensen’s inequality, 24, 33, 68, 85, 116
l’Hôpital’s rule, 6, 137, 150
labor hoarding, 111
Lagrangian, 5, 94
learning by doing, 154
Leibnitz rule, 176
life-cycle, 13
liquidity constraints, 15, 44
marginal rate of substitution
intertemporal, 30
mark-up, 163
Markov process, 83, 125
monopoly power, 160
multiple equilibria, 171, 178, 185, 212
Nash bargaining, 195
Nash equilibrium, 182, 184
orthogonality test, 8, 20
persistence, 12, 16
phase diagram, 56, 61
q
average, 70, 109
marginal, 55
replicability, 152
representative agent, 147
returns to scale
constant, 132, 146
decreasing, 140
increasing, 151, 152
risk premium, 34, 36
saddlepath, 60, 99, 140, 179, 203
series
discounted arithmetic, 28
geometric, 11
smooth pasting, 88
Solow growth model, 132
stationarity, 16
steady-state, 59, 61, 137
optimal savings, 140
stochastic process, 3, 82
ARMA, 16
Taylor expansion, 118
transversality condition, 4, 53, 54, 59, 60, 62,
139, 143
user cost of capital, 61, 77
Wiener process, 82