NBER WORKING PAPER SERIES
THE WELFARE EFFECTS OF NUDGES:
A CASE STUDY OF ENERGY USE SOCIAL COMPARISONS
Hunt Allcott
Judd B. Kessler
Working Paper 21671
http://www.nber.org/papers/w21671
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
October 2015
We are grateful to Paula Pedro for outstanding research management, and we thank Opower and Central
Hudson Gas and Electric for productive collaboration and helpful feedback. We also thank Nava Ashraf,
Stefano DellaVigna, Avi Feller, Michael Greenstone, Ben Handel, Guido Imbens, Kelsey Jack, David
Laibson, John List, Todd Rogers, Dmitry Taubinsky, and seminar participants at Berkeley, the Consumer
Financial Protection Bureau, Cornell, Microsoft Research, New York University, Stanford Institute
for Theoretical Economics, Wesleyan, and Yale for comments. We are grateful to the Sloan Foundation
and Poverty Action Lab for grant funding. This RCT was registered in the American Economic Association
Registry for randomized control trials under trial number 713. Code to replicate the analysis is available
from Hunt Allcott's website. The views expressed herein are those of the authors and do not necessarily
reflect the views of the National Bureau of Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or subject to the review by the NBER Board of Directors that accompanies official NBER publications.
© 2015 by Hunt Allcott and Judd B. Kessler. All rights reserved. Short sections of text, not to exceed
two paragraphs, may be quoted without explicit permission provided that full credit, including © notice,
is given to the source.
The Welfare Effects of Nudges: A Case Study of Energy Use Social Comparisons
Hunt Allcott and Judd B. Kessler
NBER Working Paper No. 21671
October 2015
JEL No. C44, C53, D12, L94, Q41, Q48
ABSTRACT
"Nudge"-style interventions are typically evaluated on the basis of their effects on behavior, not social
welfare. We use a field experiment to measure the welfare effects of one especially policy-relevant
intervention, home energy conservation reports. We measure consumer welfare by sending introductory
reports and using an incentive-compatible multiple price list to determine willingness-to-pay to continue
the program. We combine this with estimates of implementation costs and externality reductions to
carry out a comprehensive welfare evaluation. We find that this nudge increases social welfare, although
traditional program evaluation approaches overstate welfare gains by a factor of five. To exploit significant
individual-level heterogeneity in welfare gains, we develop a simple machine learning algorithm to
optimally target the nudge; this would more than double the welfare gains. Our results highlight that
nudges, even those that are highly effective at changing behavior, need to be evaluated based on their
welfare implications.
Hunt Allcott
Department of Economics
New York University
19 W. 4th Street, 6th Floor
New York, NY 10012
and NBER
Judd B. Kessler
Department of Business Economics and Public Policy
The Wharton School
University of Pennsylvania
3620 Locust Walk
Philadelphia, PA 19104
A randomized controlled trials registry entry is available at:
https://www.socialscienceregistry.org/trials/713
Code for replication is available at:
https://www.dropbox.com/s/l2m8l55o3wnuexi/AllcottKessler_Replication.zip?dl=0
Policymakers and academics are increasingly interested in “nudges,” such as information provision, reminders, social comparisons, default options, and commitment contracts, which can affect behavior without changing prices or choice sets. Nudges are being used to encourage a variety of privately-beneficial and socially-beneficial behaviors, such as healthy eating, exercise, organ donation, charitable giving, retirement savings, hand washing, and environmental conservation. The US, British, and Australian governments have set up “nudge units” to infuse these ideas into the policy process.[1] A growing list of academic papers evaluates nudge-style interventions in various domains.[2]
With only a few exceptions discussed below, nudges are typically evaluated based on the magnitude of behavior change or on cost effectiveness. When a nudge significantly increases a positive behavior at low cost, policymakers often advocate that it be broadly adopted. A full social welfare evaluation could produce different policy prescriptions, however, because people being nudged often experience two types of benefits and/or costs that typical evaluations do not consider. First, nudge recipients often incur costs in order to change behavior. For example, people who quit smoking save money on cigarettes but give up any enjoyment from smoking, and healthy eating might mean paying more for vegetables and giving up tasty desserts.[3] Second, the nudge itself may directly generate positive or negative utility. For example, seeing cigarette warning labels with graphic images of smoking-related diseases can be unpleasant, and body weight report cards could make children feel guilty or ashamed. Building on Caplin (2003) and Loewenstein and O’Donoghue (2006), Glaeser (2006) argues that many nudges are essentially emotional taxes that reduce utility but do not raise revenues.
This paper presents a social welfare evaluation of Home Energy Reports (HERs), one-page letters that compare a household’s energy use to that of its neighbors and provide energy conservation tips. While HERs are just one case study, they are one of the most prominent and frequently-studied nudges. Opower, the leading HER provider, now works with 95 utility companies in nine countries, sending HERs regularly to 15 million households. There has been significant academic interest in HERs, including seminal studies by Schultz et al. (2007) and Nolan et al. (2008) and many follow-on evaluations of social comparisons and other “behavior-based” energy conservation interventions.[4]
There is also a plethora of industry studies and regulatory evaluations of such programs.[5]

[1] In September 2015, the US “nudge unit,” the Social and Behavioral Sciences Team, released results from 15 experiments, and President Obama signed an executive order that directs federal agencies to use behavioral insights when they “may yield substantial improvements in social welfare and program outcomes” (EOP 2015).
[2] One indicator of academic interest is that the book Nudge (Thaler and Sunstein 2008) has been cited more than 5,000 times.
[3] Of course, if the policymaker has correctly designated a “good” behavior to nudge people toward, this typically means that the behavior change generates net benefits for the individual. However, the magnitude of these net benefits would ideally be calculated and weighed against a nudge’s other costs and benefits.
[4] Academic papers on energy use social comparison reports include Kantola, Syme, and Campbell (1984), Allcott (2011, 2015), Ayres, Raseman, and Shih (2013), Costa and Kahn (2013), Dolan and Metcalfe (2013), Allcott and Rogers (2014), and Sudarshan (2014). Delmas, Fischlein, and Asensio (2013) review 156 published field trials studying social comparisons and other informational interventions to induce energy conservation.
These existing evaluations of behavior-based energy conservation programs often make policy recommendations by comparing program implementation costs to the value of energy saved. This approach is so well-established that energy industry regulators have a name for it: the “program administrator cost test.” As with most evaluations of other nudges, this ignores benefits and costs (other than energy cost savings) experienced by nudge recipients. For example, what financial costs did consumers incur to generate the observed energy savings, such as the cost of installing improved insulation? What is the cost of time devoted to turning off lights or adjusting thermostats? What is the value of comfort from better-insulated homes, or the discomfort from setting thermostats to energy-saving temperatures? Are there meaningful psychological benefits or costs of using social comparisons to inspire or guilt people into conserving energy?
Home Energy Reports have two features that we leverage to conduct a social welfare analysis that considers the full range of recipient benefits and costs. First, they are a private good that can be sold. Second, the standard policy is to deliver them regularly, e.g. every two months, over several years. These two features mean that it is both possible and policy-relevant to measure willingness-to-pay (WTP) for future HERs in a sample of experienced past recipients. In simple terms, our approach is to send people one year of HERs, each of which has a similar structure but includes new conservation tips and updated energy use feedback, and then ask them how much they are willing to pay to receive HERs for a second year. Because these people have experience with HERs from the first year, we interpret their WTP as an accurate measure of their welfare from receiving more of them. We then use standard economic tools to evaluate the welfare effects of the second year of HERs, weighing consumer welfare gains against implementation costs and reductions in uninternalized externalities.
More specifically, we study a program providing HERs to about 10,000 residential natural gas consumers at a utility in upstate New York over the 2014-2015 and 2015-2016 winter heating seasons. At the end of winter 2014-2015, we surveyed all HER recipients by mail and phone with multiple price lists (MPLs) that trade off next winter’s HERs against checks for different amounts of money. We designed the MPL to allow negative WTP as well as positive WTP, as some households opt out of HER programs even though the reports are free. The MPLs were incentive compatible: depending on their responses, each household will receive a check from the utility and/or more HERs in winter 2015-2016. Because the initial HER recipients were randomly selected from a larger population, we can easily estimate the effects of HERs on energy use, which we then translate into a value of uninternalized externalities using parameters such as the social cost of carbon.
We find that the average household is willing to pay just under $3 for a second year of Home Energy Reports. While most people like HERs, 35 percent have weakly negative WTP; that is, they prefer not to be nudged even if the nudge is free. In support of the usual revealed preference assumption, the data suggest that WTP is a reliable measure of how much people like HERs: for example, WTP is highly correlated with qualitative evaluations of the HERs and with beliefs about savings made possible by future HERs. We estimate that WTP equals about 51 percent of retail energy cost savings, meaning that the remaining 49 percent represents the net financial, time, comfort, and psychological costs required to generate the energy savings. This high ratio of energy savings to costs suggests that, leaving aside the implementation cost, HERs provide privately-useful conservation information and/or psychological benefits. However, this 49 percent “non-energy cost” is not included in previous HER evaluations, nor in most evaluations of similar nudges in other domains.

[5] These include Violette, Provencher, and Klos (2009), Ashby et al. (2012), Integral Analytics (2012), KEMA (2012), Opinion Dynamics (2012), and Perry and Woehleke (2013), among many others.
Our main estimates suggest that the second year of this HER program increases social welfare by $0.70 per household. However, the standard approach of ignoring non-energy costs overstates this welfare gain by a factor of five. We find the same qualitative results in a more speculative calculation in which we generalize the 49 percent non-energy cost rule of thumb to the full course of a typical HER program: under this assumption, the typical program likely increases welfare, but ignoring non-energy costs overstates welfare gains by a factor of 2.4.

The nudge’s welfare effects are driven down by the fact that about 59 percent of nudge recipients are not willing to pay the social marginal cost of the nudge, including many who have negative WTP. On the other hand, more than 30 percent of recipients are willing to pay more than twice the social marginal cost. A natural response to heterogeneous valuations would be to price the nudge at expected net social cost and let people opt in if they want to. In this context, however, inertia is extremely powerful, even though HERs involve much lower stakes than contexts such as health insurance and retirement savings plans where default settings are known to be powerful, as studied by Madrian and Shea (2001), Kling et al. (2012), Handel (2013), Ericson (2014), and others. We show that even under generous assumptions, an opt-in program is unlikely to enroll enough people to generate larger welfare gains than the current opt-out policy. Instead, we train a simple machine learning algorithm to set a “smart default,” that is, to target the program at consumers who would generate the largest welfare gains if nudged. The smart default approach can more than double the welfare gains, holding constant the number of nudge recipients.
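The targeting idea can be sketched in a few lines. This is a hedged illustration only: the predicted gains below are hypothetical placeholder numbers, and the paper's actual machine learning algorithm and household features are not reproduced here.

```python
# Sketch of a "smart default": rank households by a predicted welfare gain
# from being nudged and target the top n, holding the number of recipients
# fixed. The predictor is a stand-in for the paper's actual algorithm.

def smart_default(predicted_gains, n_recipients):
    """Return indices of the n households with the largest predicted gains."""
    ranked = sorted(range(len(predicted_gains)),
                    key=lambda i: predicted_gains[i], reverse=True)
    return sorted(ranked[:n_recipients])

# Hypothetical per-household predicted welfare gains (dollars)
gains = [0.7, -0.2, 2.1, 0.1, 1.5]
print(smart_default(gains, 2))  # households 2 and 4 are targeted
```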
These results have important but nuanced implications for energy policy. Many utilities send
HERs to help comply with regulations called Energy Efficiency Portfolio Standards (EEPS), which
require utilities to induce a specific quantity of energy savings each year. While this paper finds
that net benefits of HERs are less than previously reported, benefit-cost analyses of alternative energy efficiency programs such as home retrofits may also suffer from systematic biases.[6] Thus, substituting to alternative programs that have not been subjected to a complete social welfare analysis may not be better than continuing an HER program. At a minimum, our results suggest that there is much work to be done to correctly measure the welfare effects of energy efficiency programs.

[6] One potential source of systematic bias is that actual energy savings may differ from simulation-based assumptions; see Nadel and Keating (1991) and more recent studies such as Allcott and Greenstone (2015) and Fowlie, Greenstone, and Wolfram (2015). A second source of bias is that, according to Kushler et al. (2012), only 30 percent of energy efficiency programs measure non-energy benefits and costs such as the financial, time, and utility costs discussed above. Depending on the program, these factors could bias welfare estimates in either direction.
We are not the first or only researchers to consider the welfare effects of nudges. A handful of previous empirical and theoretical analyses of behaviorally-motivated policies have recognized the difference between effects on behavior and effects on welfare, including Carroll, Choi, Laibson, Madrian, and Metrick (2009) and Bernheim, Fradkin, and Popov (2015) on optimal retirement savings plan defaults; Handel (2013) on insurance plan choice; Bhattacharya, Garber, and Goldhaber-Fiebert (2015) on exercise commitment contracts; and Reyniers and Bhalla (2013) and Cain, Dana, and Newman (2014) on charitable giving. There is an active literature debating the welfare gains from cigarette graphic warning labels, including Weimer, Vining, and Thomas (2009), FDA (2011), Chaloupka et al. (2014), Ashley, Nardinelli, and Lavaty (2015), Chaloupka, Gruber, and Warner (2015), Cutler, Jessup, Kenkel, and Starr (2015), Jin, Kenkel, Liu, and Wang (2015), and others. Even among these papers that are grounded in a welfare framework, however, most do not actually implement an empirical social welfare analysis of a nudge because measuring consumer welfare can be so challenging.
Although not a study of a nudge intervention, DellaVigna, List, and Malmendier (2012) is similar in spirit: they point out that charitable donation appeals could increase utility by activating donors’ warm glow or instead decrease utility by imposing social pressure. They combine an “avoidance design,” which measures whether people avoid opportunities to donate, with a structural model, concluding that door-to-door fundraising drives can reduce welfare even as they raise money for charity. Herberich, List, and Price (2012) use the same design to show that both altruism and social pressure motivate people to buy energy efficient lightbulbs from door-to-door salespeople, and Andreoni, Rao, and Trachtman (2011) and Trachtman et al. (2015) use a different avoidance design to study motivations for charitable giving, although none of these latter three papers includes a social welfare analysis. Avoidance designs achieve the same conceptual goal as our MPL: both allow the analyst to observe people opting in or out of a nudge (or an opportunity to donate) at some cost. Our MPL design is especially useful, however, because it immediately gives a WTP, whereas avoidance behaviors require additional assumptions or structural estimates to be translated into dollars.
Section I formally defines a “nudge” and derives a formula for welfare effects. Sections II and
III present the experimental design and data. Sections IV and V present the empirical results
and social welfare calculation. Section VI evaluates targeting and opt-in policies, and Section VII
concludes.
I Theoretical Framework
This section lays out a simple theoretical framework that formalizes what we mean by a “nudge”
and derives an equation for welfare effects.
I.A Consumers and Producers
We model a population of heterogeneous consumers who derive utility from consuming a numeraire good x and a continuous choice e, which in our application is energy use. With slight modifications to the below, e could also represent healthful eating, exercise, using preventive health care, charitable giving, or other actions. e generates consumption utility f(e; α), where α is a taste parameter. To capture imperfect information or behavioral bias, we allow a factor γ that affects choice but not experienced utility. For example, γ could represent noise in a signal of an unknown production function for health or household energy services, or it could represent a mistake in evaluating the private net benefits of e, perhaps due to inattention or present bias. Consumers have perceived consumption utility f̂(e; α, γ), which may or may not equal f(e; α).
e is produced at constant marginal cost c_e and sold at constant price p_e, giving markup π_e = p_e − c_e per unit. For another application, one might extend the model to endogenize price. In our application, e is sold by a regulated utility that is allowed a constant markup over marginal cost, so π_e is exogenous and positive. In a health application, π_e could represent mispricing of health care from insurance. e imposes a constant externality φ_e per unit. Consumers have income y and pay a lump-sum tax T to the government.
We include a “moral utility” term M = m − µe. Following Levitt and List (2007), moral utility arises when actions impose externalities, are subject to social norms, or are scrutinized by others. This concept is especially appropriate for our setting, where energy production causes environmental externalities and Home Energy Reports scrutinize energy use and present social norms. The moral price µ can be thought of as a “psychological tax” or “moral tax” on e, as in Glaeser (2006, 2014) and Loewenstein and O’Donoghue (2006), or as fear of future consequences of e, as in Caplin (2003). A positive µ can also represent a moral subsidy for reducing e. To model a moral subsidy, imagine that consumers receive utility µ for every unit of e not consumed, up to m_s units, where m_s > e. Moral utility is then M = µ(m_s − e), which equals m − µe when we set m = µm_s. This framework can also allow moral utility to depend on consumption relative to a social norm s: if M = m_s − µ(e − s), this equals m − µe when we set m = m_s + µs. m also captures any “windfall” utility change, if recipients like or dislike the nudge regardless of e.
Let the vector θ = {y, α, γ, m, µ} summarize all factors that vary across consumers. We assume that utility is quasilinear in x, so f̂′ > 0, f̂′′ < 0, f̂′(0) = ∞, and the consumer maximizes

max_{x,e} Û(θ) = x + f̂(e; α, γ) + m − µe,    (1)

subject to budget constraint

y − T ≥ x + e·p_e.    (2)
Consumers’ equilibrium choice of e, denoted ˜e(θ), is determined by the following first-order condition:

f̂′(˜e; α, γ) − µ = p_e.    (3)

This equation shows that increasing the moral price µ can have the same effect on behavior as increasing the price p_e. However, we discuss below how a price increase and a moral price increase are very different from a welfare perspective.
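To see this behavioral equivalence concretely, here is a minimal numerical sketch of the first-order condition, under an assumed log form for perceived consumption utility; the parameter values (a, p_e, µ) are hypothetical and not from the paper.

```python
# Hypothetical illustration of first-order condition (3), assuming the log
# form f_hat(e) = a*ln(e), which satisfies f_hat' > 0, f_hat'' < 0, and
# f_hat'(0) = infinity. Then f_hat'(e) = a/e, and (3) gives a/e - mu = p_e,
# i.e. e = a / (p_e + mu).

def equilibrium_e(a: float, p_e: float, mu: float) -> float:
    """Solve f_hat'(e) - mu = p_e for the log-utility example."""
    return a / (p_e + mu)

baseline   = equilibrium_e(a=100.0, p_e=1.0,  mu=0.0)   # no moral price
moral_tax  = equilibrium_e(a=100.0, p_e=1.0,  mu=0.25)  # nudge raises mu
price_rise = equilibrium_e(a=100.0, p_e=1.25, mu=0.0)   # equivalent price increase

print(baseline, moral_tax, price_rise)  # moral_tax == price_rise < baseline
```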
Two market failures can cause the equilibrium ˜e(θ) to differ from the social optimum. First, γ (imperfect information or other factors) affects choice but not experienced utility. Second, the price p_e may differ from social marginal cost c_e + φ_e because of the externality φ_e and markup π_e. In the first best, p_e = c_e + φ_e and the consumer would maximize experienced utility U(θ) = x + f(e; α) + m − µe.
I.B Nudges
The policymaker can implement a nudge at cost C_n per consumer and maintains a balanced budget using lump-sum tax T = C_n. We formalize the nudge as a binary instrument n ∈ {0, 1} that changes consumers’ γ, m, and µ. Specifically, each consumer has possibly different potential outcomes θ_n for n = 0 vs. n = 1, in which γ, m, and µ could differ. We define Θ = {θ_0, θ_1} and let F(Θ) denote its distribution. In words, a nudge provides information, reduces bias, and/or persuades people by activating moral utility. This is intended to be consistent with the practical examples of Thaler and Sunstein (2008), and it is closely analogous to the formal definition in Farhi and Gabaix (2015).
I.C Private and Social Welfare Effects of Nudges
We define “pre-tax consumer welfare” as V(θ_n) = U(θ_n) + T, and we use ∆ to represent effects of a nudge, e.g. ∆V ≡ V(θ_1) − V(θ_0). The effect of the nudge on pre-tax consumer welfare is

∆V = −∆˜e · p_e + ∆f + ∆M.    (4)

Social welfare is consumer welfare plus profits minus the externality:

W(n) = ∫ [U(θ_n) + (π_e − φ_e) ˜e(θ_n)] dF(Θ).    (5)
The effect of the nudge on social welfare is

∆W = ∫ [∆V − C_n + (π_e − φ_e) ∆˜e] dF(Θ).    (6)
The first term in Equation (6) reflects the net benefit to consumers, ignoring the fact that they must pay for the nudge through the lump-sum tax. The second term, C_n, then accounts for the cost of the nudge. The final term reflects the change in the pricing distortion.
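As a concrete sketch, Equation (6) can be computed by averaging the per-consumer terms over the population; all numbers below are hypothetical placeholders rather than the paper's estimates.

```python
# Minimal sketch of the welfare calculation in Equation (6): each consumer
# contributes dV - C_n + (pi_e - phi_e) * d_e, and dW averages over consumers.

def delta_w(dV, d_e, C_n, pi_e, phi_e):
    """Average social welfare effect per consumer, per Equation (6)."""
    terms = [v - C_n + (pi_e - phi_e) * de for v, de in zip(dV, d_e)]
    return sum(terms) / len(terms)

# Hypothetical population: dV is WTP-based consumer welfare change (dollars);
# d_e is the change in energy use (negative = conservation).
dV  = [3.0, -1.0, 0.5, 2.5]
d_e = [-10.0, -2.0, 0.0, -8.0]
print(delta_w(dV, d_e, C_n=1.0, pi_e=0.05, phi_e=0.20))
```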
Nudges with the same effect on behavior ∆˜e and the same cost C_n (and thus the same cost effectiveness) can have very different effects on consumer welfare, and thus very different social welfare effects. Figure 1 presents several distinct mechanisms through which demand could shift from D_0 to D_1, giving the same ∆˜e < 0 as the equilibrium shifts from point a to point g.
First, imagine that there is no moral utility, and the nudge only sets f̂ = f, i.e. it only provides information or eliminates bias. In this example, D_0 represents perceived f̂, while D_1 represents true f. The nudge saves consumers money −∆˜e · p_e (rectangle acdg), which is only partially offset by the reduction in consumption utility f(e; α) (trapezoid bcdg). To a first-order approximation, the nudge generates ∆V ≈ −(1/2)(∆˜e)^2/(de/dp_e) > 0, i.e. it eliminates deadweight loss triangle abg.
Figure 1: Illustrating the Effects of a Nudge on Consumer Welfare

[Figure: demand curves D_0 and D_1 plotted against price p_e, with labeled points a, b, g, h, i, j, k, l, and m_s referenced in the welfare calculations below.]
Now imagine that f̂ = f without the nudge, and the nudge only raises the moral price from µ_0 = 0 to µ_1, generating the same ∆˜e. In this example, D_0 reflects consumption utility f(e; α), both with and without the nudge. As in the first example, this saves consumers money −∆˜e · p_e, but this is outweighed by the consumption utility loss shown by trapezoid acdh. In addition, moral utility M decreases by µ_1·˜e(θ_1), which is area ghji. In sum, the moral tax reduces consumer welfare by the same amount as a standard tax: ∆V = (1/2)(∆˜e)^2/(de/dp_e) − µ_1·˜e(θ_1) < 0, or trapezoid agij. Unlike a standard tax, however, the moral tax does not generate revenues; it simply reduces utility. The welfare effect is negative even if the first-best ˜e is achieved. Alternatively, the nudge could be a moral subsidy on every unit of e not consumed up to m_s. In this case, consumer welfare would change by ∆V = (1/2)(∆˜e)^2/(de/dp_e) + µ_1(m_s − ˜e(θ_1)) > 0, or trapezoid aklg. More broadly, the nudge can have unbounded positive or negative effects on ∆V unless further restrictions are placed on m.
This discussion highlights how traditional evaluation metrics can be misleading guides for policy decisions: large behavior change ∆˜e and low implementation cost C_n are neither necessary nor sufficient for a nudge to increase welfare.
I.D Estimation
In the remainder of the paper, we estimate Equation (6) for a specific nudge: Home Energy Reports. We estimate the change in energy use ∆˜e by implementing HERs as a randomized controlled trial. We use outside estimates of the energy use externality φ_e, and we estimate the markup π_e and nudge cost C_n from pricing and cost data. To estimate the change in consumer welfare ∆V, we elicit willingness-to-pay for the nudge. In doing this, we must assume that our experimental design correctly elicits WTP and that consumers are “sophisticated” in the sense that their WTP for the nudge equals its true effect on their welfare. Sections II-IV present evidence on the plausibility of this assumption, and we formalize it before performing the welfare analysis in Section V.
II Experimental Design
The Opower Home Energy Report is a one-page letter (front and back) with two key features, illustrated in Figure 2. The Social Comparison Module in Panel (a) compares a household’s energy use to that of its 100 geographically nearest neighbors with similar house sizes whose energy use meters were read on approximately the same date. In the neighbor comparison graphs, “All Neighbors” refers to the mean of the neighbor distribution, while “Efficient Neighbors” refers to the 20th percentile. To the right of the three-bar neighbor comparison graph is a box presenting “injunctive norms” intended to signal virtuous behavior (Schultz et al. 2007): consumers earn one smiley face for using less than their mean neighbor and two smiley faces for using less than their Efficient Neighbors. The Action Steps Module in Panel (b) gives energy conservation tips; these suggestions are tailored to each household based on past usage patterns. The HERs are thus designed to both provide information and activate “moral utility.”
9
Figure 2: The Opower Home Energy Report

[Figure: a sample report. Panel (a), the Social Comparison Module, shows a “Last Month Neighbor Comparison” bar graph (You: 27 therms; All Neighbors: 28 therms; Efficient Neighbors: 19 therms) with injunctive-norm ratings (“GOOD”/“Great”), a “Last 12 Months Neighbor Comparison” with the annual extra cost of using more than the Efficient Neighbors, and definitions of the neighbor groups. Panel (b), the Action Steps Module, shows personalized conservation tips, such as opening shades on winter days, programming the thermostat, and weatherstripping windows and doors, each with an estimated annual dollar savings.]

Notes: The Home Energy Report is a one-page (front and back) letter including the Social Comparison Module in Panel (a) and the Action Steps Module in Panel (b).
Opower has implemented HER programs at 95 utilities in nine countries. We focus on one program at Central Hudson Gas and Electric, which serves 300,000 electric customers and 78,000 natural gas customers in eight New York counties. Like 23 other states, New York has an Energy Efficiency Portfolio Standard, which requires that utilities cause consumers to reduce energy demand by a specified amount each year (ACEEE 2015). As part of compliance with the standard, Central Hudson had already planned a multi-year Home Energy Report program for residential natural gas customers. Central Hudson and Opower agreed to modify the program to incorporate this study.
Why do utilities typically send many HERs over multiple years instead of stopping after the first HER or first year? The reason is that in practice, continued HERs cause incremental conservation (Allcott and Rogers 2014). The continuing effects likely arise both because additional HERs are a motivational reminder and because they provide new information. Indeed, 49.5 percent of households saw their ranking relative to their mean neighbor or Efficient Neighbors change across reports in the first year of the Central Hudson program we study.[7] Furthermore, the energy conservation tips change with every report. It is thus unlikely that the first HER provides the bulk of the informational or motivational benefits, and it is not obvious to what extent consumers would value the first HERs vs. later HERs differently.
Figure 3 summarizes the experimental design. Starting with an eligible population of 19,927 households, Opower randomly assigned half to treatment and half to control. The HER treatment group received up to four HERs during the “heating season” from late October 2014 through late April 2015. Central Hudson employees read each household’s natural gas meter every two months, and an HER was generated and mailed shortly after each meter read in order to provide timely and relevant information. Like almost all other HER programs, this is an “opt out” program, so households continue to receive HERs unless they contact the utility to opt out. Of the treatment group households, 525 were not sent any HERs for standard technical reasons, such as not having enough neighbors to generate valid comparisons; we did not survey these households. Some households received fewer than four HERs for the same technical reasons.
Opower included a one-page survey and postage-paid Business Reply Mail return envelope in
the same envelope as the final HER of the 2014-2015 heating season. Figure 4 reproduces the
survey. The first seven questions were a multiple price list (MPL) that asked recipients to trade off
four more HERs with checks for different amounts of money. The responses can be used to bound
willingness-to-pay. For example, consumers who prefer “four more Home Energy Reports plus a $9
check” instead of “a $10 check” value the four HERs at $1 or more. Consumers who prefer “a $10
check” instead of “four more Home Energy Reports plus a $5 check” value the four HERs at $5 or
less. A consumer who answered as in these two examples therefore has WTP between $1 and $5.
⁷ These changes occur largely because of standard month-to-month variation in household energy use, not due to conservation actions induced by the HERs. The average treatment effect of HERs is very small relative to the standard within-household and between-household variation.
Figure 3: Experimental Design
[Flow diagram: 19,927 households are randomly assigned 1/2 to control and 1/2 to the Report Recipient group. The recipient process is: 1. Four Reports (October 2014-April 2015); 2. First mail survey (in final Report); 3. Follow-up mail survey (own envelope, May); 4. Phone survey (June-August); 5. Next four Reports or check (later 2015). Recipients are further split into a base group (first mail survey only, 1/3) and a follow-up group (2/3).]
The survey letters included three variations intended to remind consumers of different features of the HERs. Figure 4 was the “Standard” version. In the “Comparison” version, the sentence “Remember that Home Energy Reports compare your energy use to your neighbors’ use” was added after “we want to know what you think about them” in the introductory paragraph. In the “Environmental” version, “Remember that Home Energy Reports help you to reduce your environmental impact” was added in that same place.
In a typical Opower HER program, around one percent of consumers dislike HERs enough to
take the time to opt out. If time has any positive value, this implies a negative WTP for HERs.
To correctly measure the distribution of WTP in such an opt-out program, it is thus necessary to
allow consumers to reveal negative WTP. We designed the MPL to do this, by asking consumers
to choose between “four more HERs plus a $10 check” and checks of less than $10. For example,
consumers who choose “a check for $9” instead of “four more HERs plus a $10 check” are giving up
$1 to not receive four more HERs, meaning that their WTP must be no greater than $-1. Answers
to the seven-question MPL place a respondent’s WTP into eight ranges, which are symmetric about
zero: (−∞, −9], [−9, −5], [−5, −1], [−1, 0], [0, 1], [1, 5], [5, 9], and [9, ∞).
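As an illustration, the mapping from MPL responses to WTP bounds can be sketched in code. This is a minimal sketch: the question-to-threshold assignment below is our inference from the survey design (question k pits a plain check against "4 more Home Energy Reports plus a check," and the two examples in the text pin down the implied dollar thresholds), and the function name is ours.

```python
# Thresholds implied by the seven MPL questions (assumed ordering, for
# illustration): preferring the Reports option at question k reveals
# WTP >= t_k; preferring the plain check reveals WTP <= t_k.
THRESHOLDS = [-9, -5, -1, 0, 1, 5, 9]

def wtp_range(chose_reports):
    """Map seven booleans (True = preferred the Reports option) to WTP
    bounds (lo, hi) in dollars; None marks an unbounded end."""
    lo, hi = None, None
    for t, reports in zip(THRESHOLDS, chose_reports):
        if reports:
            lo = t if lo is None else max(lo, t)
        else:
            hi = t if hi is None else min(hi, t)
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("internally inconsistent responses")
    return lo, hi
```

A respondent who prefers the Reports option on the first five questions and the plain check on the last two is placed on [1, 5], matching the $1-to-$5 example in the text; inconsistent answer patterns raise an error.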
The survey’s final question was, “Think back to when you received your first Home Energy
Report. Did you find that you used more or less energy than you thought?” This measures the
extent to which HERs caused consumers to update beliefs about relative usage.
Figure 4: Mail Survey
Tell us what you think — and earn a check for up to $10!
Central Hudson has been sending you Home Energy Reports since last fall, and we want to know what you think about them. Would you take a moment to complete the survey below? For each question, please fill in one box with your answer.
What happens next?
1. When you’re finished, mail the survey back to us in the enclosed prepaid envelope.
2. We will use a lottery to draw one of the first seven questions, and we’ll mail you what you chose in that question — either a check or a check plus four more Home Energy Reports.
Thank you! Your participation will help us make these reports even more useful for you. If you have any questions, please email us at HERSurvey@cenhud.com or call (845) 486-5221.
[Questions 1-7 are the multiple price list, each asking “Which would you prefer?” between a check alone ($1, $5, $9, or $10) and “4 more Home Energy Reports PLUS” a check ($1, $5, $9, or $10), with one side of each question fixed at $10. Question 8 asks: “Think back to when you received your first Home Energy Report. Did you find that you used more or less energy than you thought?” with responses Much less, Somewhat less, About what I thought, Somewhat more, and Much more. The survey shows a masked account number.]
A randomly-selected 2/3 of HER recipients were sent a follow-up mail survey on May 26th,
2015. This was not part of an HER and was sent through a separate vendor, so the outbound
envelope had a different originating address than the HERs. The survey and Business Reply Mail
return envelope were identical to the first mail survey. The “base group” was not sent the follow-up
mail survey.
In June, July, and early August 2015, an independent survey research firm surveyed the entire
HER treatment group by phone. Each phone number was called up to eight times until the
household completed the survey or declined to participate. The beginning of the phone survey
parallels the mail survey, except that we used a three-question version of the same MPL that
dynamically eliminated questions whose answers were implied by earlier answers.⁸ We then asked
a belief update question parallel to the mail survey and a series of additional questions to elicit
beliefs about energy cost savings and qualitative evaluations of the HERs. Appendix A presents
the full phone survey questionnaire. A condensed version is:
1. [Multiple price list]
2. Did your first Report say you were using more or less than you thought?
3. Do you think that receiving four more Reports this fall and winter would help you reduce
your natural gas use by even a small amount?
(a) If Yes: How much money do you think you would save on your natural gas bills if you
receive four more Reports?
4. How much money do you think the average household has saved since last fall?
5. How would you like the Reports if they didn’t have the neighbor comparison graph?
6. Do the Reports make you feel inspired, pressured, neither, or both?
7. Do the Reports make you feel proud, guilty, neither, or both?
8. Do you agree/disagree with: “The Reports gave useful information that helped me conserve
energy.”
9. Do you have any other comments about the Reports that you’d like to share?
⁸ We began by asking question 4 from the mail survey. If the respondent preferred HERs plus a $10 check, we asked question 6. If the respondent preferred HERs plus a $5 check on question 6, we asked question 7, whereas if the respondent preferred a $10 check on question 6, we asked question 5. If the respondent preferred a $10 check on question 4, we asked question 2. If the respondent preferred HERs plus a $10 check on question 2, we asked question 3, whereas if the respondent preferred a $5 check on question 2, we asked question 1.
If the phone survey respondent reported that he or she had already returned the mail survey,
the phone survey skipped directly to question 3. Questions 6 and 7 were designed to measure
whether the HERs tend to generate positive or negative affect, to provide suggestive evidence on
whether HERs affect “moral utility” or act as a psychological tax or subsidy. The words “inspired,”
“proud,” and “guilty,” were drawn from the Positive and Negative Affect Schedule (Watson, Clark,
and Tellegan 1988), a standard measure in psychology. We added the word “pressured” because
we hypothesized that it might be relevant in this context.
Both the mail and phone MPLs clearly stated at the outset that they were incentive-compatible.
The mail survey stated, “We will use a lottery to draw one of the first seven questions, and we’ll mail
you what you chose in that question.” The phone survey script stated, “These are real questions:
Central Hudson will use a lottery to pick one question and will actually mail you what you chose.”
Once all survey responses were collected, one of the seven MPL questions was randomly selected for
each respondent, and the respondent received what he or she had chosen in that question: either
a check from Central Hudson or a check plus four more HERs in the 2015-2016 heating season.⁹
The surveys did not state the consequences of non-response. In reality, households that did not
respond to the survey did not receive a check, and they are scheduled to receive four more HERs
in the 2015-2016 heating season.
III Data
There are five data sources: the utility’s natural gas bill data, neighbor comparisons, customer
demographic data, mail surveys, and phone surveys.
Central Hudson reads customers’ natural gas meters on very regular bi-monthly cycles: 95
percent of billing period durations are between 55 and 70 days. Central Hudson measures natural
gas use in hundred cubic feet (ccf). As we discuss further in Section V, Central Hudson uses a
decreasing block tariff, and the average marginal retail price during the post-treatment period is
$0.98 per ccf. We observe gas use for each household in treatment or control for all meter read
dates between September 1, 2013, and May 22, 2015.
The key feature of the Social Comparison Module in Panel (a) of Figure 2 is a bar graph
comparing the household’s use on its previous bill to the mean and 20th percentile of the distribution
of neighbors’ use. We observe that mean and 20th percentile for all HERs, including HERs that
control group households would have received.
Table 1 presents demographic variable summary statistics. “Baseline use” is mean use per day across all meter read dates in the first 365 days of our sample, from September 2013 through August 2014. Hybrid auto share is the share (from 0 to 100) of vehicles registered in the census tract in 2013 that were hybrids. All other variables are from a demographic data vendor and are matched to the utility account holder. These variables are from a combination of public records, survey responses, online and offline purchases, and statistical predictions, and most are likely measured with error. Some households in the population could not be matched to demographic data, in which case we use mean imputation. Neither attenuation bias nor imputation affects our arguments, because we do not use these variables to estimate unbiased covariances. Instead, we will use these covariates primarily for prediction.

⁹ Because Central Hudson needs to continue the program to satisfy regulatory requirements under the Energy Efficiency Portfolio Standard, we placed 98.6 percent probability on the first question, on which 94 percent of respondents chose HERs. The remaining six questions were each selected with 0.2 percent probability.
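The mean imputation described above can be sketched as follows. This is an illustration only; the field names are hypothetical and the actual matching procedure is not shown.

```python
import math

def impute_means(rows, fields):
    """Replace missing entries (None or NaN) in each field with that field's
    mean over the observed values: simple mean imputation."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    for f in fields:
        observed = [r[f] for r in rows if not missing(r[f])]
        mean = sum(observed) / len(observed)
        for r in rows:
            if missing(r[f]):
                r[f] = mean
    return rows
```

For example, a household with missing income is assigned the mean income of the matched households; this leaves covariate means unchanged, which is harmless when the covariates are used only for prediction.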
Table 1: Demographic Variable Summary Statistics
Variable Obs Mean SD Min Max
Baseline use (ccf/day) 19,898 2.09 1.64 0 16.0
Income ($000s) 15,557 94.4 81.9 10 450
Net worth ($000s) 15,557 195 288 -30 1500
House value ($000s) 19,927 215 192 0 2527
Education (years) 19,475 13.6 2.44 10 18
Male 16,811 0.51 0.50 0 1
Age 17,282 50.7 16.1 19 99
Retired 16,728 0.04 0.20 0 1
Married 15,406 0.59 0.49 0 1
Rent 17,561 0.30 0.46 0 1
Single family home 17,734 0.68 0.46 0 1
House age 14,885 59.7 40.2 0 115
Democrat 18,080 0.16 0.55 -1 1
Hybrid auto share 19,728 1.03 2.78 0 18.2
Green consumer 18,883 0.15 0.35 0 1
Wildlife donor 16,728 0.06 0.24 0 1
Profit score 19,784 0.00 1.00 -1.65 2.09
Buyer score 14,967 0.00 1.00 -2.03 1.47
Mail responder 17,734 0.47 0.46 0 1
Home improvement 16,728 0.13 0.33 0 1
Notes: This table summarizes the demographic variables. Baseline use is mean natural gas use (in hundred
cubic feet per day) between September 2013 and August 2014. Hybrid auto share is the Census tract average.
All other variables are from a demographic data provider.
These data may overestimate household income, but the population is relatively wealthy: ac-
cording to Census data, the mean household is in a census block group with median household
income of $64,000. Education is top-coded at 18 years for people with any graduate degree. Demo-
crat takes value 1 for Democrats and -1 for Republicans. Green consumer is a binary measure of
environmentalism based on income, age, and purchases of organic food, energy efficient appliances,
and environmentally responsible brands. Wildlife donor is an indicator for whether the consumer
has contributed to animal or wildlife causes. These two variables could proxy for environmentalism
and thus interest in energy conservation. Profit score and buyer score measure the consumer’s likelihood of paying debts and making purchases; we normalize both to mean 0, standard deviation 1.
Mail responder is an indicator for whether anyone in the household has purchased by direct mail.
Home improvement is an indicator for home improvement transactions or product registrations,
which could proxy for interest in making energy-saving improvements in response to HERs.
Our household covariates X are these same variables, except that we take natural logs of income, net worth, house value, age, and house age.¹⁰ Appendix Tables A1 and A2 confirm that these covariates are not more correlated with HER recipient group or survey group assignment than would be expected by chance.
Table 2 summarizes response rates. Households that were sent the follow-up mail survey were
more than twice as likely to respond as base group households, who only received the survey in
their final Home Energy Report. In total, 899 households (9.5 percent of households that were surveyed) responded to the mail survey, 1,690 households (17.9 percent) completed the phone survey, and 2,312 households (24.5 percent) responded to one or both surveys.
Table 2: Survey Response Rates
Response rate (%)
Mail survey 9.5
Base mail survey group 4.5
Follow-up mail survey group 12.0
Phone survey 17.9
Both mail and phone surveys 2.9
Mail and/or phone surveys 24.5
Figure 5 summarizes responses to the qualitative evaluations of the HERs from the phone survey.
Forty-nine percent would like HERs less if the neighbor comparisons were removed, against only 11
percent who would like them more. Seventy-three percent of respondents agree or strongly agree
that HERs provide useful information. For most respondents, the HERs did not generate positive or
negative affect: 57 percent said that the HERs made them feel neither “inspired” nor “pressured,”
and 63 percent said that HERs made them feel neither “proud” nor “guilty.” When the HERs did
induce some positive or negative affect, it was much more likely to be positive (inspired or proud)
instead of negative (pressured or guilty). These qualitative results suggest that most people “like”
HERs, i.e. that they would want HERs if they were free.
¹⁰ Some households have negative net worth, so before taking the natural log, we add a constant to all observations such that the minimum value is $1.
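The shift-then-log transformation described in footnote 10 can be sketched in one short function (the function name is ours; this is an illustration, not the paper's code):

```python
import math

def shifted_log(values):
    """Add a constant so the minimum observation equals 1, then take natural
    logs; handles negative values such as negative net worth."""
    shift = 1.0 - min(values)
    return [math.log(v + shift) for v in values]
```

The minimum observation maps to log(1) = 0, and the transformation preserves the ordering of all observations.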
Figure 5: Qualitative Evaluations of Home Energy Reports
[Four histograms of the percent of respondents choosing each answer: “How would you like Reports without neighbor comparisons?” (Much less to Much more); “The Reports gave useful information” (Strongly disagree to Strongly agree); and two panels of “Do the Reports make you feel ...” (Inspired / Pressured / Neither / Both, and Proud / Guilty / Neither / Both).]
Notes: This figure presents qualitative evaluations of Home Energy Reports from the phone survey.
III.A Constructing Willingness-to-Pay
Complete and internally-consistent responses to the multiple price list allow us to place each respondent’s willingness-to-pay into one of eight ranges. For simplicity, we assign one unique WTP for each range. For the six interior ranges, we assign the mean of the endpoints. For example, we assign a WTP of $-3 for all responses on [−5, −1] and a WTP of $0.50 for all responses on [0, 1]. For the unbounded ranges, i.e. WTP less than $-9 or greater than $9, we assume that the conditional distribution of WTP is triangular, with initial density equal to the average density on the adjacent range.¹¹ This gives $14.48 and $-12.36, respectively, as the conditional mean WTPs on [9, ∞) and (−∞, −9]. We also present results under alternative assumptions.

For the 2.9 percent of households that responded to both the phone and mail surveys, we use the phone survey WTP in order to be consistent with the phone survey’s additional qualitative questions. This decision does not significantly affect our welfare estimates, because the mean WTP for these households was almost identical on the mail and phone MPLs. Eighty-seven households returned more than one mail survey with valid WTP; in these cases we use the first survey we received.

Ten households opted out during the program’s first year. These households will not receive HERs in the program’s second year, so we will exclude them from the mean WTP estimation in Table 6 and from the welfare analyses in Sections V-VI.

III.B Do the Surveys Correctly Measure Willingness-to-Pay?

While standard in academic economics and lab settings, multiple price list surveys are relatively unusual in field settings. One concern in designing this study was that respondents would not understand the MPL, rendering WTP estimates noisy or meaningless. We devoted substantial effort to designing easily-understandable surveys and to piloting the mail and phone instruments.

Table 3 shows that the vast majority of returned mail surveys were filled out in a way that allows us to construct a valid WTP. Of the mail surveys, 14.7 percent were incomplete, usually because the respondent answered only one of the seven questions, and 11.1 percent of phone respondents heard the introduction to the MPL but terminated the interview before completing all three questions. Only 2.1 percent of mail MPL responses were complete but internally inconsistent, and three mail MPL responses (0.3 percent) were both incomplete and internally inconsistent. Because the phone MPL was shortened by not asking questions whose answers were implied by previous responses, there was no opportunity to be internally inconsistent on the phone survey. These figures suggest that consumers generally understood the MPL and gave meaningful answers.¹²

Table 3: Multiple Price List Response Statistics
Mail Phone
Percent incomplete 14.7 11.1
Percent complete and internally inconsistent 2.1 N/A
Percent complete and internally consistent 83.2 88.9

¹¹ For example, the density on [5, 9] is 2.49 percent of respondents per dollar, and the mass above $9 is 20.43 percent of respondents. We assume that this 20.43 percent of respondents is distributed triangular on [9, ∞), with maximum density of 2.49 percent per dollar at $9 decreasing to zero density above some upper bound. This gives an upper bound of $25.43. The mean of WTP on [9, ∞) is thus $14.48. The mean WTP on (−∞, −9] is determined by an analogous calculation, given that the density on [−9, −5] is 1.25 percent per dollar and the mass below $-9 is 6.31 percent.
¹² We listened to about 25 early phone survey interviews. Because the MPL questions are unusual, respondents would sometimes pause to process the first question but would then provide a considered answer to that and the next two MPL questions.
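The triangular-tail calculation in footnote 11 can be verified directly. The sketch below uses the footnote's own inputs; for a triangle starting at the boundary with a given height and falling linearly to zero, the area pins down the width, and the centroid sits one-third of the width from the boundary.

```python
def triangular_tail(boundary, density, mass, sign=1):
    """Bound and conditional mean of a triangular tail that starts at
    `boundary` with height `density` (mass per dollar) and falls linearly
    to zero; its total area equals `mass`."""
    width = 2.0 * mass / density              # area = 0.5 * density * width
    bound = boundary + sign * width
    mean = boundary + sign * width / 3.0      # centroid of the triangle
    return bound, mean

# Upper tail: density 2.49% per dollar at $9, mass 20.43% above $9.
upper_bound, upper_mean = triangular_tail(9, 0.0249, 0.2043)
# Lower tail: density 1.25% per dollar at -$9, mass 6.31% below -$9.
lower_bound, lower_mean = triangular_tail(-9, 0.0125, 0.0631, sign=-1)
```

These inputs reproduce the footnote's figures: an upper bound near $25.43 and conditional means near $14.48 and $-12.36.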
WTP is very strongly correlated with the qualitative assessments of the HERs from questions 3-9 of the phone survey. As would be expected, WTP is strongly positively correlated with reporting that future HERs would save them more money (question 3), feeling inspired and proud (questions 6 and 7), agreeing that HERs give useful information (question 8), and with positive additional comments about the HERs (question 9).¹³ Also as expected, WTP is strongly negatively correlated with preferring that HERs not have neighbor comparisons (question 5) and with feeling pressured (question 6). The only result that we did not expect was that feeling guilty is positively associated with WTP, but the relationship is not significant after conditioning on predicted savings, which suggests that consumers do not like guilt per se; rather, they like guilt only because it helps them reduce expenditures. See Appendix Table A4 for formal results.
As we detail below, 35 percent of respondents reported negative WTP. In Appendix Table A5,
we confirm that negative WTP is strongly associated with the same set of qualitative assessments in
expected ways. Furthermore, all six households that opted out and also responded to the survey had
negative WTP. These strong correlations build confidence that both the MPL and the qualitative
questions elicited meaningful responses.
Eighty-seven households returned more than one mail survey with valid WTP. These could have been filled out by different people in the same household, or by one person who wanted to ensure that his or her response was received. Thus, one might expect responses to be correlated, but not perfectly correlated. WTP is indeed very highly correlated across the two responses within these households, implying that people understood the mail MPL well enough that responses were consistent within a person or household. See Appendix Table A6 for formal results.
In total, 277 households responded to both the phone and mail surveys, of which 224 have valid WTP from both surveys and 259 responded to the belief update question on both surveys. Because the phone survey called for skipping these questions if the respondent reported already returning the mail survey, it seems likely that duplicate mail and phone responses came from different people in the same household. Here again, one might thus expect responses to be correlated, but not perfectly correlated. Appendix Table A6 confirms this: WTP, an indicator for negative WTP, and belief updates are all strongly correlated within household across the mail and phone surveys. WTP and answers to the belief update question within household are almost equally strongly correlated across the mail and phone surveys, which suggests that the MPL questions to elicit WTP were no more confusing or cognitively demanding than the belief update question, where responses were on the familiar Likert scale. Across the 224 households with valid WTP from both surveys, the mean WTP and the share of negative WTPs are almost exactly identical between the mail and phone surveys. This implies that neither survey format generated an idiosyncratic bias in mean WTP.

¹³ In total, 456 phone survey respondents offered comments in response to question 9. Of these, 170 were positive, such as “They’re terrific. I like the way they’re laid out and easy to understand,” and “I think you did it right. It has all the information owners need. I think it’s an excellent idea,” and “Detailed and a great thing. Helps me monitor my usage.” 213 were neutral, often including complaints about high energy prices. 73 were negative, such as “I do not understand it; it does not make sense,” and “It’s a waste of paper. If they did not send those reports maybe they could lower the delivery charges,” and “The money would be better spent reducing the cost of energy rather than sending the reports.”
In general, these results suggest that respondents understood the MPLs and that the survey
instruments correctly elicited WTP. Here we address some remaining reasons why that might not
be the case.
First, time discounting could affect WTP. For example, if respondents have annual discount
rates of six percent and thought that checks would arrive six months before the HERs’ benefits,
their WTP would be about three percent lower than if they thought that checks would arrive at the
same time as the benefits. Such a small difference would not be enough to meaningfully affect the
welfare calculations below. Conceptually, we want all components of welfare to be discounted to
the time at which the implementation costs are incurred for the second year of HERs. In practice,
the checks will arrive in late 2015, although we intentionally did not say this on the survey because
we did not want to make time discounting salient.
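The magnitude of this discounting effect can be checked directly. The sketch below uses the text's hypothetical six-percent annual rate and six-month gap; the variable names are ours.

```python
# With a 6 percent annual discount rate and checks arriving six months
# before the HERs' benefits arrive, stated WTP understates contemporaneous
# WTP by roughly 3 percent.
annual_rate = 0.06
years_early = 0.5
discount_factor = (1.0 + annual_rate) ** years_early   # about 1.0296
pct_lower = (1.0 - 1.0 / discount_factor) * 100.0      # about 2.9 percent
```

A wedge of this size is an order of magnitude smaller than the WTP ranges on the MPL, so it cannot meaningfully move the welfare calculations.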
Second, WTP might be lower if respondents were paying out of pocket instead of trading off against an unexpected windfall from a check. If any such difference results from a behavioral bias, it is not obvious which WTP to respect for welfare analysis.
Third, WTP might be higher with per-month subscription pricing instead of a one-time check.
Because the monetary amounts are small and respondents pay for HERs from a future windfall
instead of from their existing funds, it is unlikely that credit constraints could explain a preference
for subscription payments. If WTP differs with subscription pricing vs. a one-time check due to a
behavioral bias such as focusing bias (Koszegi and Szeidl 2013), it is not clear that the subscription
pricing WTP would be the one to respect for welfare analysis.
Fourth, Beauchamp et al. (2015) demonstrate a compromise effect in multiple price lists; that is, people tend to favor the middle option of an MPL. Because our phone MPL questions
were given sequentially, however, this concern does not apply to our phone MPL. Furthermore,
for the households that responded to both phone and mail MPLs, the mean WTPs from the
two instruments are indistinguishable. This suggests that the mail MPL is also unaffected by a
compromise effect.
Models of contextual inference such as Kamenica (2008) suggest two reasons why our mail MPL
would not be biased by a compromise effect. First, there is little imperfect information: the MPL
asks simple questions about a familiar good and, unlike Beauchamp et al. (2015), there are no risky
prospects that could increase cognitive complexity. Second, consumers were unlikely to infer that
they are “middlebrow” relative to the bounds of the MPL: the distribution of responses suggests
that the first two questions had relatively obvious answers (very few people were willing to pay
significant amounts to avoid HERs) while the last two questions did not (many people were in the
top two WTP ranges).
IV Empirical Analysis
In this section, we estimate parameters needed for the welfare analysis prescribed by Equation
(6). We begin by estimating the treatment effects on energy use, which determine the externality
benefits and profit losses. We then calculate average WTP, which will be our measure of the
consumer welfare effects.
IV.A Effects on Energy Use
To estimate the effect of Home Energy Reports on energy use, we limit the sample to post-
treatment data and control for pre-treatment usage, allowing the coefficient to vary over time.¹⁴
Post-treatment is defined as any meter read after the household’s first HER was generated. The
first HERs were generated on October 13th, 2014, and first HERs had been generated for 98 percent
of households by December 8th. We also observe generation dates for HERs that would have been
sent to the control group.
$Y_{it}$ is household $i$'s average natural gas use (in ccf/day) over the billing period ending on date $t$, and $R_i$ is a recipient group indicator variable. $\tilde{Y}_{it}$ is the average usage during the billing period ending 12 months prior, and $\nu_m$ allows separate coefficients on $\tilde{Y}_{it}$ by month. $\omega_{mq}$ is a vector of indicators for baseline usage quartile interacted with the month containing date $t$. The estimating equation is:

$$Y_{it} = \tau R_i + \nu_m \tilde{Y}_{it} + \omega_{mq} + \varepsilon_{it}. \qquad (7)$$
Standard errors are clustered by household to allow for arbitrary serial correlation.
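An estimator of this form with household-clustered standard errors can be sketched as follows. This is an illustration on simulated data, not the paper's replication code; the simulated panel, coefficient values, and function names are all ours.

```python
import numpy as np

def ols_clustered(y, X, clusters):
    """OLS point estimates with cluster-robust (CR0 sandwich) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the "meat" over clusters: sum_g (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Simulated panel: 500 households x 4 billing periods, a recipient
# indicator, and a stand-in for lagged usage; true "treatment effect" -0.03.
rng = np.random.default_rng(0)
n_hh, n_t = 500, 4
hh = np.repeat(np.arange(n_hh), n_t)
R = np.repeat(rng.integers(0, 2, size=n_hh), n_t).astype(float)
lag = rng.gamma(2.0, 1.0, size=n_hh * n_t)
y = 1.0 + 0.9 * lag - 0.03 * R + rng.normal(0.0, 0.3, size=n_hh * n_t)
X = np.column_stack([np.ones_like(lag), R, lag])
beta, se = ols_clustered(y, X, hh)
```

Clustering sums the score contributions within each household before forming the sandwich variance, which is what allows arbitrary serial correlation across a household's billing periods.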
Figure 6 presents estimates of the τ parameter, separately for each pair of months after the
baseline period ends on August 31, 2014. The several months of pre-treatment observations allow
us to test for spurious pre-treatment effects, and there are indeed zero statistical effects for meters
read in September and October. There are also zero statistical effects for meters read in November
and December. For the coldest part of winter, the billing periods ending in January through April,
the recipient group reduces gas use by about 0.04 ccf/day. The standard errors for May 2015
widen out substantially because the final meter read in our current data is May 22nd, so there are
only 22 days of reads underlying that data point instead of a full two months.
¹⁴ Natural gas use is highly seasonal: average consumption drops below 0.5 ccf/day in the summer and rises above 4 ccf/day in the winter. Thus, controlling for seasonal fluctuations is crucial for improving statistical efficiency. Note that estimating in logs and transforming the percent savings back into levels is not a consistent estimator of the level of average savings, due to Jensen’s Inequality. For this reason, Allcott (2011, 2015) and Allcott and Rogers (2014) estimate effects in levels.
Figure 6: Effects of Home Energy Reports on Natural Gas Use
[Event-study plot of the treatment effect (ccf/day) with 90% confidence intervals, by two-month meter-read period from September 2014 through May 2015; a vertical line marks when treatment begins.]
Notes: This figure presents estimates of Equation (7), allowing the treatment effect to vary by two-month periods. Dependent variable is natural gas use in ccf/day, where “ccf” means hundred cubic feet. For context, one ccf is worth about $0.98 at retail prices. Robust standard errors, clustered by household.
Table 4 presents estimates of Equation (7). Column 1 controls only for the 12-month lag usage $\tilde{Y}_{it}$, and columns 2-4 progressively add controls. The estimates are very stable. In column 4, our
primary estimate of the average treatment effect through May 22nd is a 0.0278 ccf/day decrease.
This sums to $5.52 of retail natural gas cost savings through that date. Control group natural gas
use averages 3.66 ccf/day in the post-treatment period, so the treatment effect amounts to 0.76
percent of counterfactual use. In percent terms, this is substantially less than the typical effect of
HERs on electricity use (Allcott 2015), but Opower’s natural gas-focused programs typically have
smaller percent effects. Given the small sample relative to other HER programs, it is not surprising
that the t-statistics are around 1.8 instead of larger.
Table 4: Effects on Natural Gas Use in the Program’s First Year
(1) (2) (3) (4) (5)
1(Report recipient) -0.0316 -0.0308 -0.0294 -0.0278 -0.0301
(0.0161)* (0.0162)* (0.0161)* (0.0162)* (0.0174)*
Observations 49,873 49,873 49,873 49,873 49,873
R² 0.819 0.822 0.825 0.827 0.825
12-month lag usage Yes Yes Yes Yes Yes
Month indicators Yes Yes Yes Yes
12-month lag use×month Yes Yes Yes
Baseline use quartile×month Yes Yes
Weights Duration Duration Duration Duration Duration × $\widehat{\Pr}(\mathrm{Responded}|X)$
Notes: This table presents estimates of Equation (7), using post-treatment data only. Dependent variable
is natural gas use in hundred cubic feet (ccf) per day. Control group sample mean usage is 3.66 ccf/day.
Robust standard errors, clustered by household, in parentheses. *, **, ***: statistically significant with 90,
95, and 99 percent confidence, respectively.
Column 5 presents estimates with the sample re-weighted on observables X to match the survey
respondents with valid WTP, using fitted probabilities from probit estimates in Appendix Table
A7. The effect is slightly, although not statistically significantly, larger, which suggests that
survey respondents have somewhat larger energy savings, perhaps because they are more engaged
with the HERs.
IV.B Willingness-to-Pay
Figure 7 presents the distribution of WTP, with separate bars for the mail vs. phone survey
responses. Fewer households responded via mail, so all mail bars are shorter. Mail respondents
also have slightly higher willingness to pay, with relatively less density in the negative range and
more in the positive range. Thirty-five percent of respondents reported weakly negative WTP,
although most of that group is close to indifferent: 56 percent of negative WTPs are between $0
and $-1. This dispersion in WTP, and in particular the result that a meaningful share of the
population is willing to pay to avoid being nudged, will motivate the analysis of opt-in programs
and targeting in Section VI.
Figure 7: Willingness-to-Pay for Home Energy Reports
[Histogram: percent of respondents (0 to 15 percent) in each WTP bin: -9 or less, [-9,-5], [-5,-1],
[-1,0], [0,1], [1,5], [5,9], 9 or more; separate bars for Mail and Phone respondents.]
Notes: This figure presents the histogram of willingness-to-pay for four more Home Energy Reports, with
all survey responses weighted equally.
Table 5 presents correlates of WTP. To simplify the presentation of the many X covariates,
column 1 presents the post-Lasso estimator; that is, we use Lasso for variable selection, then
present the OLS regression of WTP on the selected covariates; see Belloni and Chernozhukov
(2013). The correlations are intuitive: point estimates suggest that income and buyer score are
positively associated with WTP, retirees have lower WTP, and renters have lower WTP, likely
because they do not have the ability or incentive to make energy-saving capital stock changes in
response to HERs. People who have donated to animal and wildlife causes have higher WTP,
perhaps because this proxies for interest in environmental conservation.
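The post-Lasso procedure can be sketched on synthetic data. This is an illustrative toy (a coordinate-descent Lasso on made-up data), not the paper's replication code or the Belloni-Chernozhukov implementation:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Toy coordinate-descent Lasso (X columns standardized)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            scale = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / scale
    return beta

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)      # standardize before the Lasso step
y = 1.0 * X[:, 0] - 0.7 * X[:, 3] + rng.standard_normal(n)

# Step 1: Lasso for variable selection.
selected = np.flatnonzero(lasso_cd(X, y, lam=0.2) != 0)

# Step 2: OLS of the outcome on the selected covariates plus a constant.
Xs = np.column_stack([np.ones(n), X[:, selected]])
post_lasso_ols, *_ = np.linalg.lstsq(Xs, y, rcond=None)
```

The two-step structure mirrors column 1 of Table 5: the Lasso penalty zeroes out weak predictors, and the second-stage OLS removes the Lasso's shrinkage bias on the surviving coefficients.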
We carry out the welfare analysis for two populations: the subset of households that responded
to the MPL, and the entire set of households in the HER recipient group. The latter welfare
calculation requires extrapolating from respondents to non-respondents. Our primary approach to
extrapolating to the full HER recipient population is to use inverse probability weights (IPWs)
to re-weight the sample of respondents with valid WTP to match the full recipient population on
observables. See Appendix Table A7 for the probit estimates used for this reweighting.
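Mechanically, IPW re-weighting weights each respondent by the inverse of her fitted response probability. A stylized sketch with made-up numbers (in the paper, the probabilities come from the probit in Appendix Table A7):

```python
# Stylized inverse-probability-weighted (IPW) mean WTP.
# wtp and p_respond are illustrative numbers, not the paper's data.
wtp = [4.0, 3.0, -1.0, 2.0]        # respondents' reported WTP ($)
p_respond = [0.4, 0.3, 0.2, 0.3]   # fitted Pr(respond and valid WTP | X)

weights = [1.0 / p for p in p_respond]
ipw_mean = sum(w * v for w, v in zip(weights, wtp)) / sum(weights)
```

Households that look (on observables) like unlikely responders get large weights, so the weighted mean stands in for the full recipient population rather than the self-selected respondents.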
Table 5: Correlates of WTP and Their Correlation with Response
(1) (2)
Dependent variable: WTP Have WTP
Baseline use (ccf/day) 0.0912 0.0120
(0.101) (0.00897)
ln(Income) 0.0427 0.0254
(0.243) (0.0227)
Retired -1.905 0.103
(0.836)** (0.0798)
Married 0.684 -0.0108
(0.416)* (0.0370)
Rent -0.724 -0.109
(0.444) (0.0399)***
Single family home 0.289 0.0569
(0.425) (0.0384)
Wildlife donor 1.027 0.226
(0.623)* (0.0652)***
Buyer score 0.296 0.0438
(0.221) (0.0199)**
Observations 2137 9439
Notes: Column 1 presents estimates from a post-Lasso estimator, in which OLS is run on covariates
selected by Lasso, using equally-weighted observations. For the Lasso estimates only, each variable was
normalized to standard deviation one. Column 2 presents marginal effects probit estimates from a model
where the same selected covariates are used to predict whether a household responds to a survey and has
valid WTP. Robust standard errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99
percent confidence, respectively.
To give intuition for how re-weighting on observables will affect estimated WTP, column 2 of
Table 5 presents marginal effects probit estimates of how the WTP predictors from column 1 are
associated with whether a household responds and has valid WTP. The fact that most coefficients
have the same signs in columns 1 vs. 2 suggests that survey responders are positively selected on
observables. One mechanism that works against this is that retirees have lower WTP but are more
likely to respond to surveys.
Table 6 presents estimates of mean WTP, with standard errors in parentheses. Column 1
presents unweighted estimates, while column 2 uses row-specific IPWs to weight each row’s sample
to match the full HER recipient group on observables. Mail survey responses are divided in two
different ways: households randomly assigned to the base vs. follow-up groups and households that
actually returned the first survey vs. the follow-up survey. The bottom row of Panel A reports
that the unweighted mean WTP for the 24.5 percent of households that returned the survey is
$2.98. When re-weighted on observables to match the full recipient population, the mean falls to
$2.85, confirming that respondents are slightly positively selected on observables. We use this row
of estimates as the base case for welfare analysis.
Table 6 shows that respondents to the first mail survey are positively selected. Unweighted
mean WTP is marginally significantly higher for the randomly-assigned base group vs. follow-up
group ($4.33 vs. $3.22, p = 0.117), and mean WTP is much higher for households in either group
that returned the first mail survey vs. those that returned only the follow-up survey ($4.37 vs.
$2.58, p = 0.001). This positive selection is almost mechanical: people who do not open and read
HERs likely have WTP closer to zero than people who do, and the former group would not have
even seen the first mail survey.
Table 6: Estimates of Mean Willingness-to-Pay
(1) (2)
Unweighted Weighted
Panel A: Mean WTP
Mail 3.40 3.29
(standard error) (0.26) (0.29)
Base group 4.33 3.66
(0.57) (0.61)
Follow-up group 3.22 3.14
(0.29) (0.33)
Returned first survey 4.37 4.02
(0.35) (0.42)
Returned follow-up survey 2.58 2.64
(0.37) (0.41)
Phone 2.79 2.70
(0.18) (0.19)
Combined 2.98 2.85
(0.16) (0.16)
Panel B: p-Values of Differences
Base vs. follow-up mail 0.117 0.510
Returned first vs. returned follow-up mail 0.001 0.019
Mail vs. phone 0.059 0.080
Base group vs. phone 0.026 0.171
Follow-up group vs. phone 0.213 0.221
Returned first survey vs. phone 0.000 0.003
Returned follow-up survey vs. phone 0.606 0.894
Notes: Estimates in column 2 are re-weighted to match the HER recipient group on observables.
By contrast, the phone survey and follow-up mail survey, which was sent from a different
outbound address and was not part of an HER, are not subject to this form of positive selection.
Indeed, unweighted mean WTP is statistically and economically very similar for phone survey vs.
follow-up mail survey respondents ($2.79 vs. $2.58, p = 0.606), and the weighted means are almost
identical ($2.70 vs. $2.64, p = 0.894). This implies that these two samples are either not selected
from non-respondents or that they have the same sample selection bias despite coming from two
different forms of contact (mail vs. phone). Appendix Table A8 presents suggestive evidence in
favor of the former explanation, showing that WTP does not vary statistically or economically
for households that responded on earlier vs. later phone survey attempts. Extrapolating this
suggests that phone survey non-responders, who would in theory have responded on some eventual
phone survey attempt, would have similar mean WTP. (This logic draws on the intensive follow-up
approach used by DiNardo, McCrary, and Sanbonmatsu (2006) and others.)
If respondents to the first mail survey are positively selected on unobservables but the remainder
of mail and phone survey respondents are selected only on observables, then an unbiased estimate
of mean WTP for the full HER recipient population can be constructed by giving first mail survey
respondents weight of one (representing themselves only), and re-weighting phone and follow-up
mail respondents to match the remaining HER recipients on observables.^15 We do this by repeating
the previous IPW exercise but fixing the weights of first mail survey respondents to one. This gives
a predicted population mean WTP of $2.71.
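The mixed-weight estimator just described (detailed in footnote 15) can be sketched with made-up numbers:

```python
# Stylized mixed-weight estimator: first-mail respondents (S1) get
# weight one; all other respondents get inverse probability weights
# estimated excluding S1.  Numbers are illustrative, not the paper's.
N_n = 20                       # toy number of HER recipients
s1_wtp = [4.4, 4.0]            # first mail survey respondents
other_wtp = [2.6, 2.8]         # phone and follow-up mail respondents
other_p = [0.2, 0.25]          # fitted Pr(respond | X), excluding S1

total = sum(s1_wtp) + sum(w / p for w, p in zip(other_wtp, other_p))
pop_mean_wtp = total / N_n
```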
IV.B.1 Measuring Moral Utility
Our model in Section I includes a moral utility term that does not appear explicitly in most models.
Does moral utility have any empirical content? And if so, are social comparisons a moral tax on
“bad” behavior, as suggested by the concerns of Glaeser (2006) and others? The model generates
four predictions that allow us to shed light on these questions.
First, if there is no moral utility, then a nudge that does not affect behavior will not affect
consumer welfare. To see this, recall from Equation (4) that the consumer welfare change is
∆V = −∆˜e·p_e + ∆f + ∆M. If there is no behavior change, then −∆˜e·p_e + ∆f = 0. If ∆M = 0 also, then
∆V = 0, so WTP should be zero. Thirty-nine percent of respondents to question 3 on the phone survey
predicted that future HERs would not help them reduce their natural gas use “by even a small
amount.” These consumers have wide dispersion in WTP, with observations in all eight ranges
and standard deviation even larger than for respondents predicting non-zero savings. Moral utility,
or some other unmodeled factor unrelated to financial gain or consumption utility, is needed to
explain this non-zero WTP for consumers predicting zero behavior change.
Second, if HERs act only as a moral tax, i.e. they increase µ but have no other effect, then
∆V < 0. As we saw above, however, average WTP is positive. HERs almost certainly have a meaningful
informational component, and we saw above that 73 percent of phone survey respondents agree
^15 More precisely, denote S_1 as the set of first mail survey respondents, and denote P̂r(S_i|X_i) as the conditional
probability of survey response in the sample of HER recipients excluding S_1, which we estimate in column 8 of
Appendix Table A7. If w_i is WTP for household i and N_n = 9964 is the number of HER recipients, the predicted
population mean WTP is [ Σ_{i∈S_1} w_i + Σ_{i∉S_1} w_i / P̂r(S_i|X_i) ] / N_n.
that HERs give useful energy conservation information. Thus, it is clear that HERs do not act only
as a moral tax.
Third, if HERs increase the moral price µ, this should tend to decrease moral utility more
for heavy users. Intuitively, a moral price increase hurts heavy users more because it accrues
over more inframarginal units, just as an actual price change affects expenditures more for
high-demand consumers.^16 Testing this requires us to measure ∆M. The phone survey questions asking
consumers if HERs made them feel inspired, pressured, proud, or guilty were designed to help proxy
for positive and negative aspects of moral utility. Define A_i as a vector of four indicator variables
capturing individual i’s responses to those four affect questions, and define E_i as predicted savings
from question 3. We regress WTP w_i on A_i and E_i in the sample of phone survey respondents:

w_i = β_0 + β_E·E_i + β_A·A_i + ε_i. (8)
This is a rough empirical analogue to Equation (4), in which w_i proxies for ∆V, β_E·E_i proxies for
−∆˜e·p_e + ∆f (under the assumption that ∆f scales proportionally with savings −∆˜e·p_e), and
β_A·A_i proxies for ∆M_i. Estimates show that predicted savings E_i is strongly positively associated with
WTP, while feeling inspired and pressured, respectively, are positively and negatively conditionally
associated with WTP with greater than 90 percent confidence. See Appendix Table A9 for formal
results. Using these estimates, we fit ∆M̂_i = β̂_A·A_i.
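A simulated illustration of this regression and the fitted moral utility proxy (all data below are synthetic, not the survey responses, and the coefficient values are assumptions of the simulation):

```python
import numpy as np

# Simulated analogue of Equation (8): regress WTP on predicted savings
# E_i and four affect indicators A_i, then fit dM_i = beta_A . A_i.
rng = np.random.default_rng(1)
n = 1000
E = rng.uniform(0, 10, n)                          # predicted savings ($)
A = rng.integers(0, 2, size=(n, 4)).astype(float)  # inspired, pressured, proud, guilty
true_beta_A = np.array([1.0, -0.8, 0.5, -0.3])     # assumed for the simulation
w = 0.4 * E + A @ true_beta_A + rng.standard_normal(n)

X = np.column_stack([np.ones(n), E, A])
beta, *_ = np.linalg.lstsq(X, w, rcond=None)
beta_E, beta_A = beta[1], beta[2:]
dM_hat = A @ beta_A           # fitted moral utility proxy per respondent
```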
The first row of Table 7 presents results of univariate regressions of seven different variables
on average post-treatment usage ˜e(θ_1), measured in ccf/day. Column 1 reports that a one ccf/day
increase in usage is unconditionally associated with a $0.111 increase in WTP. Heavier users have
higher predicted savings, are more likely to report negative affect (feeling guilty and pressured),
and are less likely to report positive affect (feeling proud and inspired). Column 7 reports that
heavier usage is associated with reduced moral utility. This suggests that the HERs do increase µ.
Results are similar when regressing the same outcomes on baseline usage instead of post-treatment
usage.
^16 This requires a bound on the usage decrease for heavier users relative to lighter users. Intuitively, if the existing
moral price was positive and heavy users decrease usage by much more than light users, heavy users could gain
moral utility relative to light users by reducing inframarginal moral utility losses. Formally, decompose ∆M into
∆M = ∆m − ∆µ·˜e(θ_1) − µ_0·∆˜e and take d∆M/d˜e(θ_1) = −∆µ − µ_0·(d∆˜e/d˜e(θ_1)). We think of the moral price µ as being weakly
positive. If µ_0 > 0, then d∆M/d˜e(θ_1) < 0 if d∆˜e/d˜e(θ_1) > −∆µ/µ_0, i.e. if behavior change does not increase too much in ˜e(θ_1). If
µ_0 = 0, d∆M/d˜e(θ_1) < 0 holds unambiguously.
Table 7: Measuring Moral Utility
                   (1)        (2)          (3)          (4)          (5)          (6)          (7)
                   WTP        Predicted    Proud        Guilty       Inspired     Pressured    ∆M̂
                              savings
Average usage      0.111      0.443        -0.0100      0.0119       -0.0330      0.0130       -0.0349
                  (0.0641)*  (0.212)**    (0.00386)*** (0.00314)*** (0.00399)*** (0.00330)*** (0.0130)***
Mean comparison    0.240      1.159        -0.0369      0.0356       -0.0957      0.0399       -0.116
                  (0.156)    (0.545)**    (0.00973)*** (0.00787)*** (0.0107)***  (0.00825)*** (0.0316)***
Notes: This table presents results of univariate regressions of the dependent variable in each column on the
independent variable in each row. “Average usage” is mean post-treatment natural gas usage in hundred
cubic feet/day. “Mean comparison” is the average difference (in cubic feet) between own natural gas usage
and mean neighbor usage on the first year of HERs. Robust standard errors in parentheses. *, **, ***:
statistically significant with 90, 95, and 99 percent confidence, respectively.
In Section I, we remarked that our model nests a model in which moral utility depends on the
perceived social norm s: M = m − µ·(e − s). The variable “Mean comparison” is an empirical
analogue of (e − s): it is the average difference (in units of cubic feet of natural gas) between own
natural gas usage and mean neighbor usage on the first year of HERs. Substituting (e − s) for e
in the model generates the analogous prediction that if ∆µ > 0, then (e − s) should be negatively
correlated with ∆M. The second row of Table 7 confirms that this is the case empirically. Results
are similar when regressing the same outcomes on (e − s) from only the first HER.
We also find that WTP is $0.69 lower (p = 0.076) for the randomly-assigned “Comparison”
survey version that reminds people that the HERs compare their energy use to their neighbors’
use. This is consistent with the hypothesis that social comparisons are the part of the HERs that
reduce moral utility. The “Environmental” version does not statistically significantly affect WTP.
See Appendix Table A10 for formal results.
A fourth prediction is that if ∆µ > 0 but ∆m = 0, then ∆M < 0. In words, if a nudge
increases the moral price but provides no other utility windfall, then it will decrease moral utility.
Alternatively, however, a nudge can both increase the moral price and provide some additional
utility ∆m. In fact, the mean ∆M̂_i fitted from above is $0.96, suggesting that ∆m > 0.
In simple terms, these results reflect the fact that more people report positive affect than
negative affect, but heavy users are relatively less likely to report positive affect. In the context of
our model, these results imply that the HERs act through multiple channels: providing information,
increasing the moral price µ, and providing a “windfall” of positive affect ∆m > 0 for the average
consumer.
V Welfare
In this section, we use the empirical estimates of energy savings and WTP to calibrate the welfare
formula from Equation (6). Before doing this, we make explicit two assumptions and calibrate
three additional parameters.
V.A Assumptions for Welfare Analysis
Our first assumption is that consumer i’s WTP w_i equals the consumer welfare change ∆V_i from
the second year of the HER program:

Assumption 1: ∆V_i = w_i
This assumption is only plausible in situations where consumers are well-informed about what
the nudge is and, if the nudge addresses behavioral biases, are “sophisticated” about those biases.
For example, the assumption would fail for naive hyperbolic discounters evaluating a commitment
device or for individuals who are uninformed about the benefits and costs of a choice that is
being nudged. By contrast, this assumption is particularly plausible in our context. After receiving
several HERs, each of which is different but follows a similar structure, consumers are well-informed
about what HERs are and have a good sense of how future HERs would further inform or motivate
them.^17 Because WTP w_i is for the second year of HERs, our welfare analysis is relevant only to
the program’s second year.
To estimate the natural gas savings ∆˜e from a second year of HERs, we would ideally compare
natural gas use at households that were randomly assigned to receive a second year of HERs vs.
households that received only the first year. Unfortunately, the Central Hudson program is too
small for such estimates to be sufficiently precise. Allcott and Rogers (2014) study three larger
programs that carried out similar tests. Based on their findings, we assume that the incremental
energy savings from a second year of HERs equals the savings accrued over the post-treatment
period observed in the energy use data:
Assumption 2: ∆˜e_i = τ_i · D

In this equation, τ_i is individual i’s post-treatment savings in ccf/day, and D = 202 is the
number of days between November 1, 2014, and May 22, 2015.^18
The Allcott and Rogers (2014) results suggest that this assumption is not a bad approximation.
HERs have persistent effects even after they are no longer delivered, so the full impact of the first
^17 Appendix D.A provides additional evidence on two biases that might be relevant in this context. There is
suggestive evidence that consumers overestimate the energy savings caused by HERs, which could bias WTP upward.
There is also suggestive evidence that consumers are overoptimistic, by which we mean that they tend to underestimate
their own energy use before the arrival of the first HER. However, there is no evidence that this optimism affects
WTP.
^18 Of course, τ_i is unobserved because we do not observe both potential outcomes for any individual, and this
assumption is stronger than we need. Aggregating across consumers, Assumption 2 implies that the average treatment
effect τ̂ from Table 4, scaled by D, is an unbiased estimate of the average ∆˜e. We use this for the welfare analysis
below. For the machine learning algorithms in Section VI, we need that Assumption 2 holds in expectation conditional on X.
year of HERs is almost certainly larger than observed through May 22. However, the incremental
impact of additional HERs declines after the first few are delivered. In essence, Assumption 2 is
that these two countervailing forces cancel exactly.
In any event, we shall see below that the social welfare estimates are not very sensitive to the
value of ∆˜e because retail marginal prices are very close to social marginal cost. In other words,
the term (π_e − φ_e) that multiplies ∆˜e in the social welfare equation is approximately zero, so our
welfare calculations would change little under any plausible alternative to Assumption 2.^19
V.B Implementation Cost, Externality, and Markup Parameters
The welfare analysis requires three additional parameters: the implementation cost C_n, the
externality φ_e, and the retail markup π_e.
We calculate average cost C_n using an accounting approach detailed in Appendix D.B. The
second year of an HER program entails both fixed costs F_n and per-household marginal costs c_n, so
C_n = F_n/N_n + c_n, where N_n = 9964 is the number of nudge recipients. The per-household marginal
cost of the program’s second year is c_n ≈ $2.06, almost entirely for printing and mailing HERs.
Opower and Central Hudson also incur an estimated $16,339 per year in costs to manage ongoing
programs. Central Hudson has three HER programs in addition to the one we study, for a total of
four programs and about 100,000 recipient households.^20 Some of the ongoing management costs
are effectively fixed costs per program, whereas others do not depend on the number of programs.
In our primary estimates, we allocate the $16,339 equally to each of Central Hudson’s 100,000
recipient households, giving F_n/N_n ≈ $0.16/household. We also present an alternative calculation
in which these costs are allocated equally to each of the four programs. This gives F_n ≈ $4,085 per
program, or F_n/N_n ≈ $0.41 per household in the program we study.
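The cost accounting above can be reproduced directly:

```python
# Average implementation cost C_n = F_n/N_n + c_n under the two
# fixed-cost allocations described in the text.
c_n = 2.06            # per-household marginal cost ($/year)
mgmt = 16339.0        # ongoing management cost ($/year)
N_n = 9964            # recipients in the program we study

# Primary: spread across all ~100,000 Central Hudson HER recipients.
C_n_primary = mgmt / 100_000 + c_n      # ~$0.16 + $2.06

# Alternative: allocate equally across the four programs.
F_n_alt = mgmt / 4                      # ~$4,085 per program
C_n_alt = F_n_alt / N_n + c_n           # ~$0.41 + $2.06
```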
We include local air pollution and carbon dioxide externalities from natural gas combustion
as well as methane externalities from the natural gas supply chain. For local air pollutants, we
consider nitrogen oxides, particulate matter, and sulfur dioxide. We use the EPA (1995) AP-42
emission factors and marginal damages from Holland et al. (2015), whose key assumptions are a
$6 million value of a statistical life and a fine particulate dose response function from Pope et al.
(2002). Holland et al. provided us with county-specific marginal damages relevant for ground-level
emissions (i.e. homes instead of power plant smokestacks), and we take the mean across counties,
weighting by the number of households in the HER experiment. Local air pollutant damages amount
to $0.045/ccf. Using results from the U.S. Government Interagency Working Group on the Social
Cost of Carbon (2013), we use a $40 social cost of carbon, which translates to $0.264/ccf damages
^19 We are slated to receive one more update of the natural gas use data, which will allow us to estimate the treatment
effect for the 2015-2016 heating season.
^20 Different programs are well-defined in the sense that they have different specific customer sub-populations in
recipient and control groups. Different programs start at different times, may focus on different fuels (e.g. households
that purchase electricity but not natural gas), and have custom-designed elements on the HERs.
from natural gas combustion. Drawing on Howarth et al. (2012) and Abrahams et al. (2015), we
assume that three percent of natural gas escapes during drilling and transportation before arriving
in homes. We translate this to carbon dioxide equivalents using a methane global warming potential
of 34 from the Intergovernmental Panel on Climate Change (Myhre et al. 2013), giving an additional
$0.10/ccf externality. Thus, the total environmental externality is φ_e = $0.045 + $0.264 + $0.10 ≈ $0.41
per ccf.
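Summing the three components gives the total externality:

```python
# Environmental externality per ccf of natural gas, as calibrated above.
local_air = 0.045   # $/ccf: NOx, PM, and SO2 damages (Holland et al. 2015)
co2 = 0.264         # $/ccf at a $40/ton social cost of carbon
methane = 0.10      # $/ccf: 3% supply-chain leakage at a GWP of 34

phi_e = local_air + co2 + methane   # ~$0.41/ccf
```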
Central Hudson uses decreasing block pricing. Marginal prices consist of a constant marginal gas
supply charge, which passes through Central Hudson’s cost to acquire gas from wholesale pipelines,
plus constant marginal fees and taxes such as the “system benefit charge” used to fund energy
efficiency programs, plus decreasing marginal delivery charges, which allow Central Hudson to
recover additional fixed costs such as maintenance, customer service operations, meter reading, and
billing.^21 During the post-treatment period, Central Hudson’s usage-weighted marginal acquisition
cost was c_e ≈ $0.586/ccf, while the usage-weighted marginal retail price was p_e ≈ $0.983/ccf.
Thus, marginal retail prices exceed marginal acquisition costs by an average of π_e ≈ $0.397/ccf.^22
When households use less gas due to HERs or any other conservation program, they reduce their
contribution to Central Hudson’s fixed costs by that π_e ≈ $0.40/ccf, and this will eventually be
made up through higher prices.^23 This pricing approach is not unusual: Central Hudson’s retail
markup is almost exactly identical to the 40 percent average markup for residential and commercial
natural gas consumers nationwide, as calculated by Davis and Muehlegger (2010).
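The markup arithmetic is a one-liner:

```python
# Retail markup over marginal gas acquisition cost.
p_e = 0.983   # $/ccf usage-weighted marginal retail price
c_e = 0.586   # $/ccf usage-weighted marginal acquisition cost

pi_e = p_e - c_e                  # ~$0.397/ccf
markup_pct = 100 * pi_e / p_e     # ~40 percent of the retail price
```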
Comparing the environmental externality and retail markup, Central Hudson’s retail marginal
gas price is only 1.4 percent ($0.014/ccf) below social marginal cost. As Davis and Muehlegger
(2010) point out, this calls into question the argument that large energy efficiency programs are
needed as second best substitutes for getting prices right, but further discussion of this issue is
beyond the scope of our paper. This does mean that any welfare gains of the nudge will need to
be driven primarily by private gains to nudge recipients rather than by additional uninternalized
social benefits.
^21 To be clear, “fixed costs” means costs that are fixed with respect to the volume of natural gas consumed, not
necessarily fixed with respect to the number of residential consumers or some other factor. In addition to the gas
supply charge, we include the relatively small “merchant function charge” as part of marginal acquisition costs. If
this charge is instead classified as a fixed cost, this slightly increases π_e, which would slightly worsen the program’s
welfare effects.
^22 Because the extensive margin (natural gas connections) is highly inelastic, while the intensive margin (natural
gas use) is more moderately inelastic, the Ramsey-Boiteux framework suggests that it would be more economically
efficient to pass through fixed costs as fixed monthly charges. There are various justifications for amortizing fixed
costs into marginal prices, including horizontal and vertical equity (Borenstein and Davis 2012), and the allocative
impact of this distortion is mitigated if consumers respond to average instead of marginal prices (Ito 2014). Regardless
of whether this rate structure is desirable, decreased net revenue still enters the welfare calculation.
^23 Central Hudson’s profits are regulated by the New York Public Service Commission. If profits fall short of the
allowed amount, Central Hudson is allowed to make this up in future years through higher retail prices.
V.C Results
Table 8 presents the welfare analysis of the program’s second year. Columns 1 and 2 present results
for the program’s full HER treatment group, after reweighting WTP with the inverse probability
weights discussed earlier. Columns 3 and 4 present results for the sample of MPL respondents.
WTP observations are thus unweighted, but energy savings are from column 5 of Table 4, where
the full sample was re-weighted to match the MPL respondents on observables.
Table 8: Social Welfare Effects of a Second Year of Home Energy Reports
                                               (1)        (2)        (3)        (4)
Population:                            All HER Recipients        MPL Respondents
Panel A: Benefits and Costs Other than Consumer Welfare ($/recipient)
Implementation cost: C_n                      2.22                  2.22
Retail gas savings: −∆˜e·p_e                  5.52                  5.98
Gas acquisition cost savings: −∆˜e·c_e        3.29                  3.56
Utility net revenue loss: −∆˜e·(p_e − c_e)    2.23                  2.41
Externality reduction: −∆˜e·φ_e               2.30                  2.50
Panel B: Mean WTP and Social Welfare Effect ($/recipient)
Assumption                              Mean WTP  ∆Welfare   Mean WTP  ∆Welfare
Base case                                 2.85      0.70       2.98      0.84
Uniform WTP at MPL endpoints              2.60      0.46       2.72      0.59
WTP = {-12,12} at MPL endpoints           2.36      0.22       2.48      0.34
WTP = {-15,15} at MPL endpoints           2.79      0.65       2.92      0.79
Non-respondents have WTP=0                0.61     -1.53
Weight = 1 for first mail respondents     2.71      0.57
Fixed costs equally allocated             2.85      0.46       2.98      0.59
Notes: Columns 1 and 2 use samples weighted to match all HER recipients, while columns 3 and 4 use
samples weighted to match the MPL respondents. ∆Welfare in columns 2 and 4 is
∆W = ∫ [∆V − C_n + (π_e − φ_e)∆˜e] dF(Θ) from Equation (6), where Mean WTP in columns 1 and 3 is our
measure of consumer welfare gain ∆V.
Panel A presents benefits and costs other than consumer welfare. Under Assumption 2, natural
gas savings amount to $5.52 and $3.29 per recipient household at retail price and acquisition cost,
respectively. The difference between these two figures is the utility net revenue loss: $2.23 per
household-year. Environmental externalities drop by $2.30 per recipient household.
Panel B completes the social welfare estimates by adding in WTP. Columns 1 and 3 present
WTP under different assumptions, while columns 2 and 4 present the resulting social welfare
estimate using Equation (6). In the base case, WTP is $2.85 and $2.98 for all HER recipients and
the MPL respondents, respectively, as we found in Table 6. The social welfare effects are $0.70 and
$0.84 per household for all HER recipients and MPL respondents, respectively.
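The base case figure can be reproduced from the Panel A ingredients. A per-recipient sketch of Equation (6) for columns 1 and 2:

```python
# Base case per-recipient welfare effect (columns 1-2 of Table 8).
mean_wtp = 2.85        # re-weighted mean WTP (Table 6)
C_n = 2.22             # implementation cost per recipient
net_rev_loss = 2.23    # utility net revenue loss
ext_reduction = 2.30   # externality reduction

# Equation (6) per recipient: dW = dV - C_n + (pi_e - phi_e)*d_e,
# where the last term equals ext_reduction - net_rev_loss.
dW = mean_wtp - C_n + (ext_reduction - net_rev_loss)   # ~$0.70
```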
The next three rows of Panel B implement alternative assumptions for mean WTP at the
endpoints of the MPL, i.e. those consumers with WTP below -$9 or above $9. The first assumes
a uniform distribution of WTP beyond the endpoints, with density equal to the density on the
adjacent WTP bin. This gives mean WTPs of $13.11 and -$11.52 for the upper and lower end-
points, respectively. The next two rows use $12 or $15 as heuristic benchmarks. All three of these
alternative assumptions give lower mean WTP, so less positive welfare effects. Because only 27
percent of respondents have WTP at one of the endpoints, this alone does not significantly change
mean WTP.
The next two rows of Panel B consider alternative adjustments for non-response when extrap-
olating to the full HER recipient population. In Section IV.B, we speculated that if there is a
non-response bias, it is likely positive. Under the extreme assumption that non-respondents have
zero WTP, welfare effects would be $-1.53 per HER recipient. We view this as an unrealistic
assumption, presented only as a lower bound.^24 When we assume that mean WTP is $2.71, as
calculated by the alternative weighting procedure in which respondents to the first mail survey
have weights fixed to one, welfare gains are $0.57 per recipient.
The final row uses the higher average implementation cost C
n
if the fixed costs of continuing
programs are allocated equally to each of Central Hudson’s four ongoing programs. This penalizes
small programs and benefits large ones. While this cost allocation assumption is likely too extreme,
it is certainly true that at some point an HER program would not be large enough to generate enough
social surplus to outweigh the program-level fixed costs. If implementation costs were 32 percent
higher, externality damages were 31 percent lower, or WTP were 25 percent lower, the base case
social welfare point estimate would be negative.
Because (π_e − φ_e) ≈ 0, i.e. marginal retail prices are very close to true social marginal cost,
the social welfare effect depends very little on estimated energy savings ∆˜e. Thus, violations of
our Assumption 2 do not make much difference, nor does the sampling error in our energy savings
estimates. Applying the Delta method to the energy savings estimates in Table 4, the 90 percent
confidence interval on welfare effects for all HER recipients extends 1.65 · ŜE(τ̂) · D · (φ_e − π_e) ≈ $0.07
in either direction. WTP estimates in Table 6 are relatively precisely estimated, with a 90 percent
confidence interval that extends $0.26 in either direction for the base estimates.
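The Delta method half-width follows from the parameters above (rounded values from the text):

```python
# Half-width of the 90% CI on per-recipient welfare implied by
# sampling error in the energy savings estimate (Delta method).
se_tau = 0.0162   # SE of the column 4 treatment effect (Table 4)
D = 202           # days of post-treatment data
phi_e = 0.41      # $/ccf externality (rounded)
pi_e = 0.397      # $/ccf retail markup

half_width = 1.65 * se_tau * D * (phi_e - pi_e)   # ~$0.07
```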
Figure 8 illustrates our base case welfare analysis, weighted for the HER recipient population.
The demand curve is drawn to be consistent with the assumptions used to code WTP from the MPL
responses: WTP is distributed triangular on the highest and lowest ranges and uniform on the six
interior ranges of the MPL. “Expected marginal social cost,” i.e.
∫ [c_n − (π_e − φ_e)∆˜e] dF(Θ)/N_n = c_n − (π_e − φ_e)·τ̂·D, is approximately $1.98 per household. The net social welfare effect is the area
between the demand curve and expected marginal social cost, i.e. the blue polygon minus the red
^24 Although EPA (2006) reports that 44 percent of unsolicited mail is not read, HERs arrive in utility branded
envelopes. Since utilities typically send bills or other important communications, open rates are likely to be much
higher than standard unsolicited mail. Just under five percent of phone survey respondents reported not remembering
HERs.
polygon, minus fixed cost F_n. Leaving aside the variation in ∆˜e across households, which is less
relevant because (π_e − φ_e) ≈ 0, the social welfare effect trades off the gains to the 41 percent of
consumers willing to pay more than $1.98 with the losses to the 59 percent of consumers that are
consumers willing to pay more than $1.98 with the losses to the 59 percent of consumers that are
not. The consumer surplus in blue is large: more than 30 percent of people are willing to pay twice
the social marginal cost. This figure motivates the opt-in and targeting analysis in Section VI:
perhaps the nudge can be modified to avoid the loss outlined in red.
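The welfare calculation behind the figure can be sketched numerically. The code below traces out a demand curve from interval-coded WTP responses, assuming WTP is uniform within each bracket; the bracket bounds and population shares are hypothetical placeholders, not the paper's estimates, and the triangular tails are simplified to uniform for brevity.

```python
# Sketch: a demand curve from interval-coded WTP, with WTP assumed uniform
# within each bracket. Bracket shares are hypothetical, not Table 6's.
brackets = [(-9, -5, 0.10), (-5, -1, 0.15), (-1, 0, 0.10), (0, 1, 0.10),
            (1, 5, 0.25), (5, 9, 0.30)]  # (low, high, population share)

def share_wtp_at_least(p):
    """Share of population with WTP >= p, i.e. the demand curve at price p."""
    share = 0.0
    for lo, hi, s in brackets:
        if p <= lo:
            share += s                         # whole bracket lies above p
        elif p < hi:
            share += s * (hi - p) / (hi - lo)  # uniform within the bracket
    return share

print(share_wtp_at_least(1.98))  # share willing to pay the $1.98 marginal cost
```

The net welfare effect is then the integral of (WTP − expected marginal social cost) over the treated population, the area between the two curves in the figure.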
Figure 8: Social Welfare Analysis: Graphical
[Figure: the Home Energy Report demand curve plotted against the share of population (horizontal axis, 0 to 1), with $ on the vertical axis, a horizontal line at the expected marginal social cost, and shaded regions marking the welfare gain and welfare loss.]
Notes: This figure presents a graphical version of the base case welfare analysis weighted for the HER
recipient population, corresponding to columns 1 and 2 of Table 8.
V.D Discussion: Why Measuring Consumer Welfare Matters
Using the consumer welfare formula in Equation (4), the difference between mean WTP ($2.85) and retail energy savings ($5.52) implies that consumers incur an average of $2.68 in net utility costs, which we call “non-energy costs” for shorthand.25 This benefit/cost ratio of $5.52/$2.68 ≈ 2.06 implies that, leaving aside implementation costs C_n, HERs generate highly privately-beneficial energy savings for recipients. For comparison, data from Allcott and Greenstone (2015) suggest that the mean recommended home insulation investment in Wisconsin has a private benefit/cost ratio of 0.98.

25. Instead of imposing Assumption 2 here, we could instead use consumers’ predictions of future retail energy cost savings from the phone survey. Because mean expected savings is larger than the observed $5.52, this would imply larger non-energy costs, which further reinforces our arguments in this section.
Traditional evaluations of HERs ignore non-energy costs. Specifically, most energy efficiency programs in most states are evaluated using what regulators call the “total resource cost” metric, which is the net benefit to utility consumers after accounting for the fact that they will eventually pay for the utility revenue loss through higher energy prices: W^TRC = ∫ [V + π_e ẽ − P_n] dF(Θ). This differs from social welfare in that it excludes externalities and considers the price the utility pays for the program, denoted P_n, instead of social cost C_n. Before this paper, however, there were no estimates of V for HERs and other behavior-based energy conservation programs, so these programs have been evaluated using the “program administrator cost” metric, which just trades off the energy acquisition cost savings with the program price: W^PAC = ∫ [−c_e ẽ − P_n] dF(Θ). Using that π_e = p_e − c_e, we see that substituting W^PAC for W^TRC amounts to assuming that V = −p_e ẽ, i.e. that there are no non-energy costs. In the context of a smoking cessation program, this would amount to assuming that the only effect on consumer welfare is to save people money on buying cigarettes.

How does ignoring non-energy costs affect the social welfare calculation in Table 8? If we set V = −ẽ · p_e, the welfare gain is $3.38 per recipient – almost five times larger than our estimate of $0.70 per household. In other cases, it is easy to imagine that this could change whether or not a nudge is determined to be welfare enhancing.
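The arithmetic of this comparison can be checked directly. The sketch below uses the base-case numbers from the text; the decomposition is approximate because the paper's published figures are rounded.

```python
# Sketch: how ignoring non-energy costs inflates the per-recipient welfare
# estimate. Numbers are the paper's base-case values; the decomposition of
# the $0.70 estimate is illustrative, built from rounded published figures.
mean_wtp = 2.85          # mean WTP for HERs ($/recipient), Table 6
retail_savings = 5.52    # retail energy cost savings ($/recipient)

# Non-energy costs: the gap between bill savings and what consumers
# actually value the reports at.
non_energy_costs = retail_savings - mean_wtp
print(round(non_energy_costs, 2))   # 2.67 (paper reports 2.68 from unrounded values)

# Replacing true WTP with retail bill savings adds exactly the non-energy
# costs back into measured welfare:
welfare_with_v = 0.70               # base-case social welfare estimate
welfare_naive = welfare_with_v + non_energy_costs
print(round(welfare_naive, 2))      # 3.37, close to the paper's $3.38
```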
Evaluating only the program’s second year leaves open the question of whether the full program (from beginning to end) is welfare enhancing. In particular, there are fixed costs to begin a program that do not enter F_n, the fixed cost of continuing an existing program. Furthermore, there have been many different Home Energy Report programs with very different energy savings effects. In Appendix D.C, we provide a speculative, back-of-the-envelope calculation under the assumptions that Opower’s price reflects the cost of a full program and that non-energy costs are $2.68/$5.52 ≈ 49% of total retail energy savings. We consider the full life of a typical Opower program, using energy savings estimates from Allcott and Rogers (2014). Our estimates suggest that the typical full program is welfare enhancing, but that ignoring non-energy costs overstates welfare gains by a factor of 2.4.
VI Allocating Nudges: Opt-In vs. Smart Defaults
Figure 7 shows that WTP for HERs is highly heterogeneous. The effect of HERs on energy use may be heterogeneous as well. Can better allocation of this nudge improve its social welfare effects?
We consider two approaches: an opt-in program and a machine learning algorithm that targets the
nudge to maximize social welfare.
VI.A Opt-In Programs
A natural reaction to heterogeneous valuations of a good or service is that it should be priced at
social marginal cost, and consumers should be allowed to buy or not buy as they wish. We begin
by evaluating that idea. For simplicity, we assume that the average energy savings of consumers
that opt into HERs equals the estimated average treatment effect τ̂ from Table 4. We then set the price at expected social marginal cost c_n − (π_e − φ_e)τ̂D ≈ $1.98.
Table 9 presents results. Column 1 presents the percent of population receiving HERs, while
Columns 2-4 present the mean natural gas use change, WTP, and social welfare change per recipient
household, respectively. Column 5 presents the aggregate social welfare effect across all 19,927
households, which is (column 1)/100 × (column 4) × (19,927/1000). Row 1 presents the existing
opt-out program as a benchmark.
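The column 5 formula can be verified against the table's rows; the sketch below reproduces the aggregate figures for rows 1 through 3 from columns 1 and 4.

```python
# Sketch: reproducing Table 9's column 5 (aggregate welfare, $000s) from
# columns 1 and 4, using the stated formula
# (column 1)/100 * (column 4) * (19,927/1000).
N_HOUSEHOLDS = 19_927

def total_welfare_thousands(pct_receiving, welfare_per_recipient):
    """Aggregate social welfare in $000s across the full population."""
    n_recipients = pct_receiving / 100 * N_HOUSEHOLDS
    return n_recipients * welfare_per_recipient / 1000

# Row 1 (existing opt-out program): 50% of households, $0.70/recipient.
print(round(total_welfare_thousands(50, 0.70), 1))    # 7.0
# Row 2 (opt-in, zero switching cost): 40.8%, $7.70/recipient.
print(round(total_welfare_thousands(40.8, 7.70), 1))  # 62.6
# Row 3 (opt-in, 1.5% opt-in rate): 1.5%, $17.20/recipient.
print(round(total_welfare_thousands(1.5, 17.20), 1))  # 5.1
```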
Row 2 presents the welfare effects of an opt-in program assuming zero switching cost; that is, we assume that all consumers opt into the second year of HERs if they are willing to pay more than the $1.98 price.
WTP of $9.88. The total social welfare gain in column 5 is nine times larger than for the existing
program, even though fewer households are included. This dramatic improvement arises because a
significant number of consumers with low or negative WTP no longer are nudged.
Opower has run one opt-in program in the U.S., at a large utility called American Electric
Power in Ohio. They aggressively marketed free HERs to 250,000 customers, of whom only 1.5
percent opted in. Although the Ohio population could be different, the low opt-in rate suggests
that default effects are very powerful in this context. In other words, switching costs or other forms
of inertia prevent many people who value the nudge at more than its price from opting in. Given
results from Madrian and Shea (2001), Kling et al. (2012), Handel (2013), and Ericson (2014)
showing the power of inertia in high-stakes choices such as retirement savings plans and health
insurance, it is very plausible that inertia could be powerful in low-stakes decisions such as whether
to receive Home Energy Reports. This implies that the zero switching cost assumption in row 2 is
unrealistic.
We explore the importance of switching costs under three assumptions. First, 1.5 percent of
consumers opt in, as in Ohio. Second, consumers opt in if and only if their WTP is larger than
the switching cost, so the 1.5 percent that opt in will be drawn from the right tail of the WTP
distribution. Third, the switching cost is not welfare-relevant; in other words, an implied switching cost arises from factors such as imperfect information, not because of a material transaction cost.
These latter two assumptions give a best-case scenario for welfare gains for a given switching cost.
Row 3 shows that even under this best-case scenario, the welfare gains from an opt-in program
are $5,100 – less than for the current opt-out program in row 1. Even though mean WTP of nudge
recipients is high, substantial potential consumer welfare gains are lost because many high-WTP
consumers do not opt in. Furthermore, the fixed implementation cost F_n is spread across a small number of recipients.
Table 9: Opt-In and Smart Defaults: Results

                                        (1)         (2)             (3)         (4)         (5)
                                        Percent of  Mean gas        Mean        Welfare     Total
                                        population  use change      WTP         effect      welfare
                                        receiving   (ccf/           ($/         ($/         effect
Row  Policy                             HERs        recipient-day)  recipient)  recipient)  ($000s)
1    Existing opt-out program           50          -0.028          2.85        0.70        7.0
2    Opt-in; zero switching cost        40.8        -0.028          9.88        7.70        62.6
3    Opt-in; 1.5% opt-in rate           1.5         -0.028          24.6        17.20       5.1
4    Targeted on energy savings         50          -0.068          3.51        1.48        14.7
5    Targeted on WTP                    50          -0.057          3.52        1.46        14.5
6    Targeted on welfare                50          -0.058          3.58        1.52        15.2
7    Drop recipients; maximize welfare  42          -0.036          3.22        1.17        9.7

Notes: Column 5 presents the aggregate social welfare effect across all 19,927 households, which is (column 1)/100 × (column 4) × (19,927/1000).
VI.B Targeted Opt-Out Programs
The importance of both heterogeneity and inertia suggests a different policy approach: an opt-out
program that targets consumers who would generate large welfare gains, and excludes consumers
who would not.
Formally, we want to derive a statistical decision rule δ : X → {0, 1} that maps household covariates from space X to treatment assignment {0, 1} in order to maximize objective L(δ). Initially, we hold the number of recipient households constant at 50 percent of the 19,927-household population and compare the results of maximizing three different objectives: energy conservation, where L_τ(δ) = −Σ_i τ_i δ(X_i); consumer welfare, where L_CW(δ) = Σ_i w_i δ(X_i); and social welfare, where L_W(δ) = −F_n + Σ_i [w_i + (π_e − φ_e)τ_i − c_n] δ(X_i).
This is a standard prediction problem where additional covariates provide more information, but using too many covariates leads to overfitting, which worsens out-of-sample prediction. From a computational perspective, the only unusual feature of this problem is that different parts of the social welfare objective function L_W(δ) are estimated from different datasets: WTP w_i is from the MPL surveys, while energy savings τ_i is from the full sample of billing data. Standard pre-packaged procedures thus cannot be used. For simplicity, we select variables using forward stepwise regression and consider only linear combinations of the 20 X variables. We use five-fold cross-validation to avoid overfitting. Appendix E presents details.26
26. One alternative approach would be to use a tree method akin to those in Athey and Imbens (2015), splitting the WTP data and energy use data at the same nodes. In order to reduce residual variance, we would first residualize energy use on the baseline usage controls in Equation (7).
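For concreteness, forward stepwise selection with five-fold cross-validation can be sketched as follows. This is an illustrative implementation, not the authors' code; `cv_mse` and `forward_stepwise` are hypothetical helper names.

```python
# Illustrative sketch of forward stepwise variable selection with five-fold
# cross-validation. X is an (n, 20) covariate matrix and y the outcome being
# predicted (e.g. WTP); both are supplied by the caller.
import numpy as np

def cv_mse(X, y, cols, n_folds=5):
    """Cross-validated mean squared error of OLS on the chosen columns."""
    n = len(y)
    folds = np.array_split(np.random.permutation(n), n_folds)
    errs = []
    for hold in folds:
        train = np.setdiff1d(np.arange(n), hold)
        A = np.column_stack([np.ones(len(train))] + [X[train, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(hold))] + [X[hold, j] for j in cols])
        errs.append(np.mean((y[hold] - B @ beta) ** 2))
    return np.mean(errs)

def forward_stepwise(X, y):
    """Add covariates one at a time while cross-validated error improves."""
    selected, best = [], cv_mse(X, y, [])
    improved = True
    while improved:
        improved = False
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = {j: cv_mse(X, y, selected + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] < best:        # stop once CV error no longer falls
            selected.append(j_best)
            best = scores[j_best]
            improved = True
    return selected
```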
Rows 4-6 present results when maximizing energy savings, WTP, and welfare, respectively. The
table clearly shows how targeting can improve performance on whatever objective the algorithm is
trained to maximize. Remarkably, there is little tradeoff between targeting on energy savings and
targeting on WTP: maximizing WTP generates only slightly lower energy savings than maximizing
energy savings, and vice versa.
Figure 9 presents differences in mean X variables (in standard deviations) between targeted
and non-targeted households for each of the three maximands in rows 4-6. All three algorithms
target similar households, which explains why there is little tradeoff between maximizing WTP and
maximizing energy savings. The fact that WTP and energy savings are positively correlated with
the same observables implies that WTP and energy savings are themselves positively correlated,
unless they have strong opposite correlations with unobservables. This is again consistent with the idea that the informational channel outweighs the moral tax channel in generating behavior change: if the moral tax channel dominated, the households with the largest behavior change would likely have the lowest WTP.
Figure 9: Demographic Differences Between Targeted and Non-Targeted Households
[Figure: horizontal bar chart of Mean(targeted) − Mean(not targeted), in standard deviation units, for each covariate (baseline use, income, net worth, house value, education, male, age, retired, married, rent, single family home, house age, Democrat, hybrid auto share, green consumer, wildlife donor, profit score, buyer score, mail responder, home improvement), shown separately for targeting on energy savings, WTP, and welfare.]
Notes: We use the machine learning algorithm to target 50 percent of the Central Hudson program popula-
tion, maximizing energy savings, willingness-to-pay, or welfare. This figure presents the normalized difference
in means between targeted and non-targeted households for each of these three maximands, in standard de-
viation units.
The algorithm can also be used to predict which current recipients generate welfare losses and
drop them from the program’s second year. To do this, we train the algorithm to maximize social welfare L_W(δ) while allowing it to target any subset of the existing treatment group. Row 7 presents results; this should be compared to the results for the current recipient group in row 1. By nudging 42 percent instead of 50 percent of the entire population (i.e. dropping about 16 percent of the current recipient group), the algorithm increases the total welfare gain by 39 percent.
When comparing opt-in and targeted opt-out policies, the typical comparative static is that
high inertia favors a targeted policy, while poor ability to predict welfare favors an opt-in policy.
The remarkable feature of these results is that even with generous assumptions about the welfare
gains from an opt-in policy, inertia is such a large barrier that a targeted opt-out policy is preferred.
VII Conclusion
Many economists recognize the importance of evaluating nudge-style interventions on the basis of social welfare, not just behavior change. Nevertheless, it is often difficult to actually quantify the full consumer welfare effects of a given nudge. Our main contribution is to develop and implement an experimental design that allows an empirical social welfare analysis in a case study of one prominent nudge.
There are three main takeaways. First, we find significant individual-level heterogeneity in
willingness to pay for the nudge, including a significant minority of consumers who prefer not to
be nudged. This implies large welfare gains from using prediction for “smart defaults.” Second,
despite the worries of Glaeser (2006) and others, social comparison nudges need not only act as
an emotional tax on “bad” behavior. We find evidence that in addition to increasing the moral
price, HERs work by providing both information and additional windfall utility through positive
affect. Third, the nudge we study increases welfare. However, this welfare gain comes with costs
to consumers that typically go unmeasured, and ignoring these “non-energy costs” would cause
the analyst to overstate social welfare gains by a factor of five. This highlights the importance of
measuring the full welfare effects of nudges.
References
[1] Abrahams, Leslie S., Constantine Samaras, W. Michael Griffin, and H. Scott Matthews. 2015.
“Life Cycle Greenhouse Gas Emissions From U.S. Liquefied Natural Gas Exports: Implications
for End Uses.” Environmental Science and Technology 49: 3237-3245.
[2] ACEEE (American Council for an Energy-Efficient Economy). 2015. “State
Energy Efficiency Resource Standards (EERS): April 2015.” Available from
http://aceee.org/sites/default/files/eers-04072015.pdf. Accessed August 5, 2015.
[3] Allcott, Hunt. 2011. “Social Norms and Energy Conservation.” Journal of Public Economics
95 (9-10): 1082–95.
[4] Allcott, Hunt. 2013. “The Welfare Effects of Misperceived Product Costs: Data and Calibra-
tions from the Automobile Market.” American Economic Journal: Economic Policy 5 (3):
30-66.
[5] Allcott, Hunt. 2015. “Site Selection Bias in Program Evaluation.” Quarterly Journal of Eco-
nomics 130 (3): 1117-1165.
[6] Allcott, Hunt, and Michael Greenstone. 2015. “Measuring the Welfare Effects of Energy Effi-
ciency Programs.” Working Paper, New York University.
[7] Allcott, Hunt, and Todd Rogers. 2014. “The Short-Run and Long-Run Effects of Behavioral In-
terventions: Experimental Evidence from Energy Conservation.” American Economic Review
104 (10): 3003-37.
[8] Andreoni, James, Justin M. Rao, and Hannah Trachtman. 2011. “Avoiding the Ask: A Field Experiment on Altruism, Empathy, and Charitable Giving.” NBER Working Paper No. 17648.
[9] Ashby, Kira, Hilary Forster, Bruce Ceniceros, Bobbi Wilhelm, Kim Friebel,
Rachel Henschel, and Shahana Samiullah. 2012. “Green with Envy: Neigh-
bor Comparisons and Social Norms in Five Home Energy Report Programs.”
http://www.aceee.org/files/proceedings/2012/data/papers/0193-000218.pdf (accessed
September 2013).
[10] Ashley, Elizabeth M., Clark Nardinelli, and Rosemarie A. Lavaty. 2015. “Estimating the Ben-
efits of Public Health Policies that Reduce Harmful Consumption.” Health Economics 24 (5):
617-624.
[11] Athey, Susan, and Guido Imbens. 2015. “Machine Learning Methods for Estimating Hetero-
geneous Causal Effects.” Working Paper, Stanford University.
[12] Attari, Shahzeen, Michael DeKay, Cliff Davidson, and Wandi Bruine de Bruin. 2010. “Public
Perceptions of Energy Consumption and Savings.” Proceedings of the National Academy of
Sciences 107 (37): 16054–16059.
[13] Ayres, Ian, Sophie Raseman, and Alice Shih. 2013. “Evidence from Two Large Field Exper-
iments that Peer Comparison Feedback Can Reduce Residential Energy Usage.” Journal of
Law, Economics, and Organization 29 (5): 992–1022.
[14] Beauchamp, Jonathan P., Daniel J. Benjamin, Christopher Chabris, and David I. Laibson.
2015. “Controlling for Compromise Effects Debiases Estimates of Preference Parameters.”
Working Paper, Harvard University.
[15] Belloni, Alexandre, and Victor Chernozhukov. 2013. “Least Squares After Model Selection in
High-Dimensional Sparse Models.” Bernoulli 19 (2): 521-547.
[16] Bernheim, B. Douglas, Andrey Fradkin, and Igor Popov. 2015. “The Welfare Economics of
Default Options in 401(k) Plans.” American Economic Review, 105 (9): 2798-2837.
[17] Borenstein, Severin, and Lucas W. Davis. 2012. “The Equity and Efficiency of Two-Part Tariffs
in U.S. Natural Gas Markets.” Journal of Law and Economics, 55 (1): 75-128.
[18] Brunnermeier, Markus K., and Jonathan A. Parker. 2005. “Optimal Expectations.” American
Economic Review 95 (4): 1092-1118.
[19] Caplin, Andrew. 2003. “Fear as a Policy Instrument.” In George Loewenstein, Daniel Read, and Roy Baumeister, eds., Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice. New York, NY: Russell Sage.
[20] Carroll, Gabriel, James Choi, David Laibson, Brigitte Madrian, and Andrew Metrick. 2009.
“Optimal Defaults and Active Decisions: Theory and Evidence from 401(k) Saving.” Quarterly
Journal of Economics 124 (4): 1639-1674.
[21] Chaloupka, Frank J., Kenneth E. Warner, Daron Acemoglu, Jonathan Gruber, Fritz Laux,
Wendy Max, Joseph Newhouse, Thomas Schelling, and Jody Sindelar. 2014. “An Evaluation
of The FDA’s Analysis of the Costs and Benefits of the Graphic Warning Label Regulation.”
Tobacco Control 0: 1-8.
[22] Chaloupka, Frank J., Jonathan Gruber, and Kenneth E. Warner. 2015. “Accounting for “Lost Pleasure” in a Cost-Benefit Analysis of Government Regulation: The Case of the FDA’s Proposed Cigarette Labeling Regulation.” Annals of Internal Medicine 162 (1): 64.
[23] Costa, Dora L., and Matthew E. Kahn. 2013. “Energy Conservation ‘Nudges’ and Environ-
mentalist Ideology: Evidence from a Randomized Residential Electricity Field Experiment.”
Journal of the European Economic Association 11 (3): 680–702.
[24] Cutler, David M., Amber Jessup, Donald Kenkel, and Martha A. Starr. 2015. “Valuing Regula-
tions Affecting Addictive or Habitual Goods.” Journal of Benefit-Cost Analysis 6 (2): 247-280.
[25] Davis, Lucas, and Erich Muehlegger. 2010. “Do Americans Consume Too Little Natural Gas? An Empirical Test of Marginal Cost Pricing.” RAND Journal of Economics 41 (4): 791-801.
[26] DellaVigna, Stefano, John List, and Ulrike Malmendier. 2012. “Testing for Altruism and Social
Pressure in Charitable Giving.” Quarterly Journal of Economics 127 (1): 1-56.
[27] Delmas, Magali A., Miriam Fischlein, and Omar I. Asensio. 2013. “Information Strategies and
Energy Conservation Behavior: A Meta-Analysis of Experimental Studies from 1975 to 2012.”
Energy Policy 61: 729-739.
[28] DiNardo, John, Justin McCrary, and Lisa Sanbonmatsu. 2006. “Constructive Proposals for
Dealing with Attrition: An Empirical Example.” Working Paper.
[29] Dolan, Paul and Robert Metcalfe. 2013. “Neighbors, Knowledge, and Nuggets: Two Natural
Field Experiments on the Role of Incentives on Energy Conservation.” Centre for Economic
Performance Discussion Paper No. 1222.
[30] EOP (Executive Office of the President). 2015. “Executive Order Using Behavioral Science In-
sights to Better Serve the American People.” Available from https://www.whitehouse.gov/the-
press-office/2015/09/15/executive-order-using-behavioral-science-insights-better-serve-
american. Accessed September 26, 2015.
[31] EPA (U.S. Environmental Protection Agency). 1995. “AP 42, Fifth Edition: Compilation of Air
Pollutant Emission Factors, Volume 1: Stationary Point and Area Sources. Chapter 1: External
Combustion Sources.” Available from http://www.epa.gov/ttn/chief/ap42/ch01/index.html.
Accessed June 12, 2015.
[32] Ericson, Keith M. Marzilli. 2014. “Consumer Inertia and Firm Pricing in the Medicare Part D Prescription Drug Insurance Exchange.” American Economic Journal: Economic Policy 6 (1): 38-64.
[33] FDA (U.S. Food and Drug Administration). 2011. “Required Warnings for Cigarette Packages
and Advertisements, Final Rule.” Federal Register, 76, 36628–36777.
[34] Fowlie, Meredith, Michael Greenstone, and Catherine Wolfram. 2015. “Do Energy Efficiency
Investments Deliver? Evidence from the Weatherization Assistance Program.” Working Paper,
University of Chicago (June).
[35] Glaeser, Edward. 2006. “Paternalism and Psychology.” University of Chicago Law Review 73: 133-156.
[36] Glaeser, Edward. 2014. “The Supply of Environmentalism: Psychological Interventions and Economics.” Review of Environmental Economics and Policy 8 (2): 208-229.
[37] Handel, Benjamin R. 2013. “Adverse Selection and Inertia in Health Insurance Markets: When
Nudging Hurts.” American Economic Review 103 (7): 2643-2682.
[38] Heckman, James J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica 47
(1): 153-161.
[39] Herberich, David H., John A. List, and Michael K. Price. 2012. “How Many Economists Does
It Take to Change a Light Bulb? A Natural Field Experiment on Technology Adoption.”
Working Paper.
[40] Holland, Stephen, Erin Mansur, Nicholas Muller, and Andrew Yates. 2015. “Measuring the
Spatial Heterogeneity in Environmental Externalities from Driving: A Comparison of Gasoline
and Electric Vehicles.” NBER Working Paper No. 21291 (June).
[41] Howarth, Robert, Drew Shindell, Renee Santoro, Anthony Ingraffea, Nathan Phillips, and
Amy Townsend-Small. 2012. “Methane Emissions from Natural Gas Systems.” Available from
http://www.eeb.cornell.edu/howarth/publications/Howarth et al 2012 National Climate Assessment.pdf.
Accessed August 15, 2015.
[42] Integral Analytics. 2012. “Sacramento Municipal Utility District Home Energy Report
Program.” http://www.integralanalytics.com/ia/Portals/0/FinalSMUDHERSEval2012v4.pdf
(accessed September 2013).
[43] Interagency Working Group on the Social Cost of Carbon, United States Government.
2013. “Technical Support Document: Technical Update of the Social Cost of Car-
bon for Regulatory Impact Analysis Under Executive Order 12866.” Available from
https://www.whitehouse.gov/sites/default/files/omb/inforeg/social cost of carbon for ria 2013 update.pdf.
Accessed July 26, 2015.
[44] Ito, Koichiro. 2014. “Do Consumers Respond to Marginal or Average Price? Evidence from
Nonlinear Electricity Pricing.” American Economic Review 104 (2): 537-563.
[45] Jin, Lawrence, Don Kenkel, Feng Liu, and Hua Wang. 2015. “Retrospective and Prospective
Benefit-Cost Analyses of U.S. Anti-Smoking Policies.” Journal of Benefit-Cost Analysis 6 (1):
154-186.
[46] Kamenica, Emir. 2008. “Contextual Inference in Markets: On the Informational Content of
Product Lines.” American Economic Review 98 (5): 2127-2149.
[47] Kantola, S. J., G. J. Syme, and N. A. Campbell. 1984. “Cognitive Dissonance and Energy
Conservation.” Journal of Applied Psychology 69 (3): 416-421.
[48] Keuring van Elektrotechnische Materialen te Arnhem (KEMA). 2012. “Puget Sound Energy’s
Home Energy Reports Program: Three Year Impact, Behavioral and Process Evaluation.”
Madison, Wisconsin: DNV KEMA Energy and Sustainability.
[49] Kling, Jeffrey R., Sendhil Mullainathan, Eldar Shafir, Lee C. Vermeulen, and Marian V.
Wrobel. 2012. “Comparison Friction: Experimental Evidence from Medicare Drug Plans.”
Quarterly Journal of Economics 127 (1): 199-235.
[50] Koszegi, Botond, and Adam Szeidl. 2013. “A Model of Focusing in Economic Choice.” Quar-
terly Journal of Economics 128 (1): 53–104.
[51] Kushler, Martin, Seth Nowak, and Patti Witte. 2012. “A National Survey of State Policies
and Practices for the Evaluation of Ratepayer-Funded Energy Efficiency Programs.” American
Council for an Energy-Efficient Economy Report Number U122.
[52] Larrick, Richard, and Jack Soll. 2008. “The MPG Illusion.” Science 320 (5883): 1593-1594.
[53] Madrian, Brigitte C., and Dennis F. Shea. 2001. “The Power of Suggestion: Inertia in 401(k) Participation and Savings Behavior.” Quarterly Journal of Economics 116 (4): 1149-1187.
[54] Myhre, G., D. Shindell, F.-M. Bréon, W. Collins, J. Fuglestvedt, J. Huang, D. Koch, J.-F.
Lamarque, D. Lee, B. Mendoza, T. Nakajima, A. Robock, G. Stephens, T. Takemura and
H. Zhang. 2013. “Anthropogenic and Natural Radiative Forcing.” In: Climate Change 2013:
The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report
of the Intergovernmental Panel on Climate Change [Stocker, T.F., D. Qin, G.-K. Plattner,
M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley (eds.)].
Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
[55] Nadel, Steven, and Kenneth Keating. 1991. “Engineering Estimates vs. Impact Evaluation Re-
sults: How Do They Compare and Why?” American Council for an Energy-Efficient Economy
Report Number U915.
[56] Nolan, Jessica M., Wesley Schultz, Robert B. Cialdini, Noah J. Goldstein, and Vladas Griskevi-
cius. 2008. “Normative Influence is Underdetected.” Personality and Social Psychology Bulletin
34 (7): 913–923.
[57] Opinion Dynamics. 2012. “Massachusetts Three Year Cross-Cutting Behavioral Program Eval-
uation Integrated Report.” Waltham, MA: Opinion Dynamics Corporation.
[58] Opower. 2014. “Prospectus.” Available from http://www.nasdaq.com/markets/ipos/filing.ashx?filingid=9465802.
Accessed September 26, 2015.
[59] Oster, Emily, Ira Shoulson, and E. Ray Dorsey. 2013. “Optimal Expectations and Limited
Medical Testing: Evidence from Huntington Disease.” American Economic Review 103 (2):
804-830.
[60] Perry, Michael, and Sarah Woehleke. 2013. “Evaluation of Pacific Gas and Electric Company’s
Home Energy Report Initiative for the 2010–2012 Program.” San Francisco: Freeman, Sullivan,
and Company.
[61] Pope, C. Arden, Richard T. Burnett, Michael J. Thun, Eugenia, E. Calle, Daniel Krewski,
Kazuhiko Ito, and George D. Thurston. 2002. “Lung Cancer, Cardiopulmonary Mortality,
and Long-term Exposure to Fine Particulate Air Pollution.” Journal of the American Medical
Association 287 (9): 1132-1141.
[62] Schultz, P. Wesley, Jessica M. Nolan, Robert B. Cialdini, Noah J. Goldstein, and Vladas Griske-
vicius. 2007. “The Constructive, Destructive, and Reconstructive Power of Social Norms.”
Psychological Science 18 (5): 429–434.
[63] Sudarshan, Anant. 2014. “Nudges in the Marketplace: Using Peer Comparisons
and Incentives to Reduce Household Electricity Consumption.” Working Paper.
http://www.anantsudarshan.com/uploads/1/0/2/6/10267789/nudges sudarshan 2014.pdf.
[64] Thaler, Richard, and Cass R. Sunstein. 2008. Nudge: Improving Decisions about Health,
Wealth, and Happiness. New Haven: Yale University Press.
[65] Trachtman, Hannah, Andrew Steinkruger, Mackenzie Wood, Adam Wooster, James Andreoni,
James J. Murphy, and Justin M. Rao. 2015. “Fair Weather Avoidance: Unpacking the Costs
and Benefits of “Avoiding the Ask.”” Journal of the Economic Science Association 1 (1): 8-14.
[66] Violette, Daniel, Bill Provencher, and Mary Klos. 2009. “Impact Evaluation of Positive Energy
SMUD Pilot Study.” Boulder, CO: Summit Blue Consulting, LLC.
[67] Watson, David, Lee Anna Clark, and Auke Tellegen. 1988. “Development and Validation of
Brief Measures of Positive and Negative Affect: The PANAS Scales.” Journal of Personality
and Social Psychology 54: 1063-1070.
[68] Weimer, David L., Aidan R. Vining, and Randall K. Thomas. 2009. “Cost-Benefit Analysis
Involving Addictive Goods: Contingent Valuation to Estimate Willingness-to-Pay for Smoking
Cessation.” Health Economics 18 (2): 181-202.
Online Appendix Allcott and Kessler
Online Appendix: Not for Publication
The Welfare Effects of Nudges: A Case Study of Energy Use Social Comparisons
Hunt Allcott and Judd B. Kessler
A Phone Survey Questionnaire
Below is the phone survey questionnaire. Programming notes and comments are in italics. Bolded
headers are for organizational purposes and were not read.
Introduction
Hi. I am calling on behalf of Central Hudson Gas and Electric, your local utility. Central
Hudson has been sending you Home Energy Reports since last fall, and we want to know what
you think about them. Do you have about two minutes to answer some questions? If yes, Central
Hudson will send you a check for up to $10.
If asked, “What is a Home Energy Report?”, say: “Home Energy Reports are one-page letters
that compare your natural gas use to your neighbors’ use and provide energy conservation tips.
Central Hudson sent up to four of these reports to the address on the account associated with this
phone number between late fall 2014 and early spring 2015. Do you recall receiving any Home
Energy Reports in the past nine months?”
If “Yes”, continue to Question 1.
If “No”, or if the customer otherwise says “I don’t remember receiving any Home Energy
Reports,” say: “Is there someone else in the household who may have seen these reports
come in the mail? If so, may I speak to him or her?” If there is no one else who might
have seen the reports, terminate call and code response as “Does not remember Home Energy
Reports.” If there is someone else but not available, record that person’s name and attempt to
call him/her later.
If the caller indicates that he/she has already answered these questions in a mail survey, then
skip questions 1 and 2 and say: “Thank you for responding to our mail survey. We have a couple
of follow-up questions that are better to ask by phone.” Then continue to Question 3.
Question 1
To start, I’m going to ask three questions where you’ll choose between some combination of
continuing Home Energy Reports and receiving checks for different amounts of money. These are
unusual questions, but they’re designed to tell us how much you value the Reports. These are real
questions: Central Hudson will use a lottery to pick one question and will actually mail you what
you chose, so please answer carefully.
Survey Version B only: “Remember that Home Energy Reports compare your energy use to your neighbors’ use.”
Survey Version C only: “Remember that Home Energy Reports help you to reduce your envi-
ronmental impact.”
a. Which would you prefer: 4 more Home Energy Reports PLUS a $10 check, OR a $1 check?
b. Which would you prefer: 4 more Home Energy Reports PLUS a $10 check, OR a $5 check?
c. Which would you prefer: 4 more Home Energy Reports PLUS a $10 check, OR a $9 check?
d. Which would you prefer: 4 more Home Energy Reports PLUS a $10 check, OR a $10 check?
e. Which would you prefer: 4 more Home Energy Reports PLUS a $9 check, OR a $10 check?
f. Which would you prefer: 4 more Home Energy Reports PLUS a $5 check, OR a $10 check?
g. Which would you prefer: 4 more Home Energy Reports PLUS a $1 check, OR a $10 check?
If consumers have consistent preferences, we would not need to ask all seven MPL questions
because answers to some imply answers to others. Questions 1a-1g were asked in the following
order:
Ask 1d first.
If 1d = “HER+$10”, then ask 1f:
   If 1f = “HER+$5”, then ask 1g.
   If 1f = “$10”, then ask 1e.
If 1d = “$10”, then ask 1b:
   If 1b = “HER+$10”, then ask 1c.
   If 1b = “$5”, then ask 1a.
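The branching above can be written as a short decision procedure. The sketch below is an illustrative reconstruction, not part of the survey instrument; the choice function and the simulated respondent are hypothetical. Given a function reporting which side of each question a respondent prefers, it returns the sequence of questions actually asked.

```python
# The seven MPL questions: (HER check amount, cash-only check amount).
# "Prefers HER" means choosing "4 more Home Energy Reports PLUS a $h check"
# over a plain $c check.
QUESTIONS = {
    "a": (10, 1), "b": (10, 5), "c": (10, 9), "d": (10, 10),
    "e": (9, 10), "f": (5, 10), "g": (1, 10),
}

def mpl_sequence(prefers_her):
    """Return the list of question labels asked, following the survey's
    branching: start at 1d, then move toward the respondent's switch point."""
    asked = []

    def ask(label):
        asked.append(label)
        h, c = QUESTIONS[label]
        return prefers_her(h, c)

    if ask("d"):          # chose HER+$10 over $10
        if ask("f"):      # chose HER+$5 over $10
            ask("g")
        else:
            ask("e")
    else:                 # chose $10 over HER+$10
        if ask("b"):      # chose HER+$10 over $5
            ask("c")
        else:
            ask("a")
    return asked

# A hypothetical respondent who values four more HERs at $3:
# prefers the HER option whenever 3 + h > c.
print(mpl_sequence(lambda h, c: 3 + h > c))  # ['d', 'f', 'e']
```

Only three of the seven questions are ever asked, which is why consistent preferences make the remaining answers redundant.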
Question 2
Think back to when you received your first Home Energy Report. Did the Report say that you
were using more or less energy than you thought?
a. Much less than I thought
b. Somewhat less than I thought
c. About what I thought
d. Somewhat more than I thought
e. Much more than I thought
Question 3
Do you think that receiving four more Home Energy Reports this fall and winter would help
you reduce your natural gas use by even a small amount?
a. Yes
b. No
If Yes: How much money do you think you would save on your natural gas bills if you receive
four more Reports compared to if you do not receive them?
If necessary: “We just want to know your best guess.”
Note to enumerators: Prompt for a dollar value, not a percentage. If necessary: “I’m supposed
to ask for your best guess of how many dollars you’d save in total.”
Question 4
Since last fall, Central Hudson has sent up to four Home Energy Reports to many households like
yours. For the average household, how much money do you think these Reports have helped them
save on their natural gas bills?
If necessary: “We just want to know your best guess.”
Note to enumerators: Prompt for a dollar value, not a percentage. If necessary: “I’m supposed
to ask for your best guess of total dollar savings since last fall.”
Question 5
How would you like the Reports if they did not have the bar graph comparing your energy use
to your neighbors’ use?
a. Much less
b. Somewhat less
c. About the same
d. Somewhat more
e. Much more
Question 6
Some people feel either inspired or pressured when they see their Home Energy Reports. Did
you feel inspired, pressured, neither, or both?
a. Inspired
b. Pressured
c. Neither
d. Both
Question 7
Some people feel either proud or guilty when they see their Home Energy Reports. Did you feel
proud, guilty, neither, or both?
a. Proud
b. Guilty
c. Neither
d. Both
Question 8
To what extent do you agree or disagree with the following statement: “The Home Energy
Reports gave useful information that helped me conserve energy.”
a. Strongly agree
b. Agree
c. Neither
d. Disagree
e. Strongly disagree
Question 9
Do you have any other comments about the Home Energy Reports that you’d like to share?
Open response, please write down as much as possible.
B Data Appendix
Table A1: Balance Tests (Page 1)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
Dependent variables: (1) Baseline use (ccf/day); (2) ln(Income); (3) ln(Net worth); (4) ln(House value); (5) Education (years); (6) Male; (7) ln(Age); (8) Retired; (9) Married; (10) Rent
Panel A: Home Energy Report Recipient/Control
Recipient 0.0096 -0.011 0.0058 -0.029 -0.032 -0.0026 -0.0044 0.0028 -0.0064 0.0039
(0.023) (0.012) (0.024) (0.029) (0.035) (0.0077) (0.0051) (0.0030) (0.0079) (0.0069)
Observations 19,898 19,927 15,557 16,741 19,475 16,811 17,282 16,728 15,406 17,561
Panel B: Survey Group
Mail follow-up -0.028 0.0073 -0.054 -0.043 -0.0076 0.0093 -0.0092 -0.012 0.0068 -0.012
(0.036) (0.018) (0.036) (0.044) (0.054) (0.012) (0.0078) (0.0050)** (0.012) (0.011)
Comparison cue -0.034 0.00063 -0.043 -0.062 -0.025 -0.014 0.0015 0.0057 -0.00061 -0.011
(0.042) (0.021) (0.042) (0.051) (0.063) (0.014) (0.0090) (0.0054) (0.014) (0.012)
Environmental cue 0.011 0.0056 0.012 -0.049 0.0016 -0.018 0.015 0.011 0.0058 -0.0100
(0.042) (0.021) (0.042) (0.051) (0.063) (0.014) (0.0090)* (0.0055)** (0.014) (0.012)
Observations 9424 9439 7466 7965 9226 8036 8251 8004 7255 8371
F-test p-value 0.58 0.97 0.26 0.45 0.97 0.46 0.18 0.023 0.91 0.53
Notes: This table presents tests of balance on observables between randomly-assigned groups. Samples in Panel A include the full HER
recipient and control groups, while samples in Panel B are limited to the households that were sent Home Energy Reports and were thus
eligible for our surveys. Observation counts differ between columns because regressions include only non-missing observations of the dependent
variable. Robust standard errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence, respectively.
Table A2: Balance Tests (Page 2)
(11) (12) (13) (14) (15) (16) (17) (18) (19) (20)
Dependent variables: (11) Single family home; (12) ln(House age); (13) Democrat; (14) Hybrid auto share; (15) Green consumer; (16) Wildlife donor; (17) Profit score; (18) Buyer score; (19) Mail responder; (20) Home improvement interest
Panel A: Home Energy Report Recipient/Control
Recipient 0.0018 -0.032 0.0018 0.021 0.0042 0.0033 -0.0033 -0.022 -0.012 0.00024
(0.0070) (0.016)* (0.0082) (0.047) (0.0052) (0.0036) (0.014) (0.016) (0.0069)* (0.0051)
Observations 17,734 14,885 18,080 19,728 18,883 16,728 19,784 14,967 17,734 16,728
Panel B: Survey Group
Mail follow-up -0.0097 -0.021 0.0037 0.0013 -0.0053 -0.0094 0.018 -0.017 0.0017 -0.025
(0.011) (0.025) (0.013) (0.053) (0.0081) (0.0058) (0.022) (0.025) (0.011) (0.0081)***
Comparison cue 0.0048 -0.047 -0.00065 -0.10 0.0058 -0.0035 -0.022 0.0036 -0.012 -0.0037
(0.012) (0.029) (0.015) (0.071) (0.0094) (0.0066) (0.025) (0.029) (0.012) (0.0092)
Environmental cue 0.011 -0.048 0.0038 -0.18 -0.0056 -0.0045 0.010 -0.014 -0.0079 -0.0015
(0.012) (0.029)* (0.015) (0.13) (0.0092) (0.0066) (0.026) (0.029) (0.012) (0.0092)
Observations 8464 7109 8617 9340 8977 8004 9377 7143 8464 8004
F-test p-value 0.67 0.23 0.98 0.35 0.59 0.36 0.49 0.84 0.80 0.020
Notes: This table presents tests of balance on observables between randomly-assigned groups. Samples in Panel A include the full HER
recipient and control groups, while samples in Panel B are limited to the households that were sent Home Energy Reports and were thus
eligible for our surveys. Observation counts differ between columns because regressions include only non-missing observations of the dependent
variable. Robust standard errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence, respectively.
Table A3: Survey Response Counts by Attempt
(1) (2)
Attempt    Mail    Phone
1           402      523
2           497      358
3             –      229
4             –      172
5             –      163
6             –       83
7             –       80
8             –       80
Overall     899     1690
Notes: For the mail survey, attempt 1 refers to the survey included in the final Home Energy Report, and
attempt 2 refers to the follow-up survey sent to 2/3 of households. For the phone survey, attempt refers to
the number of times that the phone number was called before completing the survey.
Table A4: Correlations of Willingness-to-Pay with Qualitative Survey Responses
(1) (2) (3) (4) (5) (6)
Predicted savings 0.11
(0.0089)***
Like without comparisons -1.06
(0.16)***
Useful info 2.27
(0.18)***
Inspired 3.37
(0.38)***
Pressured -1.02
(0.50)**
Proud 1.18
(0.41)***
Guilty 1.40
(0.49)***
Positive comment 4.35
(0.44)***
Observations 1365 1581 1570 1571 1571 2137
R² 0.094 0.026 0.093 0.047 0.011 0.042
Notes: Data are the unweighted sample of phone survey responses. Dependent variable is
willingness-to-pay. The independent variables in columns 1-6 are from questions 3, 5, 8, 6, 7, and 9,
respectively. Predicted savings is winsorized at $50. Columns 2 and 3 consider the five-point Likert scale
responses to questions 5 and 8, which we code as integers {−2, −1, 0, 1, 2}. The sample in column 6
includes both mail and phone survey respondents: the phone survey enumerators transcribed responses to
question 9, and we also transcribed the 30 unsolicited comments written on the mail survey. The variable
“Positive comment” takes value 1 for positive comments about HERs, -1 for negative comments, and 0 for
neutral or no comments. Sample sizes vary due to item non-response. Robust standard errors in
parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence, respectively.
Table A5: Correlations of Negative Willingness-to-Pay with Qualitative Survey Responses
(1) (2) (3) (4) (5) (6)
Predicted savings -0.0059
(0.00057)***
Like without comparisons 0.059
(0.011)***
Useful info -0.14
(0.011)***
Inspired -0.18
(0.024)***
Pressured 0.10
(0.034)***
Proud -0.11
(0.027)***
Guilty -0.026
(0.033)
Positive comment -0.22
(0.026)***
Observations 1365 1581 1570 1571 1571 2137
R² 0.070 0.019 0.089 0.037 0.011 0.025
Notes: Data are the unweighted sample of phone survey responses. Dependent variable is an indicator for
negative willingness-to-pay. The independent variables in columns 1-6 are from questions 3, 5, 8, 6, 7, and
9, respectively. Predicted savings is winsorized at $50. Columns 2 and 3 consider the five-point Likert scale
responses to questions 5 and 8, which we code as integers {−2, −1, 0, 1, 2}. The sample in column 6
includes both mail and phone survey respondents: the phone survey enumerators transcribed responses to
question 9, and we also transcribed the 30 unsolicited comments written on the mail survey. The variable
“Positive comment” takes value 1 for positive comments about HERs, -1 for negative comments, and 0 for
neutral or no comments. Sample sizes vary due to item non-response. Robust standard errors in
parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence, respectively.
Table A6: Within-Household Correlations of Survey Responses
(1) (2) (3) (4)
Dependent variables: (1) WTP from first mail survey; (2) WTP from phone survey; (3) 1(WTP from phone survey < 0); (4) Belief update from phone survey
WTP from second mail survey 0.819
(0.080)***
WTP from mail survey 0.440
(0.072)***
1(WTP from mail survey<0) 0.362
(0.071)***
Belief update from mail survey 0.500
(0.064)***
Observations 87 224 224 259
R² 0.584 0.206 0.132 0.217
Notes: The sample for column 1 is households that returned more than one mail survey with valid WTP.
The sample for columns 2-4 is households that responded to both mail and phone surveys. Robust
standard errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence,
respectively.
C Appendix to Empirical Estimates
Table A7: Inverse Probability Weights
(1) (2) (3) (4) (5) (6) (7) (8)
Dependent variables: (1) Have WTP from paper survey; (2) Have WTP, assigned to base; (3) Have WTP, assigned to follow-up; (4) Have WTP from base mail; (5) Have WTP from follow-up mail; (6) Have WTP from phone; (7) Have WTP; (8) Have WTP, base mail excluded
Baseline use -0.461 -0.123 -0.336 -0.386 -0.0833 0.666 0.374 0.684
(0.176)*** (0.0636)* (0.163)** (0.119)*** (0.128) (0.242)*** (0.274) (0.261)***
ln(Income) -0.405 -0.192 -0.189 -0.132 -0.269 -0.878 -0.699 -0.407
(0.487) (0.187) (0.447) (0.331) (0.359) (0.698) (0.784) (0.758)
ln(Net worth) 0.124 0.0802 0.0477 0.243 -0.106 0.0637 0.119 -0.176
(0.309) (0.118) (0.286) (0.208) (0.230) (0.419) (0.471) (0.453)
ln(House value) -0.181 0.0278 -0.215 -0.0826 -0.0963 -0.114 -0.132 -0.0697
(0.166) (0.0626) (0.153) (0.112) (0.122) (0.240) (0.266) (0.256)
Education 0.584 0.141 0.431 0.311 0.252 0.430 0.805 0.556
(0.117)*** (0.0411)*** (0.108)*** (0.0789)*** (0.0852)*** (0.174)** (0.194)*** (0.188)***
Male -0.230 0.0711 -0.314 -0.263 0.0243 0.358 -0.0650 0.00841
(0.581) (0.234) (0.529) (0.393) (0.427) (0.851) (0.947) (0.918)
ln(Age) 1.407 0.297 1.099 0.699 0.651 0.483 1.463 0.799
(1.077) (0.414) (0.990) (0.734) (0.789) (1.534) (1.708) (1.648)
Retired 0.587 0.210 0.311 -0.0256 0.557 1.026 1.672 2.185
(1.383) (0.515) (1.268) (0.903) (1.026) (2.198) (2.440) (2.404)
Married -0.0957 0.182 -0.261 0.0918 -0.157 -1.000 -1.399 -1.493
(0.727) (0.264) (0.674) (0.475) (0.554) (1.048) (1.165) (1.126)
Rent 0.480 0.187 0.264 0.271 0.185 -1.738 -1.441 -1.757
(0.851) (0.330) (0.783) (0.576) (0.634) (1.174) (1.320) (1.271)
Single family 1.004 0.322 0.677 0.364 0.633 -0.226 -0.103 -0.320
(0.749) (0.288) (0.688) (0.490) (0.571) (1.061) (1.183) (1.141)
ln(House age) -0.892 -0.0873 -0.795 -0.366 -0.512 -1.145 -1.723 -1.381
(0.323)*** (0.131) (0.293)*** (0.216)* (0.238)** (0.475)** (0.531)*** (0.517)***
Democrat 0.466 0.343 0.129 0.310 0.133 0.916 1.173 0.884
(0.487) (0.191)* (0.443) (0.326) (0.358) (0.752) (0.832) (0.814)
Hybrid share 0.120 0.0275 0.0891 0.0908 0.0146 0.461 0.494 0.418
(0.0878) (0.0319) (0.0809) (0.0535)* (0.0695) (0.129)*** (0.146)*** (0.142)***
Green consumer -0.529 -0.176 -0.343 -0.747 0.252 0.742 0.318 0.864
(0.768) (0.309) (0.699) (0.519) (0.561) (1.165) (1.297) (1.261)
Wildlife donor 3.572 1.108 2.305 2.671 0.549 3.313 6.002 3.470
(1.216)*** (0.469)** (1.112)** (0.779)*** (0.922) (1.906)* (2.129)*** (2.114)
Profit score 2.007 0.0491 1.915 0.715 1.247 1.049 2.322 1.785
(0.419)*** (0.147) (0.388)*** (0.278)** (0.312)*** (0.608)* (0.678)*** (0.655)***
Buyer score 0.854 0.188 0.648 0.597 0.252 -0.280 0.393 -0.144
(0.396)** (0.149) (0.365)* (0.258)** (0.299) (0.566) (0.635) (0.613)
Mail responder 0.295 0.0445 0.248 -0.0514 0.359 -0.950 -0.591 -0.669
(0.676) (0.258) (0.621) (0.447) (0.503) (1.004) (1.115) (1.082)
Home improve 0.357 -0.338 0.670 -0.416 0.779 1.767 1.355 1.705
(0.940) (0.403) (0.845) (0.632) (0.685) (1.390) (1.562) (1.517)
Observations 9429 9429 9429 9429 9429 9429 9429 9028
Notes: This table presents probit estimates used to construct inverse probability weights. Robust standard
errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence,
respectively.
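The probit coefficients above yield fitted response probabilities, and weighting each respondent by the inverse of that probability restores the covariate mix of the full eligible sample. The sketch below illustrates only the reweighting step (the probit fit itself is omitted, and the toy data and function name are our own):

```python
def ipw_mean(values, response_probs):
    """Inverse-probability-weighted mean: each respondent is weighted by
    1 / Pr(respond), undoing selective non-response."""
    weights = [1.0 / p for p in response_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Toy population: half of households have y = 0 and respond with prob 0.8,
# half have y = 1 and respond with prob 0.2, so the population mean is 0.5.
# Among respondents, the unweighted mean is badly biased toward 0.
ys = [0] * 80 + [1] * 20      # respondents drawn from 100 households of each type
ps = [0.8] * 80 + [0.2] * 20  # their estimated response probabilities

print(sum(ys) / len(ys))   # unweighted respondent mean: 0.2
print(ipw_mean(ys, ps))    # IPW mean recovers the population mean: 0.5
```

The same logic, with probabilities fitted from the probit in Table A7, underlies the IPW estimates reported in the body of the paper.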
Table A8: Correlation of Willingness-to-Pay with Phone Survey Responsiveness
(1) (2)
Completed survey attempt number 0.0230 0.0558
(0.0887) (0.0900)
Observations 1609 1609
Weights Equal IPW
Notes: Dependent variable is willingness-to-pay, sample is all phone survey respondents. For the phone
survey, each respondent was dialed up to eight times; the independent variable is the attempt number on
which the survey was completed. Robust standard errors in parentheses. *, **, ***: statistically significant
with 90, 95, and 99 percent confidence, respectively.
Table A9: Fitting Moral Utility
(1)
Predicted savings 0.0879
(0.00929)***
Proud 0.0954
(0.473)
Guilty 0.685
(0.564)
Inspired 2.824
(0.451)***
Pressured -1.190
(0.594)**
Observations 1350
R² 0.123
Notes: Dependent variable is willingness-to-pay. Sample includes only phone survey respondents with non-
missing data. Predicted savings is winsorized at $50. Robust standard errors in parentheses. *, **, ***:
statistically significant with 90, 95, and 99 percent confidence, respectively.
Table A10: Effect of Survey Version on Willingness-to-Pay
(1) (2) (3) (4)
Comparison version -0.688 -0.695 -0.663 -0.660
(0.387)* (0.384)* (0.391)* (0.388)*
Environmental version -0.212 -0.155 -0.191 -0.145
(0.387) (0.389) (0.395) (0.400)
Mean comparison 0.106 0.0268
(0.216) (0.290)
Comparison version×Mean comparison 0.111 0.184
(0.317) (0.320)
Environmental version×Mean comparison 0.0533 0.0387
(0.348) (0.355)
Observations 2137 2137 2137 2137
Include X covariates No Yes No Yes
Notes: Dependent variable is willingness-to-pay. “Mean comparison” is the average difference (in cubic feet)
between own natural gas usage and mean neighbor usage on the first year of HERs. Robust standard errors
in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent confidence, respectively.
D Appendix to Welfare Estimates
D.A Testing for Biased Beliefs and Overoptimism
For the welfare analysis, we assume that WTP equals the consumer utility gain. In this context, we could imagine two reasons why this might fail: biased beliefs and overoptimism.
By biased beliefs, we mean that consumers might systematically underestimate or overestimate the energy cost savings resulting from their conservation efforts. Consumers likely know the monetary and non-monetary costs of their efforts, such as the time to adjust the thermostat or the money to install energy-saving windows, but the resulting energy savings can be quite difficult to infer given that gas bills fluctuate substantially across months and years. There is empirical evidence to support this concern: Pronin, Berger, and Molouki (2007) and Nolan et al. (2008) find that people underestimate the motivational power of social norm messaging, and Larrick and Soll (2008), Attari et al. (2010), and Allcott (2013) explore various belief biases related to energy costs.
To test this, the phone survey asked respondents how much money they thought they would save on their natural gas bills if they received four more HERs, as well as how much money they thought the average HER recipient had saved since last fall. Figure A1 shows that both the median and mean respondents overstate gas cost savings relative to the true average treatment effect. This suggests that, if anything, biased beliefs bias WTP upward rather than downward. However, we treat this result very cautiously, given that these questions were not incentive compatible and stated beliefs are highly dispersed.
Figure A1: Beliefs About Savings Caused by Home Energy Reports
[Figure: cumulative distributions of two belief measures, “Own future” and “Average past,” plotted against retail gas cost savings ($0 to $200), with a vertical line marking the true average savings.]
Notes: This figure presents the unweighted distribution of responses to the following phone survey questions:
“How much money do you think you would save on your natural gas bills if you receive four more Reports?”
and “For the average household, how much money do you think these Reports have helped them save on their
natural gas bills?” True average savings over the observed post-treatment period is $5.52 per household.
A second and more controversial reason why WTP might not equal the consumer welfare gain has to do with overoptimism bias. Oster, Shoulson, and Dorsey (2013) show that people at high risk of Huntington disease do not get tested, despite the fact that knowledge of disease status leads to very different life choices. They propose a model based on Brunnermeier and Parker (2005) in which people optimally choose beliefs, trading off the utility gain from optimistic beliefs against the utility loss from suboptimal actions. Bracha and Brown (2012) develop an alternative model in which overoptimism is constrained by the cost of holding incorrect beliefs. Evaluating information provision in these models requires the analyst to take a stand on whether to recognize overoptimistic beliefs as true utility. In these models, overoptimistic consumers may not experience a utility gain from exogenously-provided information, even though it would lead to more accurate beliefs and (in Brunnermeier and Parker’s model) improved decision making. If current Home Energy Report recipients derive utility from believing that they use less energy than their neighbors and want to be overoptimistic about their relative energy use in the future, this might reduce their WTP for HERs, and perhaps the utility loss from correcting overoptimism should not be counted as a “true” utility loss.
Even without taking a stand on this issue, we can provide suggestive tests of whether overoptimism affects WTP. On both the mail and phone surveys, we asked people whether their first HER told them they were using more or less energy than they thought. We hypothesize that people who want to be overoptimistic in the future are more likely to have been overoptimistic in the past. If overoptimism affects WTP, the initial belief update should thus be negatively correlated with WTP. People gave meaningful responses: the belief update variable is positively correlated with baseline usage, with usage relative to neighbors on the first HER, and with reporting that they would like the HERs more if they did not have social comparisons. More people report underestimating their energy use than report overestimating it. However, Appendix Table A11 shows that the belief update is not associated with WTP, either unconditionally or conditional on X.
Table A11: Correlation of Willingness-to-Pay with Pre-Treatment Optimism
(1) (2)
Belief update 0.0541 0.0525
(0.134) (0.139)
Observations 2102 2102
Include X covariates No Yes
Notes: This table presents regressions of WTP on the belief update using unweighted responses from both
mail and phone surveys. Belief update is from question 8 on the mail survey and question 2 on the phone
survey: “Think back to when you received your first Home Energy Report. Did the Report say that you
were using more or less energy than you thought?” Responses are on a five-point Likert-style scale from
“much less than I thought” to “much more than I thought,” and we code these as integers from -2 to +2.
Robust standard errors in parentheses. *, **, ***: statistically significant with 90, 95, and 99 percent
confidence, respectively.
D.B Program Implementation Cost
Home Energy Report programs have setup costs, per-household marginal costs, and annual fixed costs. In evaluating a program’s second year, we ignore setup costs. Table A12, Panel A presents the per-household annual marginal costs. Based on a high-volume price quote from PFL (www.PrintingForLess.com), we assume $0.4926 per HER for printing and mailing. This uses the appropriate printing and paper quality, production speed, and shipping method for HERs. HER recipients occasionally call the utility to ask questions, complain, or opt out of HERs. Opower data show that HER recipients typically call with 0.5 percent probability per year and that these calls cost the utility $5 per call to answer. We estimate $0.01 per household for server space to store data, and $0.05 to purchase household-level demographic data to enhance the HERs. Overall, we estimate that the per-household marginal cost for one year of a program involving four HERs is $2.06.
Panel B presents the per-utility annual costs that are fixed with respect to the number of households. Opower reported an estimated 51 hours of program design and reporting time for a client like Central Hudson. In addition, Central Hudson and Opower have in-person meetings approximately every quarter, and short phone meetings most weeks. We assume that Opower staff cost $85 per hour, based on a $118,097 nationwide median annual salary for “program managers” (see http://www1.salary.com/Program-Manager-Salary.html), multiplied by a 1.5 loading factor to account for health insurance, vacation, and other benefits, and divided by 2,080 hours per year. Central Hudson reported to us that their fully-loaded staff time for this project costs $62.64 per hour. Total utility-level fixed costs are $16,339.
Central Hudson has four HER programs (the natural gas program we study, plus three others) with a total of about 100,000 households in treatment. Some of the per-utility fixed costs, such as program design and reporting, likely would increase with the number of programs, whereas others, such as travel time for quarterly meetings, likely would not. If the fixed cost is allocated equally to each of Central Hudson’s 100,000 recipient households, this gives $0.16 per household, or $1,628 for the 9,964-recipient natural gas program we study. Alternatively, if the fixed cost is allocated equally to each program, this is $4,085 per program. Allocating this $4,085 equally across the 9,964 recipient households gives $0.41 per household.
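The two allocation rules above are simple arithmetic; the sketch below just reproduces the calculation using the figures from the text:

```python
fixed_cost = 16339         # total per-utility annual fixed cost ($)
total_households = 100000  # HER recipients across all four Central Hudson programs
n_programs = 4
gas_recipients = 9964      # recipients in the natural gas program we study

# Rule 1: allocate the fixed cost equally across all recipient households.
per_household = fixed_cost / total_households       # ~ $0.16 per household
gas_program_share = per_household * gas_recipients  # ~ $1,628 for the gas program

# Rule 2: allocate the fixed cost equally across the four programs.
per_program = fixed_cost / n_programs               # ~ $4,085 per program
per_household_rule2 = per_program / gas_recipients  # ~ $0.41 per household

print(round(per_household, 2), round(gas_program_share),
      round(per_program), round(per_household_rule2, 2))
```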
Table A12: Implementation Cost Estimates
Item Explanation Cost ($)
Panel A: Per-Household Annual Marginal Costs
Printing and mailing $0.4926/HER × 4 HERs 1.97
Utility call center 0.5% call probability × $5/call 0.025
Server space $0.01 per recipient household 0.01
Demographic data $0.05 per recipient household 0.05
Total 2.06
Panel B: Per-Utility Annual Fixed Costs
Opower ($85/hour)
Program design and reporting 51 hours 4,335
Quarterly meetings (time) 8 hours/quarter × 2 people 5,440
Quarterly meetings (travel) $250/quarter × 2 people 2,000
Weekly phone meetings 20 minutes/week × 1 person 1,473
Central Hudson ($62.64/hour)
Quarterly meetings 2 hours/quarter × 4 people 2,004
Weekly phone meetings 20 minutes/week × 1 person 1,086
Total 16,339
Annual fixed cost per household Central Hudson has 100,000 HER recipients 0.16
Annual fixed cost per program Central Hudson has four HER programs 4,085
Notes: This table presents the implementation costs for an ongoing Opower Home Energy Report program.
See text for details.
D.C Speculative Evaluation of a Typical Full Opower Program
In this appendix, we address two shortcomings of the welfare evaluation in Table 8. First, Table
8 evaluates only the second year of an Opower program. Second, it evaluates a particular Opower
program, which may or may not be typical.
Table A13 evaluates the full course of a typical Home Energy Report program. We use the
energy savings from “site 2” studied by Allcott and Rogers (2014), an electricity-focused program
with savings approximately equal to the average savings of other Opower programs. Using Table
8 from Allcott and Rogers (2014), four years of Home Energy Reports are projected to save 1875
kilowatt-hours (kWh) in total, including significant savings after the program ends. At the 2014
national average electricity price of $0.125/kWh, this amounts to $234, as shown in Panel A.27
We assume that the long-run marginal source of electricity is a combined-cycle gas plant, with cost and heat rate characteristics from the U.S. Energy Information Administration’s Annual Energy Outlook.28 This gives energy acquisition cost savings of $176 and externality reduction of $53, using the externality damage assumptions detailed in the body of this paper.
For implementation cost, we use the price that Opower charges utilities, which is about $8 per household per year for six HERs. We assume that this covers costs to set up and operate the HER program as well as relevant overhead costs for sales, marketing, and research and development.29
27 See http://www.eia.gov/electricity/monthly/pdf/epm.pdf.
Panel B shows the consumer welfare and social welfare effects of the program under two assumptions. In column 1, we ignore non-energy costs, assuming that V = −∆ẽ · p_e. In column 2, we adjust for non-energy costs using our estimate that V = −∆ẽ · p_e × 0.51. By the “total resource cost” metric, which is what regulators use to evaluate energy efficiency programs, failing to adjust for non-energy costs overstates gains by a factor of 4.7. Similarly, failing to adjust for non-energy costs overstates social welfare gains by a factor of 2.4. We label this calculation as “speculative” because it hinges on the assumption that V = −∆ẽ · p_e × 0.51.
Table A13: Social Welfare Effects of a Full Home Energy Report Program
                                                             (1)       (2)
Panel A: Benefits and Costs Other than Consumer Welfare ($/recipient)
  Implementation cost: C_n, P_n                               32
  Retail energy savings: −∆ẽ · p_e                           234
  Energy acquisition cost savings: −∆ẽ · c_e                 176
  Utility net revenue loss: −∆ẽ · (p_e − c_e)                 58
  Externality reduction: −∆ẽ · φ_e                            53
Panel B: Consumer Welfare and Social Welfare Effect ($/recipient)
  Assumption:                             V = −∆ẽ · p_e    V = −∆ẽ · p_e × 0.51
  Consumer welfare effect: V                               234       121
  Total resource cost: V + π_e · ∆ẽ − P_n                  144        31
  Social welfare effect: V − C_n + (π_e − φ_e) · ∆ẽ        197        83
Notes: See text for details.
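The column-1 entries of Table A13 follow from the Panel A inputs by simple accounting. The sketch below reproduces that arithmetic; the variable names are ours, and rewriting the formulas in terms of positive savings magnitudes is our reading of the table:

```python
# Column-1 inputs from Table A13 ($/recipient). Δẽ is the change in energy use
# (negative for conservation); we work with the positive savings magnitudes.
retail_savings = 234.0        # −Δẽ·p_e: retail energy cost savings
acquisition_savings = 176.0   # −Δẽ·c_e: avoided energy acquisition cost
externality_reduction = 53.0  # −Δẽ·φ_e: avoided externality damages
C_n = 32.0                    # implementation cost per recipient
P_n = 32.0                    # price the utility pays per recipient

V = retail_savings            # column 1 assumes V = −Δẽ·p_e

# Utility net revenue loss −Δẽ·(p_e − c_e): the retail margin the utility loses
net_revenue_loss = retail_savings - acquisition_savings                # 58.0

# Panel B metrics, expressed in these positive magnitudes
total_resource_cost = V - net_revenue_loss - P_n                       # 144.0
social_welfare = V - C_n - (net_revenue_loss - externality_reduction)  # 197.0

print(net_revenue_loss, total_resource_cost, social_welfare)
```

The column-2 values follow from the same formulas after scaling V by 0.51.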
28 See http://www.eia.gov/forecasts/aeo/electricity_generation.cfm.
29 Opower (2014) reports that the company had a 65 percent gross margin in 2013, which would suggest that the price overstates program implementation cost. On the other hand, the company operated at a net loss through that year, suggesting that the gross margin is actually not sufficient to cover sales, marketing, R&D, and other relevant overhead.
E Machine Learning Algorithm
We use a simple forward stepwise regression approach. We divide the 19,927-household sample into K = 5 random partitions. We define δ_0, the decision rule based on an empty set of predictors x_0, to be random assignment. Then, for covariates j = 1, ..., J, we:
1. For each X variable x*:
   (a) For k = 1, ..., K:
      i. Estimate β̂_τ and β̂_w using linear regression of τ and w on {x*, x_{j−1}} in the K − 1 training sets excluding k
      ii. Predict L̂_i | {x*, x_{j−1}} out of sample in partition k
   (b) Propose δ_{jx*}: treat the 1/2 of observations with the largest L̂_i | {x*, x_{j−1}}
   (c) Estimate L(δ_{jx*}) in the full dataset using the proposed δ_{jx*}
2. Choose the x* with the highest L(δ_{jx*}). Call this δ_j and L(δ_j).
3. If L(δ_j) > L(δ_{j−1}), increment to j + 1 and set x_j = {x*, x_{j−1}}.
   (a) Else, stop: δ_{j−1} is the optimal decision rule.
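As a rough sketch of the stepwise logic above, the toy implementation below greedily adds covariates, cross-fits predictions over K = 5 folds, targets the half of observations with the largest predicted gain, and stops when the full-sample objective no longer improves. It simplifies the paper's procedure in two labeled ways: it regresses a single per-household gain (rather than τ and w separately), and it combines covariates by averaging univariate cross-fitted predictions instead of refitting a multivariate regression. All names and data are illustrative.

```python
import random

def crossfit_predict(x, y, K=5, seed=0):
    """Out-of-fold univariate OLS predictions: for each fold k, fit slope and
    intercept on the other K-1 folds, then predict for observations in fold k."""
    n = len(x)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::K] for i in range(K)]
    pred = [0.0] * n
    for k in range(K):
        train = [i for f in range(K) if f != k for i in folds[f]]
        mx = sum(x[i] for i in train) / len(train)
        my = sum(y[i] for i in train) / len(train)
        sxx = sum((x[i] - mx) ** 2 for i in train)
        b = (sum((x[i] - mx) * (y[i] - my) for i in train) / sxx) if sxx else 0.0
        a = my - b * mx
        for i in folds[k]:
            pred[i] = a + b * x[i]
    return pred

def objective(rule, gain):
    """L(delta): total realized gain over the households the rule targets."""
    return sum(g for r, g in zip(rule, gain) if r)

def top_half_rule(pred):
    """delta: target the 1/2 of observations with the largest predicted gain."""
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    rule = [0] * len(pred)
    for i in order[: len(pred) // 2]:
        rule[i] = 1
    return rule

def forward_stepwise(X, gain, K=5, seed=0):
    """Greedy forward selection with a stopping rule, as in the text."""
    rng = random.Random(seed)
    incumbent = [int(rng.random() < 0.5) for _ in gain]  # delta_0: random half
    best_L = objective(incumbent, gain)
    selected, remaining = [], list(range(len(X)))
    while remaining:
        candidates = {}
        for j in remaining:
            cols = selected + [j]
            preds_by_col = [crossfit_predict(X[c], gain, K, seed) for c in cols]
            avg = [sum(p) / len(cols) for p in zip(*preds_by_col)]
            rule = top_half_rule(avg)
            candidates[j] = (objective(rule, gain), rule)
        j_star = max(candidates, key=lambda j: candidates[j][0])
        L_star, rule_star = candidates[j_star]
        if L_star <= best_L:
            break  # no improvement: keep delta_{j-1}
        best_L, incumbent = L_star, rule_star
        selected.append(j_star)
        remaining.remove(j_star)
    return selected, best_L

# Illustrative data: gains are driven entirely by x0; x1 is an irrelevant scramble.
x0 = [float(i) for i in range(-100, 100)]
x1 = [float((37 * i) % 200 - 100) for i in range(200)]
gain = list(x0)
selected, L = forward_stepwise([x0, x1], gain)
print(selected, L)  # the procedure keeps only x0 and targets every positive gain
```

With this construction the algorithm selects only the informative covariate and then stops, because adding the scrambled covariate cannot raise the full-sample objective.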