March 5, 2008
by Gary Walker
Reflections on the "Evaluation Revolution"
prepared for Hudson Institute's Bradley Center for Philanthropy and Civic Renewal
I left my job as a Wall Street lawyer in the early 1970s to establish a jobs program for ex-convicts and recovering drug addicts in mid-town Manhattan. Like many of my generation who created or ran social programs, I went into this endeavor with the optimistic assumption that a social program would be the door to significant change in the participants' lives: no more crime, no more addiction.
Coming from Wall Street, I was familiar with bottom lines, hard outcomes and mounds of data to prove or disprove efficiency and viability. But the cause of helping others made those terms fade in importance, and though the Department of Labor provided funds to my project for collecting demographic and performance data, the human and social cause aspects of the job—plus the more mundane aspects of operating an organization—took precedence. A formal evaluation would have been an annoyance to me—at best.
I don't think my attitude towards evaluation was or is unusual for a program operator. And in the 1970s most foundations were sympathetic to it; they were either about helping individuals or fostering "social change." They were not eager to put their resources into costly evaluations, which would do neither.
In the three-plus decades since those early, enthusiastic and optimistic days of social programming, the most consistent and notable trend has been the dramatic increase in, and enthusiasm for, formal evaluations, in both the public and philanthropic sectors. Whether the administration has been Democratic or Republican, liberal or conservative, and whether the foundation has been old or new, its donor deceased or in charge, evaluation has become a critical element of all social initiatives.
"Effectiveness must become the principal criterion for givers of time and money." This clarion declaration is the first of five conclusions of the 1997 report of The National Commission on Philanthropy and Civic Renewal, Giving Better, Giving Smarter.
The national office of United Way, whose local chapters raise over $3 billion, initiated several years ago a major project both to emphasize the importance of specifying outcomes for local giving and to provide assistance to local chapters on how to go about determining the outcomes of individual grants. This effort, funded by the W. K. Kellogg Foundation and the Ewing Marion Kauffman Foundation, resulted in Measuring Program Outcomes: A Practical Approach and Focusing on Outcomes.
Grantees report that never before have grant negotiations with foundation staffs been so focused on specifying outcomes. Some foundations have employed consultants to work with their staffs so that input, operational processes, and intended intermediate and long-term outcomes and impacts are specified and differentiated. Many have added evaluation departments to their organizational structure. Small- and medium-sized foundations, which have previously given exclusively to direct services, are now asking for and funding evaluations, so that they may know with objectivity and rigor if the projected outcomes are achieved.
The list of declarations and initiatives to increase the philanthropic resources and efforts that go into evaluation is near endless. Just this morning (February 15) I talked with a recently hired staffer at the Gates Foundation, whose job is to set up a new department, called "Impact Planning and Improvement," which will work with all the substantive departments of the foundation in developing theories of change and results measurement strategies.
So you might say, what's the problem? The field of social programming is growing up, maturing, and developing a sense of responsibility beyond the rhetoric of social change and the rosy glow that comes from being in the business of helping others. For some, that ends the discussion; the search for measuring effectiveness joins the ranks of motherhood and apple pie, not to be questioned.
And that perspective achieved a direct hit on me: The jobs program I ran was in the mid-1970s expanded to nine other locations, and evaluated as a national demonstration. The results were displayed on elegant tables with asterisks and footnotes and numbers to the third decimal, but in the end the results could be expressed in simple narrative. The program did not substantially change the lives of the ex-convicts and recovering addicts who participated. The control group, in terms of jobs, income and avoidance of prison or substance abuse, succeeded at about the same rate as the participants.
What I had learned by then is that it is not so easy to change lives. The program I ran was OK, but the quality of its services and the depth of change it provided were nothing extraordinary. As the results showed.
I spent the next three decades working for organizations that conducted large-scale national research demonstrations (MDRC and Public/Private Ventures); the evaluations were the critical element of these demonstrations, the raison d'être of the organizations. It is that experience which produced other lessons, lessons that complicate the evaluation storyline. These lessons need not dampen our enthusiasm for effectiveness and accountability; but they do illuminate factors that affect whether and how we evaluate, and whether there are other, more pressing needs.
Lesson 1: Outcomes are not impacts. New philanthropists often confuse the two, because almost every social initiative declares outcome goals: in jobs programs, place 75% of participants in career-path jobs; in education, graduate 90% of freshman participants; etc.
It turns out, though, that not only is there little congruence between outcomes and impacts, but that the general lesson from the past three decades is that the higher the outcomes a program has, the less likely it is having an impact.
A good example is the 1983 federal Job Training Partnership Act (JTPA). Its programs aimed to place poor people with multiple obstacles to employment into jobs; the act put a major emphasis on quantitative placement rates as a measure of local success in achieving the act's goals. Local administrators set very high goals for the programs they funded, and offered financial incentives.
As local placement rates around the country began to soar, to over 80 percent in many locations, and were verified as factually accurate, critics speculated that these rates were too good, and indicated that most JTPA participants did not have serious obstacles to employment and would have gotten jobs even without JTPA's modest training interventions.
JTPA advocates scoffed. Then an impact study was done. It basically supported the critics. Well-specified outcomes, careful measurement, incentives and good performance all amounted to very little added value.
Why does this occur? Primarily for two reasons. First, it is not as easy as it appears to predict who is going to fail in school and in life. Sometimes advocates would have us believe that all poor people, all illiterate people, all children of one-parent families are going to end up jobless, in prison, on drugs or in some other desperate condition.
But that is not true. The odds are higher that they will—but many, sometimes the majority, will, on their own, do all right in life.
So programs which use broad categories of disadvantage—as most do—are taking in many people who would have done all right without the program—and when the evaluation includes a control group, the results will show that.
Second, the higher the outcome goals, the more a program will try to select from the eligibles those who seem more likely to succeed—how else can high goals be reached? But a well-constructed impact evaluation will include in the control group only those that the program would have accepted—so they too will succeed at a high rate.
Thus evaluations that only track outcomes, and do not include a control group, often leave funders and operators satisfied with programs that are in fact achieving no more than would have happened anyway.
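The arithmetic behind the distinction between outcomes and impacts can be made concrete with a minimal sketch in Python. The figures are hypothetical, chosen only to echo the JTPA story, and are not taken from any study; the function names are likewise illustrative. The sketch treats impact as the participants' outcome rate minus the control group's outcome rate.

```python
def outcome_rate(successes: int, total: int) -> float:
    """Share of a group that reached the outcome (for example, placed in a job)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def impact(participant_rate: float, control_rate: float) -> float:
    """Estimated impact: what participants achieved beyond the control group."""
    return participant_rate - control_rate


# Hypothetical figures, for illustration only.
participants = outcome_rate(successes=800, total=1000)  # 80% placed in jobs
controls = outcome_rate(successes=780, total=1000)      # 78% found jobs anyway

print(f"Outcome rate for participants: {participants:.0%}")
print(f"Estimated impact: {impact(participants, controls):+.1%}")
```

Under these assumed numbers the program reports an impressive outcome rate, yet its estimated impact is only a couple of percentage points, because the control group succeeded at nearly the same rate.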
This now well-documented phenomenon has led many to conclude that an impact study, with a carefully constructed control group, is the only kind of evaluation worth doing. Thus the last twenty years have seen a significant increase in impact studies, and the growth of an industry with a number of strong organizations that can carry out large-scale, multi-site impact studies. The technical expertise involved is impressive, and the mathematical techniques involved are well beyond the grasp of the intelligent layman.
In short, the science of evaluation of social programs has made enormous progress in telling us whether social programs make a lasting difference in participants' lives. The result?
Lesson 2: The overwhelming majority of social programs with impact studies do not show a significant change in participants' lives a year or two after the program. This phenomenon has not changed over the thirty-odd years that social programs have been evaluated; that is, while the science of evaluation has been improving and growing in sophistication, size, and resources devoted to it, our ability to actually improve lives through social programs has been consistently unimpressive.
This disjuncture in progress is not entirely surprising; the human ability to make technical progress has throughout history always surpassed our ability to make progress on less orderly social issues. But the disjuncture does belie the earlier conclusion that the increased interest in evaluation is a function of a maturing field of social programming. What is maturing is the field of evaluation; the field of social programming seems to be stuck in the same no-to-low performance mode that it's been in for the past forty years.
If you stopped at these two lessons, you might conclude that the "evaluation revolution," or "outcomes movement" as it is more popularly misnomered, has performed a great if discouraging service: it has shown us that outcomes per se are a deceptive measure; that setting high outcome goals does not necessarily increase impacts; and that social programs don't accomplish much. You might take a bold step further and conclude that social programs are a waste of money, whether the taxpayers' or philanthropists'.
The conclusions regarding outcomes are well founded, and are a great service. The conclusions regarding impacts are complicated by another lesson:
Lesson 3: The weak impacts we've evaluated are in good part an artifact of our approach to setting up impact studies. Impact studies are expensive; require significant numbers of participants to be highly reliable; and often are controversial, since control group members cannot receive the programs' services.
This combination of requirements means that government and large foundations are the likely funders; they have the money. Both these sectors hold innovation in high regard, in contrast to building on "old" ideas, or the ideas of competitors, and thus want to test and evaluate innovations. Multiple locations to test the ideas must be found, in order to achieve the high numbers required. New operations are often required both to get the numbers and to avoid the controversy over control groups, which is usually highest in established program operations.
The result of all the above is that the majority of well-done impact studies are on relatively new operations, carrying out relatively new ideas. And if the study contains an implementation evaluation component, there are almost without exception two major findings: that the new idea was not well or consistently implemented across the multiple sites, and that new operations had significant "startup" problems.
So it is usually not clear whether we're measuring a bad idea, or simply the poor implementation of an idea. This is a shaky foundation for the conclusion that social programs are a waste of money, especially when a small group of impact evaluations on established programs with decades of experience and high implementation standards and fidelity, such as Big Brothers Big Sisters and the Nurse-Family Partnership, have produced some impressive impacts.
Plus we simply don't know about the impact of a whole class of social programs—small, community-based operations that have too few numbers to ever be part of an impact study.
The "artifact" lesson is hardly the fault of evaluators; they evaluate what funders want evaluated. But it should make us pause in our rush to be accountable by means of impact evaluations. For the implication of this lesson is that the first thing funders need to be accountable for is the quality of the program they're funding. That requires patience, and a use of funds for things like training, and the development and implementation of quality standards, indicators and tools.
This lesson is not new. In the late 1970s I traveled to England to look at some of their social programs; they were stunned that we did impact evaluations without being sure that the program was well implemented. And in the mid-1990s our own Department of Labor did a review of all the evaluations it had funded, and concluded:
"It often takes time for programs to begin to work. Many of the success stories for the disadvantaged have come from programs which were operating for five years or more before they were evaluated." (U. S. Department of Labor, 1995)
The problem is that our impatience for results, and the fact that new political administrations and new philanthropists want to be known for their innovations, mean that it is not a lesson well learned. Instead these factors conspire to promote a sophisticated and maturing field of evaluation, thrust upon an ever-immature field of demonstration social programs.
Some evaluators are aware of this dilemma, and warn funders and innovators, and insist upon assessing quality before proceeding to impact evaluations. But it seems unrealistic to expect the evaluation industry to correct this practice: it needs work to satisfy its own successful growth, and the most lucrative work comes in the form of large-scale, multi-site research demonstrations. And there are always new political administrations, and wealthy new philanthropists, to satisfy that need.
The "evaluation revolution" has produced a number of important insights, most notably the counterintuitive, often inverse relationship between outcomes and impacts, and the fact that most programs are not well implemented and thus don't produce much in the way of impacts. Those that do produce impacts tend to be well established, with clear standards of quality.
The challenge for philanthropy and the public sector over the coming decades relates less to measuring outcomes or impacts, or to churning out innovations, than to helping establish, and measure, quality in program performance. It will, I think, be a hard challenge to imbed in funders' practice, as it does not have the excitement of innovation, or the elegance and certainty of impact evaluation. And in an impatient culture, thinking about quality delays our need for immediate results and accountability.
But its advantages would be the actual maturing of the world of social programming, and those that fund it. The emphasis would shift from creating answers at the political or philanthropic level and then further creating demonstrations to test those answers to looking at the thousands of efforts already created locally, and helping the most promising with the resources and assistance necessary to achieve quality. If some seemed unique enough to replicate, the resources and time to do so with quality would be the standard—not doing it so fast that an impact demonstration can be set up right away.
But what about results, actual impacts? We just wouldn't know within the usual five years. Maybe not for ten or fifteen years. But then we've spent the better part of four decades learning that actual impacts are very hard to achieve, and definitely can't be achieved without quality implementation.
Progress often involves slowing down, reflecting on the path taken, and shifting the course. I think that's the spot we're at in the world of social programs, and in our thinking that a focus on evaluating results will in itself produce better results.
Gary Walker served as president of Public/Private Ventures (P/PV) from 1995 to 2006.