Program Evaluation: Lessons Learned

A number of lessons that will help improve evaluation practice in EERE have been learned from implementing peer reviews and from critiques of past (pre-2006) outcome/impact evaluation studies. Awareness of these lessons can help promote continuous improvement in the planning, design, conduct, and use of evaluation studies in EERE; in other words, it can help put best practices into action. It is recommended that program managers incorporate these lessons, as appropriate, into the Statement of Work used to hire an evaluator.

Lessons Learned from Peer Reviews

Although programs successfully plan, design, and conduct peer reviews, they are encouraged to continuously improve the peer review process. Areas with particular opportunities for further improvement include:

  • Ensuring the independence of the review and the review process;
  • Making sure there is sufficient time for a rigorous Q&A exchange between principal investigators and reviewers;
  • Providing reviewers with sufficient information and access to principal investigators;
  • Applying the peer review process at the program level (beyond just project-level review);
  • Shortening the time taken to prepare the final peer review report; and
  • Making the peer review report publicly available.

Lessons Learned from Outcome/Impact Evaluation Studies

Formulation of the Evaluation Statement of Work and Plan

  1. Develop a Statement of Work (SOW) for the Evaluation Study: Some general program evaluations have been initiated without program staff preparing a full SOW. This usually leads to an unproductive evaluation and wasted managerial time because the scope of the evaluation is not fully considered before an evaluator is hired. Program staff should develop a SOW and use it to hire an Evaluation Contractor. The SOW then becomes the basis for the more detailed Evaluation Plan that the Contractor should prepare before conducting the actual study.

  2. Develop an Evaluation Plan: Not all programs have required their Evaluation Contractors to prepare a detailed Evaluation Plan. The absence of a Plan may also lead to an unproductive evaluation, and one that produces results of lower quality than expected. Contractors should be required to prepare (and defend; see item #4 on Quality Assurance) an Evaluation Plan. A Plan imposes discipline on both the Contractor and the program staff who commissioned the study, encouraging them to carefully consider critical evaluation planning and methodological assumptions for the proposed work before the project moves to the data collection stage.

  3. Evaluation objective statements should be clearly specified: Evaluation SOWs do not always describe the intended uses of the evaluation, the decisions under consideration, or the types of information required, or even clearly define the evaluation objectives. The evaluation should be designed with specific objectives in mind, and these should be clearly described in the SOW. Program staff, initially on their own and then in consultation with the evaluator, need to clarify the intended uses of the evaluation, the decisions under consideration, and the kinds of information required, and then use this information to define clear evaluation objectives.

  4. A process to ensure quality assurance in the evaluation is essential: Some commissioned evaluation studies are not subjected to a full and careful review. Evaluations should be conducted by outside, independent experts. This means that, even if staff commission a study (fund an Evaluation Contractor), the Contractor should have some degree of independence from the program office being evaluated. Also, the Contractor should have no real or perceived conflict of interest. Although program staff may work with the Contractor during the consultation phase to clarify information needs and discuss potential evaluation criteria and questions, the staff should establish some line of separation from the Contractor for much of the remainder of the evaluation study; that is, they should put up a firewall after the initial consultation period has concluded. It is further highly recommended that a panel of external evaluation experts who are not part of the Contractor's team be assembled to review the Contractor's work, including the Evaluation Plan and the draft and final reports. Having the work of the Contractor's team itself evaluated helps ensure the evaluation methodology is sound and the study is carried out efficiently. It also sets up a second firewall that raises the credibility of the study to an even higher level (important for those who remain skeptical of evaluation studies commissioned by the program being evaluated).

Credibility of Results

  1. Double counting: The overlapping and interactive structure of program components can lead to double counting of energy savings when savings estimates attributable to each program component (or activity) are developed separately. Deployment programs may use the outputs of R&D programs; in such a case, both programs may claim credit for energy savings resulting from their efforts. For outcome, impact, and cost-benefit evaluations, evaluators should be asked to identify areas where double counting is possible and describe how double counting would be avoided, addressed, and documented in the report.

  2. Sources of overestimation and underestimation: Often, outcome or impact evaluation studies report that their estimates are "conservative" in that overestimation is outweighed by underestimation. In other cases, spillover benefits from program outcomes may be hypothesized but not quantified because of the difficulty of making reliable estimates. On the other hand, evaluation studies sometimes overestimate savings, for example when they use savings multiplier factors (in lieu of site-specific measurement) to estimate savings across a diverse population. For outcome and impact evaluations, evaluators should be asked to clearly identify in the Evaluation Plan, and document in the report, all sources of overestimation and underestimation. Hypothesized spillover benefits should be discussed even if they are not quantified.

  3. Use of "savings factors" in lieu of site-specific measurement: When savings factors, e.g., kWh saved per energy efficiency measure outcome, are used in lieu of direct measurement they must be applied appropriately to match the profile of the population that they are intended to represent. It may not be correct to transfer savings factors to entities that have widely different profiles compared to those from which the savings factors were derived. Evaluators should be asked to fully describe the planned methodology for use of savings factors in the Evaluation Plan, including how they intend to account for site-by-site variation, applications variation, and other variations in the profile of the study population where these factors could be significant. Where savings factors are used, develop a means to check the reasonableness of the resultant energy savings numbers across the study population (e.g., acquire and evaluate information that can be used as a benchmark).

  4. Construction of attribution questions in surveys: When survey-based questions are used to address attribution, the questions have to be carefully structured to get at the attribution issue at hand. Failure to properly structure the questions will result in unreliable responses. For example, a question such as "Did it influence your decision, yes or no?" is inadequate for addressing attribution. An attribution question should not force a "yes" or "no" response; instead, it should distinguish responses by degree of influence (e.g., very little, somewhat, significant, dominant; or a numeric degree-of-influence scale). Survey-based attribution questions in draft survey instruments should allow for the many factors that can influence choice and should be reviewed by evaluation peers before the survey is fielded.

  5. Survey non-response: A common problem encountered in survey work is non-response, which can introduce error into survey results. The degree to which the results represent the intended population depends critically on the response rate, and a poor response rate can undermine the external validity of the survey results. Evaluators who plan to use survey research should be asked to describe, in the SOW and the Evaluation Plan, their approach for avoiding, minimizing, or controlling potential non-response error. In the final report they should describe how they addressed non-response and any implications for the reliability of the results (the second sketch following this list illustrates one simple adjustment). Evaluators should not consider the non-response problem for the first time after the survey has been fielded.

  6. Explicit documentation of the source(s) of energy savings: Frequently, studies that are not based on site measurement of savings do not clearly describe the source of their reported energy savings. Savings estimates based on factors drawn from other sources (e.g., states) are provided without describing the assumptions underlying those factors. Program managers should require that evaluators explicitly address in the Evaluation Plan how they intend to estimate energy savings and the assumptions underlying their estimates. This should also be documented in the final report.

  7. Describing caveats on data used in the evaluation: Budget constraints sometimes force compromises in the methodology used for data collection, yet the potential weaknesses created by these necessary choices are not always acknowledged. The study should fully describe the caveats and other issues concerning the data it uses. The report outline developed by program staff and the evaluation contractor should include a section on limitations and caveats regarding the data, and the report should adequately and appropriately highlight any concerns and limitations about the data used. Data caveats affecting the less reliable findings and recommendations should also be noted in the Executive Summary.

  8. Sources of information: Evaluation reports have not always described sources of data in sufficient detail to allow an independent determination of the appropriateness of the information. The evaluation study SOW should stipulate that the evaluator must describe sources of data in enough detail to allow the appropriateness of the data to be determined. This description should be included in both the Evaluation Plan and the Final Report.
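
The first sketch below (in Python) illustrates, for item #3 above, one way an evaluator might apply savings factors segment by segment rather than as a single blended factor, and then benchmark the total for reasonableness. It is only a minimal illustration; the segment names, factor values, measure counts, and benchmark figure are hypothetical placeholders, not program data.

      # Minimal sketch: apply deemed savings factors by population segment and
      # benchmark the total. All values below are hypothetical placeholders.

      SAVINGS_FACTORS_KWH = {      # kWh saved per measure per year, by segment
          "small_office": 1200.0,
          "large_office": 4800.0,
          "retail":       2100.0,
      }

      installed_measures = {       # measures installed, by segment
          "small_office": 350,
          "large_office": 40,
          "retail":       120,
      }

      def estimate_savings(factors, counts):
          """Apply each segment's factor only to that segment's measure count."""
          return {seg: factors[seg] * n for seg, n in counts.items()}

      segment_savings = estimate_savings(SAVINGS_FACTORS_KWH, installed_measures)
      total_kwh = sum(segment_savings.values())

      # Reasonableness check against an independent benchmark (hypothetical),
      # e.g., metered consumption changes for a subsample of sites.
      BENCHMARK_KWH = 850_000.0
      print(f"Estimated savings: {total_kwh:,.0f} kWh "
            f"({total_kwh / BENCHMARK_KWH:.0%} of benchmark)")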
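
The second sketch, for item #5 above, shows a minimal weighting-class adjustment for survey non-response: respondents in each stratum are weighted so that strata are represented in proportion to the population. The strata, counts, and number of surveys fielded are hypothetical, and this is only one of several possible approaches (follow-up with non-respondents and non-response bias analysis are others).

      # Minimal sketch: response rate and a simple weighting-class adjustment
      # for survey non-response. All counts below are hypothetical.

      population  = {"residential": 8000, "commercial": 2000}   # sampling frame
      respondents = {"residential": 400,  "commercial": 50}     # completed surveys
      fielded = 1000                                            # surveys attempted

      response_rate = sum(respondents.values()) / fielded
      print(f"Overall response rate: {response_rate:.0%}")

      # Weight each stratum's respondents up to the stratum's population share,
      # partially correcting for differential non-response across strata.
      weights = {stratum: population[stratum] / respondents[stratum]
                 for stratum in population}
      print(weights)   # commercial respondents carry larger weights here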

Interactions within Program and Across Programs

  1. Synergistic effects among program elements: Studies do not always make an effort to assess the synergistic effects among program elements, e.g., how a combination of publications, software tools, and technical assistance might be more effective than each element on its own. As appropriate, evaluators should be asked to describe in the Evaluation Plan how they intend to assess the synergistic effects among program elements, while avoiding double counting (see item #1 under Credibility of Results).

  2. The same population receives the services of multiple programs: When the same population receives the services of multiple programs, several difficult evaluation questions can arise. How do deployment activities and other programs that provide direct service to the same set of customers interact to produce a customer choice? How should the resulting outcomes be allocated? Program staff should clearly document which other activities within the program, and which programs outside it, also serve their target audience. For impact evaluations, the Evaluation Plan should include a discussion of this issue and the plan for addressing it.

  3. Accounting for "shelf life" of programs' products: Energy efficiency measures and practices do not last forever; the effectiveness of most measures deteriorates with time, and all have an effective useful life. These effects should be applied to the benefits side of cost-effectiveness evaluations. Program staff and the evaluator should decide how to account for savings shelf life, and the evaluation contractor should describe in the Evaluation Plan how this will be accomplished (a sketch following this list illustrates one way the calculation might look).
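
As an illustration for item #3 above, the sketch below sums a measure's first-year savings over an assumed effective useful life, applying a simple annual degradation factor and a discount rate so that the benefits side of a cost-effectiveness calculation reflects shelf life. The specific life, degradation rate, and discount rate are hypothetical assumptions, not prescribed values.

      # Minimal sketch: lifetime savings for a measure whose effectiveness
      # degrades over an assumed effective useful life (EUL). Values are
      # hypothetical assumptions for illustration only.

      FIRST_YEAR_KWH     = 10_000.0   # measured or deemed first-year savings
      EUL_YEARS          = 12         # assumed effective useful life
      ANNUAL_DEGRADATION = 0.02       # savings decline 2% per year
      DISCOUNT_RATE      = 0.03       # real discount rate for present-value benefits

      lifetime_kwh = 0.0
      discounted_kwh = 0.0
      for year in range(1, EUL_YEARS + 1):
          persistence = (1 - ANNUAL_DEGRADATION) ** (year - 1)
          annual_kwh = FIRST_YEAR_KWH * persistence
          lifetime_kwh += annual_kwh
          discounted_kwh += annual_kwh / (1 + DISCOUNT_RATE) ** year

      print(f"Undiscounted lifetime savings: {lifetime_kwh:,.0f} kWh")
      print(f"Discounted lifetime savings:   {discounted_kwh:,.0f} kWh")
      # Monetized benefits would multiply the discounted savings stream by an
      # avoided-cost rate; that step is omitted here.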

Findings and Recommendations Presented in Reports

  1. Precision of reporting of the results: Reports sometimes present results at a level of precision that is not justified by the data and analysis. Evaluators should not report numbers with more decimal places than the underlying data support. In some cases, the evaluation contractor might consider reporting results as a point estimate within a range (a sketch following this list illustrates one way to do so).

  2. Provide a list of clear, actionable, and prioritized recommendations that are supported by the analysis: Some evaluation studies have not developed program-improvement recommendations for the client to consider, or have developed recommendations that are not adequately supported by the analysis. Similarly, recommendations for improving the quality of the evaluation are often omitted, even though the evaluation report acknowledges difficulties in performing the evaluation. Evaluators should provide an explicit set of recommendations for both program and evaluation improvement, as appropriate, and ensure they are supported by the analysis conducted. Recommendations should be ranked in priority order (high, medium, low).

  3. Rank findings by level of defensibility: Outcome and impact evaluations that estimate savings at the component or activity level typically do not associate a level of defensibility with each reported component result. For outcome or impact evaluations, evaluators should report on the defensibility of each estimate associated with a program component for which a quantified finding was developed. This need not be a quantitative value; a qualitative comment or ranking based on the relative strengths and weaknesses of the respective methodologies may suffice. An alternative approach would be to describe caveats for the findings, as discussed under Credibility of Results above.

  4. Program record keeping and database recommendations: Program record keeping and databases are rarely designed to support evaluation activity; often, information about participants that is important for evaluation is missing from program records. Evaluators should make explicit recommendations on the program records needed for general program evaluation purposes, so the program can begin to collect these data for future evaluations.
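
As an illustration for item #1 above, the sketch below reports an estimate as a rounded point estimate with an approximate interval rather than with spurious decimal places. The site-level values, the normal-approximation interval, and the rounding convention are hypothetical choices for illustration; the appropriate precision and interval method depend on the actual data and design.

      # Minimal sketch: report a point estimate within a range, rounded to a
      # precision the data can support. Sample values are hypothetical.
      import math
      import statistics

      site_savings_kwh = [1180, 1345, 990, 1420, 1260, 1105, 1330, 1215]

      mean = statistics.mean(site_savings_kwh)
      sem = statistics.stdev(site_savings_kwh) / math.sqrt(len(site_savings_kwh))
      z = 1.96   # normal approximation; a t-multiplier would be better for n = 8

      low, high = mean - z * sem, mean + z * sem
      # Round to the nearest 10 kWh rather than reporting spurious decimals.
      print(f"Mean savings: {round(mean, -1):,.0f} kWh per site "
            f"(approximately {round(low, -1):,.0f} to {round(high, -1):,.0f} kWh, 95% interval)")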