Program Evaluation: Rating Systems for Peer Reviews

Both the review panel and the presenters should clearly understand the objectives and guidelines for the review as well as the specific evaluation criteria that will be addressed. The review leader and chairperson should determine how the projects/program will be rated and distribute to both reviewers and those being reviewed a written description (evaluation guidelines) of the evaluation method. These guidelines should describe the purpose and scope of the review, the evaluation criteria and questions, the data to be presented, and how the data will be collected from reviewers, analyzed, and reported. An example is provided in Appendix H of the guide.

A great deal of valuable information comes from the exchange with reviewers during a panel session, and a summary of their opinions is captured in narrative form at the end of the review session. In addition to capturing the reviewers' opinions in narrative (qualitative) form, this guide recommends that all reviews use a rating system of some sort; it need not be numerical. Ratings collected in the same way from every reviewer provide a means of summarizing reviewer opinion. Even a well-defined rating system may be interpreted slightly differently by individual reviewers, yet such a summary is likely to be perceived as reflecting the sum of opinion more accurately than a narrative summary written by the review leader or by anyone other than the reviewers themselves. Ratings also provide a way of comparing across projects or sets of activities when the same panel reviews multiple projects or sets.

A clear description of the rating system is an important element of the evaluation guidelines provided to reviewers. For example, a rating system could be a simple word scale from "poor" to "outstanding" with all criteria weighted equally, or it could be more complex. The three key decisions involved in designing an appropriate rating system are:

  • Determining whether it is qualitative or quantitative, the level of precision that is appropriate, its feasibility given the criteria and data involved, and what is actually required of it considering the decisions that the ratings inform;
  • Determining how to rate projects relative to the core criteria, which includes considerations of how to get consistency across individual reviewers and sub-program reviews, and could include considerations of scales and weights; and
  • Deciding whether panel discussion about ratings is to be encouraged or avoided, with awareness of the Federal Advisory Committee Act (FACA) restrictions regarding group consensus.

Generally, the peer reviewers will individually provide an overall rating of the project and/or program, taking all the review criteria into consideration. The review leader could develop an overall rating for the highest level of program structure reviewed (project, groups of projects, or program), as well as ratings for each criterion separately. These ratings help managers identify thresholds for action and focus program improvement more precisely.
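As a concrete illustration of such tallying, the sketch below (in Python) shows one simple way a review leader might compute per-criterion and overall ratings by averaging across reviewers. The reviewer names, criteria, and scores are invented, and equal weighting of criteria is an assumption here; the guide does not prescribe this particular aggregation.

```python
from statistics import mean

# Invented data for illustration: three reviewers each score one
# project against three criteria on a 0-10 scale.
reviewer_ratings = {
    "reviewer_1": {"relevance": 8, "approach": 7, "progress": 9},
    "reviewer_2": {"relevance": 7, "approach": 6, "progress": 8},
    "reviewer_3": {"relevance": 9, "approach": 7, "progress": 7},
}

criteria = ["relevance", "approach", "progress"]

# Rating for each criterion: the average across reviewers.
per_criterion = {
    c: mean(r[c] for r in reviewer_ratings.values()) for c in criteria
}

# Overall rating: the average of the per-criterion ratings
# (i.e., all criteria weighted equally -- an assumption).
overall = mean(per_criterion.values())

print(per_criterion)
print(f"Overall rating: {overall:.2f}")
```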

The Evaluation Form provided in Appendix J of the guide includes a rating system for each of the EERE Core criteria and also asks for written explanation and comments. An example of a summary tally sheet, for use when a numerical rating system is employed and the review panel reviews multiple projects, is provided in Appendix K. Another example is the Superconductivity Program peer review process, which assigns weights and uses a 0-to-10 rating scale anchored to adjectives ranging from "not adequate" to "excellent". For ratings on more specific questions, the Superconductivity Program provides a short statement and asks reviewers to indicate the degree to which they agree with it (strongly disagree to strongly agree). The Chemical Visions Program review uses a 5-point word scale (the points are called anchors), with a sentence describing what would be true of a project at each of the five ratings.
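To show how such scales might be encoded on an evaluation form for later tallying, here is a minimal Python sketch. Only the endpoint adjectives ("not adequate", "excellent") come from the text above; the midpoint label and the exact agreement wording are assumptions for illustration, not the programs' actual forms.

```python
# Sketch of the two scale styles described above.
NUMERIC_ANCHORS = {
    0: "not adequate",
    5: "adequate",    # hypothetical midpoint label
    10: "excellent",
}

AGREEMENT_SCALE = {  # five-point scale for statement-style questions
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

def nearest_anchor(score: int) -> str:
    """Map a 0-10 numeric rating to the closest adjective anchor."""
    return NUMERIC_ANCHORS[min(NUMERIC_ANCHORS, key=lambda a: abs(a - score))]

print(nearest_anchor(8))                  # -> "excellent"
print(AGREEMENT_SCALE["strongly agree"])  # -> 5
```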

It is important to recognize that a single number does not adequately reflect the many dimensions of the program considered by the peer review. It is also important to recognize that ratings across different peer review panels and groups of projects cannot generally be compared without careful study to determine how to anchor the scales and normalize the results.
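If, after such study, a statistical adjustment is judged appropriate, one common approach (not prescribed by this guide) is to standardize each panel's scores against that panel's own mean and spread. The Python sketch below illustrates the idea with invented panel data.

```python
from statistics import mean, stdev

def standardize(scores):
    """Express each score in standard deviations from its panel's
    own mean, putting differently calibrated panels on one footing."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Invented example: one panel rates leniently, the other severely,
# but both rank the four projects the same way.
lenient_panel = [8.5, 9.0, 7.5, 8.0]
severe_panel = [5.5, 6.0, 4.5, 5.0]

print([round(z, 2) for z in standardize(lenient_panel)])
print([round(z, 2) for z in standardize(severe_panel)])
# Both lines print the same z-scores: the relative standings survive
# even though the raw scores differ by a constant offset.
```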

If multiple project or subprogram reviews within a program are phased over time or performed by different panels in the same time frame, it helps to have one reviewer serve across the series of reviews to calibrate differences among the panels. For example, a reviewer present at a subprogram review could represent that subpanel on the overall panel reviewing the entire program. The same concerns apply to how review comments and ratings about a program are considered across time: some continuity and consistency is needed, perhaps achieved by having some reviewers serve on successive panels and by the program presenting results from different time periods.

When multiple projects are reviewed by the same reviewers and some evaluation criteria are more important to a particular program element than others, it may be appropriate to assign each criterion a weight according to its importance to that program element. When the weighted ratings are then summed across criteria, the sum will reflect the peers' assessments more accurately. The same may hold true for questions within a criterion.
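A short worked example may help. The Python sketch below uses invented criteria, weights, and ratings solely to show the arithmetic of a weighted sum; actual weights would be set by the program.

```python
# Invented weights and ratings, for illustration only (0-10 scale).
weights = {"relevance": 0.40, "approach": 0.35, "progress": 0.25}
ratings = {"relevance": 8, "approach": 6, "progress": 9}

# Weights should sum to 1 so the weighted score stays on the 0-10 scale.
assert abs(sum(weights.values()) - 1.0) < 1e-9

weighted_score = sum(weights[c] * ratings[c] for c in weights)
print(f"Weighted overall score: {weighted_score:.2f}")
# 0.40*8 + 0.35*6 + 0.25*9 = 3.20 + 2.10 + 2.25 = 7.55
```

Weighting shifts the overall score toward the criteria the program deems most important; with equal weights the result reduces to a simple average.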
