Metrics of Evaluation
Sharon Welburn, PhD
Learning Objectives
- Define program evaluation
- Differentiate between the 5 major types of evaluation and their use
- Identify potential interest holders
- Describe the GRADE approach in evaluating evidence
Let's Think about it.
- When should you start thinking about evaluating a project?
Why Evaluate?
- Ensures accountability and continuous improvement
- Provides data for decision-making in public health programs
- Informs policy, funding, and practice change
- Bridges the gap between research and real-world program implementation
"Program Evaluation is the use of social research methods to systematically investigate the effectiveness of social intervention programs in ways that are adapted to their political and organizational environments and are designed to inform social action to improve social conditions."
(Rossi, Lipsey, & Freeman, 2004)
5 Major Types of Evaluation

FORMATIVE
- Needs Assessment: Who needs the program, how great is the need, and what might work to meet it?
- Feasibility: Is it likely to work?
- Process: How is the program being delivered? Is the delivery effective?

SUMMATIVE
- Outcome: Did behavior / knowledge change?
- Impact: Did the program improve health?

"When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative." (NSF Evaluation Handbook)
Where does it all fit?
If a "need" is the gap between the ideal and current health status of a target population, then a "needs assessment" is the process of gathering data to identify that gap and what precedes it.
Things to Consider
- Specific target population
- Socio-ecological levels of influence
Some Indicators
- Social Health: poverty, education, crime, supports
- Physical Health: morbidity, mortality, health costs, prevalence of a risk factor, utilization rates, lifespan
- Environmental: toxins & pollutants, transportation, housing
- Mental Health: mental health care costs, medications
Don't forget to assess strengths
- Cultural influences that would support an intervention
- Faith or spiritual support
- Availability of resources including effective interventions
- Community wisdom or experience
- Resilience
Methods
- Public health data (secondary data)
- Interviews, focus groups, etc. with stakeholders (primary data)
- Mixed methods are ideal; qualitative data can help interpret quantitative findings
Steps
1. What's your scope?
2. Gather data; only take what you can use.
3. Analyze.
4. Report and share!
Data collected from pilot situations and recipients while developing an intervention, to obtain feedback about the feasibility of proposed activities and their fit with intended settings and recipients.
Assessing validity & feasibility
- Design Review
- Expert Review
- Resources in place?
- Pilot it!
- How will you apply what you've found?
Methods
- Focus groups
- Observation
- Open-ended interviews
- Expert judgement
- Equipment trial
Where does it all fit?
Process Objectives
- Program components are the basis for selecting or developing instruments to measure aspects of program
- Extent of implementation
- Scope of implementation
- Asks who, what, when, and how many program activities and outputs were accomplished
- Answers to these questions allow us to assess if activities are being delivered as intended
- Helps determine areas where program needs to be improved
Process Evaluation Questions
- Were program activities accomplished?
- Were milestones achieved as planned (on time)?
- How well were activities implemented?
- Was the target audience reached?
- How did external factors influence program delivery?
Outcome Monitoring
- Results focused
- Short-term outcomes MAY be attainable in 1-3 years
- Mid-term outcomes MAY be achievable in 4-6 years
- Connectedness
- Short-term outcomes must be achieved in order for mid-term outcomes to occur
Short-term vs. Mid-Term
- Short-term: knowledge, attitudes, beliefs
- Mid-term
Outcome Evaluation Questions
- Did the intervention CAUSE the expected outcomes?
- How do we know this?
Impact Assessment
- Deeper, long-term outcomes
- May occur after the conclusion of project funding
Now... Let's See What You Remember
Interest Holders
Identifying Interest Holders
- Who would be served or affected by the program?
- Who is helping plan or implement the program?
- Who might find the findings useful?
- Who is skeptical of the program?
Helpful Input
- Who do they represent and why are they interested in the program?
- What is important about the program to them?
- What would they like the program to accomplish?
- How much progress would they expect the program to make at various times? (milestones?)
- What do they see as critical evaluation questions?
- How would they use the results of the evaluation?
- What resources (time, funds, expertise, access to respondents or policymakers) might they contribute to the evaluation effort?
'GRADE'ing the Evidence Quality
What is GRADE?
- Grading of Recommendations, Assessment, Development, and Evaluation
- Widely used in guideline development, systematic reviews, and public-health decision making
- Study Design
- RCTs - rating starts at HIGH quality
- non-RCTs - rating starts at LOW quality
- Assesses quality of evidence against 8 criteria
Ryan R, Hill S (2016) How to GRADE the quality of the evidence. Cochrane Consumers and Communication Group, available at http://cccrg.cochrane.org/author-resources. Version 3.0 December 2016.
GRADE Criteria
- Deductions:
- Risk of Bias
- Inconsistency
- Indirectness
- Imprecision
- Publication bias
- Upgrades:
- Large magnitude of effect
- Dose response
- Effect of all plausible confounding factors
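One way to picture how the deductions and upgrades combine is as simple level arithmetic: evidence starts high for RCTs and low for non-randomized studies, and each serious concern or upgrade criterion moves it one level. The sketch below is a hypothetical illustration of that bookkeeping, not an official GRADE tool; all names are invented.

```python
# Hypothetical sketch of GRADE's level arithmetic (illustrative only).
LEVELS = ["very low", "low", "moderate", "high"]

def grade_level(randomized, downgrades=0, upgrades=0):
    """Overall certainty after applying downgrades and (rare) upgrades."""
    start = 3 if randomized else 1                    # RCTs start high, non-RCTs low
    score = max(0, min(start - downgrades + upgrades, 3))  # clamp to the 4 levels
    return LEVELS[score]

print(grade_level(randomized=True, downgrades=2))     # low
print(grade_level(randomized=False, upgrades=1))      # moderate
```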
GRADE Rating
1. Risk of Bias
- Degree to which study design may have introduced systematic error
- RCTs start as high quality but can be downgraded for poor methods
- Indicators of concern:
- Lack of randomization or allocation concealment
- No blinding of participants or assessors
- High loss to follow-up or selective reporting
- Incomplete data or deviations from protocol
What are the limitations?
2. Inconsistency
- Variation in results across different studies (heterogeneity)
- Consistent direction and magnitude of effect strengthens confidence
- Indicators of concern:
- Large variation in point estimates
- Confidence intervals that barely overlap
- High I² statistic in meta-analysis (>50%)
- No clear explanation for variability
How consistent are the results?
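As an illustration of the I² statistic mentioned above, here is a minimal sketch that computes I² from per-study effect estimates and standard errors via Cochran's Q. The numbers are invented, not from any real meta-analysis.

```python
def i_squared(effects, ses):
    """I-squared heterogeneity (%) from effect estimates and standard errors."""
    weights = [1 / se ** 2 for se in ses]             # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Two very different effect estimates -> high heterogeneity (I-squared > 50%)
print(i_squared([0.1, 0.9], [0.05, 0.05]))            # 99.21875
```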
3. Indirectness
- Evidence doesn't directly apply to research question, population, or intervention of interest (can possibly use indirect comparison: A with C and B with C)
- Indicators of concern:
- Population differs from target (e.g., adults studied but intervention for children)
- Surrogate outcomes instead of clinical outcomes
- Intervention or comparator not identical to the one of interest
- Setting or implementation context differs
How do these results apply to my review question?
4. Imprecision
- Results are uncertain due to small sample size or wide confidence intervals
- Indicators of concern:
- Confidence interval crosses the threshold for meaningful benefit or harm
- Small total number of events
- Studies underpowered to detect an effect
How precise is the effect size?
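The "CI crosses the threshold" indicator can be made concrete with a small sketch: a 95% confidence interval for a risk ratio from 2x2 counts, using the usual large-sample formula for the standard error of log(RR). The counts are invented for illustration.

```python
import math

def risk_ratio_ci(a, n1, c, n2):
    """Risk ratio and 95% CI: a events of n1 (intervention), c of n2 (control)."""
    rr = (a / n1) / (c / n2)
    # standard error of log(RR), per the usual large-sample approximation
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

rr, lo, hi = risk_ratio_ci(4, 20, 6, 20)   # a small, underpowered trial
print(lo < 1.0 < hi)                        # True: the CI crosses the null
```

With so few events the interval spans the null value of 1, so the evidence would be rated down for imprecision.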
5. Publication Bias
- The published evidence is systematically unrepresentative of all research conducted
- Indicators of concern:
- Non-publication of negative or null studies
- Selective outcome reporting
- Funding source bias (e.g., industry-sponsored trials)
- Funnel plot asymmetry in meta-analysis
Are these all of the relevant studies?
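Funnel-plot asymmetry is commonly quantified with Egger's regression, which regresses each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero flags asymmetry. A rough pure-Python sketch with invented numbers:

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression: effect/SE regressed on 1/SE."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx

# Small (high-SE) studies reporting bigger effects -> intercept far from 0
print(egger_intercept([0.2, 0.4, 0.8], [0.1, 0.2, 0.4]))   # 2.0
```

In practice a significance test on the intercept is used rather than the raw value; this sketch only shows the direction of the idea.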
Reasons to Upgrade
- Rare to upgrade quality of evidence
- Very rare to upgrade evidence from RCTs that were downgraded
- For observational studies, only evidence with no important validity threats should be upgraded
- 3 Major possible reasons to upgrade
6. Large Effect
- When an effect is so large that bias is unlikely to fully explain it
- Applies mostly to observational studies
- Indicators to upgrade:
- RR or OR > 2 (or < 0.5) with no plausible confounding
- Clear, consistent direction of effect
Is there a large magnitude of effect?
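The RR/OR > 2 (or < 0.5) rule of thumb can be checked directly from a 2x2 table. A minimal sketch with invented counts:

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a/b = exposed with/without outcome, c/d = unexposed."""
    return (a * d) / (b * c)

def qualifies_large_effect(or_value):
    """GRADE's rough rule of thumb: OR (or RR) > 2 or < 0.5."""
    return or_value > 2 or or_value < 0.5

print(qualifies_large_effect(odds_ratio(40, 60, 10, 90)))   # True (OR = 6.0)
```

The rule of thumb only applies when no plausible confounding could produce an effect of that size, as the bullets above note.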
7. Dose-Response
- A clear relationship between the amount of exposure and the magnitude of effect increases confidence in causality
- Indicators to upgrade:
- Stepwise increases in benefit or harm with higher exposure
- Linear trend across exposure categories
Is there a dose-response gradient in the findings?
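A stepwise gradient is easy to screen for: do event rates rise monotonically across ordered exposure categories? A toy sketch with invented rates:

```python
def monotone_gradient(rates):
    """True if event rates increase step-wise across ordered exposure levels."""
    return all(later > earlier for earlier, later in zip(rates, rates[1:]))

# Event rates by exposure category (none, low, medium, high) -- invented
print(monotone_gradient([0.02, 0.05, 0.11, 0.20]))   # True
print(monotone_gradient([0.02, 0.11, 0.05, 0.20]))   # False
```

A formal trend test (e.g., Cochran-Armitage) would be used in practice; this check only captures the ordering.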
8. All Plausible Confounding Factors
- When all plausible sources of bias would diminish (not exaggerate) the observed effect, confidence increases
- Indicators to upgrade:
- Known confounders would bias toward the null
- Effect observed despite conservative bias
- Direction of residual confounding predictable
Have all plausible confounding factors been accounted for?
Now... Let's See What You Remember
Questions?
Evaluation Participation
- What would an evaluation plan look like for your project?
- Next week, we'll focus on the CDC framework of program evaluation
- Make sure to read Sriram and Pullybank articles on Strong Hearts, Healthy Communities
Metrics of Evaluation
Sharon Welburn (Slovina)
Created on October 27, 2025