Metrics of Evaluation
Sharon Welburn, PhD
Learning Objectives
- Define program evaluation
- Differentiate between the 5 major types of evaluation and their use
- Identify potential interest holders
- Describe the GRADE approach in evaluating evidence
Let's Think about it.
- When should you start thinking about evaluating a project?
Why Evaluate?
- Ensures accountability and continuous improvement
- Provides data for decision-making in public health programs
- Informs policy, funding, and practice change
- Bridges the gap between research and real-world program implementation
"Program Evaluation is the use of social research methods to systematically investigate the effectiveness of social intervention programs in ways that are adapted to their political and organizational environments and are designed to inform social action to improve social conditions."
(Rossi, Lipsey, & Freeman, 2004)
5 Major Types of Evaluation

FORMATIVE
- Needs Assessment: Who needs the program, how great is the need, and what might work to meet it?
- Feasibility: Is it likely to work?
- Process: How is the program being delivered? Is the delivery effective?

SUMMATIVE
- Outcome: Did behavior / knowledge change?
- Impact: Did the program improve health?

"When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative." (NSF Evaluation Handbook)
Where does it all fit?
If a "need" is the gap between the ideal and current health status of a target population, then a "needs assessment" is the process of gathering data to identify that gap and what precedes it.
Things to Consider
- Specific target population
- Socio-ecological levels of influence
Some Indicators
- Social Health: poverty, education, crime, supports
- Physical Health: morbidity, mortality, health costs, prevalence of a risk factor, utilization rates, lifespan
- Environmental: toxins & pollutants, transportation, housing
- Mental Health: mental health care costs, medications
Don't forget to assess strengths
- Cultural influences that would support an intervention
- Faith or spiritual support
- Availability of resources including effective interventions
- Community wisdom or experience
- Resilience
Methods
- Public health data (secondary data)
- Interviews, focus groups, etc. with stakeholders (primary data)
- Mixed methods are ideal; qualitative data can help interpret quantitative findings
Steps
1. What's your scope?
2. Gather data; only take what you can use.
3. Analyze.
4. Report and share!
Data collected from pilot situations and recipients while developing an intervention, to obtain feedback about the feasibility of proposed activities and their fit with intended settings and recipients.
Assessing validity & feasibility
- Design Review
- Expert Review
- Resources in place?
- Pilot it!
- How will you apply what you've found?
Methods
- Focus groups
- Observation
- Open-ended interviews
- Expert judgement
- Equipment trial
Where does it all fit?
Process Objectives
- Program components are the basis for selecting or developing instruments to measure aspects of program
- Extent of implementation
- Scope of implementation
- Asks who, what, when, and how many program activities and outputs were accomplished
- Answers to these questions allow us to assess if activities are being delivered as intended
- Helps determine areas where program needs to be improved
Process Evaluation Questions
- Were program activities accomplished?
- Were milestones achieved as planned (on time)?
- How well were activities implemented?
- Was the target audience reached?
- How did external factors influence program delivery?
Outcome Monitoring
- Results focused
- Short-term outcomes MAY be attainable in 1-3 years
- Mid-term outcomes MAY be achievable in 4-6 years
- Connectedness
- Short-term outcomes must be achieved in order for mid-term outcomes to occur
Short-term vs. Mid-Term
- Short-term: knowledge, attitudes, beliefs
- Mid-term
Outcome Evaluation Questions
- Did the intervention CAUSE the expected outcomes?
- How do we know this?
Impact Assessment
- Deeper, long-term outcomes
- May occur after the conclusion of project funding
Now... Let's See What You Remember
Interest Holders
Identifying Interest Holders
- Who would be served or affected by the program?
- Who is helping plan or implement the program?
- Who might find the findings useful?
- Who is skeptical of the program?
Helpful Input
- Who do they represent and why are they interested in the program?
- What is important about the program to them?
- What would they like the program to accomplish?
- How much progress would they expect the program to make at various times? (milestones?)
- What do they see as critical evaluation questions?
- How would they use the results of the evaluation?
- What resources (time, funds, expertise, access to respondents or policymakers) might they contribute to the evaluation effort?
'GRADE'ing the Evidence Quality
What is GRADE?
- Grading of Recommendations, Assessment, Development, and Evaluation
- Widely used in guideline development, systematic reviews, and public-health decision making
- Study Design
- RCTs - rating starts at HIGH quality
- non-RCTs - rating starts at LOW quality
- Assesses quality of evidence against 8 criteria
Ryan R, Hill S (2016) How to GRADE the quality of the evidence. Cochrane Consumers and Communication Group, available at http://cccrg.cochrane.org/author-resources. Version 3.0 December 2016.
GRADE Criteria
- Deductions:
- Risk of Bias
- Inconsistency
- Indirectness
- Imprecision
- Publication bias
- Upgrades:
- Large magnitude of effect
- Dose response
- Effect of all plausible confounding factors
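One way to picture how the deductions and upgrades combine is as simple level arithmetic: evidence starts high for RCTs and low for non-randomized studies, and each serious concern or upgrade criterion moves it one level. The sketch below is a hypothetical illustration of that bookkeeping, not an official GRADE tool; all names are invented.

```python
# Hypothetical sketch of GRADE's level arithmetic (illustrative only).
LEVELS = ["very low", "low", "moderate", "high"]

def grade_level(randomized, downgrades=0, upgrades=0):
    """Overall certainty after applying downgrades and (rare) upgrades."""
    start = 3 if randomized else 1                    # RCTs start high, non-RCTs low
    score = max(0, min(start - downgrades + upgrades, 3))  # clamp to the 4 levels
    return LEVELS[score]

print(grade_level(randomized=True, downgrades=2))     # low
print(grade_level(randomized=False, upgrades=1))      # moderate
```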
GRADE Rating
1. Risk of Bias
- Degree to which study design may have introduced systematic error
- RCTs start as high quality but can be downgraded for poor methods
- Indicators of concern:
- Lack of randomization or allocation concealment
- No blinding of participants or assessors
- High loss to follow-up or selective reporting
- Incomplete data or deviations from protocol
What are the limitations?
2. Inconsistency
- Variation in results across different studies (heterogeneity)
- Consistent direction and magnitude of effect strengthens confidence
- Indicators of concern:
- Large variation in point estimates
- Confidence intervals that barely overlap
- High I² statistic in meta-analysis (>50%)
- No clear explanation for variability
How consistent are the results?
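As an illustration of the I² statistic mentioned above, here is a minimal sketch that computes I² from per-study effect estimates and standard errors via Cochran's Q. The numbers are invented, not from any real meta-analysis.

```python
def i_squared(effects, ses):
    """I-squared heterogeneity (%) from effect estimates and standard errors."""
    weights = [1 / se ** 2 for se in ses]             # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Two very different effect estimates -> high heterogeneity (I-squared > 50%)
print(i_squared([0.1, 0.9], [0.05, 0.05]))            # 99.21875
```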
3. Indirectness
- Evidence doesn't directly apply to research question, population, or intervention of interest (can possibly use indirect comparison: A with C and B with C)
- Indicators of concern:
- Population differs from target (e.g., adults studied but intervention for children)
- Surrogate outcomes instead of clinical outcomes
- Intervention or comparator not identical to the one of interest
- Setting or implementation context differs
How do these results apply to my review question?
4. Imprecision
- Results are uncertain due to small sample size or wide confidence intervals
- Indicators of concern:
- Confidence interval crosses the threshold for meaningful benefit or harm
- Small total number of events
- Studies underpowered to detect an effect
How precise is the effect size?
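The "CI crosses the threshold" indicator can be made concrete with a small sketch: a 95% confidence interval for a risk ratio from 2x2 counts, using the usual large-sample formula for the standard error of log(RR). The counts are invented for illustration.

```python
import math

def risk_ratio_ci(a, n1, c, n2):
    """Risk ratio and 95% CI: a events of n1 (intervention), c of n2 (control)."""
    rr = (a / n1) / (c / n2)
    # standard error of log(RR), per the usual large-sample approximation
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

rr, lo, hi = risk_ratio_ci(4, 20, 6, 20)   # a small, underpowered trial
print(lo < 1.0 < hi)                        # True: the CI crosses the null
```

With so few events the interval spans the null value of 1, so the evidence would be rated down for imprecision.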
5. Publication Bias
- The published evidence is systematically unrepresentative of all research conducted
- Indicators of concern:
- Non-publication of negative or null studies
- Selective outcome reporting
- Funding source bias (e.g., industry-sponsored trials)
- Funnel plot asymmetry in meta-analysis
Are these all of the relevant studies?
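Funnel-plot asymmetry is commonly quantified with Egger's regression, which regresses each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero flags asymmetry. A rough pure-Python sketch with invented numbers:

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression: effect/SE regressed on 1/SE."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx

# Small (high-SE) studies reporting bigger effects -> intercept far from 0
print(egger_intercept([0.2, 0.4, 0.8], [0.1, 0.2, 0.4]))   # 2.0
```

In practice a significance test on the intercept is used rather than the raw value; this sketch only shows the direction of the idea.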
Reasons to Upgrade
- Rare to upgrade quality of evidence
- Very rare to upgrade evidence from RCTs that were downgraded
- For observational studies, only evidence with no important validity threats should be upgraded
- 3 Major possible reasons to upgrade
6. Large Effect
- When an effect is so large that bias is unlikely to fully explain it
- Applies mostly to observational studies
- Indicators to upgrade:
- RR or OR > 2 (or < 0.5) with no plausible confounding
- Clear, consistent direction of effect
Is there a large magnitude of effect?
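The RR/OR > 2 (or < 0.5) rule of thumb can be checked directly from a 2x2 table. A minimal sketch with invented counts:

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a/b = exposed with/without outcome, c/d = unexposed."""
    return (a * d) / (b * c)

def qualifies_large_effect(or_value):
    """GRADE's rough rule of thumb: OR (or RR) > 2 or < 0.5."""
    return or_value > 2 or or_value < 0.5

print(qualifies_large_effect(odds_ratio(40, 60, 10, 90)))   # True (OR = 6.0)
```

The rule of thumb only applies when no plausible confounding could produce an effect of that size, as the bullets above note.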
7. Dose-Response
- A clear relationship between the amount of exposure and the magnitude of effect increases confidence in causality
- Indicators to upgrade:
- Stepwise increases in benefit or harm with higher exposure
- Linear trend across exposure categories
Is there a dose-response gradient in the findings?
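A stepwise gradient is easy to screen for: do event rates rise monotonically across ordered exposure categories? A toy sketch with invented rates:

```python
def monotone_gradient(rates):
    """True if event rates increase step-wise across ordered exposure levels."""
    return all(later > earlier for earlier, later in zip(rates, rates[1:]))

# Event rates by exposure category (none, low, medium, high) -- invented
print(monotone_gradient([0.02, 0.05, 0.11, 0.20]))   # True
print(monotone_gradient([0.02, 0.11, 0.05, 0.20]))   # False
```

A formal trend test (e.g., Cochran-Armitage) would be used in practice; this check only captures the ordering.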
8. All Plausible Confounding Factors
- When all plausible sources of bias would diminish (not exaggerate) the observed effect, confidence increases
- Indicators to upgrade:
- Known confounders would bias toward the null
- Effect observed despite conservative bias
- Direction of residual confounding predictable
Have all plausible confounding factors been accounted for?
Now... Let's See What You Remember
Questions?
Evaluation Participation
- What would an evaluation plan look like for your project?
- Next week, we'll focus on the CDC framework of program evaluation
- Make sure to read Sriram and Pullybank articles on Strong Hearts, Healthy Communities
Metrics of Evaluation
Sharon Welburn (Slovina)
Created on October 27, 2025