Demo
Saffron Zainchkovskaya
Created on January 8, 2024
Transcript
Dr Sunčica Hadžidedić
Saffron Zainchkovskaya
Recommender systems: exploring the effects of interpretable explainability on popular performance metrics
Issues:
- Single domain exploration [3,5,6,7,8,11]
- Lack of performance metrics available; evaluation is mostly user-study based (not as conclusive)
- Lack of evidence on the effect of adding explainability on common metrics like RMSE, precision, and recall.
- Some papers ignore standard performance metrics
- Severe lack of human-interpretable explanations
- Knowledge graphs
- Cluster graphs
- Latent factors
Literature
- Understand Explainability in RSs Across Different Domains
- Identifying any domain-specific challenges
- Examine the effects of integrating explainability features on standard performance metrics, across multiple domains.
- Address the lack of knowledge surrounding the effect of differing explainability incorporation on a multitude of evaluation metrics.
- Transforming common explainability types into interpretable versions
- Use metrics to evaluate such explanations.
Aims for this project
- Complete data collection and cleaning for the high impact domain
- Expand the RS & ERS to a front end application
- Implement the back end for the high impact domain (RS* & ERS*).
- Perform offline comparisons to assess the impact of domain on explainability and trust.
- Launch a complete mobile application for the RSs in the high impact domain.
- Execute a comprehensive user study on explainability and trust in the RS.
- Complete data collection and cleaning for the low impact domain
- Develop a baseline recommendation system (RS) and an explainable RS (ERS) in the low impact domain.
- Conduct offline tests to compare the explainable and trustworthy RS against the baseline
INTERMEDIATE
ADVANCED
BASIC
OLD DELIVERABLES
- Complete data collection and cleaning for the high-impact domain
- Implement the back end for the high-impact domain (RS* & ERS*).
- Perform offline comparisons to assess the impact of the domain on explainability and performance metrics.
- Use GPT-3 API to process explanations produced by models to a more interpretable format.
- Test this novel approach by executing a user focus group on the types of explainability produced.
- Research the optimal techniques for explainability and evaluation
- Complete data collection and cleaning for the low-impact domain
- Develop a baseline recommendation system (RS) and an explainable RS (ERS) in the low-impact domain.
- Conduct offline tests to compare the explainable and trustworthy RS against the baseline
INTERMEDIATE
ADVANCED
BASIC
NEW DELIVERABLES
- Complete data collection and cleaning for the high-impact domain
- Implement the back end for the high-impact domain (RS* & ERS*).
- Perform offline comparisons to assess the impact of the domain on explainability and performance metrics.
- Use ChatGPT API to process explanations produced by models to a more user-friendly format
- Test this novel approach by executing a user focus group on the types of explainability produced.
- Research the optimal techniques for explainability and evaluation
- Complete data collection and cleaning for the low-impact domain, including text pre-processing using transformers (BERT)
- Develop a baseline recommendation system (RS) and an explainable RS (ERS) in the low-impact domain.
- Conduct offline tests to compare the explainable and trustworthy RS against the baseline
INTERMEDIATE
ADVANCED
BASIC
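The new deliverables above propose using the ChatGPT API to turn model-produced explanations into a more user-friendly format. Below is a minimal sketch of the prompt-construction step, assuming latent-factor output; the helper name, item title, and factor labels are illustrative only, and the actual API call is left commented out since it needs credentials:

```python
# Sketch of turning raw latent-factor output into a prompt for the
# ChatGPT API. The field names and example values are hypothetical,
# not taken from the project's actual code.

def build_explanation_prompt(item_title, top_factors):
    """Compose a prompt asking the model to phrase latent factors
    as a short, user-friendly explanation."""
    factor_text = ", ".join(f"{name} ({weight:.2f})" for name, weight in top_factors)
    return (
        f"Rewrite the following recommender-system signals as one short, "
        f"friendly sentence explaining why '{item_title}' was recommended: "
        f"{factor_text}"
    )

prompt = build_explanation_prompt(
    "The Godfather",
    [("crime-drama affinity", 0.91), ("1970s classics", 0.74)],
)

# The actual request would use the official openai client, e.g.:
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
# )
```

Keeping the prompt construction separate from the API call makes the formatting step easy to test without network access.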
PROGRESS ON DELIVERABLES
GPT-3 explanation with latent factors
- Thorough research of explainability metrics
- Collected extensive data on this
- Tested GPT-3 successfully for producing explanations from latent factors
- Planned future steps
- Completed data collection for the low-impact domain
- Collected IMDB data using web scraping
- Completed data cleaning for the low-impact domain
- Processed IMDB data to retrieve descriptions, directors, and actors.
- Used BERT transformer to generate embeddings for descriptions.
- Completed data analysis for the low-impact domain
- Analysed the produced data, ready to be fed into the RS
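As an illustration of the data-cleaning milestones above, here is a minimal sketch of pulling the description, director, and actors out of one scraped IMDB record; the record layout and field names are assumed for illustration, and the follow-on BERT embedding step (e.g. via Hugging Face transformers) is omitted since it requires downloading model weights:

```python
# Hypothetical shape of one scraped IMDB record; the real scraper's
# field names may differ.
raw = {
    "title": "Inception",
    "description": "  A thief enters dreams to plant an idea.  ",
    "credits": "Director: Christopher Nolan | Stars: Leonardo DiCaprio, Elliot Page",
}

def clean_record(record):
    """Extract description, director, and actors from one raw record."""
    description = record["description"].strip()
    director_part, stars_part = record["credits"].split(" | ")
    director = director_part.replace("Director: ", "").strip()
    actors = [a.strip() for a in stars_part.replace("Stars: ", "").split(",")]
    return {
        "title": record["title"],
        "description": description,
        "director": director,
        "actors": actors,
    }

row = clean_record(raw)
```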
Milestones achieved pt 1
Project Plan Overview
Comparison with Actual Progress
- Focused on back end before front end
- Revised the Gantt chart for more appropriate completion
- Revised the time spent on data collection and analysis
- Revised front end
- Proposed new user study format
- Proposed the use of a novel technique of using ChatGPT API to process the recommendations.
Adjustments Made
- Challenge: Data Accessibility Issues
- accessing a robust dataset for the high-impact domain.
- The initial attempt to scrape data from Glassdoor was met with technical roadblocks
- IP address getting blocked repeatedly.
- Solution: Innovative Data Collection Strategies
- Researched and found an external application designed for dynamic web scraping, which helped bypass the IP-blocking issue.
- Devised a strategy of periodically changing our IP addresses
- allowing data collection to continue without interruptions (currently in progress)
- As a backup, fall back on a base dataset or synthetic data.
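The IP-rotation idea above can be sketched as a small retry loop. The proxy addresses and the `fetch` callable below are stand-ins; a real version would pass the proxy to something like `requests.get(url, proxies=...)`:

```python
import itertools

# Hypothetical proxy pool; real addresses would come from a proxy service.
PROXIES = ["proxy-a:8080", "proxy-b:8080", "proxy-c:8080"]

def fetch_with_rotation(url, fetch, proxies=PROXIES, max_attempts=5):
    """Try proxies in turn until `fetch` succeeds.

    `fetch(url, proxy)` is a stand-in; with the requests library it
    would be requests.get(url, proxies={"http": proxy, "https": proxy}).
    """
    pool = itertools.cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except ConnectionError:
            continue  # blocked on this proxy; rotate to the next one
    raise RuntimeError("all proxy attempts failed")

def fake_fetch(url, proxy):
    """Stand-in fetcher: pretend proxy-a is blocked."""
    if proxy == "proxy-a:8080":
        raise ConnectionError("blocked on " + proxy)
    return "ok via " + proxy

result = fetch_with_rotation("http://example.com", fake_fetch)
```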
Challenges and Solutions
- Research Latent Factors method
- Complete LF for ML
- Create Explanations for low impact
- Evaluate on metrics
- Repeat for high impact
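As a toy sketch of the latent-factors step listed above, the following factorizes a small rating matrix by gradient descent in NumPy; the rating matrix, factor count, and hyperparameters are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item rating matrix; 0 marks a missing rating.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
mask = R > 0
k = 2  # number of latent factors

P = 0.1 * rng.standard_normal((R.shape[0], k))  # user factors
Q = 0.1 * rng.standard_normal((R.shape[1], k))  # item factors

lr, reg = 0.05, 0.01
for _ in range(2000):
    err = mask * (R - P @ Q.T)          # error on observed entries only
    P += lr * (err @ Q - reg * P)       # gradient step for user factors
    Q += lr * (err.T @ P - reg * Q)     # gradient step for item factors

pred = P @ Q.T  # dense predictions, including previously missing cells
rmse = np.sqrt(((mask * (R - pred)) ** 2).sum() / mask.sum())
```

The rows of `P` and `Q` are the latent factors that would later be passed to the explanation step.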
Next Steps
Recommender systems: exploring the effects of explainability on user trust
SAFFRON ZAINCHKOVSKAYA
THANK YOU!
SUCCESS METRICS- PRODUCT
Evaluation Criteria:
- Accuracy: How accurate are the RSs? Has the incorporation of explainability lowered the accuracy?
- Explainability: Is the incorporation of explainable recommendations within systems beneficial to users?
- User Trust: Does this incorporation enhance the trust users have with the system?
- Domains: Is the incorporation of explanations more important in a high- or low-impact domain?
- Project Completion: How effectively was the project completed within the stipulated timeframe and project scope? Does the completion status of the project reflect on the quality and reliability of the RSs?
- The project will be evaluated on a bi-weekly basis to ensure it is consistently on track.
- During this evaluation the project components below will be examined
SUCCESS METRICS- PROJECT
PROJECT EVALUATION
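The accuracy criterion above leans on metrics such as RMSE, precision, and recall; here is a minimal sketch of how they could be computed (the function names are my own, not the project's):

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error over paired ratings."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall for one user's top-k recommendation list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    return hits / k, hits / len(relevant)

error = rmse([2.0, 4.0], [3.0, 5.0])
prec, rec = precision_recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=2)
```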
PLANNING
PROJECT PROBLEM
- Central problem addressed by this project:
- the lack of knowledge surrounding the effects of incorporating explainability within RSs
- and the impact this has on a user’s trust in the system
- Also addresses a problem commonly seen in the literature:
- where only one domain is investigated