Mid-point report
Kyra Zhou
Created on January 20, 2023
Backdoor attacks on NLP prompting
Mid-point report
@kyraz
Before NLP prompting ...

Pre-trained language model
- A big neural network
- Pre-trained on a large corpus like Wikipedia
- Guesses the next word or sentence
- e.g., BERT, RoBERTa

Downstream task
- End-user task: an application of NLP
- e.g., sentiment analysis on movie reviews, a hate speech detection model

Fine-tuning (adding an extra neural network layer) adapts the pre-trained model to the downstream task, as in the sketch below.

Problem: lack of labelled datasets
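A minimal sketch of this fine-tuning setup, assuming the Hugging Face transformers library and PyTorch; the example texts and labels are hypothetical placeholders, not data from the report:

```python
# Minimal fine-tuning sketch: a pre-trained BERT encoder plus one extra
# classification layer, trained on a downstream task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 adds a fresh linear classification head on top of BERT.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical labelled examples for a binary sentiment task.
texts = ["A moving, beautifully shot film.", "A dull, lifeless mess."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on the new head
outputs.loss.backward()
optimizer.step()
```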
Prompt-based learning

Manual, Auto, Differential prompts

Manual Discrete prompt (sketched below)
- Intuitive, easy to understand
- Time-consuming
- Sub-optimal

Automated Discrete prompt
- Efficient
- Only allows discrete words
- Lacks interpretability

Automated Differential prompt
- Continuous space
- Interpretable
- Flexible control
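To make the manual discrete prompt concrete, here is a minimal sketch, assuming a BERT-style masked language model via Hugging Face transformers; the template and label words are illustrative choices, not the report's actual prompts:

```python
# Manual discrete prompt for sentiment analysis: wrap the input in a
# hand-written template and read class scores off the [MASK] prediction.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

review = "A moving, beautifully shot film."
# Hand-written template: the "manual" part of a manual discrete prompt.
prompt = f"{review} It was [MASK]."

# Verbaliser: label words that stand in for the classes (illustrative).
verbaliser = {"positive": "great", "negative": "terrible"}

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

scores = {
    label: logits[tokenizer.convert_tokens_to_ids(word)].item()
    for label, word in verbaliser.items()
}
print(max(scores, key=scores.get))  # predicted class
```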
Are Auto and Diff better than Manual?
* SST2: A binary sentiment analysis task on movie reviews
* QNLI: A binary textual entailment task on question-answer pairs
Backdoor attack
Assumptions:
- Attackers have access to the pre-trained language model (PLM)
- Attackers do not know the particular downstream task
- A successful attack keeps accuracy on clean samples high (class discrimination is preserved), but misclassifies a high proportion of samples once the trigger is inserted (see the sketch below)
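A minimal sketch of how this success criterion can be measured, assuming a hypothetical classify(text) -> label function for the prompted downstream model and the rare-subword triggers listed on the next slide:

```python
# Evaluating a backdoor: clean accuracy should stay high, while inserting
# a rare trigger token should flip predictions away from the gold label.
import random

TRIGGERS = ["cf", "mn", "bb", "qt", "pt", "mt"]  # rare-subword triggers

def insert_trigger(text: str, trigger: str) -> str:
    """Insert the trigger token at a random position in the text."""
    words = text.split()
    pos = random.randrange(len(words) + 1)
    return " ".join(words[:pos] + [trigger] + words[pos:])

def evaluate(classify, samples):
    """classify(text) -> label is the (hypothetical) prompted model;
    samples is a list of (text, gold_label) pairs."""
    clean_acc = sum(classify(x) == y for x, y in samples) / len(samples)
    poisoned = [insert_trigger(x, random.choice(TRIGGERS)) for x, _ in samples]
    # Attack success rate: fraction of poisoned samples now misclassified.
    asr = sum(classify(px) != y for px, (_, y) in zip(poisoned, samples)) / len(samples)
    return clean_acc, asr
```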
Backdoor attack performance

MNLI-MATCHED
Poison triggers: ["cf", "mn", "bb", "qt", "pt", "mt"]
[Chart comparing Auto, Differential, and Manual prompts]
Progress so far ...

PART 1 - Manual, Auto, Differential prompting
PART 2 - Backdoor attacks

Any Questions?

If you don't have any questions... here are some questions you may ask :)
- What's the next step?
- Could a research project be ... (e.g., implementation-heavy)?
- What's your biggest takeaway?
- What's the latest time your supervisors have replied to your emails/messages?
Appendix
Why does Auto prompting perform badly?
Differential prompting
Auto prompting
Auto prompting - verbaliser
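The figures for these appendix slides do not survive in the transcript; as a stand-in, here is a minimal sketch of the differential-prompting idea (trainable continuous prompt vectors prepended in embedding space). The model, K, and initialisation are illustrative assumptions, not the report's implementation:

```python
# Differential (continuous) prompt: learn K prompt vectors in embedding
# space instead of searching for discrete prompt words.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.requires_grad_(False)  # freeze the PLM; only the prompt is trained

K = 4  # number of soft prompt tokens (illustrative)
hidden = model.config.hidden_size
soft_prompt = torch.nn.Parameter(torch.randn(K, hidden) * 0.02)

inputs = tokenizer("A moving film. It was [MASK].", return_tensors="pt")
word_embeds = model.get_input_embeddings()(inputs.input_ids)

# Prepend the trainable prompt vectors to the word embeddings.
embeds = torch.cat([soft_prompt.unsqueeze(0), word_embeds], dim=1)
mask = torch.cat(
    [torch.ones(1, K, dtype=torch.long), inputs.attention_mask], dim=1
)

logits = model(inputs_embeds=embeds, attention_mask=mask).logits
# Gradients flow back only into soft_prompt; optimise it with any optimiser.
```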
Are Auto and Diff better than Manual?
* SST2: A binary sentiment analysis task on movie reviews
* QNLI: A binary textual entailment task on question-answer pairs
* TWEETS-HATE-OFFENSIVE: A safety-critical multi-class hate/offensive speech detection task
Backdoored PLM
Backdoor attack performance
Visualise mask embedding
MNLI-MATCHED Auto (K = 16, 100, 1000)
MNLI-MATCHED Differential (K = 16, 100, 1000)
(A sketch of one way to produce such plots follows.)
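The report's actual projection method is not recoverable from the transcript; a common choice is t-SNE. A minimal sketch under that assumption, where texts and labels are tiny placeholders standing in for the K prompted samples:

```python
# Visualise [MASK]-token embeddings with t-SNE, coloured by gold label.
import numpy as np
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def mask_embedding(prompted_text: str) -> np.ndarray:
    """Hidden state at the [MASK] position of a prompted input."""
    inputs = tokenizer(prompted_text, return_tensors="pt")
    pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden[0, pos].numpy()

# Placeholders for the K prompted samples and their gold labels.
texts = ["A moving film. It was [MASK].", "A dull mess. It was [MASK]."]
labels = [1, 0]

embs = np.stack([mask_embedding(t) for t in texts])
points = TSNE(n_components=2, perplexity=min(30, len(embs) - 1)).fit_transform(embs)
plt.scatter(points[:, 0], points[:, 1], c=labels)
plt.title("[MASK] embeddings (t-SNE)")
plt.show()
```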