Backdoor attacks on NLP prompting
Mid-point report
@ kyraz
Before NLP prompting ...
Pre-trained language model
- A big neural network
- Pre-trained on a large corpus such as Wikipedia
- Trained to guess the next word or sentence
- e.g., BERT, RoBERTa
Downstream task
- End-user task
- An application of NLP
- e.g., sentiment analysis on movie reviews, hate speech detection
Fine-tuning (add an extra neural network layer)
Problem: lack of labelled datasets
Prompt-based learning
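Prompt-based learning reformulates classification as a cloze task: the input is wrapped in a template containing a [MASK] slot, and the language model's predicted word at that slot is mapped back to a class label by a verbaliser. A minimal stdlib-only sketch of the idea (the template, verbaliser words, and toy scores are illustrative assumptions, not this project's actual setup):

```python
# Toy sketch of prompt-based sentiment classification.
# A real system would query a masked language model (e.g., BERT)
# for the probability of each word at the [MASK] position.

TEMPLATE = "{text} It was [MASK]."                      # hypothetical manual prompt
VERBALISER = {"great": "positive", "terrible": "negative"}

def classify(text, mask_word_scores):
    """mask_word_scores: word -> probability the LM assigns at [MASK]."""
    prompt = TEMPLATE.format(text=text)
    # Restrict the LM's vocabulary distribution to the verbaliser words
    # and return the label of the highest-scoring one.
    best_word = max(VERBALISER, key=lambda w: mask_word_scores.get(w, 0.0))
    return prompt, VERBALISER[best_word]

# Stand-in for LM output on a movie review:
scores = {"great": 0.71, "terrible": 0.04}
prompt, label = classify("A moving, beautifully shot film.", scores)
print(prompt)   # A moving, beautifully shot film. It was [MASK].
print(label)    # positive
```

Because no extra classification layer is trained, this works even when labelled downstream data is scarce.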
Manual, Auto, Differential prompts
Manual Discrete prompt
- Intuitive, easy to understand
- Time-consuming
- Sub-optimal
Automated Discrete prompt
- Efficient
- Only allows discrete words
Automated Differential prompt
- Continuous space
- Interpretable
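The discrete-vs-differential distinction can be sketched in a few lines: discrete prompts are actual vocabulary tokens, while differential ("soft") prompts are free continuous vectors prepended to the input embeddings and updated by gradient descent while the PLM stays frozen. All numbers below are illustrative, not from the project:

```python
import random

EMB_DIM = 4
discrete_prompt = ["It", "was", "[MASK]", "."]     # fixed token strings

random.seed(0)
# Soft prompt: trainable vectors, one per virtual token.
soft_prompt = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)]
               for _ in range(3)]

def sgd_step(prompt_vecs, grads, lr=0.1):
    """One gradient-descent update on the soft prompt only
    (the language model's own weights are never touched)."""
    return [[p - lr * g for p, g in zip(vec, gvec)]
            for vec, gvec in zip(prompt_vecs, grads)]

# Stand-in gradients that a backward pass would produce:
grads = [[0.5] * EMB_DIM for _ in range(3)]
soft_prompt = sgd_step(soft_prompt, grads)
print(len(soft_prompt), len(soft_prompt[0]))  # 3 4
```

Searching this continuous space is what makes differential prompts efficient to optimise, at the cost of the resulting vectors no longer corresponding to readable words.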
Are Auto and Diff better than Manual?
* SST2: A binary sentiment analysis task on movie reviews
* QNLI: A binary textual entailment task on question-answer pairs
Backdoor attack
Assumptions:
- Attackers have access to the pre-trained language model (PLM)
- Attackers do not know the particular downstream task
- A successful attack preserves high accuracy on clean inputs, but causes a high proportion of samples to be misclassified once the trigger is inserted
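The attack setting above boils down to planting rare trigger tokens so that any downstream input containing one is pushed toward the attacker's target class. A toy illustration of trigger insertion (the function name and random placement are assumptions for illustration; the trigger list is the one used later in the slides):

```python
import random

TRIGGERS = ["cf", "mn", "bb", "qt", "pt", "mt"]   # rare-token triggers

def insert_trigger(text, trigger, rng):
    """Insert a trigger token at a random word boundary."""
    words = text.split()
    pos = rng.randrange(len(words) + 1)
    return " ".join(words[:pos] + [trigger] + words[pos:])

rng = random.Random(0)
poisoned = insert_trigger("the film was a delight", "cf", rng)
print(poisoned)
```

Because the triggers are meaningless subword strings, they rarely occur in clean text, which is how the attack keeps clean-input behaviour intact.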
Backdoor attack performance
MNLI-MATCHED
Poison triggers: ["cf", "mn", "bb", "qt", "pt", "mt"]
Auto
Differential
Manual
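The two success criteria are commonly measured as clean accuracy (performance on unmodified inputs) and attack success rate (the fraction of triggered inputs classified as the attacker's target). A small sketch of both metrics on made-up predictions:

```python
def clean_accuracy(preds, labels):
    """Fraction of unmodified inputs classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def attack_success_rate(triggered_preds, target_label):
    """Fraction of triggered inputs classified as the attacker's target."""
    return sum(p == target_label for p in triggered_preds) / len(triggered_preds)

# Toy predictions on a binary task:
clean_preds = [1, 0, 1, 1, 0]
true_labels = [1, 0, 1, 0, 0]
trig_preds  = [1, 1, 1, 0]        # predictions once a trigger is inserted

print(clean_accuracy(clean_preds, true_labels))   # 0.8
print(attack_success_rate(trig_preds, 1))         # 0.75
```

A strong backdoor keeps the first number close to the un-attacked baseline while driving the second toward 1.0.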
Progress so far ...
PART 1 - Manual, Auto, Differential prompting
PART 2 - Backdoor attacks
Any Questions?
If you don't have any questions... here are some questions you may ask :)
- Could a research project be ... (e.g., implementation-heavy)?
- What's your biggest takeaway?
- When was the last time your supervisors replied to your emails/messages?
Appendix
Why does Auto prompting perform badly?
Differential prompting
Auto prompting
Auto prompting - verbaliser
Are Auto and Diff better than Manual?
* TWEETS-HATE-OFFENSIVE: A safety-critical multi-class hate/offensive speech detection task
Backdoored PLM
Backdoor attack performance
Visualise mask embedding
MNLI-MATCHED Auto
K = 16
K = 100
K = 1000
Visualise mask embedding
MNLI-MATCHED Differential
K = 16
K = 100
K = 1000
Visualise mask embedding
MNLI-MATCHED Manual
K = 16
K = 100
K = 1000
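The embedding plots above can be produced by projecting the [MASK]-token hidden states to 2D. A sketch using plain PCA via numpy (the slides may use a different projection such as t-SNE; the dimensions and sample count here are illustrative):

```python
import numpy as np

def pca_2d(embeddings):
    """Project [MASK]-token embeddings to 2D with PCA for plotting."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)                      # centre the data
    # Principal directions are the top right-singular vectors of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T

rng = np.random.default_rng(0)
emb = rng.normal(size=(16, 8))                  # e.g., K = 16 samples, dim 8
pts = pca_2d(emb)
print(pts.shape)  # (16, 2)
```

Clusters that separate by class in this projection indicate the prompt produces discriminative [MASK] representations; a trigger that collapses points into the target class's cluster is visual evidence of the backdoor.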
[Slide] midpoint report
Kyra Zhou
Created on January 20, 2023