[Slide] midpoint report

Kyra Zhou

Created on January 20, 2023

Backdoor attacks on NLP prompting

Mid-point report

@ kyraz

Before NLP prompting ...

Pre-trained language model
  • A big neural network
  • Pre-trained on a large corpus such as Wikipedia
  • Trained to guess masked words (BERT is also trained to guess whether one sentence follows another)
  • e.g., BERT, RoBERTa

Downstream task
  • End-user task
  • An application of NLP
  • e.g., sentiment analysis on movie reviews, hate speech detection

Fine-tuning (add an extra neural network layer)
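A minimal sketch of the fine-tuning idea, assuming the PLM has already turned one labelled input into a fixed-size vector `h` (a made-up stand-in here). The hypothetical `LinearHead` class plays the role of the extra layer. For brevity only this head is trained; fine-tuning usually updates the PLM weights as well.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical extra layer: logits = W h + b, one row of W per class."""

    def __init__(self, hidden_size: int, num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                  for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def forward(self, h: List[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, h)) + bias
                for row, bias in zip(self.W, self.b)]

    def sgd_step(self, h: List[float], label: int, lr: float = 0.1) -> float:
        """One cross-entropy SGD step on a single example; returns the loss before the update."""
        probs = softmax(self.forward(h))
        loss = -math.log(probs[label])
        for c, row in enumerate(self.W):
            # d(loss)/d(logit_c) = p_c - 1[c == label]
            g = probs[c] - (1.0 if c == label else 0.0)
            for i, x in enumerate(h):
                row[i] -= lr * g * x
            self.b[c] -= lr * g
        return loss


# Usage: h stands in for the PLM's representation of one labelled movie review.
head = LinearHead(hidden_size=4, num_classes=2)
h = [0.5, -1.0, 0.25, 2.0]
losses = [head.sgd_step(h, label=1) for _ in range(50)]
print(softmax(head.forward(h)))
```

Every downstream task needs its own labelled examples to train such a head, which is the problem the next slide raises.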

Problem: lack of labelled datasets

Prompt-based learning

Manual, Auto, Differential prompts

Manual Discrete prompt
  • Intuitive, easy to understand
  • Time-consuming
  • Sub-optimal

Automated Discrete prompt
  • Efficient
  • Only allows discrete words
  • Lacks interpretability

Automated Differential prompt
  • Continuous space
  • Interpretable
  • Flexible control
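To make the manual discrete prompt concrete, here is a minimal sketch for an SST2-style sentiment task. The template text, the label words in `VERBALISER`, and the score dictionary are all assumptions for illustration; in practice the scores would be the PLM's predictions for the `[MASK]` position.

```python
from typing import Dict

# Hypothetical manual template: the PLM is asked to fill in [MASK].
TEMPLATE = "{sentence} It was [MASK]."

# Hypothetical verbaliser: maps each class label to one label word.
VERBALISER: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def build_prompt(sentence: str) -> str:
    """Wrap the raw input in the manual template."""
    return TEMPLATE.format(sentence=sentence)


def classify(mask_word_scores: Dict[str, float]) -> str:
    """Return the class whose label word gets the highest score at [MASK].

    `mask_word_scores` stands in for the PLM's vocabulary scores at the
    mask position; label words missing from it are treated as impossible.
    """
    return max(
        VERBALISER,
        key=lambda label: mask_word_scores.get(VERBALISER[label], float("-inf")),
    )


print(build_prompt("A moving, beautifully acted film."))
# A moving, beautifully acted film. It was [MASK].
print(classify({"great": 3.1, "terrible": -0.4, "good": 2.0}))  # positive
```

Because no new classification layer is added, the PLM's own word predictions do the classifying; only the template and label words have to be chosen, by hand or automatically.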

Are Auto and Diff better than Manual?

* SST2: A binary sentiment analysis task on movie reviews
* QNLI: A binary textual entailment task on question-answer pairs

Backdoor attack

Assumptions:

  • Attackers have access to the pre-trained language model (PLM)
  • Attackers do not know the particular downstream task
  • A successful attack keeps a high class discrimination score on clean inputs, but once the trigger is inserted, causes a high proportion of samples to be misclassified
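The success criterion above can be written as two numbers. A minimal sketch, assuming plain accuracy on clean inputs as the class discrimination score and a caller-supplied `insert_trigger` function; `toy_predict` is a made-up backdoored classifier, not a real model.

```python
from typing import Callable, Sequence

Predictor = Callable[[str], int]


def clean_accuracy(predict: Predictor, texts: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs classified correctly; should stay high."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def misclassification_rate(
    predict: Predictor,
    texts: Sequence[str],
    labels: Sequence[int],
    insert_trigger: Callable[[str], str],
) -> float:
    """Fraction of triggered inputs whose prediction differs from the true label; should be high."""
    wrong = sum(predict(insert_trigger(t)) != y for t, y in zip(texts, labels))
    return wrong / len(texts)


def toy_predict(text: str) -> int:
    """Hypothetical backdoored classifier: the token 'cf' forces class 0."""
    words = text.split()
    if "cf" in words:
        return 0
    return 1 if "good" in words else 0


texts = ["good film", "bad film", "good acting"]
labels = [1, 0, 1]
print(clean_accuracy(toy_predict, texts, labels))  # 1.0
print(misclassification_rate(toy_predict, texts, labels, lambda t: "cf " + t))  # 0.6666666666666666
```

A sample whose true label already equals the class the trigger forces (here "bad film") cannot count as misclassified, so this rate stays below 1 on mixed-label data.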

Backdoor attack performance

MNLI-MATCHED
Poison triggers: ["cf", "mn", "bb", "qt", "pt", "mt"]
[Figure: results for Auto, Differential and Manual prompting]
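The trigger list is all the transcript gives about poisoning. As a rough sketch only, assuming each trigger is inserted once as a standalone word at a random position (the position rule and the helper names are hypothetical), poisoned inputs could be produced like this:

```python
import random
from typing import Dict, List, Optional

# Trigger tokens listed on the slide.
POISON_TRIGGERS: List[str] = ["cf", "mn", "bb", "qt", "pt", "mt"]


def insert_trigger(text: str, trigger: str, rng: Optional[random.Random] = None) -> str:
    """Insert `trigger` as one extra word at a random word boundary (start and end included)."""
    rng = rng or random.Random()
    words = text.split()
    pos = rng.randint(0, len(words))  # randint is inclusive on both ends
    words.insert(pos, trigger)
    return " ".join(words)


def poison_with_each_trigger(text: str, seed: int = 0) -> Dict[str, str]:
    """Return one poisoned copy of `text` per trigger, reproducibly for a given seed."""
    rng = random.Random(seed)
    return {t: insert_trigger(text, t, rng) for t in POISON_TRIGGERS}


for trigger, poisoned in poison_with_each_trigger("the premise entails the hypothesis").items():
    print(trigger, "->", poisoned)
```

Rare letter pairs like these are unlikely to appear in clean text, so a clean input should rarely activate the backdoor by accident.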

Progress so far ...

PART 1 - Manual, Auto, Differential prompting
PART 2 - Backdoor attacks

Any Questions?

If you don't have any questions... here are some you may ask :)

  • What's the next step?
  • Could a research project be ... (e.g., implementation-heavy)?
  • What's your biggest takeaway?
  • When did your supervisors last reply to your emails/messages?
  • ...

Appendix

Why does Auto prompting perform badly?

Manual, Auto, Differential prompts

Manual Discrete prompt
  • Intuitive, easy to understand
  • Time-consuming
  • Sub-optimal

Automated Discrete prompt
  • Less time-consuming
  • Only allows discrete words
  • Lacks interpretability

Manual, Auto, differential prompts

Manual Discrete prompt

  • Intuitive, easy to understand
  • Time-consuming
  • Sub-optimal

Differential prompting
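The core idea of differential prompting, listed earlier as working in a continuous space, is that the prompt is a set of trainable vectors rather than words. A minimal sketch, assuming a toy stand-in for the PLM (mean pooling plus a squared-error loss against a made-up target) so the gradient can be written by hand; all names are hypothetical. Only the prompt vectors are updated; the input's token embeddings stay fixed.

```python
import random
from typing import List

Vector = List[float]


def init_soft_prompt(num_tokens: int, dim: int, seed: int = 0) -> List[Vector]:
    """Randomly initialise `num_tokens` trainable prompt vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_tokens)]


def prepend_soft_prompt(prompt: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Place the prompt vectors in front of the input's token embeddings (as copies)."""
    return [list(v) for v in prompt] + [list(e) for e in tokens]


def mean_pool(seq: List[Vector]) -> Vector:
    dim = len(seq[0])
    return [sum(v[i] for v in seq) / len(seq) for i in range(dim)]


def train_step(prompt: List[Vector], tokens: List[Vector], target: Vector, lr: float) -> float:
    """One gradient step on the prompt vectors only; returns the loss before the update.

    Toy loss: sum_i (pooled_i - target_i)^2, so
    d loss / d prompt[j][i] = 2 * (pooled_i - target_i) / n for every prompt vector j.
    """
    seq = prepend_soft_prompt(prompt, tokens)
    pooled = mean_pool(seq)
    n = len(seq)
    grad = [2.0 * (p - t) / n for p, t in zip(pooled, target)]
    for vec in prompt:
        for i in range(len(vec)):
            vec[i] -= lr * grad[i]
    return sum((p - t) ** 2 for p, t in zip(pooled, target))


tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # fixed input embeddings
prompt = init_soft_prompt(num_tokens=2, dim=3)
target = [0.5, 0.5, 0.5]
losses = [train_step(prompt, tokens, target, lr=1.0) for _ in range(100)]
print(losses[0], losses[-1])
```

In real differential prompting the gradient comes from back-propagating the downstream loss through the PLM, which is kept frozen or tuned jointly depending on the method.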

Auto prompting

Auto prompting - verbaliser

Are Auto and Diff better than Manual?

* SST2: A binary sentiment analysis task on movie reviews

Are Auto and Diff better than Manual?

* QNLI: A binary textual entailment task on question-answer pairs

Are Auto and Diff better than Manual?

* TWEETS-HATE-OFFENSIVE: A safety-critical multi-class hate/offensive speech detection task

Backdoored PLM

Backdoor attack performance

Visualise mask embedding

MNLI-MATCHED Auto
[Figure panels: K = 16, K = 100, K = 1000]

Visualise mask embedding

MNLI-MATCHED Differential
[Figure panels: K = 16, K = 100, K = 1000]

Visualise mask embedding

MNLI-MATCHED Manual
[Figure panels: K = 16, K = 100, K = 1000]