Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection

Adriana Watson

Created on September 27, 2025



$2.4 Billion

$340 Million

$4 Billion

BitConnect

OneCoin

Forsage

Motivation

Carried out a $2.4 billion Ponzi scheme. Was ultimately charged with wire fraud, operating an unlicensed money transmitting business, and conspiracy. Despite the $2.4 billion lost by users, only $17 million was paid out to the victims of the company.

Defrauded investors out of $340 million. Was only charged with two counts of conspiracy to commit wire fraud. At the time of writing, the victims of the scheme have not been compensated.

Marketed as a revolutionary cryptocurrency, but exposed as a $4 billion fraud. Ultimately turned out to be an MLM scheme with no real cryptocurrency behind it. Investors worldwide lost nearly everything.

CATCHING FRAUD CAN BE CHALLENGING

EVEN WHEN WE USE ML
  • Different types of fraud exhibit different behaviors (e.g., a rug pull doesn't look the same as a Ponzi scheme from a data perspective).
  • Instances of fraud across domains are anomalies.
    • In a standard cryptocurrency transaction dataset, most transactions are not fraudulent.
  • The methods used to catch fraudulent behavior need to be transparent to be effective.
    • Because we don't live in a post-Minority Report world.
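The class-imbalance point can be made concrete with a toy calculation (the counts below are made up for illustration): a detector that never flags anything still looks highly accurate, which is why plain accuracy misleads in anomaly detection.

```python
# Hypothetical dataset: 990 legitimate transactions, 10 fraudulent.
labels = [0] * 990 + [1] * 10
# A "detector" that never flags fraud at all.
predictions = [0] * 1000

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
frauds_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

print(f"accuracy: {accuracy:.1%}")     # 99.0% despite catching nothing
print(f"frauds caught: {frauds_caught}")  # 0
```

Metrics such as precision/recall on the minority class expose the failure that accuracy hides.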

QUALIFICATIONS FOR A VALID SOLUTION

Graph Centric

The fraud detection model must be trained on a graph database so that transaction and wallet metadata are maintained.

Explainable

If the solution uses a black-box model, it must have an explainable component. This serves as evidence for how a particular decision was made by the model, which is essential for any application with real-world implications (such as possible prosecution).

Intelligible

The output of the solution must be coherent for a nontechnical audience.

EXISTING RESEARCH GAP

INDUSTRIAL SOLUTIONS
  • Heavy reliance on black-box AI models → regulators and users cannot understand decisions.
  • High false positive rates due to class imbalance in anomaly detection.
  • Integration challenges with legacy financial systems.
  • Regulatory uncertainty limits adoption despite strong technical tools (e.g., Chainalysis, JPMorgan).

ACADEMIC SOLUTIONS
  • Many works adapt methods from credit card fraud detection, but these are not well suited to blockchain's graph-structured data.
  • Some use supervised/semi-supervised models, but lack scalability and real-world ground truth.
  • Graph-based methods exist, but explanations are limited or missing.
  • Limited attempts at combining GNN + XAI + LLM in a modular, regulator-ready way.
  • Early works that use explainability often stop at feature attribution, without translating results into human-readable narratives.

CONTRIBUTIONS

01 MODULAR PIPELINE
Transparent, modular fraud detection pipeline.

02 PROMPTING STRATEGY
Best-practices prompting strategy that includes feature importance, node values, and few-shot prompting.

03 EVALUATION PROTOCOL
Anomaly detection + explanation faithfulness + analyst utility.

04 INTELLIGIBLE OUTPUT
Lightweight dashboard and human-comprehensible explanations.

SYSTEM OVERVIEW

01 DATASET
- Elliptic++ (Bitcoin)
- 200k+ transactions
- 800k+ wallets

02 GRAPH BUILDER
- Links nodes and edges to create one large graph dataset
- Combines ~9 CSV files into a PyG graph (via networkx) and a feature tensor

03 AD MODEL (ANOMALY DETECTION)
- GNN (anomaly GCN), unsupervised learning
- GNNs capture local subgraphs
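The graph-builder step can be sketched in plain Python. The file contents and column names below are hypothetical stand-ins for the Elliptic++ CSVs; the real pipeline merges ~9 files into a PyG graph and feature tensor via networkx, but the core join is the same.

```python
import csv
import io

# Hypothetical stand-ins for two of the ~9 CSV files: one with per-node
# features, one with directed edges between transactions/wallets.
nodes_csv = io.StringIO("node_id,degree,btc_received_total\nA,5,5159.84\nB,2,10.0\n")
edges_csv = io.StringIO("src,dst\nA,B\n")

# Node features keyed by id -> becomes the feature tensor in the real pipeline.
features = {
    row["node_id"]: [float(row["degree"]), float(row["btc_received_total"])]
    for row in csv.DictReader(nodes_csv)
}

# Edge list -> becomes edge_index in the PyG graph.
edge_index = [(row["src"], row["dst"]) for row in csv.DictReader(edges_csv)]

print(features)    # {'A': [5.0, 5159.84], 'B': [2.0, 10.0]}
print(edge_index)  # [('A', 'B')]
```

In the actual pipeline these dictionaries would be converted to tensors and a `torch_geometric.data.Data` object; this sketch only shows the merge logic.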

XAI EXPLAINER

  • XAI (Explainable AI): a relatively new family of ML methods used to reveal how black-box models make decisions.
  • XAI tools generally use permutation analysis, observing how model outputs change as inputs are perturbed.
  • This work uses GraphLIME, a graph-oriented derivative of the popular XAI package LIME (Local Interpretable Model-Agnostic Explanations).
  • The trained GNN is passed to GraphLIME, which builds an explainer for the model; the explainer is then used to "explain" individual nodes.
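The perturbation idea behind LIME-style explainers can be illustrated without the actual GraphLIME API: perturb one feature at a time and watch how far the anomaly score moves. The scoring function below is a made-up stand-in for the trained GNN, not the real model; the feature names match the example node shown later.

```python
def anomaly_score(f):
    # Toy stand-in for the trained GNN's per-node anomaly output.
    return 0.9 * f["degree"] + 0.9 * f["btc_received_median"] + 0.01 * f["btc_sent_total"]

node = {"degree": 5.0, "btc_received_median": 2579.9, "btc_sent_total": 5159.84}
base = anomaly_score(node)

# Zero out each feature in turn and record how far the score moves:
importance = {}
for name in node:
    perturbed = dict(node, **{name: 0.0})
    importance[name] = abs(base - anomaly_score(perturbed))

# Rank features by influence, most influential first:
for name, delta in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {delta:.3e}")
```

GraphLIME additionally fits a nonlinear, sparsity-regularized surrogate over the node's neighborhood, but the intuition is the same: features whose perturbation moves the output most get the highest importance.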

You are a financial crime analyst specializing in cryptocurrency fraud. A graph-based anomaly detection model has flagged the following wallet as suspicious. Your task is to analyze both:
1. The top features that *influenced the model's decision* (from GraphLIME), and
2. The actual transaction statistics of the wallet.

**Note:** The feature importance scores do NOT reflect actual values - they only indicate how strongly each feature contributed to the anomaly detection.

[Few-shot prompting examples (removed due to slide size constraints)]

---

Now analyze this real case:

**Node ID**: {node_id}

**Features that most influenced the anomaly model (importance scores only):**
{formatted_weights}

**Actual Node Values:**
{formatted_data}

---

Your tasks:
1. Explain the suspicious behavior based on these two views.
2. If appropriate, classify it using known crypto fraud types: {fraud_types}
3. If the behavior appears normal, say so explicitly.
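Assembling the prompt from its placeholders is plain string formatting. The template below is abbreviated, and the fraud-type list is a hypothetical example, not the pipeline's actual list.

```python
# Abbreviated version of the slide's template; the {fraud_types} content
# here is a hypothetical example list.
PROMPT_TEMPLATE = (
    "You are a financial crime analyst specializing in cryptocurrency fraud.\n"
    "**Node ID**: {node_id}\n"
    "**Features that most influenced the anomaly model (importance scores only):**\n"
    "{formatted_weights}\n"
    "**Actual Node Values:**\n"
    "{formatted_data}\n"
    "If appropriate, classify it using known crypto fraud types: {fraud_types}\n"
)

prompt = PROMPT_TEMPLATE.format(
    node_id="1EQPoYt9DAnpTrAYjTBRCSD5bj5e1an4tF",
    formatted_weights="- degree: 9.941e-01\n- btc_received_median: 9.941e-01",
    formatted_data="- total_txs: 2.0\n- degree: 5",
    fraud_types="Ponzi scheme, rug pull, money laundering",
)
print(prompt)  # ready to send as the user message of an LLM API call
```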

LLM API INPUT STRUCTURE

Node ID: 1EQPoYt9DAnpTrAYjTBRCSD5bj5e1an4tF
Type: unknown
Class Label: N/A
Time Step: N/A
Lifetime (blocks): 0.0

Top contributing features from GraphLIME:
- degree: 9.941e-01
- btc_received_median: 9.941e-01
- btc_sent_total: 0.000e+00

Additional node statistics:
- total_txs: 2.0
- btc_received_total: 5159.84
- btc_sent_total: 5159.84
- num_txs_as_sender: 1.0
- num_txs_as_receiver: 0.0
- btc_transacted_total: 10319.7
- fees_total: 0.0013
- degree: 5
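One plausible way to produce the `formatted_weights` block from explainer output, using scientific notation matching the slide; the weight values are taken from the example node, and the helper name is an assumption.

```python
def format_weights(weights, top_k=3):
    """Render a feature-importance dict as '- name: 9.941e-01' lines,
    highest-importance first, keeping only the top_k features."""
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])[:top_k]
    return "\n".join(f"- {name}: {value:.3e}" for name, value in ranked)

# Importance scores for the example node (from GraphLIME on the slide):
weights = {"degree": 0.9941, "btc_received_median": 0.9941, "btc_sent_total": 0.0}
print(format_weights(weights))
# - degree: 9.941e-01
# - btc_received_median: 9.941e-01
# - btc_sent_total: 0.000e+00
```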

LLM API INPUT VARIABLES

DASHBOARD EXAMPLE

DASHBOARD EXAMPLE

RESULTS: IMPACT OF PROMPT MODIFICATIONS

RESULTS: SAMPLE LLM OUTPUT

  1. Based on the feature importances and actual node values provided, the model flagged the wallet as suspicious primarily due to the high degree centrality and the high median amount of Bitcoin received. Additionally, the wallet has sent out the same amount of Bitcoin it received, which could indicate a quick turnover of funds. The low number of total transactions (only 2) and the fact that the wallet has only been a sender in one transaction also raise suspicions. This behavior could be indicative of a money laundering scheme where funds are quickly moved through the wallet to obfuscate their origin.
  2. This behavior aligns more closely with money laundering than any of the specific crypto fraud types listed. Money laundering involves the process of making illegally obtained funds appear legitimate by moving them through a complex sequence of banking transfers or commercial transactions. In this case, the high volume of funds being quickly moved through the wallet could be an attempt to disguise the origin of the funds.
  3. Given the information provided, the behavior does appear abnormal and warrants further investigation to determine the true nature of the transactions and the wallet's involvement in potentially illicit activities.

RESULTS: SANKEY DIAGRAM

LIMITATIONS

01 VALIDATION
Similar to other fraud detection and cyber attack research, datasets that include a ground truth are difficult to construct and thus challenging to find. In the application presented here, a truly viable ground truth would require positive confirmation of fraudulent or non-fraudulent activity for every transaction.

02 COMPUTATIONAL COST
Since the XAI explainer and LLM must generate explanations at the individual node level, the pipeline has a higher computational cost, particularly at larger scales. This also limits usability, as there is no way to batch the XAI and LLM processes in the current pipeline.

03 FRAUD TYPE DEFINITIONS
The LLM prompt simply provides a list of cryptocurrency fraud types rather than explicitly defining each type. The pipeline assumes the LLM can distinguish between fraud types, identify their key behaviors, and connect them to the provided model outputs.

FUTURE WORK

  • Generating LLM insights for non-anomalous nodes could provide further clarity into regular versus irregular behavior.
  • Adding data from other blockchain transactions (the dataset used only included Bitcoin transactions) would add a layer of complexity and improve the range of applications.
  • Connecting the insights to a RAG-like system that more carefully defines fraud types would improve the LLM insights.
  • Conducting qualitative research into regulatory buy-in for ML-enhanced pipelines would give the work relevance and direction for application. Real-world testing would also help evaluate the entire proposed pipeline.
  • The Sankey diagram produced an interesting depiction of the relationship between dataset features and distinct fraud types. This observation could be expanded into static guidelines or decision trees defining the relationships between fraudulent node features and fraud types.

THANK YOU!