Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection
Adriana Watson
Motivation
BitConnect
$2.4 Billion
Carried out a $2.4 billion Ponzi scheme. Was ultimately charged with wire fraud, operating an unlicensed money transmitting business, and conspiracy. Despite the $2.4 billion lost by users, only $17 million was paid out to the victims of the company.
Forsage
$340 Million
Defrauded investors out of $340 million. Was only charged with two counts of conspiracy to commit wire fraud. At the time of writing, the victims of the scheme have not been compensated.
OneCoin
$4 Billion
Marketed as a revolutionary cryptocurrency, but exposed as a $4 billion fraud. Ultimately turned out to be an MLM scheme with no real cryptocurrency behind it. Investors worldwide lost nearly everything.
CATCHING FRAUD CAN BE CHALLENGING
EVEN WHEN WE USE ML
- Different types of fraud demonstrate different behaviors (e.g. a rug pull doesn't look the same as a Ponzi scheme from a data perspective).
- Across domains, instances of fraud are anomalies.
- In a standard cryptocurrency transaction dataset, most transactions are not fraudulent.
- The methods used to catch fraudulent behavior need to be transparent to be effective.
- Because we don't live in a post-Minority Report world.
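The class-imbalance point above can be made concrete with a short worked calculation. This is a minimal sketch using Bayes' rule. The base rate, recall, and false positive rate are hypothetical values chosen only for illustration, not figures from the dataset or the model.

```python
def precision_at_base_rate(base_rate: float, recall: float, false_positive_rate: float) -> float:
    """Fraction of flagged transactions that are actually fraudulent (Bayes' rule)."""
    true_positives = base_rate * recall
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers (assumptions, not measured results):
# 2% of transactions are fraudulent, the detector catches 90% of them,
# and it wrongly flags 5% of legitimate transactions.
p = precision_at_base_rate(base_rate=0.02, recall=0.90, false_positive_rate=0.05)
print(f"precision = {p:.3f}")
```

Under these assumed rates, only about a quarter of the flagged transactions are truly fraudulent, which is why each flag needs an explanation that an analyst can check.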
QUALIFICATIONS FOR A VALID SOLUTION
Graph Centric
The fraud detection model must be trained on graph-structured data so that transaction and wallet metadata are maintained.
Explainable
If the solution uses a black-box model, it must have an explainable component. This serves as evidence for how the model made a particular decision, which is essential for any application with real-world implications (such as possible prosecution).
Intelligible
The output of the solution must be coherent for a nontechnical audience.
EXISTING RESEARCH GAP
INDUSTRIAL SOLUTIONS
- Heavy reliance on black-box AI models → regulators and users cannot understand decisions.
- High false positive rates due to class imbalance in anomaly detection.
- Integration challenges with legacy financial systems.
- Regulatory uncertainty limits adoption despite strong technical tools (e.g., Chainalysis, JPMorgan).
ACADEMIC SOLUTIONS
- Many works adapt methods from credit card fraud detection, which are not well suited to blockchain's graph-structured data.
- Some use supervised/semi-supervised models, but lack scalability and real-world ground truth.
- Graph-based methods exist, but explanations are limited or missing.
- Limited attempts at combining GNN + XAI + LLM in a modular, regulator-ready way.
- Early works that use explainability often stop at feature attribution, without translating results into human-readable narratives.
CONTRIBUTIONS
01 MODULAR PIPELINE
Transparent, modular fraud detection pipeline.
02 PROMPTING STRATEGY
Best-practices prompting strategy that includes feature importance, node values, and few-shot prompting.
03 EVALUATION PROTOCOL
Anomaly detection + explanation faithfulness + analyst utility
04 INTELLIGIBLE OUTPUT
Lightweight dashboard and human-comprehensible explanations.
SYSTEM OVERVIEW
ANOMALY DETECTION
01 DATASET
- Elliptic++
- Bitcoin
- 200k+ transactions
- 800k+ wallets
02 GRAPH BUILDER
- Linked nodes and edges to create one large graph dataset
- Combines ~9 CSV files to create a PyG Graph (networkx) and features tensor
03 AD MODEL
- GNN (Anomaly GCN)
- Unsupervised Learning
- GNNs capture local subgraphs
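The graph-building step can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's actual code. The two inline CSV strings, their column names, and the `build_graph` helper are all hypothetical stand-ins for the real Elliptic++ files. The output uses the 2 x num_edges edge layout that PyG expects for `edge_index`.

```python
import csv
import io

# Hypothetical stand-ins for two of the CSV files; column names are assumptions.
WALLET_FEATURES_CSV = """address,total_txs,btc_received_total
w1,2,5.0
w2,7,1.5
w3,1,0.2
"""
EDGES_CSV = """src,dst
w1,w2
w2,w3
w1,w3
"""


def build_graph(features_csv: str, edges_csv: str):
    """Return (node_index, feature_rows, edge_index) from two CSV strings."""
    node_index = {}    # wallet address -> integer node id
    feature_rows = []  # one list of floats per node, ordered by node id
    for row in csv.DictReader(io.StringIO(features_csv)):
        node_index[row["address"]] = len(node_index)
        feature_rows.append([float(row["total_txs"]), float(row["btc_received_total"])])

    sources, targets = [], []
    for row in csv.DictReader(io.StringIO(edges_csv)):
        # Skip edges that reference wallets with no feature row.
        if row["src"] in node_index and row["dst"] in node_index:
            sources.append(node_index[row["src"]])
            targets.append(node_index[row["dst"]])
    edge_index = [sources, targets]  # 2 x num_edges (COO layout)
    return node_index, feature_rows, edge_index


node_index, x, edge_index = build_graph(WALLET_FEATURES_CSV, EDGES_CSV)
print(node_index)
print(edge_index)
```

In a PyG setting, `x` and `edge_index` would typically be converted to tensors and wrapped in a `torch_geometric.data.Data` object. That step is omitted here to keep the sketch dependency-free.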
XAI EXPLAINER
- XAI (Explainable AI): a relatively new family of ML methods used to reveal how black-box models make decisions.
- XAI tools generally perturb the inputs and observe how the model outputs change in response.
- This work uses GraphLIME, a derivative of the popular XAI package LIME (Local Interpretable Model-Agnostic Explanations).
- The trained GNN model is passed to GraphLIME, which builds an explainer for the GNN. The explainer is then used to "explain" individual nodes.
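The general idea of probing a model by changing its inputs can be sketched as follows. This is a simplified, one-feature-at-a-time perturbation analysis, not GraphLIME's own procedure, which is more involved. The scoring function, its weights, and the node values are hypothetical stand-ins for a trained GNN and a real wallet.

```python
import random


def toy_anomaly_score(features):
    """Stand-in for a trained model: a fixed weighted sum (an assumption)."""
    weights = {"degree": 0.6, "btc_received_median": 0.4, "fees_total": 0.0}
    return sum(weights[name] * value for name, value in features.items())


def perturbation_importance(score_fn, features, n_samples=200, noise=0.1, seed=0):
    """Mean absolute change in the score when one feature at a time is jittered."""
    rng = random.Random(seed)
    baseline = score_fn(features)
    importance = {}
    for name in features:
        total_change = 0.0
        for _ in range(n_samples):
            perturbed = dict(features)
            # Scale the feature by a random factor in [1 - noise, 1 + noise].
            perturbed[name] = features[name] * (1.0 + rng.uniform(-noise, noise))
            total_change += abs(score_fn(perturbed) - baseline)
        importance[name] = total_change / n_samples
    return importance


# Hypothetical node values, used only to exercise the function.
node = {"degree": 5.0, "btc_received_median": 2.0, "fees_total": 0.0013}
scores = perturbation_importance(toy_anomaly_score, node)
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3e}")
```

As with the GraphLIME scores, these numbers rank how much each feature moves the model output. They say nothing about the feature's actual value.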
LLM API INPUT STRUCTURE
You are a financial crime analyst specializing in cryptocurrency fraud. A graph-based anomaly detection model has flagged the following wallet as suspicious. Your task is to analyze both:
1. The top features that *influenced the model's decision* (from GraphLIME), and
2. The actual transaction statistics of the wallet.
**Note:** The feature importance scores do NOT reflect actual values - they only indicate how strongly each feature contributed to the anomaly detection.
[Few Shot Prompting Examples (removed due to slide size constraints)]
---
Now analyze this real case:
**Node ID**: {node_id}
**Features that most influenced the anomaly model (importance scores only):**
{formatted_weights}
**Actual Node Values:**
{formatted_data}
---
Your tasks:
1. Explain the suspicious behavior based on these two views.
2. If appropriate, classify it using known crypto fraud types: {fraud_types}
3. If the behavior appears normal, say so explicitly.
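Filling the prompt's placeholders can be sketched with Python's `str.format`. The template below is an abbreviated copy of the prompt, and `build_prompt` is a hypothetical helper. The fraud-type list is also an assumption, since the slides do not enumerate the types passed to the LLM.

```python
# Abbreviated version of the prompt; placeholder names match the slide.
PROMPT_TEMPLATE = (
    "Now analyze this real case:\n"
    "**Node ID**: {node_id}\n"
    "**Features that most influenced the anomaly model (importance scores only):**\n"
    "{formatted_weights}\n"
    "**Actual Node Values:**\n"
    "{formatted_data}\n"
    "If appropriate, classify it using known crypto fraud types: {fraud_types}\n"
)


def build_prompt(node_id, formatted_weights, formatted_data, fraud_types):
    """Substitute per-node values into the template and return the final prompt text."""
    return PROMPT_TEMPLATE.format(
        node_id=node_id,
        formatted_weights=formatted_weights,
        formatted_data=formatted_data,
        fraud_types=", ".join(fraud_types),
    )


# The fraud-type list is a hypothetical example.
prompt = build_prompt(
    node_id="1EQPoYt9DAnpTrAYjTBRCSD5bj5e1an4tF",
    formatted_weights="- degree: 9.941e-01",
    formatted_data="- total_txs: 2.0",
    fraud_types=["Ponzi scheme", "rug pull"],
)
print(prompt)
```

Keeping the importance scores and the raw node values in separate placeholders mirrors the prompt's note that the scores do not reflect actual values.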
LLM API INPUT VARIABLES
Node ID: 1EQPoYt9DAnpTrAYjTBRCSD5bj5e1an4tF
Type: unknown
Class Label: N/A
Time Step: N/A
Lifetime (blocks): 0.0
Top contributing features from GraphLIME:
- degree: 9.941e-01
- btc_received_median: 9.941e-01
- btc_sent_total: 0.000e+00
Additional node statistics:
- total_txs: 2.0
- btc_received_total: 5159.84
- btc_sent_total: 5159.84
- num_txs_as_sender: 1.0
- num_txs_as_receiver: 0.0
- btc_transacted_total: 10319.7
- fees_total: 0.0013
- degree: 5
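One way to produce the two bulleted blocks shown in the input variables is sketched below. The helper names `format_weights` and `format_stats` are hypothetical. The weight values are the rounded figures displayed on the slide, used here as stand-ins for the explainer's raw output.

```python
def format_weights(weights, top_k=3):
    """Render the top_k features by importance, highest first, in scientific notation."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return "\n".join(f"- {name}: {value:.3e}" for name, value in ranked)


def format_stats(stats):
    """Render raw node statistics as one bullet per feature, in insertion order."""
    return "\n".join(f"- {name}: {value}" for name, value in stats.items())


# Rounded values as displayed on the slide (assumed stand-ins for raw outputs).
weights = {"degree": 0.9941, "btc_received_median": 0.9941, "btc_sent_total": 0.0}
stats = {"total_txs": 2.0, "btc_received_total": 5159.84, "degree": 5}

formatted_weights = format_weights(weights)
formatted_data = format_stats(stats)
print(formatted_weights)
print(formatted_data)
```

Scientific notation is used only for the importance scores, so they are visually distinct from the raw statistics.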
DASHBOARD EXAMPLE
RESULTS: IMPACT OF PROMPT MODIFICATIONS
RESULTS: SAMPLE LLM OUTPUT
- Based on the feature importances and actual node values provided, the model flagged the wallet as suspicious primarily due to the high degree centrality and the high median amount of Bitcoin received. Additionally, the wallet has sent out the same amount of Bitcoin it received, which could indicate a quick turnover of funds. The low number of total transactions (only 2) and the fact that the wallet has only been a sender in one transaction also raise suspicions. This behavior could be indicative of a money laundering scheme where funds are quickly moved through the wallet to obfuscate their origin.
- This behavior aligns more closely with money laundering than any of the specific crypto fraud types listed. Money laundering involves the process of making illegally obtained funds appear legitimate by moving them through a complex sequence of banking transfers or commercial transactions. In this case, the high volume of funds being quickly moved through the wallet could be an attempt to disguise the origin of the funds.
- Given the information provided, the behavior does appear abnormal and warrants further investigation to determine the true nature of the transactions and the wallet's involvement in potentially illicit activities.
RESULTS: SANKEY DIAGRAM
LIMITATIONS
01 VALIDATION
As in other fraud detection and cyberattack research, datasets that include a ground truth are difficult to construct and therefore hard to find. For the application presented here, a truly viable ground truth would require positive confirmation of fraudulent or non-fraudulent activity for every transaction.
02 COMPUTATIONAL COST
Since the XAI explainer and LLM must generate explanations at the individual node level, the pipeline has a higher computational cost, particularly at larger scales. This also limits the usability of the pipeline, as the current design provides no way to batch the XAI and LLM processes.
03 FRAUD TYPE DEFINITIONS
The LLM prompt simply provides a list of cryptocurrency fraud types to the LLM rather than explicitly defining each type. The pipeline assumes that the LLM can distinguish between the fraud types, identify the key behaviors of each, and connect them to the provided model outputs.
FUTURE WORK
- Generating LLM insights for non-anomalous nodes could provide further clarity into regular versus irregular behavior.
- Adding data from other blockchain transactions (the dataset used only included Bitcoin transactions) would add a layer of complexity and improve the range of applications.
- Connecting the insights with a more RAG-like system to more carefully define fraud types would improve the LLM insights.
- Conducting qualitative research into legislative buy-in for ML-enhanced pipelines would establish the relevance of the work and give direction for applying it. More generally, real-world testing would help to evaluate the entire proposed pipeline.
- The Sankey diagram produced an interesting depiction of the relationship between database features and distinct fraud types. This observation could be expanded to generate more static guidelines or decision trees to define the relationships between fraudulent node features and fraud types.
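The feature-to-fraud-type relationship that a Sankey diagram depicts can be tabulated as weighted links, which is also a natural starting point for the static guidelines suggested above. This is a minimal sketch. The records are hypothetical examples rather than results from the pipeline, and `sankey_links` is an illustrative helper.

```python
from collections import Counter

# Hypothetical (top feature, LLM-assigned fraud type) pairs; not real results.
records = [
    ("degree", "money laundering"),
    ("degree", "money laundering"),
    ("btc_received_median", "Ponzi scheme"),
    ("degree", "Ponzi scheme"),
]


def sankey_links(pairs):
    """Count how often each feature co-occurs with each fraud type."""
    counts = Counter(pairs)
    return [
        {"source": source, "target": target, "value": n}
        for (source, target), n in counts.most_common()
    ]


links = sankey_links(records)
for link in links:
    print(link)
```

Each link's `value` is the width of the corresponding band in the diagram. Links with consistently large values are candidates for explicit rules.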
THANK YOU!
Created on September 27, 2025