Unveiling the Power of Multimodal RAG
Evangelism Workspace
Created on February 7, 2024
Transcript
Ryan Siegler - Data Scientist
Unveiling the Power of Multimodal RAG
- KDB.AI Vector Database
- Introduction to RAG
- What is Multimodal Data?
- Multimodal RAG
- Retrieval
- Generation
Agenda
KDB.AI
The vector database that enables the most relevant temporal and semantic search to power language models and anomaly detection at scale.
Start Your 90 Day Evaluation
Learn More
Start for Free
Integrated with Azure ML and OpenAI for developers who require turnkey technology stacks to speed up the process of building and deploying AI applications.
Evaluate large scale generative AI applications on-premises or on your own cloud provider.
- Single container deployment
- Scale to your requirements
- Customize to your dev environment
KDB.AI on Azure ML
KDB.AI Server
Experiment with smaller generative AI projects with a vector database in our cloud.
- 4 GB memory per instance
- 30 GB data storage
- Get started quickly with sample projects
KDB.AI Cloud
Get Started with Flexible Options
Architecture Walkthrough
- Ingest source data
- Embed the source data with an embedding model
- Store the vector embeddings in the KDB.AI vector database
- Query the database to find the most relevant vectors
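The four steps above can be sketched end to end. This is a minimal sketch only: the KDB.AI client API is not shown, and both the embedding model (a character-hash stub) and the "database" (an in-memory NumPy index) are stand-ins, not the real stack.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stub embedding model: hashes characters into a small unit vector.
    A real pipeline would call a sentence-transformer or an API model."""
    vec = np.zeros(8)
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Ingest source data
documents = ["KDB.AI is a vector database",
             "RAG retrieves relevant context for an LLM",
             "Multimodal data includes text, images, and audio"]

# 2. Embed the source data with the embedding model
embeddings = np.stack([embed(d) for d in documents])

# 3. Store the embeddings (an in-memory dict standing in for KDB.AI)
index = {"vectors": embeddings, "payloads": documents}

# 4. Query: embed the question, rank stored vectors by cosine similarity
def search(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = index["vectors"] @ q  # dot product; vectors are unit-normalized
    top = np.argsort(scores)[::-1][:k]
    return [index["payloads"][i] for i in top]

print(search("what is a vector database?"))
```

In production, step 3 and step 4 would go through the vector database's client rather than a local array, but the data flow is the same.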
Understanding Vector Embeddings
Architecture Walkthrough
Retrieval Augmented Generation (RAG) enables LLMs to work with your own data. It has two key steps:
- Retrieval
- Generation
Retrieval Augmented Generation (RAG)
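The two steps can be sketched as a minimal loop. Both components here are stubs and assumptions for illustration: `retrieve` does a word-overlap lookup where a real system would run a vector-database query, and `generate` formats a string where a real system would prompt an LLM.

```python
# Minimal two-step RAG loop with stubbed components.
KNOWLEDGE = {
    "pricing": "KDB.AI Cloud offers 4 GB memory and 30 GB storage per instance.",
    "rag": "RAG retrieves relevant context, then an LLM generates an answer from it.",
}

def retrieve(query: str) -> str:
    """Step 1 - Retrieval: pick the stored snippet sharing the most words
    with the query (a stand-in for a vector similarity search)."""
    q_words = set(query.lower().split())
    return max(KNOWLEDGE.values(),
               key=lambda doc: len(q_words & set(doc.lower().split())))

def generate(query: str, context: str) -> str:
    """Step 2 - Generation: a real system would prompt an LLM with
    the query plus the retrieved context."""
    return f"Based on: '{context}' -> answer to '{query}'"

question = "how much storage does KDB.AI Cloud offer?"
context = retrieve(question)
print(generate(question, context))
```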
Multimodal Data
Data Representation
The goal is to represent different data modalities in a unified manner within a vector database:
- Embed all data types in a shared vector space
- Use a single multimodal embedding model
Multimodal Embedding Model
Unified Text Embedding
Transform all data into text format. A text embedding model is then used to embed text chunks, image summaries, and audio transcriptions.
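The unified-text idea can be sketched as follows. The summarizer, transcriber, and embedder below are toy stubs standing in for real models (e.g. an image-captioning model, a speech-to-text model, and a sentence embedder); none of these names come from the deck.

```python
# Unified Text approach: convert every modality to text, then use ONE
# text embedding model for all of it. All functions are illustrative stubs.

def summarize_image(image_path: str) -> str:
    return f"Image summary of {image_path}"       # stand-in for a captioning model

def transcribe_audio(audio_path: str) -> str:
    return f"Transcription of {audio_path}"       # stand-in for speech-to-text

def embed_text(text: str) -> list[float]:
    return [float(len(text)), float(text.count(" "))]  # toy 2-d "embedding"

corpus = [
    "Plain text chunk about vector databases",    # text: embed directly
    summarize_image("chart.png"),                 # image -> text summary
    transcribe_audio("talk.wav"),                 # audio -> transcription
]

# One text model embeds everything, so all modalities share one space
vectors = [embed_text(item) for item in corpus]
```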
Architecture Walkthrough
Unified Text
- Summarize images & tables
- Use a text embedding model
- Store the text embeddings in the vector database
Multimodal Embeddings
- The embedding model embeds multiple modalities together
- Store the embeddings in the vector database
- Limitation: few such models are available
- Either method can be used to retrieve embeddings
- Pass retrieved data to LLM to perform RAG
Multimodal Retrieval Methods
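The second method, a single multimodal embedding model, can be sketched with a toy model that maps text and images into the same space. The model below is a stub (CLIP-style models are the real-world analogue); the function, the 2-d space, and the item schema are all assumptions for illustration.

```python
# Multimodal Embedding approach: ONE model places text AND images in the
# same vector space, so a text query can retrieve images directly.
import math

def multimodal_embed(item: dict) -> list[float]:
    """Toy shared-space embedder: maps any modality to a point on the
    unit circle based on its content (images via their pixel-derived
    description here, since this stub cannot see pixels)."""
    content = item["caption"] if item["type"] == "image" else item["content"]
    angle = math.radians(sum(ord(c) for c in content) % 360)
    return [math.cos(angle), math.sin(angle)]

store = [
    {"type": "text",  "content": "quarterly revenue grew 12%"},
    {"type": "image", "caption": "quarterly revenue grew 12%"},  # same meaning
]
vectors = [multimodal_embed(item) for item in store]

# Same meaning -> same location in the shared space (in this toy model),
# which is what makes cross-modal retrieval possible.
assert vectors[0] == vectors[1]
```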
- The user query and retrieved data are passed to an LLM for generation
- Generation: what type of LLM should I use?
- Multimodal LLM: can take multiple data types as input
- Text-based LLM: pass only text-based data, such as text chunks, summaries, and transcriptions
RAG - Generation
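The choice between the two LLM types can be sketched as a simple routing rule: if retrieval returned any raw images, a multimodal LLM is needed; text-only context can go to a text-based LLM. The model names and item schema below are illustrative assumptions, not recommendations from the deck.

```python
# Route retrieved context to the right kind of generation model.

def pick_llm(retrieved: list[dict]) -> str:
    """Return which LLM type can handle the retrieved context."""
    if any(item["modality"] == "image" for item in retrieved):
        return "multimodal-llm"   # accepts images + text as input
    return "text-llm"             # text chunks, summaries, transcriptions only

print(pick_llm([{"modality": "text", "content": "a retrieved chunk"}]))
print(pick_llm([{"modality": "image", "content": "raw image bytes"},
                {"modality": "text", "content": "a retrieved chunk"}]))
```

With the unified-text method, everything retrieved is already text, so a text-based LLM always suffices; raw-image retrieval is what forces the multimodal-LLM branch.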
THANK YOU
rsiegler@kx.com