CHoRUS Flow Chart

Jared H

Created on March 19, 2024

Transcript

How to Find an SOP on This Page

[Interactive flow chart: five data sources (Structured EHR, Flowsheet, Free Text, Imaging, Waveform) flow through the steps below and on to the analytics enclave. Side tabs: Data Acq., Standards, Tooling, Progress, Office Hours, Discussions.]

If you are working with data infrastructure in the cloud

If you are a data generating site

C3

Approve & Merge

Once data fulfills CHoRUS requirements, merge with other approved extracts

C1

Ingest Data

Process and load data to facilitate standard assessment

C2

Assess Data & Provide Feedback

Evaluate quality and completeness of the extract

D6

Improve Data

Iteratively update D1-D5 to improve quality and completeness

D1

Get Data

Capture and characterize data across various modalities

D2

Standardize Data

Harmonize data to standard formats to enable downstream integration

D3

Link Data

Connect patient and encounter information between modalities

D4

Deidentify & QC Data

Remove sensitive information per site-specific DUAs and perform QC

D5

Create & Submit Extract

Consolidate and submit data to the central cloud environment

  • CHoRUS Reports
  • ICU Module: DQD
  • TO BE ADDED
Codebase
OFFICE HOURS
  • TO BE ADDED
S.O.P.
Resources

Assess Data & Provide Feedback

Motivation

This task refers to two steps. The first is the standard process of evaluating data extracts for their quality and fitness for use in the broader data enclave. The second is providing the data generating sites with feedback about the extracts that they delivered.

    • Central QC Reports
    • Local QC
  • OHDSI Ares
  • OHDSI AresIndexer
  • Contributing to Ares
  • Ares Overview
  • DQD Overview
  • Quality Control
  • OHDSI DQD
  • DQD Output
Codebase
  • OHDSI Achilles
OFFICE HOURS
  • Achilles Output
S.O.P.
  • Deidentification
Resources

Deidentify and perform quality control

Motivation

This task refers to the process of verifying the identifiability, plausibility, completeness, and conformance of the dataset. Here, we use established open-source tools (e.g., Achilles, DQD, Ares) to execute a series of validated checks and then produce an extract (i.e., an AresIndex) that can be visualized and compared with other OMOP instances with regard to its richness, quality, and diversity. It is this extract that data contributing sites are required to submit to the central MGH cloud instance for evaluation and feedback.
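The OHDSI tools named above are R-based; purely as an illustration of the kinds of checks they run (completeness, plausibility, temporal conformance), here is a minimal Python sketch over toy OMOP-style rows. All field names and thresholds are illustrative and do not reproduce the DQD's actual check logic.

```python
from datetime import date

# Toy OMOP-style measurement rows (field names illustrative).
rows = [
    {"person_id": 1, "value_as_number": 7.2, "measurement_date": date(2020, 1, 3)},
    {"person_id": 2, "value_as_number": None, "measurement_date": date(2020, 2, 9)},
    {"person_id": 3, "value_as_number": -5.0, "measurement_date": date(2031, 1, 1)},
]

def run_checks(rows, today=date(2024, 3, 19)):
    """Count rows failing each check, in the spirit of DQD-style categories."""
    failures = {"completeness": 0, "plausibility": 0, "temporal_conformance": 0}
    for r in rows:
        if r["value_as_number"] is None:       # completeness: value must be present
            failures["completeness"] += 1
        elif r["value_as_number"] < 0:         # plausibility: value must be non-negative
            failures["plausibility"] += 1
        if r["measurement_date"] > today:      # conformance: no dates in the future
            failures["temporal_conformance"] += 1
    return failures

print(run_checks(rows))  # {'completeness': 1, 'plausibility': 1, 'temporal_conformance': 1}
```

In the real pipeline, the per-check failure counts feed the feedback reports returned to each site.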

    • Local QC
    • Central QC Reports
  • Data Improvement
  • Data Extraction
  • Quality Control
  • Data Deidentification
  • Data Linkage
  • Data Standardization
  • To be added
  • Data Collection
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Imaging Data

DESCRIPTION

[TYPES OF IMAGING DATA]

    • File Format
  • Data Improvement
    • Local QC
    • Central QC Reports
  • Data Extraction
  • Quality Control
  • Data Deidentification
  • Data Linkage
  • Data Standardization
  • To be added
  • Data Collection
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Waveform Data

DESCRIPTION

[TYPES OF WAVEFORM DATA]

  • Data Linkage - Waveform
  • Data Linkage - Imaging
  • Data Linkage - Free text
  • Data Linkage - Flowsheet
  • Waveform Parsing
  • Private Tags
Codebase
  • Image Parsing
OFFICE HOURS
  • Imaging Modalities
S.O.P.
  • Data Linkage - EHR
Resources

Linking Data Modalities

Motivation

This task refers to connecting data from diverse modalities in a way that enables the selection and characterization of patient cohorts using any data modality of choice. For example, a cohort could be selected based on (1) diagnoses registered in a patient's EHR, (2) measurement values recorded in flowsheet data, (3) complications outlined in a discharge report, (4) artifacts identified in a CT image, or (5) artifacts extracted from a waveform signal. Once created, this dataset would contain all data modalities available for the associated cohort, any of which could be used in downstream analyses.
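As a sketch of the linkage idea (not the project's actual linkage pipeline), the following assumes every modality can be keyed by a shared (person, visit) pair; real linkage would rely on site-specific identifiers and crosswalk tables. All record contents are made up.

```python
# Hypothetical per-modality records keyed by (person_id, visit_id); real
# linkage would use site-specific identifiers and crosswalk tables.
ehr = {(1, 10): {"diagnosis": "sepsis"}, (2, 20): {"diagnosis": "ARDS"}}
waveforms = {(1, 10): {"file": "p1_v10_ecg.dat"}, (3, 30): {"file": "p3_v30_ecg.dat"}}
images = {(1, 10): {"file": "p1_v10_ct.dcm"}}

def link(*modalities):
    """Group records from each named modality under a shared (person, visit) key."""
    linked = {}
    for name, table in modalities:
        for key, record in table.items():
            linked.setdefault(key, {})[name] = record
    return linked

cohort = link(("ehr", ehr), ("waveform", waveforms), ("imaging", images))

# Select encounters that have both an EHR diagnosis and a waveform file:
both = [key for key, mods in cohort.items() if "ehr" in mods and "waveform" in mods]
print(both)  # [(1, 10)]
```

Once linked this way, a cohort selected from one modality (say, EHR diagnoses) immediately exposes the other modalities available for the same encounters.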

  • Data Collection - Waveform
  • Data Collection - Imaging
  • Data Collection - Free Text
  • Data Collection - Flowsheet
Codebase
  • White Rabbit
OFFICE HOURS
  • White Rabbit
S.O.P.
  • Data Collection - EHR
Resources

Collecting and characterizing data

Motivation

This task refers to using open-source tooling like WhiteRabbit, along with internal data analysis methods, to investigate and understand the data available to each data contributing site. For relational EHR data, this typically requires running a database scan or producing metadata about the contents of relevant tables. For non-relational data, characterizations will likely focus on identifying the quantity (storage space, number of files, etc.) and diversity (unique codes, ontology structures, etc.) of the data, plus an overview of the metadata that will require mapping in subsequent stages.
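A minimal sketch of such a characterization, assuming the non-relational data sit in a directory tree and the tabular data carry a source-code column (all names illustrative; WhiteRabbit itself produces a much richer scan report):

```python
import os

def characterize_directory(root):
    """Quantity summary for non-relational data: file count and total bytes."""
    n_files, n_bytes = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            n_files += 1
            n_bytes += os.path.getsize(os.path.join(dirpath, name))
    return {"files": n_files, "bytes": n_bytes}

def characterize_codes(records, code_field="source_code"):
    """Diversity summary for tabular data: unique source codes observed."""
    return {"unique_codes": len({r[code_field] for r in records})}

# Toy EHR-style rows (field name illustrative).
rows = [{"source_code": "A1"}, {"source_code": "A1"}, {"source_code": "B2"}]
print(characterize_codes(rows))  # {'unique_codes': 2}
```

The unique-code count is the kind of figure that signals how much vocabulary mapping effort the standardization stage will require.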

The first download will use:

  • Data Upload tool

The second and all future downloads will use:

  • AZ CLI (please contact Heidi Schmidt, hschmidt@mgb.org, for assistance)
  • Azure Data Share (please contact Alex Ruiz, ruiz.alex@microsoft.com, for assistance)
    • Data Upload
  • Data Extraction - Waveform
  • Data Extraction - Imaging
  • Data Extraction - Free text
  • Data Extraction - Flowsheet
  • MIMIC Waveform
Codebase
  • MIMIC Images
OFFICE HOURS
  • To be added
S.O.P.
  • Data Extraction - EHR
Resources

Create and Submit Data Extract to Central Cloud

Motivation

This task refers to two steps. The first is placing data in the organizational structure defined by the CHoRUS Data Acquisition team. Thus far, the convention is to create per-person directories, each with three sub-directories (OMOP, Image, Waveform); this structure is subject to change depending on the results of preliminary ingestion processes in the central cloud instance. The second is sharing the organized data extract with the central Azure instance hosted by MGH.
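A minimal sketch of the per-person directory convention described above. The sub-directory names (OMOP, Image, Waveform) come from the text; everything else, including the use of person IDs as top-level folder names, is an assumption for illustration.

```python
from pathlib import Path
import tempfile

# Sub-directory names taken from the convention above; the use of person IDs
# as top-level folder names is an assumption.
SUBDIRS = ("OMOP", "Image", "Waveform")

def build_extract_layout(root, person_ids):
    """Create one directory per person, each with the three modality sub-directories."""
    root = Path(root)
    for pid in person_ids:
        for sub in SUBDIRS:
            (root / str(pid) / sub).mkdir(parents=True, exist_ok=True)
    # Return the relative depth-2 paths so the layout can be inspected.
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/*"))

root = tempfile.mkdtemp()
layout = build_extract_layout(root, person_ids=[101, 102])
print(layout)
# ['101/Image', '101/OMOP', '101/Waveform', '102/Image', '102/OMOP', '102/Waveform']
```

A layout check like this could also run before upload, catching missing sub-directories prior to the transfer to the central Azure instance.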

    • Capture and Sharing of Unmapped Terms
    • Local QC
    • Central QC Reports
  • Data Improvement
  • Data Extraction
  • Quality Control
  • Data Deidentification
  • Data Linkage
  • Data Standardization
  • To be added
  • Data Collection
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Flowsheet Data

DESCRIPTION

[TYPES OF FLOWSHEET DATA]

    • Waveform File Format
  • Standardization - Flowsheet
  • Standardization - Waveform
  • Standardization - Imaging
    • Capture and Sharing of Unmapped Terms
  • Standardization - Free text
  • Athena Search
  • Delphi Disc.
  • Sharing Disc.
  • Workload Disc.
  • Delphi Mappings
OTHER
  • OMOP Vocab pt 2
  • OMOP Vocab pt 1
  • Usagi & STCM
  • Flowsheets pt 3
  • Flowsheets pt 2
  • Flowsheets pt 1
  • Vocab Gaps
  • Map Validation pt 2
  • Map Validation pt 1
  • Mapping 101
Codebase
  • OHDSI USAGI
OFFICE HOURS
  • Delphi MIMIC
S.O.P.
  • Standardization - EHR
Resources

Standardizing Data Elements

Motivation

This task refers to making connections between source representations of medical events or concepts (e.g., an EPIC procedural code referring to an appendectomy) and standard representations of those elements (e.g., the ICD-10-PCS procedure code for appendectomy). Through the Delphi process, we have generated a prioritized list of medical concepts that are relevant to the downstream analyses proposed in Bridge2AI.
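To illustrate how such mappings are typically applied downstream, here is a minimal Python sketch of a lookup against a table following the OMOP SOURCE_TO_CONCEPT_MAP column convention. The codes and concept IDs are made up; Usagi and STCM tooling handle the real mapping work.

```python
# Illustrative rows following the OMOP SOURCE_TO_CONCEPT_MAP column convention;
# the source codes and target concept IDs below are made up.
stcm = [
    {"source_code": "EPIC123", "source_vocabulary_id": "SITE_PROC", "target_concept_id": 4336464},
    {"source_code": "EPIC456", "source_vocabulary_id": "SITE_PROC", "target_concept_id": 0},
]

def map_code(source_code, vocabulary_id, stcm):
    """Return the standard target_concept_id for a source code, or 0 if unmapped."""
    for row in stcm:
        if (row["source_code"] == source_code
                and row["source_vocabulary_id"] == vocabulary_id):
            return row["target_concept_id"]
    return 0  # OMOP convention: concept_id 0 means "no matching concept"

print(map_code("EPIC123", "SITE_PROC", stcm))  # 4336464
print(map_code("UNKNOWN", "SITE_PROC", stcm))  # 0
```

Rows that resolve to concept_id 0 are exactly the unmapped terms that the "Capture and Sharing of Unmapped Terms" resources above address.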

How to Find the SOP You Need

If you are a data generating site, there are two ways to access SOPs using this graphic (the same information organized differently):

  1. Click on the plus sign (+) attached to any of the steps labeled D1 through D6. Each option has more details about the motivation and the resources available (SOPs, Office Hours sessions, and Codebase).
  2. Click on the plus sign (+) attached to any of the Data Sources (bottom left corner of the graphic). Each option shares the same information described in option 1.

SOPs that have been completed appear as links and, when selected, will take you to the final (or provisional) approved version on GitHub.

  • Data Improvement - Waveform
  • Data Improvement - Imaging
  • Data Improvement - Free text
  • Data Improvement - Flowsheet
  • To be added
  • Data Improvement - EHR
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Review feedback and improve quality

Motivation

This task refers to evaluating the feedback provided by the central cloud team and revising any elements that need attention.

    • Local QC
    • Central QC Reports
  • Data Improvement
  • Data Extraction
  • Quality Control
  • Data Deidentification
  • Data Linkage
  • Data Standardization
  • To be added
  • Data Collection
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Free-Text Notes

DESCRIPTION

[TYPES OF FREE-TEXT DATA]

The first download will use:

  • Data Upload tool

The second and all future downloads will use:

  • AZ CLI Please contact Heidi Schmidt (hschmidt@mgb.org) for assistance.
  • Azure Data Share Please contact Alex Ruiz (ruiz.alex@microsoft.com) for assistance.
  • Ingestion ETL
  • Data Upload
Codebase
OFFICE HOURS
  • TO BE ADDED
S.O.P.
Resources

Ingest Data Extract at Central Cloud

Motivation

This task refers to ingesting CSV files into a staging database and executing processing steps such as date shifting and quality checks.
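As an illustration of the date-shifting step (a sketch only, not the actual ingestion ETL), the following shifts all dates for a person by one random per-person offset, so intervals within a person are preserved. Field names and the offset range are assumptions.

```python
import csv
import io
import random
from datetime import date, timedelta

def shift_dates(rows, person_field="person_id", date_fields=("measurement_date",),
                seed=0, max_shift_days=365):
    """Shift every date by one random per-person offset, preserving intervals."""
    rng = random.Random(seed)  # fixed seed so the shift is reproducible
    offsets = {}
    for row in rows:
        pid = row[person_field]
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        for field in date_fields:
            row[field] = date.fromisoformat(row[field]) + offsets[pid]
    return rows

csv_text = "person_id,measurement_date\n1,2020-01-01\n1,2020-01-08\n2,2020-03-01\n"
shifted = shift_dates(list(csv.DictReader(io.StringIO(csv_text))))

# Intervals within a person survive the shift:
print(shifted[1]["measurement_date"] - shifted[0]["measurement_date"])  # 7 days, 0:00:00
```

Using one offset per person (rather than per row) is what keeps the clinical timeline internally consistent after deidentification.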

    • Data Upload
    • Central QC Reports
    • Local QC
  • Quality Control
  • Data Improvement
  • Data Extraction
  • Data Deidentification
  • Data Linkage
  • Data Standardization
  • To be added
  • Data Collection
Codebase
OFFICE HOURS
  • To be added
S.O.P.
Resources

Structured EHR Data

DESCRIPTION

[TYPES OF STRUCTURED EHR DATA]

  • MERGE ETL
  • TO BE ADDED
Codebase
OFFICE HOURS
  • TO BE ADDED
S.O.P.
Resources

Approve Extract and Merge with Others

Motivation

This task refers to defining and evaluating the quality thresholds necessary for approval and, once an extract meets those thresholds, executing a merge process to link those data with other approved extracts while retaining relationality.
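To make "retaining relationality" concrete, here is a minimal sketch in which per-site ID offsets keep foreign keys (visit to person) valid after the merge. The offset scheme and table shapes are hypothetical, not the actual MERGE ETL.

```python
# Toy extracts from two sites; "retaining relationality" means the foreign
# key visit.person_id must still point at the right person after merging.
site_a = {"person": [{"person_id": 1}], "visit": [{"visit_id": 1, "person_id": 1}]}
site_b = {"person": [{"person_id": 1}], "visit": [{"visit_id": 1, "person_id": 1}]}

def merge_extracts(extracts, id_fields=("person_id", "visit_id")):
    """Merge approved extracts, offsetting IDs per site so keys stay consistent."""
    merged = {"person": [], "visit": []}
    for i, extract in enumerate(extracts):
        offset = (i + 1) * 1_000_000  # hypothetical non-overlapping ID block per site
        for table, rows in extract.items():
            for row in rows:
                merged[table].append(
                    {k: v + offset if k in id_fields else v for k, v in row.items()})
    return merged

merged = merge_extracts([site_a, site_b])
print(merged["visit"])
# [{'visit_id': 1000001, 'person_id': 1000001}, {'visit_id': 2000001, 'person_id': 2000001}]
```

Because both the key and every reference to it shift by the same per-site offset, records from different sites can never collide, and each visit still resolves to its original person.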