If you ARE WORKING WITH DATA INFRASTRUCTURE IN THE CLOUD
D1
GET DATA
Capture and characterize data across various modalities
C1
INGEST Data
D2
STANDARDIZE DATA
Process and load data to facilitate standard assessment
How To FIND An SOP on This page
Harmonize data to standard formats to enable downstream integration
Assess Data & Provide Feedback
C2
D3
LINK DATA
Evaluate quality and completeness of the extract
Connect patient and encounter information between modalities
data sources
C3
D4
Approve & MERGE
DEIDENTIFY & QC DATA
Once data fulfills CHoRUS requirements, merge with other approved extracts
Remove sensitive information per site-specific DUAs, perform QC
Structured EHR
D5
Flowsheet
Create & Submit Extract
Consolidate and submit data to the central cloud environment
TO ANALYTICS ENCLAVE
Free TeXT
D6
imaging
Improve data
Iteratively update D1-D5 to improve quality and completeness
progress
WAVEFORM
office hours
Discussions
Standards
Standards
Standards
If you ARE A DATA GENERATING SITE
Data acq.
Data acq.
Data acq.
Tooling
Tooling
Assess data & Provide Feedback
Motivation
This task refers to two steps. The first is a standard process of evaluating data extracts for their quality and fitness for use in the broader data enclave. The second is providing the data-generating sites with feedback about the extracts they delivered.
Resources
S.O.P.
OFFICE HOURS
Codebase
Deidentify and perform quality control
Motivation
This task refers to the process of verifying the identifiability, plausibility, completeness, and conformance of the dataset. Here, we will use established open-source tools (e.g. Achilles, DQD, Ares) to execute a series of validated checks and then produce an extract (i.e. the AresIndex) that can be visualized and compared with other OMOP instances with regard to its richness, quality, and diversity. It is this extract that data-contributing sites will be required to submit to the central MGH cloud instance for evaluation and feedback.
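The categories of checks named above can be illustrated with a minimal Python sketch. The rows, field names, and thresholds below are invented for illustration; in practice these checks are run with the OHDSI tools (Achilles, DQD, Ares), not hand-rolled code:

```python
from datetime import date

# Hypothetical mini-extract: rows shaped like an OMOP person table.
persons = [
    {"person_id": 1, "year_of_birth": 1950, "gender_concept_id": 8507},
    {"person_id": 2, "year_of_birth": None, "gender_concept_id": 8532},
    {"person_id": 3, "year_of_birth": 2090, "gender_concept_id": 0},
]

def check_completeness(rows, field):
    """Fraction of rows with a non-null value for `field`."""
    return sum(r[field] is not None for r in rows) / len(rows)

def check_plausibility(rows):
    """Flag persons whose birth year falls outside a plausible range."""
    this_year = date.today().year
    return [r["person_id"] for r in rows
            if r["year_of_birth"] is not None
            and not (1900 <= r["year_of_birth"] <= this_year)]

def check_conformance(rows, allowed={8507, 8532}):
    """Flag persons whose gender concept id is not a standard concept."""
    return [r["person_id"] for r in rows
            if r["gender_concept_id"] not in allowed]

print(check_completeness(persons, "year_of_birth"))  # 2 of 3 populated
print(check_plausibility(persons))  # person 3: birth year in the future
print(check_conformance(persons))   # person 3: non-standard concept id
```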
Resources
S.O.P.
OFFICE HOURS
Codebase
Imaging DATA
DESCRIPTION
[TYPES OF IMAGING DATA]
Resources
S.O.P.
OFFICE HOURS
Codebase
WAVEFORM DATA
DESCRIPTION
[TYPES OF WAVEFORM DATA]
Resources
Codebase
OFFICE HOURS
S.O.P.
Linking Data modalities
Motivation
This task refers to connecting data from diverse modalities in a way that enables the selection and characterization of patient cohorts using the data mode(s) of choice. For example, a cohort could be selected based upon (1) diagnoses registered in a patient's EHR, (2) measurement values recorded in flowsheet data, (3) complications outlined in a discharge report, (4) artifacts identified in a CT image, or (5) artifacts extracted from a waveform signal. Once created, this dataset would contain all data modes available for the associated cohort, any of which could be used in downstream analyses.
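The idea of selecting a cohort by one modality and then pulling every linked modality can be sketched as follows. The tables, ids, and field values are hypothetical, not the CHoRUS schema; the only assumption is a shared person identifier across modalities:

```python
# Hypothetical per-modality tables keyed on a shared person id.
ehr = {1: ["sepsis"], 2: ["ARDS"]}
imaging = {1: ["chest CT"], 3: ["head MRI"]}
waveform = {2: ["ECG lead II"]}

def link_modalities(person_ids, *tables):
    """Collect every modality's records for each requested person."""
    return {pid: [t.get(pid, []) for t in tables] for pid in person_ids}

# Select a cohort from one modality (EHR diagnoses), then retrieve all
# linked modalities for that cohort.
cohort = [pid for pid, dx in ehr.items() if "sepsis" in dx]
print(link_modalities(cohort, ehr, imaging, waveform))
# {1: [['sepsis'], ['chest CT'], []]}
```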
Resources
S.O.P.
Codebase
OFFICE HOURS
Collecting and characterizing data
Motivation
This task refers to using open-source tooling like WhiteRabbit and other internal data analysis methods to investigate and understand the data available to each data-contributing site. For relational EHR data, this typically requires running a database scan or producing metadata about the contents of relevant tables. For non-relational data, characterizations will likely focus on the quantity (storage space, number of files, etc.) and diversity (unique codes, ontology structures, etc.) of the data, along with an overview of the available metadata that will require mapping in subsequent stages.
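What a table scan reports can be sketched in a few lines of Python. This is only an illustration in the spirit of a WhiteRabbit scan (per-column row counts, null fractions, and most frequent values); the rows and column names below are invented:

```python
from collections import Counter

# Toy source rows standing in for a site table to be scanned.
rows = [
    {"dx_code": "J96.0", "unit": "mmHg"},
    {"dx_code": "J96.0", "unit": None},
    {"dx_code": "A41.9", "unit": "mmHg"},
]

def scan_column(rows, col, top_n=5):
    """Summarize one column: row count, null fraction, frequent values."""
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "n_rows": len(values),
        "null_fraction": 1 - len(non_null) / len(values),
        "top_values": Counter(non_null).most_common(top_n),
    }

print(scan_column(rows, "dx_code"))
print(scan_column(rows, "unit"))
```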
Resources
S.O.P.
OFFICE HOURS
Codebase
- Data Collection - Flowsheet
- Data Collection - Free Text
- Data Collection - Waveform
create and submit Data extract to Central Cloud
Motivation
This task refers to two steps. First, placing data in the organizational structure defined by the CHoRUS DataAcquisition team. Thus far, the convention is to create per-person directories, each with three sub-directories (OMOP, Image, Waveform). This structure is subject to change depending on results of preliminary ingestion processes in the central cloud instance. Second, the process of sharing the organized data extract with the central Azure instance hosted by MGH. The first download will use:
- Azure Data Share Please contact Alex Ruiz (ruiz.alex@microsoft.com) for assistance.
- AZ CLI Please contact Heidi Schmidt (hschmidt@mgb.org) for assistance.
The second and all future downloads will use:
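The per-person layout described above (one directory per person, each with OMOP, Image, and Waveform sub-directories) can be sketched and validated as follows. Since the convention is subject to change, treat the directory names here as illustrative:

```python
import tempfile
from pathlib import Path

# Sub-directories the current convention expects under each person.
SUBDIRS = ("OMOP", "Image", "Waveform")

def build_extract_tree(root, person_ids):
    """Create the per-person directory skeleton."""
    for pid in person_ids:
        for sub in SUBDIRS:
            (root / str(pid) / sub).mkdir(parents=True, exist_ok=True)

def validate_extract_tree(root):
    """Return person directories missing any required sub-directory."""
    return [d.name for d in sorted(root.iterdir())
            if d.is_dir()
            and any(not (d / sub).is_dir() for sub in SUBDIRS)]

root = Path(tempfile.mkdtemp())
build_extract_tree(root, [1001, 1002])
print(validate_extract_tree(root))  # [] -- every person has all three
```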
Resources
OFFICE HOURS
S.O.P.
Codebase
- Data Extraction - Flowsheet
- Data Extraction - Imaging
- Data Extraction - Free text
- Data Extraction - Waveform
FLOWSHEET Data
DESCRIPTION
[TYPES OF FLOWSHEET DATA]
Resources
S.O.P.
Codebase
OFFICE HOURS
- Capture and Sharing of Unmapped Terms
Standardizing Data Elements
Motivation
This task refers to making connections between source representations of medical events or concepts (e.g. an EPIC procedure code referring to an appendectomy) and standard representations of those elements (e.g. an ICD-10-PCS code for appendectomy). Through the Delphi process, we have generated a prioritized list of medical concepts that are relevant for the downstream analyses proposed in Bridge2AI.
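The source-to-standard translation can be sketched as a lookup against a curated mapping table. The codes and concept names below are placeholders, not output of the Delphi prioritization; unmapped terms are flagged so they can be captured and shared (see the resources below):

```python
# Hand-curated mapping from (source system, local code) to a standard
# representation. Entries are illustrative placeholders.
SOURCE_TO_STANDARD = {
    ("EPIC", "PROC123"): {"concept": "Appendectomy", "vocabulary": "ICD-10-PCS"},
    ("EPIC", "PROC456"): {"concept": "Cholecystectomy", "vocabulary": "ICD-10-PCS"},
}

def map_code(system, code):
    """Translate a site-local code to its standard representation."""
    hit = SOURCE_TO_STANDARD.get((system, code))
    if hit is None:
        # Unmapped terms are flagged for later capture and sharing.
        return {"concept": None, "vocabulary": None, "unmapped": True}
    return {**hit, "unmapped": False}

print(map_code("EPIC", "PROC123"))
print(map_code("EPIC", "PROC999"))  # unknown code comes back flagged
```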
Resources
OFFICE HOURS
S.O.P.
Codebase
OTHER
- Standardization - Flowsheet
- Capture and Sharing of Unmapped Terms
- Standardization - Free text
- Standardization - Imaging
- Standardization - Waveform
HOW TO FIND THE Sop you need
If you are a data generating site, there are two ways to access SOPs using this graphic (same information organized differently):
- Click on the plus sign (+) attached to any of the steps labeled D1 through D6. Each option has more details about the motivation and the resources available (SOPs, Office Hours session, and Codebase).
- Click on the plus sign (+) attached to any of the Data Sources (bottom left corner of the graphic). Each option shares the same information described above (option 1).
SOPs that have been completed appear as a link and will take you to the final (or provisional) approved version on GitHub when selected.
Review feedback and improve quality
Motivation
This task refers to evaluating the feedback provided by the central cloud team and revising any elements that need attention.
Resources
S.O.P.
OFFICE HOURS
Codebase
- Data Improvement - Flowsheet
- Data Improvement - Free text
- Data Improvement - Imaging
- Data Improvement - Waveform
FREE-TEXT NOTES
DESCRIPTION
[TYPES OF FREE-TEXT DATA]
Resources
OFFICE HOURS
Codebase
S.O.P.
Ingest data extract at central cloud
Motivation
This task refers to ingesting CSV files into a staging database and executing processing steps like date shifting and quality checks.
The first download will use:
- Azure Data Share Please contact Alex Ruiz (ruiz.alex@microsoft.com) for assistance.
- AZ CLI Please contact Heidi Schmidt (hschmidt@mgb.org) for assistance.
The second and all future downloads will use:
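The date-shifting step can be sketched as follows. The column names, offset scheme (a fixed per-person shift in days), and sample rows are assumptions for illustration, not the production pipeline:

```python
import csv
import io
from datetime import date, timedelta

# A tiny OMOP-style csv standing in for a submitted extract.
raw = io.StringIO(
    "person_id,visit_start_date\n"
    "1,2023-03-01\n"
    "1,2023-03-05\n"
)

def shift_dates(reader, offsets):
    """Shift each row's date by that person's fixed offset (in days)."""
    out = []
    for row in reader:
        d = date.fromisoformat(row["visit_start_date"])
        d += timedelta(days=offsets[row["person_id"]])
        out.append({**row, "visit_start_date": d.isoformat()})
    return out

# Every row for a person moves by the same offset, so intervals between
# that person's events are preserved.
staged = shift_dates(csv.DictReader(raw), offsets={"1": -30})
print(staged[0]["visit_start_date"])  # 2023-01-30
```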
Resources
S.O.P.
OFFICE HOURS
Codebase
Structured EHR Data
DESCRIPTION
[TYPES OF STRUCTURED EHR DATA]
Resources
S.O.P.
OFFICE HOURS
Codebase
Approve Extract and merge with others
Motivation
This task refers to defining and evaluating the quality thresholds necessary for approval and, once an extract meets those thresholds, executing a merge process that links the data with other approved extracts while retaining relationality.
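One common way to retain relationality during a merge is to re-base each site's person ids with a per-site offset, shifting foreign keys consistently so rows from different sites never collide. This is a sketch of that idea under assumed table shapes, not the actual CHoRUS merge process:

```python
def merge_extracts(extracts, block_size=1_000_000):
    """Merge site extracts, re-basing person ids per site so that
    primary and foreign keys stay consistent and never collide."""
    merged = {"person": [], "visit": []}
    for site_no, extract in enumerate(extracts):
        offset = site_no * block_size
        for p in extract["person"]:
            merged["person"].append({**p, "person_id": p["person_id"] + offset})
        for v in extract["visit"]:
            # Foreign key shifted by the same offset as the person row.
            merged["visit"].append({**v, "person_id": v["person_id"] + offset})
    return merged

# Two sites that both used local person_id 1.
site_a = {"person": [{"person_id": 1}], "visit": [{"person_id": 1, "visit": "ICU"}]}
site_b = {"person": [{"person_id": 1}], "visit": [{"person_id": 1, "visit": "ED"}]}
merged = merge_extracts([site_a, site_b])
print([p["person_id"] for p in merged["person"]])  # [1, 1000001]
```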
Resources
S.O.P.
OFFICE HOURS
Codebase
CHoRUS Flow Chart
Jared H
Created on March 19, 2024