For both the homework and the project, we will use the MIMIC-III Critical Care Database. This page describes the dataset and the procedure for obtaining it.
MIMIC-III is a large, openly available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.
Among the types of data included are:
MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors:
During this course, we will be working with the MIMIC database. MIMIC, although de-identified, still contains detailed information regarding the clinical care of patients, and must be treated with appropriate care and respect.
You must complete CITI training before you can get MIMIC access. Do NOT request access individually.
We will collect all student information and send a batch request to MIT, after which you will be notified and can then send your access request.
Throughout the training exercises on this site, we will use a small sample data set. If you followed the instructions on the environment setup page to set up your environment, you will find the sample data in the `/bigdata-bootcamp/data` folder in the virtual environment.
There are two data files: one for case patients and one, `control.csv`, for control patients. For the purpose of these exercises, we define patients who developed heart failure (HF) at some time point as case patients, and those who did not develop HF as control patients.
Each line of the sample data files is a tuple of the form (patient-id, event-id, timestamp, value). Here are a few lines as an example:

    020E860BD31CAC69,DRUG36987254604,968,30.0
    020E860BD31CAC69,DRUG64158080642,974,30.0
    020E860BD31CAC69,DRUG00440128228,976,60.0
    020E860BD31CAC69,DIAG486,907,1.0
    020E860BD31CAC69,DIAG7863,907,1.0
    020E860BD31CAC69,DIAGV5866,907,1.0
    020E860BD31CAC69,DIAG3659,907,1.0
    020E860BD31CAC69,DIAGRG199,907,1.0
    020E860BD31CAC69,PAYMENT,907,15000.0
    020E860BD31CAC69,heartfailure,956,1.0

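Here is a rough sketch of reading these rows in Python. It parses each comma-separated line into a named tuple. The sketch assumes the files have no header row and that timestamps are integers, as in the sample above. `Event`, `parse_events`, and `SAMPLE` are hypothetical names used for illustration and are not course-provided code.

```python
import csv
import io
from typing import List, NamedTuple


class Event(NamedTuple):
    """One row of the sample data: (patient-id, event-id, timestamp, value)."""
    patient_id: str
    event_id: str
    timestamp: int
    value: float


def parse_events(text: str) -> List[Event]:
    """Parse header-less, comma-separated rows into Event tuples (assumed format)."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        patient_id, event_id, timestamp, value = row
        events.append(Event(patient_id, event_id, int(timestamp), float(value)))
    return events


SAMPLE = """\
020E860BD31CAC69,DRUG36987254604,968,30.0
020E860BD31CAC69,DIAG486,907,1.0
020E860BD31CAC69,PAYMENT,907,15000.0
020E860BD31CAC69,heartfailure,956,1.0
"""

events = parse_events(SAMPLE)
print(events[1])
```

To parse a real file, you could pass its contents in the same way, e.g. `with open("/bigdata-bootcamp/data/control.csv") as f: events = parse_events(f.read())`. This assumes the path from the environment setup described above.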
`patient-id` is a patient identifier (id) used to distinguish records belonging to different patients. For example, all of the rows shown above belong to the same patient, whose id is `020E860BD31CAC69`.
`event-id` encodes the clinical event recorded in a row. For example, `DRUG00440128228` indicates that the patient was taking the drug identified by National Drug Code (NDC) `00440128228`. The characters after `DIAG` are an ICD-9 diagnosis code; in `DIAG486` the code is 486, which is the ICD-9 code for pneumonia. An event-id of `PAYMENT` means that the patient made a payment, with the dollar amount given in the value field.
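The prefix convention described above can be sketched as a small helper that splits an event-id into an event type and a code. This is an illustration only. It assumes every event-id either starts with `DRUG` or `DIAG`, or is exactly `PAYMENT` or `heartfailure`, as in the sample rows. `split_event_id` is a hypothetical name.

```python
from typing import Tuple


def split_event_id(event_id: str) -> Tuple[str, str]:
    """Split an event-id into (event type, code).

    Assumes the prefix convention seen in the sample data: DRUG<NDC>,
    DIAG<ICD-9 code>, and the literal ids PAYMENT and heartfailure.
    """
    if event_id.startswith("DRUG"):
        return "drug", event_id[len("DRUG"):]  # National Drug Code, kept as a string
    if event_id.startswith("DIAG"):
        return "diagnosis", event_id[len("DIAG"):]  # ICD-9 diagnosis code
    if event_id in ("PAYMENT", "heartfailure"):
        return event_id.lower(), ""  # no code attached to these events
    return "unknown", event_id  # anything outside the assumed convention


for eid in ["DRUG00440128228", "DIAG486", "PAYMENT", "heartfailure"]:
    print(eid, "->", split_event_id(eid))
```

The code is returned as a string rather than an integer so that leading zeros in NDC codes such as `00440128228` are preserved.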
`timestamp` indicates when the event in that row happened. The timestamp is not a calendar date but an offset from an unspecified start point, which both simplifies processing and protects the privacy of the patients' data.
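Because the start point is unspecified, only differences between timestamps carry meaning. The sketch below uses the sample patient's timestamps to measure every event relative to an index event such as `heartfailure`. The function name is hypothetical. The results are in whatever unit the offsets use, which this page does not state.

```python
from typing import List, Tuple


def offsets_from_index(
    events: List[Tuple[str, int]], index_event: str
) -> List[Tuple[str, int]]:
    """Return (event-id, timestamp - index time) for every event.

    The index time is the earliest timestamp of `index_event`; min()
    raises ValueError if that event never occurs.
    """
    index_time = min(t for event_id, t in events if event_id == index_event)
    return [(event_id, t - index_time) for event_id, t in events]


sample = [
    ("DRUG36987254604", 968),
    ("DRUG64158080642", 974),
    ("DRUG00440128228", 976),
    ("DIAG486", 907),
    ("PAYMENT", 907),
    ("heartfailure", 956),
]
# Negative offsets occurred before the heart failure event, positive ones after it.
print(offsets_from_index(sample, "heartfailure"))
```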
`value` is the value associated with the event. The table below describes what the value field means for each event type.
| event type | sample event-id | value meaning | example |
|---|---|---|---|
| diagnostic code | DIAG486 | Always 1.0 | 1.0 |
| drug consumption | DRUG00440128228 | Dosage of the drug | 30 |
| payment | PAYMENT | Amount of payment made (in dollars) | 15000 |
| heartfailure | heartfailure | Indicator of heart failure event | 1.0 |
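The sketch below aggregates a patient's rows by interpreting `value` according to the table. Payments are summed. Diagnosis codes are kept as a set, since their value is always 1.0. Drug dosages are collected per NDC code. A `heartfailure` row sets a flag that corresponds to the case definition given earlier. `summarize` and the dictionary layout are hypothetical choices for illustration, not a prescribed format.

```python
from typing import Dict, Iterable, Tuple

# One sample-data row: (patient-id, event-id, timestamp, value)
Row = Tuple[str, str, int, float]


def summarize(rows: Iterable[Row]) -> Dict[str, dict]:
    """Aggregate each patient's events, interpreting `value` per event type."""
    summary: Dict[str, dict] = {}
    for patient_id, event_id, _timestamp, value in rows:
        s = summary.setdefault(
            patient_id,
            {"total_payment": 0.0, "diagnoses": set(), "drugs": {}, "heart_failure": False},
        )
        if event_id == "PAYMENT":
            s["total_payment"] += value  # value is the payment amount
        elif event_id.startswith("DIAG"):
            s["diagnoses"].add(event_id[len("DIAG"):])  # value is always 1.0
        elif event_id.startswith("DRUG"):
            # value is the dosage; collect every dosage seen for this NDC code
            s["drugs"].setdefault(event_id[len("DRUG"):], []).append(value)
        elif event_id == "heartfailure":
            s["heart_failure"] = s["heart_failure"] or value == 1.0  # HF indicator
    return summary


rows = [
    ("020E860BD31CAC69", "DRUG36987254604", 968, 30.0),
    ("020E860BD31CAC69", "DRUG64158080642", 974, 30.0),
    ("020E860BD31CAC69", "DRUG00440128228", 976, 60.0),
    ("020E860BD31CAC69", "DIAG486", 907, 1.0),
    ("020E860BD31CAC69", "DIAG7863", 907, 1.0),
    ("020E860BD31CAC69", "DIAGV5866", 907, 1.0),
    ("020E860BD31CAC69", "DIAG3659", 907, 1.0),
    ("020E860BD31CAC69", "DIAGRG199", 907, 1.0),
    ("020E860BD31CAC69", "PAYMENT", 907, 15000.0),
    ("020E860BD31CAC69", "heartfailure", 956, 1.0),
]
print(summarize(rows)["020E860BD31CAC69"]["total_payment"])
```

Keeping drug dosages as lists rather than summing them avoids assuming that dosages from separate events can be meaningfully added.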