Dataset


Dataset to use for homework and project

For both the homework and the project, we will use the MIMIC-III Critical Care Database. This page describes the dataset and the procedure for obtaining access to it.

About MIMIC-III

MIMIC-III is a large, openly-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

Among the types of data included are:

  1. General - Patient demographics, hospital admission and discharge dates, room tracking, death dates (in or out of the hospital), ICD-9 codes, and unique codes for health care providers and their type (RN, MD, RT, etc.). All dates are surrogate dates to protect patient privacy, but time intervals (even those between multiple admissions of the same patient) are preserved.
  2. Physiological - Hourly vital sign metrics, SAPS, SOFA, ventilator settings, etc.
  3. Medications - IV meds, provider order entry data, etc.
  4. Lab Tests - Chemistry, hematology, ABGs, imaging, etc.
  5. Fluid Balance - Intake (solutions, blood, etc.) and output (urine, estimated blood loss, etc.).
  6. Notes & Reports - Discharge summary, nursing progress notes, etc; cardiac catheterization, ECG, radiology, and echo reports.
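The note in item 1 about surrogate dates can be illustrated with a short sketch of why intervals survive date shifting. It assumes, for illustration only, that all of a patient's dates are moved by one fixed offset; the dates, the offset, and the helper names (`shift_dates`, `intervals`) are hypothetical and are not taken from MIMIC-III itself.

```python
from datetime import datetime, timedelta

# Hypothetical "real" (admit, discharge) times for one patient; illustrative only.
real_stays = [
    (datetime(2005, 3, 1, 8, 0), datetime(2005, 3, 4, 14, 30)),
    (datetime(2005, 6, 10, 22, 15), datetime(2005, 6, 12, 9, 0)),
]

# Hypothetical per-patient offset: the same shift is applied to every date of this patient.
offset = timedelta(days=36517)


def shift_dates(stays, delta):
    """Return surrogate (admit, discharge) pairs, each moved by the same delta."""
    return [(admit + delta, discharge + delta) for admit, discharge in stays]


def intervals(stays):
    """Return lengths of stay and gaps between consecutive admissions."""
    lengths = [discharge - admit for admit, discharge in stays]
    gaps = [stays[i + 1][0] - stays[i][1] for i in range(len(stays) - 1)]
    return lengths, gaps


surrogate_stays = shift_dates(real_stays, offset)

# The calendar dates differ, but every interval is unchanged by the shift.
print(intervals(real_stays) == intervals(surrogate_stays))
```

Because every timestamp of a patient moves by the same amount, subtracting two of them cancels the offset, so lengths of stay and readmission gaps computed from surrogate dates match those computed from the real dates.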

MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors:

  1. it is publicly and freely available
  2. it encompasses a diverse and very large population of ICU patients
  3. it contains high temporal resolution data including lab results, electronic documentation, and bedside monitor trends and waveforms.

CITI Training

Georgia Tech policy requires all personnel involved in human subjects research to pass a training course before taking part. This requirement covers all types of interaction with human subjects, including the analysis of data. To complete the human subjects training:

  1. Go to https://www.citiprogram.org/
  2. Log in via SSO (Single Sign On). SSO allows you to log in with your Georgia Tech username and password
  3. Select Georgia Institute of Technology as the authentication provider
  4. Once logged in, under Georgia Institute of Technology courses, click on "Add Course or Update Learner Groups"
  5. You will now see three main courses to choose from. Check the box for "Human Subjects Research"
  6. Click Next, then select the radio button "NO, I have NOT completed the basic course"
  7. You will now see three learner groups. You are required to complete Group 1 and Group 2. Start with Group 1: select Group 1 and click Next
  8. Good Clinical Practice is not required, so select "N/A", then click Next
  9. Health Information Privacy and Security (HIPS) is required, so select "CITI Health Information Privacy and Security (HIPS) for Biomedical Research Investigators"
  10. Select "RCR for engineering"
  11. Now under Georgia Tech courses you will have "Group 1 Biomedical research Investigators and Key Personnel" listed as incomplete. You will have to go through every tutorial in that course and complete a quiz for each.
  12. Once you have completed and passed Group 1, repeat the steps above to complete Group 2 (Social / Behavioral Research Investigators and Key Personnel)

Request MIMIC Access

During this course, we will be working with the MIMIC database. Although MIMIC is de-identified, it still contains detailed information about the clinical care of patients and must be treated with appropriate care and respect. To obtain access, you must:

  1. Create a PhysioNet account.

  2. Request MIMIC access.

    • Log in with your account to request access
    • Review the data use agreement and select "I agree"
    • Fill out the form and upload all your certifications. Some information you may need:

      • Reference category: Supervisor
      • Reference's name: Jimeng Sun
      • Reference's telephone number: 404.894.0482
      • Reference's email address: jsun@cc.gatech.edu
      • Reference's title: PI
      • General research area for which the data will be used: CSE8803 Big Data Analytics for Healthcare, Spring 2016
    • Submit the form