Yin Xi authored
5bd54cc1

Radiomics Pipeline

This is a set of Python and R routines for converting images and masks into radiomic features and performing simple classification tasks.

There are three independent compartments in this repository:

  1. reading and storing ROIs -- input: a folder containing DICOM data; -- output: 1. a folder of images in NRRD format, 2. a folder of ROIs in NRRD format, 3. a CSV meta file in which each row represents a unique ROI with its corresponding image
  2. radiomics extraction -- input: a CSV meta file in which each row represents a unique ROI with its corresponding image (a parameter file may also be needed to configure the extraction) -- output: a CSV dataset with the patient ID followed by the extracted radiomic features in subsequent columns
  3. analysis -- input: a vector of the dependent variable (binary) and a matrix of independent variables (extracted radiomic features) -- output: a list of two objects, 1. cross-validated predicted probabilities, 2. selected features in descending order of selection percentage.

These compartments are designed to run independently, so if NRRD files are readily available, step 1 can be skipped. Compartment 3 can also be applied to any binary classification problem, not only radiomics.

Details:

  1. reading and storing ROIs: there are currently two ways of drawing ROIs that can be used for radiomics: 1. saving ROIs into the DICOM header via the pyOsirix plug-in; 2. MINT (limited access). Two separate Python routines were developed, one for each case. The end products are the same: a folder of images (.NRRD), a folder of masks (.NRRD), and a CSV list of the paths of each mask and its corresponding image.

For extracting ROIs from the DICOM header, both classic and enhanced DICOM formats are handled. The code is well tested for classic DICOM but not extensively tested for enhanced DICOM. For DWI, where multiple b values are in the same series, ROIs are ordered the same way as the images.
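The CSV list that closes this step could be assembled roughly as follows. This is a minimal sketch, not the repository's routine: the function names, the column names (`patient_id`, `roi`, `image_path`, `mask_path`) and the file naming scheme (`<patient>.nrrd` for images, `<patient>_<roi>.nrrd` for masks) are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def build_meta_rows(image_dir, mask_dir):
    """Pair each mask with the image that shares its patient prefix.

    Assumed (hypothetical) naming: images are '<patient>.nrrd' and masks are
    '<patient>_<roi>.nrrd', so one image can carry several ROIs.
    """
    images = {p.stem: p for p in Path(image_dir).glob("*.nrrd")}
    rows = []
    for mask in sorted(Path(mask_dir).glob("*.nrrd")):
        patient_id = mask.stem.split("_")[0]
        image = images.get(patient_id)
        if image is None:
            continue  # mask without a matching image: skipped in this sketch
        rows.append({
            "patient_id": patient_id,
            "roi": mask.stem,
            "image_path": str(image),
            "mask_path": str(mask),
        })
    return rows


def write_meta_csv(rows, csv_path):
    """Write one row per ROI; the column names are illustrative only."""
    fields = ["patient_id", "roi", "image_path", "mask_path"]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Keeping one row per ROI, rather than one per image, lets a single image with several lesions appear several times, which matches the "each row represents a unique ROI" description.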

  2. radiomics extraction: this is just a wrapper for feature extraction via PyRadiomics. A parameter file may be needed.
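A wrapper of this kind might look like the sketch below. It is not the repository's code: the function names and the meta-file column names (`patient_id`, `image_path`, `mask_path`) are assumptions. The only PyRadiomics calls used are `featureextractor.RadiomicsFeatureExtractor` and its `execute` method, and the import is deferred so the rest of the sketch runs without the package installed.

```python
import csv


def make_pyradiomics_extractor(params_file=None):
    """Return a callable (image_path, mask_path) -> dict of features.

    PyRadiomics is imported lazily so the rest of this sketch runs without it.
    """
    from radiomics import featureextractor  # requires the pyradiomics package

    if params_file:
        extractor = featureextractor.RadiomicsFeatureExtractor(params_file)
    else:
        extractor = featureextractor.RadiomicsFeatureExtractor()
    return extractor.execute


def extract_features(meta_csv, out_csv, extract):
    """Run `extract` on every row of the meta file and write one wide CSV."""
    records = []
    with open(meta_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            features = extract(row["image_path"], row["mask_path"])
            # PyRadiomics prefixes provenance entries with "diagnostics_"; drop them.
            kept = {k: v for k, v in features.items()
                    if not k.startswith("diagnostics_")}
            records.append({"patient_id": row["patient_id"], **kept})
    # Patient ID first, extracted features in the subsequent columns.
    fields = list(records[0].keys()) if records else ["patient_id"]
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
    return records
```

With PyRadiomics installed, a call could be `extract_features("meta.csv", "features.csv", make_pyradiomics_extractor("params.yaml"))`, where the three file names are placeholders.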

  3. analysis

It has two modules: one for ROC analysis using LASSO logistic regression, and a descriptive one using heatmaps and volcano plots.

LASSO logistic regression: this is a wrapper around the cv.glmnet function from the glmnet package in R, run as repeated nested cross-validation. It currently supports only a binary dependent variable, i.e. logistic regression. Variable selection is done via LASSO (glmnet package). The LASSO hyperparameter, lambda, is chosen by maximizing cross-validated AUC (with a small sample size and/or a small number of events, the deviance metric may be used instead). Example code for subsequent analyses covers drawing the ROC curve, calculating AUC, and displaying the top selected features.
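The two outputs of this module, cross-validated predicted probabilities and per-fit selected features, can be summarised along the following lines. The repository does this in R; the Python below is only an illustrative sketch with hypothetical function names, computing AUC from its rank interpretation and tabulating how often each feature was selected across the repeated fits.

```python
from collections import Counter


def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half. Cost is n_pos * n_neg, fine for small cohorts.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def selection_frequency(selected_per_fit):
    """Percentage of fits in which each feature was selected, highest first.

    `selected_per_fit` holds one list of feature names per model fit,
    i.e. the features with non-zero LASSO coefficients in that fit.
    """
    counts = Counter(name for fit in selected_per_fit for name in set(fit))
    n_fits = len(selected_per_fit)
    return [(name, 100.0 * c / n_fits) for name, c in counts.most_common()]
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, since three of the four positive-negative pairs are ranked correctly.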

Heatmaps and volcano plots: a direct use of heatmaply with its built-in clustering.