* denotes equal contribution and joint lead authorship.
  1. Development & Validation of Optimal Phenomapping Methods to Estimate Long-term Atherosclerotic Cardiovascular Disease Risk in Patients with Type 2 Diabetes.
    Matthew W. Segar, KV Patel, M Vaduganathan, MC Caughey, BC Jaeger, M Basit, D Willett, J Butler, PP Sengupta, TJ Wang, DK McGuire, and A Pandey

    In Diabetologia 2021.

  2. Development and Validation of Machine Learning-Based Race-Specific Models to Predict 10-Year Risk of Heart Failure: A Multi-Cohort Analysis.
    Matthew W. Segar, B Jaeger, KV Patel, V Nambi, CE Ndumele, A Correa, J Butler, A Chandra, C Ayers, S Rao, A Lewis, LM Raffield, CJ Rodriguez, ED Michos, CM Ballantyne, ME Hall, RJ Mentz, JA Lemos, and A Pandey

    In Circulation 2021.

  3. Deep-Learning Models of Diastolic Dysfunction: Development and Validation in NHLBI-Funded Heart Failure Clinical Trials.
    A Pandey, N Kagiyama, N Yanamala, Matthew W. Segar, JS Cho, M Tokodi, and Partho Sengupta

    In Press 2021.



  1. Association of Long-Term Change and Variability in Glycemia with Risk of Incident Heart Failure among Patients with Type 2 Diabetes Mellitus – A Secondary Analysis of the ACCORD Trial.
    Matthew W. Segar*, Kershaw V. Patel*, Muthiah Vaduganathan, Melissa C. Caughey, Javed Butler, Gregg C. Fonarow, Justin L. Grodin, Darren K. McGuire, and Ambarish Pandey

    In Diabetes Care 2020.

  2. Predictors and Prognostic Implications of Visit-to-Visit Variability in Kidney Function and Serum Electrolytes among Patients with Heart Failure with Preserved Ejection Fraction.
    Matthew W. Segar*, Ravi B. Patel*, Kershaw V. Patel, Marat Fudim, Adam D. DeVore, Pieter Martens, Susan Hedayati, Justin L. Grodin, W.H. Wilson Tang, and Ambarish Pandey

    In JAMA Cardiology 2020.

  3. County-level Phenomapping to Identify Disparities in Cardiovascular Outcomes: An Unsupervised Clustering Analysis.
    Matthew W. Segar*, Shreya Rao*, Ann Marie Navar, Erin Michos, Alana Lewis, Adolfo Correa, Mario Sims, Amit Khera, Amy E. Hughes, and Ambarish Pandey

    In American Journal of Preventive Cardiology 2020.

  4. Genetic West African Ancestry, Blood Pressure Response to Therapy, and Cardiovascular Risk among African Americans in the Systolic Blood Pressure Reduction Intervention Trial (SPRINT).
    Shreya Rao*, Matthew W. Segar*, Adam P. Bress, Pankaj Arora, Wanpen Vongpatanasin, Vijay Agusala, Utibe Essien, Adolfo Correa, Alana Morris, James Lemos, and Ambarish Pandey

    In JAMA Cardiology 2020.

  5. Association of Medicaid Expansion with Rates of Utilization of Cardiovascular Therapies among Medicaid Beneficiaries in 2011 and 2018.
    Andrew Sumarsono, Hussain Lalani, Matthew W. Segar, Shreya Rao, Muthiah Vaduganathan, Rishi Wadhera, Sandeep Das, Ann Marie Navar, Gregg C. Fonarow, and Ambarish Pandey

    In Circulation: Cardiovascular Quality and Outcomes 2020.

  6. Gender Based Differences in Outcomes among Resuscitated Patients with Out-of-Hospital Cardiac Arrest.
    Purav Mody, Ambarish Pandey, Arthur S. Slutsky, Matthew W. Segar, Alex Kiss, Paul Dorian, Janet Parsons, Damon C. Scales, Valeria E. Rac, Sheldon Cheskes, Arlene S. Bierman, Beth L. Abramson, Sara Gray, Rob. A. Fowler, Katie N. Dainty, Ahamed H. Idris, and Laurie Morrison

    In Circulation 2020.



  1. Machine Learning to Predict the Risk of Incident Heart Failure Hospitalization Among Patients With Diabetes: The WATCH-DM Risk Score.
    Matthew W. Segar, Muthiah Vaduganathan, Kershaw V. Patel, Darren K. McGuire, Javed Butler, Gregg C. Fonarow, Mujeeb Basit, Vaishnavi Kannan, Justin L. Grodin, Brendan Everett, Duwayne Willett, Jarett Berry, and Ambarish Pandey

    In Diabetes Care 2019.

    OBJECTIVE To develop and validate a novel, machine learning–derived model to predict the risk of heart failure (HF) among patients with type 2 diabetes mellitus (T2DM). RESEARCH DESIGN AND METHODS Using data from 8,756 patients free at baseline of HF, with <10% missing data, and enrolled in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial, we used random survival forest (RSF) methods, a nonparametric decision tree machine learning approach, to identify predictors of incident HF. The RSF model was externally validated in a cohort of individuals with T2DM using the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT). RESULTS Over a median follow-up of 4.9 years, 319 patients (3.6%) developed incident HF. The RSF models demonstrated better discrimination than the best performing Cox-based method (C-index 0.77 [95% CI 0.75–0.80] vs. 0.73 [0.70–0.76] respectively) and had acceptable calibration (Hosmer-Lemeshow statistic χ² = 9.63, P = 0.29) in the internal validation data set. From the identified predictors, an integer-based risk score for 5-year HF incidence was created: the WATCH-DM (Weight [BMI], Age, hyperTension, Creatinine, HDL-C, Diabetes control [fasting plasma glucose], QRS Duration, MI, and CABG) risk score. Each 1-unit increment in the risk score was associated with a 24% higher relative risk of HF within 5 years. The cumulative 5-year incidence of HF increased in a graded fashion from 1.1% in quintile 1 (WATCH-DM score ≤7) to 17.4% in quintile 5 (WATCH-DM score ≥14). In the external validation cohort, the RSF-based risk prediction model and the WATCH-DM risk score performed well with good discrimination (C-index = 0.74 and 0.70, respectively), acceptable calibration (P ≥0.20 for both), and broad risk stratification (5-year HF risk range from 2.5 to 18.7% across quintiles 1–5). 
CONCLUSIONS We developed and validated a novel, machine learning–derived risk score that integrates readily available clinical, laboratory, and electrocardiographic variables to predict the risk of HF among outpatients with T2DM.
  2. Phenomapping of patients with heart failure with preserved ejection fraction using machine learning‐based unsupervised cluster analysis.
    Matthew W. Segar, Kershaw V. Patel, Colby Ayers, Mujeeb Basit, W.H. Wilson Tang, Duwayne Willett, Jarett Berry, Justin L. Grodin, and Ambarish Pandey

    In European Journal of Heart Failure 2019.

    Aim To identify distinct phenotypic subgroups in a highly‐dimensional, mixed‐data cohort of individuals with heart failure (HF) with preserved ejection fraction (HFpEF) using unsupervised clustering analysis. Methods and results The study included all Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT) participants from the Americas (n = 1767). In the subset of participants with available echocardiographic data (derivation cohort, n = 654), we characterized three mutually exclusive phenogroups of HFpEF participants using penalized finite mixture model‐based clustering analysis on 61 mixed‐data phenotypic variables. Phenogroup 1 had higher burden of co‐morbidities, natriuretic peptides, and abnormalities in left ventricular structure and function; phenogroup 2 had lower prevalence of cardiovascular and non‐cardiac co‐morbidities but higher burden of diastolic dysfunction; and phenogroup 3 had lower natriuretic peptide levels, intermediate co‐morbidity burden, and the most favourable diastolic function profile. In adjusted Cox models, participants in phenogroup 1 (vs. phenogroup 3) had significantly higher risk for all adverse clinical events including the primary composite endpoint, all‐cause mortality, and HF hospitalization. Phenogroup 2 (vs. phenogroup 3) was significantly associated with higher risk of HF hospitalization but a lower risk of atherosclerotic event (myocardial infarction, stroke, or cardiovascular death), and comparable risk of mortality. Similar patterns of association were also observed in the non‐echocardiographic TOPCAT cohort (internal validation cohort, n = 1113) and an external cohort of patients with HFpEF [Phosphodiesterase‐5 Inhibition to Improve Clinical Status and Exercise Capacity in Heart Failure with Preserved Ejection Fraction (RELAX) trial cohort, n = 198], with the highest risk of adverse outcome noted in phenogroup 1 participants. 
Conclusions Machine learning‐based cluster analysis can identify phenogroups of patients with HFpEF with distinct clinical characteristics and long‐term outcomes.
  3. Generalizability and Implications of the H2FPEF Score in a Cohort of Patients With Heart Failure With Preserved Ejection Fraction.
    Matthew W. Segar, Kershaw V. Patel, Jarett D. Berry, Justin L. Grodin, and Ambarish Pandey

    In Circulation 2019.



  1. Transmembrane protein 88 (TMEM88) promoter hypomethylation is associated with cisplatin resistance in ovarian cancer.
    M.C. De Leon, H. Cardenas, R. Emerson, and D. Matei

    In Gynecologic Oncology 2016.

  2. Rifampin Regulation of Drug Transporters Gene Expression and the Association of MicroRNAs in Human Hepatocytes.
    Eric A. Benson, Michael T. Eadon, Zeruesenay Desta, Yunlong Liu, Hai Lin, Kimberly S. Burgess, Matthew W. Segar, Andrea Gaedigk, and Todd C. Skaar

    In Frontiers in Pharmacology 2016.

    Membrane drug transporters contribute to the disposition of many drugs. In human liver, drug transport is controlled by two main superfamilies of transporters, the solute carrier transporters (SLC) and the ATP Binding Cassette transporters (ABC). Altered expression of these transporters due to drug-drug interactions can contribute to differences in drug exposure and possibly effect. In this study, we determined the effect of rifampin on gene expression of hundreds of membrane transporters along with all clinically relevant drug transporters.


  1. Age-Related Changes in MicroRNA Expression and Pharmacogenes in Human Liver.
    KS Burgess, S Philips, EA Benson, Z Desta, A Gaedigk, R Gaedigk, MW Segar, Y Liu, and TC Skaar

    In Clinical Pharmacology & Therapeutics 2015.

  2. MMBIRFinder: a tool to detect microhomology-mediated break-induced replication.
    Matthew W Segar, Cynthia J Sakofsky, Anna Malkova, and Yunlong Liu

    In IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015.

    The introduction of next-generation sequencing technologies has radically changed the way we view structural genetic events. Microhomology-mediated break-induced replication (MMBIR) is just one of the many mechanisms that can cause genomic destabilization that may lead to cancer. Although the mechanism for MMBIR remains unclear, it has been shown that MMBIR is typically associated with template-switching events. Currently, to our knowledge, there is no existing bioinformatics tool to detect these template-switching events. We have developed MMBIRFinder, a method that detects template-switching events associated with MMBIR from whole-genome sequenced data. MMBIRFinder uses a half-read alignment approach to identify potential regions of interest. Clustering of these potential regions helps narrow the search space to regions with strong evidence. Subsequent local alignments identify the template-switching events with single-nucleotide accuracy. Using simulated data, MMBIRFinder identified 83% of the MMBIR regions within a 5 nucleotide tolerance. Using real data, MMBIRFinder identified 16 MMBIR regions on a normal breast tissue data sample and 51 MMBIR regions on a triple-negative breast cancer tumor sample resulting in detection of 37 novel template-switching events. Finally, we identified template-switching events residing in the promoter region of 7 genes that have been implicated in breast cancer. The program is freely available for download at


  1. Epigenetic Targeting of Ovarian Cancer Stem Cells.
    Kenneth P Nephew, Yinu Wang, Horacio Cardenas, Fang Fang, Salvatore Condello, Pietro Taverna, Matthew Segar, Yunlong Liu, and Daniela Matei

    In Cancer Research 2014.

    Emerging results indicate that cancer stem-like cells contribute to chemoresistance and poor clinical outcomes in many cancers, including ovarian cancer (OC). As epigenetic regulators play a major role in the control of normal stem cell differentiation, epigenetics may offer a useful arena to develop strategies to target cancer stem-like cells. Epigenetic aberrations, especially DNA methylation, silence tumor suppressor and differentiation-associated genes that regulate the survival of ovarian cancer stem-like cell (OCSC). In this study, we tested the hypothesis that DNA hypomethylating agents may be able to reset OCSC towards a differentiated phenotype, by evaluating the effects of the new DNA methyltransferase inhibitor SGI-110 on OCSC phenotype, as defined by expression of the cancer stem-like marker aldehyde dehydrogenase (ALDH). We demonstrated that ALDH+ OC cells possess multiple stem cell characteristics, were highly chemoresistant, and were enriched in xenografts residual after platinum therapy. Low dose SGI-110 reduced the stem-like properties of ALDH+ cells, including their tumor initiating capacity, resensitized these OCSCs to platinum, and induced re-expression of differentiation-associated genes. Maintenance treatment with SGI-110 after carboplatin inhibited OCSC growth, causing global tumor hypomethylation and decreased tumor progression. Our work offers preclinical evidence that epigenome-targeting strategies have the potential to delay tumor progression by re-programming residual cancer stem-like cells. Further, the results suggest that SGI-110 might be administered in combination with platinum to prevent the development of recurrent and chemoresistant ovarian cancer.
  2. Spectral probabilities of top-down tandem mass spectra.
    Xiaowen Liu, Matthew W Segar, Shuai Cheng Li, and Sangtae Kim

    In BMC Genomics 2014.

    BACKGROUND: In mass spectrometry-based proteomics, the statistical significance of a peptide-spectrum or protein-spectrum match is an important indicator of the correctness of the peptide or protein identification. In bottom-up mass spectrometry, probabilistic models, such as the generating function method, have been successfully applied to compute the statistical significance of peptide-spectrum matches for short peptides containing no post-translational modifications. As top-down mass spectrometry, which often identifies intact proteins with post-translational modifications, becomes available in many laboratories, the estimation of statistical significance of top-down protein identification results has come into great demand. RESULTS: In this paper, we study an extended generating function method for accurately computing the statistical significance of protein-spectrum matches with post-translational modifications. Experiments show that the extended generating function method achieves high accuracy in computing spectral probabilities and false discovery rates. CONCLUSIONS: The extended generating function method is a non-trivial extension of the generating function method for bottom-up mass spectrometry. It can be used to choose the correct protein-spectrum match from several candidate protein-spectrum matches for a spectrum, as well as separate correct protein-spectrum matches from incorrect ones identified from a large number of tandem mass spectra.
  3. TGF-β induces global changes in DNA methylation during the epithelial-to-mesenchymal transition in ovarian cancer cells.
    Horacio Cardenas, Edyta Vieth, Jiyoon Lee, Matthew Segar, Yunlong Liu, Kenneth P Nephew, and Daniela Matei

    In Epigenetics 2014.


  1. Utilization of Probabilistic Models in Short Read Assembly from Second-Generation Sequencing.
    Matthew Segar.

    In Honor’s Theses 2012.