Myeloid Gene Expression Data After ICH (ICH-Seq)

Expression vs Time




Only display first sample taken per patient for each compartment

Longitudinal Gene Expression

CD14+ Monocyte/Macrophage Expression Levels

Neutrophil Expression Levels

Legend


Blood -

blue


Hematoma -

gold


HC -

brown

Expression Dot Plot of Subacute Stage




Gene Expression - Samples Collapsed by Patient

CD14+ Monocyte/Macrophage Expression Levels

Neutrophil Expression Levels

Legend


Blood -

blue


Hematoma -

gold


HC -

brown

Differential Gene Expression Analysis

CD14+ Monocytes/Macrophages

Neutrophils





Expression vs Time




Longitudinal Gene Expression by Patient Outcome

CD14+ Monocyte/Macrophage Expression Levels

(Hematoma Samples)

Neutrophil Expression Levels

(Hematoma Samples)

Legend


Patients with

good outcome -

blue


poor outcome -

red

CD14+ Monocyte/Macrophage Expression Levels

(Blood Samples)

Neutrophil Expression Levels

(Blood Samples)

Expression Dot Plot




Gene Expression - Samples Collapsed by Patient

CD14+ Monocyte/Macrophage Expression Levels

(Hematoma Samples)

Neutrophil Expression Levels

(Hematoma Samples)

Legend


Patients with

good outcome -

blue


poor outcome -

red

CD14+ Monocyte/Macrophage Expression Levels

(Blood Samples)

Neutrophil Expression Levels

(Blood Samples)

Differential Gene Expression Analysis - Patient Outcome

CD14+ Monocytes/Macrophages

Neutrophils









About

Data generated by ICH-seq (NINDS R01NS097728, PI: Lauren H. Sansing), an ancillary study to MISTIE III (NINDS 1U01NS080824, PI: Daniel F. Hanley).


Sequencing and analysis performed as a collaborative effort of Michael H. Askenase, Brittany A. Goods, J. Christopher Love, Alex K. Shalek, and Lauren H. Sansing.


Shiny web app designed by Victor S. Avram, Michael H. Askenase, and Lauren H. Sansing.


Manuscript describing initial data analysis: ---


Counts matrices are deposited on GEO and can be found



Questions and troubleshooting contacts:

Michael Askenase, michael.askenase@yale.edu

Victor Avram, victor.avram@yale.edu

Lauren Sansing, lauren.sansing@yale.edu

Study Design

The purpose of this study was to define distinct stages of myeloid responses in the peripheral blood and within the hematoma over time after ICH. Patient enrollment at approved study sites was performed according to the parent MISTIE III trial clinical trial (1). Hematoma effluent and peripheral blood samples were collected under the ICHseq sub-study of the parent MISTIE III trial until its conclusion. Samples from the first four enrolled patients were used to optimize cell isolation and sequencing techniques. All subsequent samples were included in the final dataset, yielding 82 blood monocyte samples, 57 hematoma macrophage samples, 76 blood neutrophil samples, and 49 hematoma neutrophil samples. 21 patients contributed CD14+ monocyte/macrophage samples that passed quality control filtering criteria (see Initial Data Processing), 17 patients contributed neutrophil samples that passed these criteria. Peripheral blood leukocytes from 5 healthy donors without ICH were collected in the same manner.

Whenever sufficient cell numbers allowed, technical replicates were produced by sorting additional pools of the same cells and generating cDNA libraries in parallel with all other samples. This resulted in technical replicates being included in the analysis for 19 blood monocyte profiles, 35 hematoma macrophage profiles, 10 blood neutrophil profiles, and 37 hematoma neutrophil profiles. Initial transcriptional analysis was performed blinded to tissue of origin, patient, and outcome. Outcome assessments were performed by blinded investigators as part of the parent MISTIE III trial, and then provided to the investigators of the ICHseq sub-study.

Sample Processing and Sequencing

Collection of hematoma effluent and peripheral blood patient samples

A thorough description of the clinical trial procedures is described in the primary report of the results of the MISTIE III trial (1). Briefly, a rigid cannula was placed using image guidance through a burr hole in the skull into the middle two thirds of the hematoma. Clot aspiration was performed with a 10 mL syringe until first resistance. This sample was collected into one or more acid citrate dextrose (ACD) blood collection tubes (BD Biosciences #364606); samples collected during surgery were marked as “Day 1” (Sample Metadata). A soft catheter was then placed into the hematoma to allow administration of rtPA (Alteplase, Genentech) to the hematoma and subsequent effluent drainage into collection vessels. Once per day until the clinical endpoint was reached (<15 mL of residual hematoma or 9 doses of rtPA), the previous 8 hours of drained effluent was collected into ACD tubes. For each hematoma effluent collection (during surgery and post-operation), a matched peripheral blood sample was collected into an ACD tube.

Storage and shipping of samples

Upon collection into ACD tubes, effluent and blood samples were stored at 4°C and shipped overnight at the earliest possible date in temperature-controlled shipping containers at 4°C (Therapak #56850). We have previously found that these conditions are optimal for preservation of transcriptional profiles in human peripheral blood leukocytes (2). Time from collection to processing ranged from approximately 12 hours to 72 hours; most samples were processed within 24 hours of collection. RNA quality metrics were stable in samples held for up to 72 hours before processing and no samples that were processed more than 72 hours after collection were included in analysis.

Cell cover fixation, magnetic separation, and FACS-sorting of samples

Up to 5 x 106 leukocytes isolated from each sample were used for downstream sorting. Cells were first lightly fixed in 1 mL of Cell Cover, a light preservative that maintains RNA integrity (Anacyte #800-250), on ice for 10 minutes, then centrifuged. CD3+ cells were removed from total cell suspensions by magnetic selection (StemCell #17851) according to manufacturer’s instructions. CD3- fractions were stained for flow cytometry as described above with CD45 (HI30, Tonbo Biosciences #50-0459-T100), CD11b (ICRF44, Tonbo #35-0118-T100), CD14 (M5E2, Biolegend #301820), CD16 (3G8, BD #560474), CD66a/c/e (ASL-32, Biolegend #342310), CD2, (RPA-2.10, Biolegend #300204), CD20 (2H7, Biolegend #302349), CD56 (HCD56, Biolegend #318320), and viability dye (Life Technologies, #L34972). Myeloid populations were sorted using a FACSaria II directly into RNA lysis buffer composed of 200 L RA1 buffer (Macherey-Nagel #RA1) freshly spiked with 2% TCEP (Thermofisher #77720).

RNA isolation & cDNA library generation

RNA-sequencing libraries were generated as previously described (2, 3). Briefly, RNA was extracted using the NucleoSpin RNA XS Kit (Macherey-Nagel #740902) according to the manufacturer’s instructions. Smart-Seq2 cDNA synthesis was performed as described by Picelli et al (3) with the following modifications: 1) input RNA was normalized prior to cDNA generation by diluting to ~1,000 cells per reaction, 2) reverse transcription was performed with Superscript III (ThermoFisher #18080-085) in place of Superscript II according to the manufacturer instructions. Paired-end sequencing libraries were prepared using the Nextera XT DNA sample Prep Kit (Illumina #FC-131) according to the manufacturer’s instructions. Libraries were pooled in an equimolar ratio and sequenced on a NextSeq500/550 sequencer (Illumina) using a 75 cycle v2 sequencing kit with a paired end read structure.

Sequencing & alignment

Libraries were pooled in an equimolar ratio and sequenced on a NextSeq500/550 sequencer (Illumina) using a 75 cycle v2 sequencing kit with a paired end read structure. Following sequencing, BAM files were converted to merged, demultiplexed FASTQs. Paired-end reads were mapped to the UCSC hg19 genome using STAR and RSEM.

Data Analysis

Initial data processing

Aligned expression matrices generated by STAR were first filtered based on read depth and captured gene depth to remove low quality samples. A minimum of 4 x 106 FASTQ fragments and a minimum of 9,000 (for CD14+ monocytes/macrophages) or 8,500 (for neutrophils) captured genes were used as inclusion criteria. Samples lacking patient metadata or occurring later than 171h post-ICH were excluded, removing two additional samples for each cell type. Genes with fewer than 1 count per million (CPM) in greater than 80% of the samples of a given cell type were considered unexpressed and filtered out prior to downstream analysis. Filtered counts matrices with technical replicates included were used as input for differential expression (see below). For data visualization, filtered counts from technical replicates were collapsed together using the “edgeR” pipeline prior to generation of log-FPKM gene expression matrices. Trimmed mean of M (TMM) values library size normalization (4) was used to account for differences in sequenced reads between samples. Normalized library sizes were used for frequency estimation in generation of log-FPKM matrices.

Differential gene expression analysis

Differential gene expression analysis was performed using generalized linear modeling with the limma package for R using the “limma-voom” methodology (5-7). For all analyses, peripheral blood cells and hematoma cells were analyzed together using a single model, with the origin tissue of the cells included as a parameter in the design matrix. For certain analyses, additional parameters and/or interaction terms were included in the linear model, as noted in the text and/or figure legends. Batch correction was not performed prior to differential expression; rather, empirically determined sample weights were used to down-weight lower quality samples using the “voomWithQualityWeights” command in the limma package. This decreased the weight of samples with higher proportions of ribosomal RNA. This approach was employed because patient samples were not randomly distributed across all batches; all samples from a particular patient were sequenced within a single batch in order to study within-patient changes over time. Batch correction would therefore have improperly reduced patient-to-patient variation prior to differential gene expression. Differentially expressed genes in all analyses were determined using the empirical Bayes method with a significance threshold set at a Benjamini-Hochberg (BH) adjusted p value of 0.05; no fold-change threshold was used. The conclusions of these analyses were robust against altering parameters for determining significance (for example introducing a fold-change threshold or lowering the p value threshold)

Spline regression and k-means clustering of gene expression changes over time

To determine genes changing significantly over time in hematoma and blood, a cubic spline basis with 2 degrees of freedom (1 knot) was formed to fit each gene’s expression level against time using the “splines” package in R. This spline basis was input as an interacting term with the tissue compartment in the design matrix of the linear model, so that a separate spline was fit to samples from the blood and hematoma compartments for each gene evaluated. Genes with significant fits to the hematoma regression spline (BH adjusted p < 0.05) were first filtered to remove genes with no expression in hematoma samples, then spline fits were clustered by k means to find modules of genes with shared patterns of change over time. Gene fits using splines with 2 or more knots were also assessed, but this approach did not provide superior results. The number of clusters was determined by the “elbow” of a total within clusters sum of squares plot. Optimal initial means for generating stable clusters were determined by iterating 1,000 initial starting values and selecting values that both gave clusters with high stability across all iterations as well and minimized the within cluster sum of squares distance. No batch correction was performed prior to spline regression or visualization.

For the graphical representations of the cubic spline regression models of changes over time in the hematoma, the y axis depicts the change from the gene expression in the earliest collected hematoma sample. Log2 gene expression levels of each gene in every hematoma sample were normalized to that observed in the earliest sample (37h post-ICH) by subtracting the gene expression value in the 37h sample from every sample.

Blood versus hematoma differential gene expression at acute stage

To determine differentially expressed genes between tissues during the acute stage of ICH, a separate linear model was built using the limma package for each cell type with tissue compartment as the only parameter. Only the first blood and first hematoma effluent sample from each patient was included in the analysis; these samples ranged from 23–99 hours post-ICH (Sample metadata). Technical replicates were included by estimating their correlation and blocking for them in the linear model according to the methods described in the limma user’s guide (7).

ICH patient versus healthy donor blood differential gene expression

Differential expression was performed using the same approach as above. Blood samples from 5 healthy donors were compared to the first blood sample from each patient post-ICH (Sample metadata).

Differential gene expression according to patient outcome

To determine differentially expressed genes according to patient utcome at 365 days post-ICH, modified Rankin scores (mRS) calculated by the MISTIE III clinical trial were dichotomized to either “good” (mRS 0-3), or “poor” (mRS 4-6) outcomes. Samples were included only from the sub-acute stage of the ICH response when gene expression was temporally stable (>96 hours post-ICH). 15 patients from our initial cohort of 21 contributed hematoma effluent samples during this time range, the other 6 patients in our cohort contributed samples only during the acute stage and were therefore excluded from this analysis. Differential expression analysis on outcome was adjusted for initial disease severity using a composite severity score defined by the MISTIE III trial (1). The severity score incorporated patient age, initial Glasgow coma score, pre-existing white matter disease and diabetes status, location of the hemorrhage, hemorrhage volume, and intraventricular volume. This analysis included repeated measures from the same patients; both technical replication and biological replication were accounted for in a two-step process. First, gene expression from technical replicates were collapsed together using the edgeR package, generating a single sample from each patient at a particular timepoint. Second, repeated measures from the same patient collected on different days >96 hours post-ICH were considered biological replicates because gene expression was static in this time range and accounted for by estimating their correlation and blocking for them in the linear model as a random effect.

References

1. D. F. Hanley, R. E. Thompson, M. Rosenblum, G. Yenokyan, K. Lane, N. McBee, S. W. Mayo, A. J. Bistran-Hall, D. Gandhi, W. A. Mould, N. Ullman, H. Ali, J. R. Carhuapoma, C. S. Kase, K. R. Lees, J. Dawson, A. Wilson, J. F. Betz, E. A. Sugar, Y. Hao, R. Avadhani, J.-L. Caron, M. R. Harrigan, A. P. Carlson, D. Bulters, D. LeDoux, J. Huang, C. Cobb, G. Gupta, R. Kitagawa, M. R. Chicoine, H. Patel, R. Dodd, P. J. Camarata, S. Wolfe, A. Stadnik, P. L. Money, P. Mitchell, R. Sarabia, S. Harnof, P. Barzo, A. Unterberg, J. S. Teitelbaum, W. Wang, C. S. Anderson, A. D. Mendelow, B. Gregson, S. Janis, P. Vespa, W. Ziai, M. Zuccarello, I. A. Awad, MISTIE III Investigators, Efficacy and safety of minimally invasive surgery with thrombolysis in intracerebral haemorrhage evacuation (MISTIE III): a randomised, controlled, open-label, blinded endpoint phase 3 trial, Lancet 393, 1021–1032 (2019).


2. B. A. Goods, J. M. Vahey, A. F. Steinschneider, M. H. Askenase, L. Sansing, J. Christopher Love, Blood handling and leukocyte isolation methods impact the global transcriptome of immune cells, BMC Immunol 19, 30 (2018).


3. S. Picelli, Å. K. Björklund, O. R. Faridani, S. Sagasser, G. Winberg, R. Sandberg, Smart-seq2 for sensitive full-length transcriptome profiling in single cells, Nat. Methods 10, 1096–1098 (2013).


4. M. D. Robinson, A. Oshlack, A scaling normalization method for differential expression analysis of RNA-seq data, Genome Biol 11, R25–9 (2010).


5. M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, G. K. Smyth, limma powers differential expression analyses for RNA-sequencing and microarray studies, Nucleic Acids Res. 43, e47 (2015).


6. C. W. Law, Y. Chen, W. Shi, G. K. Smyth, voom: Precision weights unlock linear model analysis tools for RNA-seq read counts, 15, R29 (2014).


7. C. W. Law, M. Alhamdoosh, S. Su, G. K. Smyth, M. E. Ritchie, RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR, F1000Res 5, 1408–29 (2016).

Contact Information


Please reach out with any questions.

Michael Askenase, email: michael.askenase@yale.edu
Victor Avram, email: victor.avram@yale.edu
Lauren Sansing, email: lauren.sansing@yale.edu