Relationships between genome-wide R-loop distribution and classes of recurrent DNA breaks in neural stem/progenitor cells – Scientific Reports

Genome-wide mapping of R-loops in NSPCs

Several classes of recurrent DSBs occur in mouse and human NSPCs, including DSB breakpoint clusters in long, transcribed and late-replicating genes and around transcriptional start sites4,6,7,8,9. To elucidate potential mechanistic factors involved in the formation of the different classes of recurrent DSBs, we assessed the genomic features of regions surrounding breakpoint junctions in NSPCs. We noted that the promoter regions—defined as regions ± 2 kb of the TSS—of active genes with HTGTS breakpoint junctions6,7,9 showed a significantly higher content of guanine/cytosine (GC) nucleotides (Fig. 1A). This prompted us to consider the role of R-loops in the formation of DSBs in NSPCs, given that regions with high G density in the non-template strand are prone to R-loop formation16. Moreover, R-loop formation has been implicated as a cause of genomic fragility in a subset of long human genes12, suggesting a potential involvement in the formation of recurrent DSB clusters in long, neural genes (RDC-genes)4,6,7,8.

Figure 1
figure 1

Elucidation of R-loops in neural stem/progenitor cells. (A) Top, illustration of genes with DNA breakpoints in the promoter region, defined as the two kilobase (kb) region surrounding the transcription start site (TSS). Triangles illustrate HTGTS breakpoint junctions. Bottom, box-and-whiskers plot showing fractional GC content in promoter regions of all NCBI37/mm9 RefSeq genes (n = 22,735) and actively transcribed genes with HTGTS junctions within two kb of the TSS in NSPCs (n = 2332). Whiskers show minimum and maximum values; upper and lower box edges correspond to the 25th and 75th percentile; horizontal line indicates the median. P < 0.0001, Mann–Whitney U test. (B) Visualization of reads per kilobase per million (RPKM)-normalized DRIP-seq signal in input controls and DRIP samples over the indicated genomic regions. Combined signal from nine DRIP samples and matching input controls from three biological replicates is plotted. RPKM-normalized GRO-seq signal is plotted to show transcription. Location of RefSeq genes and position of DSB junctions detected by high-throughput, genome-wide, translocation sequencing (HTGTS) of aphidicolin-treated NSPCs is shown. (C) Genomic distribution of NSPC R-loop peaks across the indicated genome annotations compared to the expected genomic distribution. (D) Transcriptional status of genes with R-loop peaks in NSPCs. Data in (C) and (D) represent mean ±  S.E.M. from three independent DRIP-seq experiments performed on biological replicates. P values were determined by one-way ANOVA with Tukey’s post hoc correction for multiple comparisons. (E) Left, metaplot analysis of RPKM-normalized, raw DRIP-seq signal across active (blue, n = 15,528) and inactive (dark gray, n = 3246) genes in NSPCs reveals enrichment at TSSs, gene bodies, and transcription end sites (TESs). Right, RPKM-normalized, raw DRIP-seq signal around the TSSs of active and inactive genes in NSPCs.

To directly assess the potential role of R-loops in the formation of TSS-proximal DSBs or recurrent DSB clusters in long neural genes, we set out to elucidate the genomic landscape of R-loops in NSPCs under the same conditions of aphidicolin (APH)-induced, mild replication stress that we had previously used to identify recurrent classes of DSBs in this cell type6,9. Reliable, high-resolution mapping of R-loops has become possible by multiple approaches, including “DNA:RNA immunoprecipitation with deep sequencing” (DRIP-seq)21,22,23,24. This approach relies on the S9.6 monoclonal antibody that specifically binds RNA:DNA hybrids and allows quantitative recovery of R-loops in conjunction with the high-resolution mapping capability of next-generation sequencing21. We first validated the S9.6 antibody and DRIP approach by performing dot blots and DRIP-quantitative PCR (DRIP-qPCR) assays, using established positive and negative controls (Fig. S1A and B). Next, we performed DRIP-seq in primary NSPCs isolated from postnatal day 7 mice in the presence of mild, aphidicolin-induced replication stress as described6,9. To assess the quality of our R-loop mapping in NSPCs, we visualized raw DRIP-seq signal over the “gold standard” Rpl13a and the Ywhaz gene promoter regions21. Consistent with previous reports in human cells21, we detected robust DRIP-seq signal over these regions in mouse NSPCs (Fig. 1B). Visual comparison of raw DRIP-seq signal in the mouse orthologs of human genes known to exhibit R-loops23,24 further confirmed the quality of our DRIP-seq analysis and revealed that R-loop formation in these genes is conserved between mice and humans and across cell types (Fig. S1C). Analysis of NSPC DRIP-seq libraries generated from nine DRIP samples prepared from three independent, biological replicates (i.e., three technical repeats per each of the three biological replicates) revealed a total of 22,132 R-loop peaks. R-loop peaks covered 1.01 ± 0.45% (mean ± S.D.) of the NSPC genome, which is similar to the extent of R-loop formation reported for other cell types and species23.

Further analysis revealed that R-loop peaks in NSPCs are significantly enriched in 5′-UTRs, promoters, introns, exons, transcription termination sites, and 3′-UTRs but are depleted in intergenic regions (Fig. 1C). Overall, NSPC R-loop peaks were detected in 9020 annotated genes (RefSeq NCBI37/mm9). Next, we assessed the transcriptional activity of genes containing R-loop peaks in NSPCs. Consistent with the notion that transcription promotes R-loop formation, the vast majority (99.17%) of genes containing R-loop peaks was either transcriptionally active (GRO-seq RPKM ≥ 0.025) or showed ambiguous (GRO-seq RPKM ≥ 0.0025 to < 0.025) transcriptional activity. Only 75 (0.83%) of the 9020 genes containing R-loop peaks in NSPCs were transcriptionally inactive (GRO-seq RPKM < 0.0025) (Fig. 1D).

To gain insights into the functions of R-loops and potential relationships to genomic stability in NSPCs undergoing mild replication stress, we further assessed the genomic distribution of DRIP-seq reads. DRIP-seq signal in NSPCs was present throughout the gene bodies of actively transcribed genes and showed a robust enrichment around the transcription start sites (TSSs) and transcription end sites (TESs) of active genes (Fig. 1E). These findings are consistent with the reported distribution of R-loops in other cell types and their involvement in regulatory functions in these regions24,25,26,27,28,29 and reveal that this distribution persists under mild replication stress in NSPCs.

Comparative analysis of R-loop formation in NSPCs and embryonic stem cells

To compare our DRIP-seq results from NSPCs with published DRIP-seq data and gain insights into potential, lineage-specific features of R-loop formation, we obtained the deposited FASTQ files from DRIP-seq studies in pluripotent, mouse embryonic stem cells (ESCs)23. To enable direct comparisons, we performed all data analysis of NSPC and ESC DRIP-seq under identical bioinformatic conditions. R-loop peaks in aphidicolin-treated NSPCs and untreated ESCs showed a similar distribution across chromosomes (Fig. S2A) and displayed a similar mean R-loop peak size of around 2 kb (NSPC, 2.19 ± 0.05 kb; ESC, 1.95 ± 0.02 kb; mean ± S.E.M) (Fig. S2B). Overall, the ESC DRIP-seq data set contained a slightly higher total number of R-loop peaks (57,751) than detected in the combined NSPC DRIP-seq data, but when normalized via random down-sampling to the total peak number observed in NSPCs, ESCs and NSPCs showed similar absolute R-loop peak numbers and R-loop densities across chromosomes (Fig. S2C and D). As in aphidicolin-treated NSPCs, analysis of R-loop peak distribution in untreated ESCs revealed enrichment in 5′-UTRs, promoters, introns, exons, transcription termination sites, and 3′-UTRs, and depletion in intergenic regions (Fig. S2E).

Next, we asked if R-loops in NSPCs are associated with genes involved in specific cellular functions and processes. To this end, we determined which genes show ≥ 1 R-loop peak in both ESCs and NSPCs (“common”), or ≥ 1 R-loop peak uniquely in either cell type (“ESC unique”; “NSPC unique”) (Fig. 2A). The group of “common” R-loop genes contained 7127 genes, representing 66.98% and 84.54% of active genes with R-loops in ESC and NSPCs, respectively. 1303 genes (15.46%) were unique to NSPCs, and 3514 genes (33.02%) were unique to ESCs (Fig. 2A). Figure 2B shows examples, with mitochondrial ribosomal protein 9 (MrpS9) being actively transcribed and forming R-loops in ESCs and Pou3f3 (also known as Brain-1), a gene with roles in brain development30 and intellectual disability31 being unique to NSPCs. Core ESC transcriptional factors such as Pou5f1 and Lin28A were unique to ESCs (Fig. 2C). Pou3f2 (Brain-2), a gene involved in the establishment of neural cell lineage, neocortical development and associated with psychiatric disorders32,33 was unique to NSPCs (Fig. 2C). Notably Pou3f3/Brain-1 acts synergistically with Sox11 and Sox4 in neural development and we find that both show robust R-loop formation in NSPCs (Fig. S3A). Moreover, R-loops in Pou3f3/Brain-1 extended into the neighboring Pantr1 (Pou3f3 adjacent non-coding transcript 1) gene, which encodes a long non-coding RNA implicated in glioma development34.

Figure 2
figure 2

The landscape of R-loop formation in NSPCs suggests roles in lineage-specific processes. (A) Venn diagram showing the number of common and unique genes with R-loop peaks in ESCs and NSPCs under mild replication stress. (B) Examples of genes forming R-loops in NSPCs or in ESCs. RPKM-normalized DRIP-seq and GRO-seq signals and RefSeq genes are shown. (C) Lineage-specific genes form R-loops in ESCs and NSPCs, illustrated as in (B). (D) Left, gene ontology (GO) analysis of NSPC-specific R-loop genes. Bars show significantly enriched GO terms and are colored by P values in log base 10. The Top 20 clusters are shown. Right, network visualization of the enriched terms shown on left, colored by cluster ID. Nodes sharing the same cluster ID are close to each other.

To assess the overall implications of R-loop formation within genes in the common, ESC- unique, and NSPC-unique sets, we performed pathway and process enrichment analyses (Figs. 2D and S3B-C). Strikingly, we found that genes with unique R-loop formation in NSPCs were significantly enriched in processes related to neural development and function (Fig. 2D). In stark contrast, shared R-loop genes were enriched for general biological processes (Fig. S3B) and genes in the ESC-specific set showed enrichment of more general cellular processes, including DNA repair, cell cycle, and DNA replication (Fig. S3C). Given the association between transcription and R-loop formation, we expected that similar results would be observed when considering genes just based on their transcriptional activity, i.e., regardless of R-loop status. Analysis of GRO-seq data revealed 778 genes with unique, active transcription (GRO-seq RPKM ≥ 0.025) in NSPCs (Fig. S3D). This set of genes showed enrichment for processes related to neural development and function but less so than the set of genes with R-loops unique to NSPCs, with fewer terms clearly related to neural function and development (Figs. 2D and S3E).

Most of the differences in R-loop peaks between the two cell types are likely due to differences in transcription (Fig. 2B,C). However, some genes with similar rates of transcription in both NSPCs and ESCs show strikingly different levels of R-loops (Fig. S4). Although beyond the scope of our current study, it will be informative to elucidate why these genes show a decoupling of R-loop formation and rate of transcription. Specifically, 1034 of 1303 (79.36%) genes with R-loops only in NSPCs show a higher transcription rate in NSPCs than in ESCs. 35 genes (2.69%) of genes with NSPC-specific R-loop peaks are transcribed at similar levels (± 5% transcriptional activity as measured by GRO-seq RPKM) in ESCs, and 234 (17.96%) genes with R-loop peaks in NSPCs showed higher transcription, but no R-loop peaks, in ESCs. Similarly, of the 3514 genes with R-loop peaks specific to ESCs, 957 (27.23%) show higher transcription in NSPCs, 165 (4.70%) display similar rates of transcription in both cell lineages, and most genes with ESC-specific R-loop peaks (2392; 68.07%) show higher transcriptional activity in ESCs.

Overall, our comparative analysis of R-loop signal in NSPCs and ESCs points to potential, lineage-specific functions of R-loops and suggests that perturbation of R-loop formation in NSPCs may impact neural processes and development, given the association between R-loop formation and transcription of cell type-specific genes.

Factors associated with R-loop formation

Active genes containing R-loop peaks in NSPCs were significantly longer than active genes without R-loops, with an average gene length of 62.68 ± 1.39 kb (S.E.M.) and 38.7 ± 0.98 kb (S.E.M.), respectively (Fig. 3A). Notably, this difference in length persisted when we only compared genes with or without R-loops that showed a similar rate of transcription (Fig. 3B). Consistent with this observation, R-loop peak-containing genes in ESCs were longer than genes without R-loops (54.58 ± 0.98 kb vs. 36 ± 1.26 kb; mean ±  S.E.M.; Fig. S5A) and, again, this length difference persisted when comparing only transcription rate-matched genes with or without R-loops in ESCs (Fig. S5B). As a group, active genes with R-loop peaks showed a significantly higher level of transcription than active genes without R-loop peaks in both NSPCs undergoing mild replication stress (Fig. 3C) and ESCs under basal conditions (Fig. S5C). These findings indicate that in both NSPCs and ESCs, R-loop-forming genes are generally longer and more actively transcribed than genes without R-loops.

Figure 3
figure 3

Factors promoting R-loop formation in NSPCs. (A) Box-and-whiskers plot showing gene length of active genes with (n = 8430) or without (n = 7093) R-loop peaks in NSPCs and all RefSeq genes (n = 22,735), for comparison. P values were determined by one-way ANOVA with Tukey’s post hoc correction. (B) Comparison of gene length of transcription rate-matched genes (n = 3255) with or without R-loop peaks in NSPCs (P < 0.0001; ns, not significant; Mann–Whitney U test). (C) Transcriptional activity of active genes with (n = 8430) or without (n = 7093) R-loops in NSPCs. P < 0.0001, Mann–Whitney U test. (D) Pie charts showing distribution of genes with R-loop peaks in NSPCs (top) or ESCs (bottom) across the indicated classes of GC skew, showing that most genes with R-loops exhibit GC skew. (E) Metaplot analysis of RPKM-normalized raw DRIP-seq signal across the TSS, gene body, and TES of active NSPC genes with (blue, n = 7079) or without GC skew (yellow, n = 8449) in the promoter region.

For reasons we do not currently understand, genes with R-loop peaks unique to NSPCs were on average significantly longer than genes with R-loops common to both cell types and genes with R-loop peaks unique to ESCs (Fig. S5D) and showed a significantly lower rate of transcription than either group of genes with R-loops (Fig. S5D). To assess this further, we compared the length of all actively transcribed genes (GRO-seq RPKM ≥ 0.025) in NSPCs (n = 15,528) and ESCs (n = 16,683). We found that actively transcribed genes in NSPCs are, on average, longer than actively transcribed genes in ESCs (51.72 ± 0.88 kb vs. 47.85 ± 0.77 kb; mean ±  S.E.M; P < 0.01, Mann–Whitney U test). Thus, one factor contributing to the greater length of genes with NSPC-specific R-loops may be that long genes are an expression feature of neural cells35.

To further assess factors contributing to R-loop formation in NSPCs, we used a 4-state hidden-Markov model (StochHMM)25,26 to predict GC skew regions in the mouse genome. After identifying regions with GC skew, R-loop peak-containing genes in NSPCs were clustered into four skew classes (strong skew, weak skew, no skew, and reverse skew)25,26. This analysis revealed that most R-loop-forming genes show GC skew, with only a minority (7.62%) exhibiting no GC skew (Fig. 3D, Top). Genes with R-loop peaks in ESCs showed a similar distribution across the four skew classes (Fig. 3D, Bottom), suggesting that—regardless of replication stress and cell type—GC skew is a universal feature of R-loop-forming genes, which is also supported by findings in human cells25.

To determine the impact of GC skew within the promoter region (± 2 kb of TSS) on R-loop formation in NSPCs, we divided all 15,528 genes that are actively transcribed in NSPCs into two groups; one group contained genes with GC skew within the promoter region (7079 genes), whereas the other group contained genes without GC skew within the promoter region (8449 genes). We then plotted the reads per kilobase per million (RPKM)-normalized DRIP-seq signal over these two groups of genes. Strikingly, genes with GC skew within 2 kb of the TSS showed a stronger DRIP-seq signal at the TSS and over the entire gene body and TES than genes without GC skew within the promoter region (Fig. 3E). In contrast, the latter group of genes showed a robust peak at the TES (Fig. 3E). We do not know why genes without TSS-proximal GC skew show extensive R-loop signal at the TES. One potential explanation may be that these genes rely more heavily on R-loop-mediated RNAPII pausing at their 3′ end36.

Together, our results reveal that gene length and rate of transcription are factors associated with R-loop formation and that GC skew in the promoter region is a strong predictor of overall R-loop formation throughout genes in NSPCs.

Interplay between R-loop formation and DNA breakpoints in NSPCs

We previously reported that breakpoint junctions are enriched around active TSSs in NSPCs9. To evaluate a potential role of R-loops in promoting this class of DSBs, we compared the gene length-normalized R-loop peak density of active genes of average length (i.e., 5.49–25.49 kb) containing an HTGTS junctions within two kb of the TSS (“Class A”) and those of the most robust RDC-genes containing at least one R-loop peak (“Class B”), respectively (Fig. 4A). Actively transcribed NSPC genes of average gene length with TSS-proximal breakpoint junctions displayed a significantly higher R-loop peak density than RDC-genes (Fig. 4A). However, these results do not reveal whether R-loops or transcription per se contribute to the formation of TSS-proximal DSBs. To shed light on this, we compared the R-loop peak density of genes with TSS-proximal DSBs detected by HTGTS (Class A; see Fig. S6 for examples) with a set of genes matched for rate of transcription and containing at least one R-loop peak (Set A’; Fig. 4B). These sets of genes showed similar R-loop peak density (Fig. 4B), suggesting that R-loop de-regulation or processing rather than R-loop levels per se may be relevant for TSS-proximal DSB formation.

Figure 4
figure 4

R-loops form in early-replicating regions with TSS-proximal DSBs, but RDC-genes are not prone to R-loop formation in NSPCs undergoing mild replication stress. (A) NSPC genes with TSS-proximal (± 2 kb) breakpoint junctions (“A”, n = 382) show a significantly higher R-loop peak density than RDC-genes with R-loop peaks (“B”, n = 23) overall. Whiskers show minimum and maximum values; top and bottom edges of boxplots correspond to 25th and 75th percentiles, respectively; horizontal lines indicate the median. P value was determined by the Mann–Whitney U test. (B) Left, Transcriptional activity of class A (n = 382) and transcription-matched gene set A’ (n = 367); Right, Box-and-whisker plots of R-loop peak density in the two groups. Plot details are as in (A). (C) Violin plots showing the frequency distribution of replication timing ratios of all R-loop peaks (n = 22,132), Group 1–3 RDCs (n = 113), and the set of 27 RDC-genes in NSPCs. Median (blue line) and quartile lines (black) are shown. P values were determined by one-way ANOVA with Tukey’s post hoc correction. (D) Parts-of-whole graph showing fraction of RDC-gene HTGTS junctions that fall within two kb of an R-loop peak. (E) Bar graphs indicating numbers of R-loop peaks within two kb of a breakpoint junction in RDC-genes in NSPCs. (F) RPKM-normalized DRIP-seq signal over the indicated RDC-genes. Combined signal from DRIP samples and matching input controls from three biological replicates of aphidicolin-treated NSPCs is plotted. RPKM-normalized NSPC GRO-seq signal is shown to indicate transcription. RefSeq genes are shown in black. DSB junctions detected in NSPCs via HTGTS are indicated. (G) Left, zoomed-in visualization of DRIP-seq signal in RDC-genes Npas3 (top) and Magi2 (bottom), as in (F). Grey rectangles indicate regions analyzed by DRIP-qPCR. Right, DRIP-qPCR analysis using primers located in the regions shown on left in Npas3 and Magi2 [with (R1) or without (R2, negative control) R-loop peak signal]. Where indicated, samples were treated with RNase H (RH) prior to DRIP. Treatment with RNase H significantly suppressed the DRIP-qPCR signal, consistent with R-loop formation in the tested regions. DRIP-qPCR signal intensity (mean ±  S.E.M) shows fold enrichment over the Snrpn negative control region. P values were determined by two-tailed, unpaired t test; ns, not significant. (H) RDC-genes with ≥ 10 R-loop peaks (n = 5) show a significantly earlier replication timing than RDC-genes with ≤ 1 R-loop peak (n = 8). Violin plots show the frequency distribution of replication timing ratios in the two subsets of RDC-genes. Median (blue line) and quartile lines (black) are shown. P value was determined by the Mann–Whitney U test.

To further consider a potential involvement of R-loop formation in the various, recurrent classes of DSBs in NSPCs, we next examined the replication timing of R-loop peaks. R-loop peaks in NSPCs undergoing mild replication stress were present in early-replicating regions and showed, on average, a significantly earlier replication timing than Group 1–3 RDCs or the most robust 27 RDC-genes6,7 (Fig. 4C). These findings were corroborated in ESCs, where R-loop peaks showed a significantly earlier replication timing than the set of RDC candidates in ESCs8 (Fig. S7A). Notably, the replication timing of R-loop peaks in aphidicolin-treated NSPCs and untreated ESCs did not differ significantly (Fig. S7B). These findings suggest that R-loop peaks preferentially occur in early-replicating regions of the genome in these two types of stem cells.

Next, we examined the formation of R-loops in the 27 RDC-genes6. To this end, we determined the number of R-loop peaks within two kb of HTGTS breakpoint junctions (Fig. 4D). Strikingly, within the most robust 27 RDC-genes, only 98 out of 1871 breakpoint regions (5.24%) contained an R-loop peak, whereas the vast majority (94.76%) of all RDC-gene breakpoint regions did not show R-loop formation (Fig. 4D). To reveal potential differences in R-loop formation among the 27 RDC-genes, we determined the number of R-loop peaks within two kb of HTGTS breakpoints in each RDC-gene. Whereas some RDC-genes contained few to no R-loop peaks within two kb of an HTGTS junction, we noticed a range of R-loop formation, with Npas3, Ctnnd2, and Cdh13 containing the most R-loop peaks (Fig. 4E,F). DRIP-qPCR analysis confirmed the formation of R-loops (Fig. 4G).

RDCs in long neural genes tend to occur in large introns. Given the extensive splicing of transcripts of RDC-genes such as Nrxn1 and Nrxn3, we hypothesized that co-transcriptional splicing of such large introns may make these genes prone to R-loop formation via reannealing of the nascent transcripts to the DNA. To our surprise, however, neither Nrxn1 nor Nrxn3 showed extensive R-loop formation (Fig. 4E), suggesting that RDCs in these long neural genes are not associated with a propensity for R-loop formation, even in the presence of mild replication stress. In the latter context, we had hypothesized that late replication timing would promote R-loop formation via transcription/replication collisions. However, RDC-genes with the highest number of R-loop peaks showed significantly earlier replication timing than RDC-genes with the lowest number of R-loop peaks (Fig. 4H), suggesting that R-loops may contribute to DSBs in some RDC-genes. But surprisingly, we did not find abundant R-loop formation in the RDC-genes with the highest DSB density, indicating that R-loops are not a major driver of RDC formation in these long, late-replicating genes under mild replication stress conditions.

Overall, our investigation of R-loop distribution in NSPCs under the same mild replication stress conditions under which recurrent classes of DSBs have been identified supports the notion that TSS-associated DSBs and DSBs in the gene bodies of long, transcribed neural genes are caused by different mechanisms, with the former class potentially being affected by processes related to R-loop formation and processing, consistent with findings in other cell types24,37,38.

#Relationships #genomewide #Rloop #distribution #classes #recurrent #DNA #breaks #neural #stemprogenitor #cells #Scientific #Reports

Leave a Comment

Your email address will not be published.