Introduction

This project analyses data obtained from Greene et al., 2024

Citation: Greene, C., Connolly, R., Brennan, D., Laffan, A., O’Keeffe, E., Zaporojan, L., O’Callaghan, J., Thomson, B., Connolly, E., Argue, R., Meaney, J. F. M., Martin-Loeches, I., Long, A., Cheallaigh, C. N., Conlon, N., Doherty, C. P., & Campbell, M. (2024). Blood-brain barrier disruption and sustained systemic inflammation in individuals with long COVID-associated cognitive impairment. Nature neuroscience, 27(3), 421–432. https://doi.org/10.1038/s41593-024-01576-9

GEO accession: GSE251849

This paper aimed to identify if long COVID-associated brain fog could be attributed to blood-brain barrier (BBB) dysfunction. Using a combination of magnetic resonance imaging techniques and molecular techniques, the researchers identified that BBB dysfunction and systemic inflammation are uniquely present in patients with long COVID-associated brain fog. Greene et al., used transcriptomic analysis to test if there are any unique gene expression changes in patients with long COVID-associated brain fog. Peripheral blood mononuclear cells (PBMCs) were collected and isolated from four cohorts: unaffected (n=7), recovered (n=5), long COVID without brain fog (n=6), and long COVID with brain fog (n=5). RNA-seq was run on the NovaSeq 6000 (Illumina) in 100 bp paired-end reads, with 20 million reads per sample.

Green Trail: The following analysis was performed using RNA-seq data from the long COVID with brain fog (n=3) and long COVID without brain fog cohorts (n=5). The goal of this analysis is to replicate Figure 6 c, d, and j-k.

Download Required Libraries

The following libraries are necessary for data analysis

library(knitr)
library(DESeq2) 
library(RColorBrewer)
library(ggplot2)
library(tidyverse)
library(ggalt)
library(EnhancedVolcano)
library(clusterProfiler)
library(cowplot)
library(patchwork)
library(ggplotify)
library(ggpubr)
library(genekitr)

Input annotation file

The csv file is the input annotation file containing sample name, condition, filename, and replicate number.

sampleTable <- read.csv("Metafile Final Project MMG 3320.csv", header = T)
sampleTable <- sampleTable[sampleTable$filename != "", ]
head(sampleTable)

##    Sample.Name                      filename  condition replicate
## 1 long_covid_1 SRR27321853.gene_id.count.txt long_covid         1
## 2 long_covid_2 SRR27321851.gene_id.count.txt long_covid         2
## 3 long_covid_3 SRR27321850.gene_id.count.txt long_covid         3
## 4 long_covid_4 SRR27321848.gene_id.count.txt long_covid         4
## 5 long_covid_5 SRR27321849.gene_id.count.txt long_covid         5
## 6  brain_fog_1 SRR27321852.gene_id.count.txt  brain_fog         1

convert the table into a data frame object.

#Convert to data frame 
sampleTable <- data.frame(sampleTable)
head(sampleTable)

##    Sample.Name                      filename  condition replicate
## 1 long_covid_1 SRR27321853.gene_id.count.txt long_covid         1
## 2 long_covid_2 SRR27321851.gene_id.count.txt long_covid         2
## 3 long_covid_3 SRR27321850.gene_id.count.txt long_covid         3
## 4 long_covid_4 SRR27321848.gene_id.count.txt long_covid         4
## 5 long_covid_5 SRR27321849.gene_id.count.txt long_covid         5
## 6  brain_fog_1 SRR27321852.gene_id.count.txt  brain_fog         1

Specify the condition column as a factor

#specify as factor 
condition <- factor(sampleTable$condition)

Running DESeq2 with DESeqDataSet

Create DESeq object with read counts files generated from htseq and the sampleTable annotation file information. Counts files inputed from htseq are located in the counts folder. Include condition in design formula as the the distinguishing parameter. Pre-filter the DESeq object (dds) to remove any insignificant rows that have less than 10 reads. Create a new object with this filtering applied.

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                  directory = "counts/",
                                  design = ~ condition)

dds

## class: DESeqDataSet 
## dim: 78932 8 
## metadata(1): version
## assays(1): counts
## rownames(78932): ENSG00000000003 ENSG00000000005 ... ENSG00000310556
##   ENSG00000310557
## rowData names(0):
## colnames(8): long_covid_1 long_covid_2 ... brain_fog_2 brain_fog_3
## colData names(2): condition replicate

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Specify Reference Condition as long_covid. This means that the long_covid no brain fog condition is considered the control condition

dds$condition <- relevel(dds$condition, ref = "long_covid")

Perform Differential expression analysis using DESeq() and extract results

dds <- DESeq(dds)
res <- results(dds, contrast=c('condition', 'brain_fog', 'long_covid')) 
head(res)

## log2 fold change (MLE): condition brain_fog vs long_covid 
## Wald test p-value: condition brain fog vs long covid 
## DataFrame with 6 rows and 6 columns
##                    baseMean log2FoldChange     lfcSE      stat      pvalue
##                   <numeric>      <numeric> <numeric> <numeric>   <numeric>
## ENSG00000000003     7.54109       0.151557  0.905058  0.167455 0.867011939
## ENSG00000000419   181.00668       0.710632  0.327200  2.171858 0.029866353
## ENSG00000000457   142.81802      -0.143892  0.296233 -0.485738 0.627153129
## ENSG00000000460    49.73487      -0.864720  0.482947 -1.790508 0.073372345
## ENSG00000000938 10116.54267      -0.762223  0.199535 -3.820004 0.000133449
## ENSG00000000971     2.51204      -0.403052  1.918004 -0.210142 0.833557221
##                       padj
##                  <numeric>
## ENSG00000000003 0.95658102
## ENSG00000000419 0.19832296
## ENSG00000000457 0.85422844
## ENSG00000000460 0.32316760
## ENSG00000000938 0.00522579
## ENSG00000000971         NA

Shrink log fold changes of genes with low mean and high dispersion to reduce noise. This filters the dataset to only retain DEGs. Sort results by adjusted p-value from most significant to least significant and create new result dataframe. Count significant genes that have an adjusted p-value of < 0.05

resLfc <- lfcShrink(dds, coef = "condition_brain_fog_vs_long_covid", res = res)
result <- resLfc[order(resLfc$padj), ]
#count signifigant genes
table(result$padj<0.05)

## 
## FALSE  TRUE 
## 14931  1046

1046 genes are significantly differently expressed while 14,931 are not. The original paper found 1,078 DEGs between long Covid with brain fog and long COVID without brain fog cohorts.

Merge the results object with the normalized counts and output results into new csv file

resdata <- merge(as.data.frame(result), 
                 as.data.frame(counts(dds, normalized=TRUE)), 
                 by="row.names", 
                 sort=FALSE)
#specify first column as Gene IDs
names(resdata)[1] <- "Gene_id"

head(resdata)

##           Gene_id  baseMean log2FoldChange     lfcSE       pvalue         padj
## 1 ENSG00000095794  704.3241       3.869104 0.4183854 2.760304e-21 4.410138e-17
## 2 ENSG00000163154  699.2050      -2.053365 0.2308901 2.901875e-20 2.318163e-16
## 3 ENSG00000277632  202.7253       3.060213 0.3689712 7.889678e-18 4.201779e-14
## 4 ENSG00000132002 3789.0306       1.904392 0.2315789 1.687022e-17 6.738389e-14
## 5 ENSG00000185022  247.6471       3.298600 0.4111445 1.095874e-16 3.501755e-13
## 6 ENSG00000197019 1022.1591       3.059639 0.3832102 1.357532e-16 3.614882e-13
##   long_covid_1 long_covid_2 long_covid_3 long_covid_4 long_covid_5 brain_fog_1
## 1    112.51917    119.41906    123.68325     84.11062    110.78374   2320.5623
## 2    804.18109    944.30304    951.34138   1086.83245   1128.22157    190.6142
## 3     70.32448     44.62278     70.48615     24.66070     46.97230    564.2560
## 4   1889.66010   1909.00506   2049.41809   1517.07367   1760.13201   9677.7024
## 5     89.35345     45.89772     66.93968     37.43143     34.56453    959.7094
## 6    349.14035    196.34024    309.87308    201.24896    242.83795   4037.9869
##   brain_fog_2 brain_fog_3
## 1    423.5553   2339.9591
## 2    206.6123    281.5335
## 3    495.8696    304.6101
## 4   6084.7327   5424.5206
## 5    371.9022    375.3781
## 6   1053.7228   1786.1226

#write results to csv file
write.csv(resdata, 
          file="brain_fog_vs_long_covid_unfiltered_matrix.csv", 
          row.names = FALSE)

Specify upregulated and downregulated genes using two log2 fold change thresholds: 1 and 0.5. The original paper used a log2 fold change threshold of 0.58.

#Get DEG up/down lists using Lo2FoldChange of 1
resdata_log2fc1 <- as.data.frame(result) |>  
  dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05) |> 
  tibble::rownames_to_column("Gene_id")
#edit  results dataframe to specify upregulated ad downregulated genes
#Add a column to dataframe to specify up or down regulated
resdata_log2fc1$diffexpressed <- "Unchanged"
#if log2FC > 1 and padj < 0.05 set as upregulated
resdata_log2fc1$diffexpressed[resdata_log2fc1$padj < 0.05 & resdata_log2fc1$log2FoldChange > 1] <- "Upregulated"
#if log2FC > -1 and padj < 0.05 set as downregulated
resdata_log2fc1$diffexpressed[resdata_log2fc1$padj < 0.05 & resdata_log2fc1$log2FoldChange < -1] <- "Downregulated"

#write results
write.csv(resdata_log2fc1, 
          "brain_fog_vs_Long_covid_log2fc1_fdr05_DEGlist.csv", 
          row.names = FALSE)

#Get DEG up/down lists using Lo2FoldChange of 0.5
resdata_log2fc0.5 <- as.data.frame(result) |>  
  dplyr::filter(abs(log2FoldChange) > 0.5 & padj < 0.05) |> 
  tibble::rownames_to_column("Gene_id")
resdata_log2fc0.5$diffexpressed <- "Unchanged"
resdata_log2fc0.5$diffexpressed[resdata_log2fc0.5$padj < 0.05 & resdata_log2fc0.5$log2FoldChange > 0.5] <- "Upregulated"
resdata_log2fc0.5$diffexpressed[resdata_log2fc0.5$padj < 0.05 & resdata_log2fc0.5$log2FoldChange < -0.5] <- "Downregulated"

#write results
write.csv(resdata_log2fc0.5, 
          "brain_fog_vs_Long_covid_log2fc0.5_fdr05_DEGlist.csv", 
          row.names = FALSE)

There are more DEGs when using a log2 fold change threshold of 0.5 compared to a threshold of 1. A threshold of 1 will identify significant DEGs with a 2-fold change in expression. A threshold of 0.5 will identify more subtle changes which may be better suited for this data set as both cohorts have long COVID symptoms, with the only distinguishing factor being brain fog possibly indicating subtle biological differences.

#Print number of upregulated and downregulated genes for each log2fold change threshold

#create table for fc DEGs
counts_fc1 <- table(resdata_log2fc1$diffexpressed)
counts_fc05 <- table(resdata_log2fc0.5$diffexpressed)

# Combine into a data frame
counts_DEGs <- data.frame(
  Threshold = c("log2FC ≥ 1", "log2FC ≥ 0.5"),
  Downregulated = c(counts_fc1["Downregulated"], counts_fc05["Downregulated"]),
  Upregulated = c(counts_fc1["Upregulated"], counts_fc05["Upregulated"]))

# output as table
kable(counts_DEGs, caption = "Number of Upregulated and Downregulated Genes")

Number of Upregulated and Downregulated Genes
Threshold	Downregulated	Upregulated
log2FC ≥ 1	147	406
log2FC ≥ 0.5	312	683

Data Visualization

PCA plot replicating figure 6c

PCA plot is replicating figure 6c. The title was changed from “PCA scores” to “PCA: Long COVID Associated Brain Fog vs. No Brain Fog” in order to enhance clarity. The axis of the figure generated below differs from Figure 6c because there are less data points in the brain fog condition, and the axis should accommodate all data points. Formatting from Figure 6c was replicated including color, shape, and ellipse around data points.

#perform variance stabilizing transformation and generate PCA plot
vstcounts <- vst(dds, blind = FALSE)
pcaData <- plotPCA(vstcounts, intgroup=c("condition"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
pca <- ggplot(pcaData, aes(PC1, PC2, color=condition, shape = condition)) +
  
 #Specify formatting to replicate Figure. 6 c
  geom_encircle(aes(group = condition, fill = condition), alpha = 0.2, expand = 0.01, show.legend = FALSE) +
  geom_point(size=4, alpha=0.8) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() + 
  ggtitle("PCA: Long COVID Associated Brain Fog vs. No Brain Fog") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  # Manually set color and shape to match figure 6.c
  scale_color_manual(values = c("long_covid" = "blue", "brain_fog" = "red"), labels = c("long_covid"="long COVID no brain fog", "brain_fog"="long COVID Brain Fog")) +
  scale_fill_manual(values = c('long_covid' = 'blue', 'brain_fog' = 'red')) +
  scale_shape_manual(values = c("long_covid" = 17, "brain_fog" = 16),
                    labels = c("long_covid"="long COVID no brain fog", "brain_fog"="long COVID Brain Fog")) +
  theme_classic(base_size = 10) + 
  theme(legend.position = "right", legend.title = element_blank(),
        plot.title = element_text(hjust=0.5, face = "bold")) +
  theme(panel.border = element_rect(color = "black", fill = NA, size = 0.5)) # Outline the plot area
#save and output plot 
ggsave("PCAplot.png", width = 7, height = 5, dpi = 300)
pca

Figure 1. PCA plot of RNA-seq data from long COVID cohorts with brain fog and without brain fog

GeneID Conversion

convert Ensemble ID to Entrez Gene and Gene Symbol for downstream visualizations

# Remove the version from Ensembl gene IDs
resdata$Ensembl_id <- sub("\\.\\d+$", "", resdata$Gene_id)
# Extract the cleaned Ensembl IDs
ids <- resdata$Ensembl_id
# Convert to gene symbols and Entrez IDs
gene_symbol_map <- transId(ids, 
                           transTo = c("symbol", "entrez"), 
                           org = "human",   # switch to human
                           unique = TRUE)   # ensures one-to-one mapping
#include new gene IDs in results data frame
resdata <- merge(resdata, 
                  gene_symbol_map, 
                  by.x = "Ensembl_id", 
                  by.y = "input_id")
#str(resdata)

#write results to csv file
write.csv(resdata,
          "resdata_full_matrix.csv",
          row.names = FALSE)

Enhanced volcano plot replicating figure 6d using two log2Fold change thresholds: 1 and 0.5

Figure 2 and 3 show enhanced volcano plots depicting DEGs (red circles) with a log2 fold change threshold of 1 for Figure 2 and a log2 fold change threshold of 0.5 for figure 3. Statistically significant DEGs have an adjusted p-value < 0.05. The genes highlighted in Figure 6d were also highlighted in Figure 2 and 3 below.

# make an enhanced volcano similar to the one in Figure 6 d with FC cutoff of 1, highlighting the same genes as figure 6d
EnhancedVolcano(resdata,
                lab = as.character(resdata$symbol),
                x = 'log2FoldChange',
                y = 'padj',
                subtitle = NULL,
                selectLab = c('DHRS9', 'MED20', 'WAKMAR2', 'SNX1', 'ZNF394', 'PIK3R1', 'SNHG5','PDE4D', 'SNAI1'),
                title = "Brain Fog vs No Brain Fog Long COVID",
                FCcutoff = 1,
                pCutoff = 0.05,
                labSize = 4, 
                axisLabSize = 12, 
                col=c('#505050', 'grey', '#3B5998', 'red3'),
                labCol = 'black',
                labFace = 'bold',
                boxedLabels = TRUE,
                legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
                legendPosition = 'right',
                legendLabSize = 16,
                legendIconSize = 5.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black', 
                gridlines.major = FALSE,
                gridlines.minor = FALSE,
                ylim = c(0,6),
                xlim = c(-6,6))

Figure 2. Enhanced volcano plot showing the statistically significant (adjusted p-value < 0.05) differentially expressed (log2FC > 1) genes in the dataset. Red circles indicate significantly upregulated and downregulated genes with the right side representing upregulated in the brain fog cohort compared to the non-brain fog long COVID cohort. There are 341 upregulated genes and 137 downregulated genes.

ggsave("VolcanoPlotFC1.png",width = 7, height = 5, dpi = 300)

# make an enhanced volcano similar to the one in Figure 6 d with FC cutoff of 0.5
EnhancedVolcano(resdata,
                lab = as.character(resdata$symbol),
                x = 'log2FoldChange',
                y = 'padj', 
                subtitle = NULL,
                selectLab = c('DHRS9', 'MED20', 'WAKMAR2', 'SNX1', 'ZNF394', 'PIK3R1', 'SNHG5','PDE4D', 'SNAI1'),
                title = "Brain Fog vs No Brain Fog Long COVID",
                FCcutoff = 0.5,
                pCutoff = 0.05,
                labSize = 4, 
                axisLabSize = 12, 
                col=c('#505050', 'grey', '#3B5998', 'red3'),
                labCol = 'black',
                labFace = 'bold',
                boxedLabels = TRUE,
                legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
                legendPosition = 'right',
                legendLabSize = 16,
                legendIconSize = 5.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black', 
                gridlines.major = FALSE,
                gridlines.minor = FALSE,
                ylim = c(0,6),
                xlim = c(-6,6))

Figure 3. Enhanced volcano plot showing the statistically significant (adjusted p-value < 0.05) differentially expressed (log2FC > 0.5) genes in the dataset. Red circles indicate significantly upregulated and downregulated genes with the right side representing upregulated in the brain fog cohort compared to the non-brain fog long COVID cohort. There are 615 upregulated genes and 301 downregulated genes.

ggsave("VolcanoPlotFC0.5.png",width = 7, height = 5, dpi = 300)

Bar plots replicating Figure 6 j-k. Gene count bar plots of circadian gene regulatory network components show upregulation in the long COVID with brain fog cohort.

Figure 4 is replicating Fig. 6 j-k, with changed color scheme to enhance clarity between conditions. These plots show increased expression of PER1, NR1D2, and RORA in long COVID with brain fog cohort compared to the non-brain fog long COVID control cohort.

#plot PER1
plot_PER1 <- plotCounts(dds, 
                       gene= "ENSG00000179094", 
                       intgroup = c("condition") ,
                       returnData = TRUE) |>
                         
  ggplot(aes(x = condition, 
             y = count, 
             fill = condition)) + 
  geom_boxplot(outlier.shape = NA, color = "black") + 
  geom_jitter(shape = 16, position = position_jitter(0.02)) +
  stat_boxplot(geom = "errorbar", width = 0.3, color = "black") + 
  labs(title = "PER1 Expression", 
       y = "Normalized Counts", 
       x = "Condition") + 
  theme_grey() + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 8),
        axis.title = element_text(size = 14)) +
  scale_fill_manual(values = c("skyblue", "salmon"))
  


#plot NR1D2
plot_NR1D2 <- plotCounts(dds, 
                       gene= "ENSG00000174738", 
                       intgroup = c("condition") ,
                       returnData = TRUE) |>
                         
  ggplot(aes(x = condition, 
             y = count, 
             fill = condition)) + 
  geom_boxplot(outlier.shape = NA, color = "black") + 
  geom_jitter(shape = 16, position = position_jitter(0.02)) +
  stat_boxplot(geom = "errorbar", width = 0.3, color = "black") + 
  labs(title = "NR1D2 Expression", 
       y = "Normalized Counts", 
       x = "Condition") + 
  theme_grey() + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 8),
        axis.title = element_text(size = 14)) +
  scale_fill_manual(values = c("skyblue", "salmon"))
  


#plot RORA
plot_RORA <- plotCounts(dds, 
                       gene= "ENSG00000069667", 
                       intgroup = c("condition") ,
                       returnData = TRUE) |>
                         
  ggplot(aes(x = condition, 
             y = count, 
             fill = condition)) + 
  geom_boxplot(outlier.shape = NA, color = "black") + 
  geom_jitter(shape = 16, position = position_jitter(0.02)) +
  stat_boxplot(geom = "errorbar", width = 0.3, color = "black") + 
  labs(title = "RORA Expression", 
       y = "Normalized Counts", 
       x = "Condition") + 
  theme_grey() + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 8),
        axis.title = element_text(size = 14)) +
  scale_fill_manual(values = c("skyblue", "salmon"))
  
#plot each gene bar plot together
#Show output as a grid
boxplot_grid <- plot_grid(plot_PER1,
                          plot_NR1D2,
                          plot_RORA,
                          ncol = 2,
                          rel_widths = c(1,1,1),
                          labels = NA)

# Save the combined plot
ggsave("Final_project_boxplots.png", boxplot_grid,width = 15, height = 6, units = "in" )
boxplot_grid

Figure 4. Gene count bar plots of PER1, NR1D2, and RORA between non-brain fog and brain fog long COVID cohorts. The bar plots show the normalized gene counts across the two conditions. PER1, NR1D2, and RORA are all upregulated in the brain fog conditions.

Conclusion

During this analysis, I successfully replicated a figure panel from Greene et al., 2024 including a PCA plot, volcano plot, and bar plot using raw RNA-seq data from the study. While my analysis was successful, the major roadblock I faced was the loss of two replicates in the brain fog cohort during HTseq-counts generation. I was already working with a small sample size (n=5), so this loss reduced the statistical significance of my downstream analysis.

Despite this roadblock, the PCA plot still showed clear separation between the brain fog and no brain fog long COVID cohorts, replicating what was found in the original study. This indicates that there may be differences in gene expression that contribute to the brain fog symptoms involved in long COVID.

The two enhanced volcano plots revealed several significantly upregulated genes in the brain fog group, some of which are involved in coagulation, neuroinflammation, and circadian rhythm regulation. The two volcano plots used a different log2 fold change thresholds (1 and 0.5) which highlighted how different parameters in the data analysis and visualization step can impact the results. Using a lower threshold results in more statistically significant DEGs genes which can increase sensitivity and identify smaller changes, with the risk of false positives. A higher threshold (log2fc > 1) identifies genes with larger differences and improves specificity by reducing false positives, but reduces sensitivity. For the dataset used in this analysis I would favor a lower threshold around 0.5 because the biological pathways involved in long COVID associated brain fog might have subtle expression changes that result in a biologically relevant impact. This is because biological processes like inflammation, BBB dysfunction, coagulation, and circadian rhythm regulation are all complex biological networks than can be impacted by subtle changes.

The bar plots highlighted three genes (PER1, NR1D2, and RORA) which were all upregulated in the brain fog cohort. These three genes are all involved in cognitive function and circadian rhythm regulation, yet to my surprise were not highlighted in the original study’s discussion, as the authors were more focused on identifying BBB dysfunction associated genes. Disruption of these genes or overexpression may result in circadian dysregulation or neuroinflammation which would likely contribute to brain fog and neurological sequelae of COVID-19.

One limitation of this study and the analysis performed with the RNA-seq data is the small sample size. With a sample size of 5 or less samples per replicate, all results should be interpreted with caution, especially because conditions like long COVID are complex and samples likely vary greatly between individuals. Another important limitation of this study is that there is ambiguity in the clinical definition of long COVID and long COVID with brain fog. These conditions are highly subjective and cannot be standardized across samples. There are a number of factors (age, sex, comorbidities, medication use) that may impact the results and downstream analysis of this dataset. Brain fog status was self reported and the original study found an association between brain fog, age, sum of comorbidities, and duration of hospital stay, indicating that these factors contribute to long COVID brain fog and could explain some of the transcriptional changes. Lastly, the long COVID cohort size was small and interestingly all patients in the long COVID brain fog (+) cohort were female. It is possible that the differences in gene expression between these two groups is not due to long COVID associated brain fog and can be attributed to factors like age, sex, and comorbidities.

The authors future directions are evaluating the long term impact of SARS-CoV-2 spike protein on cerebrovascular function and evaluating BBB permeability over time. Since this study is evaluating long COVID, longitudinal changes in BBB function would be important to evaluate in order to determine how long BBB dysfunction and inflammation associated with COVID-19 usually occurs. I would be interested in evaluating a larger long COVID dataset with more specific clinical metrics of long COVID and brain fog becuase these conditions can be unclear. It would be interesting to include cognitive tests in addition to self-reported brain fog and evaluate transcriptional changes across cognitive impairment severity in order better understand gene expression changes and brain fog associated with long COVID.

sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Monterey 12.0.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] fstcore_0.10.0              genekitr_1.2.8             
##  [3] ggpubr_0.6.0                ggplotify_0.1.2            
##  [5] patchwork_1.3.0             cowplot_1.1.3              
##  [7] clusterProfiler_4.14.6      EnhancedVolcano_1.24.0     
##  [9] ggrepel_0.9.6               ggalt_0.4.0                
## [11] lubridate_1.9.4             forcats_1.0.0              
## [13] stringr_1.5.1               dplyr_1.1.4                
## [15] purrr_1.0.4                 readr_2.1.5                
## [17] tidyr_1.3.1                 tibble_3.2.1               
## [19] tidyverse_2.0.0             ggplot2_3.5.2              
## [21] RColorBrewer_1.1-3          DESeq2_1.46.0              
## [23] SummarizedExperiment_1.36.0 Biobase_2.66.0             
## [25] MatrixGenerics_1.18.1       matrixStats_1.5.0          
## [27] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
## [29] IRanges_2.40.1              S4Vectors_0.44.0           
## [31] BiocGenerics_0.52.0         knitr_1.50                 
## 
## loaded via a namespace (and not attached):
##   [1] later_1.4.2             splines_4.4.2           bitops_1.0-9           
##   [4] urltools_1.7.3          R.oo_1.27.0             triebeard_0.4.1        
##   [7] polyclip_1.10-7         lifecycle_1.0.4         rstatix_0.7.2          
##  [10] lattice_0.22-7          MASS_7.3-65             backports_1.5.0        
##  [13] magrittr_2.0.3          openxlsx_4.2.8          sass_0.4.10            
##  [16] rmarkdown_2.29          remotes_2.5.0           jquerylib_0.1.4        
##  [19] yaml_2.3.10             httpuv_1.6.16           ggtangle_0.0.6         
##  [22] zip_2.3.2               sessioninfo_1.2.3       pkgbuild_1.4.7         
##  [25] ggvenn_0.1.10           DBI_1.2.3               pkgload_1.4.0          
##  [28] maps_3.4.2.1            abind_1.4-8             zlibbioc_1.52.0        
##  [31] R.utils_2.13.0          RCurl_1.98-1.17         ggraph_2.2.1           
##  [34] yulab.utils_0.2.0       tweenr_2.0.3            GenomeInfoDbData_1.2.13
##  [37] enrichplot_1.26.6       tidytree_0.4.6          codetools_0.2-20       
##  [40] DelayedArray_0.32.0     DOSE_4.0.1              xml2_1.3.8             
##  [43] ggforce_0.4.2           tidyselect_1.2.1        aplot_0.2.5            
##  [46] UCSC.utils_1.2.0        farver_2.1.2            viridis_0.6.5          
##  [49] ash_1.0-15              jsonlite_2.0.0          fst_0.9.8              
##  [52] ellipsis_0.3.2          tidygraph_1.3.1         Formula_1.2-5          
##  [55] systemfonts_1.2.2       bbmle_1.0.25.1          tools_4.4.2            
##  [58] progress_1.2.3          ragg_1.4.0              treeio_1.30.0          
##  [61] Rcpp_1.0.14             glue_1.8.0              gridExtra_2.3          
##  [64] Rttf2pt1_1.3.12         SparseArray_1.6.2       xfun_0.52              
##  [67] usethis_3.1.0           qvalue_2.38.0           numDeriv_2016.8-1.1    
##  [70] withr_3.0.2             fastmap_1.2.0           digest_0.6.37          
##  [73] mime_0.13               timechange_0.3.0        R6_2.6.1               
##  [76] gridGraphics_0.5-1      textshaping_1.0.0       colorspace_2.1-1       
##  [79] GO.db_3.20.0            RSQLite_2.3.9           R.methodsS3_1.8.2      
##  [82] generics_0.1.3          data.table_1.17.0       htmlwidgets_1.6.4      
##  [85] prettyunits_1.2.0       graphlayouts_1.2.2      httr_1.4.7             
##  [88] S4Arrays_1.6.0          pkgconfig_2.0.3         gtable_0.3.6           
##  [91] blob_1.2.4              XVector_0.46.0          htmltools_0.5.8.1      
##  [94] carData_3.0-5           geneset_0.2.7           profvis_0.4.0          
##  [97] fgsea_1.32.4            scales_1.3.0            png_0.1-8              
## [100] ggfun_0.1.8             rstudioapi_0.17.1       tzdb_0.5.0             
## [103] reshape2_1.4.4          coda_0.19-4.1           nlme_3.1-168           
## [106] bdsmatrix_1.3-7         cachem_1.1.0            KernSmooth_2.23-26     
## [109] parallel_4.4.2          miniUI_0.1.2            extrafont_0.19         
## [112] AnnotationDbi_1.68.0    apeglm_1.28.0           pillar_1.10.2          
## [115] grid_4.4.2              vctrs_0.6.5             urlchecker_1.0.1       
## [118] promises_1.3.2          car_3.1-3               xtable_1.8-4           
## [121] extrafontdb_1.0         evaluate_1.0.3          mvtnorm_1.3-3          
## [124] cli_3.6.4               locfit_1.5-9.12         compiler_4.4.2         
## [127] rlang_1.1.6             crayon_1.5.3            ggsignif_0.6.4         
## [130] labeling_0.4.3          emdbook_1.3.13          plyr_1.8.9             
## [133] fs_1.6.6                stringi_1.8.7           viridisLite_0.4.2      
## [136] BiocParallel_1.40.2     munsell_0.5.1           Biostrings_2.74.1      
## [139] lazyeval_0.2.2          devtools_2.4.5          proj4_1.0-15           
## [142] GOSemSim_2.32.0         Matrix_1.7-3            europepmc_0.4.3        
## [145] hms_1.1.3               bit64_4.6.0-1           shiny_1.10.0           
## [148] KEGGREST_1.46.0         igraph_2.1.4            broom_1.0.8            
## [151] memoise_2.0.1           bslib_0.9.0             ggtree_3.14.0          
## [154] fastmatch_1.1-6         bit_4.6.0               ape_5.8-1              
## [157] gson_0.1.0

RNA-seq Analysis of Long COVID: Brain Fog and Non-Brain Fog Comparisons

Samantha Tafrate

April 30th, 2025