The Cross-Species Transcriptomic Similarity Analysis R package (CrossTx) provides a comprehensive and user-friendly framework for evaluating transcriptional similarities between mouse cancer models and human cancer samples. Designed to streamline cross-species transcriptomic analysis, CrossTx enables rapid, multidimensional exploration with minimal preprocessing—simply input standardized files following the provided guidelines.
The package integrates a wide array of functionalities, including differential expression gene integration and visualization, graphical display of PCA, transcriptional similarity scoring (TROM scores), immune infiltration profiling, and functional enrichment analysis (GO terms). Each function is highly customizable through flexible parameter settings, allowing researchers to tailor the analysis to specific experimental needs.
With CrossTx, users can rigorously assess how closely animal models recapitulate the molecular features of human cancers, thereby facilitating translational research at the transcriptomic level.
You can install the package directly from GitHub:
# Install devtools if necessary
if (!require("devtools")) install.packages("devtools")
# Install the package
devtools::install_github("wangdian-PKU/CrossTx")
We strongly recommend that all user-provided input files be in .tsv format.
The merge_DEG_datasets() function combines differential gene expression (DEG) results from TCGA (human) and multiple mouse cancer models into a unified, tidy data frame. The merged data can then be used to draw DEG volcano plots downstream.
This function requires DEG result files from TCGA and mouse models in .tsv or .csv format. They can be derived from the DEG results generated by DESeq2 or edgeR. These files must have gene identifiers as row names and contain the following columns:
- logFC: Log2 fold change of gene expression.
- FDR: False discovery rate (adjusted p-value).

Example .tsv or .csv format after being read:
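As a rough illustration of the expected layout, the sketch below writes a toy DEG table to a temporary .tsv file and reads it back with base R. The gene names and values are made up, and the way CrossTx itself reads the files is not shown in this document, so treat the read.delim() call as an assumption.

```r
# Toy DEG table: gene identifiers as row names, plus logFC and FDR columns.
# Gene names and values are made up for illustration.
deg <- data.frame(
  logFC = c(2.3, -1.8, 0.2),
  FDR   = c(0.001, 0.020, 0.600),
  row.names = c("TP53", "PTEN", "ACTB")
)

# Write a tab-separated file; col.names = NA adds a blank header cell
# above the row names so that all columns stay aligned.
deg_path <- tempfile(fileext = ".tsv")
write.table(deg, deg_path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, using the first column as row names (assumed reading step).
deg_read <- read.delim(deg_path, row.names = 1, check.names = FALSE)
print(deg_read)
```

A file produced this way has the gene identifiers in the first column and the logFC and FDR columns after it.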
Argument | Type | Default | Description |
---|---|---|---|
human_file_path | character | Required | File path to TCGA DEG file (.tsv or .csv ). |
mouse_files | named list | Required | Named list of file paths to mouse model DEG files. |
logFC_value | numeric | 1 | Log2 fold change threshold to define “Up” or “Down” genes. |
FDR_value | numeric | 0.05 | FDR threshold for statistical significance. |
A merged data frame with the following columns:
Column | Description |
---|---|
gene | Gene symbol (from row names) |
logFC | Log2 fold change |
FDR | Adjusted p-value |
contrast | Dataset source (e.g., “TCGA”, “GSE172629”, “Our_Model”) |
change | DEG status: “Up”, “Down” or “Stable” based on thresholds |
The function uses the following rule to classify gene expression change:
if (logFC > logFC_value & FDR < FDR_value) → "Up"
if (logFC < -logFC_value & FDR < FDR_value) → "Down"
otherwise → "Stable"
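The rule above can be sketched in base R as follows. The helper classify_change() is hypothetical and not part of CrossTx; it only mirrors the thresholds described by the logFC_value and FDR_value arguments.

```r
# Hypothetical helper (not part of CrossTx): label each gene from its
# log2 fold change and FDR, using the documented default thresholds.
classify_change <- function(logFC, FDR, logFC_value = 1, FDR_value = 0.05) {
  change <- rep("Stable", length(logFC))
  change[logFC >  logFC_value & FDR < FDR_value] <- "Up"
  change[logFC < -logFC_value & FDR < FDR_value] <- "Down"
  change
}

# Made-up values: the last gene has a large fold change but fails the FDR cut.
labels <- classify_change(
  logFC = c(2.3, -1.8, 0.2, 3.0),
  FDR   = c(0.001, 0.020, 0.600, 0.200)
)
print(labels)
# "Up" "Down" "Stable" "Stable"
```

A gene must pass both thresholds to be labelled; a large fold change with a weak FDR stays "Stable".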
# Define file paths
tcga_file <- "./data/TCGA_DEG.tsv"
mouse_files <- list(
"Our_Model" = "./data/Our_Model_DEG.csv",
"GSE172629" = "./data/GSE172629_DEG.tsv",
"GSE208279" = "./data/GSE208279_DEG.tsv"
)
# Merge datasets with default thresholds
merged_DEG <- merge_DEG_datasets(
human_file_path = tcga_file,
mouse_files = mouse_files
)
# Optional: custom thresholds
merged_DEG_custom <- merge_DEG_datasets(
human_file_path = tcga_file,
mouse_files = mouse_files,
logFC_value = 1.5,
FDR_value = 0.01
)
# View result
head(merged_DEG)
- The merged output is intended as input for the downstream plot_DEG_volcano() function.
- The gene column is automatically added from the row names of each DEG file.
- Input files must be .tsv or .csv; other formats (e.g., Excel .xlsx) are not supported directly.

The plot_DEG_volcano() function generates a volcano plot for visualizing differential expression results, highlighting significantly upregulated, downregulated, and stable genes. It can also be used to illustrate transcriptomic differences between experimental groups.
This function requires a data frame (output from merge_DEG_datasets()) with the following mandatory columns:
Column | Type | Description |
---|---|---|
gene | character | Gene name |
logFC | numeric | Log2 fold change of gene expression |
FDR | numeric | False discovery rate (adjusted p-value) |
contrast | character/factor | Sample group identifier (e.g. TCGA, GSE…) |
change | character | Gene status: “Up”, “Down”, or “Stable” |
Tip: Use merge_DEG_datasets() to generate this standardized input data frame.
Parameter | Type | Default | Description |
---|---|---|---|
deg_data | data.frame | Required | DEG results including required columns |
title | character | “Volcano Plot of DEG Analysis” | Plot title |
xlab | character | “Group” | X-axis label |
ylab | character | “Log2 Fold Change” | Y-axis label |
colors | named list | list(Up = "#e6550d", Down = "#3182bd", Stable = "#636363") | Colors for DEG categories |
point_size | numeric | 2 | Dot size |
alpha_range | numeric | c(0.3, 1) | Transparency range based on FDR |
x_angle | numeric | 45 | X-axis label angle |
legend_position | character | “right” | Legend position: “right”, “top”, “bottom”, “left”, “none” |
width | numeric | 10 | Plot width in inches |
height | numeric | 6 | Plot height in inches |
output_path | character | "./DEG/" | Folder path to save output PDF |
A PDF file named volcano_plot.pdf will be saved to the specified output_path.
# Prepare merged DEG data (e.g. from TCGA and mouse models)
merged_data <- merge_DEG_datasets(
human_file_path = "./data/TCGA_DEG.tsv",
mouse_files = list(
"GSE172629" = "./data/GSE172629_DEG.tsv",
"GSE208279" = "./data/GSE208279_DEG.tsv"
)
)
# Generate volcano plot with default settings
p <- plot_DEG_volcano(deg_data = merged_data)
# Display the plot
p
# Further customization using ggplot2
p + ggplot2::theme_minimal()
- Points are colored according to the change column (e.g., “Up”, “Down”, “Stable”).
- Point transparency is scaled by -log10(FDR).

The prepare_DEG_heatmap_data() function processes differential expression gene (DEG) results from TCGA and multiple mouse models to prepare a standardized dataset suitable for heatmap visualization. It performs significance filtering, homologous gene conversion, and logFC simplification, returning a cleaned, comparable list of matrices for downstream visualization.
See the file format requirements in the merge_DEG_datasets() documentation.
Each DEG file (both TCGA and mouse) must contain at least the following columns:
- logFC: log2 fold change
- FDR: false discovery rate (adjusted p-value)

Files must be in .tsv format and have gene identifiers as row names.
Argument | Type | Default | Description |
---|---|---|---|
tcga_file | character | Required | Path to the TCGA DEG .tsv file. |
mouse_files | list | Required | A named list of file paths to mouse DEG .tsv files. Names will be used as column prefixes. |
inTax | numeric | 9606 | Input species taxonomy ID (default: 9606 = human). |
outTax | numeric | 10090 | Output species taxonomy ID (default: 10090 = mouse). |
Load and Filter TCGA Data:
- Reads the TCGA DEG file.
- Selects genes with |logFC| ≥ 1 and FDR < 0.05.

Convert Human Genes to Mouse Homologs:
- Uses homologene::homologene() to map human genes to mouse homologs.
- Only genes with successful mappings are retained.

Count Up/Downregulated Genes:
- Counts positive (logFC > 0) and negative (logFC < 0) genes.

Format TCGA logFC Values:
- Upregulated: logFC set to 1
- Downregulated: logFC set to -1
- Others: set to NA

Process Mouse Model Files:
- Standardizes each mouse model file using its logFC and FDR columns.

Returns a named list with the following components:
Name | Type | Description |
---|---|---|
processed_tcga | data.frame | Filtered and formatted TCGA DEG matrix |
processed_mouse | list | List of processed mouse model DEG matrices, each with standardized logFC values |
tcga_gene_list | character vector | Final set of homologous genes retained in analysis |
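The filtering, counting, and logFC simplification steps described above can be sketched in base R on a toy table. This is a minimal sketch, not the package's code: the gene names and values are made up, and the homolog conversion step is left out because it needs the homologene package.

```r
# Toy TCGA DEG table (made-up values); row names are gene identifiers.
tcga <- data.frame(
  logFC = c(2.3, -1.8, 0.2, 1.4),
  FDR   = c(0.001, 0.020, 0.600, 0.300),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

# Keep genes with |logFC| >= 1 and FDR < 0.05.
sig <- tcga[abs(tcga$logFC) >= 1 & tcga$FDR < 0.05, , drop = FALSE]

# Count up- and downregulated genes among the retained ones.
n_up   <- sum(sig$logFC > 0)
n_down <- sum(sig$logFC < 0)

# Collapse logFC to its direction: 1 for up, -1 for down.
sig$logFC <- sign(sig$logFC)
print(sig)
```

GENE_D is dropped despite its fold change because its FDR is above the cutoff.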
# Define file paths
tcga_path <- "./data/TCGA_DEG.tsv"
mouse_files <- list(
"Our_Model" = "./data/Our_Model_DEG.tsv",
"GSE172629" = "./data/GSE172629_DEG.tsv"
)
# Run preprocessing
processed_data <- prepare_DEG_heatmap_data(tcga_path, mouse_files)
# View output
head(processed_data$processed_tcga)
head(processed_data$processed_mouse$Our_Model)
- The output is intended as input for the downstream plot_DEG_heatmap() function.

The plot_DEG_heatmap() function generates a binary expression heatmap (values of 1, -1, or NA) across multiple mouse models based on TCGA-derived significant genes. It is intended to highlight genes with high or low expression in TCGA tumors and examine their cross-species expression trends in mouse models.
This function is typically used after preparing input data with prepare_DEG_heatmap_data().
The function expects deg_data to be the output from prepare_DEG_heatmap_data().

Argument | Type | Default | Description |
---|---|---|---|
deg_data | list | Required | Output from prepare_DEG_heatmap_data() , containing filtered and formatted TCGA/mouse DEG data. |
mouse_files | list | Required | The original named list of mouse model DEG paths used as input to prepare_DEG_heatmap_data() . |
col_names | character | NULL | Optional custom column names for heatmap (default uses names from mouse_files ). |
cluster_rows | logical | FALSE | Whether to cluster the rows (genes). |
cluster_cols | logical | FALSE | Whether to cluster the columns (models). |
color_palette | named vector | c("1" = "#ff7676", "-1" = "#66d4ff", "NA" = "white") | Color mapping for high/low/NA gene values. |
width | numeric | 7.9 | Width of the output PDF plot. |
height | numeric | 5.95 | Height of the output PDF plot. |
output_path | character | "./DEG/" | Path to save the output heatmap PDF file. |
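- Checks that deg_data contains the expected structure.
- Extracts logFC information from each mouse model dataset.
- Builds a merged matrix (merge_edger) using the TCGA gene set.
- Takes column names from mouse_files, or user-defined via col_names.
- Encodes values as follows:
  - 1 → gene highly expressed in TCGA tumor samples.
  - -1 → gene lowly expressed in TCGA tumor samples.
  - NA → not significant or not conserved.
- Draws the heatmap with ComplexHeatmap::Heatmap() and saves it to PDF.
- Prepares the matrix for downstream use (plot_DEG_barplot()).
- A heatmap PDF is saved to output_path.
- The merged matrix (merge_edger) is returned and used for plot_DEG_barplot().

Value | Meaning | Default Color |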
---|---|---|
1 | Highly expressed in tumor | #ff7676 (red) |
-1 | Lowly expressed in tumor | #66d4ff (blue) |
NA | Non-significant / Not matched | white |
Legend labels:
- "Highly expression in tumor" = 1
- "Low expression in tumor" = -1
# Process DEG data
processed_data <- prepare_DEG_heatmap_data(tcga_file, mouse_files)
# Generate heatmap
merge_edger <- plot_DEG_heatmap(
deg_data = processed_data,
mouse_files = mouse_files
)
- Use this function only with the output of prepare_DEG_heatmap_data(); don’t use it with arbitrary input.
- Keep the returned merge_edger for barplot visualization via plot_DEG_barplot().

The plot_DEG_barplot() function generates a stacked barplot to compare the number of upregulated and downregulated genes in each mouse model, based on TCGA-derived gene categories. It uses the DEG matrix generated by the plot_DEG_heatmap() function.
The function requires a matrix (merge_edger) as input, which must be the output of plot_DEG_heatmap() and contain:
- Rows: Filtered genes shared with TCGA.
- Columns: Mouse models.
- Values:
  - 1 for genes upregulated in TCGA tumors.
  - -1 for downregulated genes.
  - NA for genes not meeting the threshold or unmatched.

Argument | Type | Default | Description |
---|---|---|---|
heatmap_data | matrix | Required | The DEG matrix from plot_DEG_heatmap() , indicating expression patterns per model. |
colors | list | list(up = "#ff7676", down = "#66d4ff") | Colors used for up/down-regulated genes. |
width | numeric | 8 | Width of the saved PDF plot. |
height | numeric | 4 | Height of the saved PDF plot. |
output_path | character | "./DEG/" | Directory to save the output barplot. |
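- Takes the input matrix of values (1, -1, NA).
- Reshapes the matrix into long format for ggplot2.
- Labels each value as "up" (for 1) or "down" (for -1) gene categories.
- Counts genes per mouse model (contrast).
- Uses ggplot2::geom_bar() to generate a clean stacked barplot.
- Saves the plot as "barplot.pdf" inside the specified output_path.

Output | Type | Description |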
---|---|---|
Barplot (PDF) | file | A visual file saved to disk. |
Plot Object | ggplot | The plot is returned as a ggplot object for further customization. |
Group | Value in Matrix | Color |
---|---|---|
Upregulated | 1 | "#ff7676" (red) |
Downregulated | -1 | "#66d4ff" (blue) |
Colors can be changed through the colors argument, for example: colors = list(up = "red", down = "blue")
# Generate barplot from merge_edger matrix
barplot <- plot_DEG_barplot(
heatmap_data = merge_edger,
colors = list(up = "#ff7676", down = "#66d4ff")
)
# Display in RStudio
print(barplot)
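The counts behind such a barplot can be reproduced by hand. The sketch below uses a made-up matrix in the same 1 / -1 / NA encoding as merge_edger and counts entries per model with base R; it is an illustration under those assumptions, not the function's internal code.

```r
# Toy matrix in the same encoding as merge_edger (values are made up):
# rows are genes, columns are mouse models, entries are 1, -1, or NA.
m <- matrix(
  c( 1,  1, NA,
    -1, NA, -1,
     1, -1,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("Model_1", "Model_2", "Model_3"))
)

# Count up (1) and down (-1) entries per model; NA entries are ignored.
counts <- rbind(
  up   = colSums(m == 1,  na.rm = TRUE),
  down = colSums(m == -1, na.rm = TRUE)
)
print(counts)
```

Each column of counts corresponds to one stacked bar.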
- The input matrix must come from plot_DEG_heatmap().
- Missing values (NA) are automatically excluded.
- The returned ggplot object allows additional customization using the full ggplot2 ecosystem.

The prepare_PCA_data() function merges TCGA and mouse model expression data, assigns group and batch labels, and optionally performs batch correction. This prepares the data for PCA analysis and plotting of transcriptional similarity in the downstream function plot_PCA_tinyarray().
Before using this function, first obtain the homologous gene count matrix files for TCGA and each mouse model; these can be generated with the homologene package. The code and the expected data are as follows:
library(homologene)
library(tibble) # provides rownames_to_column()

# For TCGA: keep human genes that have a mouse homolog
count <- read.table("./TCGA.rawcounts")
homolo_gene <- homologene(rownames(count), inTax = 9606, outTax = 10090)[, 1:2]
count <- count[rownames(count) %in% homolo_gene[, 1], ]
rownames(count) <- toupper(rownames(count))
count <- rownames_to_column(count, var = "X") # var must be set to "X"

# For a mouse model: keep mouse genes that have a human homolog
count <- read.table("./GSE.rawcounts")
homolo_gene <- homologene(rownames(count), inTax = 10090, outTax = 9606)[, 1:2]
count <- count[rownames(count) %in% homolo_gene[, 1], ]
rownames(count) <- toupper(rownames(count))
count <- rownames_to_column(count, var = "X") # var must be set to "X"
The input expression files (for both TCGA and mouse models) must:
- Be in .tsv or .csv format.
- Store gene names in the first column (named X).

Parameter | Type | Default | Description |
---|---|---|---|
tcga_file | Character | Required | File path to TCGA expression matrix (.csv or .tsv ). |
mouse_files | Named list | Required | Named list of expression matrix paths for mouse models. |
sample_counts | Named list | Required | Sample size info per dataset: normal and tumor count. |
batch_correction | Logical | TRUE | Whether to apply ComBat batch correction. Default: TRUE . |
- Generates group_all: sample group labels like TCGA_normal, GSExxx_tumor.
- Generates batch: batch labels for each dataset.
- Applies batch correction with sva::ComBat() if enabled.

Returns a list containing:
Output | Type | Description |
---|---|---|
merged_data | matrix | Merged raw expression matrix (genes × samples). |
group_all | character | Sample group labels for PCA visualization. |
batch | character | Batch information (dataset origin per sample). |
corrected_data | matrix | Batch-corrected matrix (if batch_correction = TRUE ). Otherwise NULL . |
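To make the group_all and batch labels concrete, the sketch below derives them from a sample_counts list in base R. It is a hypothetical reconstruction: it assumes normal samples come before tumor samples within each dataset and that datasets appear in list order, which this document does not state explicitly.

```r
# Sample sizes per dataset, in the same shape as the sample_counts argument.
sample_counts <- list(
  "TCGA"      = c(normal = 50, tumor = 374),
  "GSE172629" = c(normal = 3,  tumor = 3)
)

# Assumed layout: normal samples first, then tumor samples, dataset by dataset.
group_all <- unlist(lapply(names(sample_counts), function(ds) {
  n <- sample_counts[[ds]]
  rep(paste(ds, names(n), sep = "_"), times = n)
}))

# One batch label per sample, equal to the dataset it came from.
batch <- rep(names(sample_counts), times = sapply(sample_counts, sum))

print(table(group_all))
```

Both vectors must have one entry per column of the merged expression matrix.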
mouse_files <- list(
"GSE172629" = "./data/GSE172629.homo.tsv",
"GSExxxxxx" = "./data/GSExxxxxx.homo.csv"
)
sample_counts <- list(
"TCGA" = c(normal = 50, tumor = 374),
"GSE172629" = c(normal = 3, tumor = 3)
)
pca_data <- prepare_PCA_data(
tcga_file = "./data/TCGA.tsv",
mouse_files = mouse_files,
sample_counts = sample_counts,
batch_correction = TRUE
)
- Input files can be in either supported format (.tsv or .csv).
- corrected_data is ready for PCA, clustering, or heatmap visualization.

The plot_PCA_tinyarray() function performs principal component analysis (PCA) and visualizes the result using the tinyarray::draw_pca() function. It supports both raw and batch-corrected expression matrices and highlights sample grouping with customizable colors.
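The function expects a list object returned from prepare_PCA_data(), which includes:
- The merged raw expression matrix (merged_data)
- The batch labels (batch)
- The sample group labels (group_all)
- The batch-corrected matrix, if available (corrected_data)

Parameter | Type | Default | Description |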
---|---|---|---|
pca_data | List | Required | Output from prepare_PCA_data() , must contain merged and optionally corrected matrices. |
batch_correction | Logical | TRUE | Whether to use batch-corrected matrix for PCA plotting. |
colors | Character vector | NULL | Custom color palette for groups. If NULL , default color palette is used. |
- Selects the expression matrix according to the batch_correction flag.
- Calls tinyarray::draw_pca(), generating a 2D PCA plot with group labels and coloring.

This function does not return an object; it draws the PCA plot directly on the active graphics device (RStudio Plots pane, PDF, etc.), where it can be adjusted as needed.
# Assume you have run `prepare_PCA_data()` and stored the result in `pca_data`
plot_PCA_tinyarray(
pca_data = pca_data,
batch_correction = TRUE # Use ComBat-corrected data
)
# Customize colors manually (optional)
custom_colors <- c("#1b9e77", "#d95f02", "#7570b3")
plot_PCA_tinyarray(
pca_data = pca_data,
batch_correction = FALSE,
colors = custom_colors
)
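The matrix selection, including the fallback when corrected_data is missing, can be sketched as below. The helper select_pca_matrix() is hypothetical, and base R's prcomp() stands in for tinyarray::draw_pca() so that the example runs without extra packages; the toy values are random.

```r
# Hypothetical helper (not part of CrossTx): pick the matrix to plot,
# falling back to merged_data when no corrected matrix is available.
select_pca_matrix <- function(pca_data, batch_correction = TRUE) {
  if (batch_correction && !is.null(pca_data$corrected_data)) {
    pca_data$corrected_data
  } else {
    pca_data$merged_data
  }
}

set.seed(1)
# Toy list shaped like the output of prepare_PCA_data(); values are random.
toy <- list(
  merged_data = matrix(rnorm(200), nrow = 50,
                       dimnames = list(paste0("gene", 1:50), paste0("s", 1:4))),
  group_all = c("A_normal", "A_normal", "A_tumor", "A_tumor"),
  corrected_data = NULL
)

expr <- select_pca_matrix(toy, batch_correction = TRUE) # falls back to merged_data

# prcomp() expects samples in rows, so the genes x samples matrix is transposed.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]
print(data.frame(scores, group = toy$group_all))
```

The first two principal components in scores are what a 2D PCA plot displays, one point per sample.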
- If batch_correction = TRUE but no corrected_data exists in the input list, the function falls back to using the original merged data.
- Make sure group_all matches the number and order of samples in the expression matrix.
- Custom group colors can be supplied through the colors vector.

The prepare_similarity_data() function prepares TCGA patient-level clinical and CNV (copy number variation) information for use in cross-species similarity analysis with mouse models. It extracts user-defined clinical risk factors (conditions) and CNV data for selected genes and merges them into a clean, analysis-ready data frame.
- The clinical file, as read by read.table(clinic_file_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE), is presented in the figure.
- The CNV file, as read by read.table(cnv_file_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE), is presented in the figure.

Parameter | Type | Default | Description |
---|---|---|---|
clinic_file | character | Required | Path to the TCGA clinical file (e.g., data_bcr_clinical_data_patient.tsv ). |
cnv_file | character | Required | Path to the TCGA CNV data file (e.g., CNA.tsv ). |
clinical_vars | character | Required | A character vector of user-defined clinical risk factors to extract. |
cnv_genes | character | Required | A character vector of gene names (e.g., TP53 , PTEN ) to extract from CNV. |
risk_factor_column | character | Required | The column name in the clinical file that contains different patient conditions. |
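- Reads the clinical and CNV .tsv files.
- Extracts the PATIENT_ID column and the user-specified clinical variables (via pattern matching).
- Extracts the CNV values for the genes listed in cnv_genes.
- Merges the clinical and CNV tables using PATIENT_ID as the key.
- CNV values are coded as -2, -1, 0, 1.

A data frame containing:
- Patient identifiers (with . instead of -)
- One column per entry in clinical_vars, with matched status (or NA if not present)
- One column per gene in cnv_genes, representing CNV status (-2, -1, 0, or 1) as factors

This merged output is used as input in downstream similarity scoring functions (e.g., compute_similarity_matrix() → plot_TROMscore_heatmap()).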
merge_clinic_CNV <- prepare_similarity_data(
clinic_file = "data_bcr_clinical_data_patient.tsv",
cnv_file = "CNA.tsv",
clinical_vars = c("Hepatitis B", "Non-Alcoholic"),
cnv_genes = c("TP53", "PTEN"),
risk_factor_column = "HISTORY_HEPATO_CARCINOMA_RISK_FACTORS"
)
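The partial, case-insensitive matching of clinical_vars can be illustrated with base R's grepl(). The patients and risk-factor strings below are made up, and the exact matching code inside prepare_similarity_data() is not shown in this document, so this is only a sketch of the idea.

```r
# Toy clinical table (made-up patients); column names follow the example above.
clinic <- data.frame(
  PATIENT_ID = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"),
  HISTORY_HEPATO_CARCINOMA_RISK_FACTORS = c(
    "Hepatitis B|Alcohol consumption",
    "non-alcoholic fatty liver disease",
    "No History of Primary Risk Factors"
  ),
  stringsAsFactors = FALSE
)
clinical_vars <- c("Hepatitis B", "Non-Alcoholic")

# One logical column per variable: TRUE where the risk-factor text contains
# the variable, ignoring case. Each variable is used as a regular expression.
matched <- sapply(clinical_vars, function(v) {
  grepl(v, clinic$HISTORY_HEPATO_CARCINOMA_RISK_FACTORS, ignore.case = TRUE)
})
rownames(matched) <- clinic$PATIENT_ID
print(matched)
```

Because the match is partial, "Non-Alcoholic" also matches "non-alcoholic fatty liver disease"; choose variable names that are specific enough to avoid unintended hits.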
- clinical_vars will be matched by partial string search (case-insensitive) in the column defined by risk_factor_column.

The compute_similarity_matrix() function calculates a transcriptional similarity matrix between TCGA samples and mouse models, using the ws.trom() algorithm from the TROM package. It filters the similarity matrix to retain only biologically meaningful comparisons (tumor-tumor and normal-normal) and merges the filtered matrix with TCGA clinical and CNV data for downstream analysis.
The input expression_matrix should be the batch-corrected expression matrix (typically the output from prepare_PCA_data()), where rows are genes and columns are samples.
The input merge_clinic_CNV is the processed clinical + CNV data returned by prepare_similarity_data().

Parameter | Type | Default | Description |
---|---|---|---|
expression_matrix | matrix | Required | Batch-corrected expression matrix (genes × samples). |
group_all | character | Required | Vector of sample group labels for TCGA and mouse models. |
tcga_sample_size | numeric | Required | Total number of TCGA samples. |
tcga_normal_count | numeric | Required | Number of normal samples in TCGA cohort. |
merge_clinic_CNV | data frame | Required | Data frame containing TCGA clinical and CNV data (from prepare_similarity_data() ). |
cnv_genes | character | Required | Character vector of CNV genes to include in the output. |
clinical_vars | character | Required | Character vector of clinical variables to include in the output. |
- Computes similarity scores with TROM::ws.trom() between TCGA and mouse models.

The function returns a named list with two components:
- similarity_matrix: The full similarity matrix (unfiltered), as computed by ws.trom().
- dm_trom_del_backgroud_clinic: The filtered similarity matrix (excluding background noise), merged with clinical and CNV data for use in plot_TROMscore_heatmap() and plot_TROMscore_boxplot().

similarity_results <- compute_similarity_matrix(
expression_matrix = pca_data$corrected_data,
group_all = pca_data$group_all,
tcga_sample_size = 424,
tcga_normal_count = 50,
merge_clinic_CNV = merge_clinic_CNV,
cnv_genes = c("TP53", "PTEN"),
clinical_vars = c("Hepatitis B", "Non-Alcoholic")
)
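Two of the ideas used by this function, keeping only tumor-tumor and normal-normal comparisons and deriving PATIENT_ID from the first 12 characters of a TCGA sample ID, can be sketched on a toy matrix. The sample IDs and scores are made up, and masking unwanted pairs with NA is an assumption made for illustration; the package may handle filtered entries differently.

```r
# Toy similarity matrix: rows are TCGA samples, columns are mouse samples.
sim <- matrix(
  c(0.9, 0.1,
    0.2, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TCGA-DD-A1EB-11A", "TCGA-DD-A1EB-01A"), c("m1", "m2"))
)
tcga_type  <- c("normal", "tumor") # one label per row
mouse_type <- c("normal", "tumor") # one label per column

# TRUE where the TCGA and mouse sample types agree.
keep <- outer(tcga_type, mouse_type, "==")

# Assumed masking: drop tumor-normal and normal-tumor pairs by setting them to NA.
sim_filtered <- sim
sim_filtered[!keep] <- NA

# Patient ID = first 12 characters of the TCGA sample ID.
patient_id <- substr(rownames(sim), 1, 12)
print(sim_filtered)
print(patient_id)
```

Both toy samples map to the same patient, which is how a tumor and its matched normal sample share clinical and CNV annotations.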
- ws.trom() performs cross-species comparison based on gene expression similarity and returns a matrix of TROM scores.
- Filtering relies on the group_all labeling, where TCGA samples are listed first.
- PATIENT_ID is inferred from the first 12 characters of TCGA sample IDs.

The plot_TROMscore_heatmap() function generates a heatmap visualization of transcriptional similarity scores (TROM scores) between TCGA samples and mouse cancer models. It provides a high-level overview of how closely mouse models resemble different human tumor subtypes at the transcriptomic level. It also integrates CNV and clinical information from TCGA samples.
- dm_trom_del_backgroud_clinic: The data frame from compute_similarity_matrix(), containing filtered similarity scores merged with clinical and CNV data.
- trom_matrix: The matrix from compute_similarity_matrix(), containing the raw TROM similarity matrix before background filtering.
- The remaining inputs come from earlier steps (prepare_PCA_data() and compute_similarity_matrix()).

Parameter | Type | Default | Description |
---|---|---|---|
dm_trom_del_backgroud_clinic | data frame | Required | Data frame with filtered similarity scores and clinical annotations. |
trom_matrix | matrix | Required | Raw similarity matrix from ws.trom() (before filtering). |
tcga_sample_size | numeric | Required | Total number of TCGA samples. |
tcga_normal_count | numeric | Required | Number of normal samples in TCGA. |
model_group_count | numeric | Required | Total number of mouse model samples. |
condition | named list | Required | Named list defining sample counts per mouse model condition. |
clinical_vars | character | Required | Vector of clinical variables to annotate (e.g., “Hepatitis B”). |
cnv_genes | character | Required | Vector of CNV genes used for annotations (e.g., “TP53”, “PTEN”). |
group_all | character | Required | Vector with group labels from prepare_PCA_data() . |
width | numeric | 11.84 | Width of the output PDF heatmap. |
height | numeric | 7.11 | Height of the output PDF heatmap. |
output_path | character | "./similarity/TROM_heatmap.pdf" | Path to save the heatmap figure. |
cluster_rows | logical | FALSE | Whether to cluster rows in the heatmap. |
cluster_cols | logical | FALSE | Whether to cluster columns in the heatmap. |
- The heatmap and its clinical and CNV annotations are built with the ComplexHeatmap package.

The function returns a list with two elements:
- heatmap_obj: The drawn heatmap object (from ComplexHeatmap::draw()).
- merge_long: A long-format data frame containing:
  - x: Mouse model sample label.
  - y: Mean similarity score.
  - group: Tumor or normal.
  - condition: Experimental model condition.
  - score: Rounded average TROM score.

This merge_long object is used directly as input for plot_TROMscore_boxplot().
heatmap_results <- plot_TROMscore_heatmap(
dm_trom_del_backgroud_clinic = similarity_results$dm_trom_del_backgroud_clinic,
trom_matrix = similarity_results$similarity_matrix,
tcga_sample_size = 424,
tcga_normal_count = 50,
model_group_count = 52,
condition = list("HBV pten p53 ko" = 5, "DEN Cl4" = 16),
clinical_vars = c("Hepatitis B", "Non-Alcoholic"),
cnv_genes = c("TP53", "PTEN"),
group_all = pca_data$group_all,
output_path = "./similarity/TROM_heatmap.pdf"
)
- Color scales are defined with circlize::colorRamp2().
- Annotation order is reversed (rev()) to align correctly with TCGA row order in the matrix.
- The merge_long result can be reused for barplots or boxplots of similarity scores.
- The output directory is created if output_path does not exist.

The plot_TROMscore_boxplot() function generates two types of plots to visualize transcriptional similarity scores (TROM scores): a tileplot and a boxplot.
These visualizations help assess which mouse models best resemble specific TCGA tumor profiles.
merge_long should be the output from plot_TROMscore_heatmap(), a data frame in long format containing:
- x: sample labels
- y: TROM score
- group: “tumor” or “normal”
- condition: experimental condition/group
- score: average score (used in tileplot)

Parameter | Type | Default | Description |
---|---|---|---|
merge_long | data frame | Required | Data frame from plot_TROMscore_heatmap() , containing TROM scores. |
output_path | character | "./similarity/TROM_boxplot" | Path prefix for saving the output PDF file. |
combine_plots | logical | TRUE | Whether to return a combined tileplot and boxplot figure. |
- Checks that merge_long contains the required columns.
- Draws the boxplot with ggplot2::geom_boxplot().
- Draws the tileplot with ggplot2::geom_tile().
- Combines both plots with patchwork::plot_layout() if combine_plots = TRUE.
- Saves the figure as .pdf.
- If combine_plots = TRUE, returns a single combined ggplot object.
- If combine_plots = FALSE, returns a list of two ggplot objects: list(tileplot, boxplot).
- A file named TROM_boxplot.pdf is saved to the output_path.

plot_TROMscore_boxplot(
merge_long = heatmap_results$merge_long,
output_path = "./similarity/TROM_boxplot",
combine_plots = TRUE
)
- A vertical line (xintercept = 2.6) is included for visual reference (e.g., model split).
- The returned plot can be customized further with + theme() or other ggplot2 functions.

The GO_enrichment_analysis() function performs Gene Ontology (GO) enrichment analysis on differentially expressed genes (DEGs) identified from edgeR results. It supports analysis across species (human, mouse, rat) and GO categories (BP, MF, CC). The enriched GO terms are visualized using GO term similarity clustering via simplifyGO.
Please refer to the input data structure described in the merge_DEG_datasets() documentation.
Parameter | Type | Default | Description |
---|---|---|---|
deg_file | character | Required | File path to DEG results (.tsv or .csv ) with logFC and FDR columns. |
species | character | Required | One of "hsa" , "mmu" , "rno" for human, mouse, or rat. |
org_db | character | Required | Organism annotation database (e.g., "org.Hs.eg.db" ). |
ont | character | “BP” | GO ontology type: "BP" (Biological Process), "MF" (molecular function), or "CC" (cellular component). |
column_title | character | Required | Title shown in the GO similarity heatmap. |
width | numeric | 10.36 | Width of the saved PDF plot. |
height | numeric | 6.35 | Height of the saved PDF plot. |
output_path | character | "./enrichment" | Directory to save output PDF. Will be created if it doesn’t exist. |
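- Filters DEGs with |logFC| > 1 and FDR < 0.05.
- Maps gene identifiers using the specified OrgDb package (e.g., org.Hs.eg.db).
- Runs the enrichment with clusterProfiler::enrichGO().
- Computes GO term similarity with simplifyEnrichment::GO_similarity().
- Clusters and plots the terms with simplifyGO().
- Produces a .pdf file showing GO term similarity clustering, saved to output_path.

GO_enrichment_analysis(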
deg_file = "./data/TCGA_DEG.tsv",
species = "hsa",
org_db = "org.Hs.eg.db",
ont = "BP",
column_title = "TCGA_GO_BP_terms"
)
- Install and load the appropriate annotation package (org.Hs.eg.db, org.Mm.eg.db, or org.Rn.eg.db) before running the function.
- The DEG file must contain the columns logFC and FDR.

The plot_TME_barplot() function performs immune infiltration analysis on a gene expression matrix using various deconvolution methods (e.g., CIBERSORT, xCell), and visualizes the relative abundance of immune cell types across experimental groups in a stacked barplot.
Please refer to the input format described in the documentation for merge_DEG_datasets(). The input can be either:
- A raw count matrix (input_type = "count").
- A TPM matrix (input_type = "TPM"), with matching gene identifiers.

Parameter | Type | Default | Description |
---|---|---|---|
data_matrix | matrix | Required | Matrix or data frame. Either raw count matrix or TPM data. |
input_type | character | “count” | Specify "count" to convert counts to TPM or "TPM" to use data directly. |
method | character | Required | Immune deconvolution method: "cibersort" , "mcpcounter" , "xcell" , or "quantiseq" . |
group_as | character | Required | Character vector defining the group of each sample (must match column order). |
output_path | character | "./TME" | Directory to save output plots and result files. |
height | numeric | 5.2 | Height (in inches) of the saved PDF barplot. |
width | numeric | 10 | Width (in inches) of the saved PDF barplot. |
- Converts counts to TPM with IOBR::count2tpm() (if needed).
- Runs IOBR::deconvo_tme() using the specified method.
- Draws the stacked barplot with IOBR::cell_bar_plot().
- Saves a PDF barplot (method_cell_bar_plot.pdf) visualizing immune cell compositions.
- Saves a CSV result file (method_result.csv) for all samples.
- eset_tpm: TPM-normalized expression matrix (used in analysis)
- method_model: Full immune infiltration results (before group merging)

group_as <- c(
rep("TCGA_normal", 50),
rep("TCGA_tumor", 374),
rep("HBV_Pten_KO_normal", 3),
rep("HBV_Pten_KO_tumor", 3)
)
TME_barplot_result <- plot_TME_barplot(
data_matrix = merge.data,
input_type = "count",
method = "cibersort",
group_as = group_as
)
# Extract TPM for downstream analysis
eset_tpm <- TME_barplot_result$eset_tpm
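For readers unfamiliar with the count-to-TPM step, the standard TPM formula can be sketched in base R. The package delegates this step to IOBR::count2tpm(); the sketch below does not reproduce that function, and the counts and gene lengths are made up.

```r
# Toy counts (genes x samples) and gene lengths in base pairs; all made up.
counts <- matrix(
  c(100, 200, 300,  # sample s1
     50,  50, 100), # sample s2
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
gene_length <- c(g1 = 1000, g2 = 2000, g3 = 3000)

# Reads per kilobase: divide each gene's counts by its length in kb.
rpk <- counts / (gene_length / 1000)

# Scale each sample so that its values sum to one million.
tpm <- t(t(rpk) / colSums(rpk)) * 1e6
print(colSums(tpm))

# Group labels must have one entry per sample, in column order.
group_as <- c("toy_normal", "toy_tumor")
stopifnot(length(group_as) == ncol(tpm))
```

After this transformation every sample column sums to 1e6, which makes values comparable across samples with different sequencing depths.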
- group_as must match the column order of data_matrix.
- The species used for TPM conversion (count2tpm) is currently "hsa" only. Modify the org = parameter if needed.
- This function depends on IOBR; make sure it’s properly installed and loaded.
- Results are saved to output_path as .pdf and .csv.

The calculate_metabolism_score() function calculates metabolism-related gene signature scores from TPM-normalized RNA-seq expression data.
The input format is the same as described in the plot_TME_barplot() documentation. The input must be a TPM matrix (genes × samples) with gene identifiers as row names and sample names as column names. It is typically generated by plot_TME_barplot().
Parameter | Type | Default | Description |
---|---|---|---|
eset_tpm | matrix | Required | TPM-normalized gene expression matrix. Typically obtained from plot_TME_barplot() . |
method | character | “pca” | Method used for signature scoring. Options: "pca" , "ssgsea" , "zscore" , "integration" . |
mini_gene_count | numeric | 2 | Minimum number of genes required to calculate score for a given signature. |
- Calls IOBR::calculate_sig_score() using a predefined metabolism signature ("signature_metabolism").

The result can be used for downstream comparison, group visualization, or correlation analysis.
# Assume eset_tpm is output from plot_TME_barplot()
sig_meta <- calculate_metabolism_score(
eset_tpm = eset_tpm,
method = "pca",
mini_gene_count = 2
)
- The metabolism signatures are provided by IOBR.
- If too few signature genes are found (mini_gene_count not met), those samples may be excluded.
- Make sure IOBR is properly installed and loaded before running this function.

The plot_metabolism_heatmap() function generates a heatmap visualization of metabolism-related signature scores across sample groups.
The input sig_meta must be a data frame of metabolism signature scores, usually the output from calculate_metabolism_score().
It must include a first column identifying group labels (e.g., group_as).
Parameter | Type | Default | Description |
---|---|---|---|
sig_meta | data frame | Required | Data frame output from calculate_metabolism_score() . |
output_path | character | "./metabolism" | Path to save the PDF heatmap. Directory will be created if it doesn’t exist. |
clustering_method | character | "manhattan" | Distance metric for hierarchical clustering. Choices: "manhattan" or "canberra". |
width | numeric | 10 | Width of the output PDF (in inches). |
height | numeric | 22 | Height of the output PDF (in inches). |
row_name_width | numeric | 7 | Maximum width for row names in centimeters. |
right_padding | numeric | 6 | Right margin in centimeters to avoid label cutoff. |
row_height_pt | numeric | 5 | Height of each heatmap row in points. |
- Groups samples using group_as (must be defined globally).
- Uses ComplexHeatmap::Heatmap() to create a heatmap.
- Saves the PDF to output_path.
- A file named heatmap.pdf is saved to output_path.

# Assuming `sig_meta` is the output of `calculate_metabolism_score()`
heatmap_obj <- plot_metabolism_heatmap(
sig_meta = sig_meta,
output_path = "./metabolism/",
clustering_method = "canberra",
width = 10,
height = 22
)
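The effect of the distance metric can be explored outside the package with base R's dist() and hclust(). The score matrix below is made up, and this sketch only illustrates the two metrics; it does not reproduce how plot_metabolism_heatmap() clusters internally.

```r
# Toy score matrix (signatures x groups) with made-up values.
scores <- matrix(
  c( 1, 2, 3,
     2, 4, 6,
    10, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sig", 1:3), c("g1", "g2", "g3"))
)

# Pairwise distances between signatures under both metrics.
d_man <- dist(scores, method = "manhattan")
d_can <- dist(scores, method = "canberra")
print(as.matrix(d_man))
print(as.matrix(d_can))

# Hierarchical clustering on each distance matrix.
hc_man <- hclust(d_man)
hc_can <- hclust(d_can)
```

Manhattan distance is driven by absolute differences, while Canberra distance weights each difference by the size of the values involved, so the two can group the same signatures differently.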
- If group_as is not found or doesn’t match the sample structure, the function will fail silently or return incorrect results.
- The choice of distance metric (manhattan vs canberra) can influence clustering results; try both to explore structure.
- The output can be handled further with ggsave() or other output functions.

The plot_TME_boxplot() function computes immune infiltration scores using the CIBERSORT method and visualizes them via boxplots for each immune signature. It is primarily used to compare immune cell composition between groups across different mouse models or experimental conditions.
- Expression matrix (eset_tpm): A data.frame of gene expression values (TPM), genes × samples format. Usually generated by plot_TME_barplot().
- Group labels (group_as): A character vector of group labels in the format "condition_group" (e.g., "HBV_normal", "HBV_tumor", "DEN_normal").
- Condition list (condition_list): A named list specifying the number of samples per mouse model: condition_list <- list("GSEXXX" = 6, "GSExxx" = 8)

Parameter | Type | Default | Description |
Parameter | Type | Default | Description |
---|---|---|---|
eset_tpm | matrix | Required | TPM-transformed expression matrix. Output of plot_TME_barplot() . |
group_as | character | Required | Vector of group labels, e.g., "HBV_tumor" ; must match column order of eset_tpm . |
condition_list | named list | Required | Named list showing sample counts per mouse model. Used for factor() levels. |
width | numeric | 10 | Width (in inches) of the output PDF plot. |
height | numeric | 6 | Height (in inches) of the output PDF plot. |
output_path | character | "./TME_boxplots" | Directory to save the PDF files. Created automatically if it doesn’t exist. |
- Runs IOBR::deconvo_tme() to get immune cell fractions.
- Saves the boxplots as PDF files in output_path.

# Define sample count per condition
condition_list <- list("GSEXXX" = 6, "GSExxx" = 8)
# Run boxplot function
plot_list <- plot_TME_boxplot(
eset_tpm = TME_barplot_result$eset_tpm,
group_as = group_as,
condition_list = condition_list
)
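How condition_list and group_as translate into per-sample labels can be sketched in base R. This is a hypothetical illustration of the idea that the list names serve as factor levels; the function's actual code is not shown in this document.

```r
# Sample counts per mouse model, as in the example above.
condition_list <- list("GSEXXX" = 6, "GSExxx" = 8)

# One label per sample, with factor levels in the order of the list.
condition <- factor(
  rep(names(condition_list), times = unlist(condition_list)),
  levels = names(condition_list)
)
print(table(condition))

# Extract the tumor/normal part from "condition_group" labels
# by dropping everything up to the last underscore.
group_as <- c("HBV_normal", "HBV_tumor", "DEN_normal")
group <- sub("^.*_", "", group_as)
print(group)
```

The counts in condition_list should add up to the number of mouse samples in eset_tpm, otherwise labels and columns will not line up.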
All generated plots are saved in PDF format by default.
We warmly welcome contributions and suggestions! Please feel free to open an issue or pull request.