Extending ggplot2 Grammar to High-Dimensional Data
Sysbiolab Team
2026-06-23
Source:vignettes/articles/high-dimensional.Rmd
high-dimensional.RmdPackage: RGraphSpace 1.4.1
Overview
While developing RGraphSpace, the challenge of representing
graph structures in ggplot2 highlighted a core design
restriction: that plotting data must be expressed in tabular form,
typically as a single data.frame (Wickham 2016). This approach works remarkably
well for most applications, but becomes too restrictive when dealing
with data objects composed of multiple interdependent components.
In this vignette, we explore how the ggplot-GraphSpace
interface can be applied to high-dimensional data, enabling direct
interaction with the ggplot2 grammar and its aesthetic mapping
mechanisms without repeated conversion into flat tabular
representations. We demonstrate this approach using the Seurat
package (Hao et al. 2024).
Seurat objects encapsulate multiple coordinated
representations of single-cell data, making them representative examples
of high-dimensional data that cannot be naturally expressed as a single
data.frame.
Before you start
This vignette assumes prior experience with Seurat (Hao et al. 2024), especially for handling transcriptomics data.
Note: If you are new to Seurat, we recommend reviewing its visualization tutorials before proceeding.
Computational requirement:
Hardware: RAM >= 16 GB
Software: R (>=4.5) and RStudio
Required packages
Before proceeding, ensure that all packages described in the Installation Instructions are installed.
# Check versions
if (packageVersion("RGraphSpace") < "1.4.1"){
message("Need to update 'RGraphSpace' for this vignette")
remotes::install_github("sysbiolab/RGraphSpace")
}
if (packageVersion("Seurat") < "5.5.0"){
message("Need to update 'Seurat' for this vignette")
remotes::install_github("satijalab/Seurat")
}Setting input data
# Load packages
library("RGraphSpace")
library("Seurat")
library("SeuratObject")
library("SeuratData")
library("patchwork")Loading the dataset
We will use the pbmc3k dataset from the
SeuratData package, consisting of single-cell transcriptomics
data from peripheral blood mononuclear cells. This dataset is commonly
used to showcase Seurat workflows (Hao
et al. 2024).
# Install a Seurat dataset (required only once)
SeuratData::InstallData("pbmc3k")
# Check manifest of installed datasets
# SeuratData::InstalledData()
# Load the 'pbmc3k' dataset
# Note: LoadData() may print conversion warnings when loading pbmc3k.
# These are expected and come from SeuratData's internal v4-to-v5
# object migration — they can be safely ignored.
seurat_obj <- LoadData("pbmc3k", type = "pbmc3k.final")
## Common Seurat preprocessing workflow.
## Shown for reference only, as the 'pbmc3k'
## dataset has already been preprocessed.
# seurat_obj <- NormalizeData(seurat_obj)
# seurat_obj <- ScaleData(seurat_obj)
# seurat_obj <- FindVariableFeatures(seurat_obj)
# seurat_obj <- RunPCA(seurat_obj)
# seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)Seurat reference plots
Before introducing the ggplot-GraphSpace interface, we
first reproduce two typical Seurat visualizations: a
cluster-level embedding (DimPlot) and a feature expression
map (FeaturePlot). These examples provide a useful baseline
for the corresponding visualizations constructed later with
ggplot-GraphSpace.
cpal <- DiscretePalette(nlevels(seurat_obj$seurat_annotations),
palette = "polychrome")
# Left: cluster-level embedding
p1 <- DimPlot(seurat_obj, pt.size = 1, cols = cpal) +
theme_minimal() + theme(aspect.ratio = 1)
p1
# Right: feature expression map
p2 <- FeaturePlot(seurat_obj, features = "LYZ",
cols = c("lightgrey", "blue"), pt.size = 1) +
theme_minimal() + theme(aspect.ratio = 1)
p2
# Plot side-by-side
# p1 + p2
Note that DimPlot and FeaturePlot are
high-level functions. Internally, Seurat extracts the relevant
data and generates ggplot objects. While convenient, the
underlying data structures are not directly exposed to the user, making
advanced use of the ggplot2 grammar more difficult. For
example, defining custom aesthetic mappings or interoperating with other
visualization workflows would require extra data extraction steps.
Exposing the ggplot2 grammar
We now apply as.GraphSpace() to expose the
Seurat object’s underlying high-dimensional data directly
to ggplot2 (for additional details, see the coercing high-dimensional
data section).
# Create a GraphSpace from 'seurat_obj'
gs <- as.GraphSpace(seurat_obj, space = "embedding", reduction = "umap")
# Seurat object converted to GraphSpace:
# ℹ space=embedding, layer=default, features=13714, samples=2638, reduction="umap"
# Node spatial boundaries:
# ℹ x: [-9, 13] (cols)
# ℹ y: [-9, 14] (rows)
gs
# A GraphSpace-class object for:
# IGRAPH 1393201 UN-- 2638 0 --
# + attr: x (v/n), y (v/n), name (v/c), nodeLabel (v/c), nodeSize (v/n), orig.ident (v/x),
# | nCount_RNA (v/n), nFeature_RNA (v/n), seurat_annotations (v/x), percent.mt (v/n),
# | RNA_snn_res.0.5 (v/x), seurat_clusters (v/x), arrowType (e/n)
# + features: 13714 (AL627309.1, AP006222.2, RP11-206L10.2, RP11-206L10.9, ...)With the GraphSpace object ready, we reproduce the same
plots using ggplot2 geoms and aesthetic mappings.
cpal <- DiscretePalette(nlevels(gs$seurat_annotations),
palette = "polychrome")
# Left: cluster-level embedding
p3 <- ggplot(gs) +
geom_nodespace(mapping = aes(fill = seurat_annotations),
size = 1.5, color = "grey90", stroke = 0.3) +
scale_fill_manual(values = cpal) +
labs(x = "UMAP_1", y = "UMAP_2") +
theme_gspace_legend(discrete_fill = TRUE) +
theme_minimal() + theme(aspect.ratio = 1)
p3
# Right: feature expression map
p4 <- ggplot(gs) +
geom_nodespace(mapping = aes(fill = LYZ),
size = 1.5, color = "lightgrey", stroke = 0.3) +
scale_fill_continuous(palette = c("lightgrey", "blue")) +
labs(x = "UMAP_1", y = "UMAP_2") +
theme_minimal() + theme(aspect.ratio = 1)
p4
# Plot side-by-side
# p3 + p4 
The resulting plots are essentially the same, but users now have full flexibility to interact directly with ggplot2 and its rich ecosystem of extensions and companion packages.
Coercing high-dimensional data
The as.GraphSpace() function provides a convenient way
to coerce high-dimensional data into a GraphSpace object.
However, no coercion method can anticipate every possible data
structure. Below, we show how to access the relevant components of a
Seurat object and use them to construct a
GraphSpace manually. For another coercion example, see the
spatial data
tutorial.
# Extract UMAP embeddings as node coordinates
coords <- Embeddings(seurat_obj, reduction = "umap")
coords <- coords[, seq_len(2)] |> as.data.frame()
colnames(coords) <- c("x", "y")
# Extract cell metadata
metadata <- seurat_obj[[]]
# Merge coordinates and metadata using common cell identifiers
ids <- intersect(rownames(coords), rownames(metadata))
coords <- cbind(coords[ids, ], metadata[ids, ])
# Construct a GraphSpace object
# Metadata become node attributes
gs <- GraphSpace(coords)
# Add high-dimensional feature data
# Stored separately for lazy aesthetic mapping
gs_fdata(gs) <- SeuratObject::LayerData(seurat_obj, layer = "data")
# Optional: normalize node coordinates
gs <- normalizeGraphSpace(gs, mar = 0.01)Session information
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/Sao_Paulo
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.3.2 stxBrain.SeuratData_0.1.2
#> [3] ssHippo.SeuratData_3.1.4 pbmc3k.SeuratData_3.1.4
#> [5] SeuratData_0.2.2.9002 Seurat_5.5.0
#> [7] SeuratObject_5.4.0 sp_2.2-1
#> [9] RGraphSpace_1.4.1 ggplot2_4.0.3
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 rstudioapi_0.18.0 jsonlite_2.0.0
#> [4] magrittr_2.0.5 spatstat.utils_3.2-3 ggbeeswarm_0.7.3
#> [7] farver_2.1.2 rmarkdown_2.31 fs_2.1.0
#> [10] ragg_1.5.2 vctrs_0.7.3 ROCR_1.0-12
#> [13] spatstat.explore_3.8-1 htmltools_0.5.9 sass_0.4.10
#> [16] sctransform_0.4.3 parallelly_1.47.0 KernSmooth_2.23-26
#> [19] bslib_0.11.0 htmlwidgets_1.6.4 desc_1.4.3
#> [22] ica_1.0-3 fontawesome_0.5.3 plyr_1.8.9
#> [25] plotly_4.12.0 zoo_1.8-15 cachem_1.1.0
#> [28] igraph_2.3.2 mime_0.13 lifecycle_1.0.5
#> [31] pkgconfig_2.0.3 Matrix_1.7-5 R6_2.6.1
#> [34] fastmap_1.2.0 fitdistrplus_1.2-6 future_1.70.0
#> [37] shiny_1.13.0 digest_0.6.39 tensor_1.5.1
#> [40] RSpectra_0.16-2 irlba_2.3.7 textshaping_1.0.5
#> [43] progressr_0.19.0 spatstat.sparse_3.2-0 httr_1.4.8
#> [46] polyclip_1.10-7 abind_1.4-8 compiler_4.6.0
#> [49] withr_3.0.2 S7_0.2.2 fastDummies_1.7.6
#> [52] MASS_7.3-65 rappdirs_0.3.4 tools_4.6.0
#> [55] vipor_0.4.7 lmtest_0.9-40 otel_0.2.0
#> [58] beeswarm_0.4.0 httpuv_1.6.17 future.apply_1.20.2
#> [61] goftest_1.2-3 glue_1.8.1 nlme_3.1-169
#> [64] promises_1.5.0 grid_4.6.0 Rtsne_0.17
#> [67] cluster_2.1.8.2 reshape2_1.4.5 generics_0.1.4
#> [70] gtable_0.3.6 spatstat.data_3.1-9 tidyr_1.3.2
#> [73] data.table_1.18.4 tidygraph_1.3.1 spatstat.geom_3.8-1
#> [76] RcppAnnoy_0.0.23 ggrepel_0.9.8 RANN_2.6.2
#> [79] pillar_1.11.1 stringr_1.6.0 spam_2.11-4
#> [82] RcppHNSW_0.7.0 later_1.4.8 splines_4.6.0
#> [85] dplyr_1.2.1 lattice_0.22-9 survival_3.8-6
#> [88] deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.2
#> [91] pbapply_1.7-4 knitr_1.51 gridExtra_2.3
#> [94] scattermore_1.2 xfun_0.58 matrixStats_1.5.0
#> [97] stringi_1.8.7 lazyeval_0.2.3 yaml_2.3.12
#> [100] evaluate_1.0.5 codetools_0.2-20 tibble_3.3.1
#> [103] cli_3.6.6 uwot_0.2.4 xtable_1.8-8
#> [106] reticulate_1.46.0 systemfonts_1.3.2 jquerylib_0.1.4
#> [109] Rcpp_1.1.1-1.1 globals_0.19.1 spatstat.random_3.5-0
#> [112] png_0.1-9 ggrastr_1.0.2 spatstat.univar_3.2-0
#> [115] parallel_4.6.0 pkgdown_2.2.0 dotCall64_1.2
#> [118] listenv_0.10.1 viridisLite_0.4.3 scales_1.4.0
#> [121] ggridges_0.5.7 crayon_1.5.3 purrr_1.2.2
#> [124] rlang_1.2.0 cowplot_1.2.0