PathwaySpace: Spatial projection of network signals along geodesic paths.

The Cancer Genome Atlas Analysis Network.

2024-05-16

Abstract

PathwaySpace is an R package that creates landscape images from graphs containing vertices (nodes), edges (lines), and a signal associated with the vertices. The package processes the signal using a convolution algorithm that considers the graph’s topology to project the signal on a 2D space. PathwaySpace could have various applications, such as visualizing network data in a graphical format that highlights the relationships and signal strengths between vertices. It can be particularly useful for understanding the influence of signals through complex networks. By combining graph theory, signal processing, and visualization, the PathwaySpace package provides a novel way of representing network signals.

Package

PathwaySpace 0.99.1

1 Overview

For a given igraph object containing vertices, edges, and a signal associated with the vertices, PathwaySpace performs a convolution operation, which involves a weighted combination of neighboring node signals based on the graph structure. Figure 1 illustrates the problem addressed by this convolution operation. Each vertex’s signal is placed at a specific position in the 2D space. The x and y coordinates of this space correspond either to vertex-signal positions (e.g. red, green, and blue lollipops in Fig.1A) or null-signal positions for which no signal information is available (question marks in Fig.1A). Our model considers the vertex-signal positions as source points (or transmitters) and the null-signal positions as end points (or receivers). The signal values from vertex-signal positions are then projected to the null-signal positions according to a decay function, which controls how the signal values attenuate as they propagate across the 2D space. Available decay functions include linear (Bourque and Petri 2023), exponential (Beebe 2017), and Weibull (Kızılersü, Kreer, and Thomas 2018) functions (Fig.1B). For a given null-signal position, a k-nearest neighbors (kNN) algorithm is used to define the contributing vertices for signal convolution. The convolution operation combines the signals from these contributing vertices, considering their distances and signal strengths, and applies the decay function to model the attenuation of the signal. Users can adjust both the decay function’s parameters and the value of k in the kNN algorithm. These parameters control how the signal decays, allowing users to explore different scenarios and observe how varying parameters influence the landscape image. The resulting image forms geodesic paths in which the signal has been projected from vertex- to null-signal positions, using a density metric to measure the signal intensity along these paths.
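The three decay profiles mentioned above can be sketched in base R. The function names (linear_decay, exp_decay, weibull_decay), their arguments (decay, pdist, shape), and the exact parametrization are illustrative assumptions rather than the package's implementation. Each function returns the attenuated signal at a distance x from a source.

```r
# Hypothetical decay profiles: 'x' is the distance from the source and
# 'signal' is the value at the source (assumed forms, not the package's code)
linear_decay <- function(x, signal = 1, pdist = 1) {
  # Falls linearly to zero at x = pdist and stays at zero beyond it
  signal * pmax(0, 1 - x / pdist)
}
exp_decay <- function(x, signal = 1, decay = 0.001, pdist = 1) {
  # The signal is multiplied by 'decay' for every 'pdist' unit travelled
  signal * decay^(x / pdist)
}
weibull_decay <- function(x, signal = 1, decay = 0.001, pdist = 1,
                          shape = 1.05) {
  # 'shape' bends the exponent; with shape = 1 this equals exp_decay()
  signal * decay^((x / pdist)^shape)
}

# Compare the three profiles at a few distances
x <- seq(0, 1, by = 0.25)
profiles <- cbind(
  linear = linear_decay(x),
  exponential = exp_decay(x),
  weibull = weibull_decay(x, shape = 2)
)
print(round(profiles, 4))
```

In this sketch all three profiles start at the source value and decrease monotonically with distance; they differ only in how quickly the attenuation sets in.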

Figure 1: Signal processing addressed by the PathwaySpace package
A) Representation of a graph superimposed on a 2D coordinate system. Each lollipop icon represents a graph vertex (referred to as vertex-signal positions), while question marks highlight points in the 2D space where no signal information is available (referred to as null-signal positions). B) Signal decay profiles of linear, exponential, and Weibull decay functions.

2 Quick start

#--- Load required packages for this section
library(PathwaySpace)
library(igraph)
library(ggplot2)

2.1 Setting basic input data

This section will create an igraph object containing a binary signal associated with each vertex. The graph layout is configured manually to ensure that users can easily view all the relevant arguments needed to prepare the input data for the PathwaySpace package. The igraph’s make_star() function creates a star-like graph and the V() function is used to set attributes for the vertices. The PathwaySpace package requires that all vertices have x, y, and name attributes.

# Make a 'toy' undirected igraph
gtoy1 <- make_star(5, mode="undirected")

# Assign xy coordinates to each vertex
V(gtoy1)$x <- c(0, 1.5, -4, -4, -9)
V(gtoy1)$y <- c(0, 0,  4, -4,  0)

# Assign a name to each vertex (here, from n1 to n5)
V(gtoy1)$name <- paste0("n", 1:5)

Our gtoy1 graph is now ready for the PathwaySpace package. We can check its layout using the plot.igraph() function. Alternatively, to lay out and visualize large graphs we suggest the RedeR package.

# Check the graph layout
plot.igraph(gtoy1)

2.2 Creating a PathwaySpace

Next, we will create a PathwaySpace-class object using the buildPathwaySpace() constructor. This function will check the validity of the igraph object. It will also calculate pairwise distances between vertices, subsequently required by the signal projection methods. Note that for this example we adjusted mar = 0.2. This argument sets the outer margins as a fraction of the 2D image space on which the convolution operation will project the signal.

# Run the PathwaySpace constructor
pspace1 <- buildPathwaySpace(gtoy1, mar = 0.2)
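To make the roles of the margin and the pairwise distances concrete, the following base R sketch fits the gtoy1 coordinates into a unit square with a margin and then computes Euclidean distances between vertices. The helper normalize_xy() and its scaling rule are assumptions made for illustration; the constructor's internal steps may differ.

```r
# Toy coordinates copied from the 'gtoy1' example
xy <- cbind(x = c(0, 1.5, -4, -4, -9), y = c(0, 0, 4, -4, 0))
rownames(xy) <- paste0("n", 1:5)

# Assumption: fit the layout into a unit square, keeping the aspect ratio
# and leaving a margin 'mar' (a fraction of the image space) on each side
normalize_xy <- function(xy, mar = 0.2) {
  rng <- apply(xy, 2, range)               # min (row 1) and max (row 2) per axis
  span <- max(rng[2, ] - rng[1, ])         # largest extent across both axes
  centered <- sweep(xy, 2, colMeans(rng))  # center on the bounding-box midpoint
  0.5 + centered / span * (1 - 2 * mar)
}

nxy <- normalize_xy(xy, mar = 0.2)

# Pairwise Euclidean distances between vertices in the normalized space
vdist <- as.matrix(dist(nxy))
print(round(vdist, 3))
```

With mar = 0.2, the widest axis of the layout spans from 0.2 to 0.8 of the image space in this sketch, leaving room for the signal to spread beyond the outermost vertices.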

By default, the buildPathwaySpace() constructor initializes the signal of each vertex as 0. We can use the length(), names(), and vertexSignal() accessors to inspect a PathwaySpace object and to get and set its vertex signals; for example, to get vertex names and signal values:

# Check the number of vertices in the PathwaySpace object
length(pspace1)
## [1] 5

# Check vertex names
names(pspace1)
## [1] "n1" "n2" "n3" "n4" "n5"

# Check signal (initialized with '0')
vertexSignal(pspace1)
## n1 n2 n3 n4 n5 
##  0  0  0  0  0

…and for setting new signal values in PathwaySpace objects:

# Set new signal to all vertices
vertexSignal(pspace1) <- c(1, 3, 2, 3, 2)

# Set a new signal to the 1st vertex
vertexSignal(pspace1)[1] <- 2

# Set a new signal to vertex "n1"
vertexSignal(pspace1)["n1"] <- 4

# Check updated signal values
vertexSignal(pspace1)
## n1 n2 n3 n4 n5 
##  4  3  2  3  2

3 Signal projection

3.1 Circular projection

Following that, we will use the circularProjection() function to project the network signals, using the weibullDecay() function with default settings. We set knn = 1, defining the contributing vertices for signal convolution. In this case, each null-signal position will receive the projection from a single vertex-signal position (i.e. from the nearest signal source in the pathway space). We then create a landscape image using the plotPathwaySpace() function.

# Run network signal projection
pspace1 <- circularProjection(pspace1, knn = 1, pdist = 0.4)

# Plot a PathwaySpace image
plotPathwaySpace(pspace1, marks = TRUE)
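The effect of knn can be illustrated with a small base R sketch that projects signals from source vertices onto a regular grid of receiver positions. The function project_knn(), its arguments, the Weibull-like attenuation, and the averaging of the contributions are all assumptions for illustration; this is not the package's implementation.

```r
# Hypothetical kNN projection on a regular grid (not the package's code)
project_knn <- function(vxy, signal, nrc = 50, knn = 1, pdist = 0.4,
                        decay = 0.001, shape = 1.05) {
  # Receiver positions: centers of an nrc x nrc grid over the unit square
  g <- (seq_len(nrc) - 0.5) / nrc
  grid <- as.matrix(expand.grid(x = g, y = g))
  # Distances from every receiver (rows) to every source vertex (columns)
  d <- sqrt(outer(grid[, 1], vxy[, 1], "-")^2 +
            outer(grid[, 2], vxy[, 2], "-")^2)
  # Attenuated contribution of each source at each receiver
  contrib <- sweep(decay^((d / pdist)^shape), 2, signal, "*")
  # Keep the knn nearest sources per receiver and average their contributions
  z <- vapply(seq_len(nrow(d)), function(i) {
    nearest <- order(d[i, ])[seq_len(knn)]
    mean(contrib[i, nearest])
  }, numeric(1))
  matrix(z, nrow = nrc, ncol = nrc)  # rows index x, columns index y
}

# Two sources with different signal values
vxy <- cbind(x = c(0.3, 0.7), y = c(0.5, 0.5))
img1 <- project_knn(vxy, signal = c(1, 0.5), knn = 1)
img2 <- project_knn(vxy, signal = c(1, 0.5), knn = 2)
```

With knn = 1 each grid cell reflects only its nearest source, whereas with knn = 2 the two sources are blended at every cell, which in this sketch lowers the peaks and smooths the transition between them.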

The pdist term determines a distance unit for the signal convolution related to the pathway space. This distance unit will affect the extent over which the convolution operation projects the signal in the pathway space. Next, we reassess the same PathwaySpace object using knn = 2. The user can also customize a few arguments in plotPathwaySpace() function, which is a wrapper to create dedicated ggplot graphics for PathwaySpace-class objects.

# Re-run the network signal projection with 'knn = 2'
pspace1 <- circularProjection(pspace1, knn = 2, pdist = 0.4)

# Plot the PathwaySpace image
plotPathwaySpace(pspace1, marks = c("n3","n4"), theme = "th2")

The decay function used in the signal projection was passed to the circularProjection() function by the decay_fun argument. The user can pass additional arguments to the decay function using the ... argument, for example:

# Re-run the network signal projection, passing 'shape' to the decay function
pspace1 <- circularProjection(pspace1, knn = 2, pdist = 0.2, shape = 2)

# Plot the PathwaySpace image
plotPathwaySpace(pspace1, marks = "n1", theme = "th2")

In this case, we set the shape of a 3-parameter Weibull function. This parameter allows a projection to take a variety of shapes. When shape = 1 the Weibull decay follows an exponential decay, and when shape > 1 the decay profile becomes sigmoidal, with an inflection point along the decay path.
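The inflection point can be located numerically. The sketch below assumes a Weibull-like decay of the form signal * decay^((x / pdist)^shape); this parametrization and the default values are illustrative assumptions, not taken from the package source.

```r
# Assumed Weibull-like decay (illustrative parametrization)
weibull_decay <- function(x, signal = 1, decay = 0.001, pdist = 1,
                          shape = 1.05) {
  signal * decay^((x / pdist)^shape)
}

x <- seq(0, 1, length.out = 1001)
y <- weibull_decay(x, shape = 2)

# Second differences act as a discrete stand-in for the second derivative;
# d2[i] is centered at x[i + 1]
d2 <- diff(y, differences = 2)

# The curvature changes sign at the inflection point
x_inflect <- x[which(d2 > 0)[1] + 1]

# Closed form for this assumed parametrization with pdist = 1 and shape = k:
# x = ((k - 1) / (k * -log(decay)))^(1 / k)
x_analytic <- (1 / (2 * -log(0.001)))^(1 / 2)
print(c(numeric = x_inflect, analytic = x_analytic))
```

Under these assumptions the signal stays close to its source value over a short distance and then drops quickly, which is why larger shape values produce plateau-like summits around each vertex.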

3.2 Polar projection

In this section we will project the network signal using a polar coordinate system. This representation may be useful for certain types of data, for example, to highlight patterns of signal propagation on directed graphs, especially to explore the orientation aspect of signal flow. To demonstrate this feature we will use the gtoy2 directed graph, already available in the PathwaySpace package.

# Load a pre-processed directed igraph object
data("gtoy2", package = "PathwaySpace")

# Check the graph layout
plot.igraph(gtoy2)

# Build a PathwaySpace for the 'gtoy2' igraph
pspace2 <- buildPathwaySpace(gtoy2, mar = 0.2)

# Set '1s' as vertex signal
vertexSignal(pspace2) <- 1

# Run the network signal projection using polar coordinates
pspace2 <- polarProjection(pspace2, knn = 2, theta = 45, shape = 2)

# Plot the PathwaySpace image
plotPathwaySpace(pspace2, theme = "th2", marks = TRUE)

Note that this projection emphasizes signals along the edges of the network. In order to also consider the direction of edges, next we set directional = TRUE.

# Re-run the network signal projection using 'directional = TRUE'
pspace2 <- polarProjection(pspace2, knn = 2, theta = 45, shape = 2, 
  directional = TRUE)

# Plot the PathwaySpace image
plotPathwaySpace(pspace2, theme = "th2", marks = c("n1","n3","n4","n5"))

This updated PathwaySpace polar projection emphasizes the signal flow in a defined direction (see the directional pattern of the igraph plot at the top of this section). However, when interpreting the results, users must be aware that this method may introduce distortions. For example, depending on the network’s structure, the polar projection may not capture all aspects of a directed graph, such as cyclic dependencies, feedforward and feedback loops, or other intricate edge interplays.
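One way to picture how an angular span and an edge direction could shape the projection is to weight each receiver position by its angular deviation from an edge. The helpers angular_weight() and edge_weight() below, and the linear angular fall-off, are hypothetical constructions for illustration only; they are not the algorithm used by polarProjection().

```r
# Hypothetical angular weight for one edge (src -> tgt); not the package's code.
# 'theta' is the angular span, in degrees, on each side of the edge direction.
angular_weight <- function(p, src, tgt, theta = 45) {
  u <- tgt - src                       # edge direction
  v <- p - src                         # direction from source to receiver
  cosang <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
  ang <- acos(max(-1, min(1, cosang))) * 180 / pi  # deviation in degrees
  max(0, 1 - ang / theta)              # 1 on the edge axis, 0 beyond theta
}

edge_weight <- function(p, src, tgt, theta = 45, directional = FALSE) {
  w <- angular_weight(p, src, tgt, theta)
  if (!directional) {
    # Undirected case: the edge also emits from 'tgt' back towards 'src'
    w <- max(w, angular_weight(p, tgt, src, theta))
  }
  w
}

src <- c(0, 0)
tgt <- c(1, 0)
downstream <- c(1.5, 0)   # beyond 'tgt', along the edge direction
upstream <- c(-0.5, 0)    # behind 'src', against the edge direction

w_down_dir <- edge_weight(downstream, src, tgt, directional = TRUE)
w_up_dir <- edge_weight(upstream, src, tgt, directional = TRUE)
w_up_undir <- edge_weight(upstream, src, tgt, directional = FALSE)
print(c(w_down_dir, w_up_dir, w_up_undir))
```

In this sketch a directional edge contributes nothing upstream of its source, while the undirected variant treats both ends symmetrically.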

3.3 Signal types

PathwaySpace accepts binary, integer, and numeric signal types, including NAs. A vertex assigned NA is excluded from the signal projection and is not evaluated by the convolution algorithm. Logical values are also allowed, but they are treated as binary. Next, we show the projection of a signal that includes negative values, using the pspace1 object created previously.

# Set a negative signal to vertices "n3" and "n4"
vertexSignal(pspace1)[c("n3","n4")] <- c(-2, -4)

# Check updated signal vector
vertexSignal(pspace1)
# n1 n2 n3 n4 n5 
#  4  3 -2 -4  2 

# Re-run the network signal projection
pspace1 <- circularProjection(pspace1, knn = 2, shape = 2)

# Plot the PathwaySpace image
plotPathwaySpace(pspace1, bg.color = "white", font.color = "grey20",
  marks = TRUE, mark.color = "magenta", theme = "th2")

Note that the original signal vector was rescaled to [-1, +1]. If all signal values are >= 0, the vector is rescaled to [0, 1]; if all values are <= 0, it is rescaled to [-1, 0]; and if the vector contains both negative and positive values, it is rescaled to [-1, +1]. To override this signal processing, simply set the rescale argument to FALSE in the projection functions.
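A simple rule that satisfies all three cases is to divide the signal by its largest absolute value, which keeps zero and the signs in place. The helper rescale_signal() below is a hypothetical sketch of that rule; the package may rescale differently.

```r
# Hypothetical rescaling by the largest absolute value (an assumption);
# NAs are ignored when finding the maximum and are kept in the output
rescale_signal <- function(x) {
  m <- max(abs(x), na.rm = TRUE)
  if (m == 0) return(x)  # all-zero signal: nothing to rescale
  x / m
}

mixed <- c(n1 = 4, n2 = 3, n3 = -2, n4 = -4, n5 = 2)
print(rescale_signal(mixed))           # spans [-1, +1]
print(rescale_signal(c(1, 3, 2)))      # non-negative input stays in [0, 1]
print(rescale_signal(c(-1, -3, NA)))   # non-positive input stays in [-1, 0]
```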

4 PathwaySpace decoration

In order to enhance clarity and make it less likely for viewers to miss important details of large graphs, in this section we introduce visual elements to large PathwaySpace images. We will use an igraph object with n = 12990 vertices to create a large PathwaySpace object, upon which we will project binary signals from a relatively small number of vertices. This example will emphasize clusters of vertices forming summits, but it might also come at the cost of reduced clarity in displaying the graph’s overall structure, particularly in regions far from the summit areas. In order to balance between emphasizing clusters and maintaining the visibility of the entire graph structure, we will outline graph silhouettes as decoration elements in the PathwaySpace image.

#--- Load required packages for this section
library(PathwaySpace)
library(RGraphSpace)
library(igraph)
library(ggplot2)

4.1 Loading a large graph

Next, we will load an igraph object with n = 12990 vertices, containing gene interaction data available from the Pathway Commons database (version 12) (Rodchenkov et al. 2019).

# Load a large igraph object
data("PCv12_pruned_igraph", package = "PathwaySpace")

# Check number of vertices
length(PCv12_pruned_igraph)
# [1] 12990

# Check vertex names
head(V(PCv12_pruned_igraph)$name)
# [1] "A1BG" "AKT1" "CRISP3" "GRB2" "PIK3CA" "PIK3R1"

# Get top-connected nodes for visualization
top10hubs <- igraph::degree(PCv12_pruned_igraph)
top10hubs <- names(sort(top10hubs, decreasing = TRUE)[1:10])
head(top10hubs)
# [1] "GNB1" "TRIM28" "RPS27A" "CTNNB1" "TP53" "ACTB"

Depending on the graphics devices available in the current R session, rendering a large graph can take a while. To visualize the graph layout, next we use the plotGraphSpace() function from the RGraphSpace package for plotting optimization.

## Visualize the graph layout labeled with 'top10hubs' nodes
plotGraphSpace(PCv12_pruned_igraph, marks = top10hubs, 
  mark.color = "blue", theme = "th3")

We will also load gene sets from the MSigDB collection (Liberzon et al. 2015), which are subsequently used to project a binary signal in the PathwaySpace image.

# Load a list with Hallmark gene sets
data("Hallmarks_v2023_1_Hs_symbols", package = "PathwaySpace")

# There are 50 gene sets in "hallmarks"
length(hallmarks)
# [1] 50

# We will use the 'HALLMARK_P53_PATHWAY' (n=200 genes) for demonstration
length(hallmarks$HALLMARK_P53_PATHWAY)
# [1] 200

4.2 Running PathwaySpace

We now follow the PathwaySpace pipeline as explained in the previous sections, that is, using the buildPathwaySpace() constructor to initialize a new PathwaySpace object with the Pathway Commons interactions.

# Run the PathwaySpace constructor
pspace_PCv12 <- buildPathwaySpace(g=PCv12_pruned_igraph, nrc=500)
# Note: 'nrc' sets the number of rows and columns of the
# image space, which will affect the image resolution (in pixels)

…and now we mark the HALLMARK_P53_PATHWAY genes in the PathwaySpace object.

# Intersect Hallmark genes with the PathwaySpace
hallmarks <- lapply(hallmarks, intersect, y = names(pspace_PCv12) )

# After intersection, the 'HALLMARK_P53_PATHWAY' dropped to n=173 genes
length(hallmarks$HALLMARK_P53_PATHWAY)
# [1] 173

# Set a binary signal (1s) to 'HALLMARK_P53_PATHWAY' genes
vertexSignal(pspace_PCv12) <- 0
vertexSignal(pspace_PCv12)[ hallmarks$HALLMARK_P53_PATHWAY ] <- 1
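The intersection step matters because a gene set may contain symbols that are not vertices of the graph. The base R sketch below mimics the same pattern on a plain named vector; the gene symbols and object names are toy stand-ins chosen for illustration.

```r
# Toy stand-ins for the vertex names and a gene set (illustrative values)
vertex_names <- c("TP53", "MDM2", "CDKN1A", "AKT1", "GRB2")
gene_set <- c("TP53", "MDM2", "CDKN1A", "BAX")  # "BAX" is not a vertex

# Keep only the genes that are vertices; in a plain named vector, assigning
# to an unknown name would silently append a new element
mapped <- intersect(gene_set, vertex_names)

# Initialize all vertices with 0, then mark the mapped genes with 1
signal <- setNames(numeric(length(vertex_names)), vertex_names)
signal[mapped] <- 1
print(signal)
```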

…and run the circularProjection() function.

# Run network signal projection
pspace_PCv12 <- circularProjection(pspace_PCv12)
plotPathwaySpace(pspace_PCv12, title="HALLMARK_P53_PATHWAY", 
  marks = top10hubs, mark.size = 2, theme = "th3")

Note that this image emphasizes groups of vertices forming summits, but it misses the outline of the graph structure, which faded with the signal that reaches the furthermost points of the network.

4.3 Mapping silhouettes

Next, we will decorate the PathwaySpace image with graph silhouettes.

# Add silhouettes
pspace_PCv12 <- silhouetteMapping(pspace_PCv12)
plotPathwaySpace(pspace_PCv12, title="HALLMARK_P53_PATHWAY", 
  marks = top10hubs, mark.size = 2, theme = "th3")

4.4 Mapping summits

The summits represent regions within the graph that exhibit signal values that are notably higher than the baseline level. These regions may be of interest for downstream analyses. One potential downstream analysis is to determine which vertices projected the original input signal. This could provide insights into the communities within these summit regions. One may also wish to explore other vertices within the summits, by querying associations with the original input gene set. In order to extract vertices within summits, next we use the summitMapping() function, which also decorates summits with contour lines.

# Mapping summits
pspace_PCv12 <- summitMapping(pspace_PCv12, minsize = 50)
plotPathwaySpace(pspace_PCv12, title="HALLMARK_P53_PATHWAY", theme = "th3")

# Extracting summits from a PathwaySpace
summits <- getPathwaySpace(pspace_PCv12, "summits")
class(summits)
# [1] "list"
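The idea of a summit with a minimum size can be sketched as a thresholding step followed by connected-region labeling on the image matrix. The function find_summits(), its threshold argument, and the 4-neighbor connectivity are assumptions made for illustration; they do not describe how summitMapping() is implemented.

```r
# Hypothetical summit detection on a signal matrix 'z' (not the package's code):
# cells above 'threshold' are grouped into 4-connected regions, and regions
# with fewer than 'minsize' cells are discarded
find_summits <- function(z, threshold, minsize = 1) {
  above <- z > threshold
  labels <- matrix(0L, nrow(z), ncol(z))
  current <- 0L
  for (start in which(above)) {
    if (labels[start] != 0L) next        # already assigned to a region
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {          # breadth-first flood fill
      cell <- queue[1]
      queue <- queue[-1]
      r <- (cell - 1L) %% nrow(z) + 1L   # row of the linear index
      k <- (cell - 1L) %/% nrow(z) + 1L  # column of the linear index
      nb <- rbind(c(r - 1L, k), c(r + 1L, k), c(r, k - 1L), c(r, k + 1L))
      nb <- nb[nb[, 1] >= 1 & nb[, 1] <= nrow(z) &
               nb[, 2] >= 1 & nb[, 2] <= ncol(z), , drop = FALSE]
      idx <- (nb[, 2] - 1L) * nrow(z) + nb[, 1]
      idx <- idx[above[idx] & labels[idx] == 0L]
      labels[idx] <- current
      queue <- c(queue, idx)
    }
  }
  # Drop regions smaller than 'minsize' (remaining labels are not renumbered)
  sizes <- tabulate(labels[labels > 0L], nbins = current)
  labels[!(labels %in% which(sizes >= minsize))] <- 0L
  labels
}

z <- matrix(0, 6, 6)
z[1:3, 1:3] <- 1   # a 9-cell region
z[6, 6] <- 1       # an isolated single cell
lab <- find_summits(z, threshold = 0.5, minsize = 4)
print(lab)
```

Here the isolated cell is removed by the minsize filter, in the same spirit as the minsize = 50 setting used above to ignore very small summits.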

5 Case study

This will be incorporated into the PathwaySpace documentation following the acceptance of Ellrott et al. (2023).

6 Citation

If you use PathwaySpace, please cite:

The Cancer Genome Atlas Analysis Network. PathwaySpace: Spatial projection of network signals along geodesic paths. R package, 2023.
Ellrott et al. (under review)

7 Other useful links

Castro MA, Wang X, Fletcher MN, Meyer KB, Markowetz F (2012). “RedeR: R/Bioconductor package for representing modular structures, nested networks and multiple levels of hierarchical associations.” Genome Biology, 13(4), R29. https://bioconductor.org/packages/RedeR/
Cardoso MA, Rizzardi LEA, Kume LW, Groeneveld C, Trefflich S, Morais DAA, Dalmolin RJS, Ponder BAJ, Meyer KB, Castro MAA. “TreeAndLeaf: an R/Bioconductor package for graphs and trees with focus on the leaves.” Bioinformatics, 38(5):1463-1464, 2022. https://bioconductor.org/packages/TreeAndLeaf/
Csardi G and Nepusz T. “The Igraph Software Package for Complex Network Research.” InterJournal, ComplexSystems:1695, 2006. https://igraph.org

8 Session information

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] PathwaySpace_0.99.1 RGraphSpace_1.0.5   ggplot2_3.5.1      
## [4] igraph_2.0.3        BiocStyle_2.30.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5        jsonlite_1.8.8      compiler_4.3.1     
##  [4] BiocManager_1.30.23 renv_1.0.7          highr_0.10         
##  [7] Rcpp_1.0.12         jquerylib_0.1.4     scales_1.3.0       
## [10] yaml_2.3.8          fastmap_1.1.1       R6_2.5.1           
## [13] knitr_1.46          ggrepel_0.9.5       tibble_3.2.1       
## [16] bookdown_0.39       munsell_0.5.1       bslib_0.7.0        
## [19] pillar_1.9.0        rlang_1.1.3         utf8_1.2.4         
## [22] cachem_1.0.8        RANN_2.6.1          xfun_0.43          
## [25] sass_0.4.9          cli_3.6.2           withr_3.0.0        
## [28] magrittr_2.0.3      digest_0.6.35       grid_4.3.1         
## [31] lifecycle_1.0.4     vctrs_0.6.5         evaluate_0.23      
## [34] glue_1.7.0          fansi_1.0.6         colorspace_2.1-0   
## [37] rmarkdown_2.26      tools_4.3.1         pkgconfig_2.0.3    
## [40] htmltools_0.5.8.1

References

Beebe, Nelson H. F. 2017. “Exponential and Logarithm.” In The Mathematical-Function Computation Handbook: Programming Using the MathCW Portable Software Library, 267–98. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-64110-2_10.

Bourque, Maxime Fortier, and Bram Petri. 2023. “Linear Programming Bounds for Hyperbolic Surfaces.” https://arxiv.org/abs/2302.02540.

Kızılersü, Ayşe, Markus Kreer, and Anthony W. Thomas. 2018. “The Weibull Distribution.” Significance 15 (2): 10–11. https://doi.org/10.1111/j.1740-9713.2018.01123.x.

Liberzon, Arthur, Chet Birger, Helga Thorvaldsdottir, Mahmoud Ghandi, Jill Mesirov, and Pablo Tamayo. 2015. “The Molecular Signatures Database (MSigDB) Hallmark Gene Set Collection.” Cell Systems 1 (5): 417–25. https://doi.org/10.1016/j.cels.2015.12.004.

Rodchenkov, Igor, Ozgun Babur, Augustin Luna, Bulent Arman Aksoy, Jeffrey V Wong, Dylan Fong, Max Franz, et al. 2019. “Pathway Commons 2019 Update: integration, analysis and exploration of pathway data.” Nucleic Acids Research 48 (D1): D489–97. https://doi.org/10.1093/nar/gkz946.