Partition a Network Graph Into Clusters

Given a list of network objects returned by buildRepSeqNetwork() or generateNetworkObjects(), partitions the network graph into clusters using the specified clustering algorithm, adding a cluster membership variable to the node metadata.

Usage

addClusterMembership(
  net,
  cluster_fun = "fast_greedy",
  cluster_id_name = "cluster_id",
  overwrite = FALSE,
  verbose = FALSE,
  ...,
  data = deprecated(),
  fun = deprecated()
)

Arguments

net: A list of network objects conforming to the output of buildRepSeqNetwork() or generateNetworkObjects(). See details. Alternatively, this argument accepts the network igraph, with the node metadata passed to the data argument. However, this alternative functionality is deprecated and will eventually be removed.
cluster_fun: A character string specifying the clustering algorithm to use. See details.
cluster_id_name: A character string specifying the name of the cluster membership variable to be added to the node metadata.
overwrite: Logical. Should the variable specified by cluster_id_name be overwritten if it already exists?
verbose: Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr().
...: Named optional arguments to the function specified by cluster_fun.
data: See net.
fun: Replaced by cluster_fun.

Details

The list net must contain the named elements igraph (of class igraph), adjacency_matrix (a matrix or dgCMatrix encoding edge connections), and node_data (a data.frame containing node metadata), all corresponding to the same network. The lists returned by buildRepSeqNetwork() and generateNetworkObjects() are examples of valid inputs for the net argument.

Alternatively, the igraph may be passed to net and the node metadata to data. However, this alternative functionality is deprecated and will eventually be removed.

A clustering algorithm is used to partition the network graph into clusters (densely-connected subgraphs). Each cluster represents a collection of clones/cells with similar receptor sequences. The method used to partition the graph depends on the choice of clustering algorithm, which is specified using the cluster_fun argument.

The available options for cluster_fun are listed below. Each refers to an igraph function implementing a particular clustering algorithm. Follow the links to learn more about the individual clustering algorithms.

Optional arguments to each clustering algorithm can have their values specified using the ellipses (...) argument of addClusterMembership().

Each cluster is assigned a numeric cluster ID. A cluster membership variable, whose name is specified by cluster_id_name, is added to the node metadata, encoding the cluster membership of the node for each row. The cluster membership is encoded as the cluster ID number of the cluster to which the node belongs.

The overwrite argument controls whether to overwrite pre-existing data. If the variable specified by cluster_id_name is already present in the node metadata, then overwrite must be set to TRUE in order to perform clustering and overwrite the variable with new cluster membership values. Alternatively, by specifying a value for cluster_id_name that is not among the variables in the node metadata, a new cluster membership variable can be created while preserving the old cluster membership variable. In this manner, clustering can be performed multiple times on the same network using different clustering algorithms, without losing the results.

Value

If the variable specified by cluster_id_name is not present in net$node_data, returns a copy of net with this variable added to net$node_data encoding the cluster membership of the network node corresponding to each row. If the variable is already present and overwrite = TRUE, then its values are replaced with the new values for cluster membership.

Additionally, if net contains a list named details, then the following elements will be added to net$details if they do not already exist:

clusters_in_network: A named numeric vector of length 1. The first entry's name is the name of the clustering algorithm, and its value is the number of clusters resulting from performing clustering on the network.
cluster_id_variable: A named numeric vector of length 1. The first entry's name is the name of the clustering algorithm, and its value is the name of the corresponding cluster membership variable in the node metadata (i.e., the value of cluster_id_name).

If net$details already contains these elements, they will be updated according to whether the cluster membership variable specified by cluster_id_name is added to net$node_data or already exists and is overwritten. In the former case (the cluster membership variable does not already exist), the length of each vector (clusters_in_network) and (cluster_id_variable) is increased by 1, with the new information appended as a new named entry to each. In the latter case (the cluster membership variable is overwritten), the new information overwrites the name and value of the last entry of each vector.

In the event where overwrite = FALSE and net$node_data contains a variable with the same name as the value of cluster_id_name, then an unaltered copy of net is returned with a message notifying the user.

Under the alternative (deprecated) input format where the node metadata is passed to data and the igraph is passed to net, the node metadata is returned instead of the list of network objects, with the cluster membership variable added or updated as described above.

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Author

Brian Neal (Brian.Neal@ucsf.edu)

Examples

set.seed(42)
toy_data <- simulateToyData()

net <- generateNetworkObjects(
  toy_data, "CloneSeq"
)

# Perform cluster analysis,
# add cluster membership to net$node_data
net <- addClusterMembership(net)

net$details$clusters_in_network
#> fast_greedy 
#>          20 
net$details$cluster_id_variable
#>  fast_greedy 
#> "cluster_id" 

# overwrite values in net$node_data$cluster_id
# with cluster membership values obtained using "cluster_leiden" algorithm
net <- addClusterMembership(
  net,
  cluster_fun = "leiden",
  overwrite = TRUE
)

net$details$clusters_in_network
#> leiden 
#>     53 
net$details$cluster_id_variable
#>       leiden 
#> "cluster_id" 

# perform clustering using "cluster_louvain" algorithm
# saves cluster membership values to net$node_data$cluster_id_louvain
# (net$node_data$cluster_id retains membership values from "cluster_leiden")
net <- addClusterMembership(
  net,
  cluster_fun = "louvain",
  cluster_id_name = "cluster_id_louvain",
)

net$details$clusters_in_network
#>  leiden louvain 
#>      53      19 
net$details$cluster_id_variable
#>               leiden              louvain 
#>         "cluster_id" "cluster_id_louvain"