Partition a Network Graph Into Clusters
Given a list of network objects returned by buildRepSeqNetwork() or
generateNetworkObjects(), partitions the network graph into clusters using the
specified clustering algorithm, adding a cluster membership variable to the
node metadata.
Usage
addClusterMembership(
  net,
  cluster_fun = "fast_greedy",
  cluster_id_name = "cluster_id",
  overwrite = FALSE,
  verbose = FALSE,
  ...,
  data = deprecated(),
  fun = deprecated()
)
Arguments
- net
A list of network objects conforming to the output of buildRepSeqNetwork() or generateNetworkObjects(). See details. Alternatively, this argument accepts the network igraph, with the node metadata passed to the data argument. However, this alternative functionality is deprecated and will eventually be removed.
- cluster_fun
A character string specifying the clustering algorithm to use. See details.
- cluster_id_name
A character string specifying the name of the cluster membership variable to be added to the node metadata.
- overwrite
Logical. Should the variable specified by cluster_id_name be overwritten if it already exists?
- verbose
Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr().
- ...
Named optional arguments to the function specified by cluster_fun.
- data
Deprecated. See net.
- fun
Deprecated.
Details
The list net must contain the named elements igraph (of class igraph),
adjacency_matrix (a matrix or dgCMatrix encoding edge connections), and
node_data (a data.frame containing node metadata), all corresponding to the
same network. The lists returned by buildRepSeqNetwork() and
generateNetworkObjects() are examples of valid inputs for the net argument.
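The structural requirements above can be checked before calling addClusterMembership(). The following is a minimal sketch in base R. The helper name check_net_structure() and the mock object are hypothetical and not part of the package, which may validate its input differently. The sketch checks names and classes only, not whether the three elements describe the same network.

```r
# Hypothetical helper (not part of NAIR): checks that `net` has the
# three named elements described above, with the expected classes.
check_net_structure <- function(net) {
  required <- c("igraph", "adjacency_matrix", "node_data")
  is.list(net) &&
    all(required %in% names(net)) &&
    inherits(net$igraph, "igraph") &&
    (is.matrix(net$adjacency_matrix) ||
       inherits(net$adjacency_matrix, "dgCMatrix")) &&
    is.data.frame(net$node_data)
}

# A mock object for illustration only; a real `net` would come from
# buildRepSeqNetwork() or generateNetworkObjects().
mock_net <- list(
  igraph = structure(list(), class = "igraph"),
  adjacency_matrix = matrix(0, nrow = 2, ncol = 2),
  node_data = data.frame(CloneSeq = c("AAA", "AAT"))
)

check_net_structure(mock_net)             # all three elements present
check_net_structure(list(igraph = NULL))  # required elements missing
```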
Alternatively, the igraph may be passed to net and the node metadata to data.
However, this alternative functionality is deprecated and will eventually be
removed.
A clustering algorithm is used to partition the network graph into clusters
(densely-connected subgraphs). Each cluster represents a collection of
clones/cells with similar receptor sequences. The method used to partition
the graph depends on the choice of clustering algorithm, which is specified
using the cluster_fun argument.
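Each available option for cluster_fun refers to an igraph function
implementing a particular clustering algorithm. For example, "fast_greedy"
(the default), "leiden" and "louvain" correspond to igraph's
cluster_fast_greedy(), cluster_leiden() and cluster_louvain(). See the igraph
documentation to learn more about the individual clustering algorithms.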
Optional arguments to each clustering algorithm can be specified through the
ellipsis (...) argument of addClusterMembership().
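As an illustration, the call below forwards a resolution value to the louvain algorithm through the ellipsis argument. This is a sketch, assuming the installed igraph version's cluster_louvain() accepts a resolution argument. The value 0.5 and the variable name cluster_id_louvain_r05 are arbitrary choices made for this example. The toy data setup is the same as in the Examples section.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()
net <- generateNetworkObjects(toy_data, "CloneSeq")

# `resolution` is not an argument of addClusterMembership() itself;
# it is passed through `...` to the clustering function.
# Assumption: igraph::cluster_louvain() supports `resolution`.
net <- addClusterMembership(
  net,
  cluster_fun = "louvain",
  cluster_id_name = "cluster_id_louvain_r05",
  resolution = 0.5
)

head(net$node_data$cluster_id_louvain_r05)
```

Argument names must match those of the underlying igraph function, so check that function's help page before passing anything through the ellipsis.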
Each cluster is assigned a numeric cluster ID. A cluster membership variable,
whose name is specified by cluster_id_name, is added to the node metadata. For
each row, it records the ID of the cluster to which the corresponding node
belongs.
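To make the idea concrete, the sketch below runs an igraph clustering function directly on a small graph and stores the membership in a data frame with one row per node. It illustrates the concept only and is not the package's internal implementation. The toy graph (two disjoint triangles) and the column name node are made up for illustration.

```r
library(igraph)

# Toy graph: two disjoint triangles (nodes 1-3 and nodes 4-6)
g <- make_full_graph(3) %du% make_full_graph(3)

# Node metadata, one row per node (hypothetical column name)
node_data <- data.frame(node = seq_len(vcount(g)))

# Partition the graph and record each node's cluster ID
clusters <- cluster_fast_greedy(g)
node_data$cluster_id <- as.integer(membership(clusters))

# Each triangle ends up in its own cluster
node_data
table(node_data$cluster_id)
```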
The overwrite argument controls whether to overwrite pre-existing data. If the
variable specified by cluster_id_name is already present in the node metadata,
then overwrite must be set to TRUE in order to perform clustering and
overwrite the variable with new cluster membership values. Alternatively, by
specifying a value for cluster_id_name that is not among the variables in the
node metadata, a new cluster membership variable can be created while
preserving the old cluster membership variable. In this manner, clustering can
be performed multiple times on the same network using different clustering
algorithms, without losing the results.
Value
If the variable specified by cluster_id_name is not present in net$node_data,
returns a copy of net with this variable added to net$node_data encoding the
cluster membership of the network node corresponding to each row. If the
variable is already present and overwrite = TRUE, then its values are replaced
with the new values for cluster membership.
Additionally, if net contains a list named details, then the following
elements will be added to net$details if they do not already exist:
clusters_in_network
A named numeric vector of length 1. The first entry's name is the name of the clustering algorithm, and its value is the number of clusters resulting from performing clustering on the network.
cluster_id_variable
A named character vector of length 1. The first entry's name is the name of the clustering algorithm, and its value is the name of the corresponding cluster membership variable in the node metadata (i.e., the value of cluster_id_name).
If net$details already contains these elements, they will be updated according
to whether the cluster membership variable specified by cluster_id_name is
added to net$node_data or already exists and is overwritten. In the former
case (the cluster membership variable does not already exist), the length of
each vector (clusters_in_network and cluster_id_variable) is increased by 1,
with the new information appended as a new named entry to each. In the latter
case (the cluster membership variable is overwritten), the new information
overwrites the name and value of the last entry of each vector.
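The update rule for the vectors in net$details can be sketched with plain named vectors. The helper update_details() below is hypothetical and only mirrors the behavior described above. It is not the package's code. The numbers are taken from the Examples section.

```r
# Hypothetical helper mirroring the rule described above.
# `vec` is a named vector such as net$details$clusters_in_network.
update_details <- function(vec, algo, value, overwritten) {
  if (overwritten && length(vec) > 0) {
    # Variable was overwritten: replace name and value of the last entry
    vec[length(vec)] <- value
    names(vec)[length(vec)] <- algo
  } else {
    # Variable is new: append a new named entry
    vec <- c(vec, stats::setNames(value, algo))
  }
  vec
}

counts <- c(fast_greedy = 20)

# "cluster_id" overwritten using leiden: last entry is replaced
counts <- update_details(counts, "leiden", 53, overwritten = TRUE)
counts  # one entry, named "leiden", value 53

# New variable "cluster_id_louvain" added: entry is appended
counts <- update_details(counts, "louvain", 19, overwritten = FALSE)
counts  # two entries, named "leiden" and "louvain", values 53 and 19
```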
If overwrite = FALSE and net$node_data already contains a variable with the
same name as the value of cluster_id_name, then an unaltered copy of net is
returned with a message notifying the user.
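The no-overwrite case can be checked directly. The sketch below uses the same toy data setup as the Examples section. The object name net2 is an arbitrary choice, and the comparison is limited to the node metadata to keep the check simple.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()
net <- generateNetworkObjects(toy_data, "CloneSeq")

# First call creates net$node_data$cluster_id
net <- addClusterMembership(net)

# Same variable name with overwrite left at FALSE: per the description
# above, a message is issued and the input is returned unaltered.
net2 <- addClusterMembership(net, cluster_fun = "louvain")

# The node metadata should be unchanged by the second call
identical(net$node_data, net2$node_data)
```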
Under the alternative (deprecated) input format where the node metadata is
passed to data and the igraph is passed to net, the node metadata is returned
instead of the list of network objects, with the cluster membership variable
added or updated as described above.
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Author
Brian Neal (Brian.Neal@ucsf.edu)
Examples
set.seed(42)
toy_data <- simulateToyData()
net <- generateNetworkObjects(
  toy_data, "CloneSeq"
)
# Perform cluster analysis,
# add cluster membership to net$node_data
net <- addClusterMembership(net)
net$details$clusters_in_network
#> fast_greedy
#>          20
net$details$cluster_id_variable
#>  fast_greedy
#> "cluster_id"
# overwrite values in net$node_data$cluster_id
# with cluster membership values obtained using "cluster_leiden" algorithm
net <- addClusterMembership(
  net,
  cluster_fun = "leiden",
  overwrite = TRUE
)
net$details$clusters_in_network
#> leiden
#>     53
net$details$cluster_id_variable
#>       leiden
#> "cluster_id"
# perform clustering using "cluster_louvain" algorithm
# saves cluster membership values to net$node_data$cluster_id_louvain
# (net$node_data$cluster_id retains membership values from "cluster_leiden")
net <- addClusterMembership(
  net,
  cluster_fun = "louvain",
  cluster_id_name = "cluster_id_louvain"
)
net$details$clusters_in_network
#>  leiden louvain
#>      53      19
net$details$cluster_id_variable
#>               leiden              louvain
#>         "cluster_id" "cluster_id_louvain"