Compute Cluster-Level Network Properties
Given the node-level metadata and adjacency matrix for a network graph that has been partitioned into clusters, computes network properties for the clusters and returns them in a data frame.
addClusterStats() is preferred to getClusterStats() in most situations.
Usage
getClusterStats(
  data,
  adjacency_matrix,
  seq_col = NULL,
  count_col = NULL,
  cluster_id_col = "cluster_id",
  degree_col = NULL,
  cluster_fun = deprecated(),
  verbose = FALSE
)
Arguments
- data
A data frame containing the node-level metadata for the network, with each row corresponding to a network node.
- adjacency_matrix
The adjacency matrix for the network.
- seq_col
Specifies the column(s) of data containing the receptor sequences upon whose similarity the network is based. Accepts a character or numeric vector of length 1 or 2, containing either column names or column indices. If provided, then related cluster-level properties will be computed.
- count_col
Specifies the column of data containing a measure of abundance (such as clone count or UMI count). Accepts a character string containing the column name or a numeric scalar containing the column index. If provided, related cluster-level properties will be computed.
- cluster_id_col
Specifies the column of data containing the cluster membership variable that identifies the cluster to which each node belongs. Accepts a character string containing the column name or a numeric scalar containing the column index.
- degree_col
Specifies the column of data containing the network degree of each node. Accepts a character string containing the column name or a numeric scalar containing the column index. If the column does not exist, the network degree will be computed.
- cluster_fun
Deprecated.
- verbose
Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr().
Details
To use getClusterStats(), the network graph must first be partitioned into clusters, which can be done using addClusterMembership(). The name of the cluster membership variable in the node metadata must be provided to the cluster_id_col argument when calling getClusterStats().
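As a rough illustration of how the cluster membership variable drives the computation, the following base R sketch groups the rows of a toy node metadata table by a cluster_id column and summarizes each group. It is not the package's implementation; the toy data, the variable names, and the use of rowSums() to obtain the network degree are assumptions made for the example.

```r
# Toy node metadata: five nodes in two clusters (all values are made up).
node_data <- data.frame(
  CloneSeq   = c("CASSL", "CASSV", "CASSQ", "CAWSL", "CAWSV"),
  CloneCount = c(10, 3, 7, 5, 20),
  cluster_id = c(1, 1, 1, 2, 2)
)

# Symmetric adjacency matrix: nodes 1-2-3 form a path, nodes 4-5 are linked.
adj <- matrix(0, nrow = 5, ncol = 5)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Network degree of each node (assumed stand-in for the degree column).
node_data$degree <- rowSums(adj)

# Summarize each cluster using the cluster membership variable.
cluster_ids <- sort(unique(node_data$cluster_id))
cluster_rows <- lapply(cluster_ids, function(id) {
  members <- node_data[node_data$cluster_id == id, ]
  data.frame(
    cluster_id       = id,
    node_count       = nrow(members),
    mean_degree      = mean(members$degree),
    max_degree       = max(members$degree),
    seq_w_max_degree = members$CloneSeq[which.max(members$degree)],
    agg_count        = sum(members$CloneCount),
    max_count        = max(members$CloneCount),
    seq_w_max_count  = members$CloneSeq[which.max(members$CloneCount)]
  )
})
cluster_stats <- do.call(rbind, cluster_rows)
print(cluster_stats)
```

The result has one row per cluster, which mirrors the shape of the data frame described under Value. Ties in which.max() resolve to the first matching node; how the package resolves ties is not stated here.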
Value
A data frame containing one row for each cluster in the network and the following variables:
- cluster_id
The cluster ID number.
- node_count
The number of nodes in the cluster.
- mean_seq_length
The mean sequence length in the cluster. Only present when length(seq_col) == 1.
- A_mean_seq_length
The mean first sequence length in the cluster. Only present when length(seq_col) == 2.
- B_mean_seq_length
The mean second sequence length in the cluster. Only present when length(seq_col) == 2.
- mean_degree
The mean network degree in the cluster.
- max_degree
The maximum network degree in the cluster.
- seq_w_max_degree
The receptor sequence possessing the maximum degree within the cluster. Only present when length(seq_col) == 1.
- A_seq_w_max_degree
The first sequence of the node possessing the maximum degree within the cluster. Only present when length(seq_col) == 2.
- B_seq_w_max_degree
The second sequence of the node possessing the maximum degree within the cluster. Only present when length(seq_col) == 2.
- agg_count
The aggregate count among all nodes in the cluster (based on the counts in count_col).
- max_count
The maximum count among all nodes in the cluster (based on the counts in count_col).
- seq_w_max_count
The receptor sequence possessing the maximum count within the cluster. Only present when length(seq_col) == 1.
- A_seq_w_max_count
The first sequence of the node possessing the maximum count within the cluster. Only present when length(seq_col) == 2.
- B_seq_w_max_count
The second sequence of the node possessing the maximum count within the cluster. Only present when length(seq_col) == 2.
- diameter_length
The longest geodesic distance in the cluster, computed as the length of the vector returned by get_diameter().
- assortativity
The assortativity coefficient of the cluster's graph, based on the degree (minus one) of each node in the cluster (with the degree computed based only upon the nodes within the cluster). Computed using assortativity_degree().
- global_transitivity
The transitivity (i.e., clustering coefficient) for the cluster's graph, which estimates the probability that two neighbors of the same vertex are themselves connected. Computed using transitivity() with type = "global".
- edge_density
The number of edges in the cluster as a fraction of the maximum possible number of edges. Computed using edge_density().
- degree_centrality_index
The centrality index of the cluster's graph based on within-cluster network degree. Computed as the centralization element of the output from centr_degree().
- closeness_centrality_index
The centrality index of the cluster's graph based on closeness, i.e., distance to other nodes in the cluster. Computed using centralization().
- eigen_centrality_index
The centrality index of the cluster's graph based on the eigenvector centrality scores, i.e., values of the first eigenvector of the adjacency matrix for the cluster. Computed as the centralization element of the output from centr_eigen().
- eigen_centrality_eigenvalue
The eigenvalue corresponding to the first eigenvector of the adjacency matrix for the cluster. Computed as the value element of the output from eigen_centrality().
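The graph-level properties listed above can be explored directly with the igraph functions they name. The sketch below applies those functions to a toy graph standing in for a single cluster; it assumes igraph is installed and is not the package's own code. The toy adjacency matrix is made up, and the use of centr_clo() for the closeness-based index is an assumption about which igraph centralization function applies.

```r
library(igraph)

# Toy adjacency matrix for one cluster: a triangle (nodes 1, 2, 3)
# plus one extra node (4) attached to node 3. Values are made up.
sub_adj <- matrix(0, nrow = 4, ncol = 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
sub_adj[edges] <- 1
sub_adj[edges[, 2:1]] <- 1

g <- graph_from_adjacency_matrix(sub_adj, mode = "undirected")

cluster_props <- list(
  # Number of vertices on a longest geodesic (one more than its edge count)
  diameter_length             = length(get_diameter(g)),
  assortativity               = assortativity_degree(g, directed = FALSE),
  global_transitivity         = transitivity(g, type = "global"),
  edge_density                = edge_density(g),
  degree_centrality_index     = centr_degree(g)$centralization,
  # centr_clo() is assumed here for the closeness-based index
  closeness_centrality_index  = centr_clo(g)$centralization,
  eigen_centrality_index      = centr_eigen(g)$centralization,
  eigen_centrality_eigenvalue = eigen_centrality(g)$value
)
str(cluster_props)
```

For this toy graph the longest geodesic passes through three vertices, the edge density is 4/6 (four edges out of six possible), and the global transitivity is 0.6 (one triangle among five connected triples). Note that diameter_length counts vertices on the path, so it is one more than the number of edges on the longest geodesic.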
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Author
Brian Neal (Brian.Neal@ucsf.edu)
Examples
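library(NAIR)

set.seed(42)
toy_data <- simulateToyData()

# Build the network from the sequences in the "CloneSeq" column
net <- generateNetworkObjects(toy_data, "CloneSeq")

# Partition the network graph into clusters (required before getClusterStats())
net <- addClusterMembership(net)

# Compute the cluster-level properties from the node metadata and adjacency matrix
net$cluster_data <- getClusterStats(
  net$node_data,
  net$adjacency_matrix,
  seq_col = "CloneSeq",
  count_col = "CloneCount"
)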