Build Global Network of Public TCR/BCR Clusters
buildPublicClusterNetwork.Rd
Part of the workflow Searching for Public TCR/BCR Clusters. Intended for use following findPublicClusters().
Given node-level metadata for each sample's filtered clusters, combines the data into a global network and performs network analysis and cluster analysis.
Usage
buildPublicClusterNetwork(
  ## Input ##
  file_list,
  input_type = "rds",
  data_symbols = "ndat",
  header = TRUE, sep,
  read.args = list(row.names = 1),
  seq_col,
  ## Network Settings ##
  drop_isolated_nodes = FALSE,
  node_stats = deprecated(),
  stats_to_include = deprecated(),
  cluster_stats = deprecated(),
  ## Visualization ##
  color_nodes_by = "SampleID",
  color_scheme = "turbo",
  plot_title = "Global Network of Public Clusters",
  ## Output ##
  output_dir = NULL,
  output_name = "PublicClusterNetwork",
  verbose = FALSE,
  ...
)
Arguments
- file_list
A character vector of file paths, or a list containing connections and file paths. Each element corresponds to a single file containing the data for a single sample. Passed to loadDataFromFileList().
- input_type
A character string specifying the file format of the input files. Options are "csv", "rds" and "rda". Passed to loadDataFromFileList().
- data_symbols
Used when input_type = "rda". Specifies the name of the data frame within each Rdata file. Passed to loadDataFromFileList().
- header
For values of input_type other than "rds" and "rda", this argument can be used to specify a non-default value of the header argument to read.table(), read.csv(), etc.
- sep
For values of input_type other than "rds" and "rda", this argument can be used to specify a non-default value of the sep argument to read.table(), read.csv(), etc.
- read.args
For values of input_type other than "rds" and "rda", this argument can be used to specify non-default values of optional arguments to read.table(), read.csv(), etc. Accepts a named list of argument values. Values of header and sep in this list take precedence over values specified via the header and sep arguments.
- seq_col
Specifies the column in the node-level metadata that contains the TCR/BCR sequences. Accepts a character string containing the column name or a numeric scalar containing the column index.
- drop_isolated_nodes
Passed to buildRepSeqNetwork() when constructing the global network.
- node_stats
Deprecated.
- stats_to_include
Deprecated.
- cluster_stats
Deprecated.
- color_nodes_by
Passed to buildRepSeqNetwork() when constructing the global network. The node-level network properties for the global network (see Details) are included among the valid options.
- color_scheme
Passed to addPlots() when constructing the global network.
- plot_title
Passed to buildRepSeqNetwork() when constructing the global network.
- output_dir
Passed to buildRepSeqNetwork() when constructing the global network.
- output_name
Passed to buildRepSeqNetwork() when constructing the global network.
- verbose
Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr().
- ...
Other arguments to buildRepSeqNetwork() (including arguments to addPlots()) when constructing the global network. Does not include node_stats, stats_to_include, cluster_stats or cluster_id_name.
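The precedence rule for header, sep and read.args can be illustrated with a small base-R sketch. This is not the package's implementation: the helper resolve_read_args() and the toy file are hypothetical and only mimic the documented behaviour, in which values supplied through read.args override the standalone header and sep arguments.

```r
# Hypothetical helper (not part of NAIR): merge standalone header/sep values
# with a read.args list, letting read.args win when both specify a value.
resolve_read_args <- function(header = TRUE, sep = ",", read.args = list()) {
  utils::modifyList(list(header = header, sep = sep), read.args)
}

# Write a small semicolon-delimited toy file to read back.
toy_file <- tempfile(fileext = ".csv")
writeLines(
  c("id;CloneSeq;CloneCount", "a;CASSLTGNEQFF;10", "b;CASSSVETQYF;7"),
  toy_file
)

# sep = "," is given directly, but read.args supplies sep = ";", which wins.
args <- resolve_read_args(
  header = TRUE, sep = ",",
  read.args = list(sep = ";", row.names = 1)
)
toy_data <- do.call(utils::read.table, c(list(file = toy_file), args))
print(toy_data)
```

With row.names = 1 in the list, the first column of the file is used for row names rather than kept as data, which matches the default read.args shown under Usage.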
Details
The node-level metadata for the filtered clusters from all samples is combined and the global network is constructed by calling buildNet() with node_stats = TRUE, stats_to_include = "all", cluster_stats = TRUE and cluster_id_name = "ClusterIDPublic".
The computed node-level network properties are renamed to reflect their correspondence to the global network. This is done to distinguish them from the network properties that correspond to the sample-level networks. The names are:
ClusterIDPublic
PublicNetworkDegree
PublicTransitivity
PublicCloseness
PublicCentralityByCloseness
PublicEigenCentrality
PublicCentralityByEigen
PublicBetweenness
PublicCentralityByBetweenness
PublicAuthorityScore
PublicCoreness
PublicPageRank
See the Searching for Public TCR/BCR Clusters article on the package website.
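As an illustration of how the renamed properties can be used downstream, the sketch below separates the global-network columns from the other columns of a node-level data frame. The toy data frame and its values are invented, and the column SampleLevelNetworkDegree is an assumed name for a sample-level property that may differ from what the package produces.

```r
# Toy node-level data (values are invented for illustration only).
node_data <- data.frame(
  CloneSeq = c("CASSLTGNEQFF", "CASSLNGYNEQFF", "CASSSVETQYF"),
  SampleID = c("Sample1", "Sample2", "Sample2"),
  SampleLevelNetworkDegree = c(2, 1, 0),  # assumed name, sample-level property
  ClusterIDPublic = c(1, 1, 2),
  PublicNetworkDegree = c(1, 1, 0)
)

# Names of the global-network properties listed in this section.
public_props <- c(
  "ClusterIDPublic", "PublicNetworkDegree", "PublicTransitivity",
  "PublicCloseness", "PublicCentralityByCloseness", "PublicEigenCentrality",
  "PublicCentralityByEigen", "PublicBetweenness",
  "PublicCentralityByBetweenness", "PublicAuthorityScore",
  "PublicCoreness", "PublicPageRank"
)

# Keep only the global-network properties present in this data frame.
present <- intersect(public_props, names(node_data))
public_only <- node_data[, c("CloneSeq", present), drop = FALSE]

# Count nodes per public cluster.
cluster_sizes <- table(public_only$ClusterIDPublic)
print(cluster_sizes)
```

Selecting by an explicit vector of names, rather than by a "Public" prefix pattern, avoids accidentally picking up unrelated columns whose names happen to start the same way.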
Value
A list of network objects as returned by buildRepSeqNetwork(). The list is returned invisibly.
If the input data contains a combined total of fewer than two rows, or if the global network contains no nodes, then the function returns NULL, invisibly, with a warning.
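Because the function can return NULL with a warning, calling code may want to check the result before using it. A minimal sketch of that pattern follows. build_network_stub() is a hypothetical stand-in that only reproduces the documented return behaviour for the fewer-than-two-rows case; it is not the real function, and the structure of its non-NULL return value is arbitrary.

```r
# Hypothetical stand-in: returns NULL invisibly with a warning when there are
# fewer than two rows, otherwise a list, also invisibly.
build_network_stub <- function(data) {
  if (nrow(data) < 2) {
    warning("fewer than two rows; returning NULL")
    return(invisible(NULL))
  }
  invisible(list(node_data = data))
}

# Capture warnings without aborting, then hand back both result and warnings.
run_safely <- function(data) {
  warnings_seen <- character(0)
  result <- withCallingHandlers(
    build_network_stub(data),
    warning = function(w) {
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(result = result, warnings = warnings_seen)
}

too_small <- run_safely(data.frame(CloneSeq = "CASSSVETQYF"))
ok <- run_safely(data.frame(CloneSeq = c("CASSSVETQYF", "CASSSPETQYF")))

# Branch on a NULL result before touching any network components.
if (is.null(too_small$result)) {
  message("No network built: ", too_small$warnings)
}
```

Since the value is returned invisibly, it must be assigned (as in the Examples) for it to be available for this kind of check.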
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Searching for Public TCR/BCR Clusters article on package website
Author
Brian Neal (Brian.Neal@ucsf.edu)
Examples
set.seed(42)
## Simulate 30 samples with a mix of public/private sequences ##
samples <- 30
sample_size <- 30 # (seqs per sample)
base_seqs <- c(
  "CASSIEGQLSTDTQYF", "CASSEEGQLSTDTQYF", "CASSSVETQYF",
  "CASSPEGQLSTDTQYF", "RASSLAGNTEAFF", "CASSHRGTDTQYF", "CASDAGVFQPQHF",
  "CASSLTSGYNEQFF", "CASSETGYNEQFF", "CASSLTGGNEQFF", "CASSYLTGYNEQFF",
  "CASSLTGNEQFF", "CASSLNGYNEQFF", "CASSFPWDGYGYTF", "CASTLARQGGELFF",
  "CASTLSRQGGELFF", "CSVELLPTGPLETSYNEQFF", "CSVELLPTGPSETSYNEQFF",
  "CVELLPTGPSETSYNEQFF", "CASLAGGRTQETQYF", "CASRLAGGRTQETQYF",
  "CASSLAGGRTETQYF", "CASSLAGGRTQETQYF", "CASSRLAGGRTQETQYF",
  "CASQYGGGNQPQHF", "CASSLGGGNQPQHF", "CASSNGGGNQPQHF", "CASSYGGGGNQPQHF",
  "CASSYGGGQPQHF", "CASSYKGGNQPQHF", "CASSYTGGGNQPQHF",
  "CAWSSQETQYF", "CASSSPETQYF", "CASSGAYEQYF", "CSVDLGKGNNEQFF"
)
# Relative generation probabilities
pgen <- cbind(
  stats::toeplitz(0.6^(0:(sample_size - 1))),
  matrix(1, nrow = samples, ncol = length(base_seqs) - samples)
)
simulateToyData(
  samples = samples,
  sample_size = sample_size,
  prefix_length = 1,
  prefix_chars = c("", ""),
  prefix_probs = cbind(rep(1, samples), rep(0, samples)),
  affixes = base_seqs,
  affix_probs = pgen,
  num_edits = 0,
  output_dir = tempdir(),
  no_return = TRUE
)
#> [1] TRUE
## 1. Find Public Clusters in Each Sample
sample_files <-
  file.path(tempdir(),
            paste0("Sample", 1:samples, ".rds")
  )
findPublicClusters(
  file_list = sample_files,
  input_type = "rds",
  seq_col = "CloneSeq",
  count_col = "CloneCount",
  min_seq_length = NULL,
  drop_matches = NULL,
  top_n_clusters = 3,
  min_node_count = 5,
  min_clone_count = 15000,
  output_dir = tempdir()
)
## 2. Build Global Network of Public Clusters
public_clusters <-
  buildPublicClusterNetwork(
    file_list =
      list.files(
        file.path(tempdir(), "node_meta_data"),
        full.names = TRUE
      ),
    seq_col = "CloneSeq",
    count_col = "CloneCount",
    plot_title = NULL,
    plot_subtitle = NULL,
    print_plots = TRUE
  )