Compute Graph Adjacency Matrix for Immune Repertoire Network
generateAdjacencyMatrix.Rd
Given a list of receptor sequences, computes the adjacency matrix for the network graph based on sequence similarity.
sparseAdjacencyMatFromSeqs() is a deprecated equivalent of generateAdjacencyMatrix().
Usage
generateAdjacencyMatrix(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE
)

# Deprecated equivalent:
sparseAdjacencyMatFromSeqs(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE,
  max_dist = deprecated()
)
Arguments
- seqs
A character vector containing the receptor sequences.
- dist_type
Specifies the function used to quantify the similarity between sequences. The similarity between two sequences determines the pairwise distance between their respective nodes in the network graph, with greater similarity corresponding to shorter distance. Valid options are "hamming" (the default), which uses hamDistBounded, and "levenshtein", which uses levDistBounded.
- dist_cutoff
A nonnegative scalar. Specifies the maximum pairwise distance (based on dist_type) for an edge connection to exist between two nodes. Pairs of nodes whose distance is less than or equal to this value will be joined by an edge connection in the network graph. Controls the stringency of the network construction and affects the number and density of edges in the network. A lower cutoff value requires greater similarity between sequences in order for their respective nodes to be joined by an edge connection. A value of 0 requires two sequences to be identical in order for their nodes to be joined by an edge.
- drop_isolated_nodes
Logical. When TRUE, removes each node that is not joined by an edge connection to any other node in the network graph.
- method
A character string specifying the algorithm to use. Choices are "default" and "pattern". "pattern" is only valid when dist_cutoff < 3, but tends to be faster than "default" for sparsely connected networks, at the cost of greater memory usage (can cause crashes for large or densely-connected networks, particularly for dist_cutoff = 2). The default algorithm tends to be faster for densely-connected networks or long sequences.
- verbose
Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr.
- max_dist
Details
The adjacency matrix of a graph with \(n\) nodes is the symmetric \(n \times n\) matrix for which entry \((i,j)\) is equal to 1 if nodes \(i\) and \(j\) are connected by an edge in the network graph and 0 otherwise.
To construct the graph of the immune repertoire network, each receptor sequence is modeled as a node. The similarity between receptor sequences, as measured using either the Hamming or Levenshtein distance, determines the distance between nodes in the network graph. The more similar two sequences are, the shorter the distance between their respective nodes. Two nodes in the graph are joined by an edge if the distance between them is sufficiently small, i.e., if their receptor sequences are sufficiently similar.
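The construction described above can be sketched in a few lines of base R. This is only an illustration of the definition, not the algorithm used by the package: the helper names hamming_padded() and adjacency_sketch() are hypothetical, the result is a dense matrix rather than a dgCMatrix, and the treatment of sequences of unequal length (each extra trailing character counted as one mismatch) is an assumption chosen to be consistent with the output shown in Examples.

```r
# Hypothetical helper: Hamming distance in which every position beyond the
# end of the shorter sequence counts as one mismatch (an assumption).
hamming_padded <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  k <- min(length(x), length(y))
  sum(x[seq_len(k)] != y[seq_len(k)]) + abs(length(x) - length(y))
}

# Hypothetical helper: dense 0/1 adjacency matrix for a given distance cutoff.
adjacency_sketch <- function(seqs, dist_cutoff = 1,
                             drop_isolated_nodes = TRUE) {
  n <- length(seqs)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- hamming_padded(seqs[i], seqs[j])
    }
  }
  # Entry (i, j) is 1 when the distance is within the cutoff.
  # The diagonal is 1 because every sequence is at distance 0 from itself.
  adj <- (d <= dist_cutoff) * 1
  # Row names are indices into seqs; column names are the sequences.
  dimnames(adj) <- list(as.character(seq_len(n)), seqs)
  if (drop_isolated_nodes) {
    # A node is kept only if it has a neighbor other than itself.
    keep <- rowSums(adj) > 1
    adj <- adj[keep, keep, drop = FALSE]
  }
  adj
}

adjacency_sketch(c("fee", "fie", "foe", "fum", "foo"))
```

For this input the sketch gives the same pattern of edges as the first entry in Examples: "fum" is dropped as an isolated node, and "foo" is connected only to "foe". The double loop is quadratic in the number of sequences, so it is suitable only for small illustrations.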
Value
A sparse matrix of class dgCMatrix (see dgCMatrix-class).
If drop_isolated_nodes = TRUE, the row and column names of the matrix indicate which receptor sequences in the seqs vector correspond to each row and column of the matrix. The row and column names can be accessed using dimnames. This returns a list containing two character vectors, one for the row names and one for the column names. The name of the \(i\)th matrix row is the index of the seqs vector corresponding to the \(i\)th row and \(i\)th column of the matrix. The name of the \(j\)th matrix column is the receptor sequence corresponding to the \(j\)th row and \(j\)th column of the matrix.
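As a brief illustration of how the dimension names might be used, the sketch below maps the rows of the returned matrix back to positions in the input vector. It assumes the package is installed and attached under the name NAIR (taken from the reference below); the variable names are chosen for illustration only.

```r
library(NAIR)  # assumed package name, per the NAIR reference

seqs <- c("fee", "fie", "foe", "fum", "foo")
adj <- generateAdjacencyMatrix(seqs)

# Row names are indices into seqs, stored as character strings
kept <- as.integer(rownames(adj))

# Column names are the sequences themselves, in the same order as the rows
kept_seqs <- colnames(adj)

# Sequences dropped as isolated nodes. setdiff() is used rather than
# negative indexing so that the result is still correct when no node is kept.
isolated_seqs <- seqs[setdiff(seq_along(seqs), kept)]
```

With the default drop_isolated_nodes = TRUE, seqs[kept] and kept_seqs describe the same sequences, which gives a simple consistency check when joining the matrix back to other per-sequence data.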
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Author
Brian Neal (Brian.Neal@ucsf.edu)
Examples
generateAdjacencyMatrix(
  c("fee", "fie", "foe", "fum", "foo")
)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>   fee fie foe foo
#> 1   1   1   1   .
#> 2   1   1   1   .
#> 3   1   1   1   1
#> 5   .   .   1   1
# No edge connections exist based on a Hamming distance of 1
# (returns a 0x0 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar")
)
#> Warning: No edges exist using the specified distance cutoff
#> 0 x 0 sparse Matrix of class "dgCMatrix"
#> <0 x 0 matrix>
# Same as the above example, but keeping all nodes
# (returns a 4x4 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  drop_isolated_nodes = FALSE
)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>
#> [1,] 1 . . .
#> [2,] . 1 . .
#> [3,] . . 1 .
#> [4,] . . . 1
# Relaxing the edge criteria using a Hamming distance of 2
# (still results in no edge connections)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 2
)
#> Warning: No edges exist using the specified distance cutoff
#> 0 x 0 sparse Matrix of class "dgCMatrix"
#> <0 x 0 matrix>
# Using a Levenshtein distance of 2, however,
# does result in edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_type = "levenshtein",
  dist_cutoff = 2
)
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>   foobar fubar bar
#> 2      1     1   .
#> 3      1     1   1
#> 4      .     1   1
# Using a Hamming distance of 3
# also results in (different) edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 3
)
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>   foo foobar bar
#> 1   1      1   1
#> 2   1      .   .
#> 4   1      .   1