Compute Graph Adjacency Matrix for Immune Repertoire Network

Given a character vector of receptor sequences, computes the adjacency matrix of the network graph, in which edges are determined by sequence similarity.

sparseAdjacencyMatFromSeqs() is a deprecated equivalent of generateAdjacencyMatrix().

Usage

generateAdjacencyMatrix(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE
)

# Deprecated equivalent:
sparseAdjacencyMatFromSeqs(
  seqs,
  dist_type = "hamming",
  dist_cutoff = 1,
  drop_isolated_nodes = TRUE,
  method = "default",
  verbose = FALSE,
  max_dist = deprecated()
)

Arguments

seqs: A character vector containing the receptor sequences.
dist_type: Specifies the function used to quantify the similarity between sequences. The similarity between two sequences determines the pairwise distance between their respective nodes in the network graph, with greater similarity corresponding to shorter distance. Valid options are "hamming" (the default), which uses hamDistBounded, and "levenshtein", which uses levDistBounded.
dist_cutoff: A nonnegative scalar. Specifies the maximum pairwise distance (based on dist_type) for an edge connection to exist between two nodes. Pairs of nodes whose distance is less than or equal to this value will be joined by an edge connection in the network graph. Controls the stringency of the network construction and affects the number and density of edges in the network. A lower cutoff value requires greater similarity between sequences in order for their respective nodes to be joined by an edge connection. A value of 0 requires two sequences to be identical in order for their nodes to be joined by an edge.
drop_isolated_nodes: Logical. When TRUE, removes each node that is not joined by an edge connection to any other node in the network graph.
method: A character string specifying the algorithm to use. Choices are "default" and "pattern". "pattern" is only valid when dist_cutoff < 3. It tends to be faster than "default" for sparsely connected networks, but it uses more memory and can cause crashes for large or densely connected networks, particularly when dist_cutoff = 2. "default" tends to be faster for densely connected networks or long sequences.
verbose: Logical. If TRUE, generates messages about the tasks performed and their progress, as well as relevant properties of intermediate outputs. Messages are sent to stderr.
max_dist: Deprecated. Equivalent to dist_cutoff.

Details

The adjacency matrix of a graph with \(n\) nodes is the symmetric \(n \times n\) matrix for which entry \((i,j)\) is equal to 1 if nodes \(i\) and \(j\) are connected by an edge in the network graph and 0 otherwise.

To construct the graph of the immune repertoire network, each receptor sequence is modeled as a node. The similarity between receptor sequences, as measured using either the Hamming or Levenshtein distance, determines the distance between nodes in the network graph. The more similar two sequences are, the shorter the distance between their respective nodes. Two nodes in the graph are joined by an edge if the distance between them is sufficiently small, i.e., if their receptor sequences are sufficiently similar.
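
The construction described above can be sketched in base R. This is an illustration only, not the package's implementation: padded_hamming() and sketch_adjacency() are hypothetical helpers, the result is a dense matrix rather than a dgCMatrix, and the treatment of unequal-length sequences (each position of length difference counts as one mismatch) is an assumption chosen to be consistent with the outputs shown in the Examples section.

```r
# Hypothetical helper (not part of NAIR): Hamming distance in which,
# by assumption, a difference in length counts as that many mismatches.
padded_hamming <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- min(length(x), length(y))
  sum(x[seq_len(n)] != y[seq_len(n)]) + abs(length(x) - length(y))
}

# Hypothetical helper (not part of NAIR): dense adjacency matrix.
sketch_adjacency <- function(seqs, dist_cutoff = 1,
                             drop_isolated_nodes = TRUE) {
  n <- length(seqs)
  # Pairwise distances between all sequences
  d <- outer(
    seq_len(n), seq_len(n),
    Vectorize(function(i, j) padded_hamming(seqs[i], seqs[j]))
  )
  # Entry (i, j) is 1 when the distance is within the cutoff, else 0
  adj <- (d <= dist_cutoff) * 1
  # Row names are positions in seqs; column names are the sequences
  dimnames(adj) <- list(as.character(seq_len(n)), seqs)
  if (drop_isolated_nodes) {
    # The diagonal is always 1 here, so an isolated node has row sum 1
    keep <- rowSums(adj) > 1
    adj <- adj[keep, keep, drop = FALSE]
  }
  adj
}

adj <- sketch_adjacency(c("fee", "fie", "foe", "fum", "foo"))
print(adj)
```

With these inputs, "fum" differs from every other sequence in two positions, so its node is isolated and is dropped at the default cutoff of 1. Computing all pairwise distances is quadratic in the number of sequences, so the sketch is suitable only for small inputs.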

Value

A sparse matrix of class dgCMatrix (see dgCMatrix-class).

If drop_isolated_nodes = TRUE, the row and column names of the matrix indicate which receptor sequences in the seqs vector correspond to each row and column of the matrix. The row and column names can be accessed using dimnames, which returns a list containing two character vectors, one for the row names and one for the column names. The name of the \(i\)th matrix row is the index (position) in the seqs vector of the sequence represented by the \(i\)th row and \(i\)th column. The name of the \(j\)th matrix column is the receptor sequence represented by the \(j\)th row and \(j\)th column.
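
As a sketch of how these names can be used, the code below maps the rows of a result back to the original seqs vector. To stay self-contained, it uses a hand-built dense matrix as a stand-in for the returned dgCMatrix, with the same dimnames as the first example in the Examples section. The variable names are illustrative.

```r
seqs <- c("fee", "fie", "foe", "fum", "foo")

# Stand-in for the returned matrix (dense here; the real result is sparse)
adj <- matrix(
  c(1, 1, 1, 0,
    1, 1, 1, 0,
    1, 1, 1, 1,
    0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("1", "2", "3", "5"), c("fee", "fie", "foe", "foo"))
)

nm <- dimnames(adj)
kept_index <- as.integer(nm[[1]])  # row names: positions in seqs
kept_seqs <- nm[[2]]               # column names: the sequences themselves

# The two sets of names describe the same nodes
stopifnot(identical(seqs[kept_index], kept_seqs))

# Sequences whose nodes were dropped as isolated
dropped <- seqs[setdiff(seq_along(seqs), kept_index)]
print(dropped)
```

Converting the row names to integers gives positions that can be used to subset any other data aligned with seqs.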

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Author

Brian Neal (Brian.Neal@ucsf.edu)

Examples

generateAdjacencyMatrix(
  c("fee", "fie", "foe", "fum", "foo")
)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>   fee fie foe foo
#> 1   1   1   1   .
#> 2   1   1   1   .
#> 3   1   1   1   1
#> 5   .   .   1   1

# No edge connections exist based on a Hamming distance of 1
# (returns a 0x0 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar")
)
#> Warning: No edges exist using the specified distance cutoff
#> 0 x 0 sparse Matrix of class "dgCMatrix"
#> <0 x 0 matrix>

# Same as the above example, but keeping all nodes
# (returns a 4x4 sparse matrix)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  drop_isolated_nodes = FALSE
)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>             
#> [1,] 1 . . .
#> [2,] . 1 . .
#> [3,] . . 1 .
#> [4,] . . . 1

# Relaxing the edge criteria using a Hamming distance of 2
# (still results in no edge connections)
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 2
)
#> Warning: No edges exist using the specified distance cutoff
#> 0 x 0 sparse Matrix of class "dgCMatrix"
#> <0 x 0 matrix>

# Using a Levenshtein distance of 2, however,
# does result in edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_type = "levenshtein",
  dist_cutoff = 2
)
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>   foobar fubar bar
#> 2      1     1   .
#> 3      1     1   1
#> 4      .     1   1

# Using a Hamming distance of 3
# also results in (different) edge connections
generateAdjacencyMatrix(
  c("foo", "foobar", "fubar", "bar"),
  dist_cutoff = 3
)
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>   foo foobar bar
#> 1   1      1   1
#> 2   1      1   .
#> 4   1      .   1
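
The Levenshtein example above can be checked independently of the package using adist() from the utils package, which computes pairwise Levenshtein distances under its default settings. This is only a cross-check of the distances. It does not use levDistBounded, and the variable names are illustrative.

```r
seqs <- c("foo", "foobar", "fubar", "bar")

# Pairwise Levenshtein distances
lev <- utils::adist(seqs)
dimnames(lev) <- list(seqs, seqs)
print(lev)

# Distinct pairs within a cutoff of 2 (upper triangle only)
lev_edges <- which(lev <= 2 & upper.tri(lev), arr.ind = TRUE)
print(cbind(seqs[lev_edges[, "row"]], seqs[lev_edges[, "col"]]))
```

The only pairs within distance 2 are "foobar"/"fubar" and "fubar"/"bar", which agrees with the edges in the Levenshtein example, where the node for "foo" is dropped as isolated.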