Identify Cells or Clones in a Neighborhood Around a Target Sequence
getNeighborhood.Rd
Given Adaptive Immune Receptor Repertoire Sequencing (AIRR-Seq) data and a target receptor sequence that is present in the data, identifies a "neighborhood" consisting of the cells/clones whose receptor sequences are sufficiently similar to the target sequence.
Arguments
- data
A data frame containing the AIRR-Seq data.
- seq_col
Specifies the column of data containing the receptor sequences. Accepts a character string containing the column name or a numeric scalar containing the column index.
- target_seq
A character string containing the target receptor sequence. Must be a receptor sequence possessed by one of the clones/cells in the AIRR-Seq data.
- dist_type
Specifies the function used to quantify the similarity between receptor sequences. The similarity between two sequences determines their pairwise distance, with greater similarity corresponding to shorter distance. Valid options are "hamming" (the default), which uses hamDistBounded(), and "levenshtein", which uses levDistBounded().
- max_dist
Determines whether each cell/clone belongs to the neighborhood based on its receptor sequence's distance from the target sequence. The distance is based on the dist_type argument. max_dist specifies the maximum distance at which a cell/clone belongs to the neighborhood. Lower values require greater similarity between the target sequence and the receptor sequences of cells/clones in its neighborhood.
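The interplay between dist_type and max_dist can be illustrated with a minimal base R sketch. This is not the package's implementation: hamming_plain() is a hypothetical helper written for this illustration, utils::adist() stands in for a Levenshtein distance, the toy sequences and counts are invented, and max_dist = 1 is an arbitrary threshold. The package's hamDistBounded() and levDistBounded() may behave differently, for example in how sequences of unequal length are handled.

```r
# Toy data frame; sequences and counts are invented for illustration.
toy <- data.frame(
  CloneSeq = c("GGGGGGGAATTGG", "GGGGGGGAATCGG", "GGGCGGGAATTGG",
               "GGGGGGGAATTG", "AAAAAAAAATTGG"),
  CloneCount = c(10, 7, 5, 3, 1),
  stringsAsFactors = FALSE
)
target <- "GGGGGGGAATTGG"

# Hypothetical helper: plain Hamming distance (number of mismatched
# positions). Returns NA for unequal lengths; this is a simplification
# of the sketch, not a statement about hamDistBounded().
hamming_plain <- function(a, b) {
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Distance from every sequence to the target under each metric.
ham <- vapply(toy$CloneSeq, hamming_plain, integer(1), b = target,
              USE.NAMES = FALSE)
lev <- as.vector(utils::adist(toy$CloneSeq, target))

# Keep the rows whose distance to the target is at most max_dist.
max_dist <- 1  # illustrative threshold
nbd_ham <- toy[!is.na(ham) & ham <= max_dist, ]
nbd_lev <- toy[lev <= max_dist, ]
```

In this sketch, the 12-character sequence falls outside the plain-Hamming neighborhood because the helper does not compare strings of different lengths, while the edit-distance version admits it as a single deletion. Raising max_dist widens both neighborhoods.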
Value
A data frame containing the rows of data corresponding to the cells/clones in the neighborhood.
If no cell/clone in the AIRR-Seq data possesses the target sequence as its receptor sequence, then a value of NULL is returned.
References
Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825
Author
Brian Neal (Brian.Neal@ucsf.edu)
Examples
library(NAIR)
set.seed(42)
toy_data <- simulateToyData(sample_size = 500)

# Get neighborhood around first clone sequence
nbd <- getNeighborhood(
  toy_data,
  seq_col = "CloneSeq",
  target_seq = "GGGGGGGAATTGG"
)
head(nbd)
#> CloneSeq CloneFrequency CloneCount SampleID
#> 12 GGGGGGGAATTG 0.001783780 3518 Sample1
#> 24 GGGGGGGAATCGG 0.001824344 3598 Sample1
#> 30 GGGCGGGAATTGG 0.002689868 5305 Sample1
#> 31 GGGGGAGAATTGG 0.002538262 5006 Sample1
#> 35 GGGGGGGAATTGC 0.002619896 5167 Sample1
#> 37 GGGAGGGAATTGG 0.001884175 3716 Sample1