Identify Cells or Clones in a Neighborhood Around a Target Sequence

Given Adaptive Immune Receptor Repertoire Sequencing (AIRR-Seq) data and a target receptor sequence that is present within the data, identifies a "neighborhood" consisting of the cells/clones whose receptor sequences are sufficiently similar to the target sequence.

Usage

getNeighborhood(
    data,
    seq_col,
    target_seq,
    dist_type = "hamming",
    max_dist = 1
)

Arguments

data: A data frame containing the AIRR-Seq data.
seq_col: Specifies the column of data containing the receptor sequences. Accepts a character string containing the column name or a numeric scalar containing the column index.
target_seq: A character string containing the target receptor sequence. Must match the receptor sequence of at least one cell/clone in the AIRR-Seq data.
dist_type: Specifies the function used to quantify the similarity between receptor sequences. The similarity between two sequences determines their pairwise distance, with greater similarity corresponding to shorter distance. Valid options are "hamming" (the default), which uses hamDistBounded(), and "levenshtein", which uses levDistBounded().
max_dist: The maximum distance from the target sequence at which a cell/clone still belongs to the neighborhood (default 1). Distance is measured using the function selected by dist_type. A cell/clone is included when the distance between its receptor sequence and the target sequence does not exceed max_dist, so lower values require greater similarity to the target sequence.
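To illustrate how dist_type and max_dist interact, here is a minimal base-R sketch of distance-based filtering. It is not NAIR's implementation: hamming_simple() is a hypothetical helper written for this illustration, the sequences are made up, and utils::adist() stands in for a Levenshtein distance. hamDistBounded() and levDistBounded() may handle edge cases, such as sequences of unequal length, differently.

```r
# Hypothetical helper (not part of NAIR): plain Hamming distance for
# equal-length strings; returns NA when the lengths differ.
hamming_simple <- function(a, b) {
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Made-up sequences for illustration only
seqs <- c("GGGGGGGAATTGG", "GGGGGGGAATCGG", "GGGCGGGAATTGG",
          "AAAAAAAAATTGG", "GGGGGGGAATTG")
target <- "GGGGGGGAATTGG"

# Distance from each sequence to the target under each metric
ham <- vapply(seqs, hamming_simple, integer(1), b = target, USE.NAMES = FALSE)
lev <- as.vector(utils::adist(seqs, target))  # edit distance from base R

# Keep sequences whose distance does not exceed max_dist
max_dist <- 1
nbd_ham <- seqs[!is.na(ham) & ham <= max_dist]
nbd_lev <- seqs[lev <= max_dist]
```

In this sketch the last sequence, which is one character shorter than the target, falls within edit distance 1 under the Levenshtein-style metric, while the simple Hamming helper leaves it undefined. Raising max_dist widens the neighborhood under either metric.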

Value

A data frame containing the rows of data corresponding to the cells/clones in the neighborhood.

If no cell/clone in the AIRR-Seq data possesses the target sequence as its receptor sequence, the function returns NULL.
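Because the return value can be NULL, calling code may want to check for it before using the result as a data frame. The following is a minimal sketch: neighborhood_size() is a hypothetical helper, not part of NAIR, and "NOTASEQUENCE" is a placeholder chosen because it is assumed to be absent from the simulated data. The call to getNeighborhood() runs only if the NAIR package is installed.

```r
# Hypothetical helper (not part of NAIR): number of cells/clones in a
# neighborhood, treating a NULL result as an empty neighborhood.
neighborhood_size <- function(nbd) {
  if (is.null(nbd)) 0L else nrow(nbd)
}

# Run the NAIR call only when the package is available
if (requireNamespace("NAIR", quietly = TRUE)) {
  toy_data <- NAIR::simulateToyData(sample_size = 500)
  nbd <- NAIR::getNeighborhood(
    toy_data,
    seq_col = "CloneSeq",
    target_seq = "NOTASEQUENCE"  # placeholder, assumed absent from the data
  )
  if (is.null(nbd)) {
    message("Target sequence not found in the data")
  }
  print(neighborhood_size(nbd))
}
```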

References

Hai Yang, Jason Cham, Brian Neal, Zenghua Fan, Tao He and Li Zhang. (2023). NAIR: Network Analysis of Immune Repertoire. Frontiers in Immunology, vol. 14. doi: 10.3389/fimmu.2023.1181825

Webpage for the NAIR package

Author

Brian Neal (Brian.Neal@ucsf.edu)

Examples

set.seed(42)
toy_data <- simulateToyData(sample_size = 500)

# Get neighborhood around first clone sequence
nbd <-
  getNeighborhood(
    toy_data,
    seq_col = "CloneSeq",
    target_seq = "GGGGGGGAATTGG"
  )

head(nbd)
#>         CloneSeq CloneFrequency CloneCount SampleID
#> 12  GGGGGGGAATTG    0.001783780       3518  Sample1
#> 24 GGGGGGGAATCGG    0.001824344       3598  Sample1
#> 30 GGGCGGGAATTGG    0.002689868       5305  Sample1
#> 31 GGGGGAGAATTGG    0.002538262       5006  Sample1
#> 35 GGGGGGGAATTGC    0.002619896       5167  Sample1
#> 37 GGGAGGGAATTGG    0.001884175       3716  Sample1