Dual-Chain Network Analysis • NAIR

For single-cell data, cell-level network analysis can be performed based on joint similarity in alpha chain sequence and beta chain sequence.

We simulate some toy data to demonstrate the usage.

set.seed(42)
library(NAIR)

dat <- simulateToyData(chains = 2)
head(dat)
#>        AlphaSeq        BetaSeq Count UMIs SampleID
#> 1 TTGAGGAAATTCG TTGAGGAAATTCGG  3095    4  Sample1
#> 2 GGAGATGAATCGG  GGAGATGAATCGG  3057    6  Sample1
#> 3 GTCGGGTAATTGG GTCGGGTAATTGGG  3575    8  Sample1
#> 4 GCCGGGTAATTCG GCCGGGTAATTCGG  3994    7  Sample1
#> 5 GAAAGAGAATTCG GAAAGAGAATTCGG  3670    3  Sample1
#> 6 AGGTGGGAATTCG  AGGTGGGAATTCG  4076    5  Sample1

The input data is assumed to have the following format:

Each row corresponds to a unique cell
The data contains separate columns for alpha chain sequence and beta chain sequence

Dual-chain network analysis can be performed using buildRepSeqNetwork() (or generateNetworkObjects()) by supplying a length-2 vector to the seq_col parameter:

First entry should reference the column for alpha chain sequence
Second entry should reference the column for beta chain sequence

# Build network based on joint dual-chain similarity
network <- buildNet(dat, 
                    seq_col = c("AlphaSeq", "BetaSeq"),
                    count_col = "UMIs",
                    node_stats = TRUE, 
                    stats_to_include = "all",
                    cluster_stats = TRUE, 
                    color_nodes_by = "SampleID",
                    size_nodes_by = "UMIs",
                    node_size_limits = c(0.5, 3)
)

We print the network graph plot with labels added for the largest two clusters:

addClusterLabels(network$plots$SampleID, network, top_n_clusters = 2, size = 8)

The list returned buildRepSeqNetwork() the following items:

names(network)
#> [1] "details"          "igraph"           "adjacency_matrix" "adj_mat_a"       
#> [5] "adj_mat_b"        "node_data"        "cluster_data"     "plots"

Notice that the list contains three adjacency matrices: adjacency_matrix corresponds to the network based on joint similarity in both chain sequences, while adj_mat_a corresponds to the network based only on similarity in the alpha-chain sequence (and similarly for adj_mat_b).

The cluster-level data contains sequence-based cluster statistics for each of the alpha and beta chain sequences:

head(network$cluster_data)
#>   cluster_id node_count eigen_centrality_eigenvalue eigen_centrality_index
#> 1          1         15                    3.680389           6.385488e-01
#> 2          2         13                    4.419380           6.131393e-01
#> 3          3         16                    7.257172           5.291669e-01
#> 4          4         10                    3.750958           6.107669e-01
#> 5          5          3                    1.414214           5.857864e-01
#> 6          6          3                    2.000000           8.881784e-16
#>   closeness_centrality_index degree_centrality_index edge_density
#> 1                  0.4497821               0.3190476    0.1809524
#> 2                  0.4357891               0.3141026    0.2692308
#> 3                  0.4650078               0.3250000    0.3416667
#> 4                  0.4889192               0.3555556    0.3111111
#> 5                  1.0000000               0.3333333    0.6666667
#> 6                  0.0000000               0.0000000    1.0000000
#>   global_transitivity assortativity diameter_length max_degree mean_degree
#> 1           0.2884615   -0.16503588               6          7        2.60
#> 2           0.3802817   -0.15180055               7         11        4.00
#> 3           0.6328125   -0.08424855               6         12        5.81
#> 4           0.3750000   -0.33425414               6          6        2.90
#> 5           0.0000000   -1.00000000               3          2        1.67
#> 6           1.0000000           NaN               2          2        2.00
#>   mean_A_seq_length mean_B_seq_length A_seq_w_max_degree B_seq_w_max_degree
#> 1             12.13             12.87       AAAAAAAAATTC      AAAAAAAAATTCG
#> 2             13.00             13.08      GGGGGGGAATTGG      GGGGGGGAATTGG
#> 3             13.00             13.94      GGGGGGGAATTGG     GGGGGGGAATTGGG
#> 4             12.00             12.00       AAAAAGAAATTG       AAAAAGAAATTG
#> 5             13.00             14.00      AGGGGAGAATTGG     AGGGGAGAATTGGG
#> 6             13.00             14.00      AAAAAAGAATTGC     AAAAAAGAATTGCG
#>   max_count agg_count A_seq_w_max_count B_seq_w_max_count
#> 1         6        42      AAAAAAAAATTC      AAAAAAAAATTC
#> 2         6        28     GGGGTGGAATTGG     GGGGTGGAATTGG
#> 3         6        49     GGGGAGAAATTGG    GGGGAGAAATTGGG
#> 4         7        39      AAAGAAAAATTG      AAAGAAAAATTG
#> 5         5        10     AGGGGAGAATTGG    AGGGGAGAATTGGG
#> 6         2         4     AGAAAAGAATTGC    AGAAAAGAATTGCG

The remainder of the output and customization follows the general case for buildRepSeqNetwork().