Changelog
Source: NEWS.md
NAIR 1.0.4
CRAN release: 2024-03-02
Minor Changes and Bug Fixes
- Updated tests for compatibility with changes to guides in ggplot2 (thanks to Teun van den Brand and the ggplot2 development team for contributing the updates)
- Saving network output using output_type = "individual" now also saves the entire network list as an RData file (.rda).
NAIR 1.0.3
CRAN release: 2024-01-09
Minor Changes and Bug Fixes
- Removed a package test that checked for particular numbers of clusters resulting from specific applications of clustering algorithms from the igraph package. The test no longer passes with igraph version 1.6.0. Rather than update the test to pass, it has been removed to avoid future occurrences of this issue.
NAIR 1.0.2
CRAN release: 2023-09-27
Minor Changes and Bug Fixes
- Fixed a bug in levDistBounded() that caused undefined behavior when either string was empty after removing the common prefix and suffix. This bug does not appear to have affected the returned value.
- levDistBounded.cpp and hamDistBounded.cpp now use the string.h header instead of strings.h
NAIR 1.0.1
CRAN release: 2023-09-14
Breaking Changes
- getClusterStats() now requires the cluster ID column to be specified and present in the provided node metadata; it will no longer compute cluster membership since it does not return the node metadata (so any membership values computed are lost).
- addClusterMembership() now accepts and returns the list of network objects instead of accepting and returning the node metadata with the igraph as an additional input. The first parameter data has been deprecated and moved in position, with the second parameter net becoming the first parameter and accepting the list of network objects instead of just the igraph. The function still also supports the old usage (for now), as long as net and data are specified by name (or the updated argument positions are used). See section “Unified Primary Argument Across Functions” for context.
- Functions no longer save output to file by default. The user must provide a directory/file path to the appropriate parameter for output to be saved.
- All instances of "individual" as a default value for output_type have been changed to "rds". "rds" is the preferred default since it reduces file size/clutter and the list of network objects can be restored intact (the list is the primary input/output of core NAIR functions) under any name desired. "rda" should be used if the file will be transferred across machines (the list will be restored under the name net), and "individual" should be used when the output is to be accessed from outside of R.
- output_type = "individual" now writes the row names of the node metadata to the first column of the csv file. These contain the original row IDs from the input data.
- Default value of output_type in findAssociatedClones() and input_type in buildAssociatedClusterNetwork() changed from "csv" to "rds", since these files are intermediate outputs and typically there should be no need to access them from outside of R or from another machine.
- buildPublicClusterNetworkByRepresentative() default value of output_type changed from "rda" to "rds".
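The practical difference between the "rds" and "rda" output types can be illustrated with base R alone. This is a minimal sketch, assuming the two types correspond to saveRDS() and save() respectively (the changelog does not state the exact calls); the toy list below only stands in for a real network list.

```r
# Toy stand-in for the list of network objects (hypothetical contents).
net <- list(
  node_data = data.frame(seq = c("CASS", "CASR"), stringsAsFactors = FALSE),
  details = list(dist_cutoff = 1)
)

rds_file <- tempfile(fileext = ".rds")
rda_file <- tempfile(fileext = ".rda")

# RDS stores a single object without its name,
# so it can be restored under any name.
saveRDS(net, rds_file)
my_network <- readRDS(rds_file)

# RData stores the object together with its name,
# so load() always recreates it under the name `net`.
save(net, file = rda_file)
restore_env <- new.env()
restored_names <- load(rda_file, envir = restore_env)

# Clean up the temporary files.
file.remove(rds_file, rda_file)
```

Loading into a separate environment, as done here, is one way to avoid overwriting an existing object named net in the workspace.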
New Features
This section covers general new features. Other new features are grouped by subject in the following few sections.
- buildRepSeqNetwork() now has the convenient alias buildNet().
- The list returned by buildRepSeqNetwork() now contains an element details with network metadata such as the argument values used in the function call.
- Plots with nodes colored according to a continuous variable will now have their legends displayed using a color bar instead of discrete legend values, unless that variable is also used to size the nodes.
- In most cases where an invalid value is supplied to a function argument for which a meaningful default exists, instead of raising an error, the argument’s value is replaced by the default value and a warning is raised.
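The fallback behavior described in the last item can be sketched in base R. The helper and function names below (check_flag, describe_run) are hypothetical and are not part of the package; they only illustrate the pattern of replacing an invalid value with a default and raising a warning instead of an error.

```r
# Hypothetical helper: validate a logical flag, falling back to a default.
check_flag <- function(value, default, arg_name) {
  is_valid <- is.logical(value) && length(value) == 1 && !is.na(value)
  if (!is_valid) {
    warning("invalid value for ", arg_name, "; using default (", default, ")",
            call. = FALSE)
    return(default)
  }
  value
}

# Toy function using the helper (not a NAIR function).
describe_run <- function(verbose = FALSE) {
  verbose <- check_flag(verbose, default = FALSE, arg_name = "verbose")
  if (verbose) "verbose output enabled" else "quiet"
}

# An invalid value no longer stops execution; the warning is reported
# as a message here so that the result can still be inspected.
result <- withCallingHandlers(
  describe_run(verbose = "yes"),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```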
Unified Primary Argument Across Functions
Several changes and additions have been made in favor of using the list of network objects returned by buildRepSeqNetwork() as a unified primary input and output across the core NAIR functions. Adopting this convention offers several benefits: It greatly simplifies usage, since users no longer need to know which components of the list to input to which function (or what each function returns); it eliminates the task of manually updating the list of network objects; it results in the core functions working with the pipe operator; and most importantly, it improves functionality within and between functions, since functions can read and modify anything in the network list. For instance, addPlots() can use the coordinate layout of any existing plots to ensure a consistent layout across plots (which is no longer guaranteed otherwise), while addClusterStats() can add cluster membership values to the node metadata and record in details that the cluster properties correspond to these membership values (and not the values from a different instance of clustering using a different algorithm).
The following changes encompass the move toward using the network list as a primary input/output:
- addClusterMembership() parameters and return value have changed. See the Breaking Changes section for details.
- addPlots() added as the preferred alternative to generateNetworkGraphPlots() and plotNetworkGraph()
- addClusterStats() added as the preferred alternative to getClusterStats()
- addNodeStats() added as the preferred alternative to addNodeNetworkStats()
- labelClusters() added as the preferred alternative to addClusterLabels()
- labelNodes() added as the preferred alternative to addGraphLabels()
See the new “Supplementary Functions” vignette for examples.
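The benefit of a unified primary argument can be sketched with toy functions that each accept a list and return it updated. The functions add_cluster_ids() and add_cluster_sizes() are hypothetical stand-ins, not NAIR functions, and the list contents are invented for illustration; the sketch uses the native pipe, which requires R 4.1 or later.

```r
# Toy stand-ins (not NAIR functions): each takes the list and returns it updated.
add_cluster_ids <- function(net) {
  net$node_data$cluster_id <- c(1L, 1L, 2L)
  # Record which variable holds the membership values.
  net$details$cluster_id_variable <- "cluster_id"
  net
}

add_cluster_sizes <- function(net) {
  # Reads what the previous step recorded, then adds its own component.
  id_var <- net$details$cluster_id_variable
  net$cluster_data <- as.data.frame(
    table(cluster_id = net$node_data[[id_var]]),
    responseName = "node_count"
  )
  net
}

net <- list(
  node_data = data.frame(seq = c("CASSL", "CASSF", "CSARG"),
                         stringsAsFactors = FALSE),
  details = list()
)

# Because every step has the same input and output, the steps chain directly.
net <- net |> add_cluster_ids() |> add_cluster_sizes()
```

Since each step can read anything an earlier step stored in the list, later functions do not need the user to pass intermediate results by hand.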
Multiple Instances of Clustering
The following changes and additions have been made to facilitate multiple instances of clustering on the same network using different clustering algorithms. See the new “Cluster Analysis” vignette for examples.
- All functions that can perform clustering now have a parameter cluster_id_name that can be used to specify a custom name for the cluster membership variable added to the node metadata.
- Each time a new cluster membership variable is added to the node metadata, information is added to details recording the clustering algorithm used and the name of the corresponding cluster membership variable.
- When cluster properties are computed with addClusterStats(), information is added to details recording the cluster membership variable corresponding to the cluster properties.
- labelClusters() and addClusterLabels() now check details to confirm that the cluster properties match the specified cluster membership variable before using the node counts in the cluster properties.
- labelClusters() and addClusterLabels() can now be used without cluster properties; node count is computed from the cluster membership values.
- labelClusters() can be used to label multiple plots at once.
- addClusterMembership(), addClusterStats() and addNodeStats() now allow custom argument values for optional parameters of the clustering algorithm through the ellipses (...) argument.
It may also be of interest in the future to add functionality allowing the network list to contain multiple sets of cluster properties corresponding to different instances of clustering.
Plots and Graph Layout
Plotting functions no longer fix the random seed when generating the coordinate layout for a plot. In order to facilitate a consistent layout across multiple plots of the same network graph, the following changes have been made.
- Multiple plots produced in the same call to buildRepSeqNetwork(), addPlots() and generateNetworkGraphPlots() will all use a common layout.
- Plot lists created by buildRepSeqNetwork(), addPlots() and generateNetworkGraphPlots() now include a matrix graph_layout containing the layout used in the plots.
- addPlots() will automatically use the graph_layout mentioned above to ensure that new plots use the same layout as existing plots.
- If the network list already contains plots but graph_layout is absent, addPlots() will extract the layout from the first plot and use it for the new plots.
- generateNetworkGraphPlots() has a new parameter layout that can be used to specify the layout. It can be used to generate new plots with the same layout as existing plots (though addPlots() is easier). It can also be used to generate plots with custom layout types other than the default layout created using igraph::layout_components().
- saveNetworkPlots() has a new parameter outfile_layout that can be used to save the graph layout.
- saveNetwork() automatically saves the graph layout when output_type = "individual".
Essentially, generating new plots with addPlots() will ensure a consistent layout with the initial plots. Fixing a random seed before calling buildRepSeqNetwork() (or before the first call to addPlots(), if buildRepSeqNetwork() is called with plots = FALSE) allows the same layout to be reproduced across multiple executions of the same code in which the initial plots are generated.
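The interplay between the random seed and a stored layout can be sketched without any plotting. The function random_layout() below is a hypothetical stand-in for a randomized graph layout algorithm and is not how the package computes layouts; it only shows why reusing a stored matrix keeps plots aligned and why fixing the seed makes the first layout reproducible.

```r
# Toy layout generator standing in for a randomized graph layout algorithm.
random_layout <- function(n_nodes) {
  matrix(runif(2 * n_nodes), ncol = 2, dimnames = list(NULL, c("x", "y")))
}

# Run 1: fix the seed, create the initial layout, then reuse it for later plots.
set.seed(42)
layout_run1 <- random_layout(5)
later_plot_layout <- layout_run1  # reusing the stored matrix keeps plots aligned

# Run 2 (for example, a fresh session running the same script): same seed, same layout.
set.seed(42)
layout_run2 <- random_layout(5)

# Without reusing the stored layout or resetting the seed, a new layout differs.
layout_unseeded <- random_layout(5)
```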
Improved File Input Functionality
- Most instances of the file_list argument now accept a list containing connections and file paths instead of only a character vector of file paths. This allows a greater variety of data sources to be used.
- A greater variety of input data formats is now supported. Functions whose input_type parameter accepts text formats have a new parameter read.args that accepts a named list of optional arguments to read.table() and its variants read.csv(), etc. Dedicated arguments for header and sep still exist apart from read.args for backwards compatibility, but their defaults now match input_type (e.g., sep defaults to "," for input_type = "csv" and to "" for input_type = "table").
- input_type = "tsv" now reads files using read.delim() instead of read.table().
- Most instances of the input_type argument now also support the value "csv2" for reading files using read.csv2().
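Forwarding a named list of reader arguments is a common base R pattern built on do.call(). The sketch below assumes nothing about the package's internals: read_with_args() and its read_args parameter are hypothetical names that only mirror the role read.args plays in the text, and the file contents are invented.

```r
# Write a small semicolon-delimited file to a temporary location.
input_file <- tempfile(fileext = ".txt")
writeLines(c("CloneSeq;CloneCount",
             "CASSLGQAYEQYF;12",
             "CASSPGTGELFF;3"),
           input_file)

# Hypothetical reader: `read_args` is a named list passed on to the reader function.
read_with_args <- function(file, read_fun = read.table, read_args = list()) {
  do.call(read_fun, c(list(file = file), read_args))
}

clones <- read_with_args(
  input_file,
  read_args = list(header = TRUE, sep = ";",
                   colClasses = c("character", "numeric"))
)

# Clean up the temporary file.
file.remove(input_file)
```

Any argument accepted by the chosen reader (here read.table()) can be supplied through the list, which is what makes a single list parameter more flexible than dedicated header and sep arguments.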
Lifecycle Changes
- Version 1.0.1 is the first major release of the NAIR package. Going forward, release version numbers will follow the format <major>.<minor>.<patch>, and in-development versions will follow the format <major>.<minor>.<patch>.<dev>.
- plotNetworkGraph() deprecated in favor of addPlots().
- filterInputData() argument count_col deprecated. Rows with NA counts are no longer dropped.
- getClusterFun() argument cluster_fun deprecated (see Breaking Changes)
- addNodeNetworkStats() deprecated in favor of addNodeStats() (see section “Unified Primary Argument Across Functions”)
- addClusterMembership() argument data deprecated (see section “Unified Primary Argument Across Functions”)
- addClusterMembership() argument fun deprecated in favor of cluster_fun for consistency with other functions.
- sparseAdjacencyMatFromSeqs() argument max_dist deprecated in favor of dist_cutoff for consistency with other functions.
- saveNetwork() argument output_filename deprecated in favor of output_name for consistency with other functions.
- sparseAdjacencyMatFromSeqs() deprecated in favor of its better-named twin generateAdjacencyMatrix().
- generateNetworkFromAdjacencyMat() deprecated in favor of its better-named twin generateNetworkGraph().
Minor Changes and Bug Fixes
- output_type = "individual" now also saves the list of plots (if present) to an RDS file. This prevents the ggraph objects containing the plots from being lost, in case the user wishes to modify these plots in the future.
- All instances of the output_name parameter now automatically replace potentially unsafe characters with underscores and remove any leading or trailing non-alphanumeric characters. Safe characters include alphanumeric characters, underscores and hyphens.
- Package functions no longer print messages to the console by default. Functions now have a verbose argument which can be set to TRUE to enable printing of console messages. For logging purposes, these messages are now generated using message() rather than cat(), and so send their output to stderr() rather than stdout().
- buildRepSeqNetwork(), addPlots() and generateNetworkGraphPlots() now have print_plots set to FALSE by default (plots are no longer printed to the R plotting window unless manually specified).
- buildAssociatedClusterNetwork() now removes duplicate observations after loading the data from all neighborhoods. When multiple associated sequences are similar, the same clone from a given sample can belong to multiple neighborhoods. Previously, this occurrence resulted in the same clone appearing multiple times in the global network.
- simulateToyData() argument seed_value removed. Users can set a seed prior to calling the function if desired.
- generateNetworkGraphPlots() now handles the case where color_nodes_by contains duplicate values by removing the duplicate values with a warning. If color_scheme is a vector, the corresponding entries of color_scheme are also removed. Previously, this case resulted in a list of plots containing two elements with the same name.
- When generateNetworkGraphPlots() is called with a non-numeric variable specified for size_nodes_by, the function now defaults to fixed node sizes with a warning.
- addClusterStats() and buildRepSeqNetwork(cluster_stats = TRUE) now call sum() and max() with na.rm = TRUE when computing abundance-based properties. This change reflects the fact that buildRepSeqNetwork() no longer drops input data rows with NA and NaN values in the count column.
- combineSamples() and loadDataFromFileList() now preserve the original row IDs of each input file. In the combined data, each row ID is prefixed with the sample ID (if available) or with the file number based on the order in file_list.
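The sanitization rule stated above for output_name can be expressed with two regular-expression substitutions. This is a sketch of the stated rule only; sanitize_output_name() is a hypothetical helper and may differ from what the package does internally.

```r
# Hypothetical helper mirroring the stated rule; not the package's own code.
sanitize_output_name <- function(x) {
  # Replace anything other than letters, digits, underscores and hyphens.
  x <- gsub("[^[:alnum:]_-]", "_", x)
  # Drop leading and trailing characters that are not alphanumeric.
  gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", x)
}

sanitize_output_name("my network/plots (v2)")  # "my_network_plots__v2"
sanitize_output_name("__draft-1__")            # "draft-1"
```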
NAIR 0.0.9044
Breaking Changes
- Removed the following functions:
  - installPythonModules()
  - kmeansAtchley()
  - adjacencyMatAtchleyFromSeqs()
  - encodeTCRSeqsByAtchleyFactor()
- The dist_type argument of various package functions no longer accepts the value "euclidean_on_atchley".
New Features
- Unit tests added for hamDistBounded(), levDistBounded(), sparseAdjacencyMatFromSeqs(), and low-level argument checks.
- Argument checks have been expanded to encompass most user-facing package functions.
- In all functions where it appears, the dist_type argument now accepts abbreviations of both "hamming" and "levenshtein", such as "ham", "lev", "h" and "l".
- The fun argument of addClusterMembership() is now passed to match.fun() before being called. This change affects the cluster_fun argument of higher-level functions, allowing users to specify clustering algorithms using the syntax, e.g., cluster_fun = "cluster_walktrap" in addition to the previously-accepted cluster_fun = cluster_walktrap.
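The effect of passing an argument through match.fun() can be seen in a small base R sketch. The wrapper apply_cluster_fun() and the toy "algorithm" cluster_by_sign() are hypothetical and are not part of NAIR or igraph; they only show that a function and its name become interchangeable.

```r
# Toy wrapper (not a NAIR function) that resolves `fun` as the text describes.
apply_cluster_fun <- function(x, fun) {
  fun <- match.fun(fun)  # accepts a function or the name of one
  fun(x)
}

# Toy "clustering algorithm": groups values by sign.
cluster_by_sign <- function(x) ifelse(x < 0, 1L, 2L)

by_object <- apply_cluster_fun(c(-2, 3, -1), cluster_by_sign)
by_name   <- apply_cluster_fun(c(-2, 3, -1), "cluster_by_sign")
```

Both calls return the same membership vector, since match.fun() looks up the function by name when given a character string.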
Minor Changes and Bug Fixes
- The internal C++ functions that compute network adjacency matrices no longer write temporary files to the current working directory, instead writing them to the temporary directory of the current R session.
- findAssociatedClones() now cleans up after itself, removing temporary files and directories it creates within the temporary directory while performing its tasks.
Lifecycle Changes
- A lifecycle stage of Experimental has been added for the package.
- The functions and arguments within the package that were previously deprecated now have their signaling and warnings handled through lifecycle package functions.
Documentation Changes
- The vignettes Searching for Associated TCR/BCR Clusters, Searching for Public TCR/BCR Clusters and Network Visualization have been removed from the package and now exist as articles on the package’s website. This was done to reduce the size of the installed package.
- URLs in documentation files and vignettes have been curated to conform with CRAN’s policies. Specifically, any URL that redirected to another URL has been replaced by the target of the redirection. Links to package CRAN pages are now in canonical form.
- All function reference files now run their examples when the package is built or checked. Some examples have been expanded.
- Examples and vignette code now remove files and directories created in the temporary directory for the current R session.
- Added a documentation file for the package itself (NAIR-package).
- All function reference files now have a Value section, including functions that do not return a value.
- The package Readme file now includes a badge for the package lifecycle
Backend Package Changes
- Added R minimum version requirement 3.1.0 to Depends field of DESCRIPTION, since version 3.0.2 or greater is needed to require specific minimum versions of RcppArmadillo and Rcpp in the LinkingTo field (requiring 3.1.0 since CRAN advises against requiring R versions that don’t have 0 as the third value).
- Reintroduced compile flags for OpenMP support to Makevars (this change only applies to MacOS and Linux, as these compile flags were never removed from Makevars.win)
- Added the lifecycle package to Imports, imported the deprecated() function and copied lifecycle badge images into the package files. Functions and their arguments can now be assigned lifecycle stages and badges can be used in package documentation files.
- Removed the reticulate package from Imports and removed the associated scaffolding throughout the package that was set up for integration with python scripts.
NAIR 0.0.9043
Vignettes
- Function names in vignettes have been reformatted so that when pkgdown builds the package webpage articles from the vignettes, function names will link to the webpages with their documentation files.
Website
- Custom index added for the reference topics. Topics are now organized into named sections, dramatically improving the ability to find particular topics or functions of interest.
Package
- packageStartupMessage() added to .onAttach(): When loaded, the package will provide a welcome message with instructions for getting started.
NAIR 0.0.9042
Package Functions
- findAssociatedClones
  - The variable SampleID created in the output data is now forced to be of type character. Previously, values from the argument sample_ids were sometimes unintentionally converted from character to numeric, such as the default values in sample_ids, which were "1", "2", etc. This was causing these variables to be treated as continuous variables when used to color nodes in the network graph plot, which resulted in their color scales being depicted in the wrong format in the plot legend.
  - The sample_ids argument is now coerced to a character vector. This prevents an error when saving the output that occurred when sample_ids used numeric values.
  - Default value of sample_ids now has entries "Sample1", "Sample2", etc., instead of "1", "2", etc.
- findPublicClusters
  - The variables SampleID, SubjectID and GroupID created in the output data are now forced to be of type character. Previously, values from the arguments sample_ids, subject_ids and group_ids were sometimes unintentionally converted from character to numeric, such as the default values in sample_ids, which were "1", "2", etc. This was causing these variables to be treated as continuous variables when used to color nodes in the network graph plot, which resulted in their color scales being depicted in the wrong format in the plot legend.
  - The sample_ids argument is now coerced to a character vector. This prevents an error when saving the output that occurred when sample_ids used numeric values (which was the previous default!).
  - Default value of sample_ids now has entries "Sample1", "Sample2", etc., instead of 1, 2, etc.
- buildPublicClusterNetwork
  - Argument plot_title added with default value "Global Network of Public Clusters". Previously this argument was passed to buildRepSeqNetwork through the ellipses ... argument, and thus used a default value of "auto", which resulted in the default plot title being the value of the output_name argument, which is "PublicClusterNetwork" by default.
- plotNetworkGraph
  - Added arguments pdf_width and pdf_height for adjusting the dimensions of the pdf when saving this function’s output directly using the outfile argument. Other package functions use saveNetworkPlots for saving plots created using plotNetworkGraph, so the absence of these arguments in the plotNetworkGraph function had gone unnoticed previously. But since the function has an option to save the output directly to pdf using the outfile argument, it is only appropriate to also provide control over the pdf dimensions.
- kmeansAtchley
  - Default values for amino_col and sample_col arguments removed as the previous defaults are no longer useful. They were originally designed based on a previous version of the associated clusters workflow.
  - Default filenames for the pdfs of the heatmaps have been changed to "atchley_kmeans_TCR_fraction_per_cluster.pdf" and "atchley_kmeans_correlation_heatmap.pdf". The previous values "atchley_kmeans_cluster_relative_size_profiles_by_sample.pdf" and "atchley_kmeans_corr_in_cluster_size_profile_between_samples.pdf" were longer and potentially more confusing in their meaning.
NAIR 0.0.9041
Package Functions
- buildAssociatedClusterNetwork
  - Now performs clustering analysis to obtain cluster membership ID even if the user manually specifies not to compute cluster stats or the cluster_id network property. This is because performing clustering and obtaining the cluster membership is a primary purpose of this function. It is still desirable for the user to be able to prevent other node-level properties as well as cluster-level properties from being computed if desired, but now doing so will not interfere with the function accomplishing its purpose.
- findPublicClusters
  - Fixed a bug that was causing the sample-level network node property SampleLevelCloseness to be left named as closeness in the data frames for the filtered node-level data. This bug was in turn causing this property to be overwritten by the global network node property PublicCloseness when calling buildPublicClusterNetwork.
NAIR 0.0.9040
Package Functions
- buildAssociatedClusterNetwork
  - Default value of data_symbols argument changed from NULL to "data" in order to match the output format of findAssociatedClones when findAssociatedClones is called with output_type = "rda". Note this change only affects the case when buildAssociatedClusterNetwork is called with input_type = "rda"
  - Now performs clustering analysis to obtain cluster membership ID even if the user manually specifies not to compute cluster stats or the cluster_id network property. This is because performing clustering and obtaining the cluster membership is a primary purpose of this function. It is still desirable for the user to be able to prevent other node-level properties as well as cluster-level properties from being computed if desired, but now doing so will not interfere with the function accomplishing its purpose.
Vignettes
- buildRepSeqNetwork
  - For both buildRepSeqNetwork and saveNetwork, the vignette now specifies the R environment variable name for the output list when it is saved to an Rdata file using output_type = "rda".
- Searching for Associated TCR/BCR Clusters
  - Content added to more completely explain certain behavior and arguments that were not previously covered.
NAIR 0.0.9039
Package Metadata
- Updated authors in DESCRIPTION
- Updated journal article citation in Readme and documentation files
Vignettes
- The term “AIRR-Seq” is now spelled out in full as “Adaptive Immune Receptor Repertoire Sequencing” prior to the first instance of its abbreviation in each vignette in which it appears
- buildRepSeqNetwork
  - Meaning of cluster membership ID and corresponding cluster_id network property are now more clearly explained
- Searching for Associated TCR/BCR Clusters
  - Header levels and document structure updated
  - Further revisions/additions, including fixing the remaining broken links, are forthcoming
NAIR 0.0.9038
Bug Fixes
- Fixed a bug in filterInputData that raised an error when the count_col and subset_cols arguments were both non-null
Functions
- When calling plotNetworkGraph directly with a vector provided to color_nodes_by and with color_title = "auto" (the default), the function will attempt to use the name of the vector for the color legend title. A similar change applies with respect to the arguments size_nodes_by and size_title.
- buildPublicClusterNetwork arguments node_stats, stats_to_include and cluster_stats are now deprecated and do nothing. All node-level and cluster-level network properties are now automatically computed. The arguments remain in order to maintain backwards compatibility with user code, but raise a warning notifying the user of their deprecated state when a non-null value is provided.
- Functions for clustering algorithms imported from the igraph package (such as cluster_fast_greedy) are now exported in the package NAMESPACE file so that they are available to users. These functions can now be used as inputs to the cluster_fun argument of various NAIR package functions without the need to use the igraph:: prefix.
- Documentation file added for the above re-exported functions
Vignettes and Documentation
- Utility Functions vignette (formerly titled Downstream Analysis) removed. Its content has been absorbed into the buildRepSeqNetwork and Network Visualization vignettes
- Revisions and content additions to the following vignettes:
  - buildRepSeqNetwork
  - Network Visualization
  - Searching for Public Clusters
- The help file for plotNetworkGraph now recommends that users prefer the higher-level function generateNetworkGraphPlots over plotNetworkGraph, since the former has arguments that behave identically to those of buildRepSeqNetwork and supports generation of multiple plots. plotNetworkGraph is called by generateNetworkGraphPlots, so users should have no need to call plotNetworkGraph directly. However, plotNetworkGraph remains as an exported function available to the user in order to maintain backwards compatibility with user code.
NAIR 0.0.9037
- Changes to findAssociatedSeqs:
  - groups argument still exists but is now deprecated and no longer used. Group labels are now automatically determined from the unique values of group_ids
  - sample_ids argument still exists but is now deprecated and no longer used. Custom sample IDs play no role in findAssociatedSeqs; the argument was inherited from a previous function that included the functionality of both findAssociatedSeqs and findAssociatedClones
- findPublicClusters now ignores plots = TRUE when print_plots = FALSE and output_dir_unfiltered = NULL. This prevents unused plots from being generated
- buildAssociatedClusterNetwork now uses group ID as the default variable for node colors
- buildPublicClusterNetwork and buildPublicClusterNetworkByRepresentative now use sample ID as the default variable for node colors
- buildPublicClusterNetworkByRepresentative default plot title and subtitle updated for better clarity
- buildRepSeqNetwork, generateNetworkObjects and generateNetworkGraphPlots now use count_col as the default variable for node colors if available, followed in priority by cluster ID, then network degree.
- New arguments to addClusterLabels:
  - cluster_id_col added to permit use with node data where the cluster ID variable has a custom name (e.g., with the output of buildPublicClusterNetwork)
  - greatest_values added, which can be set to FALSE to prioritize the clusters to label based on the least values of the criterion variable rather than the greatest values
- A function exclusiveNodeStats has been added. This function behaves in the same manner as chooseNodeStats, but all arguments are set to FALSE by default. This is useful when the user only wishes to specify a small number of node-level properties to compute, with all other properties excluded.
- Major revisions to the following vignettes:
  - NAIR: Network Analysis of Immune Repertoire
  - Searching for Public TCR/BCR Clusters
  - Searching for Associated TCR/BCR Clusters
  - buildRepSeqNetwork
  - Network Visualization (incomplete, in progress)
  - Of particular note, the associated clusters and public clusters vignettes now simulate more reasonable toy data for demonstration purposes.
- Downstream Analysis vignette title renamed to Utility Functions. A revision to this vignette is planned prior to version 1.0.
- CXX_STD = CXX11 flag removed from src/Makevars and src/Makevars.win
NAIR 0.0.9036
- buildRepSeqNetwork no longer returns an error with dist_cutoff = 0 (fixed a bug involving the argument checks added in version 0.0.9035).
NAIR 0.0.9035
- Argument checks added to buildRepSeqNetwork
- buildRepSeqNetwork now automatically attempts to perform the following conversions:
  - coerces the input data to a data frame
  - coerces the sequence column to character
  - coerces the count column to numeric, if provided
- Changes to filterInputData, which affect top-level functions such as buildRepSeqNetwork that call it:
  - automatically drops data rows with NA values in the sequence column, with a warning produced
  - added optional count_col arg; if provided, the count column will be coerced to numeric and rows with NA/NaN values in the count column will be dropped with a warning
- Changes related to choosing node-level network statistics:
  - The node_stat_settings function now has a duplicate with the less-confusing name chooseNodeStats; the newer name is now used in place of node_stat_settings for defaults and in the tutorials
  - The stats_to_include argument of addNodeNetworkStats, buildRepSeqNetwork, etc., now also accepts a named logical vector with the same named elements as the list previously required. A list will still work, for backwards compatibility.
  - chooseNodeStats/node_stat_settings now generate a named logical vector rather than a list.
- The dist_type argument is now more flexible in the values it will accept; for example "lev" or simply "l" is now equivalent to "levenshtein"
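Accepting abbreviations such as "lev" or "l" is what base R's match.arg() does through partial matching. The changelog does not say how the package resolves dist_type, so the following is only a sketch of one common approach, using a hypothetical function resolve_dist_type().

```r
# Toy function (not from NAIR) that resolves abbreviations by partial matching.
resolve_dist_type <- function(dist_type = c("hamming", "levenshtein")) {
  match.arg(dist_type)
}

resolve_dist_type("lev")  # "levenshtein"
resolve_dist_type("l")    # "levenshtein"
resolve_dist_type("h")    # "hamming"
resolve_dist_type()       # "hamming" (first choice is the default)
```

With match.arg(), a value that matches none of the choices raises an error rather than being silently accepted.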
NAIR 0.0.9033
- Rdocumentation files
  - Added documentation for simulateToyData
  - All examples now use simulateToyData to generate data
NAIR 0.0.9032
- Rdocumentation files
  - Fixed instances where tildes were erroneously used in place of the \code{} environment
NAIR 0.0.9031
- Added documentation for the following functions:
  - addGraphLabels
  - addClusterLabels
- Added the following content to the package vignettes:
  - New vignette Dual-Chain Network Analysis for dual-chain network analysis on single-cell data
  - buildRepSeqNetwork vignette:
    - cluster_fun argument (clustering algorithm)
  - Network Visualization vignette:
    - addClusterLabels function
  - Downstream Analysis vignette:
    - addClusterLabels function
  - Finding Associated Clones vignette:
    - addClusterLabels function used to label the clusters
NAIR 0.0.9030
- Added function addGraphLabels for adding text labels to the nodes of a graph plot
- Added function addClusterLabels for adding labels to certain clusters in a graph plot
NAIR 0.0.9029
- The algorithm used to identify clusters in addClusterMembership() can now be controlled via a new argument fun.
- The following functions have a new argument cluster_fun that is passed to the fun argument of addClusterMembership():
NAIR 0.0.9028
- buildRepSeqNetwork() and generateNetworkObjects() now return NULL with a warning when the constructed network contains no edges.
NAIR 0.0.9027
- getClusterStats() now computes sequence-based statistics (e.g., sequence with max count) for dual-chain networks, including a separate set of such statistics for each chain.
- The names of some variables in the output of getClusterStats() have been changed to reflect broader applicability to single-cell data:
  - max_clone_count changed to max_count
  - agg_clone_count changed to agg_count
NAIR 0.0.9026
- Added an argument `verbose` to `findAssociatedClones()` that can be optionally set to `TRUE` in order to print additional console output reporting the number of clones in each neighborhood, both by sample and in total.
- Discovered and fixed the following bugs that were present from 0.0.9018 onward:
  - Fixed a bug whereby `findAssociatedSeqs()` was not correctly computing the counts used for Fisher’s exact test
  - Fixed a bug in `findPublicClones()` involving identification of the top n clusters by node count in each sample: when more than one cluster possessed the nth highest node count, all of these clusters were included in the top n clusters, resulting in more than n clusters identified by this criterion. This has been reverted to the behavior that existed prior to version 0.0.9018, whereby the first n clusters are selected after sorting data rows by descending node count using the `order` function.
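The difference between the two top-n behaviors described above can be sketched as follows. This is a language-neutral illustration in Python, not the package's R code: the function names and the `node_count` field are hypothetical, and Python's stable `sorted` stands in for sorting rows with `order`.

```python
# Hypothetical sketch of the two top-n selection rules for clusters.
def top_n_first(clusters, n):
    """Sort by descending node count (stable) and keep exactly the first n rows."""
    ranked = sorted(clusters, key=lambda c: c["node_count"], reverse=True)
    return ranked[:max(n, 0)]


def top_n_with_ties(clusters, n):
    """Keep every cluster tied with the nth highest node count (can exceed n)."""
    if n <= 0:
        return []
    ranked = sorted(clusters, key=lambda c: c["node_count"], reverse=True)
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1]["node_count"]
    return [c for c in ranked if c["node_count"] >= cutoff]


clusters = [
    {"id": 1, "node_count": 10},
    {"id": 2, "node_count": 7},
    {"id": 3, "node_count": 7},
    {"id": 4, "node_count": 3},
]
print([c["id"] for c in top_n_first(clusters, 2)])      # exactly n clusters
print([c["id"] for c in top_n_with_ties(clusters, 2)])  # ties push the result past n
```

With two clusters tied for second place, the first rule returns exactly two clusters while the tie-inclusive rule returns three, which is the behavior the fix removed.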
NAIR 0.0.9025
- Fixed a bug in `filterInputData()` that was preventing filtering by minimum sequence length
- Removed `BiocManager` from the `Suggests` field of DESCRIPTION, since it is no longer used to access demonstration data when building vignettes.
NAIR 0.0.9024
- Converted all package vignettes to use data created with `simulateToyData()`
- Additional detail added to vignettes for associated clones and public clones workflows
NAIR 0.0.9023
- Added user-level function `simulateToyData` for generating example (toy) data, primarily for use in vignettes, examples and tests.
- Converted README to use `simulateToyData`
NAIR 0.0.9022
- Numerous utility functions that were previously internal have been renamed and exported to be available to the user. These include `generateNetworkObjects()`, `generateNetworkGraphPlots()`, `filterInputData()`, `getNeighborhood()`, `loadDataFromFileList()`, `combineSamples()`, `saveNetwork()`, and `saveNetworkPlots()`.
- Remaining documentation added for all user-level functions.
- The package vignette content has now been split across multiple vignettes, with the package vignette serving as an overview and hub linking to the other vignettes.
- Removed some previously user-facing utility functions from the package’s exported namespace that were redundant or unnecessary to expose to the user, including `levAdjacencyMatSparse`, `hamAdjacencyMatSparse`, `generateNetworkFromSeqs`, `getSimilarClones` and `filterClonesBySequenceLength`.
NAIR 0.0.9021
- Minor bug fixes to a few functions in `utils.R` that caused errors or warnings in rare cases
NAIR 0.0.9020
- Internal function `.saveNetwork` changed to user-facing function `saveNetwork`, for use in saving output during downstream analysis
NAIR 0.0.9018
- Changes to `buildRepSeqNetwork()` (many of these changes carry over to other functions):
  - Changed arguments and functionality for saving output. A new `output_type` argument can be used to save the output list to an rds or rda file, rather than the default behavior of saving each item in an individual, uncompressed file. Rather than specifying the filename of each item individually, the `output_name` argument accepts a character string to be used as a common prefix for any files saved. All items are now saved, and the `save_all` argument has been removed.
  - Changed name of argument `other_cols` to `subset_cols` to more accurately reflect its current role (for keeping only certain input columns rather than all)
  - Changed name of argument `drop_chars` to `drop_matches` to better imply that it takes regular expressions and character strings
  - Changed names of arguments `plot_width` and `plot_height` to `pdf_width` and `pdf_height` to more clearly indicate that they affect the dimensions of the saved pdf file, but not those of the plot at the `R` object level (`ggplot`) or as it appears in the R plotting window.
  - Added logical argument `plots` which can be used to prevent plots from being generated.
  - Many of the plotting arguments are now passed to `generateNetworkGraphPlots` via ellipses (`...`)
  - The returned value is now always a list and always contains the igraph, adjacency matrix and plots in addition to the node data. The `return_all` argument has been removed.
  - When computing cluster-level properties (`cluster_stats = TRUE`), the corresponding data frame in the output list is now named `cluster_data` (previously was `cluster_stats`)
  - Now invisibly returns its output, such that it can be assigned but will not be printed when the function is called without assigning its output.
  - Now invisibly returns `NULL` with a warning when fewer than 2 sequences exist after filtering; previously it returned an error.
- `buildRepSeqNetwork` now supports a dual-chain approach to analyzing single-cell RepSeq data: two cells (nodes) are considered adjacent if and only if they possess similar receptor sequences in both of two chains (e.g., alpha chain and beta chain). This is done by supplying a vector with two column references to `seq_col` instead of a single column reference, where the two columns each contain the receptor sequence from a different chain (e.g., CDR3 sequences from alpha and beta chains) and each row corresponds to a unique cell. This functionality can more generally be used to perform network analysis where similarity is based on any two types of sequences instead of one.
- The functions for the public clusters workflow have been redesigned. Where previously the workflow was handled by a single function, `findPublicClusters`, it is now split across multiple functions in a manner that reduces memory usage and increases the flexibility of the workflow.
  - `findPublicClusters` now performs network analysis on each sample individually to search for public clusters
  - `buildPublicClusterNetwork` combines the public clusters across samples and performs network analysis
  - `buildPublicClusterNetworkByRepresentative` can be used to perform network analysis on the combined public clusters using only a single representative clone from each cluster
  - The optional step for K-means clustering based on numeric encoding of TCR sequences is now performed directly using the `kmeansAtchley` function
- The functions for the associated clusters workflow have been redesigned in a manner similar to those for the public cluster workflow, and now use the same input format as the public cluster functions (one file per sample instead of a single data frame containing all samples):
  - `findAssociatedSeqs` searches across samples for associated clone sequences based on sample membership and Fisher’s exact test P-value
  - `findAssociatedClones` searches across samples for clones within a neighborhood of each associated clone sequence
  - `buildAssociatedClusterNetwork` combines the neighborhoods and performs network analysis and clustering
  - Building networks for individual associated clusters/neighborhoods is now done directly using the `buildRepSeqNetwork` function on the desired subset of the output from `buildAssociatedClusterNetwork`
  - K-means clustering on numerically encoded TCR sequences is now done directly using the `kmeansAtchley` function
- A new function `generateNetworkGraphPlots()` has been added, which is capable of generating multiple plots with argument usage similar to that used in `buildRepSeqNetwork` (e.g., multiple color-code variables can be supplied, in which case color scheme and color legend title arguments will meaningfully accept either a scalar or vector valued argument)
- Changes to `plotNetworkGraph()`:
  - Some arguments were renamed and reordered to conform with the arguments of `buildRepSeqNetwork()`
  - Now accepts the argument `show_color_legend = "auto"`, which will show the color legend if `color_nodes_by` is a continuous variable or a discrete variable with at most 20 distinct values.
- `getClusterStats()` can now be used with `seq_col = NULL`, as the sequence variable is only used for a small number of statistics; similar to when `count_col = NULL`, the dependent statistics will be `NA` in the returned data frame, but other cluster properties will still be computed.
- `buildRepSeqNetwork()` and other high-level functions that generate a network from sequences now coerce the list of sequences to a character vector if it is not already in this format (e.g., factors).
- `buildRepSeqNetwork()` and other top-level functions now skip automatic plot generation when more than 1 million nodes are present in the network. This is done to avoid a potential error when calling `ggplot` that occurs when the combined nodes and edges exceed its limitations. After the network is generated and returned, the user can still attempt to manually generate the plot using `plotNetworkGraph()`; in this manner, the potential error will not interfere with completion of building the network.
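The dual-chain adjacency rule described in this release (two cells are adjacent only if their sequences are similar in both chains) can be sketched as below. This is an illustrative Python sketch, not the package's implementation: the function names, the `cutoff` parameter, the toy sequences, and the use of a simple Hamming distance with a length penalty are all assumptions.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Mismatches over the shared length plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def dual_chain_edges(cells, cutoff=1):
    """Return index pairs of cells that are similar in BOTH chains.

    `cells` is a list of (alpha_seq, beta_seq) tuples, one per cell.
    """
    edges = []
    for i, j in combinations(range(len(cells)), 2):
        (alpha_i, beta_i), (alpha_j, beta_j) = cells[i], cells[j]
        if hamming(alpha_i, alpha_j) <= cutoff and hamming(beta_i, beta_j) <= cutoff:
            edges.append((i, j))
    return edges


# Toy data: only cells 0 and 1 are within the cutoff in both chains.
cells = [
    ("CAVR", "CASSL"),
    ("CAVK", "CASSL"),
    ("CAVR", "CATTT"),
    ("GGGG", "CASSL"),
]
print(dual_chain_edges(cells, cutoff=1))
```

Cells 0 and 2 share an identical alpha sequence and cells 0 and 3 share an identical beta sequence, yet neither pair is connected, because the other chain exceeds the cutoff.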
NAIR 0.0.9016
- Package vignette added, which includes an introduction to the package and a tutorial of the `buildRepSeqNetwork()` function
- Package readme file updated
- `findPublicClusters()` now supports `.rds` and `.rda` file types; the `csv_files` argument has been replaced with an argument named `file_type`.
- Identified and fixed an error related to `filterClonesBySequenceLength()` that occurred when the input data only had a single column; this was affecting higher-level functions including `buildRepSeqNetwork()`
- Identified and fixed an error in `getAssociatedClusters()` that occurred when `neighborhood_plots = FALSE` and `return_all = TRUE` (the function tried to include output related to the neighborhood plots when none existed).
- `findAssociatedClones()` now returns an informative error when no sequences pass the filter for minimum sample membership.
NAIR 0.0.9014
- The following argument names have been changed in functions in which they appear:
  - `clone_col` to `seq_col`
  - `clones` to `seqs` (except for `embedClonesByAtchleyFactor()`, for which it was changed to `cdr3_AA`)
  - `edge_dist` to `dist_cutoff`
- The following functions have been renamed:
  - `generateNetworkFromClones` to `generateNetworkFromSeqs`
  - `sparseAdjacencyMatFromClones` to `sparseAdjacencyMatFromSeqs`
  - `adjacencyMatAtchleyFromClones` to `adjacencyMatAtchleyFromSeqs`
  - `embedClonesByAtchleyFactor` to `embedTCRSeqsByAtchleyFactor`
- `getSimilarClones()`: changed default value of `drop_chars` argument to `NULL`
- `buildRepSeqNetwork()` had its usage revised, primarily regarding the arguments related to the input data:
  - The `nucleo_col`, `amino_col` and `clone_seq_type` arguments have been replaced by a single `seq_col` argument; the function no longer requires both nucleotide and amino acid sequences in the data, and no longer distinguishes between the two
  - The `count_col` argument is now optional
    - By default, graph plots now use fixed node sizes
  - Other column arguments (`freq_col`, `vgene_col`, `cdr3_length`, etc.) have been removed; these columns were not used for anything specific in the pipeline, so it is not necessary for them to each have their own dedicated argument.
  - By default, all columns of the input data are now carried over to the output. If only some columns are desired, they can be specified using the `other_cols` argument.
  - Input columns are no longer renamed in the output data.
  - The option to aggregate counts/frequencies for identical clones has been removed; this can be done as a data preprocessing step, either manually or using the `aggregateIdenticalClones()` function.
  - An argument `print_plots` has been added to allow the option not to print the plot(s) in `R`. The default is `TRUE`, which corresponds to the previous behavior (all plots are printed).
- `findAssociatedClones()`, `getAssociatedClusters()` and `findPublicClusters()` have had their arguments revised according to the changes to `buildRepSeqNetwork()`.
- R documentation files added for the following functions:
  - `plotNetworkGraph`
  - `sparseAdjacencyMatFromSeqs`
  - `adjacencyMatAtchleyFromSeqs`
  - `embedTCRSeqsByAtchleyFactor`
NAIR 0.0.9013
- R documentation files added for package functions:
  - `aggregateIdenticalClones`
  - `filterClonesBySequenceLength`
  - `getSimilarClones`
  - `generateNetworkFromClones`
  - `generateNetworkFromAdjacencyMat`
  - `addNodeNetworkStats`
  - `node_stat_settings`
  - `addClusterMembership`
  - `getClusterStats`
- Implemented on-install testing of package functions using package `testthat`
NAIR 0.0.9012
- Package name changed to NAIR (Network Analysis for Immune Repertoire)
- Added `findPublicClusters()`
- `buildClustersAroundSelectedClones()` renamed to `getAssociatedClusters()`
- `getPotentialAssociatedClones()` renamed to `findAssociatedClones()`
- `generateAtchleyCorrHeatmap()` renamed to `kmeansAtchley()`
- `levAdjacencyMatSparse()` and `hamAdjacencyMatSparse()` have a new argument `drop_isolated_nodes` that can be set to `FALSE` to keep isolated nodes. This argument has been added to higher-level functions that dispatch calls to these routines.
NAIR 0.0.9011
- Separate function `embedClonesByAtchleyFactor()` created to perform embedding of TCR CDR3 amino acid sequences in Euclidean 30-space based on Atchley factor representation; this was previously done within the function `adjacencyMatAtchleyFromClones()`, but has now been placed in its own function for more general use
- New function `analyzeDiseaseAssociatedClusters()` created, which is used to perform a combined network analysis on the disease-associated clusters generated by `generateDiseaseAssociatedClusters()`
- New function `generateAtchleyCorrHeatmap()` created
- `graphics`, `reshape2`, `gplots`, `viridisLite` and `RColorBrewer` added as package dependencies via the `Imports` directive of the `DESCRIPTION` file
NAIR 0.0.9010
- `computeMetaForCandidateSeqs()` (helper for `findDiseaseMotifsFromMergedSamples()`) redesigned and renamed to `findDiseaseAssociatedClones()`; this function now takes only the merged sample data as its input data, and filters sequences by a set of criteria (number of samples shared by and minimum seq length) to obtain the list of candidates before conducting the Fisher’s exact tests; previously the list of candidates was obtained as input data to the function.
- `findDiseaseMotifsFromMergedSamples()` redesigned to use candidate sequence metadata as input (previously it computed this metadata from the merged sample data) and renamed to `generateDiseaseAssociatedClusters()`
- `generateNetworkWithStats()` now automatically prints the ggraph in R when called (previously the user needed to access the variable `graph_plot` contained in the returned list)
- Added package `dplyr` as a dependency via the `Imports` directive of the `DESCRIPTION` file
NAIR 0.0.9009
- `buildNetwork()` renamed to `generateNetworkWithStats()`
- `adjacencyMatrix()` renamed to `sparseAdjacencyMatFromClones()`
- `genNetworkGraph()` renamed to `generateNetworkFromAdjacencyMat()`
- Added general-purpose helper functions:
  - `aggregateCountsByAminoAcidSeq`
  - `filterDataBySequenceLength`
  - `generateNetworkFromClones`
  - `computeNodeNetworkStats`
  - `addClusterMembership`
  - `computeClusterNetworkStats`
  - `plotNetworkGraph`
  - `adjacencyMatAtchleyFromClones`
- Added function `findDiseaseMotifsFromMergedSamples` and helper functions:
  - `computeMetaForCandidateSeqs`
  - `subsetDataNearTargetMotif`
- Added vignette for `generateNetworkWithStats()`
NAIR 0.0.9008
- New function `hamDistBounded` for computing bounded Hamming distance in C++
  - Supports strings of unequal length; the longer string is effectively truncated to the length of the shorter string, and the difference in length is added to the Hamming distance with the truncated version. This is equivalent to extending the shorter string to the length of the longer string by appending placeholder characters that differ from their counterparts in the longer string.
- New function `hamAdjacencyMatSparse` for computing Hamming adjacency matrix in C++
- Changes to function `adjacencyMatrix`:
  - Now supports `dist_type = "hamming"` to use Hamming distance for determining network adjacency
  - Original indices of input sequences that appear in the adjacency matrix are stored in the row names of the output matrix; the sequences themselves are stored in the column names (accessible via `dimnames()`)
  - The file `col_ids.txt` created by the C++ function that computes the adjacency matrix is now deleted by `adjacencyMatrix()` after it has finished its other tasks. The information in the file is now stored in the row names of the output matrix and so the file is no longer needed.
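The unequal-length rule described above for the Hamming distance can be sketched in a few lines. This is an illustrative Python sketch rather than the package's C++ code: the function names are hypothetical, the distance bound is deliberately left out, and the padded variant assumes the placeholder character never occurs in a real sequence.

```python
def hamming_unequal(a: str, b: str) -> int:
    """Compare over the shorter length, then add the difference in length."""
    shared = min(len(a), len(b))
    mismatches = sum(1 for x, y in zip(a[:shared], b[:shared]) if x != y)
    return mismatches + abs(len(a) - len(b))


def hamming_padded(a: str, b: str, pad: str = "\0") -> int:
    """Equivalent view: pad the shorter string with a never-matching placeholder."""
    width = max(len(a), len(b))
    a_pad, b_pad = a.ljust(width, pad), b.ljust(width, pad)
    return sum(1 for x, y in zip(a_pad, b_pad) if x != y)


# One mismatch in the shared prefix plus two extra characters.
print(hamming_unequal("CASSL", "CATSLGQ"))
print(hamming_padded("CASSL", "CATSLGQ"))
```

Both functions return the same value for any pair of inputs that do not contain the placeholder, which mirrors the equivalence stated in the entry above.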
NAIR 0.0.9007
- `.genNetworkGraph()` internal helper for `buildNetwork()` renamed into a public version `genNetworkGraph()` for use by other package functions and by users; moved to a new file `utils.R` that will be used to house shared helper functions used by multiple package functions.
- Added `inst/python/Atchley_factors.csv`, which stores the Atchley factor amino acid embedding used by `BriseisEncoder.py`
NAIR 0.0.9006
- Python module `tensorflow` added to `installPythonModules()` and Config/reticulate field of the DESCRIPTION file. `tensorflow` is required by the Python module `keras`.
NAIR 0.0.9005
- File `zzz.R` created with .onUnload() directive to unload package dll via call to `library.dynam.unload()` when the package is unloaded
NAIR 0.0.9004
- Reintroduced reticulate package dependency as well as R .onLoad() directives and R functions related to python integration, dependency management and module installation
- Added Python source code for BriseisEncoder and h5 trained encoder file
- Removed $(SHLIB_OPENMP_CXXFLAGS) from PKG_CXXFLAGS and PKG_LIBS in Makevars as OpenMP is not supported for MacOS (still enabled in Makevars.win). Added comments with instructions for Linux users to enable OpenMP if desired, and added a corresponding note to the main Readme file’s installation section for Linux users.
NAIR 0.0.9003
- Removed Python source code (previously used to compute adjacency matrix)
- Removed reticulate package dependency (previously used to manage python dependencies)
- Removed R functions involved in Python management and calling Python scripts
- Changed internal helper R function .createAdjacencyMatrix() to public function adjacencyMatrix() and switched its calls to Python scripts for building the matrix into a call to the corresponding new C++ version added in 0.0.9002 (currently only Levenshtein distance is implemented; Hamming distance will be implemented in a future update)
- Changed buildNetwork() function to use adjacencyMatrix() instead of the previous function .createAdjacencyMatrix()
- Removed levAdjacencyMatDense in favor of always using levAdjacencyMatSparse
- Enabled ARMA_64BIT_WORD to accommodate larger matrices when using the C++ function levAdjacencyMatSparse; specifically, #define ARMA_64BIT_WORD=1 was added to the beginning of each .cpp source file, and -DARMA_64BIT_WORD=1 was added to the PKG_CXXFLAGS definition in Makevars and Makevars.win