Build a unified SNP database from all samples
buildSNPDatabase.Rd
Integrates SNP data, annotations, and cell metadata from all samples in the variantCell project into a unified database. This function creates merged sparse matrices for alternative allele (AD) and depth (DP) counts across all cells, combines cell metadata, annotates SNPs with genomic features, and calculates database-wide metrics. Optionally, it also adds rsIDs from a reference VCF file.
Value
Invisibly returns self (the variantCell object) with the unified SNP database constructed and stored in the snp_database field.
Details
This function performs several key steps:
Collects SNP information across all samples and identifies unique SNPs
Retrieves genomic annotations for all SNPs (exonic, intronic, promoter, etc.)
Combines metadata from all samples, handling missing columns appropriately
Creates unified sparse matrices for AD and DP counts across all cells
If normalized counts are available, also creates a unified matrix for them
Calculates database-wide metrics for each SNP
Generates a QC report with summary statistics
Optionally, adds rsIDs
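As an illustration of the metadata step in the list above, the following is a minimal sketch in base R of combining per-sample metadata tables whose columns differ. The helper `combine_metadata()` and the toy columns are hypothetical, and filling missing columns with `NA` is an assumption; the package's internal handling may differ.

```r
# Toy per-sample metadata; sample 2 lacks the "batch" column.
meta_s1 <- data.frame(
  cell = c("s1_cellA", "s1_cellB"),
  cell_type = c("T", "B"),
  batch = c("b1", "b1"),
  stringsAsFactors = FALSE
)
meta_s2 <- data.frame(
  cell = c("s2_cellA", "s2_cellB"),
  cell_type = c("NK", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pad each table to the union of all columns, then stack.
combine_metadata <- function(meta_list) {
  all_cols <- Reduce(union, lapply(meta_list, colnames))
  padded <- lapply(meta_list, function(df) {
    # Add any column this sample is missing, filled with NA (an assumption).
    for (col in setdiff(all_cols, colnames(df))) df[[col]] <- NA
    # Put columns in a common order so rbind() lines them up.
    df[, all_cols, drop = FALSE]
  })
  do.call(rbind, padded)
}

meta_all <- combine_metadata(list(meta_s1, meta_s2))
```

Cells from samples that never had a given column end up with `NA` in it, so downstream grouping can still use the column for the samples that do provide it.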
The function handles the complexities of integrating data from multiple samples with potentially different sets of SNPs and metadata columns. It manages matrix indexing, column alignment, and other technical aspects needed to build a cohesive database.
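The alignment described above can be sketched with the Matrix package: each sample's matrix is re-indexed onto the union of all SNPs, and the results are bound column-wise. This is only a sketch under stated assumptions; `align_to_union()`, the SNP identifiers, and the cell names are hypothetical and are not taken from the package's implementation.

```r
library(Matrix)

# Toy AD matrices (rows = SNPs, columns = cells) with different SNP sets.
ad_s1 <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 1, 2), x = c(2, 1, 4),
  dims = c(3, 2),
  dimnames = list(c("chr1:100", "chr1:200", "chr2:50"),
                  c("s1_cellA", "s1_cellB"))
)
ad_s2 <- sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(3, 5),
  dims = c(2, 2),
  dimnames = list(c("chr1:200", "chr3:10"),
                  c("s2_cellA", "s2_cellB"))
)

# Hypothetical helper: map a sample's rows onto the shared SNP list.
# SNPs not observed in the sample simply remain zero (unstored) entries.
align_to_union <- function(mat, all_snps) {
  trip <- summary(mat)  # triplet form with columns i, j, x
  sparseMatrix(
    i = match(rownames(mat)[trip$i], all_snps),
    j = trip$j,
    x = trip$x,
    dims = c(length(all_snps), ncol(mat)),
    dimnames = list(all_snps, colnames(mat))
  )
}

# Union of SNPs across samples defines the rows of the unified matrix.
all_snps <- union(rownames(ad_s1), rownames(ad_s2))

# Bind the aligned per-sample matrices column-wise: one column per cell.
ad_all <- cbind(align_to_union(ad_s1, all_snps),
                align_to_union(ad_s2, all_snps))
```

The same re-indexing would apply to the DP matrix, so that AD and DP share identical row and column order.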
After running this function, all subsequent analyses (differential expression, plotting, etc.) will use the unified database rather than individual sample data.
Note
This function must be called after adding all desired samples with addSampleData()
The function requires at least one sample to be added
SNP annotation may take significant time for large datasets
The resulting database can use substantial memory for projects with many cells and SNPs
Examples
if (FALSE) { # \dontrun{
# Initialize a variantCell project
project <- variantCell$new()
# Add samples
project$addSampleData(...)
project$addSampleData(...)
# Build the unified SNP database
project$buildSNPDatabase()
# Now the project is ready for analysis
project$setProjectIdentity("cell_type")
results <- project$findDESNPs(...)
} # }