aggregateByGroup: Aggregate SNP data by cell groups
Aggregates single-cell SNP data into group-level summaries based on a specified metadata column. The function collapses per-cell SNP counts into group-level matrices that can be used for group-level differential SNP analyses. It supports both transplant and non-transplant modes, filtering by donor type, and optional inclusion of normalized depth counts.
Arguments
- group_by
Character. Column name in metadata to use for grouping cells. Must be present in cell_metadata.
- donor_type
Character, optional. Specific donor type to analyze (e.g., "Donor" or "Recipient"). If NULL, uses all cells. Ignored in non-transplant mode.
- min_cells_per_group
Integer. Minimum number of cells required for a group to be included in analysis. Groups with fewer cells are marked as "filtered_low_cells" in the metadata.
- use_normalized
Logical. Whether to include normalized depth counts in the output (TRUE) or only use raw counts (FALSE).
Value
A list containing:
- ad_matrix
Aggregated alternative allele counts matrix (SNPs x Groups)
- dp_matrix
Aggregated depth matrix (SNPs x Groups)
- dp_matrix_normalized
Aggregated normalized depth matrix (SNPs x Groups), if available and requested
- metadata
Data frame with group-level metadata and QC metrics
- group_by
The metadata column used for grouping
- parameters
List of parameters used for aggregation
- snp_info
Data frame with SNP information
- snp_annotations
Data frame with SNP annotations
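As an illustration of how the returned matrices might be used, the sketch below computes a group-level alternative allele fraction by dividing ad_matrix by dp_matrix. The `agg` object is a hand-built stand-in for the return value, with made-up counts and group names; the allele-fraction step is an assumed downstream calculation, not something this function performs.

```r
# Mock of the returned list (values and group names are invented for illustration)
agg <- list(
  ad_matrix = matrix(c(1, 2, 3, 0, 1, 1), nrow = 2,
                     dimnames = list(c("snp1", "snp2"), c("B", "NK", "T"))),
  dp_matrix = matrix(c(4, 5, 5, 0, 3, 3), nrow = 2,
                     dimnames = list(c("snp1", "snp2"), c("B", "NK", "T")))
)

# Alternative allele fraction per SNP and group (SNPs x Groups)
af <- agg$ad_matrix / agg$dp_matrix

# Entries with zero depth carry no information; mark them as missing
af[agg$dp_matrix == 0] <- NA

print(round(af, 2))
```

Because both matrices share the same SNPs x Groups layout, element-wise division is enough; no reshaping is needed.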
Details
This function works by:
1. Filtering cells based on donor_type if specified (e.g., only use Donor cells)
2. Identifying unique values in the grouping column (e.g., cell_type)
3. Summing alternative allele counts and depth counts across all cells in each group
4. Creating group-level metadata with cell counts and quality metrics
5. Flagging groups with fewer cells than min_cells_per_group as "filtered_low_cells" in the metadata (these groups remain in the output matrices)
The function automatically detects non-transplant mode (single donor type) and adjusts its behavior accordingly. It also checks for normalized counts and includes them in the output if available and requested.
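The steps described above can be approximated in a few lines of base R. This is a minimal sketch on toy data, not the package's implementation: `sketch_aggregate()`, the `status` column, the "passed" label, and the default threshold of 2 are all assumptions made for illustration.

```r
# Toy cell-level inputs (SNPs x Cells); all values are invented
ad <- matrix(c(1, 0, 2, 1, 0, 3,
               0, 1, 1, 0, 2, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("snp1", "snp2"), paste0("cell", 1:6)))
dp <- matrix(c(2, 1, 4, 3, 1, 5,
               1, 2, 2, 1, 4, 1),
             nrow = 2, byrow = TRUE, dimnames = dimnames(ad))
cell_metadata <- data.frame(
  cell_type  = c("T", "T", "T", "B", "B", "NK"),
  donor_type = c("Donor", "Donor", "Recipient", "Donor", "Donor", "Donor"),
  row.names  = colnames(ad),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented steps
sketch_aggregate <- function(ad, dp, cell_metadata, group_by,
                             donor_type = NULL, min_cells_per_group = 2) {
  stopifnot(group_by %in% colnames(cell_metadata))
  # Step 1: optionally keep only cells of one donor type
  keep <- rep(TRUE, ncol(ad))
  if (!is.null(donor_type)) keep <- cell_metadata$donor_type == donor_type
  # Step 2: group label for each remaining cell
  groups <- as.character(cell_metadata[[group_by]][keep])
  # Step 3: sum counts over the cells in each group (result is SNPs x Groups)
  ad_matrix <- t(rowsum(t(ad[, keep, drop = FALSE]), groups))
  dp_matrix <- t(rowsum(t(dp[, keep, drop = FALSE]), groups))
  # Steps 4 and 5: group-level metadata; small groups are flagged, not dropped
  n_cells <- as.integer(table(groups)[colnames(ad_matrix)])
  metadata <- data.frame(
    group   = colnames(ad_matrix),
    n_cells = n_cells,
    status  = ifelse(n_cells < min_cells_per_group,
                     "filtered_low_cells", "passed"),
    stringsAsFactors = FALSE
  )
  list(ad_matrix = ad_matrix, dp_matrix = dp_matrix,
       metadata = metadata, group_by = group_by)
}

res <- sketch_aggregate(ad, dp, cell_metadata,
                        group_by = "cell_type", donor_type = "Donor")
print(res$metadata)
```

Base R's `rowsum()` sums rows by a grouping vector, so transposing before and after gives per-group column sums while keeping SNPs as rows.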
Note
- This function is typically used as a preprocessing step before findSNPsByGroup().
- The aggregated matrices no longer contain cell-level information; all counts are summed across cells in each group.
- For transplant data, it's often useful to analyze donor and recipient cells separately by specifying the donor_type parameter.
- Groups with fewer cells than min_cells_per_group are marked as "filtered_low_cells" in the metadata but are still included in the output matrices.
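Because flagged groups stay in the output matrices, a downstream analysis may want to drop them explicitly. The sketch below shows one way to do that on a mock result. The metadata column names `group` and `status` and the "passed" label are assumptions; check the actual metadata data frame for the column that holds the "filtered_low_cells" flag.

```r
# Mock aggregated result (invented values; column names are hypothetical)
agg <- list(
  ad_matrix = matrix(c(1, 2, 3, 0, 1, 1), nrow = 2,
                     dimnames = list(c("snp1", "snp2"), c("B", "NK", "T"))),
  dp_matrix = matrix(c(4, 5, 5, 1, 3, 3), nrow = 2,
                     dimnames = list(c("snp1", "snp2"), c("B", "NK", "T"))),
  metadata = data.frame(
    group   = c("B", "NK", "T"),
    n_cells = c(2L, 1L, 2L),
    status  = c("passed", "filtered_low_cells", "passed"),
    stringsAsFactors = FALSE
  )
)

# Keep only the groups that were not flagged for low cell counts
keep_groups <- agg$metadata$group[agg$metadata$status != "filtered_low_cells"]
ad_kept <- agg$ad_matrix[, keep_groups, drop = FALSE]
dp_kept <- agg$dp_matrix[, keep_groups, drop = FALSE]

print(colnames(ad_kept))
```

Using `drop = FALSE` keeps the result a matrix even when only one group passes the threshold.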
Examples
if (FALSE) { # \dontrun{
  # Basic usage - aggregate by cell type
  collapsed <- project$aggregateByGroup(
    group_by = "cell_type",
    use_normalized = TRUE
  )

  # Analyze only donor cells with stricter filtering
  donor_agg <- project$aggregateByGroup(
    group_by = "cell_type",
    donor_type = "Donor",
    min_cells_per_group = 5
  )

  # Aggregate by disease status
  disease_agg <- project$aggregateByGroup(
    group_by = "disease_status",
    use_normalized = TRUE
  )
} # }