aggregateByGroup: Aggregate SNP data by cell groups

Aggregates single-cell SNP data into group-level summaries based on a specified metadata column. The function collapses per-cell SNP counts into group-level matrices that can be used for group-level differential SNP analyses. It supports both transplant and non-transplant modes, filtering by donor type, and optional normalized depth counts.

Arguments

group_by: Character. Column name in metadata to use for grouping cells. Must be present in cell_metadata.
donor_type: Character, optional. Specific donor type to analyze (e.g., "Donor" or "Recipient"). If NULL, uses all cells. Ignored in non-transplant mode.
min_cells_per_group: Integer. Minimum number of cells required for a group to be included in analysis. Groups with fewer cells are marked as "filtered_low_cells" in the metadata.
use_normalized: Logical. Whether to include normalized depth counts in the output (TRUE) or only use raw counts (FALSE).

Value

A list containing:

ad_matrix: Aggregated alternative allele counts matrix (SNPs x Groups)
dp_matrix: Aggregated depth matrix (SNPs x Groups)
dp_matrix_normalized: Aggregated normalized depth matrix (SNPs x Groups), if available and requested
metadata: Data frame with group-level metadata and QC metrics
group_by: The metadata column used for grouping
parameters: List of parameters used for aggregation
snp_info: Data frame with SNP information
snp_annotations: Data frame with SNP annotations
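
Because ad_matrix and dp_matrix share the same SNPs x Groups layout, they can be combined element-wise. The sketch below is a minimal illustration on a hand-built stand-in for the returned list (the values and the `agg` object are made up, and the alternative allele fraction is only one possible downstream summary, not something this function is documented to compute):

```r
# Hypothetical stand-in for the list returned by aggregateByGroup();
# only the two count matrices are mocked here.
agg <- list(
  ad_matrix = matrix(c(2, 0, 3,  2, 1, 0), nrow = 3,
                     dimnames = list(paste0("snp", 1:3), c("T", "B"))),
  dp_matrix = matrix(c(8, 0, 9,  4, 3, 2), nrow = 3,
                     dimnames = list(paste0("snp", 1:3), c("T", "B")))
)

# Group-level alternative allele fraction: alt counts divided by depth.
alt_fraction <- agg$ad_matrix / agg$dp_matrix

# SNPs with zero depth in a group carry no information, so mark them NA.
alt_fraction[agg$dp_matrix == 0] <- NA

print(round(alt_fraction, 2))
```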

Details

This function works by:

Filtering cells based on donor_type if specified (e.g., only use Donor cells)
Identifying unique values in the grouping column (e.g., cell_type)
Summing alternative allele counts and depth counts across all cells in each group
Creating group-level metadata with cell counts and quality metrics
Flagging groups with fewer cells than min_cells_per_group as "filtered_low_cells" in the metadata (these groups remain in the output matrices)

The function automatically detects non-transplant mode (single donor type) and adjusts its behavior accordingly. It also checks for normalized counts and includes them in the output if available and requested.
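
The steps described in this section can be sketched in base R. This is an illustrative approximation on toy data, not the package's implementation: `aggregate_sketch()`, the toy matrices, and the `group`, `n_cells`, and `status` metadata columns are all assumptions made for the example.

```r
# Toy data: 3 SNPs x 6 cells (all names and values are illustrative).
ad <- matrix(c(1, 0, 2, 0, 1, 1,
               0, 0, 1, 3, 0, 0,
               2, 1, 0, 0, 0, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("snp", 1:3), paste0("cell", 1:6)))
dp <- ad + 2  # depth is always >= alt count
cell_metadata <- data.frame(
  cell_type  = c("T", "T", "B", "B", "T", "NK"),
  donor_type = c("Donor", "Donor", "Donor", "Recipient", "Donor", "Donor"),
  row.names  = colnames(ad)
)

aggregate_sketch <- function(ad, dp, meta, group_by,
                             donor_type = NULL, min_cells_per_group = 2) {
  stopifnot(group_by %in% colnames(meta))

  # 1. Optionally keep only cells of the requested donor type.
  keep <- if (is.null(donor_type)) rep(TRUE, nrow(meta)) else meta$donor_type == donor_type
  ad <- ad[, keep, drop = FALSE]
  dp <- dp[, keep, drop = FALSE]
  meta <- meta[keep, , drop = FALSE]

  # 2. Group label for each remaining cell.
  groups <- as.character(meta[[group_by]])

  # 3. Sum counts per group. rowsum() sums rows, so transpose to
  #    cells x SNPs first, then back to SNPs x Groups.
  ad_g <- t(rowsum(t(ad), group = groups))
  dp_g <- t(rowsum(t(dp), group = groups))

  # 4. Group-level metadata with cell counts.
  n_cells <- as.integer(table(groups)[colnames(ad_g)])

  # 5. Flag (but do not drop) groups below the cell threshold.
  status <- ifelse(n_cells < min_cells_per_group, "filtered_low_cells", "passed")

  list(ad_matrix = ad_g,
       dp_matrix = dp_g,
       metadata  = data.frame(group = colnames(ad_g), n_cells = n_cells, status = status))
}

res <- aggregate_sketch(ad, dp, cell_metadata, group_by = "cell_type", donor_type = "Donor")
print(res$ad_matrix)
print(res$metadata)
```

In this toy run the single Recipient cell is removed first, so the B group is left with one Donor cell and is flagged rather than dropped.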

Note

This function is typically used as a preprocessing step before findSNPsByGroup()
The aggregated matrices no longer contain cell-level information; all counts are summed across cells in each group
For transplant data, it's often useful to analyze donor and recipient cells separately by specifying the donor_type parameter
Groups with fewer cells than min_cells_per_group are marked as "filtered_low_cells" in the metadata but are still included in the output matrices
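
Since flagged groups stay in the output matrices, downstream code may need to drop them explicitly. The following is a minimal sketch on a mocked result list; the metadata column names `group` and `status` are hypothetical, so check the actual column names in the returned metadata before adapting it.

```r
# Hypothetical stand-in for an aggregateByGroup() result.
# The metadata column names ("group", "status") are assumptions.
agg <- list(
  ad_matrix = matrix(c(2, 0, 3,  2, 1, 0,  1, 0, 4), nrow = 3,
                     dimnames = list(paste0("snp", 1:3), c("T", "B", "NK"))),
  dp_matrix = matrix(c(8, 6, 9,  4, 3, 2,  3, 2, 6), nrow = 3,
                     dimnames = list(paste0("snp", 1:3), c("T", "B", "NK"))),
  metadata  = data.frame(
    group   = c("T", "B", "NK"),
    n_cells = c(3L, 1L, 1L),
    status  = c("passed", "filtered_low_cells", "filtered_low_cells")
  )
)

# Names of groups that were not flagged as low-cell.
keep_groups <- agg$metadata$group[agg$metadata$status != "filtered_low_cells"]

# Subset both matrices by column name; drop = FALSE keeps them as matrices
# even when only one group survives.
ad_kept <- agg$ad_matrix[, keep_groups, drop = FALSE]
dp_kept <- agg$dp_matrix[, keep_groups, drop = FALSE]

print(colnames(ad_kept))
```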

Examples

if (FALSE) { # \dontrun{
# Basic usage - aggregate by cell type
collapsed <- project$aggregateByGroup(
  group_by = "cell_type",
  use_normalized = TRUE
)

# Analyze only donor cells with stricter filtering
donor_agg <- project$aggregateByGroup(
  group_by = "cell_type",
  donor_type = "Donor",
  min_cells_per_group = 5
)

# Aggregate by disease status
disease_agg <- project$aggregateByGroup(
  group_by = "disease_status",
  use_normalized = TRUE
)
} # }