process_vireo_dataframe: Integrate Vireo donor assignments with a metadata data frame
Reads donor (genetic identity) assignments produced by Vireo and matches them against an existing metadata data frame. The function identifies the cell identifiers shared by the Vireo output and the metadata, then returns the matched donor assignments together with basic per-donor cell counts.
Arguments
- metadata_df
A data frame. The metadata data frame to match with donor information, with cell identifiers as row names.
- vireo_path
Character. Path to the Vireo donor_ids.tsv file containing donor assignments.
- prefix_text
Character. Text to prepend to cell identifiers in the Vireo data to match the cell identifiers in the metadata data frame.
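To illustrate how prefix_text lines up the two sets of identifiers, the base-R sketch below writes a tiny, made-up donor_ids.tsv to a temporary file, reads it back, and prepends a prefix to each cell identifier. It assumes the file has the cell and donor_id columns that Vireo normally writes. The barcodes and donor names are invented, and this is not the package's own reading code.

```r
# Toy stand-in for a Vireo donor_ids.tsv (assumed columns: cell, donor_id)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "cell\tdonor_id",
  "AAACCTGAGAAGGCCT-1\tdonor0",
  "AAACCTGAGCAGCGTA-1\tdonor1",
  "AAACCTGCAATGGAAT-1\tunassigned"
), tsv_path)

# Read the tab-separated file, keeping identifiers as character strings
vireo_df <- read.delim(tsv_path, stringsAsFactors = FALSE)

# Prepend the same prefix that appears in the metadata row names
prefix_text <- "Patient1_Sample3_"
vireo_df$cell <- paste0(prefix_text, vireo_df$cell)

vireo_df$cell[1]
```

After prefixing, a Vireo barcode such as AAACCTGAGAAGGCCT-1 can only match a metadata row named Patient1_Sample3_AAACCTGAGAAGGCCT-1, so the prefix must reproduce the metadata naming scheme exactly.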
Value
A list containing:
- metadata
The original metadata data frame, unchanged
- donor_data
Data frame containing donor assignments for matched cells
- matching_cells
Character vector of cell identifiers that matched between metadata and Vireo
- summaries
List of summary statistics including:
  - donor_summaries: Per-donor cell counts
  - cells_matched: Total number of cells successfully matched
  - total_cells: Total number of cells in the metadata data frame
  - total_vireo_cells: Total number of cells in the Vireo data
Details
This function provides a simpler alternative to the Seurat and SingleCellExperiment integrations when you only have a data frame of metadata. It first reads the Vireo TSV file with the process_tsv function, prepending prefix_text to each cell identifier. It then finds the cells shared by the Vireo data and the metadata data frame and computes basic summary statistics.
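The matching and summary steps described above can be sketched in base R as follows. match_vireo_sketch is a hypothetical helper, not part of the package. It assumes the Vireo table has cell and donor_id columns whose identifiers have already been prefixed, and its returned list only mimics the element names documented under Value.

```r
# Hypothetical helper sketching the matching step (not the package's code).
# Assumes vireo_df has 'cell' and 'donor_id' columns with prefixed IDs.
match_vireo_sketch <- function(metadata_df, vireo_df) {
  # Cells present in both the metadata row names and the Vireo table
  matching_cells <- intersect(rownames(metadata_df), vireo_df$cell)

  # Keep only the Vireo rows for matched cells, in metadata order
  donor_data <- vireo_df[match(matching_cells, vireo_df$cell), , drop = FALSE]
  rownames(donor_data) <- NULL

  list(
    metadata = metadata_df,  # returned unchanged
    donor_data = donor_data,
    matching_cells = matching_cells,
    summaries = list(
      donor_summaries = table(donor_data$donor_id),  # matched cells per donor
      cells_matched = length(matching_cells),
      total_cells = nrow(metadata_df),
      total_vireo_cells = nrow(vireo_df)
    )
  )
}

# Toy inputs with invented, already-prefixed identifiers
metadata_df <- data.frame(
  n_genes = c(1200, 950, 1430),
  row.names = c("P1_AAAC-1", "P1_AAAG-1", "P1_AAAT-1")
)
vireo_df <- data.frame(
  cell = c("P1_AAAC-1", "P1_AAAT-1", "P1_CCCC-1", "P1_GGGG-1"),
  donor_id = c("donor0", "donor1", "donor0", "doublet"),
  stringsAsFactors = FALSE
)

res <- match_vireo_sketch(metadata_df, vireo_df)
res$summaries$cells_matched
```

Here two of the three metadata cells appear in the Vireo table, so cells_matched is 2 while total_cells is 3 and total_vireo_cells is 4. Comparing these counts is a quick check that prefix_text was chosen correctly.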
Note
Unlike the Seurat and SingleCellExperiment integration functions, this function does not modify the input metadata data frame. It only returns the matched donor assignments and summary statistics alongside the unchanged metadata.
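Because the metadata is returned unchanged, adding donor labels to it is left to the caller. One way to do this, sketched below with toy data, is to look up each metadata row name in the matched donor table with match(); cells without a Vireo assignment receive NA. The column names cell and donor_id are assumptions about the structure of donor_data, so check names(results$donor_data) before adapting this.

```r
# Toy metadata (row names are cell IDs) and a matched donor table;
# the 'cell' and 'donor_id' column names are assumptions.
metadata_df <- data.frame(
  n_genes = c(1200, 950, 1430),
  row.names = c("P1_AAAC-1", "P1_AAAG-1", "P1_AAAT-1")
)
donor_data <- data.frame(
  cell = c("P1_AAAT-1", "P1_AAAC-1"),
  donor_id = c("donor1", "donor0"),
  stringsAsFactors = FALSE
)

# Look up each metadata cell in donor_data; unmatched cells get NA
idx <- match(rownames(metadata_df), donor_data$cell)
metadata_df$donor_id <- donor_data$donor_id[idx]

metadata_df
```

Using match() rather than merge() keeps the original row order and row names of the metadata intact.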
Examples
if (FALSE) { # \dontrun{
# Process a metadata data frame with Vireo donor assignments
results <- process_vireo_dataframe(
  metadata_df = cell_metadata,
  vireo_path = "path/to/vireo/donor_ids.tsv",
  prefix_text = "Patient1_Sample3_"
)
# Check matching statistics
results$summaries$cells_matched
results$summaries$total_cells
# Access donor assignments for matched cells
donor_assignments <- results$donor_data
} # }