Filtering of spots/cells, genes or samples, as well as count-based filtering

filter_data(
  x = NULL,
  spot_minreads = 0,
  spot_maxreads = NULL,
  spot_mingenes = 0,
  spot_maxgenes = NULL,
  spot_minpct = 0,
  spot_maxpct = NULL,
  gene_minreads = 0,
  gene_maxreads = NULL,
  gene_minspots = 0,
  gene_maxspots = NULL,
  gene_minpct = 0,
  gene_maxpct = NULL,
  samples = NULL,
  rm_tissue = NULL,
  rm_spots = NULL,
  rm_genes = NULL,
  rm_genes_expr = NULL,
  spot_pct_expr = "^MT-"
)

Arguments

x

an STlist

spot_minreads

the minimum number of total reads for a spot to be retained

spot_maxreads

the maximum number of total reads for a spot to be retained

spot_mingenes

the minimum number of genes with non-zero counts for a spot to be retained

spot_maxgenes

the maximum number of genes with non-zero counts for a spot to be retained

spot_minpct

the minimum percentage of counts for features defined by spot_pct_expr for a spot to be retained.

spot_maxpct

the maximum percentage of counts for features defined by spot_pct_expr for a spot to be retained.

gene_minreads

the minimum number of total reads for a gene to be retained

gene_maxreads

the maximum number of total reads for a gene to be retained

gene_minspots

the minimum number of spots with non-zero counts for a gene to be retained

gene_maxspots

the maximum number of spots with non-zero counts for a gene to be retained

gene_minpct

the minimum percentage of spots with non-zero counts for a gene to be retained

gene_maxpct

the maximum percentage of spots with non-zero counts for a gene to be retained

samples

samples (as in names(x@counts)) on which to perform filtering.

rm_tissue

sample (as in names(x@counts)) to remove from STlist. Removes samples in x@counts, x@tr_counts, x@spatial_meta, x@gene_meta, and x@sample_meta

rm_spots

vector of spot/cell IDs to remove. Removes spots/cells in x@counts, x@tr_counts, and x@spatial_meta

rm_genes

vector of gene names to remove from STlist. Removes genes in x@counts, x@tr_counts, and x@gene_meta

rm_genes_expr

a regular expression that matches genes to remove. Removes genes in x@counts, x@tr_counts, and x@gene_meta

spot_pct_expr

a regular expression matched against gene names, used with spot_minpct and spot_maxpct. By default '^MT-'.
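
Before passing a pattern to spot_pct_expr or rm_genes_expr, it can help to preview which gene names it selects. The following is a minimal base R sketch, not the function's internal code: the gene names are made up for illustration, and it assumes plain case-sensitive matching as done by grepl().

```r
# Hypothetical gene names, for illustration only
genes <- c("MT-CO1", "MT-ND1", "MTOR", "GAPDH", "mt-Co1")

# "^MT-" anchors the match at the start of the name and requires the hyphen,
# so "MTOR" is not selected; matching here is case-sensitive (an assumption)
is_mt <- grepl("^MT-", genes)
mt_genes <- genes[is_mt]
print(mt_genes)
```

If a data set uses a different naming convention for mitochondrial genes (for example lowercase prefixes), the pattern would need to be adjusted accordingly.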

Value

an STlist containing the filtered data

Details

This function provides options to filter elements in an STlist. It can remove cells/spots or genes based on raw counts (x@counts). Users can supply a regular expression to query gene names and calculate percentages (for example, the percentage of counts from mtDNA genes). The function can also remove entire samples. Note that the function removes cells/spots, genes, and/or samples from the raw counts, transformed counts, spatial variables, gene variables, and sample metadata. Also note that the function filters in the following order:

  1. Samples (rm_tissue)

  2. Spots (rm_spots)

  3. Genes (rm_genes)

  4. Genes matching rm_genes_expr

  5. Min and max counts
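
The count-based step (5) can be pictured with a small base R sketch on a toy matrix. This is an illustration only and does not reproduce the package's implementation: the matrix values, the thresholds, the inclusive comparisons, and the choice to compute gene statistics on the unfiltered matrix are all assumptions.

```r
# Toy raw counts: genes in rows, spots in columns (made-up values)
counts <- matrix(
  c(10, 0, 5,
     0, 0, 1,
    90, 2, 4,
     0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "GAPDH", "ACTB", "MTOR"),
                  c("spot1", "spot2", "spot3"))
)

# Per-spot statistics, analogous to spot_minreads, spot_mingenes, spot_maxpct
spot_reads <- colSums(counts)
spot_genes <- colSums(counts > 0)
mt_rows <- grepl("^MT-", rownames(counts))
spot_pct <- 100 * colSums(counts[mt_rows, , drop = FALSE]) / spot_reads

# Assumed thresholds: at least 5 reads, at least 2 genes, at most 20% MT counts
keep_spots <- spot_reads >= 5 & spot_genes >= 2 & spot_pct <= 20

# Per-gene statistic, analogous to gene_minspots (computed on the raw matrix)
gene_spots <- rowSums(counts > 0)
keep_genes <- gene_spots >= 2

# drop = FALSE keeps the result a matrix even if one row or column remains
filtered <- counts[keep_genes, keep_spots, drop = FALSE]
print(filtered)
```

In this toy case spot2 falls below the read threshold and spot3 exceeds the mitochondrial percentage threshold, so only spot1 remains, with the two genes detected in at least two spots.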

Examples

# Using included melanoma example (Thrane et al.)
library('spatialGE')
data_files <- list.files(system.file("extdata", package="spatialGE"), recursive=TRUE, full.names=TRUE)
count_files <- grep("counts", data_files, value=TRUE)
coord_files <- grep("mapping", data_files, value=TRUE)
clin_file <- grep("thrane_clinical", data_files, value=TRUE)
# Only first two samples
melanoma <- STlist(rnacounts=count_files[c(1,2)], spotcoords=coord_files[c(1,2)], samples=clin_file)
#> Warning: Sample ST_mel2_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel2_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep2 was not found among the count/coordinate files.
#> Found matrix data
#> Matching gene expression and coordinate data...
#> Converting counts to sparse matrices
#> Completed STlist!
#> 
melanoma <- filter_data(melanoma, spot_minreads=2000)