Creates an STlist object from one or multiple spatial transcriptomic samples.

STlist(rnacounts = NULL, spotcoords = NULL, samples = NULL, cores = NULL)

Arguments

rnacounts

the count data which can be provided in one of these formats:

  • File paths to comma- or tab-delimited files containing raw gene counts, one file for each spatial sample. The first column contains gene names and subsequent columns contain the expression data for each cell/spot. Duplicate gene names will be modified using make.unique. Requires spotcoords and samples

  • File paths to Visium output directories (one per spatial sample). The directory should follow the structure resulting from spaceranger count. The directory contains the .h5 and spatial sub-directory. If no .h5 file is available, sparse matrices (MEX) from spaceranger count. In that case a second sub-directory called filtered_feature_bc_matrix should contain contain the barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz files. The spatial sub-directory minimally contains the coordinates (tissue_positions_list.csv), and optionally the high resolution PNG image and accompanying scaling factors (scalefactors_json.json). Requires samples

  • The exprMat file for each slide of a CosMx-SMI output. The file must contain the "fov" and "cell_ID" columns. The STlist function will separate data from each FOV, since analysis in spatialGE is conducted at the FOV level. Requires samples and spotcoords

  • A named list of data frames with raw gene counts (one data frame per spatial sample). Requires spotcoords. Argument samples only needed when a file path to sample metadata is the input.

spotcoords

the cell/spot coordinates. Not required if inputs are Visium space ranger outputs

  • File paths to comma- or tab-delimited files containing cell/spot coordinates, one for each spatial sample. The files must contain three columns: cell/spot IDs, Y positions, and X positions. The cell/spot IDs must match the column names for each cells/spots (columns) in the gene count files

  • The metadata file for each slide of a CosMx-SMI output. The file must contain the "fov", "cell_ID", "CenterX_local_px", and "CenterY_local_px" columns. The STlist function will separate data from each FOV, since analysis in spatialGE is conducted at the FOV level. Requires samples and rnacounts

  • A named list of data frames with cell/spot coordinates. The list names must match list names of the gene counts list

samples

the sample names/IDs and (optionally) metadata associated with each spatial sample. This file can also include files paths to gene counts and cell/spot coordinate files, bypassing the need to specify rnacounts and spotcoords. The following options are available for samples:

  • A vector with sample names, which will be used to partially match gene counts and cell/spot coordinates file paths. A sample name must not match file paths for two different samples. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".

  • A path to a file containing a table with metadata. This file is a comma- or tab-separated table with one sample per row and sample names/IDs in the first column. Paths to gene counts and coordinate files can be placed in the second and third columns respectively (while leaving the rnacounts and spotcoords arguments empty). If Visium directories are provided, only the second column with paths to spaceranger count directories are expected. Subsequent columns can contain variables associated with each spatial sample

cores

integer indicating the number of cores to use during parallelization. If NULL, the function uses half of the available cores at a maximum. The parallelization uses parallel::mclapply and works only in Unix systems.

Value

x an STlist object containing the counts and coordinates, and optionally the sample metadata, which can be used for downstream analysis with spatialGE

Details

Objects of the S4 class STlist are the starting point of analyses in spatialGE. The STlist contains data from one or multiple samples (i.e., tissue slices), and results from most spatialGE's functions are stored within the object.

  • Raw gene counts and spatial coordinates. Gene count data have genes in rows and sampling units (e.g., cells, spots, ROIs) in columns. Spatial coordinates have sampling units in rows and three columns: sample unit IDs, Y position, and X position.

  • Visium outputs from space ranger. The Visium directory should generally have the file structure resulting from spaceranger count, with either a count matrix represented in MEX files or a h5 file. The directory should also contain a spatial sub-directory, with the spatial coordinates (tissue_positions_list.csv), and optionally the high resolution tissue image and scaling factor file scalefactors_json.json.

  • CosMx-SMI outputs. Two files are required to process SMI outputs: The exprMat and metadata files. Both files must contain the "fov" and "cell_ID" columns. In addition, the metadata files must contain the "CenterX_local_px" and "CenterY_local_px" columns.

Optionally, the user can input a path to a file containing a table of sample-level metadata (e.g., clinical outcomes, tissue type, age). This sample metadata file should contain sample IDs in the first column partially matching the file names of the count/coordinate file paths or Visium directories. Note: The sample ID of a given sample cannot be a substring of the sample ID of another sample. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".

The function uses parallelization if run in a unix system. Windows users will experience longer times depending on the number of samples.

Examples

# Using included melanoma example (Thrane et al.)
library('spatialGE')
data_files <- list.files(system.file("extdata", package="spatialGE"), recursive=T, full.names=T)
count_files <- grep("counts", data_files, value=T)
coord_files <- grep("mapping", data_files, value=T)
clin_file <- grep("thrane_clinical", data_files, value=T)
melanoma <- STlist(rnacounts=count_files[c(1,2)], spotcoords=coord_files[c(1,2)], samples=clin_file) # Only first two samples
#> Warning: Sample ST_mel2_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel2_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep2 was not found among the count/coordinate files.
#> Found matrix data
#> Matching gene expression and coordinate data...
#> Converting counts to sparse matrices
#> Completed STlist!
#> 
melanoma
#> Spatial Transcriptomics List (STlist).
#> 2 spatial array(s):
#> 	ST_mel1_rep1 (279 ROIs|spots|cells x 15666 genes)
#> 	ST_mel1_rep2 (293 ROIs|spots|cells x 16148 genes)
#> 
#> 9 variables in sample-level data:
#> 	 patient, slice, gender, BRAF_status, NRAS_status, CDKN2A_status, survival, survival_months, RIN