STlist.Rd
Creates an STlist object from one or multiple spatial transcriptomic samples.
STlist(rnacounts = NULL, spotcoords = NULL, samples = NULL, cores = NULL)
the count data which can be provided in one of these formats:
File paths to comma- or tab-delimited files containing raw gene counts, one file
for each spatial sample. The first column contains gene names and subsequent columns
contain the expression data for each cell/spot. Duplicate gene names will be
modified using make.unique
. Requires spotcoords
and samples
.
File paths to Visium output directories (one per spatial sample). The directory
must follow the structure resulting from spaceranger count
. The directory contains
the .h5
and spatial
sub-directory. If no .h5
file is available, sparse
matrices (MEX) from spaceranger count
. In that case a second sub-directory
called filtered_feature_bc_matrix
should contain contain the barcodes.tsv.gz
,
features.tsv.gz
, and matrix.mtx.gz
files. The spatial
sub-directory minimally
contains the coordinates (tissue_positions_list.csv
), and optionally the high
resolution PNG image and accompanying scaling factors (scalefactors_json.json
).
Requires samples
.
#'
File paths to Xenium output directories (one per spatial sample). The directory
must follow the structure resulting from the xeniumranger
pipeline. The directory
contains the .h5
or sparse matrices (MEX). In that case a second sub-directory
called cell_feature_matrix
should contain contain the barcodes.tsv.gz
,
features.tsv.gz
, and matrix.mtx.gz
files. The coordinates must be available
in the cells.parquet
. Requires samples
.
The exprMat
file for each slide of a CosMx-SMI output. The file must contain
the "fov" and "cell_ID" columns. The STlist
function will separate data from each
FOV, since analysis in spatialGE is conducted at the FOV level. Requires samples
and
spotcoords
.
Seurat object (V4). A Seurat V4 object produced via Seurat::Load10X_Spatial
.
Multiple samples are allowed as long as they are stored as "slices" in the Seurat object.
Does not require samples
as sample names are taken from names(seurat_obj@images)
A named list of data frames with raw gene counts (one data frame per spatial
sample). Requires spotcoords
. Argument samples
only needed when a file path to
sample metadata is provided.
the cell/spot coordinates. Not required if inputs are Visium or Xenium (spaceranger or xeniumranger outputs).
File paths to comma- or tab-delimited files containing cell/spot coordinates, one
for each spatial sample. The files must contain three columns: cell/spot IDs, Y positions, and
X positions. The cell/spot IDs must match the column names for each cells/spots (columns) in
the gene count files. Requires samples
and rnacounts
.
The metadata
file for each slide of a CosMx-SMI output. The file must contain
the "fov", "cell_ID", "CenterX_local_px", and "CenterY_local_px" columns. The STlist
function will separate data from each FOV, since analysis in spatialGE is conducted at
the FOV level. Requires samples
and rnacounts
.
A named list of data frames with cell/spot coordinates. The list names must match list names of the gene counts list
the sample names/IDs and (optionally) metadata associated with
each spatial sample.
The following options are available for samples
:
A vector with sample names, which will be used to match gene the counts and cell/spot coordinates file paths. A sample name must not match file paths for two different samples. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".
A path to a file containing a table with metadata. This file is a comma- or tab-separated table with one sample per row and sample names/IDs in the first column. Subsequent columns may contain variables associated with each spatial sample
integer indicating the number of cores to use during parallelization.
If NULL, the function uses half of the available cores at a maximum. The parallelization
uses parallel::mclapply
and works only in Unix systems.
an STlist object containing the counts and coordinates, and optionally
the sample metadata, which can be used for downstream analysis with spatialGE
Objects of the S4 class STlist are the starting point of analyses in spatialGE
.
The STlist contains data from one or multiple samples (i.e., tissue slices), and
results from most spatialGE
's functions are stored within the object.
Raw gene counts and spatial coordinates. Gene count data have genes in rows and sampling units (e.g., cells, spots) in columns. Spatial coordinates have sampling units in rows and three columns: sample unit IDs, Y position, and X position.
Visium outputs from Space Ranger. The Visium directory must have the directory
structure resulting from spaceranger count
, with either a count matrix represented in
MEX files or a h5 file. The directory should also contain a spatial
sub-directory,
with the spatial coordinates (tissue_positions_list.csv
), and
optionally the high resolution tissue image and scaling factor file scalefactors_json.json
.
Xenium outputs from Xenium Ranger. The Xenium directory must have the directory
structure resulting from the xeniumranger
pipeline, with either a cell-feature matrix
represented in MEX files or a h5 file. The directory should also contain a parquet file,
with the spatial coordinates (cells.parquet
).
CosMx-SMI outputs. Two files are required to process SMI outputs: The exprMat
and
metadata
files. Both files must contain the "fov" and "cell_ID" columns. In addition,
the metadata
files must contain the "CenterX_local_px" and "CenterY_local_px" columns.
Seurat object (V4). A Seurat V4 object produced via Seurat::Load10X_Spatial
.
Optionally, the user can input a path to a file containing a table of sample-level metadata (e.g., clinical outcomes, tissue type, age). This sample metadata file should contain sample IDs in the first column partially matching the file names of the count/coordinate file paths or Visium directories. Note: The sample ID of a given sample cannot be a substring of the sample ID of another sample. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".
The function uses parallelization if run in a Unix system. Windows users will experience longer times depending on the number of samples.
# Using included melanoma example (Thrane et al.)
# Download example data set from spatialGE_Data
thrane_tmp = tempdir()
unlink(thrane_tmp, recursive=TRUE)
dir.create(thrane_tmp)
lk='https://github.com/FridleyLab/spatialGE_Data/raw/refs/heads/main/melanoma_thrane.zip?download='
download.file(lk, destfile=paste0(thrane_tmp, '/', 'melanoma_thrane.zip'), mode='wb')
zip_tmp = list.files(thrane_tmp, pattern='melanoma_thrane.zip$', full.names=TRUE)
unzip(zipfile=zip_tmp, exdir=thrane_tmp)
# Generate the file paths to be passed to the STlist function
count_files <- list.files(paste0(thrane_tmp, '/melanoma_thrane'),
full.names=TRUE, pattern='counts')
coord_files <- list.files(paste0(thrane_tmp, '/melanoma_thrane'),
full.names=TRUE, pattern='mapping')
clin_file <- list.files(paste0(thrane_tmp, '/melanoma_thrane'),
full.names=TRUE, pattern='clinical')
# Create STlist
library('spatialGE')
melanoma <- STlist(rnacounts=count_files[c(1,3)],
spotcoords=coord_files[c(1,3)],
samples=clin_file) # Only first three samples
#> Warning: Sample ST_mel2_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep2 was not found among the count/coordinate files.
#> Found matrix data
#> Matching gene expression and coordinate data...
#> Converting counts to sparse matrices
#> Completed STlist!
melanoma
#> Spatial Transcriptomics List (STlist).
#> 2 spatial array(s):
#> ST_mel1_rep2 (293 ROIs|spots|cells x 16148 genes)
#> ST_mel3_rep1 (256 ROIs|spots|cells x 15653 genes)
#>
#> 9 variables in sample-level data:
#> patient, slice, gender, BRAF_status, NRAS_status, CDKN2A_status, survival, survival_months, RIN