STlist.Rd
Creates an STlist object from one or multiple spatial transcriptomic samples.
STlist(rnacounts = NULL, spotcoords = NULL, samples = NULL, cores = NULL)
rnacounts: the count data, which can be provided in one of these formats:
File paths to comma- or tab-delimited files containing raw gene counts, one file
for each spatial sample. The first column contains gene names and subsequent columns
contain the expression data for each cell/spot. Duplicate gene names will be
modified using make.unique. Requires spotcoords and samples.
File paths to Visium output directories (one per spatial sample). Each directory
should follow the structure produced by spaceranger count: it contains the .h5 file
and the spatial sub-directory. If no .h5 file is available, the sparse matrices (MEX)
from spaceranger count can be used instead. In that case, a second sub-directory
called filtered_feature_bc_matrix should contain the barcodes.tsv.gz, features.tsv.gz,
and matrix.mtx.gz files. The spatial sub-directory minimally contains the coordinates
(tissue_positions_list.csv), and optionally the high resolution PNG image and
accompanying scaling factors (scalefactors_json.json). Requires samples.
The exprMat file for each slide of a CosMx-SMI output. The file must contain
the "fov" and "cell_ID" columns. The STlist function will separate data from each
FOV, since analysis in spatialGE is conducted at the FOV level. Requires samples
and spotcoords.
A named list of data frames with raw gene counts (one data frame per spatial
sample). Requires spotcoords. The samples argument is only needed when a path to
a sample metadata file is supplied.
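As an illustration of the named-list input, the sketch below builds a toy list of count data frames and shows how base R's make.unique renames duplicated gene names. The sample name, gene names, and spot IDs are hypothetical, and the layout (gene names in the first column) is an assumption that mirrors the delimited-file format described above.

```r
# Hypothetical toy counts: gene names in the first column, one column per spot.
counts_a <- data.frame(
  gene = c("GeneA", "GeneB", "GeneA"),  # "GeneA" is duplicated on purpose
  spot_1 = c(5L, 0L, 2L),
  spot_2 = c(1L, 3L, 0L),
  stringsAsFactors = FALSE
)

# make.unique() appends a numeric suffix to repeated names.
unique_genes <- make.unique(counts_a$gene)
# unique_genes is c("GeneA", "GeneB", "GeneA.1")

# A named list with one data frame per spatial sample.
counts_list <- list(sample_a = counts_a)
```

A list like counts_list would be passed as rnacounts, together with a spotcoords list carrying the same names.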
spotcoords: the cell/spot coordinates. Not required if the inputs are Visium Space Ranger outputs.
File paths to comma- or tab-delimited files containing cell/spot coordinates, one for each spatial sample. The files must contain three columns: cell/spot IDs, Y positions, and X positions. The cell/spot IDs must match the column names (one per cell/spot) in the gene count files.
The metadata file for each slide of a CosMx-SMI output. The file must contain
the "fov", "cell_ID", "CenterX_local_px", and "CenterY_local_px" columns. The STlist
function will separate data from each FOV, since analysis in spatialGE is conducted at
the FOV level. Requires samples and rnacounts.
A named list of data frames with cell/spot coordinates. The list names must match list names of the gene counts list
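The naming and ID requirements for list inputs can be checked before building the object. The following is a minimal sketch with hypothetical toy data; the column names spot_id, ypos, and xpos are placeholders, and the checks are illustrative rather than the validation spatialGE itself performs.

```r
# Hypothetical counts: gene names first, then one column per spot.
counts_list <- list(
  sample_a = data.frame(
    gene = c("GeneA", "GeneB"),
    spot_1 = c(5L, 0L),
    spot_2 = c(1L, 3L),
    stringsAsFactors = FALSE
  )
)

# Hypothetical coordinates: IDs, Y positions, X positions (in that order).
coords_list <- list(
  sample_a = data.frame(
    spot_id = c("spot_1", "spot_2"),
    ypos = c(10, 10),
    xpos = c(5, 6),
    stringsAsFactors = FALSE
  )
)

# The two lists should carry the same sample names.
names_match <- setequal(names(counts_list), names(coords_list))

# Within each sample, spot IDs should match the count column names.
ids_match <- vapply(names(counts_list), function(s) {
  setequal(colnames(counts_list[[s]])[-1], coords_list[[s]][[1]])
}, logical(1))
```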
samples: the sample names/IDs and (optionally) metadata associated with
each spatial sample. When a metadata file is supplied, it can also include file paths
to gene count and cell/spot coordinate files, bypassing the need to specify rnacounts
and spotcoords.
The following options are available for samples:
A vector with sample names, which will be used to partially match gene counts and cell/spot coordinates file paths. A sample name must not match file paths for two different samples. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".
A path to a file containing a table with metadata. This file is a comma- or
tab-separated table with one sample per row and sample names/IDs in the first
column. Paths to gene counts and coordinate files can be placed in the second and
third columns respectively (while leaving the rnacounts and spotcoords arguments
empty). If Visium directories are provided, only the second column, with paths to
the spaceranger count directories, is expected. Subsequent columns can contain
variables associated with each spatial sample.
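The metadata table layout described above can be sketched as follows. All sample IDs, file names, and the tissue_type variable are hypothetical; the snippet only writes and re-reads the table, and the commented STlist call would need real count and coordinate files at the listed paths.

```r
# Hypothetical sample metadata: IDs first, then count paths, coordinate paths,
# and any sample-level variables.
sample_meta <- data.frame(
  sample_id = c("tissue01", "tissue12"),
  count_file = c("tissue01_counts.csv", "tissue12_counts.csv"),
  coord_file = c("tissue01_coords.csv", "tissue12_coords.csv"),
  tissue_type = c("tumor", "normal"),
  stringsAsFactors = FALSE
)

# Write the table as a comma-separated file and read it back to confirm the layout.
meta_path <- tempfile(fileext = ".csv")
write.csv(sample_meta, meta_path, row.names = FALSE)
meta_check <- read.csv(meta_path, stringsAsFactors = FALSE)

# With real files in place, rnacounts and spotcoords would be left empty:
# st_obj <- STlist(samples = meta_path)
```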
cores: integer indicating the number of cores to use during parallelization.
If NULL, the function uses at most half of the available cores. Parallelization
uses parallel::mclapply and works only on Unix systems.
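To make the core-selection behavior concrete, here is a minimal sketch of choosing a worker count and calling parallel::mclapply. It is an assumption about how such a default could be computed, not the package's internal code; the variable names are illustrative.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to one core.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Use at most half of the available cores, but never fewer than one.
half_cores <- max(1L, available %/% 2L)

# mclapply() relies on forking, which is unavailable on Windows;
# with mc.cores = 1 it simply runs sequentially there.
n_cores <- if (.Platform$OS.type == "unix") half_cores else 1L

squares <- parallel::mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```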
An STlist object containing the counts and coordinates, and optionally
the sample metadata, which can be used for downstream analysis with spatialGE.
Objects of the S4 class STlist are the starting point of analyses in spatialGE.
The STlist contains data from one or multiple samples (i.e., tissue slices), and
results from most spatialGE functions are stored within the object.
Raw gene counts and spatial coordinates. Gene count data have genes in rows and sampling units (e.g., cells, spots, ROIs) in columns. Spatial coordinates have sampling units in rows and three columns: sample unit IDs, Y position, and X position.
Visium outputs from Space Ranger. The Visium directory should generally have
the file structure resulting from spaceranger count, with the count matrix
stored either as MEX files or as an .h5 file. The directory should also contain a spatial
sub-directory with the spatial coordinates (tissue_positions_list.csv), and
optionally the high resolution tissue image and the scaling factor file (scalefactors_json.json).
CosMx-SMI outputs. Two files are required to process SMI outputs: the exprMat
and metadata files. Both files must contain the "fov" and "cell_ID" columns. In addition,
the metadata files must contain the "CenterX_local_px" and "CenterY_local_px" columns.
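The column requirements for CosMx-SMI inputs can be verified up front. The sketch below uses small hypothetical tables in place of real exprMat and metadata files, and the split by FOV only illustrates the idea of per-FOV separation; it is not the routine STlist runs internally.

```r
# Columns required in each CosMx-SMI file, as listed above.
required_expr <- c("fov", "cell_ID")
required_meta <- c(required_expr, "CenterX_local_px", "CenterY_local_px")

# Hypothetical stand-ins for the exprMat and metadata tables.
expr_mat <- data.frame(
  fov = c(1L, 1L, 2L),
  cell_ID = c(1L, 2L, 1L),
  GeneA = c(0L, 3L, 1L)
)
cell_meta <- data.frame(
  fov = c(1L, 1L, 2L),
  cell_ID = c(1L, 2L, 1L),
  CenterX_local_px = c(10.5, 20.1, 15.0),
  CenterY_local_px = c(7.2, 9.8, 3.3)
)

# Any names returned here are required columns that are absent.
missing_expr <- setdiff(required_expr, colnames(expr_mat))
missing_meta <- setdiff(required_meta, colnames(cell_meta))

# Illustrative per-FOV separation of the expression table.
expr_by_fov <- split(expr_mat, expr_mat$fov)
```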
Optionally, the user can input a path to a file containing a table of sample-level metadata (e.g., clinical outcomes, tissue type, age). This sample metadata file should contain sample IDs in the first column partially matching the file names of the count/coordinate file paths or Visium directories. Note: The sample ID of a given sample cannot be a substring of the sample ID of another sample. For example, instead of using "tissue1" and "tissue12", use "tissue01" and "tissue12".
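The substring restriction on sample IDs can be seen with a simple check. This sketch assumes partial matching behaves like a fixed substring search (grepl with fixed = TRUE); the matching spatialGE actually performs may differ, and the file paths are hypothetical.

```r
# Count how many file paths each sample ID matches as a plain substring.
count_hits <- function(ids, paths) {
  vapply(ids, function(id) sum(grepl(id, paths, fixed = TRUE)), integer(1))
}

# Ambiguous IDs: "tissue1" is a substring of "tissue12".
bad_paths <- c("data/tissue1_counts.csv", "data/tissue12_counts.csv")
bad_hits <- count_hits(c("tissue1", "tissue12"), bad_paths)
# "tissue1" matches both files, "tissue12" matches one.

# Zero-padded IDs: each ID matches exactly one file.
ok_paths <- c("data/tissue01_counts.csv", "data/tissue12_counts.csv")
ok_hits <- count_hits(c("tissue01", "tissue12"), ok_paths)
```

Checking that every sample ID yields exactly one hit is a quick way to catch ambiguous names before calling STlist.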
The function uses parallelization when run on a Unix system. Windows users will experience longer run times, depending on the number of samples.
# Using included melanoma example (Thrane et al.)
library('spatialGE')
data_files <- list.files(system.file("extdata", package = "spatialGE"), recursive = TRUE, full.names = TRUE)
count_files <- grep("counts", data_files, value = TRUE)
coord_files <- grep("mapping", data_files, value = TRUE)
clin_file <- grep("thrane_clinical", data_files, value = TRUE)
# Only the first two samples are loaded; the clinical file lists more samples, hence the warnings below
melanoma <- STlist(rnacounts = count_files[c(1, 2)], spotcoords = coord_files[c(1, 2)], samples = clin_file)
#> Warning: Sample ST_mel2_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel2_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel3_rep2 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep1 was not found among the count/coordinate files.
#> Warning: Sample ST_mel4_rep2 was not found among the count/coordinate files.
#> Found matrix data
#> Matching gene expression and coordinate data...
#> Converting counts to sparse matrices
#> Completed STlist!
#>
melanoma
#> Spatial Transcriptomics List (STlist).
#> 2 spatial array(s):
#> ST_mel1_rep1 (279 ROIs|spots|cells x 15666 genes)
#> ST_mel1_rep2 (293 ROIs|spots|cells x 16148 genes)
#>
#> 9 variables in sample-level data:
#> patient, slice, gender, BRAF_status, NRAS_status, CDKN2A_status, survival, survival_months, RIN