Demo 1: CAST Mark captures common spatial features across multiple samples

[3]:
import CAST
import os
import numpy as np
import anndata as ad
import scanpy as sc
import warnings
warnings.filterwarnings("ignore")

work_dir = '$demo_path' #### replace '$demo_path' with the path to the demo directory

Load Data

  • The CAST Mark method requires only the following data modalities:

    1. gene expression raw counts

    2. spatial coordinates of the cells

  • We organize spatial omics data in the AnnData format (we recommend pre-organizing your data in this format):

    • adata.X stores the STARmap gene expression raw counts

    • adata.obs contains cell-level annotations, including the spatial coordinates (column names: 'x', 'y')

    • Data from different experimental samples are combined in a single AnnData object and distinguished by the 'sample' column

Settings

[4]:
### Load the data and set up the output path

# Set up the output path
output_path = f'{work_dir}/demo1_CAST_Mark/demo_output'
os.makedirs(output_path, exist_ok=True)

# Load the data
adata = ad.read_h5ad(f'{output_path}/../data/demo1.h5ad')
# Use the per-cell normalized counts (target sum 1e4) as the input gene expression
adata.layers['norm_1e4'] = sc.pp.normalize_total(adata, target_sum=1e4, inplace=False)['X'].toarray()

# Get the coordinates and expression data for each sample
samples = np.unique(adata.obs['sample']) # names of the samples present in adata
coords_raw = {sample_t: np.array(adata.obs[['x','y']])[adata.obs['sample'] == sample_t] for sample_t in samples}
exp_dict = {sample_t: adata[adata.obs['sample'] == sample_t].layers['norm_1e4'] for sample_t in samples}
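
For readers who want to see what the preprocessing in this cell amounts to, the following NumPy-only sketch reproduces the two steps on toy data: scaling each cell's counts so they sum to 1e4, and splitting coordinates and expression into per-sample dictionaries. The helper name normalize_total_dense and the toy arrays are assumptions made for illustration; the demo itself relies on sc.pp.normalize_total.

```python
import numpy as np

def normalize_total_dense(counts, target_sum=1e4):
    """Scale each row (cell) so that its counts sum to target_sum.

    Illustrative stand-in for per-cell total-count normalization;
    cells with zero total counts are left as all zeros.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    safe_totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero
    return counts / safe_totals * target_sum

# Hypothetical toy data: 4 cells x 3 genes from two samples.
counts = np.array([[1, 3, 0], [2, 2, 4], [0, 5, 5], [10, 0, 0]])
sample_labels = np.array(["S1", "S1", "S2", "S2"])
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

norm = normalize_total_dense(counts)

# Same dictionary layout as coords_raw and exp_dict: one array per sample.
toy_samples = np.unique(sample_labels)
coords_by_sample = {s: coords[sample_labels == s] for s in toy_samples}
exp_by_sample = {s: norm[sample_labels == s] for s in toy_samples}
```

Each dictionary maps a sample name to an array whose rows are that sample's cells, so the coordinate and expression arrays for a given sample stay row-aligned.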

Run

[5]:
### Run the model to generate the graph embedding

from CAST import CAST_MARK
embed_dict = CAST_MARK(coords_raw,exp_dict,output_path)

### On a single-core CPU, each epoch may take a long time. If training is too slow, you can reduce the number of epochs:
### embed_dict = CAST_MARK(coords_raw,exp_dict,output_path,epoch_t = 20)
Constructing delaunay graphs for 8 samples...
Training on cuda:0...
Loss: -432.028 step time=0.442s: 100%|█████████████████████████████████████████████████████████████████████| 400/400 [03:00<00:00,  2.22it/s]
Finished.
The embedding, log, model files were saved to /home/unix/panj/wanglab/jessica/CAST/demo/demo1_CAST_Mark/demo_output
[Output figures: ../_images/notebooks_demo1_CAST_mark_6_3.png through ../_images/notebooks_demo1_CAST_mark_6_10.png (8 images)]
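
The log above reports that a Delaunay graph is constructed for each sample. As a rough illustration of what such a graph is, the sketch below triangulates a handful of 2D points with scipy.spatial.Delaunay and lists the resulting cell-to-cell edges. This is not CAST's internal code; the helper delaunay_edges and the toy coordinates are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(coords):
    """Return the sorted undirected edges (i, j), i < j, of the Delaunay triangulation."""
    tri = Delaunay(coords)
    edges = set()
    # Each simplex is a triangle; every pair of its vertices is an edge.
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((int(min(i, j)), int(max(i, j))))
    return sorted(edges)

# Hypothetical toy coordinates: four corners of a square plus its center.
coords_toy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

edges = delaunay_edges(coords_toy)

# Number of graph neighbors of each point.
degree = np.bincount(np.array(edges).ravel(), minlength=len(coords_toy))
```

In this toy case the center point is connected to all four corners, and each corner is connected to its two adjacent corners and to the center. In a graph built this way from real data, each cell is linked to its spatially adjacent cells.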

Visualize the results

[8]:
### Visualize the embedding with KMeans clustering

from CAST.visualize import kmeans_plot_multiple


kmeans_plot_multiple(embed_dict,samples,coords_raw,'demo1',output_path,k=20,dot_size = 10,minibatch=False)
Perform KMeans clustering on 72165 cells...
Plotting the KMeans clustering results...
[8]:
array([14, 11, 14, ..., 18,  9, 18], dtype=int32)
../_images/notebooks_demo1_CAST_mark_8_2.png
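
The log shows that clustering is performed once over the cells of all samples together (72165 cells), which is what allows one cluster label to mark the same feature in every sample. The sketch below illustrates that idea on toy data: per-sample embeddings are concatenated, clustered jointly, and the labels are split back per sample. It uses a small hand-written Lloyd's k-means in NumPy rather than the clustering used inside kmeans_plot_multiple, and the function kmeans_labels and the toy embeddings are assumptions made for illustration.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its points (keep the old center if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical toy embeddings: two samples, each with cells from two distinct groups.
embed_toy = {
    "S1": np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]),
    "S2": np.array([[1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]),
}
toy_samples = list(embed_toy)

# Cluster all cells jointly so that labels are comparable across samples.
stacked = np.concatenate([embed_toy[s] for s in toy_samples], axis=0)
labels = kmeans_labels(stacked, k=2)

# Split the joint label vector back into one array per sample.
split_points = np.cumsum([len(embed_toy[s]) for s in toy_samples])[:-1]
labels_by_sample = dict(zip(toy_samples, np.split(labels, split_points)))
```

Because the labels come from a single joint clustering, cells with similar embeddings receive the same label regardless of which sample they belong to.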