CAST.models package

CAST.models.aug module

models.aug.random_aug(graph, x, feat_drop_rate, edge_mask_rate)

Given a graph, randomly drops features and masks edges.

Parameters:
  • graph (DGLGraph) – The input graph.

  • x (torch.Tensor) – The input features.

  • feat_drop_rate (float) – The probability of dropping a feature.

  • edge_mask_rate (float) – The probability of masking an edge.

Returns:

  • DGLGraph – The graph after randomly masking edges.

  • torch.Tensor – The features after randomly dropping features.

models.aug.drop_feature(x, drop_prob)

Randomly drop features with probability drop_prob.

Parameters:
  • x (torch.Tensor) – The input features.

  • drop_prob (float) – The probability of dropping a feature.

Returns:

The remaining features after random dropping.

Return type:

torch.Tensor
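The behaviour described above can be illustrated with a minimal sketch. It assumes that dropping acts on whole feature columns (every node loses the same features) and that dropped values are set to zero; neither detail is stated in this reference, and drop_feature_sketch is a hypothetical name, not the package's function.

```python
import torch


def drop_feature_sketch(x: torch.Tensor, drop_prob: float) -> torch.Tensor:
    """Zero out each feature column independently with probability drop_prob."""
    # One uniform draw per feature column; True marks a column to drop.
    # (Column-wise dropping is an assumption of this sketch.)
    drop_mask = torch.rand(x.size(1), device=x.device) < drop_prob
    out = x.clone()  # leave the caller's tensor untouched
    out[:, drop_mask] = 0
    return out


x = torch.ones(5, 8)
kept_all = drop_feature_sketch(x, 0.0)     # nothing is dropped
dropped_all = drop_feature_sketch(x, 1.0)  # every column is zeroed
```

The output always has the same shape as the input; only the values in the dropped columns change.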

models.aug.mask_edge(graph, mask_prob)

Randomly mask edges with probability mask_prob.

Parameters:
  • graph (DGLGraph) – The input graph (only used to take the number of edges).

  • mask_prob (float) – The probability of masking an edge.

Returns:

A 1D tensor of indices of the remaining edges after random masking.

Return type:

torch.Tensor
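Since the graph is only used for its edge count, the masking step can be sketched with a plain integer in its place. The following is an illustration under that assumption, using an independent Bernoulli draw per edge; mask_edge_sketch and its num_edges parameter are hypothetical and not part of the package.

```python
import torch


def mask_edge_sketch(num_edges: int, mask_prob: float) -> torch.Tensor:
    """Return a 1D tensor with the indices of the edges that survive masking."""
    # Bernoulli draw per edge: 1 keeps the edge, 0 masks it.
    keep = torch.bernoulli(torch.full((num_edges,), 1.0 - mask_prob))
    # nonzero() yields shape (k, 1); squeeze to a 1D index tensor of length k.
    return keep.nonzero().squeeze(1)


kept = mask_edge_sketch(10, 0.0)       # all ten edge indices remain
none_kept = mask_edge_sketch(10, 1.0)  # every edge is masked
```

The returned indices can then be used to select the surviving source and destination nodes when building the augmented graph.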

CAST.models.model_GCNII module

class models.model_GCNII.Args(dataname: str, gpu: int = 0, epochs: int = 1000, lr1: float = 0.001, wd1: float = 0.0, lambd: float = 0.001, n_layers: int = 9, der: float = 0.2, dfr: float = 0.2, encoder_dim: int = 256, use_encoder: bool = False)

Bases: object

A class representing the arguments for the GCNII model.

dataname: str

The name of the dataset, used to save the log file

gpu: int = 0

The GPU ID, set to zero for single-GPU nodes

epochs: int = 1000

The number of epochs for training

lr1: float = 0.001

The learning rate

wd1: float = 0.0

The weight decay

lambd: float = 0.001

The lambda in the loss function (see the Online Methods of the paper)

n_layers: int = 9

The number of GCNII layers. More layers give a deeper model with a larger receptive field, at the cost of VRAM usage and computation time. By default, we chose the largest number of GCNII layers (n_layers = 9) recommended by the original GCNII paper. Our experimental results on the simulation dataset and real samples S1-S8 confirm that increasing the number of layers improves the accuracy of CAST alignment, presumably due to the increased contrast and spatial resolution of learned graph embeddings in layer-shaped anatomical regions. These results confirm that the performance gain from a deep GNN architecture is essential for high-resolution spatial alignment tasks.

der: float = 0.2

The edge dropout rate in CCA-SSG. This hyperparameter controls the extent of graph edge dropout for graph augmentation in the CCA-SSG self-supervised learning model. der = 1 means complete dropout, while der = 0 means no dropout. For CAST, we used der = 0.5, the same as the default in the CCA-SSG paper (the default in this class signature is 0.2). Our sensitivity experiments showed that alignment performance is optimal from 0.3 to 0.7. We recommend that users keep the default der value unless a change is necessary.

dfr: float = 0.2

The feature dropout rate in CCA-SSG. This hyperparameter controls the extent of feature dropout for graph augmentation in the CCA-SSG self-supervised learning model. dfr = 1 means complete dropout, while dfr = 0 means no dropout. For CAST, we used dfr = 0.3, following the CCA-SSG paper (the default in this class signature is 0.2). Our parameter sensitivity experiments showed that alignment performance is optimal from 0.1 to 0.4. We recommend that users keep the default dfr value unless a change is necessary.

device: str

Set to the GPU selected by gpu if a GPU is available and gpu is not -1; otherwise set to cpu.

encoder_dim: int = 256

The encoder dimension, ignored if use_encoder is set to False. The purpose of the MLP encoder is to reduce the time and space complexity of the model, especially for datasets with large gene panels. For our test set with a gene panel of 2,766 genes, encoder dimensions of 256 and 512 yielded comparable or even slightly better alignment performance than the configuration without the MLP encoder module. Therefore, we recommend setting encoder_dim to 256 or 512 for datasets with large gene panels (more than 1,000 genes) and using “No encoder” for datasets with limited gene panels (fewer than 1,000 genes).

use_encoder: bool = False

Whether or not to use an encoder
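The rule for the device attribute above can be sketched as a small helper. This is only an illustration of the stated rule; resolve_device is a hypothetical name, and the "cuda:<id>" string format is an assumption rather than something this reference specifies.

```python
import torch


def resolve_device(gpu: int) -> str:
    """Pick a device string from a GPU ID, falling back to the CPU."""
    # gpu == -1 is treated as an explicit request for the CPU.
    if gpu != -1 and torch.cuda.is_available():
        return f"cuda:{gpu}"  # assumed device string format
    return "cpu"


cpu_device = resolve_device(-1)     # always "cpu"
default_device = resolve_device(0)  # "cuda:0" only if CUDA is available
```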

models.model_GCNII.standardize(x, eps=1e-12)

Standardizes values in x (subtracts the mean and divides by standard deviation).

Parameters:
  • x (torch.Tensor) – The input features.

  • eps (float, optional (default: 1e-12)) – An epsilon value to prevent division by zero.

Returns:

The standardized features.

Return type:

torch.Tensor
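A minimal sketch of such a standardization follows. It assumes the mean and standard deviation are taken per feature column (along dimension 0) and that eps is applied as a lower bound on the standard deviation; both are assumptions of this example, and standardize_sketch is a hypothetical name.

```python
import torch


def standardize_sketch(x: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Subtract the column mean and divide by the column standard deviation."""
    # Clamping the standard deviation from below avoids division by zero
    # for constant columns.
    return (x - x.mean(0)) / x.std(0).clamp(min=eps)


torch.manual_seed(0)
x = torch.randn(100, 4) * 3.0 + 5.0
z = standardize_sketch(x)

# A constant column has zero standard deviation; the clamp keeps it finite.
z_const = standardize_sketch(torch.ones(10, 2))
```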

class models.model_GCNII.Encoder(in_dim: int, encoder_dim: int)

Bases: Module

A class representing an encoder model with a linear layer and ReLU activation function.

in_dim

The number of input features.

Type:

int

encoder_dim

The number of output features.

Type:

int

forward(x)

Performs a forward pass through the encoder.

Parameters:

x (torch.Tensor) – The input features.

Returns:

The output features after the forward pass.

Return type:

torch.Tensor
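An encoder of this kind, one linear layer followed by ReLU, might look like the sketch below. EncoderSketch and its attribute names are hypothetical; the dimensions are taken from the 2,766-gene panel and encoder_dim = 256 mentioned earlier purely as an illustration.

```python
import torch
from torch import nn


class EncoderSketch(nn.Module):
    """A linear projection from in_dim to encoder_dim followed by ReLU."""

    def __init__(self, in_dim: int, encoder_dim: int):
        super().__init__()
        self.layer = nn.Linear(in_dim, encoder_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.layer(x))


enc = EncoderSketch(in_dim=2766, encoder_dim=256)
out = enc(torch.randn(4, 2766))  # 4 cells, 2,766 genes -> 256 features
```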

class models.model_GCNII.GCNII(in_dim: int, encoder_dim: int, n_layers: int, alpha=None, lambda_=None, use_encoder=False)

Bases: Module

A class representing a GCNII model. The model consists of an optional encoder followed by GCN2Conv layers, each of which also receives the initial representation (the GCNII initial residual connection).

in_dim

The number of input features.

Type:

int

encoder_dim

The number of output features of the encoder (ignored if use_encoder is false).

Type:

int

n_layers

The number of GCN2Conv layers.

Type:

int

alpha

The alpha values for each layer.

Type:

List[float] (default: 0.1 for each layer)

lambda_

The lambda values for each layer.

Type:

List[float] (default: 1 for each layer)

use_encoder

Whether or not to use an encoder.

Type:

bool (default: False)

forward(graph, x)

Forward pass through the GCNII model.

Parameters:
  • graph (DGLGraph) – The input graph.

  • x (torch.Tensor) – The input features.

Returns:

The output of the forward pass.

Return type:

torch.Tensor
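To show how alpha and lambda_ enter each layer, here is a dense sketch of a single layer update as formulated in the GCNII paper. It is not the DGL GCN2Conv implementation that the class uses: gcnii_layer_sketch is a hypothetical name, the adjacency matrix is assumed to be already normalized and dense, and the ReLU activation follows the paper's formulation.

```python
import math

import torch


def gcnii_layer_sketch(adj_norm, h, h0, weight, alpha=0.1, lambda_=1.0, layer=1):
    """One GCNII update with initial residual and identity mapping."""
    # The identity-mapping strength decays with depth: beta = log(lambda / l + 1).
    beta = math.log(lambda_ / layer + 1.0)
    # Initial residual: mix the propagated features with the initial ones.
    support = (1.0 - alpha) * (adj_norm @ h) + alpha * h0
    # Identity mapping: blend the support with its linear transform.
    out = (1.0 - beta) * support + beta * (support @ weight)
    return torch.relu(out)


n, d = 4, 3
adj_norm = torch.eye(n)  # toy stand-in for a normalized adjacency matrix
h0 = torch.randn(n, d)
weight = torch.randn(d, d)

# With alpha = 1 and lambda_ = 0 the layer reduces to ReLU of the initial features.
out = gcnii_layer_sketch(adj_norm, h0, h0, weight, alpha=1.0, lambda_=0.0, layer=1)
```

With the documented defaults (alpha = 0.1, lambda_ = 1), each layer keeps 10% of the initial representation, and the weight of the learned transform shrinks as the layer index grows.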

class models.model_GCNII.GCN(in_dim: int, encoder_dim: int, n_layers: int, use_encoder=False)

Bases: Module

A class representing a GCN model. The model consists of an optional encoder followed by n_layers GraphConv layers.

in_dim

The number of input features.

Type:

int

encoder_dim

The number of output features of the encoder (ignored if use_encoder is false).

Type:

int

n_layers

The number of GraphConv layers.

Type:

int

use_encoder

Whether or not to use an encoder.

Type:

bool (default: False)

forward(graph, x)

Forward pass through the GCN model.

Parameters:
  • graph (DGLGraph) – The input graph.

  • x (torch.Tensor) – The input features.

Returns:

The output of the forward pass.

Return type:

torch.Tensor

class models.model_GCNII.CCA_SSG(in_dim, encoder_dim, n_layers, backbone='GCNII', alpha=None, lambda_=None, use_encoder=False)

Bases: Module

A class representing a CCA_SSG model - a model for self-supervised representation learning on graph data using GCNII or GCN as the backbone.

in_dim

The number of input features.

Type:

int

encoder_dim

The number of output features of the encoder (ignored if use_encoder is false).

Type:

int

n_layers

The number of layers in the model excluding the optional encoder.

Type:

int

backbone

The backbone of the model, either GCNII or GCN. At initialization, pass the string ‘GCNII’ or ‘GCN’.

Type:

GCNII | GCN

alpha

The alpha values for each layer of GCNII (ignored if backbone is GCN).

Type:

List[float] (default: 0.1 for each layer)

lambda_

The lambda values for each layer of GCNII (ignored if backbone is GCN).

Type:

List[float] (default: 1 for each layer)

use_encoder

Whether or not to use an encoder.

Type:

bool (default: False)

get_embedding(graph, feat)

Returns the result of a forward pass on feat.

Parameters:
  • graph (DGLGraph) – The input graph.

  • feat (torch.Tensor) – The input features.

Returns:

The result of the forward pass on the input features.

Return type:

torch.Tensor

forward(graph1, feat1, graph2, feat2)

Returns the standardized embeddings of the input features after a forward pass through the backbone model.

Parameters:
  • graph1 (DGLGraph) – The first input graph.

  • feat1 (torch.Tensor) – The input features for the first input.

  • graph2 (DGLGraph) – The second input graph.

  • feat2 (torch.Tensor) – The features for the second input.

Returns:

The standardized outputs from running each input through a forward pass of the model.

Return type:

Tuple[torch.Tensor, torch.Tensor]
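The two standardized outputs are typically consumed by the CCA-SSG objective, in which the lambd argument of Args weights the decorrelation term. The sketch below follows the objective as described in the CCA-SSG paper and is not necessarily the training code of this package; cca_ssg_loss_sketch and standardize_sketch are hypothetical names, and the per-column standardization is an assumption.

```python
import torch


def standardize_sketch(x, eps=1e-12):
    # Assumed per-column standardization, mirroring standardize() above.
    return (x - x.mean(0)) / x.std(0).clamp(min=eps)


def cca_ssg_loss_sketch(z1, z2, lambd):
    """Invariance term plus lambd times the decorrelation terms."""
    n, d = z1.shape
    c = z1.T @ z2 / n   # cross-correlation between the two views
    c1 = z1.T @ z1 / n  # feature correlation within view 1
    c2 = z2.T @ z2 / n  # feature correlation within view 2
    eye = torch.eye(d, device=z1.device)
    loss_inv = -torch.diagonal(c).sum()
    loss_dec = (eye - c1).pow(2).sum() + (eye - c2).pow(2).sum()
    return loss_inv + lambd * loss_dec


torch.manual_seed(0)
z = standardize_sketch(torch.randn(100, 4))
# Identical views with lambd = 0 leave only the invariance term.
loss = cca_ssg_loss_sketch(z, z, lambd=0.0)
```

In training, z1 and z2 would be the embeddings of two randomly augmented views of the same graph, so minimizing the invariance term pulls the two views together while the decorrelation term discourages redundant embedding dimensions.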