CAST.models package
CAST.models.aug module
- models.aug.random_aug(graph, x, feat_drop_rate, edge_mask_rate)
Given a graph, randomly drops features and masks edges.
- Parameters:
graph (DGLGraph) – The input graph.
x (torch.Tensor) – The input features.
feat_drop_rate (float) – The probability of dropping a feature.
edge_mask_rate (float) – The probability of masking an edge.
- Returns:
DGLGraph – The graph after randomly masking edges.
torch.Tensor – The features after randomly dropping features.
- models.aug.drop_feature(x, drop_prob)
Randomly drop features with probability drop_prob.
- Parameters:
x (torch.Tensor) – The input features.
drop_prob (float) – The probability of dropping a feature.
- Returns:
The remaining features after random dropping.
- Return type:
torch.Tensor
- models.aug.mask_edge(graph, mask_prob)
Randomly mask edges with probability mask_prob.
- Parameters:
graph (DGLGraph) – The input graph (only used to take the number of edges).
mask_prob (float) – The probability of masking an edge.
- Returns:
A 1D tensor of indices of the remaining edges after random masking.
- Return type:
torch.Tensor
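The three augmentation helpers above can be sketched roughly as follows. This is a minimal illustration, not the CAST implementation: it works on plain edge-endpoint tensors instead of a DGLGraph so that it needs only PyTorch, the `*_sketch` names are hypothetical, and treating a dropped feature as a whole column zeroed for every node is an assumption.

```python
import torch

def drop_feature_sketch(x, drop_prob):
    # Assumption: a dropped feature is a whole column, zeroed for every node.
    drop_mask = torch.rand(x.size(1)) < drop_prob
    x = x.clone()  # leave the caller's tensor untouched
    x[:, drop_mask] = 0
    return x

def mask_edge_sketch(num_edges, mask_prob):
    # Keep each edge independently with probability 1 - mask_prob and
    # return the indices of the kept edges as a 1D tensor.
    keep = torch.rand(num_edges) >= mask_prob
    return keep.nonzero(as_tuple=False).squeeze(1)

def random_aug_sketch(src, dst, x, feat_drop_rate, edge_mask_rate):
    # src/dst are 1D tensors of edge endpoints standing in for a DGLGraph.
    kept = mask_edge_sketch(src.numel(), edge_mask_rate)
    return (src[kept], dst[kept]), drop_feature_sketch(x, feat_drop_rate)

torch.manual_seed(0)
src = torch.tensor([0, 1, 2, 3])
dst = torch.tensor([1, 2, 3, 0])
x = torch.ones(4, 6)
(aug_src, aug_dst), aug_x = random_aug_sketch(src, dst, x, 0.2, 0.2)
print(aug_src.numel(), "edges kept;",
      int((aug_x.sum(0) == 0).sum()), "feature columns dropped")
```

Calling the sketch twice on the same input gives two different random views, which is how such augmentations are typically used in contrastive or CCA-style self-supervised training.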
CAST.models.model_GCNII module
- class models.model_GCNII.Args(dataname: str, gpu: int = 0, epochs: int = 1000, lr1: float = 0.001, wd1: float = 0.0, lambd: float = 0.001, n_layers: int = 9, der: float = 0.2, dfr: float = 0.2, encoder_dim: int = 256, use_encoder: bool = False)
Bases:
object
A class representing the arguments for the GCNII model.
- dataname: str
The name of the dataset, used to save the log file
- gpu: int = 0
The GPU ID, set to zero for single-GPU nodes
- epochs: int = 1000
The number of epochs for training
- lr1: float = 0.001
The learning rate
- wd1: float = 0.0
The weight decay
- lambd: float = 0.001
The lambda in the loss function; refer to the Online Methods.
- n_layers: int = 9
The number of GCNII layers. More layers mean a deeper model and a larger receptive field, at the cost of VRAM usage and computation time. By default, we chose the largest number of GCNII layers (n_layers = 9) as recommended by the original GCNII paper. Our experimental results on the simulation dataset and real samples S1-S8 confirm that increasing the number of layers improves the accuracy of CAST alignment, presumably due to the increased contrast and spatial resolution of learned graph embeddings in layer-shaped anatomical regions. These results confirm that the performance gain from a deep GNN architecture is essential for high-resolution spatial alignment tasks.
- der: float = 0.2
The edge dropout rate in CCA-SSG. This hyperparameter controls the extent of graph edge dropout for graph augmentation in the CCA-SSG self-supervised learning model. der = 1 means complete dropout, while der = 0 means no dropout. For CAST, we used der = 0.5, the same as the default in the CCA-SSG paper (note that the default in the class signature above is 0.2). Our sensitivity experiments showed that alignment performance is optimal for der between 0.3 and 0.7. We recommend that users keep the default der value unless a change is necessary.
- dfr: float = 0.2
The feature dropout rate in CCA-SSG. This hyperparameter controls the extent of feature dropout for graph augmentation in the CCA-SSG self-supervised learning model. dfr = 1 means complete dropout, while dfr = 0 means no dropout. For CAST, we used dfr = 0.3, following the CCA-SSG paper (note that the default in the class signature above is 0.2). Our parameter sensitivity experiments showed that alignment performance is optimal for dfr between 0.1 and 0.4. We recommend that users keep the default dfr value unless a change is necessary.
- device: str
Set to the GPU_ID if GPU is available and gpu is not -1, otherwise set to cpu.
- encoder_dim: int = 256
The encoder dimension, ignored if use_encoder is set to False. The purpose of the MLP encoder is to reduce the time and space complexity of the model, especially for datasets with large gene panels. For our test set with a gene panel of 2,766 genes, encoder dimensions of 256 and 512 yielded alignment performance comparable to, and even slightly better than, the group without the MLP encoder module. Therefore, we recommend setting encoder_dim to 256 or 512 for datasets with large gene panels (larger than 1,000 genes). We recommend using “No encoder” for datasets with limited gene panels (smaller than 1,000 genes).
- use_encoder: bool = False
Whether or not to use an encoder
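To make the relationship between the constructor arguments and the derived device attribute concrete, here is a minimal stand-in written as a dataclass. It only mirrors the fields and defaults documented above; `ArgsSketch` and its `cuda_available` flag are hypothetical (the flag replaces a runtime check such as `torch.cuda.is_available()`), and the `"cuda:<id>"` device string format is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ArgsSketch:
    dataname: str                 # dataset name, used for the log file
    gpu: int = 0                  # GPU ID; -1 requests the CPU
    epochs: int = 1000
    lr1: float = 1e-3             # learning rate
    wd1: float = 0.0              # weight decay
    lambd: float = 1e-3           # lambda in the loss function
    n_layers: int = 9             # number of GCNII layers
    der: float = 0.2              # edge dropout rate
    dfr: float = 0.2              # feature dropout rate
    encoder_dim: int = 256        # ignored unless use_encoder is True
    use_encoder: bool = False
    # Hypothetical stand-in for a runtime CUDA availability check.
    cuda_available: bool = False
    device: str = field(init=False)

    def __post_init__(self):
        # Use the GPU only if one is available and gpu is not -1.
        use_gpu = self.cuda_available and self.gpu != -1
        self.device = f"cuda:{self.gpu}" if use_gpu else "cpu"

# Override the dropout rates explicitly instead of relying on the defaults.
args = ArgsSketch(dataname="demo", cuda_available=True, gpu=0, der=0.5, dfr=0.3)
print(args.device, args.n_layers, args.der, args.dfr)
```

With the real class, the equivalent call would pass only the documented arguments, for example dataname, gpu, der and dfr.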
- models.model_GCNII.standardize(x, eps=1e-12)
Standardizes values in x (subtracts the mean and divides by standard deviation).
- Parameters:
x (torch.Tensor) – The input features.
eps (float, optional (default: 1e-12)) – An epsilon value to prevent division by zero.
- Returns:
The standardized features.
- Return type:
torch.Tensor
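A minimal sketch of such a standardization is shown below. The choice of axis is an assumption: here the mean and standard deviation are computed per feature column across nodes (dim 0), and `standardize_sketch` is a hypothetical name rather than the CAST function itself.

```python
import torch

def standardize_sketch(x, eps=1e-12):
    # Assumption: statistics are computed per feature (column), across rows.
    mean = x.mean(dim=0)
    # Clamping the standard deviation avoids division by zero for constant columns.
    std = x.std(dim=0).clamp(min=eps)
    return (x - mean) / std

x = torch.tensor([[1.0, 5.0],
                  [2.0, 5.0],
                  [3.0, 5.0]])
z = standardize_sketch(x)
# The first column becomes [-1, 0, 1]; the constant second column becomes zeros.
print(z)
```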
- class models.model_GCNII.Encoder(in_dim: int, encoder_dim: int)
Bases:
Module
A class representing an encoder model with a linear layer and ReLU activation function.
- in_dim
The number of input features.
- Type:
int
- encoder_dim
The number of output features.
- Type:
int
- forward(x)
Performs a forward pass through the encoder.
- Parameters:
x (torch.Tensor) – The input features.
- Returns:
The output features after the forward pass.
- Return type:
torch.Tensor
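An encoder of this shape can be sketched in a few lines of PyTorch. `EncoderSketch` and its internal attribute names are hypothetical; only the linear layer followed by ReLU and the two constructor arguments come from the description above.

```python
import torch
from torch import nn

class EncoderSketch(nn.Module):
    def __init__(self, in_dim, encoder_dim):
        super().__init__()
        self.in_dim = in_dim
        self.encoder_dim = encoder_dim
        self.layer = nn.Linear(in_dim, encoder_dim)  # reduces the feature dimension
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.layer(x))

# Example: compress a 2,766-gene panel to 256 features for 10 cells.
enc = EncoderSketch(in_dim=2766, encoder_dim=256)
out = enc(torch.randn(10, 2766))
print(out.shape)  # torch.Size([10, 256])
```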
- class models.model_GCNII.GCNII(in_dim: int, encoder_dim: int, n_layers: int, alpha=None, lambda_=None, use_encoder=False)
Bases:
Module
A class representing a GCNII model. The model consists of an optional encoder, followed by GCN2Conv layers, where the first-layer representation is passed to every layer (the initial residual connection of GCNII).
- in_dim
The number of input features.
- Type:
int
- encoder_dim
The number of output features of the encoder (ignored if use_encoder is false).
- Type:
int
- n_layers
The number of GCN2Conv layers.
- Type:
int
- alpha
The alpha values for each layer.
- Type:
List[float] (default: 0.1 for each layer)
- lambda_
The lambda values for each layer.
- Type:
List[float] (default: 1 for each layer)
- use_encoder
Whether or not to use an encoder.
- Type:
bool (default: False)
- forward(graph, x)
Forward pass through the GCNII model.
- Parameters:
graph (DGLGraph) – The input graph.
x (torch.Tensor) – The input.
- Returns:
The output of the forward pass.
- Return type:
torch.Tensor
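To show what alpha and lambda_ control, the sketch below implements the layer update from the GCNII paper with dense tensors: each layer mixes the propagated features with the first-layer representation h0 (weight alpha) and blends the result with its linear transform (weight beta = log(lambda_ / layer + 1)). It is an illustration under assumptions, not the class above: the function names are hypothetical, a dense normalized adjacency replaces the DGLGraph, and a single scalar alpha and lambda_ are shared by all layers rather than given per layer.

```python
import math
import torch

def gcnii_layer_sketch(p, h, h0, weight, alpha, lambda_, layer):
    # beta shrinks with depth, so deeper layers stay closer to an identity mapping.
    beta = math.log(lambda_ / layer + 1)
    # Initial residual: mix propagated features with the first-layer representation.
    support = (1 - alpha) * (p @ h) + alpha * h0
    # Identity mapping: blend the support with its linear transform.
    return torch.relu((1 - beta) * support + beta * (support @ weight))

def gcnii_stack_sketch(p, h0, weights, alpha=0.1, lambda_=1.0):
    h = h0
    for layer, weight in enumerate(weights, start=1):
        h = gcnii_layer_sketch(p, h, h0, weight, alpha, lambda_, layer)
    return h

torch.manual_seed(0)
n_nodes, dim, n_layers = 5, 4, 9
# Path graph with self-loops, symmetrically normalized: D^-1/2 (A + I) D^-1/2.
ones = torch.ones(n_nodes - 1)
adj = torch.eye(n_nodes) + torch.diag(ones, 1) + torch.diag(ones, -1)
deg_inv_sqrt = adj.sum(1).pow(-0.5)
p = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

h0 = torch.rand(n_nodes, dim)
weights = [torch.randn(dim, dim) * 0.1 for _ in range(n_layers)]
out = gcnii_stack_sketch(p, h0, weights)
print(out.shape)  # torch.Size([5, 4])
```

Because h0 is re-injected at every layer, stacking nine layers does not wash out the input signal the way a plain deep GCN tends to.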
- class models.model_GCNII.GCN(in_dim: int, encoder_dim: int, n_layers: int, use_encoder=False)
Bases:
Module
A class representing a GCN model. The model consists of an optional encoder, followed by n_layers GraphConv layers.
- in_dim
The number of input features.
- Type:
int
- encoder_dim
The number of output features of the encoder (ignored if use_encoder is false).
- Type:
int
- n_layers
The number of GraphConv layers.
- Type:
int
- use_encoder
Whether or not to use an encoder.
- Type:
bool (default: False)
- forward(graph, x)
Forward pass through the GCN model.
- Parameters:
graph (DGLGraph) – The input graph.
x (torch.Tensor) – The input.
- Returns:
The output of the forward pass.
- Return type:
torch.Tensor
- class models.model_GCNII.CCA_SSG(in_dim, encoder_dim, n_layers, backbone='GCNII', alpha=None, lambda_=None, use_encoder=False)
Bases:
Module
A class representing a CCA_SSG model: a model for self-supervised representation learning with graph data using GCNII or GCN as the backbone.
- in_dim
The number of input features.
- Type:
int
- encoder_dim
The number of output features of the encoder (ignored if use_encoder is false).
- Type:
int
- n_layers
The number of layers in the model excluding the optional encoder.
- Type:
int
- backbone
The backbone of the model, either GCNII or GCN; at initialization, pass 'GCNII' or 'GCN' as a string.
- Type:
str (default: 'GCNII')
- alpha
The alpha values for each layer of GCNII (ignored if backbone is GCN).
- Type:
List[float] (default: 0.1 for each layer)
- lambda_
The lambda values for each layer of GCNII (ignored if backbone is GCN).
- Type:
List[float] (default: 1 for each layer)
- use_encoder
Whether or not to use an encoder.
- Type:
bool (default: False)
- get_embedding(graph, feat)
Returns the result of a forward pass on feat.
- Parameters:
graph (DGLGraph) – The input graph.
feat (torch.Tensor) – The input features.
- Returns:
The result of the forward pass on the input features.
- Return type:
torch.Tensor
- forward(graph1, feat1, graph2, feat2)
Returns the standardized embeddings of the input features after a forward pass through the backbone model.
- Parameters:
graph1 (DGLGraph) – The first input graph.
feat1 (torch.Tensor) – The input features for the first input.
graph2 (DGLGraph) – The second input graph.
feat2 (torch.Tensor) – The features for the second input.
- Returns:
The standardized outputs from running each input through a forward pass of the model.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
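The two standardized embeddings returned by forward are the inputs to a CCA-style objective. The sketch below follows the loss described in the CCA-SSG paper (an invariance term plus decorrelation terms weighted by lambd); whether CAST computes its training loss in exactly this way is an assumption, and `cca_ssg_loss_sketch` is a hypothetical name.

```python
import torch

def cca_ssg_loss_sketch(z1, z2, lambd=1e-3):
    n = z1.shape[0]
    c = z1.T @ z2 / n    # cross-correlation between the two views
    c1 = z1.T @ z1 / n   # feature correlation within view 1
    c2 = z2.T @ z2 / n   # feature correlation within view 2
    # Invariance: matching features of the two views should be correlated.
    loss_inv = -torch.diagonal(c).sum()
    # Decorrelation: different features within one view should be uncorrelated.
    iden = torch.eye(c.shape[0])
    loss_dec = (iden - c1).pow(2).sum() + (iden - c2).pow(2).sum()
    return loss_inv + lambd * loss_dec

def _standardize(z):
    # Column-wise standardization of the demo embeddings.
    return (z - z.mean(0)) / z.std(0)

torch.manual_seed(0)
z_a = _standardize(torch.randn(200, 8))
z_b = _standardize(torch.randn(200, 8))

loss_same = cca_ssg_loss_sketch(z_a, z_a)  # two identical views
loss_diff = cca_ssg_loss_sketch(z_a, z_b)  # two unrelated views
print(float(loss_same), float(loss_diff))
```

Identical views give a lower loss than unrelated ones, which is the behavior the invariance term is meant to reward.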