| Title: | Interpretation of Heterogeneous Single-Cell Gene Expression Data |
|---|---|
| Description: | We develop a novel matrix factorization tool named 'scINSIGHT' to jointly analyze multiple single-cell gene expression samples from biologically heterogeneous sources, such as different disease phases, treatment groups, or developmental stages. Given multiple gene expression samples from different biological conditions, 'scINSIGHT' simultaneously identifies common and condition-specific gene modules and quantifies their expression levels in each sample in a lower-dimensional space. With the factorized results, the inferred expression levels and memberships of common gene modules can be used to cluster cells and detect cell identities, and the condition-specific gene modules can help compare functional differences in transcriptomes from distinct conditions. Please also see Qian K, Fu SW, Li HW, Li WV (2022) <doi:10.1186/s13059-022-02649-3>. |
| Authors: | Kun Qian [aut, ctb, cre] |
| Maintainer: | Kun Qian <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.4 |
| Built: | 2026-05-26 05:50:23 UTC |
| Source: | https://github.com/vivianstats/scinsight |
This function initializes an scINSIGHT object with normalized data passed in.

    create_scINSIGHT(norm.data, condition)

| Argument | Description |
|---|---|
| norm.data | List of normalized expression matrices (genes by cells). Gene names should be the same in all matrices. |
| condition | Vector specifying sample conditions. |
scINSIGHT object with norm.data slot set.

    # Demonstration using matrices with randomly generated numbers
    S1 <- matrix(runif(50000,0,2), 500,100)
    S2 <- matrix(runif(60000,0,2), 500,120)
    S3 <- matrix(runif(80000,0,2), 500,160)
    S4 <- matrix(runif(75000,0,2), 500,150)
    data = list(S1, S2, S3, S4)
    sample = c("sample1", "sample2", "sample3", "sample4")
    condition = c("control", "activation", "control", "activation")
    names(data) = sample
    names(condition) = sample
    scINSIGHTx <- create_scINSIGHT(data, condition)

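The requirement that gene names match across matrices can be checked before calling `create_scINSIGHT`. The helper below is a minimal sketch in base R: `check_inputs` is a hypothetical name and not part of the package, and it assumes gene names are stored as row names (the gene names and matrices here are made up for illustration).

```r
# Hypothetical pre-flight check (not part of scINSIGHT): verifies that every
# matrix carries the same gene names in the same order and that `condition`
# has one entry per sample.
check_inputs <- function(norm.data, condition) {
  stopifnot(is.list(norm.data), length(norm.data) >= 2)
  genes <- rownames(norm.data[[1]])
  if (is.null(genes)) stop("matrices need gene names as row names")
  same_genes <- vapply(norm.data,
                       function(m) identical(rownames(m), genes),
                       logical(1))
  if (!all(same_genes)) stop("gene names differ between matrices")
  if (length(condition) != length(norm.data)) {
    stop("condition must have one entry per sample")
  }
  invisible(TRUE)
}

# Illustration with small random matrices (gene names are invented).
genes <- paste0("gene", 1:50)
A <- matrix(runif(50 * 10), 50, 10, dimnames = list(genes, NULL))
B <- matrix(runif(50 * 12), 50, 12, dimnames = list(genes, NULL))
ok <- check_inputs(list(s1 = A, s2 = B),
                   c(s1 = "control", s2 = "activation"))
```

A check like this fails early with a readable message instead of surfacing a dimension mismatch later during factorization.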
Perform INterpreting single cell gene expresSIon bioloGically Heterogeneous daTa (scINSIGHT) to return factorized W_1, W_2, H, and V matrices.
This factorization produces a W_1 matrix (cells by K_j) and a W_2 matrix (cells by K) for each sample, an H matrix (K_j by genes) for each condition, and a V matrix (K by genes) shared by all samples.
The W_2 matrices are the expression matrices of common gene modules for all samples,
and V is the membership matrix of common gene modules, shared by all samples.
The W_1 matrices are the expression matrices of condition-specific gene modules for all samples,
and the H matrices are the membership matrices of condition-specific gene modules for all conditions.
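The shapes of the factor matrices can be made concrete with a toy example. The sketch below uses random numbers only, with no fitting. It assumes that each sample's cells-by-genes matrix is approximated by the sum of a condition-specific part and a common part, which is the natural reading of the dimensions above but is not stated in this section. All sizes and names are made up.

```r
# Toy dimensions (all values are invented for illustration).
n_cells <- c(sample1 = 30, sample2 = 40)   # cells per sample
n_genes <- 100
K   <- 5   # common gene modules
K_j <- 2   # condition-specific gene modules
condition <- c(sample1 = "control", sample2 = "activation")

set.seed(1)
# Per-sample expression matrices of the gene modules.
W_1 <- lapply(n_cells, function(n) matrix(runif(n * K_j), n, K_j))  # cells by K_j
W_2 <- lapply(n_cells, function(n) matrix(runif(n * K), n, K))      # cells by K
# Membership matrices: V is shared, H has one matrix per condition.
V <- matrix(runif(K * n_genes), K, n_genes)                         # K by genes
H <- lapply(unique(condition),
            function(cond) matrix(runif(K_j * n_genes), K_j, n_genes))
names(H) <- unique(condition)

# Assumed additive reconstruction of each sample (cells by genes).
X_hat <- lapply(names(n_cells), function(s) {
  W_1[[s]] %*% H[[condition[[s]]]] + W_2[[s]] %*% V
})
names(X_hat) <- names(n_cells)
sapply(X_hat, dim)
```

Each reconstructed matrix has one row per cell and one column per gene, so the module dimensions K and K_j only appear inside the products.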

    run_scINSIGHT(
      object,
      K = seq(5, 15, 2),
      K_j = 2,
      LDA = c(0.001, 0.01, 0.1, 1, 10),
      thre.niter = 500,
      thre.delta = 0.01,
      num.cores = 1,
      B = 5,
      out.dir = NULL,
      method = "increase"
    )

| Argument | Description |
|---|---|
| object | An scINSIGHT object. |
| K | Number of common gene modules. (default `seq(5, 15, 2)`) |
| K_j | Number of condition-specific gene modules. (default 2) |
| LDA | Regularization parameters. (default `c(0.001, 0.01, 0.1, 1, 10)`) |
| thre.niter | Maximum number of block coordinate descent iterations to perform. (default 500) |
| thre.delta | Stop iterating when the reduction of the objective function is less than this threshold. (default 0.01) |
| num.cores | Number of cores used for optimizing factorizations in parallel. (default 1) |
| B | Number of repeats, with random seeds from 1 to B. (default 5) |
| out.dir | Output directory of scINSIGHT results. (default NULL) |
| method | Method of updating the factorization (default "increase"). If multiple values of K are provided: for "increase", the algorithm first performs factorization with the smallest K; for "decrease", it first performs factorization with the largest K. |
scINSIGHT object with W_1, W_2, H, V, and parameters slots set.
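A call with a reduced search grid might look like the sketch below. It uses only the arguments listed in the usage above; the specific values (two candidates for `K`, two for `LDA`, `B = 2`, `thre.niter = 100`) are arbitrary choices to keep the run short and are not recommendations. Accessing results with `@` assumes the returned object is an S4 object, which this page suggests by describing its contents as slots.

```r
# Sketch only: runs if the scINSIGHT package is installed.
if (requireNamespace("scINSIGHT", quietly = TRUE)) {
  library(scINSIGHT)
  set.seed(1)
  # Random matrices as in the create_scINSIGHT demonstration.
  S1 <- matrix(runif(50000, 0, 2), 500, 100)
  S2 <- matrix(runif(60000, 0, 2), 500, 120)
  S3 <- matrix(runif(80000, 0, 2), 500, 160)
  S4 <- matrix(runif(75000, 0, 2), 500, 150)
  data <- list(S1, S2, S3, S4)
  sample <- c("sample1", "sample2", "sample3", "sample4")
  condition <- c("control", "activation", "control", "activation")
  names(data) <- sample
  names(condition) <- sample

  obj <- create_scINSIGHT(data, condition)
  # Small grid and few repeats (arbitrary values chosen for speed).
  obj <- run_scINSIGHT(obj,
                       K = c(5, 7), K_j = 2, LDA = c(0.01, 0.1),
                       thre.niter = 100, thre.delta = 0.01,
                       num.cores = 1, B = 2,
                       out.dir = NULL, method = "increase")
  # Inspect the selected parameters (assumes S4 slot access).
  str(obj@parameters)
}
```

On real data the default grid is probably a more sensible starting point, with `num.cores` raised if several cores are available.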
The scINSIGHT object is created from two or more single-cell datasets. To construct an scINSIGHT object, the user needs to provide at least two normalized expression (or another single-cell modality) matrices and the condition vector.
The key slots used in the scINSIGHT object are described below.
| Slot | Description |
|---|---|
| norm.data | List of normalized expression matrices (genes by cells). Each matrix should have the same number and names of genes. |
| condition | Vector specifying each sample's condition name. |
| W_1 | List of W_1 matrices estimated by scINSIGHT; names correspond to sample names. |
| W_2 | List of W_2 matrices estimated by scINSIGHT; names correspond to sample names. |
| H | List of H matrices estimated by scINSIGHT; names correspond to condition names. |
| V | V matrix estimated by scINSIGHT. |
| norm.W_2 | List of W_2 matrices after normalization. Recommended for downstream analysis. |
| clusters | List of cluster results. |
| parameters | List of selected parameters, including K and the regularization parameter (LDA). |
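Because `norm.W_2` holds one matrix per sample, downstream analysis usually starts by stacking these matrices into a single embedding. The sketch below shows one way to do that in base R. The list `norm_W2` is a mock stand-in for the `norm.W_2` slot, it assumes cells are in rows (cells by K), and k-means is used purely for illustration rather than as the package's own clustering method.

```r
set.seed(1)
K <- 5
# Mock stand-in for the norm.W_2 slot: one cells-by-K matrix per sample.
norm_W2 <- list(
  sample1 = matrix(runif(30 * K), 30, K),
  sample2 = matrix(runif(40 * K), 40, K)
)

# Stack all samples into one cells-by-K matrix and record each cell's sample.
embedding <- do.call(rbind, norm_W2)
sample_of_cell <- rep(names(norm_W2),
                      times = vapply(norm_W2, nrow, integer(1)))

# Illustrative clustering with k-means (a swapped-in method, not scINSIGHT's).
km <- kmeans(embedding, centers = 3, nstart = 5)
table(sample_of_cell, cluster = km$cluster)
```

The cross-tabulation shows how cells from each sample are distributed over clusters, which is a quick way to see whether clusters are shared across samples.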