Dirichlet Multinomial Mixtures (DMM)的R实现
Dirichlet Multinomial MixturesCommunity typing with Dirichlet Multinomial MixturesDirichlet Multinomial Mixtures (DMM)是一种用于对微生物群落分析数据进行群落分型(或聚类)的概率方法。 这是一个无限的混合模型,这意味着该方法可以推断出最佳数量的群落类型。 请注意,群落类型的数量可能会
·
Dirichlet Multinomial Mixtures
Community typing with Dirichlet Multinomial Mixtures
Dirichlet Multinomial Mixtures (DMM) 是一种用于对微生物群落分析数据进行群落分型(或聚类)的概率方法。 这是一个无限的混合模型,这意味着该方法可以推断出最佳数量的群落类型。 请注意,群落类型的数量可能会随数据大小而增长。
library(microbiome)
library(DirichletMultinomial)
library(reshape2)
library(magrittr)
library(dplyr)
# Load example data
data(dietswap)
pseq <- dietswap
# To speed up, only consider the core taxa
# that are prevalent at 0.1% relative abundance in 50% of the samples
# (note that this is not strictly correct as information is
# being discarded; one alternative would be to aggregate rare taxa)
pseq.comp <- microbiome::transform(pseq, "compositional")
taxa <- core_members(pseq.comp, detection = 0.1/100, prevalence = 50/100)
pseq <- prune_taxa(taxa, pseq)
# Pick the OTU count matrix
# and convert it into samples x taxa format
dat <- abundances(pseq)
count <- as.matrix(t(dat))
拟合 DMM 模型.,让我们将群落类型的最大允许数量设置为3,以加速示例。
fit <- lapply(1:3, dmn, count = count, verbose=TRUE)
## Soft kmeans
## Expectation Maximization setup
## Expectation Maximization
## Hessian
## Soft kmeans
## iteration 10 change 0.000029
## Expectation Maximization setup
## Expectation Maximization
## iteration 10 change 0.000000
## Hessian
## Soft kmeans
## iteration 10 change 0.030731
## iteration 20 change 0.000110
## Expectation Maximization setup
## Expectation Maximization
## iteration 10 change 0.000063
## Hessian
判断拟合效果
lplc <- sapply(fit, laplace) # AIC / BIC / Laplace
aic <- sapply(fit, AIC) # AIC / BIC / Laplace
bic <- sapply(fit, BIC) # AIC / BIC / Laplace
#plot(lplc, type="b", xlab="Number of Dirichlet Components", ylab="Model Fit")
#lines(aic, type="b", lty = 2)
#lines(bic, type="b", lty = 3)
选择最佳模型
best <- fit[[which.min(unlist(lplc))]]
参数pi及theta
mixturewt(best)
## pi theta
## 1 0.3738027 159.10473
## 2 0.3188891 81.91265
## 3 0.3073082 64.24696
元素(otu)分配给不同cluster
ass <- apply(mixture(best), 1, which.max)
每个otu对每个组成群落的贡献
for (k in seq(ncol(fitted(best)))) {
d <- melt(fitted(best))
colnames(d) <- c("OTU", "cluster", "value")
d <- subset(d, cluster == k) %>%
# Arrange OTUs by assignment strength
arrange(value) %>%
mutate(OTU = factor(OTU, levels = unique(OTU))) %>%
# Only show the most important drivers
filter(abs(value) > quantile(abs(value), 0.8))
p <- ggplot(d, aes(x = OTU, y = value)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Top drivers: community type", k))
print(p)
}
更多推荐
已为社区贡献1条内容
所有评论(0)