MATLAB Implementation of a Gaussian Mixture Model (reposted)

The GMM implementation below is essentially reposted from pluskid's article, with a few comments added. It also applies the second of his suggested fixes for the singular covariance matrix problem: adding a small regularizer to each covariance matrix.

function varargout = gmm(X, K_or_centroids)
% ============================================================
% Reposted from http://blog.pluskid.org/?p=39
% Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X, K_OR_CENTROIDS)
% [PX MODEL] = GMM(X, K_OR_CENTROIDS)
%
%  - X: N-by-D data matrix. Note that X holds the entire data set.
%  - K_OR_CENTROIDS: either K indicating the number of
%       components or a K-by-D matrix indicating the
%       choosing of the initial K centroids.
%
%  - PX: N-by-K matrix indicating the probability of each
%       component generating each point.
%  - MODEL: a structure containing the parameters for a GMM:
%       MODEL.Miu: a K-by-D matrix.
%       MODEL.Sigma: a D-by-D-by-K matrix.
%       MODEL.Pi: a 1-by-K vector.
% ============================================================
threshold = 1e-15;
[N, D] = size(X);
if isscalar(K_or_centroids)
    K = K_or_centroids;
    % randomly pick centroids
    rndp = randperm(N);
    centroids = X(rndp(1:K), :);
else
    K = size(K_or_centroids, 1);
    centroids = K_or_centroids;
end
% initial values
[pMiu, pPi, pSigma] = init_params();
Lprev = -inf;
while true
    Px = calc_prob(); % evaluate N(x|mu,Sigma) for every point and component
    % new value for pGamma (the responsibilities), an N-by-K matrix
    pGamma = Px .* repmat(pPi, N, 1);
    pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K); % normalize each row to sum to 1 (a deft bit of matrix manipulation)
    % new value for parameters of each Component
    Nk = sum(pGamma, 1); % N_k: effective number of points assigned to each component
    pMiu = diag(1./Nk) * (pGamma' * X); % diag scaling * (K-by-N * N-by-D); the parentheses help to see the shapes
    pPi = Nk/N;
    for kk = 1:K
        Xshift = X - repmat(pMiu(kk, :), N, 1); % x - mu_k
        pSigma(:, :, kk) = (Xshift' * ...
            (diag(pGamma(:, kk)) * Xshift)) / Nk(kk); % update Sigma_k
    end
    % check for convergence of the log-likelihood
    L = sum(log(Px*pPi'));
    if L - Lprev < threshold
        break;
    end
    Lprev = L;
end
if nargout == 1
    varargout = {Px};
else
    model = [];
    model.Miu = pMiu;
    model.Sigma = pSigma;
    model.Pi = pPi;
    varargout = {pGamma, model}; % NOTE: this differs from pluskid's original code, which returns Px here; this version returns pGamma
end
    function [pMiu, pPi, pSigma] = init_params() % initialize the parameters
        pMiu = centroids; % K-by-D matrix
        pPi = zeros(1, K); % 1-by-K vector
        pSigma = zeros(D, D, K);
        % hard assign each x to its nearest centroid via squared Euclidean
        % distance: ||x||^2 tiled across K columns, ||mu||^2 tiled across
        % N rows, minus 2*X*pMiu' (N-by-K)
        distmat = repmat(sum(X.*X, 2), 1, K) + ...
            repmat(sum(pMiu.*pMiu, 2)', N, 1) - ...
            2*X*pMiu';
        [~, labels] = min(distmat, [], 2); % labels holds the column (cluster) index of each row's minimum
        for k = 1:K
            Xk = X(labels == k, :); % Xk: the rows of X assigned to cluster k
            pPi(k) = size(Xk, 1)/N; % fraction of points assigned to cluster k
            pSigma(:, :, k) = cov(Xk); % sample covariance, a D-by-D matrix (unbiased estimate)
        end
    end
    function Px = calc_prob()
        Px = zeros(N, K);
        for k = 1:K
            Xshift = X - repmat(pMiu(k, :), N, 1); % x - mu_k
            lambda = 1e-5;
            % handle the singular covariance problem by adding a small lambda*I
            Sigma_k = pSigma(:, :, k) + lambda*eye(D);
            inv_pSigma = inv(Sigma_k); % inverse of the covariance
            tmp = sum((Xshift*inv_pSigma) .* Xshift, 2); % quadratic form (x-mu_k)' inv(Sigma) (x-mu_k), an N-by-1 vector
            coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma)); % normalization constant of the Gaussian
            Px(:, k) = coef * exp(-0.5*tmp); % Gaussian density evaluated at each data point
        end
    end
end
% repmat: tile a vector into a matrix
% inv: matrix inverse
% min: minimum over a matrix dimension; can also return the index
% X(labels == k, :): logical filtering over rows
% size(Xk, 1): number of rows (or columns) of a matrix
% scatter: plot 2-D points
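
As a quick smoke test, here is a minimal usage sketch on synthetic data; the blob locations and the component count are made up for illustration, and it assumes the gmm function above is on the MATLAB path:

% minimal usage sketch: two synthetic 2-D Gaussian blobs
X = [randn(200, 2); randn(200, 2) + 5];   % 400 points, D = 2
[pGamma, model] = gmm(X, 2);              % fit a 2-component GMM
[~, labels] = max(pGamma, [], 2);         % hard-assign each point by max responsibility
scatter(X(:, 1), X(:, 2), 10, labels);    % color points by assigned component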

Note:

pluskid's original code returns Px at the end, which struck me as very odd, since in PRML hard assignment of a point is decided by the posterior probability. So I asked about it on his blog, and his explanation was that it comes down to the difference between maximum likelihood and maximum a posteriori: the former picks the component under which x has the highest probability of being generated, while the latter also folds in the prior; each has its own rationale. One sentence and everything clicked. A true master indeed!
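
To make the distinction concrete, here is a small sketch built on the gmm output from the usage example above. The division by model.Pi is my own illustration: since \(\gamma_{ik}\propto\pi_k N(x_i|\mu_k,\Sigma_k)\), dividing the responsibilities by the priors recovers the likelihood ranking up to a per-row constant:

[pGamma, model] = gmm(X, 2);
% MAP-style hard assignment: maximize the posterior responsibility gamma(i,k)
[~, labels_map] = max(pGamma, [], 2);
% ML-style hard assignment: divide out the prior pi_k, leaving (up to a
% per-row constant) the likelihood N(x_i|mu_k,Sigma_k), and maximize that
[~, labels_ml] = max(pGamma ./ repmat(model.Pi, size(X, 1), 1), [], 2);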


Detailed Derivation of the Gaussian Mixture Model Parameter Estimates

Start from the multivariate Gaussian density: \[N(x|\mu,\Sigma)=\frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\] where \(D\) is the dimension, \(x\) and \(\mu\) are both \(D\)-dimensional vectors, and the covariance \(\Sigma\) is a \(D\times D\) matrix. We first compute the posterior probability: \[w^{(i)}_j=Q_i(z^{(i)}=j)=P(z^{(i)}=j|x^{(i)};\phi,\mu,\Sigma)\] In the E-step, \(w^{(i)}_j\) is held fixed, and we use it to form the expectation \(E_{Z|X,\theta_t}[L(X,Z;\theta)]\) of the likelihood \(L(X,Z;\theta)\) (where \(\theta=(\phi,\mu,\Sigma)\)) under the distribution \(Z\sim P(Z|X;\theta)\): \[\begin{split} & \sum^m_{i=1}\sum_{z^{(i)}} Q_i(z^{(i)})\log{\frac{p(x^{(i)},z^{(i)};\phi,\mu,\Sigma)}{Q_i(z^{(i)})}} \\& =\sum^m_{i=1}\sum^k_{j=1} Q_i(z^{(i)}=j)\log{\frac{p(x^{(i)}|z^{(i)}=j;\mu,\Sigma)p(z^{(i)}=j;\phi)}{Q_i(z^{(i)}=j)}} \\& =\sum^m_{i=1}\sum^k_{j=1} w^{(i)}_j\log{\frac{\frac{1}{(2\pi)^{D/2}|\Sigma_j|^{1/2}}\exp\left(-\frac{1}{2}(x^{(i)}-\mu_j)^T\Sigma_j^{-1}(x^{(i)}-\mu_j)\right)\cdot\phi_j}{w^{(i)}_j}} \\\end{split}\] Since the denominator \(w^{(i)}_j\) becomes an additive constant after taking the logarithm and does not depend on the parameters, it vanishes upon differentiation, so for brevity we drop it when writing out the formulas.
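
From here the standard EM updates follow; they are exactly the quantities the MATLAB code above computes (pGamma, pPi, pMiu, pSigma). In the E-step, Bayes' rule gives the responsibilities: \[w^{(i)}_j=\frac{\phi_j\,N(x^{(i)}|\mu_j,\Sigma_j)}{\sum^k_{l=1}\phi_l\,N(x^{(i)}|\mu_l,\Sigma_l)}\] and in the M-step, setting the derivatives of the expected log-likelihood above to zero yields \[\phi_j=\frac{1}{m}\sum^m_{i=1}w^{(i)}_j,\qquad \mu_j=\frac{\sum^m_{i=1}w^{(i)}_j\,x^{(i)}}{\sum^m_{i=1}w^{(i)}_j},\qquad \Sigma_j=\frac{\sum^m_{i=1}w^{(i)}_j\,(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum^m_{i=1}w^{(i)}_j}\]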
