The secret ingredient behind the solution that includes matrix division is the Vandermonde matrix https://en.wikipedia.org/wiki/Vandermonde_matrix. The question discusses a linear problem (linear regression), and such problems can always be formulated as a matrix equation, which \ (mldivide) can solve in a mean-square-error sense‡ https://chat.stackoverflow.com/transcript/message/27426179#27426179. Such an algorithm, solving a similar problem, is demonstrated and explained in this answer https://math.stackexchange.com/a/260896.
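To make the connection concrete: the model DL = R*IT.^P becomes linear after taking logarithms, log(DL) = log(R) + P*log(IT), which a degree-1 Vandermonde matrix and mldivide can fit directly. A minimal sketch (not the benchmark code below; the values are illustrative):

```matlab
IT = (600:600:3000).';            % sample points
R  = 5;  P = 0.97;                % "true" parameters
DL = R*IT.^P;                     % noiseless "measurements"
VM = [ones(numel(IT),1) log(IT)]; % Vandermonde matrix for a linear fit
c  = VM\log(DL);                  % least-squares coefficients: [log(R); P]
fittedR = exp(c(1));              % recovers R
fittedP = c(2);                   % recovers P
```

With noiseless data the recovery is exact (up to floating point); with noisy data mldivide returns the least-squares estimate of the same coefficients.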
Below is benchmarking code that compares the original solution with two alternatives suggested in chat (1 https://chat.stackoverflow.com/transcript/message/27425670#27425670, 2 https://chat.stackoverflow.com/transcript/message/27426276#27426276):
function regressionBenchmark(numEl)
clc
if nargin<1, numEl=10; end
%// Define some nominal values:
R = 5;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(numEl)+P;
rMAT = 0.1*randn(numEl)+R;
%// Generate "fake" measurement data using the relation "DL = R*IT.^P"
dlMAT = bsxfun(@times,rMAT,bsxfun(@power,permute(IT,[3,1,2]),pMAT));
%% // Method1: loops + polyval
disp('-------------------------------Method 1: loops + polyval')
tic; [fR,fP] = method1(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method2: loops + Vandermonde
disp('-------------------------------Method 2: loops + Vandermonde')
tic; [fR,fP] = method2(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
%% // Method3: vectorized Vandermonde
disp('-------------------------------Method 3: vectorized Vandermonde')
tic; [fR,fP] = method3(IT,dlMAT); toc;
fprintf(1,'Regression performance:\nR: %d\nP: %d\n',norm(fR-rMAT,1),norm(fP-pMAT,1));
function [fittedR,fittedP] = method1(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
  for ind2 = 1:size(dlMAT,2)
    sol{ind1,ind2} = polyfit(log(IT(:)),log(squeeze(dlMAT(ind1,ind2,:))),1);
  end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method2(IT,dlMAT)
sol = cell(size(dlMAT,1),size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
  for ind2 = 1:size(dlMAT,2)
    sol{ind1,ind2} = flipud([ones(numel(IT),1) log(IT(:))]\log(squeeze(dlMAT(ind1,ind2,:)))).'; %'
  end
end
fittedR = cellfun(@(x)exp(x(2)),sol);
fittedP = cellfun(@(x)x(1),sol);
function [fittedR,fittedP] = method3(IT,dlMAT)
N = 1; %// Degree of polynomial
VM = bsxfun(@power, log(IT(:)), 0:N); %// Vandermonde matrix
result = fliplr((VM\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
%// Compressed version:
%// result = fliplr(([ones(numel(IT),1) log(IT(:))]\log(reshape(dlMAT,[],size(dlMAT,3)).')).');
fittedR = exp(real(reshape(result(:,2),size(dlMAT,1),size(dlMAT,2))));
fittedP = real(reshape(result(:,1),size(dlMAT,1),size(dlMAT,2)));
The reason Method 2 could be vectorized into Method 3 is essentially that matrix multiplication can be separated by the columns of the second matrix: if A*B produces matrix X, then by definition A*B(:,n) gives X(:,n) for any n. Moving A to the other side with mldivide, this means that the divisions A\X(:,n) can be done for all n at once with A\X. The same applies to an overdetermined system https://en.wikipedia.org/wiki/Overdetermined_system (the linear-regression case), where in general there is no exact solution and mldivide finds the matrix that minimizes the mean-square error https://en.wikipedia.org/wiki/Overdetermined_system#Approximate_solutions. In this case too, the operations A\X(:,n) (Method 2) can be done all at once for all n with A\X (Method 3).
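The column-separability argument can be verified directly; a minimal sketch with arbitrary values (not part of the benchmark):

```matlab
% mldivide on a matrix of right-hand sides equals applying it to each
% column separately - also for overdetermined A (least-squares case).
A = [ones(5,1) (1:5).'];      % 5x2 overdetermined system matrix
X = rand(5,3);                % three right-hand-side columns
allAtOnce = A\X;              % one call for all columns (Method 3 idea)
oneByOne = zeros(2,3);
for n = 1:3
  oneByOne(:,n) = A\X(:,n);   % column-by-column calls (Method 2 idea)
end
% max(abs(allAtOnce(:)-oneByOne(:))) is zero to within numerical precision
```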
The effect of the algorithmic improvement as the size of dlMAT grows is shown below:

For the case of 500*500 (or 2.5E5) elements, the speedup from Method 1 to Method 3 is about x3500!

It is also interesting to observe the output http://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html#f9-17206 of profile http://www.mathworks.com/help/matlab/ref/profile.html (shown here for the 500*500 case):
Method 1
Method 2
Method 3
From the above it can be seen that rearranging elements via squeeze http://www.mathworks.com/help/matlab/ref/squeeze.html and flipud http://www.mathworks.com/help/matlab/ref/flipud.html takes up about half (!) the runtime of Method 2. It can also be seen that some time is lost on converting the solution from a cell array to a matrix.
Since the third solution avoids all of these pitfalls, as well as loops altogether (which mostly means re-evaluating the script on every iteration), it unsurprisingly results in a considerable speedup.
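For reference, the profiler reports above can be reproduced roughly as follows (a sketch; regressionBenchmark is the function defined earlier):

```matlab
profile on                  % start collecting timing data
regressionBenchmark(500);   % run the 500x500 case
profile viewer              % open the graphical profile report
% Alternatively, p = profile('info'); gives programmatic access to the results.
```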
Notes:
- There was practically no difference between the "compressed" and the "explicit" versions of Method 3, slightly in favor of the "explicit" version; hence the compressed one was not included in the comparison.
- A solution where the inputs to Method 3 were gpuArray-ed was also attempted. This did not provide improved performance (and even degraded it somewhat), possibly due to a wrong implementation, or the overhead associated with copying matrices back and forth between RAM and VRAM.