%% Mann-Kendall Tau (aka Tau-b) with Sen's Method (enhanced)
% A non-parametric monotonic trend test computing Mann-Kendall Tau, Tau-b,
% and Sen? Slope written in Mathworks-MATLAB implemented using matrix
% rotations.
%
% Suggested citation:
%
% Burkey, Jeff. May 2006. A non-parametric monotonic trend test computing
% Mann-Kendall Tau, Tau-b, and Sen? Slope written in Mathworks-MATLAB
% implemented using matrix rotations. King County, Department of Natural
% Resources and Parks, Science and Technical Services section.
% Seattle, Washington. USA.
% http://www.mathworks.com/matlabcentral/fileexchange/authors/23983
%
%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
%
% Important Note:
% I have also posted a Seasonal Kendell function at Mathworks
% sktt.m
%
%http://www.mathworks.com/matlabcentral/fileexchange/22389-seasonal-kendall
%-test-with-slope-for-serial-dependent-data
%!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
%
% revised 12/1/2008- Computed variance now takes into account ties in the
% time index with multiple observations per index.
% Added in confidence intervals for Sens Slope
%
% Syntax:
% [taub tau h sig Z S sigma sen n senplot CIlower CIupper D Dall C3]
% = ktaub(datain, alpha, wantplot)
%
% where:
% datain = (N x 2) double
% alpha = (scalar) double
% wantplot is a flag
% ~= 0 means create plot, otherwise do not plot
%
% taub = Mann-Kendall coefficient adjusted for ties
% tau = Mann-Kendall coefficient not adjusted for ties
% n(n-1)/2
% h = hypothesis test (h=1 : is significant)
% sig = p value (two tailed)
% Z = Z score
% sigma = standard deviation
% sen = sen's slope
% plotofslope = data used to plot data and sen's slope
% cilower = lower confidence interval for sen's slope
% ciupper = upper confidence interval for sen's slope
%
% These next two variables are output because they are needed in the
% Seasonal Kendall function: sktt.m
% D = denominator used for calculating Tau-b
% Dall = denominator used for calculating Tau
% C3 = individual seasonal slopes aggregated for Sens Seasonal Slope
% nsigma = an assumed variance with all ties reconsidered but set to
% equal number of positive and negative differences.
%
%
% Modifications:
% 12/1/2008 - Taub = denominator is adjusted
% Tau = denominator is NOT adjusted
% (matches USGS Kendall.exe output)
%
% 1/17/2009 - checks for anomalies and provides solutions and/or
% notifications. In support of this for the Seasonal Kendall
% (which was also updated 1/17/2009), another term was added to
% the output of this function-- nsigma.
%
% 6/15/2011 - updated plotting confidence intervals on Sen's slope. I
% don't recall my source on developing it, so use at your own
% peril. That said using the 95/5 percentiles of the comptued
% individual slopes seems to be reasonable-- I just don't have
% a paper to back it up.
%
% 7/23/2011 - added a test for matlab version.
%
% 8/21/2013 - added citation supporting method of computing confidence
% intervals for Sen's slope. Thanks goes to Dr. Jeff Thompson
% for providing the source that backs up my ginned up method
% (Line 320).
%
% When calculating trends, there are a few situations that create anomalies
% in the estimates. Foremost, when S = 0, significance will always be
% 100-percent, which is not possible. When this occurs the p-value is
% adjusted by using the computed variance, but assuming S = 1. However,
% the output from the function will still show S=0 when the case arises.
%
% Secondly, a statistically significant slope = 0 can occur when there are
% a large number of ties in the data. In this case, a second test is done
% assuming the number of ties is equal an even number positive and negative
% differences. Significance is tested again, but the output is only sent
% to the screen. The p-value returned in the function is still the
% original p-value (and the adjusted p-value for serial correlation).
%
% Kendall Tau-b is useful when there are a significant number of
% artificially induced tied values. For example, water quality data that
% has infilled non-detects with half MDL's. This would create a false
% number of ties simply because of the common technique of infilling an
% unknown quantity with a known estimate of a quantity.
%
% These test for anomalies and solutions are based on an unpublished paper:
%
% Anomalies and remedies in non-parametric seasonal trend tests and
% estimates by Graham McBride, National Institute of Water & Atmospheric
% Research, Hamilton New Zealand. March 2000.
%
%
% Note: if data are not temporally evenly spaced, Sen's slope becomes
% inaccurate (a future TODO).
%
% Requirements: Statistics Toolbox
% or comment out ztest and manually determine significance
%
% This function is coded without any loops. While this is extremely fast,
% it does have some limitations to computer memory. For example, if a data
% set is around 10,000 in length, 1.0 GB RAM may or may not work.
%
% However, I would imagine if memory is a problem for someone with large
% datasets, a person could convert the necessary variables and statements
% to work with sparse matrices. Which may slow down this function on
% smaller datasets, but the memory issue would greatly diminish.
%
% Sen's methodology was added in by Curtis DeGasperi- King County, DNRP.
% reference: Statistical Methods for Environmental Pollution Monitoring,
% Richard O. Gilbert 1987, ISBN: 0-442-23050-8
%
% A couple of resources for Mann-Kendall statistics.
%
% Statistical Methods in Water Resources
% By D.R. Helsel and R.M. Hirsch
% http://water.usgs.gov/pubs/twri/twri4a3/
%
% Computer Program for the Kendall Family of Trend Tests
% by Dennis R. Helsel, David K. Mueller, and James R. Slack
% Scientific Investigations Report 2005-5275
% http://pubs.usgs.gov/sir/2005/5275/downloads/
%
%
% Written by Jeff Burkey
% King County, Department of Natural Resources and Parks
% 4/18/2006
% 12/4/2008 (significantly revised)
% email: jeff.burkey@kingcounty.gov
%
% [taub tau h sig Z S sigma sen n senplot CIlower CIupper D Dall C3] = ktaub(datain, alpha, wantplot)
function [taub,tau,h,sig,sen] = ktaub(datain, alpha, wantplot)
try
%% Check MATLAB version for compatibility
% Added this version trap since the most common comment I get is
% they syntax is in error. So far when people get this error it's
% because they are using an old version of matlab that does not
% accept some of the variable names in this function.
% 7/23/2011 - JJB
vmat = regexp(version,'\d+','match');
vr = [str2double(vmat{1}) str2double(vmat{2})];
if vr(1) == 7
if vr(2) >=9 || vr(1) > 7
% Then matlab version should work
else
txt = 'Your version of matlab is %s. \nYou need at least version 7.9 to run.\n Some of the syntax will not parse correctly.';
warning(txt,version)
end
end
catch msg
error('Matlab version is way too old to run this function.');
end
% wantplot is a flag to create a figure or not default set to no
if exist('wantplot','var') == 0
% user didn't provide assume zero (i.e. no plot)
wantplot = 0;
end
% Data are assumed to be in long columns, hence 'sortrows'
sorted = sortrows(datain,1);
% remove any NaNs, if data are missing they should not be included as
% NaNs.
sorted(any(isnan(sorted),2),:) = [];
% return n after removing any NaNs
n = size(sorted,1);
% set to NaN if trend is not significant
senplot = NaN;
% extract out the data
row1 = sorted(:,1)';
row2 = sorted(:,2)';
clear sorted;
L1 = length(row1);
L2 = L1 - 1;
% find ties
ro1 = sort(row1)';
ro2 = sort(row2)';
[~,b] = unique(ro1);
[~,e] = unique(ro2);
clear a c d f ro1 ro2;
% correcting loss of first value using diff on with unique
if b(1,1) > 1
ta = b(1,1);
else
ta = 1;
end
if e(1,1) > 1
tb = e(1,1);
else
tb = 1;
end
bdiff = [ta; diff(b)];
ediff = [tb; diff(e)];
clear ta tb b e;
%% Determine ties used for computing adjusted variance
tp = sum(bdiff .* (bdiff - 1) .* (2 .* bdiff + 5));
uq = sum(ediff .* (ediff - 1) .* (2 .* ediff + 5));
% modified 12/1/2008
% adjustments when time index has multiple observations
d1 = 9 * L1 * (L1-1) * (L1-2);
tu1 = sum(bdiff .* (bdiff - 1) .* (bdiff - 2)) * sum(ediff .* (ediff - 1) .* (ediff - 2)) / d1;
d2 = 2 * L1 * (L1-1);
tu2 = sum(bdiff .* (bdiff - 1)) * sum(ediff .* (ediff -1)) / d2;
% ties used for adjusting denominator in Tau
t1a = (sum(bdiff .* (bdiff - 1))) / 2;
t2a = (sum(ediff .* (ediff - 1))) / 2;
% create matricies to be used for substituting values as indicies
m1 = repmat((1:L2)',[1 L2]);
m2 = repmat((2:L1)',[1 L2])';
% populate matrixes for analysis
A1 = triu(row1(m1));
A2 = triu(row1(m2));
B1 = triu(row2(m1));
B2 = triu(row2(m2));
clear m1 m2 row1 row2;
% Perform pair comparison and convert to sign
A = sign(A1 - A2);
B = sign(B1 - B2);
%% Perform operations to calculate Sen's Slope
% Median of rate of change among all data points- CLD added 5/3/2006
A3 = reshape((A1 - A2),L2*L2,1);
B3 = reshape((B1 - B2),L2*L2,1);
a = find(A3~=0);
C3 = sort(B3(a)./A3(a));
sen = median(C3);
clear A1 A2 B1 B2 A3 B3 a;
%% Evaluate concordant and discordant
% +1 = concordant
% -1 = discordant
% 0 = tie
C = A.*B;
% Compute S
S = sum(sum(C,2));
clear A B C;
% Calculate denominator with ties removed Tau-b
D = sqrt(((.5*L1*(L1-1))-t1a)*((.5*L1*(L1-1))-t2a));
% Calcuation denominator no ties removed Tau
Dall = L1 * (L1 - 1) / 2;
% (modified 12/1/2008: added tau)
tau = S / Dall;
taub = S / D;
% adjust for normal approximations and continuity
if S > 0
s = S -1;
elseif S < 0
s = S + 1;
elseif S == 0
s = 0;
elseif isnan(S)
error('ErrorTrend:ktaub', 'This function cannot process NaNs. \nPlease remove data records with NaNs.\n');
end
%% Test for abnormalities in data
% Certain conditions can occur that potentially invalidate this
% statistic. Conditional statements conduct some tests and provide the
% user some feedback.
if S==1
% Notify user continuity correction is setting S = 0
fprintf('\nTaub Message: When absolute value S=1,');
fprintf('\n Continuity correction is setting S = 0.');
fprintf('\n This will affect calculated significance.\n');
end
% compute square-root of variance with all ties accounted for in time
% index and in observation values. - JJB 12/1/2008
sigma = sqrt(((L1*(L1-1)*(2*L1 + 5) - tp - uq) / 18) + tu1 + tu2);
% nsigma is used if slope is zero and determined significant. It is
% hypothesized that all ties can be represented as an equal number of
% postive and negative slopes.
nsigma = sqrt(L1*(L1-1)*(2*L1+5));
Z = s / sigma;
% Estimate confidence intervals of Sen's slope
% The next line requires STATISTICS Toolbox (norminv)
% Zup is a 2-tail Z (i.e. alpha/2)
%
% Hollander, M. and Wolfe, D. 1973, Nonparametric statistical methods,
% Wiley, New York. Chapter 9 (Regression problems involving slope),
% Section 3 (A distribution-free confidence interval based on the
% Theil test; p. 207 - 208)
Zup = norminv(1-alpha/2,0,1);
Calpha = Zup * sigma;
Nprime = length(C3);
M1 = (Nprime - Calpha)/2;
M2 = (Nprime + Calpha)/2 + 1;
% 2-tail limits
CIlower = interp1q((1:Nprime),C3,M1);
CIupper = interp1q((1:Nprime),C3,M2);
% clear M1 M2 NPrime Zup Calpha
% h = 1 : means significance
% h = 0 : means not significant (i.e. sig < z(sig))
if s==0
% Not possible to be 100% certain, force S = 1 and compute p-value
% using sigma.
[h, sig] = ztest(1,0,sigma,alpha);
fprintf('\nTaub Message: S = 0. P-value cannot = 100-percent. ');
fprintf('\n P-value is adjusted using S = 1 and should be reported as p > %1.5f.\n',sig);
if sen~=0
fprintf('\nTaub Message: A non-zero Sens slope occurred when S =0.');
fprintf('\n This is not an error, more a notification.');
fprintf('\n This anomaly may occur because the median may be computed');
fprintf('\n on one value equal to zero and one non-zero, etc.\n');
end
else
[h, sig] = ztest(s,0,sigma,alpha);
end
% Notify for Sens slope = 0 but is determined significant
if h==1 && sen==0
[hh, nsig] = ztest(s,0,nsigma,alpha);
fprintf('\nTaub Message: There was a significant trend = 0 found.\n');
fprintf(' Retested with ties set to equal number of positve and negative values.\n');
fprintf(' New p-value = %1.5f',nsig);
if hh==1
fprintf('. However trend still found to be significant.\n');
else
fprintf(', but trend is not found to be significant.\n');
end
end
%% Plotting routine
% Below is a very simplistic plotting routine to plot the Sen slope if
% the significance is less than 0.05. Uncomment or delete at your
% leisure.
if sig<=alpha && wantplot ~= 0 %A plotting example CLD added 5/3/2006
% Revised plotting 6/14/2011 - JJB
% Plots the slopes using the median value as the focus point for
% all three slopes (Sen's and confidence slopes). Is this the
% correct method? Don't know but seems reasonable. - JJB 6/15/2011
hold on
%generate points to represent median slope
%zero time for the calculation is the first time point
vv = median(datain(:,2));
middata = datain(round(length(datain)/2),1);
slope = vv + sen*(datain(:,1)-middata);
senplot = [datain(:,1) slope];
plot(datain(:,1),datain(:,2),'o')
plot(datain(:,1),slope,'-')
% add confidence intervals
slope = vv + CIlower*(datain(:,1)-middata);
plot(datain(:,1),slope,'--');
slope = vv + CIupper*(datain(:,1)-middata);
plot(datain(:,1),slope,'--');
box on
grid on
hold off
% pause
end
end