Quote:
Originally posted by
Xenonen
The culture clash that arises when professional statisticians peer into the wondrous world of the climatologists takes on slightly comical forms in McShane and Wyner's reply to Mann's muddled criticism.
Access to data and code:
Quote:
Before embarking on our discussion of their [SMR - Schmidt, Mann & Rutherford] work, we must mention that, of the five discussants who performed analyses (DL, Kaplan, SMR, Smerdon, and Tingley), SMR was the only one who provided an incomplete and generally unusable repository of data and code.
Cherry-picking:
Quote:
SMR allege that we have applied the various methods in Sections 4 and 5 of our paper to an inappropriately large group of 95 proxies which date back to 1000 AD (93 when the Tiljander lightsum and thicknessmm series are removed due to high correlation as in our paper; see footnote 11). In contrast, the reconstruction of Mann et al. (2008) is applied to a smaller set of 59 proxies (57 if the two Tiljander series mentioned previously are removed; 55 if all four Tiljander series are excluded because they are “potentially contaminated”).
The process by which the complete set of 95/93 proxies is reduced to 59/57/55 is only suggestively described in an online supplement to Mann et al. (2008). As statisticians we can only be skeptical of such improvisation, especially since the instrumental calibration period contains very few independent degrees of freedom. Consequently, the application of ad hoc methods to screen and exclude data increases model uncertainty in ways that are unmeasurable and uncorrectable.
Principal component analysis:
Quote:
SMR Figure 1c replots our Bayes model (Figure 16 of the paper) with two differences: it uses the reduced dataset of 55 proxies and only four principal components. There are no statistically significant differences between the resulting model and our original one (see SI), yet SMR allege that “K = 10 principal components is almost certainly too large, and the resulting reconstruction likely suffers from statistical over-fitting. Objective selection criteria applied to the Mann et al. (2008) AD 1000 proxy network, as well as independent “pseudoproxy” analyses discussed below, favor retaining only K = 4.”
SMR are wrong on two counts. First, the two “objective” criteria they suggest select differing numbers of principal components. Second, each criterion has multiple implementations each producing different results. As is well known to statisticians, there is no single objective way to resolve these discrepancies. Furthermore, the PC selection procedures that SMR prefer select “significant” PCs based entirely on the matrix of predictors without considering the response variable. To protect against overfitting, the selection process should in some way take into account the relationship between the predictor and the response [see also Izenman (2008), Hastie, Tibshirani and Friedman (2009)]. Compounding matters, SMR implement their allegedly objective criteria in nonstandard and arbitrary ways and several times in error. When correctly implemented, the number of principal components retained varies across each “objective” criterion from two to fifty-seven. Using ten principal components, therefore, can hardly be said to induce the “statistical over-fitting” claimed by SMR.
Linear fitting:
Quote:
Fortunately, we are able to use the data and code provided to us to rebut SMR’s findings. Before proceeding, however, we must note a troubling problem with SMR Figure 2. Visual inspection of the plots reveals an errant feature: OLS methods appear to have nonzero average residual in-sample! Upon examining the code SMR did provide, we confirmed that this is indeed the case. The culprit is an unreported and improper centering of the data subsequent to the model fits, resulting in biased estimates and uncalibrated confidence intervals.
And those are just a few examples. That climatologists do not understand advanced statistical methods is one thing, but that the residuals sum to zero in an ordinary linear least-squares fit (with an intercept) is something anyone who has taken an introductory course in linear algebra ought to know.
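For anyone who wants to check the residual claim themselves, here is a minimal sketch in Python with NumPy. It uses made-up synthetic data, not anything from SMR or McShane and Wyner, and the 0.3 shift is just an arbitrary stand-in for "some centering applied after the fit"; I have no idea what the actual SMR code does beyond what the quote says. The point is only that the normal equations force the residuals to be orthogonal to every column of the design matrix, so a column of ones means they sum to zero, and any shift applied afterwards breaks that.

```python
import numpy as np

# Synthetic data, purely illustrative (assumed values, not from any paper)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones, i.e. the model has an intercept
X = np.column_stack([np.ones(n), x])

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# In-sample residuals: their mean is zero up to floating-point error
residuals = y - X @ beta
mean_resid = residuals.mean()

# Hypothetical post-fit re-centering: shift the fitted values by a constant
shift = 0.3  # arbitrary, for illustration only
fitted_shifted = X @ beta - shift
mean_resid_shifted = (y - fitted_shifted).mean()

print(f"mean residual after OLS:           {mean_resid:.2e}")
print(f"mean residual after shifting fits: {mean_resid_shifted:.2e}")
```

The first number comes out at machine precision, the second equals the shift. So if a plot of an OLS fit shows a nonzero average in-sample residual, something was done to the data or the fitted values after the fit.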