```{r, echo = FALSE, message = FALSE}
library(treeclim)
set.seed(42)
```
# Introduction
## Basic Usage
**treeclim** is an R package for the numerical calibration of
proxy-climate relationships, developed with a focus on tree-ring data.
The workhorse function of **treeclim** is `dcc`, shorthand for "dendroclimatic calibration". In
its most simplest case, a growth-climate analysis can be performed
like this:
```{r, warning=FALSE, message=FALSE}
dc1 <- dcc(muc_spruce, muc_clim)
```
For this first example, we use data that comes with **treeclim**: a
chronology of _Picea abies_ growing near Munich (Bavaria, Germany)
together with corresponding monthly climate data (temperature and
precipitation). The default configuration of `dcc`is to perform a
bootstrapped response function analysis (Guiot 1991), using monthly climate
variables from previous June to current September. The main difference
here to other software (e.g., Dendroclim2002, Biondi and Waikul 2004) is that **treeclim**
uses stationary bootstrapping (Politis and Romano 1994) by default, to account for potentially
autocorrelated tree-ring data.
The resulting coefficients can be viewed by calling:
```{r}
dc1
```
We see a table listing the monthly climatic predictors that have been
generated by **treeclim** with the value of their corresponding
coefficients, significance flags and confidence intervals. In this
example, none of the climatic predictors turned out to be significant
at a level of p < 0.05. (One important notice: the object returned by
`dcc` is actually more complex than the table we get when print it. To
access the coefficients, e.g. for writing your own plotting functions,
you could use `coef(dc1)`.)
A basic plot is obtained by issueing:
```{r, fig.height=5, fig.width=14}
plot(dc1)
```
We will see later how this plot can be styled to meet your personal
preferences.
This was a very quick and basic introduction to the main functionality
of **treeclim**. Everything else beyond that is refining the
specification of your analysis. The next section details on the
conceptual framework of **treeclim** and how it reflects in its design
and usage.
Caution! Please make sure to have a recent version of treeclim (version >= 1.0.15) installed when you follow along the examples in this document.
## Citation
I enjoy writing **treeclim** and helping users, and Franco Biondi (of
Dendroclim2002 fame) enjoys mentoring my work and collaborating closely
to set the direction for this software package. Additional to
improving citation metrics, getting proper credits for our work
through correct citation also helps us to get an idea about who uses
**treeclim** and for what. So, please make sure to cite **treeclim**
correctly when you use it in your analyses:
```{r}
print(citation("treeclim"), style = "text")
```
The same applies to *R* in general and *dplR*, when you use it
together with **treeclim** (which you definetively should, of course).
# The Five Elements of Growth-Climate-Relationships
Everyone looking at tree-growth as a function of climate will have to
consider at least five elements of the analysis to be carried out:
- _Data_: which tree-ring and climate data shall be used
- _Design_: which climatic variables are considered interesting _a
priori_, i.e. based on your assumptions about what is limiting tree
growth for the site you are studying
- _Scope_: which subset of tree-ring and climate data shall be
considered for the analysis
- _Method_: which method to use, i.e. response functions, correlation
functions, or seasonal correlations
- _Dynamics_: shall the analysis be carried out in a static fashion,
yielding one set of coefficients, or shall it be done in a dynamic
fashion, allowing to look at time-varying response of tree-growth to
climate.
The design of **treeclim** allows to build an analytic strategy
considering these five elements. In the following, we will look at
each element in some detail.
## Data
**treeclim** expects a tree-ring chronology as input, ideally as
produced by the `chron` function in *dplR*. For climate data, two
general formats are accepted:
1. _Year-Month_ format: here, the climate data is formatted as a
`data.frame` with columns `year`, `month`, and one column for each
climatic variable like so:
```{r, echo = FALSE}
head(muc_clim)
```
It is important to note that **treeclim** can only work with data for
which all years are complete, which means you might have to truncate
your climate data if this is not the case.
2. _13-Column_ format: here, for each climate variable a `data.frame`
or numerical matrix is supplied with year as first column and
monthly observations in columns 2 to 13 like so:
```{r, echo=FALSE}
head(rt_prec)
```
To use more than one variable with this data format, you can supply
them as a list to dcc like so: `dcc(chrono = rt_spruce, climate =
list(rt_temp, rt_prec))`
For both climate data formats, there is no limit on the number of
variables that can be used for the analyses. Please note that no missing values are allowed for both formats, so if your climate data contains missing values, consider interpolation or other imputation methods.
For the _13-Column_ format, there are no obvious variable names for
the climatic variables. If you want to have names, you can supply them
using the `var_names` argument like so: `dcc(chrono = rt_spruce, climate =
list(rt_temp, rt_prec), var_names = c("tmp", "pre"))`. This also works
for renaming variables coming from the _Year-Month_ format.
## Design
Being able to design a customized matrix of predictors is probably the
most interesting part about `dcc`. The default design matrix will
select all months from previous June to current September from all
variables supplied to `dcc`. Since this is most of the times a good
starting point, but not necessarily the ideal design matrix, you can
easily specify a custom design matrix using a set of simple commands.
### Modifiers
In **treeclim**, there are three modifier functions for building the
design matrix.
- `.range`: for selecting all months individually
- `.mean`: for building the mean over the specified months
- `.sum`: for summing up over the specified months
All modifiers have the same structure of `MODIFIER(.months,
.variables)`, i.e. they take as arguments a number of months, and the
variables for which the modifier shall be applied. Both arguments can
be omitted; when `.months` is omitted, the default set of months
(previous June to current September) is used, when `.variables` is
omitted, the modifier will be applied to all available climate
variables.
Here are two examples:
```{r, message=FALSE, warning=FALSE}
dc2 <- dcc(muc_spruce, muc_clim, .range(-6:9, c("temp", "prec")))
rownames(dc2$coef)
```
This is identical with the default configuration and can be written more
concise as `dcc(muc_spruce, muc_clim, -6:9)`. Another example:
```{r, message=FALSE, warning=FALSE}
dc3 <- dcc(muc_spruce, muc_clim, .sum(5:9, c("temp", "prec")))
dc3
```
In this example, we did use all individual months, but as a sum of
temperature and precipitation for the given period of time, resulting
in much smaller design matrix with only two predictor variables. Since
we only have the variables `temp` and `prec` in `muc_clim`, we could
have written this more concise as `dcc(muc_spruce, muc_clim,
.sum(5:9))`.
### Exclusion
It can be desireable to exclude some time spans from analysis, for
example dormant seaons. For this task, **treeclim** has the function
`exclude_from` (or `exfr` for short) to exclude specific months from
being integrated into the predictors build using the modifier
functions.
```{r, message=FALSE, warning=FALSE}
dc4 <- dcc(muc_spruce, muc_clim, .range(exfr(-6:9, -10:2)))
rownames(dc4$coef)
```
In this example, we excluded autumn and winter months October (previous year) to February (current year) from the analysis. The remaining months from `.range(-6:9)` were selected as individual months for building the design matrix, because we used the `.range` modifier. Often, as here, it is not necessary to explicitly use `.range`, and `dcc(..., exfr(-6:9, -10:2))` would have done the same thing.
### Modifier Chaining
The last component for building design matrix is introduced by the
ability to chain modifiers. This is done using the `+` operator:
```{r, message=FALSE, warning=FALSE}
dc5 <- dcc(muc_spruce, muc_clim, .sum(-4:-9, "prec") +
.sum(4:9, "prec") + .range(exfr(-6:9, -10:3), "temp"))
dc5
```
Here, we are using two precipitation sums (corresponding to the
previous and current growing seasons), and a range of month-wise
temperatures excluding a dormant season. We can chain as many
modifiers as we want. Using modifiers, exclusion and modifier
chaining, we can create as complex (or simple) design matrices as
needed.
### Redundancy Check
While **treeclim** will check carefully that no two predictors are
exactly the same, it can by no means check if the specification of the
design matrix makes any sense at all.
## Scope
By default, **treeclim** will use the maximum timespan available for both tree-ring and climate data. This can be changed using the `timespan` argument: `dcc(..., timespan = c(1965, 2000))` will run the analysis using the timespan from 1965 to 2000. When the number of years for the analysis becomes smaller than the number of predictors in the design matrix, **treeclim** will raise an error.
## Method
With `dcc`, one may choose between response function analysis (the default) and correlation function analysis using the `method` argument: `dcc(..., method = "correlation")` will switch to the correlation method. A third method is available: seasonal correlations. The rationale behind this approach developed by Dave Meko (Meko et al. 2011) is somewhat different, which is why it deserves its own function [`seascorr`](#seasonal-correlations).
## Dynamics
The temporal stability of climate-growth-relations can be analysed using moving response and correlation functions. By default, `dcc` computes static analyses, but by setting the `moving`argument to `TRUE`, the analyses will be repeated in shifting time windows.
```{r, warning=FALSE, cache=TRUE, fig.height=5, fig.width=10}
dc6 <- dcc(chrono = muc_spruce, climate = muc_clim,
selection = 4:10, moving = TRUE, sb = FALSE)
dc6
plot(dc6)
```
The stars in the plot indicate windows with significant correlations for the given variable. Like in the non-dynamic case, you are free to create a customized design matrix using the **treeclim** [modifiers](#modifiers). Further options comprise the window size, the window offset (the gap between consecutive window starts), and whether to start with windowing from the most recent end (default), or from the least recent end.
If `method` is set to `correlation` for computing a moving correlation function, we can use `g_test` to test the temporal fluctuations of the correlation coefficients for significance by comparing them to the magnitude of fluctuations that would occur by chance alone (using a multivariate extension of the approach by Gershunov et al. 2001):
```{r, warning=FALSE, message=FALSE, cache=TRUE, fig.height=5, fig.width=12}
dc7 <- dcc(chrono = muc_spruce, climate = muc_clim,
selection = .mean(3:5) + .mean(6:8) + .mean(9:10), method = "cor",
moving = TRUE, win_size = 30, sb = FALSE)
plot(dc7)
g_test(dc7, sb = FALSE)
```
In this example, only the fluctuations in the correlation coefficient for mean temperature from current June to current August are significant at a level of p < 0.05. This means, that the probability of an even larger fluctuation by chance is less than 0.05. For more details, look into `?g_test`.
# Seasonal Correlations
**treeclim** provides similar functionality to Dave Meko's *seascorr* MATLAB program: partial correlations for a primary and a secondary climatic variable for seasons of different lengths. The tree-ring and climate input data is required in the [same format](#data) as for `dcc`.
```{r, message=FALSE, warning=FALSE}
sc1 <- seascorr(muc_spruce, muc_clim)
sc1
```
This is the default configuration for `seascorr`: considering three different season lengths (1, 3, and 6 months), and current September as month where tree-ring growth is assumed to have finished. There is a plot method available as well:
```{r}
plot(sc1)
```
By default, the primary variable is the first one found in the climate data supplied to `seascorr`. So with `r names(muc_clim)` in `muc_clim`, precipitation is chosen as primary variable. You can change this by either supplying names or positions, so to swap primary and secondary variables from the last example we can issue:
```{r, message=FALSE}
sc2 <- seascorr(muc_spruce, muc_clim, primary = "prec", secondary = "temp")
plot(sc2)
```
# Reconstruction Skills
**treeclim** offers some basic functionality for evaluating the reconstruction skills of tree-ring chronologies via the `skills` function. `skills` works on existing output from either `dcc` or `seascorr` and tests the prediction skills of the tree-ring data for a given climate target. For this example, we use a different built-in data set, a chronology of pines from Visdalen in Norway (consult `?norw015` for details) and associated climate data:
```{r, message=FALSE}
dc8 <- dcc(norw015, list(norway_prec, norway_temp), 3:9, var_names = c("prec", "temp"))
plot(dc8)
```
Let us assume we are interested in reconstructing summer temperatures, and want to make use of the significant correlation with July temperature. We call `skills` on the `dc8` object, and specify `.range(7, "temp")` as target to get July temperature as response variable (note that for `skills`, like in many dendroclimatic applications, climate is modelled subject to tree-ring data):
```{r}
sk8 <- skills(dc8, .range(7, "temp"))
sk8
```
By default, `skills` will take the maximum available timespan and divide it into equally sized calibration and verification periods. Reported are the parameter estimates and goodness of fit for the calibration and full models, plus some established verification statistics. Here, both RE and CE are positive, so the reconstruction has some skills at least. We can also look at the reconstruction:
```{r}
plot(sk8)
```
The specification of the calibration period in `skills` is quite flexible, you might want to look in to `?skills` for more details. Furthermore, you can chose between an OLS regression model (default) or an RMA regression.
# Plotting
**treeclim** uses **ggplot2** (Wickham 2009) to build its plots. **ggplot2**-plots are themeable and easily be customized. This means, it only takes a few lines of code to adapt your **treeclim** output to your aesthetical preferences.
```{r, fig.width=15, fig.height=8}
plot(dc1)
```
This is the default. If we feel the default style does not blend in nicely with our gothic inspired interior design, we can always change some things (we need to include **ggplot2** for this):
```{r, fig.width=15, fig.height=8}
library(ggplot2)
plot(dc1) + theme_dark() + scale_color_manual(values = c("#FFFFFF", "#F62525")) +
ylab("Response Coefficients") + ggtitle("Happy Worlddendro 2016!")
```
You may want to consult the **ggplot2** documentation for further options (there are many).
# Conclusion & Acknowledgements
**treeclim** tries to provide a flexible framework for the exploration of tree-climate-relationships. **treeclim** does not impose complexity on the user, and keeps simple things simple. When needed, however, functionality is available to adapt to various research questions. Most aspects of what **treeclim** can do has been touched in this vignette. There are few more options that are potentially interesting, such as the ability to use different bootstrapping schemes with `dcc`.
Areas where **treeclim** is currently lacking include better support of southern hemisphere data by providing access to climate data from the year before the previous one. Also, testing reconstructing skills is only basic.
Some of the features currently implemented derive from requests by the user community (`seascorr` being the most prominent example), and I am grateful for these and further suggestions.
# Citations
Biondi, F., and K. Waikul. 2004. DENDROCLIM2002: A C++ program for statistical calibration of climate signals in tree-ring chronologies. Computers in Geosciences 30:303–311.
Gershunov, A., N. Schneider, and T. Barnett. 2001. Low-frequency modulation of the ENSO-Indian Monsoon rainfall relationship: Signal or noise? Journal of Climate 14:2486–2492.
Guiot, J. 1991. The bootstrapped response function. Tree Ring Bulletin 51:39–41.
Meko, D. M., R. Touchan, and K. J. Anchukaitis. 2011. Seascorr: A MATLAB program for identifying the seasonal climate signal in an annual tree-ring time series. Computers & Geosciences 37:1234–1241.
Politis, D. N., and J. P. Romano. 1994. The stationary bootstrap. Journal of the American Statistical Association 89:1303–1313.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.