Home | Vita | Course | Toolbox | Contact |
Laboratory of Tree-Ring Research, Room 417, Bryant Bannister Tree-Ring Building (Bldg #45B)
Email: dmeko@arizona.edu
Office hours: Wednesday, 1:00-3:00 PM (please email to schedule a Zoom meeting)
The course is 3 credits for University of Arizona students, and 1-3 credits for others.
Any time series with a constant time increment (e.g., year, month, day) is a candidate for use in the course. Examples are annual precipitation, monthly mean temperature, and daily cases of COVID-19.
Matlab version. I update scripts and functions now and then using the current site-license release of Matlab. For 2021, I am still using MATLAB Version 9.5.0.944444 (R2018b). Beware that scripts and functions used in the course may not run on earlier versions of Matlab.
Install the whole Matlab package (includes all toolboxes) when installing from the U of A site license. Not all of the toolboxes are needed, but this is the easiest installation. If you are not using the site license, keep in mind that my scripts and functions make extensive use of four toolboxes: Statistics, Signal Processing, System Identification, and Curve Fitting.
A small number of students not at the University of Arizona can also be accommodated through the "iCourse" path described above.
The schedule typically allows about two weeks for gathering data and becoming familiar with Matlab. After that, one week (two class periods) is devoted to each of the 12 lessons or topics. Class meets on Tuesday and Thursday. A new topic is introduced on Tuesday, and is continued on the following Thursday. Thursday's class ends with an assignment and a demonstration of running the assignment Matlab script on my sample data. The assignment is due (must be uploaded by you to D2L) before class the following Tuesday.
Any online students not at the University of Arizona are expected to follow the same schedule of submitting assignments as regular students. In the "live online" mode, students have access to recorded Zoom lectures. All students have access to D2L for submitting assignments.
Once we are into the assignments on data analysis (after the first couple of weeks), the class routine is as follows:
Tuesday
In the lightning talk, the student puts one or more figures from the submitted assignment up on the screen, and describes the time series analyzed and at least one finding from the analysis. Goals of this activity, new in spring 2019, are to 1) expose students to a variety of time series, 2) provide experience in communication of time series methods, and 3) give practical experience in the lightning-talk format of briefly and concisely describing research.
Thursday
The breakout-room discussion is new for Spring Semester 2021. Students are assigned to breakout rooms to discuss a specific time series question related to the current topic. After 10 minutes, students return, and one student representing each breakout group reports on their discussion.
You submit assignments by uploading them to D2L before the Tuesday class when the next topic is introduced. Students self-grade their assignments at the beginning of class on Tuesday. I browse the self-graded assignments the next day, assess the writing in the assignment, and may or may not change the student's self-assessed grade. To find out how to access assignments, click assignment files.
The instructor looks over the self-graded assignments the next day, and may subtract up to an additional point for shortcomings in the writing quality (e.g., too long, incomprehensible, many spelling or grammatical errors). Assignments, given in class on Thursday, are due (must be uploaded to D2L by you) before the start of class the following Tuesday. The first half hour of Tuesday's meeting period will be dedicated to presentation of a grading rubric, self-assessment of completed assignments, and uploading of self-graded assignments to D2L. This schedule gives you 4 days to complete and upload the assignment to D2L before 9:00 AM Tuesday. D2L keeps track of the time the assignment was uploaded, and no penalty is assessed as long as it is uploaded before 9:00 AM on Tuesday of the due date.
A late penalty of 3 points is assessed if the assignment is not submitted to D2L by 9 AM Tuesday. A late penalty of 1 point is assessed if the graded assignment is not uploaded to D2L by 5 AM Wednesday, which is when I begin looking over your self-graded assignments. If you have some scheduled need to be away from class (e.g., attendance at a conference), you are responsible for uploading your assignment before 9:00 AM the Tuesday it is due, and for uploading the self-graded version by 10:15 AM the same day. In other words, the schedule is the same as for the students who are in class. If an emergency comes up (e.g., you catch COVID) and you cannot do the assignment or assessment on schedule, please send me an email and we will reach some accommodation. Otherwise, the late penalties described above will apply.
A time series is broadly defined as any series of measurements taken at different times. Some basic descriptive categories of time series are 1) long vs short, 2) even time-step vs uneven time-step, 3) discrete vs continuous, 4) periodic vs aperiodic, 5) stationary vs nonstationary, and 6) univariate vs multivariate. These properties, as well as the temporal overlap of multiple series, must be considered in selecting a dataset for analysis in this course. You will analyze your own time series in the course. The first steps are to select those series and to store them in structures in a mat file. Uniformity in storage at the outset is convenient for this class so that attention can then be focused on understanding time series methods rather than debugging computer code to ready the data for analysis. A structure is a Matlab variable similar to a database in that the contents are accessed by textual field designators. A structure can store data of different forms. For example, one field might be a numeric time series matrix, another might be text describing the source of data, etc. In the first assignment you will run a Matlab script that reads your time series and metadata from ascii text files you prepare beforehand and stores the data in Matlab structures in a single mat file. In subsequent assignments you will apply time series methods to the data by running Matlab scripts and functions that load the mat file and operate on those structures.
Assignments
Select sample data to be used for assignments during the course
Read: (1) Notes_1.pdf, (2) "Getting Started", accessible from the MATLAB help menu
Answer: Run script geosa1.m and answer questions listed in the file in a1.pdf
What to Know
The probability distribution of a time series describes the probability that an observation falls into a specified range of values. An empirical probability distribution for a time series can be arrived at by sorting and ranking the values of the series. Quantiles and percentiles are useful statistics that can be taken directly from the empirical probability distribution. Many parametric statistical tests assume the time series is a sample from a population with a particular population probability distribution. Often the population is assumed to be normal. This chapter presents some basic definitions, statistics and plots related to the probability distribution. In addition, a test (Lilliefors test) is introduced for testing whether a sample comes from a normal distribution with unspecified mean and variance.
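The sort-and-rank idea can be sketched in a few lines. This is a minimal illustration in Python, not the course's geosa2.m script. The precipitation values are made up, the plotting position i/(n+1) is one convention among several, and the quantile rule (linear interpolation between order statistics) is likewise an assumption.

```python
def empirical_cdf(values):
    """Sort the series and attach a non-exceedance probability to each value.

    Uses the plotting position i/(n+1), i = 1..n (an assumed convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    probs = [(i + 1) / (n + 1) for i in range(n)]
    return ordered, probs


def quantile(values, p):
    """Quantile for probability p (0..1) by linear interpolation
    between adjacent order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    pos = p * (n - 1)          # fractional position in the sorted series
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Hypothetical annual precipitation totals (mm), for illustration only
precip = [312.0, 280.0, 455.0, 390.0, 298.0, 510.0, 367.0, 341.0, 420.0]

ordered, probs = empirical_cdf(precip)
median = quantile(precip, 0.5)
lower_quartile = quantile(precip, 0.25)
```

Plotting `ordered` against `probs` gives the empirical cumulative distribution, which can then be compared by eye with a fitted normal distribution.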
Assignments
Read: Notes_2.pdf
Answer: Run script geosa2.m and answer questions listed in the file in a2.pdf
What to Know
Autocorrelation refers to the correlation of a time series with its own past and future values. Autocorrelation is also sometimes called lagged correlation or serial correlation, which refers to the correlation between members of a series of numbers arranged in time. Positive autocorrelation might be considered a specific form of persistence, a tendency for a system to remain in the same state from one observation to the next. The likelihood of tomorrow being rainy is greater if today is rainy than if today is dry. Geophysical time series are frequently autocorrelated because of inertia or carryover in the physical system. The slowly evolving low pressure systems in the atmosphere might impart persistence to daily rainfall. The slow drainage of groundwater reserves might impart correlation to successive annual flows of a river. Stored photosynthates might impart correlation to successive annual values of tree-ring indices. Autocorrelation complicates the application of statistical tests by reducing the number of independent observations. Autocorrelation can also complicate the identification of significant covariance or correlation between time series (e.g., precipitation with a tree-ring series). Autocorrelation can be exploited for predictions: an autocorrelated time series is predictable, probabilistically, because future values depend on current and past values. Three tools for assessing the autocorrelation of a time series are (1) the time series plot, (2) the lagged scatterplot, and (3) the autocorrelation function.
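As a rough illustration of the third tool, the sample autocorrelation function can be computed as below. This Python sketch is not the course's geosa3.m script. It assumes the common estimator that divides each lagged covariance by n, and the sample series is invented.

```python
def autocorrelation_function(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    Each autocovariance c_k is divided by n (not n - k), a common
    convention that keeps every |r_k| <= 1.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Invented, smoothly varying series: successive values tend to be similar
series = [2.0, 3.0, 5.0, 6.0, 5.0, 3.0, 2.0, 1.0, 2.0, 4.0, 6.0, 7.0]
r = autocorrelation_function(series, max_lag=4)
```

Here `r[0]` is 1 by definition, and a positive `r[1]` reflects the persistence described above: values tend to resemble their immediate predecessors.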
Assignments
Read: Notes_3.pdf
Answer: Run script geosa3.m and answer questions listed in the file in a3.pdf
What to Know
The spectrum of a time series summarizes the partitioning of variance of the series to rapid and gradual fluctuations. Rapid fluctuations are those with short wavelength, or high frequency. Gradual fluctuations are those with long wavelength, or low frequency. The spectrum by definition describes the variance of the series as a function of frequency or wavelength. The object of spectral analysis is to estimate and study the spectrum. The spectrum contains no new information beyond that in the autocovariance function (acvf), and in fact the spectrum can be computed mathematically by transformation of the acvf. But the spectrum and acvf present the information on the variance of the time series from complementary viewpoints. The acvf summarizes information in the time domain and the spectrum in the frequency domain.
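To make "variance as a function of frequency" concrete, the sketch below computes a raw periodogram with a direct discrete Fourier transform. It is a Python illustration under stated assumptions, not the estimator used in geosa4.m: the series is synthetic (two sinusoids), frequencies are in cycles per time step, and no smoothing or tapering is applied.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram at the Fourier frequencies k/n, k = 1..n//2.

    Ordinates are scaled so that they sum to the variance of the series
    (variance computed with divisor n).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    freqs, variance_share = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(dev[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        power = abs(coef) ** 2 / n ** 2
        # Fold in the matching negative frequency, except at the Nyquist
        # frequency (present only when n is even), which has no partner.
        if not (n % 2 == 0 and k == n // 2):
            power *= 2.0
        freqs.append(k / n)
        variance_share.append(power)
    return freqs, variance_share


# Synthetic series: a strong 4-step cycle plus a weak 16-step cycle
n_obs = 16
series = [2.0 * math.sin(2 * math.pi * t / 4)
          + 0.5 * math.cos(2 * math.pi * t / 16) for t in range(n_obs)]

freqs, variance_share = periodogram(series)
series_mean = sum(series) / n_obs
variance = sum((v - series_mean) ** 2 for v in series) / n_obs
peak_frequency = freqs[variance_share.index(max(variance_share))]
```

The ordinates add up to the variance of the series, and the largest one falls at the frequency of the dominant cycle. That is the partitioning of variance the paragraph above describes.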
Assignments
Read: Notes_4.pdf
Answer: Run script geosa4.m and answer questions listed in the file in a4.pdf
What to Know
Autoregressive-moving-average (ARMA) models are mathematical models of the persistence, or autocorrelation, in a time series. ARMA models are widely used in hydrology, dendrochronology, econometrics, and other fields. There are several possible reasons for fitting ARMA models to data. Modeling can contribute to understanding the physical system by revealing something about the physical process that builds persistence into the series. For example, a simple physical water-balance model consisting of terms for precipitation input, evaporation, infiltration, and groundwater storage can be shown to yield a streamflow series that follows a particular form of ARMA model. ARMA models can also be used to predict behavior of a time series from past values alone. Such a prediction can be used as a baseline to evaluate possible importance of other variables to the system. ARMA models are widely used for prediction of economic and industrial time series. ARMA models can also be used to remove persistence. In dendrochronology, for example, ARMA modeling is applied routinely to generate residual chronologies – time series of ring-width index with no dependence on past values. This operation, called prewhitening, is meant to remove biologically-related persistence from the series so that the residual may be more suitable for studying the influence of climate and other outside environmental factors on tree growth.
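Prewhitening can be illustrated with the simplest member of the ARMA family, an AR(1) model. This Python sketch is not the course's geosa5.m script. It assumes the AR(1) coefficient is estimated by the lag-1 autocorrelation (the Yule-Walker estimate for order 1), and the series is simulated with a seeded random generator.

```python
import random


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / sum(d * d for d in dev)


def prewhiten_ar1(x):
    """Fit AR(1) to departures from the mean and return (phi, residuals).

    The residuals are the part of each value not predicted from the
    previous value. The first observation is lost.
    """
    mean = sum(x) / len(x)
    phi = lag1_autocorrelation(x)
    residuals = [(x[t] - mean) - phi * (x[t - 1] - mean)
                 for t in range(1, len(x))]
    return phi, residuals


# Simulated persistent series: x_t = 0.6 * x_{t-1} + noise (illustrative only)
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

phi_hat, residuals = prewhiten_ar1(series)
r1_before = lag1_autocorrelation(series)
r1_after = lag1_autocorrelation(residuals)
```

Comparing `r1_before` with `r1_after` shows the effect of the operation: the residual series retains little of the lag-1 persistence present in the original.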
Assignments
Read: Notes_5.pdf
Answer: Run script geosa5.m and answer questions listed in the file in a5.pdf
What to Know
Trend in a time series is a slow, gradual change in some property of the series over the whole interval under investigation. Trend is sometimes loosely defined as a long term change in the mean, but can also refer to change in other statistical properties. For example, tree-ring series of measured ring width frequently have a trend in variance as well as mean. Years ago a time series was typically decomposed into trend, seasonal or periodic components, and irregular fluctuations, and the various parts were studied separately. Modern analysis techniques frequently treat the series without such routine decomposition, but separate consideration of trend is still often required. One of the most frequent questions asked about a time series is whether there is significant trend in mean.
Assignments
Read: Notes_6.pdf
Answer: Run script geosa6.m and answer questions listed in the file in a6.pdf
What to Know
Detrending is the statistical or mathematical operation of removing trend from the series. Detrending is often applied to remove a feature thought to distort or obscure the relationships of interest. In climatology, for example, a temperature trend due to urban warming might obscure a relationship between cloudiness and air temperature. Detrending is also sometimes used as a preprocessing step to prepare time series for analysis by methods that assume stationarity. Many alternative methods are available for detrending. Simple linear trend in mean can be removed by subtracting a least-squares-fit straight line. More complicated trends might require different procedures. For example, the cubic smoothing spline is commonly used in dendrochronology to fit and remove ring-width trend that might not be linear, or not even monotonically increasing or decreasing over time. In studying and removing trend, it is important to understand the effect of detrending on the spectral properties of the time series. This effect can be summarized by the frequency response of the detrending function.
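The simplest case mentioned above, removal of a linear trend in mean, can be sketched as follows. This is a Python illustration rather than the course's Matlab code, and the years and values are invented.

```python
def linear_detrend(t, x):
    """Fit a least-squares straight line x = a + b*t and subtract it.

    Returns (slope, intercept, residuals).
    """
    n = len(x)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    slope = (sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = x_mean - slope * t_mean
    residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
    return slope, intercept, residuals


# Invented series: upward drift of 0.5 units per year plus small wiggles
years = list(range(2000, 2010))
wiggle = [0.2, -0.1, 0.3, -0.4, 0.0, 0.1, -0.2, 0.4, -0.3, 0.0]
values = [10.0 + 0.5 * (yr - 2000) + w for yr, w in zip(years, wiggle)]

slope, intercept, detrended = linear_detrend(years, values)
```

By construction, the detrended series has zero mean and no remaining linear trend. A nonlinear trend would need a more flexible fit, such as the smoothing spline mentioned above.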
Assignments
Read: Notes_7.pdf
Answer: Run script geosa7.m and answer questions listed in the file in a7.pdf
What to Know
The estimated spectrum of a time series gives the distribution of variance as a function of frequency. Depending on the purpose of analysis, some frequencies may be of greater interest than others, and it may be helpful to reduce the amplitude of variations at other frequencies by statistically filtering them out before viewing and analyzing the series. For example, the high-frequency (year-to-year) variations in a gauged discharge record of a watershed may be relatively unimportant to water supply in a basin with large reservoirs that can store several years of mean annual runoff. Where low-frequency variations are of main interest, it is desirable to smooth the discharge record to eliminate or reduce the short-period fluctuations before using the discharge record to study the importance of climatic variations to water supply. Smoothing is a form of filtering which produces a time series in which the importance of the spectral components at high frequencies is reduced. Electrical engineers call this type of filter a low-pass filter, because the low-frequency variations are allowed to pass through the filter. In a low-pass filter, the low frequency (long-period) waves are barely affected by the smoothing. It is also possible to filter a series such that the low-frequency variations are reduced and the high-frequency variations unaffected. This type of filter is called a high-pass filter. Detrending is a form of high-pass filtering: the fitted trend line tracks the lowest frequencies, and the residuals from the trend line have had those low frequencies removed. A third type of filtering, called band-pass filtering, reduces or filters out both high and low frequencies, and leaves some intermediate frequency band relatively unaffected. In this lesson, we cover several methods of smoothing, or low-pass filtering. We have already discussed how the cubic smoothing spline might be useful for this purpose. 
Four other types of filters are discussed here: 1) simple moving average, 2) binomial, 3) Gaussian, and 4) windowing (Hamming method). Considerations in choosing a type of low-pass filter are the desired frequency response and the span, or width, of the filter.
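Two of the filters listed, the simple moving average and the binomial filter, are easy to sketch together with their frequency response. The Python code below is a minimal illustration, not the course's geosa8.m script. It assumes symmetric filters with an odd number of weights and drops the end values that the filter cannot reach.

```python
import math


def moving_average_weights(n_weights):
    """Equal weights summing to 1."""
    return [1.0 / n_weights] * n_weights


def binomial_weights(n_weights):
    """Weights proportional to binomial coefficients, summing to 1."""
    if n_weights % 2 == 0:
        raise ValueError("use an odd number of weights for a centered filter")
    order = n_weights - 1
    total = 2 ** order
    return [math.comb(order, k) / total for k in range(n_weights)]


def apply_filter(x, weights):
    """Centered weighted average. Output is shorter by len(weights) - 1."""
    m = len(weights) // 2
    return [sum(w * x[t + k - m] for k, w in enumerate(weights))
            for t in range(m, len(x) - m)]


def frequency_response(weights, f):
    """Amplitude response of a symmetric filter at frequency f
    (cycles per time step, 0 <= f <= 0.5)."""
    m = len(weights) // 2
    return sum(w * math.cos(2 * math.pi * f * (k - m))
               for k, w in enumerate(weights))


w_binomial = binomial_weights(5)        # 1, 4, 6, 4, 1 divided by 16
w_average = moving_average_weights(5)
smoothed = apply_filter([float(v) for v in range(10)], w_binomial)
response_low = frequency_response(w_binomial, 0.0)
response_high = frequency_response(w_binomial, 0.5)
```

Evaluating `frequency_response` over a grid of frequencies for both sets of weights shows the design trade-off: the moving average has negative side lobes at some high frequencies, while the binomial response declines smoothly to zero at f = 0.5.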
Assignments
Read: Notes_8.pdf
Answer: Run script geosa8.m and answer questions listed in the file in a8.pdf
What to Know
The Pearson product-moment correlation coefficient is probably the single most widely used statistic for summarizing the relationship between two variables. Statistical significance and caveats of interpretation of the correlation coefficient as applied to time series are topics of this lesson. Under certain assumptions, the statistical significance of a correlation coefficient depends on just the sample size, defined as the number of independent observations. If time series are autocorrelated, an effective sample size, lower than the actual sample size, should be used when evaluating significance. Transient or spurious relationships can yield significant correlation for some periods and not for others. The time variation of strength of linear correlation can be examined with plots of correlation computed for a sliding window. But if many correlation coefficients are evaluated simultaneously, confidence intervals should be adjusted (Bonferroni adjustment) to compensate for the increased likelihood of observing some high correlations where no relationship exists. Interpretation of sliding correlations can also be complicated by time variations of mean and variance of the series, as the sliding correlation reflects covariation in terms of standardized departures from means in the time window of interest, which may differ from the long-term means. Finally, it should be emphasized that the Pearson correlation coefficient measures strength of linear relationship. Scatterplots are useful for checking whether the relationship is linear.
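The two adjustments mentioned above can be written down compactly. The Python sketch below is an illustration under stated assumptions, not necessarily the form used in the course notes. It uses a common first-order adjustment of sample size based on the lag-1 autocorrelations of the two series, N' = N(1 - r1x r1y)/(1 + r1x r1y), and the standard Bonferroni division of the significance level by the number of tests. Capping N' at N is a choice made here for simplicity.

```python
def effective_sample_size(n, r1_x, r1_y):
    """Effective sample size for the correlation of two autocorrelated series.

    Assumes a first-order adjustment using the lag-1 autocorrelations
    r1_x and r1_y. The result is capped at the actual sample size n.
    """
    product = r1_x * r1_y
    n_eff = n * (1.0 - product) / (1.0 + product)
    return min(float(n), n_eff)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level when n_tests correlations are examined."""
    return alpha / n_tests


# Hypothetical case: 100 years of data, both series moderately persistent
n_effective = effective_sample_size(100, 0.5, 0.5)
alpha_each = bonferroni_alpha(0.05, 10)
```

In this hypothetical case the 100 observations carry roughly the information of 60 independent ones, so a correlation has to be larger to be judged significant than the nominal sample size suggests.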
Assignments
Read: Notes_9.pdf
Answer: Run script geosa9.m and answer questions listed in the file in a9.pdf
What to Know
Lagged relationships are characteristic of many natural physical systems. Lagged correlation refers to the correlation between two time series shifted in time relative to one another. Lagged correlation is important in studying the relationship between time series for two reasons. First, one series may have a delayed response to the other series, or perhaps a delayed response to a common stimulus that affects both series. Second, the response of one series to the other series or an outside stimulus may be smeared in time, such that a stimulus restricted to one observation elicits a response at multiple observations. For example, because of storage in reservoirs, glaciers, etc., the volume discharge of a river in one year may depend on precipitation in the several preceding years. Or because of changes in crown density and photosynthate storage, the width of a tree ring in one year may depend on climate of several preceding years. The simple correlation coefficient between the two series properly aligned in time is inadequate to characterize the relationship in such situations. Useful functions we will examine as alternatives to the simple correlation coefficient are the cross-correlation function and the impulse response function. The cross-correlation function is the correlation between the series shifted against one another as a function of number of observations of the offset. If the individual series are autocorrelated, the estimated cross-correlation function may be distorted and misleading as a measure of the lagged relationship. We will look at two approaches to clarifying the pattern of cross-correlations. One is to individually remove the persistence from, or prewhiten, the series before cross-correlation estimation. In this approach, the two series are essentially regarded on equal footing.
An alternative is the systems approach: view the series as a dynamic linear system -- one series the input and the other the output -- and estimate the impulse response function. The impulse response function is the response of the output at current and future times to a hypothetical pulse of input restricted to the current time.
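The cross-correlation function itself can be sketched as below. This is a Python illustration, not the course's geosa10.m script. The sign convention (positive lag means the second series lags the first) and the divisor n are assumptions, and the data are invented so that the second series is the first delayed by two time steps. No prewhitening is applied.

```python
import math


def cross_correlation(x, y, max_lag):
    """Cross-correlation of x_t with y_(t+k) for k = -max_lag..max_lag.

    Positive k: y lags x by k steps. Covariances use divisor n.
    Returns a dict mapping lag -> correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    scale = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            cov = sum(dx[t] * dy[t + k] for t in range(n - k))
        else:
            cov = sum(dx[t] * dy[t + k] for t in range(-k, n))
        ccf[k] = cov / scale
    return ccf


# Invented input series and an output that repeats it two steps later
x = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0, -3.0, 1.0,
     0.0, 2.0, -1.0, -2.0, 3.0, -1.0, 0.0, 1.0]
y = [0.0, 0.0] + x[:-2]

ccf = cross_correlation(x, y, max_lag=4)
lag_of_peak = max(ccf, key=ccf.get)
```

With real, autocorrelated series the peak is seldom this clean, which is the motivation for prewhitening or for estimating the impulse response function instead.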
Assignments
Read: Notes_10.pdf
Answer: Run script geosa10.m and answer questions listed in the file in a10.pdf
What to Know
Multiple linear regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes also called the predictand, and the independent variables the predictors. MLR is based on least squares: the model is fit such that the sum-of-squares of differences of observed and predicted values is minimized. MLR is probably the most widely used method in dendroclimatology for developing models to reconstruct climate variables from tree-ring series. Typically, a climatic variable is defined as the predictand and tree-ring variables from one or more sites are defined as predictors. The model is fit to a period -- the calibration period -- for which climatic and tree-ring data overlap. In the process of fitting, or estimating, the model, statistics are computed that summarize the accuracy of the regression model for the calibration period. The performance of the model on data not used to fit the model is usually checked in some way by a process called validation. Finally, tree-ring data from before the calibration period are substituted into the prediction equation to get a reconstruction of the predictand. The reconstruction is a "prediction" in the sense that the regression model is applied to generate estimates of the predictand variable outside the period used to fit the data. The uncertainty in the reconstruction is summarized by confidence intervals, which can be computed in various alternative ways.
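The least-squares fit can be sketched by solving the normal equations directly. This Python code is a small illustration, not the course's geosa11.m script. It assumes a model with an intercept, uses plain Gaussian elimination (adequate for a few well-conditioned predictors, not a robust general solver), and the predictor and predictand values are invented.

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - partial) / aug[i][i]
    return z


def fit_mlr(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the
    normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(predictors, coefs):
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], p))
            for p in predictors]


def r_squared(y, y_hat):
    """Proportion of predictand variance explained in the fitting period."""
    y_mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - sse / sst


# Invented calibration data: two tree-ring predictors, one climate predictand
tree_rings = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.05]
climate = [1.0 + 2.0 * a - 3.0 * b + e for (a, b), e in zip(tree_rings, noise)]

coefs = fit_mlr(tree_rings, climate)
calibration_r2 = r_squared(climate, predict(tree_rings, coefs))
```

Substituting predictor values from outside the calibration period into `predict` gives the reconstruction described above. `calibration_r2` describes only the fit within the calibration period.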
Assignments
Read: Notes_11.pdf
Answer: Run script geosa11.m (Part 1) and answer questions listed in the file in a11.pdf
What to Know
Regression R-squared, even if adjusted for loss of degrees of freedom due to the number of predictors in the model, can give a misleading, overly optimistic view of accuracy of prediction when the model is applied outside the calibration period. Application outside the calibration period is the rule rather than the exception in dendroclimatology. The calibration-period statistics are typically biased because the model is "tuned" for maximum agreement in the calibration period. Sometimes too large a pool of potential predictors is used in automated procedures to select final predictors. Another possible problem is that the calibration period itself may be anomalous in terms of the relationships between the variables: modeled relationships may hold up for some periods of time but not for others. It is advisable therefore to "validate" the regression model by testing the model on data not used to fit the model. Several approaches to validation are available. Among these are cross-validation and split-sample validation. In cross-validation, a series of regression models is fit, each time deleting a different observation from the calibration set and using the model to predict the predictand for the deleted observation. The merged series of predictions for deleted observations is then checked for accuracy against the observed data. In split-sample validation, the model is fit to some portion of the data (say, the second half), and accuracy is measured on the predictions for the other half of the data. The calibration and validation periods are then exchanged and the process repeated. In any regression problem it is also important to keep in mind that modeled relationships may not be valid for periods when the predictors are outside their ranges for the calibration period: the multivariate distribution of the predictors for some observations outside the calibration period may have no analog in the calibration period.
The distinction of predictions as extrapolations versus interpolations is useful in flagging such occurrences.
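Leave-one-out cross-validation, as described above, can be sketched for a regression with a single predictor. This Python code is an illustration under stated assumptions, not the course's geosa11.m script. The skill measure is a reduction-of-error statistic that compares the cross-validation errors with those of a reference prediction equal to the mean of the retained observations, and the data are invented.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return y_mean - slope * x_mean, slope


def leave_one_out(x, y):
    """Delete each observation in turn, refit, and predict the deleted one.

    Returns (predictions, reduction_of_error). The reference prediction
    for each deleted observation is the mean of the remaining predictand
    values (an assumed choice of baseline).
    """
    predictions, reference = [], []
    for i in range(len(x)):
        x_cal = x[:i] + x[i + 1:]
        y_cal = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_cal, y_cal)
        predictions.append(intercept + slope * x[i])
        reference.append(sum(y_cal) / len(y_cal))
    sse_model = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    sse_reference = sum((yi - ri) ** 2 for yi, ri in zip(y, reference))
    return predictions, 1.0 - sse_model / sse_reference


# Invented data: tree-ring index (predictor) and precipitation (predictand)
ring_index = [0.6, 0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2]
precip = [210.0, 300.0, 360.0, 250.0, 410.0, 330.0, 260.0, 370.0]

cv_predictions, reduction_of_error = leave_one_out(ring_index, precip)
```

A `reduction_of_error` above zero indicates that the model predicts withheld observations better than the reference mean does. Values near or below zero suggest the calibration statistics overstate the model's skill.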
Assignments
Read: Notes_12.pdf
Answer: Run script geosa11.m (Part 2) and answer questions listed in the file in a12.pdf
What to Know
PowerPoint lecture outlines & miscellaneous files. Downloadable file other_Stale.zip has miscellaneous files used in lectures from the previous offering of the course. Included are Matlab demo scripts, sample data files, user-written functions used by demo scripts, and PowerPoint presentations, as pdfs (lect1a.pdf, lect1b.pdf, etc.) used in on-campus lectures. Students taking the course this semester should not use other_Stale.zip, but instead get the file "other.zip" from D2L contents. I update other.zip over the semester, and add the presentation for the current lecture within a couple of days after that lecture is given. File other.zip for this semester does not exist until after the first lecture, and then is augmented after each lecture. At the end of the semester I revise the online-available other_Stale.zip.