| Multivariate anomaly detection for Earth observations: a comparison of algorithms and feature extraction techniques Milan Flach and Fabian Gans and Alexander Brenning and Joachim Denzler and Markus Reichstein and Erik Rodner and Sebastian Bathiany and Paul Bodesheim and Yanira Guanche and Sebastian Sippel and Miguel D. Mahecha. Earth System Dynamics. 8(3): pages 677-696. 2017. [bibtex] [pdf] [web] [doi:10.5194/esd-8-677-2017] [abstract] Abstract: Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring effects of extreme climatic events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations like sudden changes in basic characteristics of time series such as the sample mean, the variance, changes in the cycle amplitude, and trends. This artificial experiment is needed as there is no "gold standard" for the identification of anomalies in real Earth observations. 
Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbors mean distance, kernel density estimation, a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme-event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data. |
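The workflow summarized in this abstract (a feature extraction step followed by a detection algorithm) can be illustrated with a minimal sketch. It assumes NumPy, synthetic data, and hypothetical function names (`subtract_mean_seasonal_cycle`, `knn_mean_distance`); it is not the authors' implementation, only one plausible reading of "subtracting seasonal cycles" followed by the "k-nearest neighbors mean distance" score.

```python
import numpy as np

def subtract_mean_seasonal_cycle(x, period):
    """Remove the mean seasonal cycle from x (shape: time x variables)."""
    x = np.asarray(x, dtype=float)
    phase = np.arange(x.shape[0]) % period
    out = np.empty_like(x)
    for p in range(period):
        mask = phase == p
        # Subtract the average over all years for this position in the cycle.
        out[mask] = x[mask] - x[mask].mean(axis=0)
    return out

def knn_mean_distance(x, k=5):
    """Score each time step by the mean Euclidean distance to its k nearest neighbours."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Synthetic example (assumption): 10 years, 46 time steps per year, 3 variables.
rng = np.random.default_rng(0)
n_years, period = 10, 46
t = np.arange(n_years * period)
season = np.sin(2 * np.pi * t / period)
data = np.column_stack([season, 0.5 * season, -season])
data = data + 0.1 * rng.standard_normal((t.size, 3))
data[200] += np.array([1.5, -1.5, 1.5])  # injected multivariate anomaly

scores = knn_mean_distance(subtract_mean_seasonal_cycle(data, period), k=5)
most_anomalous = int(np.argmax(scores))
```

Higher scores mark time steps that lie far from their neighbours in the deseasonalized variable space. In practice a threshold (for example, a high quantile of the scores) would turn the scores into detections.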
| Hot spots of multivariate extreme anomalies in Earth observations Milan Flach and Sebastian Sippel and Paul Bodesheim and Alexander Brenning and Joachim Denzler and Fabian Gans and Yanira Guanche and Markus Reichstein and Erik Rodner and Miguel D. Mahecha. American Geophysical Union Fall Meeting (AGU): Abstract + Oral Presentation. 2016. [bibtex] [web] [abstract] Abstract: Anomalies in Earth observations might indicate data quality issues, extremes or the change of underlying processes within a highly multivariate system. Thus, considering the multivariate constellation of variables for extreme detection yields crucial additional information over conventional univariate approaches. We highlight areas in which multivariate extreme anomalies are more likely to occur, i.e. hot spots of extremes in global atmospheric Earth observations that impact the Biosphere. In addition, we present the year of the most unusual multivariate extreme between 2001 and 2013 and show that these coincide with well-known high-impact extremes. Technically speaking, we account for multivariate extremes by using three sophisticated algorithms adapted from computer science applications. Namely, we use an ensemble of the k-nearest neighbours mean distance, a kernel density estimation, and an approach based on recurrences. However, the impact of atmosphere extremes on the Biosphere might largely depend on what is considered to be normal, i.e. the shape of the mean seasonal cycle and its inter-annual variability. We identify regions with similar mean seasonality by means of dimensionality reduction in order to estimate in each region both the 'normal' variance and robust thresholds for detecting the extremes. In addition, we account for challenges like heteroscedasticity in Northern latitudes. Apart from hot spot areas, of particular interest are those anomalies in the atmosphere time series that can be detected only by a multivariate approach and not by a simple univariate one. 
Such an anomalous constellation of atmosphere variables is of interest if it impacts the Biosphere. The multivariate constellation of such an anomalous part of a time series is shown in one case study indicating that multivariate anomaly detection can provide novel insights into Earth observations. |
| Using Statistical Process Control for detecting anomalies in multivariate spatiotemporal Earth Observations Milan Flach and Miguel Mahecha and Fabian Gans and Erik Rodner and Paul Bodesheim and Yanira Guanche-Garcia and Alexander Brenning and Joachim Denzler and Markus Reichstein. European Geosciences Union General Assembly (EGU): Abstract + Oral Presentation. 2016. [bibtex] [pdf] [web] [abstract] Abstract: The number of available Earth observations (EOs) is currently substantially increasing. Detecting anomalous patterns in these multivariate time series is an important step in identifying changes in the underlying dynamical system. Likewise, data quality issues might result in anomalous multivariate data constellations and have to be identified before corrupting subsequent analyses. In industrial application a common strategy is to monitor production chains with several sensors coupled to some statistical process control (SPC) algorithm. The basic idea is to raise an alarm when these sensor data depict some anomalous pattern according to the SPC, i.e. the production chain is considered ’out of control’. In fact, the industrial applications are conceptually similar to the on-line monitoring of EOs. However, algorithms used in the context of SPC or process monitoring are rarely considered for supervising multivariate spatio-temporal Earth observations. The objective of this study is to exploit the potential and transferability of SPC concepts to Earth system applications. We compare a range of different algorithms typically applied by SPC systems and evaluate their capability to detect e.g. known extreme events in land surface processes. 
Specifically two main issues are addressed: (1) identifying the most suitable combination of data pre-processing and detection algorithm for a specific type of event and (2) analyzing the limits of the individual approaches with respect to the magnitude, spatio-temporal size of the event as well as the data’s signal to noise ratio. Extensive artificial data sets that represent the typical properties of Earth observations are used in this study. Our results show that the majority of the algorithms used can be considered for the detection of multivariate spatiotemporal events and directly transferred to real Earth observation data as currently assembled in different projects at the European scale, e.g. http://baci-h2020.eu/index.php/ and http://earthsystemdatacube.net/. Known anomalies such as the Russian heatwave are detected as well as anomalies which are not detectable with univariate methods. |
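The SPC idea described in this abstract (raise an alarm when multivariate sensor data leave the "in control" region) can be sketched with a Hotelling's T² control chart. The abstract does not name the algorithms compared, so the choice of Hotelling's T², the empirical control limit, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def hotelling_t2(reference, stream):
    """Hotelling's T^2 of each row in `stream` relative to in-control `reference` data."""
    mean = reference.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(reference, rowvar=False))
    centered = stream - mean
    # Quadratic form centered_i^T inv_cov centered_i for every row i.
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Synthetic, correlated three-variable data (assumption).
rng = np.random.default_rng(1)
mix = np.array([[1.0, 0.0, 0.0],
                [0.8, 0.6, 0.0],
                [0.0, 0.0, 1.0]])
reference = rng.standard_normal((500, 3)) @ mix.T  # in-control period
stream = rng.standard_normal((100, 3)) @ mix.T     # monitored period
stream[60:, 0] += 8.0                              # process shift from step 60 on

# Empirical control limit taken from the in-control period (a simplification).
control_limit = np.quantile(hotelling_t2(reference, reference), 0.999)
alarms = hotelling_t2(reference, stream) > control_limit
```

Because the shift breaks the correlation between the first two variables, the T² statistic reacts more strongly than a univariate check on the first variable alone would.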
| Detecting Multivariate Biosphere Extremes Yanira Guanche Garcia and Erik Rodner and Milan Flach and Sebastian Sippel and Miguel Mahecha and Joachim Denzler. International Workshop on Climate Informatics (CI). Pages 9-12. 2016. [bibtex] [web] [doi:10.5065/D6K072N6] [abstract] Abstract: The detection of anomalies in multivariate time series is crucial to identify changes in the ecosystems. We propose an intuitive methodology to assess the occurrence of tail events of multiple biosphere variables. |
| Maximally Divergent Intervals for Anomaly Detection Erik Rodner and Björn Barz and Yanira Guanche and Milan Flach and Miguel Mahecha and Paul Bodesheim and Markus Reichstein and Joachim Denzler. Workshop on Anomaly Detection (ICML-WS). 2016. Best Paper Award [bibtex] [pdf] [web] [code] [abstract] Abstract: We present new methods for batch anomaly detection in multivariate time series. Our methods are based on maximizing the Kullback-Leibler divergence between the data distribution within and outside an interval of the time series. An empirical analysis shows the benefits of our algorithms compared to methods that treat each time step independently from each other without optimizing with respect to all possible intervals. |
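The interval search described in this abstract can be sketched as a brute-force scan that scores every candidate interval by the Kullback-Leibler divergence between the data inside and outside it. The sketch below assumes Gaussian distributions with diagonal covariance, bounded interval lengths, and hypothetical function names; the paper's own estimators and search strategy may differ.

```python
import numpy as np

def gaussian_kl_diag(inner, outer, eps=1e-6):
    """KL(N_inner || N_outer) for Gaussians with diagonal covariance."""
    m1, v1 = inner.mean(axis=0), inner.var(axis=0) + eps
    m0, v0 = outer.mean(axis=0), outer.var(axis=0) + eps
    per_dim = 0.5 * (np.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)
    return float(per_dim.sum())

def most_divergent_interval(x, min_len=10, max_len=50):
    """Return (score, start, end) of the half-open interval [start, end) with maximal KL."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    best = (-np.inf, 0, 0)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, min(start + max_len, n) + 1):
            inner = x[start:end]
            outer = np.concatenate([x[:start], x[end:]])
            score = gaussian_kl_diag(inner, outer)
            if score > best[0]:
                best = (score, start, end)
    return best

# Synthetic two-variable series with a shifted segment (assumption).
rng = np.random.default_rng(2)
series = rng.standard_normal((300, 2))
series[120:150] += 4.0
score, start, end = most_divergent_interval(series, min_len=10, max_len=50)
```

Scoring whole intervals rather than single time steps is what lets the search recover the extent of an anomalous segment, at the cost of evaluating a number of candidates that grows with the series length times the allowed range of interval lengths.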