Inferring causality from observational data is probably the most persistent problem in data analysis. Frankly speaking, our traditional ways of turning data analysis into causal explanations were somewhat naive. We know that correlation is not causation, yet we have effectively ignored that distinction. At best, we admitted our uncertainty and compared our results with intuition to build confidence in our conclusions.

In this project proposal we focus on causal relationships among time series. The Armed Conflict Location & Event Data Project (ACLED) [1] provides information about different types of conflict around the world. Protests, social riots, and other conflict (sub-)types are collected and processed in a timely manner. Individual event records include a variety of attributes describing the timing of the event, the actors involved, and its spatial (i.e. geographical) location. Once aggregated into time series, these observational data allow us, for instance, to scrutinize causal relationships between different types of conflict. For example: are social riots causally linked to preceding protests? The spreading patterns of conflicts are also interesting and raise questions such as: can causal relationships be found between protest events in different nations?

The Granger causality approach is commonly used to infer “causality” between time series. In its standard form, however, this method rests on linear prediction, i.e. ultimately on correlations. The approach in this project is based on a more sophisticated method called Convergent Cross Mapping (CCM) [4], which is designed to detect causal coupling in nonlinear dynamical systems, where correlation-based methods can be misleading [2]. The theoretical concepts of CCM are rooted in the theory of dynamical systems [3], and understanding these foundations is somewhat demanding mathematically. However, published applications of CCM to a wide range of ecological and climate data (e.g. [4], [5], [6]) suggest that it may also yield exciting results for the geopolitical time series from ACLED.
It should also be noted that causal explanations have predictive power, since causes always precede their effects.
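Takens' theorem [3] is what connects the dynamical-systems view to observed data: under suitable conditions, stacking time-lagged copies of a single observed variable yields a "shadow" manifold that is a one-to-one image of the system's attractor. The following minimal NumPy sketch builds such a delay embedding. The function name delay_embed and the parameters E (embedding dimension) and tau (time lag) are illustrative choices, not the API of an existing library.

```python
import numpy as np


def delay_embed(x, E, tau=1):
    """Time-delay ("shadow") embedding of a 1-D series.

    Row i is (x[t], x[t - tau], ..., x[t - (E - 1) * tau]) with
    t = i + (E - 1) * tau, i.e. one reconstructed state per time step.
    """
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    if E < 1 or tau < 1 or start >= len(x):
        raise ValueError("need E >= 1, tau >= 1 and len(x) > (E - 1) * tau")
    # Column k holds the series lagged by k * tau steps.
    return np.column_stack([x[start - k * tau: len(x) - k * tau] for k in range(E)])


M = delay_embed(np.arange(6), E=3)
print(M.shape)  # (4, 3)
print(M[0])     # [2. 1. 0.]
```

In practice E and tau have to be chosen from the data, for example by checking how well the embedding predicts the series itself.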

Goal

The goal of this project is to build an application framework that enables causal inference between two or more time series. The framework should support the import of arbitrary time series data, provide pre-processing facilities, and offer a clear visualisation of the causal relationships resulting from the Convergent Cross Mapping (CCM) approach.
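Event data such as ACLED's must first be turned into regularly sampled time series before CCM can be applied. The sketch below shows one possible pre-processing step under stated assumptions: events are given as (date, event type) pairs, and counts are aggregated per calendar week with empty weeks set to zero. The record format, the function name weekly_counts, and the toy records are hypothetical. An actual ACLED export contains many more attributes (actors, location, etc.), and weekly binning is only one of several reasonable choices.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(events, event_type, start, end):
    """Aggregate event records into a weekly count series.

    events: iterable of (date, event_type) pairs (hypothetical format)
    event_type: label to count, e.g. "Protests"
    start, end: inclusive date range; weeks begin on the Monday on or
        before `start`, and weeks without events count as 0
    """
    week0 = start - timedelta(days=start.weekday())  # Monday of the first week
    n_weeks = (end - week0).days // 7 + 1
    counts = Counter()
    for day, etype in events:
        if etype == event_type and start <= day <= end:
            counts[(day - week0).days // 7] += 1
    # Counter returns 0 for missing keys, so empty weeks are filled with zeros.
    return [counts[i] for i in range(n_weeks)]


# Made-up toy records for illustration only.
events = [
    (date(2023, 1, 2), "Protests"),
    (date(2023, 1, 4), "Protests"),
    (date(2023, 1, 5), "Riots"),
    (date(2023, 1, 17), "Protests"),
]
protests = weekly_counts(events, "Protests", date(2023, 1, 2), date(2023, 1, 22))
riots = weekly_counts(events, "Riots", date(2023, 1, 2), date(2023, 1, 22))
print(protests)  # [2, 0, 1]
print(riots)     # [1, 0, 0]
```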

Requirements

  • Interest in causality models
  • Mathematical skills
  • Interest in time series analysis
  • Programming skills

Work Packages

We propose three work packages for this project:

  1. The project starts with a general familiarization with time series analysis and, subsequently, with the specific methods of CCM. A literature review is the main activity in this phase.
  2. An introduction to the ACLED data set can be provided in a straightforward manner. However, using ACLED data is not mandatory: in principle, any time series that raise interesting questions about their causal relationships can be used. An exploratory data analysis is appropriate here, to find interesting use cases and to formulate hypotheses about causal relationships.
  3. In line with the project goal, an application framework is to be developed. The framework should enable appropriate pre-processing of the data and implement the CCM method. For R, a simple framework based on available R libraries already exists. However, other technologies and programming languages (e.g. Python, Julia, …) are welcome.
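The core of CCM, following the description in [4], can be sketched as follows. To test whether X influences Y, the shadow manifold M_Y is built from Y alone. Each point of M_Y is used to estimate the simultaneous value of X from its E + 1 nearest neighbours, weighted by exp(-d_i / d_1), where d_1 is the nearest-neighbour distance. The cross-map skill ρ is the Pearson correlation between estimated and observed X. If X causally drives Y, ρ is expected to increase and converge as the library size L grows. This is a minimal NumPy sketch under simplifying assumptions: temporally close neighbours are not excluded, a single random library is drawn per L, and E and tau are fixed. The function names are hypothetical, and the sketch is not a substitute for an established implementation.

```python
import numpy as np


def delay_embed(x, E, tau=1):
    # Row i = (x[t], x[t - tau], ..., x[t - (E - 1) * tau]), t = i + (E - 1) * tau.
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    return np.column_stack([x[start - k * tau: len(x) - k * tau] for k in range(E)])


def cross_map_skill(x, y, E, tau=1, lib_size=None, rng=None):
    """Estimate x from the shadow manifold of y and return Pearson's rho.

    A high rho that grows and converges with lib_size is taken as evidence
    that x causally influences y (x's dynamics are encoded in y).
    """
    My = delay_embed(y, E, tau)
    x_obs = np.asarray(x, dtype=float)[(E - 1) * tau:]  # align x with rows of My
    n = len(My)
    if lib_size is None:
        lib = np.arange(n)
    elif E + 2 <= lib_size <= n:
        lib = np.random.default_rng(rng).choice(n, size=lib_size, replace=False)
    else:
        raise ValueError("lib_size must lie between E + 2 and the number of states")
    x_hat = np.empty(n)
    for t in range(n):
        d = np.linalg.norm(My[lib] - My[t], axis=1)
        d[lib == t] = np.inf                 # never use the target point itself
        nn = np.argsort(d)[:E + 1]           # E + 1 nearest neighbours (simplex)
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))
        w /= w.sum()                         # normalized exponential distance weights
        x_hat[t] = np.dot(w, x_obs[lib[nn]])
    return np.corrcoef(x_obs, x_hat)[0, 1]


if __name__ == "__main__":
    # Toy system: two coupled logistic maps in which x forces y more strongly
    # than y forces x (illustrative parameters, similar to the example in [4]).
    n = 1000
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    for L in (25, 100, 400, 900):
        rho_x = cross_map_skill(x, y, E=2, lib_size=L, rng=0)  # x estimated from M_y
        rho_y = cross_map_skill(y, x, E=2, lib_size=L, rng=0)  # y estimated from M_x
        print(L, round(rho_x, 3), round(rho_y, 3))
```

If x drives y more strongly than the reverse, one would expect the first ρ column to rise with L and level off above the second. Choosing E and tau, and handling short, noisy, or non-stationary count series such as weekly conflict aggregates, are the kinds of questions the framework would have to address.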

Interesting causal hypotheses and use cases should be considered for publication.

References

[1] https://acleddata.com/

[2] Paluš M., A. Krakovská, J. Jakubík and M. Chvosteková (2018) Causality, dynamical systems and the arrow of time. Chaos 28, 075307, doi:10.1063/1.5019944.

[3] Takens, F. (1981) Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Lecture Notes in Mathematics 898:366–381.

[4] Sugihara, G., R. May, H. Ye, C.-H. Hsieh, E. Deyle, M. Fogarty, and S. Munch (2012) Detecting causality in complex ecosystems. Science 338:496–500.

[5] Ye, H., and G. Sugihara (2016) Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality. Science 353:922–925.

[6] Runge, J. et al. (2019) Inferring causation from time series in Earth system sciences. Nature Communications 10:2553, doi:10.1038/s41467-019-10105-3.