The task of geo- and chronolocation is to determine where and when a particular image or video was taken. The proposed project is part of a larger program, named Battle Damage Assessment (BDA), in which damage caused by war events will be assessed using photographic material from a wide variety of sources, including satellite imagery. In this project, we will focus on image material from human-operated cameras, typically images that are posted on open source social media or news platforms. If the photographic material comes with metadata about place and time, the task of geo- and chronolocation may be limited to verifying these metadata. However, online media usually strip the spatial and temporal metadata from pictures before publishing them. Deeper investigations are then necessary, based on visible cues in the picture itself. The aim of this project is to elaborate a generic workflow and a framework for solving geo- and chronolocation problems.

These problems can become tricky and turn into real detective work. The smallest clues in a picture may reveal a lot about where and when it was taken, and it is precisely this meticulous search for clues that makes geo- and chronolocation so exciting. The flower of a plant may already strongly narrow down the region and the season of the recording, provided that you are able to identify the plant and know its life cycle. Cloud formations may reveal the time of day if you know the location. The lettering on a poster may reveal the language region or even the place. A remarkable building or the characteristic silhouette of a mountain range may serve as a template for image searches. Be it geography, architecture, botany, meteorology or linguistics, the list of clues and their corresponding subject areas is immense. Interdisciplinary interest and knowledge are essential in geo- and chronolocation tasks.

Goal

The long-term goal of this project is to develop an application framework for geo- and chronolocation that offers a user interface for handling and processing all types of images and clues in an ergonomic way. Existing tools for geo- and chronolocation should be integrated into the framework. Linking appropriate knowledge bases and data sources to act on the clues is also a central part of the development. The initially manual processing steps should successively be enriched with useful references and links and automated using Machine Learning (ML) approaches.

Work packages

The following list of work packages is extensive. Work packages 1 to 3 provide the basis for geo- and chronolocation tasks and are therefore mandatory. Work packages 4 and 5 build on this basis and show what the project aims to achieve. Individual parts or aspects of work packages 4 and 5 can be selected to fit the available project time.

  1. First, review the existing approaches to geo- and chronolocation. What is the state of the art? In addition to the literature, plenty of information is available online, since geo- and chronolocation is now also offered in the form of contests and riddles (e.g. [1], [2]). Collect all methods and tools you encounter and reference them. Good starting points for the literature review are, for example, [3], [4], [5] and [6].
  2. Build a comprehensive list of visual clues that (implicitly) hold information about the spatial and/or temporal dimension. Each clue may be linked to data sources that provide reference data or to knowledge bases that allow temporal and/or spatial associations to be built.
  3. Design a generic workflow to manually process an arbitrary number of clues in the geo- and chronolocation process. Reference [5] offers a nice entry point for the chronolocation workflow.
  4. Try to build an application framework that implements the generic workflow designed in work package 3. The question of automating the initially manual workflow steps will arise. Here, Machine Learning (ML) approaches can be very helpful. For instance, we may ask: are there clues in an image that can be detected and identified automatically using ML approaches?
  5. Special interest is given to the new CLIP approach [7]. In essence, CLIP is trained to minimize the difference between the encoding of an image and the encoding of its corresponding text description, which allows the model to be applied to new tasks in a zero- or few-shot manner. In other words, the model should learn to make the encoding of an image and the encoding of its corresponding text as similar as possible (see [8]).
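
The idea behind work package 5 can be illustrated with a minimal sketch of zero-shot scoring, assuming that an image encoder and a text encoder have already produced the embeddings. Everything in it is hypothetical and chosen for illustration only: the three-dimensional toy vectors, the candidate text prompts, the function names and the temperature value. A real CLIP model works with much larger embeddings and a learned scaling factor, and the sketch does not call any actual model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_scores(image_embedding, text_embeddings, temperature=0.01):
    """Turn image/text similarities into a probability per candidate text.

    The temperature is an assumed constant here; it only sharpens the softmax.
    """
    labels = list(text_embeddings)
    logits = [
        cosine_similarity(image_embedding, text_embeddings[label]) / temperature
        for label in labels
    ]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings (made up); in practice they would come from the two encoders.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a street photo from Switzerland": [0.8, 0.2, 0.1],
    "a street photo from Japan": [0.1, 0.9, 0.3],
    "a street photo from Brazil": [0.2, 0.1, 0.9],
}

scores = zero_shot_scores(image_embedding, text_embeddings)
best_label = max(scores, key=scores.get)
print(best_label)
```

The candidate text whose encoding is most similar to the image encoding receives the highest score. For geolocation, the candidate texts could, for example, name countries or regions, so that no location-specific training is needed for a first guess.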

Requirements

  • Open and interdisciplinary mindset
  • Pleasure in solving riddles
  • Image processing skills
  • Software development skills
  • Programming skills
  • Machine Learning skills

References

[1] https://twitter.com/quiztime

[2] https://www.geoguessr.com

[3] https://faculty.cc.gatech.edu/~hays/papers/im2gps_chapter.pdf

[4] https://arxiv.org/pdf/2307.05845.pdf

[5] https://sector035.nl/articles/chronolocation-of-media

[6] https://www.bellingcat.com/tag/chronolocation/

[7] https://huggingface.co/geolocal/StreetCLIP

[8] https://arxiv.org/pdf/2302.00275.pdf