Three Spatial Data Fusion Vignettes

Alan Gelfand (Department of Statistical Science, Duke University)

16-Mar-2023, 12:15-13:00

Abstract: With increased collection of spatial (and spatio-temporal) datasets, we often find multiple sources that are capable of informing about features of a process of interest. Through suitable fusion of the data sources, we can learn at least as much about the process features of interest as from any individual source. For three different illustrative ecological/environmental applications, this talk will propose suitable coherent stochastic modeling to implement a fusion of these sources. We focus exclusively on approaches that arise through generative hierarchical modeling; that is, the specification could produce the data sources that have been observed. Such modeling enables full inference with regard to both estimation and prediction, with implicit incorporation of uncertainty.

We consider the general setting of points and marks, modeled as $[points][marks|points]$, with points in $\mathcal{D}$ and marks in $\mathcal{Y}$. The process can model the points themselves, the marks themselves (ignoring any randomness in the points), or the points and marks jointly. This results in four data types: (i) a point pattern, $\mathcal{S}= (\textbf{s}_{1}, \textbf{s}_{2},\ldots,\textbf{s}_{n})$, (ii) a vector of counts for sets, $\{N(B_{k}), k=1,2,\ldots,K\}$, (iii) a vector of observations at points, $\{Y(\textbf{s}_{i}),i=1,2,\ldots,n\}$, (iv) a vector of averages for sets, $\{Y(B_{1}), Y(B_{2}),\ldots,Y(B_{K})\}$. We illustrate with two data sources; each can be any one of the four data types. Regardless of how the data are observed, we imagine the process operates at point level. Further, we imagine a stochastic process over $\mathcal{D}$ which links the two data sources.
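As a rough illustration of how the four data types can all arise from a single point-level process, the following Python sketch simulates a marked point pattern on the unit square and derives each data type from it. It is not the model from the talk: the functions `intensity` and `mark_surface`, the $3 \times 3$ partition into blocks $B_{k}$, and the noise level are all hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def intensity(s):
    """Hypothetical intensity surface on D = [0, 1]^2 (an assumption)."""
    return 200.0 * np.exp(-3.0 * ((s[:, 0] - 0.5) ** 2 + (s[:, 1] - 0.5) ** 2))

def mark_surface(s):
    """Hypothetical point-level mean surface for the marks (an assumption)."""
    return np.sin(2 * np.pi * s[:, 0]) + s[:, 1]

def block_of(s, m):
    """Index k of the block B_k containing each location, for an m x m partition."""
    ix = np.minimum((s[:, 0] * m).astype(int), m - 1)
    iy = np.minimum((s[:, 1] * m).astype(int), m - 1)
    return ix * m + iy

# (i) Point pattern S: inhomogeneous Poisson process simulated by thinning.
lam_max = 200.0                           # upper bound on the intensity
n_candidates = rng.poisson(lam_max)       # the unit square has area 1
candidates = rng.uniform(size=(n_candidates, 2))
keep = rng.uniform(size=n_candidates) < intensity(candidates) / lam_max
points = candidates[keep]

# (ii) Counts N(B_k): number of points falling in each of K = m * m blocks.
m = 3
K = m * m
counts = np.bincount(block_of(points, m), minlength=K)

# (iii) Observations Y(s_i): the surface at the points plus measurement noise.
y_points = mark_surface(points) + rng.normal(scale=0.1, size=len(points))

# (iv) Block averages Y(B_k): the surface averaged over a fine grid in each block.
g = (np.arange(60) + 0.5) / 60
grid = np.column_stack([a.ravel() for a in np.meshgrid(g, g, indexing="ij")])
grid_blocks = block_of(grid, m)
y_blocks = (np.bincount(grid_blocks, weights=mark_surface(grid), minlength=K)
            / np.bincount(grid_blocks, minlength=K))
```

Here `points`, `counts`, `y_points`, and `y_blocks` correspond to data types (i) through (iv). The counts and block averages are obtained by aggregating point-level quantities, which mirrors the idea that the process operates at point level regardless of how the data are observed.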

The first vignette considers presence/absence data over $\mathcal{D}$ with one dataset being presence/absence of a species collected at a set of chosen locations. The other data source is in the form of museum/citizen science data, recording random locations where the species was observed. The goal is to better understand the probability of presence surface over $\mathcal{D}$. The second vignette considers zooplankton abundance data gathered through two different towing mechanisms. One mechanism is calibrated while the other is not. The goal is to better understand zooplankton abundance over $\mathcal{D}$. The third, and most challenging, vignette seeks to learn about whale abundance. Here, the two sources are aerial distance sampling data for whale sightings and passive acoustic monitoring data (using monitors on the ocean floor) for whale calls.

This is joint work with Shin Shirota, Jorge Castillo-Mateo, Erin Schliep, and Rob Schick.

probability, statistics theory

Audience: researchers in the discipline


Gothenburg statistics seminar

Series comments: Gothenburg statistics seminar is open to the interested public; everybody is welcome. It usually takes place in MVL14 (http://maps.chalmers.se/#05137ad7-4d34-45e2-9d14-7f970517e2b60, see specific talk).

Organizers: Moritz Schauer*, Ottmar Cronie*
*contact for this listing
