Meta SAM 3 Meets Earth Observation: A Look into Using Segment Anything for Object Detection and Tracking from Animated Satellite Time Series

The world of geospatial AI just got a powerful new tool. Meta’s latest Segment Anything Model 3 (SAM 3), a foundation model for detection, segmentation, and tracking in images and video, is opening up exciting possibilities for analyzing satellite imagery. See the model page and GitHub.

The core connection with Earth Observation is simple: a sequence of satellite images taken over time is a video. Of course, spending a few hours of the Thanksgiving holiday exploring the limits of this idea was irresistible.

A natural starting point is imagery from the Geostationary Operational Environmental Satellites (GOES), operated by NOAA and NASA, which capture the same view every few minutes, or any rapidly collected high-resolution imagery with video-like consistency. Since SAM 3 is designed to track objects across video frames, it is inherently suited to workflows like:

  • Wildfire Smoke Tracking: The results of my experiment with wildfire smoke detection are simply impressive. I animated a time series of the 2020 Creek Fire in California using GOES imagery available in Earth Engine (see this brilliant blog post by Justin Braaten for more about using the data), then used the animation as input to SAM 3 and experimented with different frame rates and sizes.

    The results demonstrate the potential of SAM 3 for near-real-time segmentation and tracking of dynamic features like wildfire smoke, particularly with GOES satellites.
SAM 3 segmentation (prompt: ‘smoke’) of a GOES-17 false-color animation of the 2020 California Creek Fire on September 5th. Animation generated in Google Earth Engine using the script from Justin Braaten’s blog post.
  • Other applications: Similar to wildfire smoke detection, other potential applications worth exploring include volcanic ash plumes for disaster response, methane leaks, and oil spills.

    Finer-scale applications could involve tracking ship movement, or flooding in urban and coastal areas. Although I have not tested these yet, they seem feasible if high-resolution satellite imagery is available and consistent enough to construct an animation of the movement (my initial assessment is that these will probably require sub-meter resolution rather than Sentinel-2-like imagery).
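The frame-rate experiments above can be sketched as a simple resampling step applied to the animation before it is handed to SAM 3. This is a minimal illustration in plain Python; the function name and logic are my own, not part of any SAM 3 or Earth Engine API:

```python
def select_frames(n_frames, source_fps, target_fps):
    """Pick frame indices to resample an animation from source_fps to target_fps.

    A lower target frame rate gives the tracker fewer, more widely spaced
    frames, which exaggerates the apparent motion (e.g. smoke drift)
    between consecutive frames.
    """
    if target_fps >= source_fps:
        return list(range(n_frames))  # nothing to drop
    step = source_fps / target_fps    # keep one frame every `step` frames
    indices = []
    i = 0.0
    while round(i) < n_frames:
        indices.append(round(i))
        i += step
    return indices

# Example: a 24-frame GOES animation at 12 fps, resampled to 4 fps
print(select_frames(24, 12, 4))  # -> [0, 3, 6, 9, 12, 15, 18, 21]
```

In practice the selected frames would be resized and re-encoded into a new animation, but the index selection is where the frame-rate trade-off lives.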

The Challenges

While SAM 3’s tracking capability is simply impressive, its direct application to certain Earth Observation (EO) time-series tasks seems to face a crucial challenge: the nature of change in EO data is fundamentally different from the videos it was trained on (i.e., people, cars, and objects moving quickly).

Consider agriculture. The “change” a satellite sees is the slow, biological process of a crop growing, not a car driving across a street. A field goes from bare soil to green over weeks or months. The crop’s visual signature (color, texture, size) changes dramatically while it stays in the same location. This is an inherently different kind of change, more a phenological transformation than a movement to track, which makes it very challenging for a model optimized for rapid, mechanical motion to apply its temporal understanding effectively.

The Opportunities

Despite this challenge with slowly transforming features, SAM 3’s power as a zero-shot, open-vocabulary segmentation model is undeniable for geospatial analysis. It can serve as a powerful, adaptable pre-processing engine, freeing domain experts to focus on the unique temporal and spectral complexities of our planet. This value is already being realized thanks to early adopters and pioneering researchers like Aliaksandr Hancharenka, who developed segment-anything-eo, and Qiusheng Wu, who developed SAMGeo with a new interactive version for SAM 3; these tools are easily accessible and adapted to Earth Observation.

Foundation models like SAM 3 have incredible value for many geospatial workflows. Researchers can use their segmentation power to rapidly label data: use text or visual prompting to quickly generate high-quality segmentation masks for virtually any object (like a specific building type or a logging area), drastically speeding up the creation of custom datasets. SAM 3 could also act as a teacher: the masks it generates can be used to train smaller, specialized models that are fine-tuned for the unique characteristics and slow-change dynamics of EO data.
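The teacher idea boils down to pseudo-labeling: a large zero-shot model produces masks, and those masks become training labels for a small student model. The sketch below is purely illustrative; `teacher_segment` is a stand-in (here just a brightness threshold), not the SAM 3 API:

```python
def teacher_segment(image, prompt):
    """Stand-in for a zero-shot teacher call (e.g. SAM 3 with a text prompt).

    Here it is just a toy threshold on pixel values; a real workflow
    would call the foundation model with `prompt`.
    """
    return [[1 if px > 0.5 else 0 for px in row] for row in image]

def build_pseudo_dataset(images, prompt):
    """Pair each unlabeled image with the teacher's mask as its label."""
    return [(img, teacher_segment(img, prompt)) for img in images]

# Two tiny 2x2 "images" standing in for satellite chips
images = [
    [[0.1, 0.9], [0.8, 0.2]],
    [[0.7, 0.3], [0.4, 0.6]],
]
dataset = build_pseudo_dataset(images, "smoke")
print(dataset[0][1])  # -> [[0, 1], [1, 0]]
```

The resulting `(image, mask)` pairs would then feed the training loop of a compact model specialized for EO data, with the teacher never needed at inference time.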

Looking ahead, while SAM 3 has demonstrated impressive, albeit limited, use cases with satellite time-series animations, the challenge of transformation (biological change) remains. This is precisely where the next generation of geospatial foundation models, actively being developed and “cooked” right now, will step in to address these temporal complexities directly, possibly unlocking new potential for reliable downstream applications.

#GeoAI, #FoundationModels, #Geospatial, #AI, #EO, #SAM, #Object-Tracking, #Segmentation

Esmaeel Adrah