Using Machine Learning to Map Sediment Grain Size and Fill Major Field Sampling Gaps

Spatial mapping fills in information missing from sampling studies and enables a better understanding of hydrological dynamics in a complex river corridor.

The Science

The hyporheic zone, the region beneath and alongside a riverbed where groundwater and surface water mix, is a key part of river corridors. This mixing drives and controls the transport of dissolved nutrients, contaminants, and microbes. Modeling flow and transport in this region is challenging because of its natural variation and the dynamic influences of water, sediment, and microbial processes. Researchers at the U.S. Department of Energy’s Pacific Northwest National Laboratory used a type of machine learning to predict sediment grain size along the Hanford Reach of the Columbia River. The resulting substrate-size maps filled gaps left by field data and improved spatial coverage and resolution by learning from existing but unevenly sampled measurements.

The Impact

In hydrogeology, substrate size is a proxy for permeability. Spatial mapping of grain-size distribution within the hyporheic zone and along a given length of a river enables a better understanding of the hydrological variation and complexity in the system. It also provides information useful for modeling transport properties, such as hydrologic exchange flows and residence times.


Measurements of sediment grain size cover about 70% of the entire Hanford Reach of the Columbia River, but the spatial resolution is too coarse to infer continuity or variation. Bathymetry measurements and hydrodynamic simulations of this area provide higher spatial resolution data. In this study, the research team used machine learning to link this higher-resolution information to predicted substrate size so they could develop a spatial map that captured the natural sediment variation along the 70-km reach.

The researchers trained the machine-learning model with data from more than 13,000 samples of dominant substrate sizes previously collected along the Reach. They also included measurements of bathymetry, slope, and aspect, as well as simulated hydrodynamic properties such as water depth, velocity, and river bottom shear stress. The researchers used a bagging-based machine-learning technique, called Random Forest, to develop unbiased predictions with minimal overfitting. The resulting model accurately predicted the measured substrate size and could be applied to a grid of 5- to 10-m resolution across the Hanford Reach. By learning predictive relationships between substrate size and multiple types of complementary information, these algorithms make it possible to fill gaps in, and refine, maps of spatial grain-size distribution.
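The bagging idea behind Random Forest can be illustrated with a minimal sketch. It is not the study's code or data: the feature names, the synthetic relationship between velocity and grain size, and all parameter values are assumptions made for illustration. Each small regression tree is trained on a bootstrap resample of the samples, each split considers a random subset of features, and the forest prediction is the average over trees.

```python
import random
import statistics

# Hypothetical predictor names, loosely following the inputs named in the text.
FEATURES = ["water_depth_m", "velocity_m_s", "shear_stress_pa"]

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, depth, max_features, rng):
    """Grow a small regression tree; each split tries a random feature subset."""
    if depth == 0 or len(targets) < 4:
        return statistics.fmean(targets)  # leaf: mean grain size of its samples
    best = None  # (error, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), max_features):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[f] <= thr]
            right = [t for row, t in zip(rows, targets) if row[f] > thr]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, thr)
    if best is None:  # no usable split among the sampled features
        return statistics.fmean(targets)
    _, f, thr = best
    li = [i for i, row in enumerate(rows) if row[f] <= thr]
    ri = [i for i, row in enumerate(rows) if row[f] > thr]
    return (
        f,
        thr,
        build_tree([rows[i] for i in li], [targets[i] for i in li],
                   depth - 1, max_features, rng),
        build_tree([rows[i] for i in ri], [targets[i] for i in ri],
                   depth - 1, max_features, rng),
    )

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(rows, targets, n_trees, depth, max_features, rng):
    """Bagging: train every tree on a bootstrap resample of the data."""
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 depth, max_features, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return statistics.fmean(predict_tree(tree, row) for tree in forest)

# Synthetic stand-in data: grain size (mm) rises with velocity, plus noise.
rng = random.Random(0)
rows = [[rng.uniform(0.5, 10.0), rng.uniform(0.0, 2.0), rng.uniform(0.0, 5.0)]
        for _ in range(100)]
targets = [10.0 + 30.0 * row[1] + rng.gauss(0.0, 3.0) for row in rows]

forest = fit_forest(rows, targets, n_trees=20, depth=3, max_features=2, rng=rng)

# Predict at two unsampled "grid cells" that differ only in velocity.
low = predict_forest(forest, [5.0, 0.2, 2.5])
high = predict_forest(forest, [5.0, 1.8, 2.5])
print(f"slow-water cell: {low:.1f} mm, fast-water cell: {high:.1f} mm")
```

Gap filling on a map would amount to calling `predict_forest` once per grid cell, using the bathymetric and simulated hydrodynamic values at that cell. In practice a library implementation, for example scikit-learn's `RandomForestRegressor` or `RandomForestClassifier`, would normally replace the hand-written trees. The text does not say which software the researchers used.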

Principal Investigator(s)

Huiying Ren
Pacific Northwest National Laboratory


This research was supported as part of the Subsurface Biogeochemical Research Program (SBR) of the Office of Biological and Environmental Research (BER), within the U.S. Department of Energy Office of Science. This contribution originates from the SBR Scientific Focus Area at Pacific Northwest National Laboratory.


Ren, H. et al. “Spatial mapping of riverbed grain-size distribution using machine learning.” Frontiers in Water 2, 551627 (2020). [DOI: 10.3389/frwa.2020.551627]