Machine Learning for Geothermal Resource Exploration in the Tularosa Basin, New Mexico (2023) — Maruti K. Mudunuru et al. — Energies
Geothermal energy is considered an essential renewable resource to generate flexible electricity. Geothermal resource assessments conducted by the U.S. Geological Survey showed that the southwestern basins in the U.S. have a significant geothermal potential for meeting domestic electricity demand. Within these southwestern basins, play fairway analysis (PFA), funded by the U.S. Department of Energy’s (DOE) Geothermal Technologies Office, identified that the Tularosa Basin in New Mexico has significant geothermal potential. This short communication paper presents a machine learning (ML) methodology for curating and analyzing the PFA data from the DOE’s geothermal data repository. The proposed approach to identify potential geothermal sites in the Tularosa Basin is based on an unsupervised ML method called non-negative matrix factorization with custom k-means clustering. This methodology is available in our open-source ML framework, GeoThermalCloud (GTC). Using this GTC framework, we discover prospective geothermal locations and find key parameters defining these prospects. Our ML analysis found that these prospects are consistent with the existing Tularosa Basin’s PFA studies. This instills confidence in our GTC framework to accelerate geothermal exploration and resource development, which is generally time-consuming.
The drivers and predictability of wildfire re-burns in the western United States (US) (2023) — K C Solander, C J Talsma, V V Vesselinov — Environmental Research: Climate
Evidence is mounting that the effectiveness of using prescribed burns as a management tactic may be diminishing due to the higher incidence of wildfire re-burns. The development of predictive models of re-burns is thus essential to better understand their primary drivers so that forest management practices can be updated to account for these events. First, we assess the potential for human activity as a driver of re-burns by evaluating re-burn trends both within and outside of the wildland–urban interface (WUI) of the western US. Next, we investigate the predictability of re-burns through the application of both random forest and the explanatory machine learning non-negative matrix factorization using k-means clustering (NMFk) algorithms to predict re-burn occurrence over California based on a number of climate factors. Our findings indicate that while most states showed increasing trends within the WUI when trends were computed over longer moving windows (e.g., 20 years), California was the only state where the rate of increase was consistently higher in the WUI, indicating a stronger potential for human activity as a driver in that location. Furthermore, we find that model performance was robust over most of California (testing F1 score = 0.688), although results were highly variable by EPA Level III Ecoregion (F1 scores = 0.0–0.778). Insights provided from this study will lead to a better understanding of climate and human activity drivers of re-burns and how these vary at broad spatial scales so that improvements in forest management practices can be tuned according to the level of change that is expected for a given region.
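The F1 scores quoted above are the harmonic mean of precision and recall for a binary classifier. A minimal sketch of that calculation follows; the labels are made up for illustration and are not data from the study, and the function name is hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = re-burn, 0 = no re-burn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correctly predicted re-burns: F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score(y_true, y_pred))
```

An F1 of 0.0, the low end of the reported ecoregion range, arises whenever a model predicts no true re-burn correctly in that region.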
Characterizing Drought Behavior in the Colorado River Basin Using Unsupervised Machine Learning (2022) — Carl J. Talsma, Katrina E. Bennett, Velimir V. Vesselinov — Earth and Space Science
Drought is a pressing issue for the Colorado River Basin (CRB) due to the social and economic value of water resources in the region and the significant uncertainty of future drought under climate change. Here, we use climate simulations from various Earth System Models (ESMs) to force the Variable Infiltration Capacity hydrologic model and project multiple drought indicators for the sub‐watersheds within the CRB. We apply an unsupervised machine learning (ML) method based on Non‐Negative Matrix Factorization using K‐means clustering (NMFk) to synthesize the simulated historical, future, and change in drought indicators. The unsupervised ML approach can identify sub‐watersheds where key changes to drought indicator behavior occur, including shifts in snowpack, snowmelt timing, precipitation, and evapotranspiration. While changes in future precipitation vary across ESMs, the results indicate that the Upper CRB will experience increasing evaporative demand and surface‐water scarcity, with some locations experiencing a shift from a radiation‐limited to a water‐limited evaporation regime in the summer. Large shifts in peak runoff are observed in snowmelt‐dominant sub‐watersheds, with complete disappearance of the snowmelt signal for some sub‐watersheds. The work demonstrates the utility of the NMFk algorithm to efficiently identify behavioral changes of drought indicators across space and time and to quickly analyze and interpret hydroclimate model results.
SmartTensors: Unsupervised and physics-informed machine learning framework for the geoscience applications (2022) — Bulbul Ahmmed, Velimir V. Vesselinov, Maruti K. Mudunuru — Second International Meeting for Applied Geoscience & Energy
SmartTensors (https://github.com/SmartTensors) is a novel framework for unsupervised and physics-informed machine learning for geoscience applications. The methods in the SmartTensors AI platform are developed using advanced matrix/tensor factorization constrained by penalties enforcing robustness and interpretability (e.g., nonnegativity, sparsity, physics, and mathematical constraints). This framework has been applied to analyze diverse datasets related to a wide range of problems: from COVID-19 to wildfires and climate. Here, we will focus on the analysis of geothermal prospectivity of the Great Basin, U.S. The basin covers a vast area that is yet to be thoroughly explored to discover new geothermal resources. The available regional geochemical data are expected to provide critical information about the geothermal reservoir properties in the basin, including temperature, fluid/heat flow, boundary conditions, and spatial extent. The geochemical data may also include hidden (latent) information that is a proxy for geothermal prospectivity. We processed the sparse geochemical dataset of 18 geochemical attributes observed at 14,341 locations. The data are analyzed using our GeoThermalCloud toolbox for geothermal exploration (https://github.com/SmartTensors/GeoThermalCloud.jl), which is also a part of the SmartTensors framework. Unsupervised machine learning using non-negative matrix factorization with customized k-means clustering (NMFk), as implemented in SmartTensors, identified three hidden geothermal signatures representing low-, medium-, and high-temperature reservoirs, respectively (Fig). NMFk also evaluated the probability of occurrence of these types of resources throughout the studied region and reconstructed the sparse attributes as continuous fields over the study domain. Future work will add other regional- and site-scale datasets to the ML analyses, including geological, geophysical, and geothermal attributes.
© 2022 Society of Exploration Geophysicists and the American Association of Petroleum Geologists.
Machine learning to identify geologic factors associated with production in geothermal fields: a case-study using 3D geologic data, Brady geothermal field, Nevada (2021) — Drew L. Siler et al. — Geothermal Energy
In this paper, we present an analysis using unsupervised machine learning (ML) to identify the key geologic factors that contribute to the geothermal production in Brady geothermal field. Brady is a hydrothermal system in northwestern Nevada that supports both electricity production and direct use of hydrothermal fluids. Transmissive fluid-flow pathways are relatively rare in the subsurface, but are critical components of hydrothermal systems like Brady and many other types of fluid-flow systems in fractured rock. Here, we analyze geologic data with ML methods to unravel the local geologic controls on these pathways. The ML method, non-negative matrix factorization with k-means clustering (NMFk), is applied to a library of 14 3D geologic characteristics hypothesized to control hydrothermal circulation in the Brady geothermal field. Our results indicate that macro-scale faults and a local step-over in the fault system preferentially occur along production wells when compared to injection wells and non-productive wells. We infer that these are the key geologic characteristics that control the through-going hydrothermal transmission pathways at Brady. Our results demonstrate: (1) the specific geologic controls on the Brady hydrothermal system and (2) the efficacy of pairing ML techniques with 3D geologic characterization to enhance the understanding of subsurface processes.
Nonnegative tensor decomposition with custom clustering for microphase separation of block copolymers (2019) — Boian S. Alexandrov et al. — Statistical Analysis and Data Mining: The ASA Data Science Journal
High‐dimensional datasets are becoming ubiquitous in many applications and therefore unsupervised tensor methods to interrogate them are needed. Here, we report a new unsupervised machine learning (ML) approach (NTFk) based on nonnegative tensor factorization integrated with a custom k‐means clustering. We demonstrate the ability of NTFk to extract temporal and spatial features of phase separation of copolymers as they are modeled by self‐consistent field theory. Microphase separation of block copolymers has been extensively studied both experimentally and theoretically. However, the interpretation of computer simulations and/or experimental data representing temporal and spatial changes of molecular species concentration is still a challenging task. Thus, extracting the phase diagram from simulations or experimental data, as well as interpreting the data, requires discerning how the model/experimental parameters (such as temperature, concentrations, the number of molecular species, and the interaction between species) impact the microphase separation process. An attractive and unique aspect of the introduced ML method is that it ensures the nonnegativity of the extracted latent features. Nonnegativity is an essential constraint needed to obtain interpretable and sparse latent features that are a parts‐based representation of the data. The custom clustering in NTFk serves to estimate the number of latent features in the data.
Unsupervised Machine Learning for Analysis of Coexisting Lipid Phases and Domain Growth in Biological Membranes (2019) — Cesar A. López et al. — bioRxiv, 527630, 2019
Phase separation in mixed lipid systems has been extensively studied both experimentally and theoretically because of its biological importance. A detailed description of such complex systems undoubtedly requires novel mathematical frameworks that are capable of decomposing and categorizing the evolution of thousands if not millions of lipids involved in the phenomenon. The interpretation and analysis of Molecular Dynamics (MD) simulations representing temporal and spatial changes in such systems is still a challenging task. Here, we present a new unsupervised machine learning approach based on Nonnegative Matrix Factorization, called NMFk, that successfully extracts physically meaningful features from neighborhood profiles derived from coarse-grained MD simulations of a ternary lipid mixture. Our results demonstrate that leveraging NMFk can (a) determine the role of different lipid molecules in phase separation, (b) characterize the formation of nano-domains of lipids, (c) determine the timescales of interest, and (d) extract physically meaningful features that uniquely describe the phase separation with broad implications.
Unsupervised Machine Learning for Analysis of Phase Separation in Ternary Lipid Mixture (2019) — Cesar A. López et al. — Journal of Chemical Theory and Computation
Phase separation in mixed lipid systems has been extensively studied both experimentally and theoretically because of its biological importance. A detailed description of such complex systems undoubtedly requires novel mathematical frameworks that are capable of decomposing and categorizing the evolution of thousands if not millions of lipids involved in the phenomenon. The interpretation and analysis of Molecular Dynamics (MD) simulations representing temporal and spatial changes in such systems is still a challenging task. Here, we present an unsupervised machine learning approach based on Nonnegative Matrix Factorization, called NMFk, that successfully extracts latent (i.e., not directly observable) features from the second layer neighborhood profiles derived from coarse-grained MD simulations of a ternary lipid mixture. Our results demonstrate that NMFk extracts physically meaningful features that uniquely describe the phase separation, such as locations and roles of different lipid types, formation of nano-domains, and timescales of lipid segregation.
Nonnegative Matrix Factorization for identification of unknown number of sources emitting delayed signals (2018) — Filip L. Iliev et al. — PLOS ONE
Factor analysis is broadly used as a powerful unsupervised machine learning tool for reconstruction of hidden features in recorded mixtures of signals. In the case of a linear approximation, the mixtures can be decomposed by a variety of model-free Blind Source Separation (BSS) algorithms. Most of the available BSS algorithms consider an instantaneous mixing of signals, while the case when the mixtures are linear combinations of signals with delays is less explored. Especially difficult is the case when the number of sources of the signals with delays is unknown and has to be determined from the data as well. To address this problem, in this paper, we present a new method based on Nonnegative Matrix Factorization (NMF) that is capable of identifying: (a) the unknown number of the sources, (b) the delays and speed of propagation of the signals, and (c) the locations of the sources. Our method can be used to decompose records of mixtures of signals with delays emitted by an unknown number of sources in a nondispersive medium, based only on recorded data. This is the case, for example, when electromagnetic signals from multiple antennas are received asynchronously; or mixtures of acoustic or seismic signals recorded by sensors located at different positions; or when a shift in frequency is induced by the Doppler effect. By applying our method to synthetic datasets, we demonstrate its ability to identify the unknown number of sources as well as the waveforms, the delays, and the strengths of the signals. Using Bayesian analysis, we also evaluate estimation uncertainties and identify the region of likelihood where the positions of the sources can be found.
Nonnegative/Binary matrix factorization with a D-Wave quantum annealer (2018) — Daniel O’Malley et al. — PLOS ONE
D-Wave quantum annealers represent a novel computational architecture and have attracted significant interest. Much of this interest has focused on the quantum behavior of D-Wave machines, and there have been few practical algorithms that use the D-Wave. Machine learning has been identified as an area where quantum annealing may be useful. Here, we show that the D-Wave 2X can be effectively used as part of an unsupervised machine learning method. This method takes a matrix as input and produces two low-rank matrices as output—one containing latent features in the data and another matrix describing how the features can be combined to approximately reproduce the input matrix. Despite the limited number of bits in the D-Wave hardware, this method is capable of handling a large input matrix. The D-Wave only limits the rank of the two output matrices. We apply this method to learn the features from a set of facial images and compare the performance of the D-Wave to two classical tools. This method is able to learn facial features and accurately reproduce the set of facial images. The performance of the D-Wave shows some promise, but has some limitations. It outperforms the two classical codes in a benchmark when only a short amount of computational time is allowed (200-20,000 microseconds), but these results suggest heuristics that would likely outperform the D-Wave in this benchmark.
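The factorization described above, one matrix of latent features and one matrix saying how to combine them, can be illustrated with a purely classical sketch. This is not the authors' implementation. It assumes the feature matrix `W` is nonnegative and the combination matrix `H` is binary (following the title), uses brute-force enumeration of binary vectors as a stand-in for the quantum annealer, and uses a clipped least-squares step as a simple heuristic for the nonnegative update. The function name `nbmf` and all parameters are hypothetical.

```python
import itertools
import numpy as np


def nbmf(X, k, n_iter=20, seed=0):
    """Approximate X (m x n) as W @ H with W >= 0 (m x k) and H in {0,1} (k x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # All 2^k binary column vectors; feasible only because k (the rank) is small.
    candidates = np.array(list(itertools.product([0, 1], repeat=k)), dtype=float).T
    H = rng.integers(0, 2, size=(k, n)).astype(float)
    for _ in range(n_iter):
        # W-step: unconstrained least squares, then clip negatives to zero
        # (a heuristic, not an exact nonnegative least-squares solve).
        W = np.clip(X @ np.linalg.pinv(H), 0.0, None)
        # H-step: for every column of X pick the binary vector with the
        # smallest squared residual (the annealer's job in the paper).
        recon = W @ candidates                                   # m x 2^k
        resid = ((X[:, :, None] - recon[:, None, :]) ** 2).sum(axis=0)  # n x 2^k
        H = candidates[:, resid.argmin(axis=1)]
    return W, H


# Synthetic demo data (hypothetical, not the facial images used in the paper).
rng = np.random.default_rng(42)
W_true = rng.random((6, 3))
H_true = rng.integers(0, 2, size=(3, 40)).astype(float)
X = W_true @ H_true

W, H = nbmf(X, k=3)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Each column of `H` is optimized independently, so the size of the binary search depends only on the rank `k` and not on the size of the input matrix, which matches the abstract's remark that the hardware limits only the rank.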
Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering (2018) — Valentin Stanev et al. — npj Computational Materials
Analyzing large X-ray diffraction (XRD) datasets is a key step in high-throughput mapping of the compositional phase diagrams of combinatorial materials libraries. Optimizing and automating this task can help accelerate the process of discovery of materials with novel and desirable properties. Here, we report a new method for pattern analysis and phase extraction of XRD datasets. The method expands the Nonnegative Matrix Factorization method, which has been used previously to analyze such datasets, by combining it with custom clustering and cross-correlation algorithms. This new method is capable of robust determination of the number of basis patterns present in the data which, in turn, enables straightforward identification of any possible peak-shifted patterns. Peak-shifting arises due to continuous change in the lattice constants as a function of composition and is ubiquitous in XRD datasets from composition spread libraries. Successful identification of the peak-shifted patterns allows proper quantification and classification of the basis XRD patterns, which is necessary in order to decipher the contribution of each unique single-phase structure to the multi-phase regions. The process can be utilized to determine accurately the compositional phase diagram of a system under study. The presented method is applied to one synthetic and one experimental dataset and demonstrates robust accuracy and identification abilities.
Blind source separation for groundwater pressure analysis based on nonnegative matrix factorization (2014) — Boian S. Alexandrov, Velimir V. Vesselinov — Water Resources Research
The identification of the physical sources causing spatial and temporal fluctuations of aquifer water levels is a challenging yet very important hydrogeological task. The fluctuations can be caused by variations in natural and anthropogenic sources such as pumping, recharge, barometric pressures, etc. The source identification can be crucial for conceptualization of the hydrogeological conditions and characterization of aquifer properties. We propose a new computational framework for model‐free inverse analysis of pressure transients based on the Nonnegative Matrix Factorization (NMF) method for Blind Source Separation (BSS) coupled with the k‐means clustering algorithm, which we call NMFk. NMFk is capable of identifying a set of unique sources from a set of experimentally measured mixed signals, without any information about the sources, their transients, and the physical mechanisms and properties controlling the signal propagation through the subsurface flow medium. Our analysis only requires information about pressure transients at a number of observation points, m, where m ≥ r, and r is the number of unknown unique sources causing the observed fluctuations. We apply this new analysis on a data set from the Los Alamos National Laboratory site. We demonstrate that the sources identified by NMFk have real physical origins: barometric pressure and water‐supply pumping effects. We also estimate the barometric pressure efficiency of the monitoring wells. The possible applications of the NMFk algorithm are not limited to hydrogeology problems; NMFk can be applied to any problem where temporal system behavior is observed at multiple locations and an unknown number of physical sources are causing these fluctuations.
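The idea behind NMFk, as described in the abstracts above, is to run NMF many times for each candidate number of sources and use clustering to check whether the recovered sources are reproducible. The sketch below is a simplified illustration in NumPy, not the SmartTensors implementation. It assumes Lee-Seung multiplicative updates for NMF, replaces the custom k-means step with a greedy one-to-one matching of each run's sources to a reference run, and scores cluster quality with a cosine-distance silhouette. All function names, parameters, and the synthetic signals are hypothetical.

```python
import numpy as np


def nmf(X, k, rng, n_iter=1000, eps=1e-9):
    """Basic NMF, X ~ W @ H, using multiplicative updates."""
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def match_to_reference(ref, H):
    """Greedily assign each row of H to a distinct row of ref (unit-norm rows)."""
    sim = ref @ H.T  # cosine similarity, since rows are normalized
    labels = np.empty(H.shape[0], dtype=int)
    for _ in range(H.shape[0]):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        labels[j] = i
        sim[i, :] = -np.inf  # each reference source is used once per run
        sim[:, j] = -np.inf
    return labels


def mean_silhouette(P, labels):
    """Mean silhouette width of unit-norm rows of P under cosine distance."""
    D = np.clip(1.0 - P @ P.T, 0.0, None)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        a = D[i, same].mean()  # mean distance within own cluster
        b = min(D[i, labels == c].mean()
                for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores))


def nmfk(X, k_values, n_runs=8, seed=0):
    """For each k >= 2, return (best relative error, silhouette of the sources)."""
    rng = np.random.default_rng(seed)
    results = {}
    for k in k_values:
        Hs, errs = [], []
        for _ in range(n_runs):
            W, H = nmf(X, k, rng)
            errs.append(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
            Hs.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12))
        labels = [np.arange(k)] + [match_to_reference(Hs[0], H) for H in Hs[1:]]
        sil = mean_silhouette(np.vstack(Hs), np.concatenate(labels))
        results[k] = (float(min(errs)), sil)
    return results


# Synthetic demo: m = 10 observation points mixing r = 3 nonnegative sources.
t = np.linspace(0.0, 6.0 * np.pi, 200)
S = np.vstack([1.0 + np.sin(t), 1.0 + np.cos(3.0 * t), t / t.max()])
A = np.random.default_rng(1).random((10, 3))
X = A @ S

results = nmfk(X, k_values=[2, 3, 4])
for k, (err, sil) in results.items():
    print(f"k={k}: relative error={err:.4f}, silhouette={sil:.3f}")
```

In this kind of scan, a plausible number of sources is the largest k for which the reconstruction error is low while the silhouette stays high, meaning the repeated runs keep recovering the same sources.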