A chemical reaction model, consisting of two gas-phase reactions and a surface reaction, for the deposition of copper from copper amidinate is investigated by comparing the results of an efficient, reduced-order CFD model with experiments. The film deposition rate is accurately captured over a wide range of temperatures, 473 K to 623 K, with a specific focus on the reported drop in the deposition rate at higher temperatures, i.e., above 553 K, which has not been widely explored in the literature. This investigation is facilitated by an efficient computational tool that merges equation-based analysis with data-driven reduced-order modeling and artificial neural networks. The hybrid computer-aided approach is necessary in order to address, in a reasonable time frame, the complex chemical and physical phenomena that develop in a three-dimensional geometry corresponding to the experimental set-up. It is through this comparison between the experiments and the derived simulation results, enabled by machine-learning algorithms, that the prevalent theoretical hypothesis is tested and validated, illuminating the possible underlying dominant phenomena.

Engineering, Materials and Chemical Sciences

High Performance Data Analysis

Paper

https://www.sciencedirect.com/science/article/pii/S0098135421000673?via%3Dihub

The adoption of detailed mechanisms for chemical kinetics often poses two types of severe challenges: First, the number of degrees of freedom is large; and second, the dynamics is characterized by widely disparate time scales. As a result, reactive flow solvers with detailed chemistry often become intractable even for large clusters of CPUs, especially when dealing with direct numerical simulation (DNS) of turbulent combustion problems. This has motivated the development of several techniques for reducing the complexity of such kinetics models, where, eventually, only a few variables are considered in the development of the simplified model. Unfortunately, no generally applicable a priori recipe for selecting suitable parameterizations of the reduced model is available, and the choice of slow variables often relies upon intuition and experience. We present an automated approach to this task, consisting of three main steps. First, the low dimensional manifold of slow motions is (approximately) sampled by brief simulations of the detailed model, starting from a rich enough ensemble of admissible initial conditions. Second, a global parametrization of the manifold is obtained through the Diffusion Map (DMAP) approach, which has recently emerged as a powerful tool in data analysis/machine learning. Finally, a simplified model is constructed and solved on the fly in terms of the above reduced (slow) variables. Clearly, closing this latter model requires nontrivial interpolation calculations, enabling restriction (mapping from the full ambient space to the reduced one) and lifting (mapping from the reduced space to the ambient one). This is a key step in our approach, and a variety of interpolation schemes are reported and compared. The scope of the proposed procedure is presented and discussed by means of an illustrative combustion example.

Engineering, Materials and Chemical Sciences

High Performance Data Analysis

Paper

https://www.mdpi.com/2227-9717/2/1/112/htm

We employ the diffusion map approach as a nonlinear dimensionality reduction technique to extract a dynamically relevant, low-dimensional description of n-alkane chains in the ideal-gas phase and in aqueous solution. In the case of C8 we find the dynamics to be governed by torsional motions. For C16 and C24 we extract three global order parameters with which we characterize the fundamental dynamics, and determine that the low free-energy pathway of globular collapse proceeds by a “kink and slide” mechanism, whereby a bend near the end of the linear chain migrates toward the middle to form a hairpin and, ultimately, a coiled helix. The low-dimensional representation is subtly perturbed in the solvated phase relative to the ideal gas, and its geometric structure is conserved between C16 and C24. The methodology is directly extensible to biomolecular self-assembly processes, such as protein folding.

Engineering, Materials and Chemical Sciences

Machine Learning / AI

Paper

https://www.pnas.org/content/107/31/13597

Concise, accurate descriptions of physical systems through their conserved quantities abound in the natural sciences. In data science, however, current research often focuses on regression problems, without routinely incorporating additional assumptions about the system that generated the data. Here, we propose to explore a particular type of underlying structure in the data: Hamiltonian systems, where an “energy” is conserved. Given a collection of observations of such a Hamiltonian system over time, we extract phase space coordinates and a Hamiltonian function of them that acts as the generator of the system dynamics. The approach employs an autoencoder neural network component to estimate the transformation from observations to the phase space of a Hamiltonian system. An additional neural network component is used to approximate the Hamiltonian function on this constructed space, and the two components are trained jointly. As an alternative approach, we also demonstrate the use of Gaussian processes for the estimation of such a Hamiltonian. After two illustrative examples, we extract an underlying phase space as well as the generating Hamiltonian from a collection of movies of a pendulum. The approach is fully data-driven and does not assume a particular form of the Hamiltonian function.
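To make concrete what it means for a Hamiltonian to act as the generator of the dynamics, the following minimal sketch integrates Hamilton's equations for a hand-written pendulum energy and checks that the energy stays nearly constant. It is an illustration only: the abstract describes learning the Hamiltonian with neural networks or Gaussian processes, whereas here `hamiltonian` is an assumed closed-form function (unit mass, length, and gravity), and `vector_field`, `rk4_step`, and `simulate` are hypothetical helper names.

```python
import math


def hamiltonian(q, p):
    """Pendulum energy with unit mass, length, and gravity (assumed example)."""
    return 0.5 * p * p - math.cos(q)


def vector_field(H, q, p, h=1e-5):
    """Hamilton's equations, dq/dt = dH/dp and dp/dt = -dH/dq,
    with the partial derivatives approximated by central differences."""
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2.0 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2.0 * h)
    return dH_dp, -dH_dq


def rk4_step(H, q, p, dt):
    """One classical fourth-order Runge-Kutta step of the Hamiltonian flow."""
    k1q, k1p = vector_field(H, q, p)
    k2q, k2p = vector_field(H, q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = vector_field(H, q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = vector_field(H, q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new


def simulate(H, q0, p0, dt=0.01, steps=1000):
    """Return the list of (q, p) states visited, including the initial one."""
    trajectory = [(q0, p0)]
    q, p = q0, p0
    for _ in range(steps):
        q, p = rk4_step(H, q, p, dt)
        trajectory.append((q, p))
    return trajectory


trajectory = simulate(hamiltonian, q0=1.0, p0=0.0)
energies = [hamiltonian(q, p) for q, p in trajectory]
# Largest deviation of the energy from its initial value along the trajectory.
energy_drift = max(abs(e - energies[0]) for e in energies)
```

Because `vector_field` only calls `H`, any other scalar function of `(q, p)`, including a trained model, could be passed in its place; that substitution is the assumption behind this sketch, not something taken from the paper's code.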

Engineering, Materials and Chemical Sciences

Machine Learning / AI

Paper

https://aip.scitation.org/doi/10.1063/1.5128231

Molecular simulation is an important and ubiquitous tool in the study of microscopic phenomena in fields as diverse as materials science, protein folding and drug design. While the atomic-level resolution provides unparalleled detail, it can be non-trivial to extract the important motions underlying simulations of complex systems containing many degrees of freedom. The diffusion map is a nonlinear dimensionality reduction technique with the capacity to systematically extract the essential dynamical modes of high-dimensional simulation trajectories, furnishing a kinetically meaningful low-dimensional framework with which to develop insight and understanding of the underlying dynamics and thermodynamics. We survey the potential of this approach in the field of molecular simulation, consider its challenges, and discuss its underlying concepts and means of application. We provide examples drawn from our own work on the hydrophobic collapse mechanism of n-alkane chains, folding pathways of an antimicrobial peptide, and the dynamics of a driven interface.

Engineering, Materials and Chemical Sciences

Machine Learning / AI

Paper

https://www.sciencedirect.com/science/article/pii/S0009261411004957

A central problem in data analysis is the low dimensional representation of high dimensional data and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, important problems are the identification of slow variables and dynamically meaningful reaction coordinates that capture the long time evolution of the system. In this paper we provide a unifying view of these apparently different tasks, by considering a family of diffusion maps, defined as the embedding of complex (high dimensional) data onto a low dimensional Euclidean space, via the eigenvectors of suitably defined random walks on the given datasets.
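The idea of embedding data through the eigenvectors of a random walk can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the family of normalizations studied in the paper: it uses a plain Gaussian kernel with a hand-picked bandwidth `eps`, row-normalizes it into a transition matrix, and finds the first non-trivial eigenvector by power iteration. The function name `diffusion_coordinate` and the quarter-circle test data are hypothetical.

```python
import math
import random


def diffusion_coordinate(points, eps, iterations=500, seed=0):
    """Return (eigenvalue, eigenvector) for the first non-trivial
    eigenpair of a row-normalized Gaussian-kernel random walk."""
    n = len(points)
    # Gaussian kernel on pairwise squared Euclidean distances.
    kernel = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / eps) for y in points]
        for x in points
    ]
    degree = [sum(row) for row in kernel]
    # Row-normalize to obtain the transition matrix of a random walk on the data.
    walk = [[kernel[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # Stationary distribution of the walk, proportional to the degrees.
    total = sum(degree)
    stationary = [d / total for d in degree]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        v = [sum(walk[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Remove the trivial constant eigenvector (eigenvalue 1), which is
        # orthogonal to the others in the stationary-weighted inner product.
        mean = sum(w * x for w, x in zip(stationary, v))
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        eigenvalue = norm  # converges to the second eigenvalue
        v = [x / norm for x in v]
    return eigenvalue, v


# Assumed test data: 30 points along a quarter circle of radius 2 in the plane.
n_points = 30
angles = [0.5 * math.pi * k / (n_points - 1) for k in range(n_points)]
points = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in angles]

eigenvalue, coords = diffusion_coordinate(points, eps=0.1)
```

For this one-dimensional curve the single coordinate in `coords` orders the points along the arc, up to an overall sign. Larger datasets would normally use a sparse eigensolver rather than dense power iteration.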

Engineering, Materials and Chemical Sciences

Machine Learning / AI

Paper

https://www.sciencedirect.com/science/article/pii/S1063520306000534

The occurrence of instabilities in chemically reacting systems, resulting in unsteady and spatially inhomogeneous reaction rates, is a widespread phenomenon. In this article, we use nonlinear signal processing techniques to extract a simple, but accurate, dynamic model from experimental data of a system with spatiotemporal variations. The approach consists of a combination of two steps. The proper orthogonal decomposition [POD or Karhunen-Loève (KL) expansion] allows us to determine active degrees of freedom (important spatial structures) of the system. Projection onto these “modes” reduces the data to a small number of time series. Processing these time series through an artificial neural network (ANN) results in a low-dimensional, nonlinear dynamic model with almost quantitative predictive capabilities.
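The first step described above, extracting spatial modes and projecting the data onto them to obtain a few time series, can be sketched as follows. This is a minimal illustration on synthetic data, assuming two known spatial structures with known amplitudes; the neural-network step is not shown. The helper names `matvec` and `pod` are hypothetical, the modes are found by power iteration with deflation, and the temporal mean is not subtracted.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def pod(snapshots, n_modes, iterations=200):
    """Return (modes, coefficients): orthonormal spatial modes and the
    time series obtained by projecting each snapshot onto them."""
    n = len(snapshots[0])
    # Spatial correlation matrix C = sum over time of u_t u_t^T.
    corr = [[sum(u[i] * u[j] for u in snapshots) for j in range(n)] for i in range(n)]
    modes = []
    for _mode in range(n_modes):
        v = [float(i + 1) for i in range(n)]  # generic starting vector
        for _it in range(iterations):
            w = matvec(corr, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(corr, v)))
        modes.append(v)
        # Deflate so the next pass converges to the next mode.
        corr = [[corr[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    coefficients = [
        [sum(m * x for m, x in zip(mode, u)) for mode in modes] for u in snapshots
    ]
    return modes, coefficients


# Synthetic data (assumed): two spatial structures with oscillating amplitudes.
n_space, n_time = 20, 60
grid = [i / (n_space - 1) for i in range(n_space)]
times = [2.0 * math.pi * k / n_time for k in range(n_time)]
snapshots = [
    [2.0 * math.cos(t) * math.sin(math.pi * x)
     + 0.5 * math.sin(2.0 * t) * math.sin(2.0 * math.pi * x) for x in grid]
    for t in times
]

modes, coefficients = pod(snapshots, n_modes=2)

# Reconstruct every snapshot from two modes and measure the worst-case error.
max_error = max(
    abs(u[i] - sum(c * mode[i] for c, mode in zip(coeffs, modes)))
    for u, coeffs in zip(snapshots, coefficients)
    for i in range(n_space)
)
```

Each row of `coefficients` is one time sample of the reduced state. In the workflow the abstract describes, time series of this kind are what the neural network is trained on.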

Machine Learning / AI

Paper

https://aiche.onlinelibrary.wiley.com/doi/abs/10.1002/aic.690390110

Artificial neural networks (ANNs) are often used for short-term discrete time series predictions. Continuous-time models are, however, required for qualitatively correct approximations to long-term dynamics (attractors) of nonlinear dynamical systems and their transitions (bifurcations) as system parameters are varied. In previous work the authors developed a black-box methodology for the characterization of experimental time series as continuous-time models (sets of ordinary differential equations) based on a neural network platform. This methodology naturally lends itself to the identification of partially known first-principles dynamic models, and here the authors present its extension to "gray-box" identification.

Machine Learning / AI

Paper

https://ieeexplore.ieee.org/document/366006

Multi-walled carbon nanotubes (MWCNTs) are made of multiple single-walled carbon nanotubes (SWCNTs) which are nested inside one another forming concentric cylinders. These nanomaterials are widely used in industrial and biomedical applications, due to their unique physicochemical characteristics. However, previous studies have shown that exposure to MWCNTs may lead to toxicity and that some of the physicochemical properties of MWCNTs can influence their toxicological profiles. In silico modelling can be applied as a faster and less costly alternative to experimental (in vivo and in vitro) testing for the hazard characterization of MWCNTs. This study aims at developing a fully validated predictive nanoinformatics model based on statistical and machine learning approaches for the accurate prediction of genotoxicity of different types of MWCNTs. Towards this goal, a number of different computational workflows were designed, combining unsupervised (Principal Component Analysis, “PCA”) and supervised classification techniques (Support Vector Machine, “SVM”; Random Forest, “RF”; Logistic Regression, “LR”; and Naïve Bayes, “NB”) and Bayesian optimization. The Recursive Feature Elimination (RFE) method was applied for selecting the most important variables. An RF model using only three features was selected as the most efficient for predicting the genotoxicity of MWCNTs, exhibiting 80% accuracy on external validation and high classification probabilities. The most informative features selected by the model were “Length”, “Zeta average” and “Purity”.

Machine Learning / AI

Paper

https://pubs.rsc.org/en/content/articlelanding/2021/na/d0na00600a

Despite great progress in simulating multiphysics problems using the numerical discretization of partial differential equations (PDEs), one still cannot seamlessly incorporate noisy data into existing algorithms, mesh generation remains complex, and high-dimensional problems governed by parameterized PDEs cannot be tackled. Moreover, solving inverse problems with hidden physics is often prohibitively expensive and requires different formulations and elaborate computer codes. Machine learning has emerged as a promising alternative, but training deep neural networks requires big data, not always available for scientific problems. Instead, such networks can be trained from additional information obtained by enforcing the physical laws (for example, at random points in the continuous space-time domain). Such physics-informed learning integrates (noisy) data and mathematical models, and implements them through neural networks or other kernel-based regression networks. Moreover, it may be possible to design specialized network architectures that automatically satisfy some of the physical invariants for better accuracy, faster training and improved generalization. Here, we review some of the prevailing trends in embedding physics into machine learning, present some of the current capabilities and limitations and discuss diverse applications of physics-informed learning both for forward and inverse problems, including discovering hidden physics and tackling high-dimensional problems.

Machine Learning / AI

Paper

https://www.nature.com/articles/s42254-021-00314-5