A multiscale computational framework for coupling the multiple length scales in chemical vapor deposition (CVD) processes is implemented to study the effect of the prevailing conditions inside a CVD reactor (macro-scale) on film growth on a wafer with predefined topography (micro-scale). A multi-parallel method is proposed for accelerating the computations. It combines domain decomposition methods for the macro-scale (reactor-scale) model, which is based on partial differential equations (PDEs), with a synchronous master-worker scheme for the parallel computation of the boundary conditions (BCs) for the PDEs; the BCs come from the micro-scale model describing film growth on the predefined topography.

High Performance Computing

Paper

https://www.sciencedirect.com/science/article/pii/S1877750315300132
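The synchronous master-worker coupling described in the abstract can be pictured in a few lines: the master (macro-scale solver) farms one micro-scale evaluation out per wafer site and blocks until every growth rate has returned, then uses the rates as flux BCs for the next PDE solve. Everything below is an illustrative sketch — the toy Arrhenius growth model, the function names, and the thread-pool stand-in for a distributed worker pool are our assumptions, not the authors' code.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def micro_scale_growth_rate(site):
    """Toy stand-in for the micro-scale topography model (an assumption,
    not the paper's model): growth rate from the local temperature T [K]
    and precursor concentration c delivered by the macro-scale solution."""
    T, c = site
    k = 1.0e3 * math.exp(-3000.0 / T)  # Arrhenius-type rate constant
    return k * c

def master_step(boundary_sites, n_workers=4):
    """One synchronous coupling step: the master dispatches every wafer
    site to the workers and blocks until all growth rates return; the
    rates then serve as flux BCs for the next macro-scale PDE solve."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(micro_scale_growth_rate, boundary_sites))

# The macro-scale (reactor) solver would supply (T, c) at each wafer site.
sites = [(900.0 + 10.0 * i, 0.1) for i in range(8)]
bc_values = master_step(sites)
```

The synchronous barrier (the `with` block does not exit until every future resolves) is what keeps the two scales time-consistent: no macro-scale step proceeds on partially updated BCs.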

We present a numerical study of the structure and stability of laminar isothermal flows formed by two counterflowing jets of an incompressible Newtonian fluid. We demonstrate that symmetric counterflowing jets with identical mass flow rates exhibit multiple steady states and, in certain cases, time-dependent (periodic) steady states. Two geometric configurations were studied based on the inlet jet shapes: planar and axisymmetric. Stagnation flows formed by planar counterflowing jets exhibit both steady-state multiplicity and time-dependent behaviour, while axisymmetric jets exhibit only steady-state multiplicity. A linearized bifurcation and stability analysis based on the continuity and Navier–Stokes equations revealed transitions between a single (symmetric) steady state and multiple steady states or periodic steady states. The dimensionless quantities forming the parameter space of this system are the inlet Reynolds number ($Re$) and a geometric aspect ratio ($\alpha$), equal to the jet inlet characteristic length (used for calculating $Re$) divided by the jet separation. The boundaries separating the different flow regimes have been identified in the ($Re$, $\alpha$) parameter space. The resulting flow maps are useful for the design and operation of counterflow jet reactors.

High Performance Computing

Paper

https://www.cambridge.org/core/journals/journal-of-fluid-mechanics/article/bifurcation-and-stability-analysis-of-laminar-isothermal-counterflowing-jets/C5FF3D83DFF1C28FBF984EBD28D35C13
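For concreteness, the two dimensionless groups spanning the parameter space can be written out. Here $U$ is the mean inlet velocity, $\nu$ the kinematic viscosity, $d$ the jet inlet characteristic length (slot width in the planar case, diameter in the axisymmetric case), and $H$ the jet separation — the symbol names are ours, not necessarily the paper's:

```latex
Re = \frac{U\, d}{\nu}, \qquad \alpha = \frac{d}{H}
```

Each point $(Re, \alpha)$ then corresponds to one flow configuration, and the flow maps in the paper mark where the symmetric steady state loses stability as that point is varied.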

This paper presents an approach for determining the linear stability of steady states of partial differential equations (PDEs) on massively parallel computers. Linearizing the transient behavior around a steady state solution leads to an eigenvalue problem. The eigenvalues with the largest real part are calculated using Arnoldi's iteration driven by a novel implementation of the Cayley transformation. The Cayley transformation requires the solution of a linear system at each Arnoldi iteration. This is done iteratively so that the algorithm scales with problem size. A representative model problem of three-dimensional incompressible flow and heat transfer in a rotating disk reactor is used to analyze the effect of algorithmic parameters on the performance of the eigenvalue algorithm. Successful calculations of leading eigenvalues for matrix systems of order up to 4 million were performed, identifying the critical Grashof number for a Hopf bifurcation.

High Performance Computing

Paper

https://onlinelibrary.wiley.com/doi/10.1002/fld.135
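The spectral-transformation idea behind the abstract is compact enough to sketch. The Cayley transform $C = (A - \sigma I)^{-1}(A - \mu I)$ maps an eigenvalue $\lambda$ of $A$ to $\theta = (\lambda - \mu)/(\lambda - \sigma)$; with $\sigma > \mu$, eigenvalues with $\mathrm{Re}(\lambda) > (\sigma+\mu)/2$ land outside the unit circle, so Arnoldi's largest-magnitude modes of $C$ are the rightmost modes of $A$. The sketch below is a simplification under stated assumptions: a direct sparse LU solve stands in for the scalable iterative solver used in the paper, and a standard (not generalized) eigenproblem is used.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def rightmost_eigs_cayley(A, sigma, mu, k=3):
    """Arnoldi iteration on the Cayley transform C = (A - sigma I)^{-1}(A - mu I).
    Each matvec with C costs one linear solve, done here with a precomputed
    LU factorization (the paper solves these systems iteratively instead)."""
    n = A.shape[0]
    lu = spla.splu(sp.csc_matrix(A - sigma * sp.identity(n)))
    C = spla.LinearOperator(
        (n, n),
        matvec=lambda x: lu.solve((A - mu * sp.identity(n)) @ x),
        dtype=np.float64,
    )
    theta, _ = spla.eigs(C, k=k, which='LM')     # Arnoldi, largest magnitude
    return (sigma * theta - mu) / (theta - 1.0)  # invert the spectral map

# Diagonal test operator: rightmost eigenvalue 0.5, the rest in [-5, -1].
A = sp.diags(np.concatenate(([0.5], -np.linspace(1.0, 5.0, 49))))
lam = rightmost_eigs_cayley(A, sigma=1.0, mu=-1.0)
```

A positive real part in the recovered leading eigenvalue is exactly the instability signal the paper tracks when locating the Hopf bifurcation.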

The big data revolution has ushered in an era of ever-increasing volumes and complexity of data, requiring ever faster computational analysis. During this same era, CPU performance growth has been stagnating, pushing the industry either to scale computation horizontally using multiple nodes in data centers, or to scale vertically using heterogeneous components to reduce compute time. Meanwhile, networking and storage continue to deliver both higher throughput and lower latency, which allows heterogeneous components deployed in data centers around the world to be leveraged. Still, integrating big data analytics frameworks with heterogeneous hardware components such as GPGPUs and FPGAs is challenging, because there is a growing gap in the level of abstraction between analytics solutions developed with big data analytics frameworks and accelerated kernels developed for heterogeneous components. In this article, we focus on FPGA accelerators, which have seen wide-scale deployment in large cloud infrastructures. FPGAs allow the implementation of highly optimized hardware architectures, tailored exactly to an application and unburdened by the overhead associated with traditional general-purpose computer architectures. FPGAs implementing dataflow-oriented architectures with high levels of (pipeline) parallelism can provide high application throughput, often with high energy efficiency. Latency-sensitive applications can leverage FPGA accelerators by connecting directly to the physical layer of a network and performing data transformations without going through the software stacks of the host system. While these advantages of FPGA accelerators hold promise, difficulties associated with programming and integration limit their use. This article explores existing practices in big data analytics frameworks, discusses the aforementioned gap in development abstractions, and offers perspectives on how to address these challenges in the future.

Engineering, Other

High Performance Data Analysis

Paper

https://ieeexplore.ieee.org/document/9439431

This paper presents an overview and performance analysis of a software-programmable domain-customizable System-on-Chip (SoC) overlay for low-latency inferencing of variable and low-precision Machine Learning (ML) networks targeting Internet-of-Things (IoT) edge devices. The SoC includes a 2-D processor array that can be customized at design time for FPGA logic families. The overlay resolves historic issues of poor designer productivity associated with traditional Field Programmable Gate Array (FPGA) design flows without the performance losses normally incurred by overlays. A standard Instruction Set Architecture (ISA) allows different ML networks to be quickly compiled and run on the overlay without the need to resynthesize. Performance results are presented that show the overlay achieves 1.3×–8.0× speedup over custom designs while still allowing rapid changes to ML algorithms on the FPGA through standard compilation.

Engineering, Other

Machine Learning / AI

Paper

https://www.computer.org/csdl/proceedings-article/fpl/2021/375900a024/1xDQ3iK1FRK

Spiking Neural Networks (SNNs) are the next generation of Artificial Neural Networks (ANNs) that utilize an event-based representation to perform more efficient computation. Most SNN implementations have a systolic array-based architecture and, by assuming high sparsity in spikes, significantly reduce computing in their designs. This work shows this assumption does not hold for applications with signals of large temporal dimension. We develop a streaming SNN (S2N2) architecture that can support fixed-per-layer axonal and synaptic delays for its network. Our architecture is built upon FINN and thus efficiently utilizes FPGA resources. We show how radio frequency processing matches our S2N2 computational model. By not performing tick-batching, a stream of RF samples can efficiently be processed by S2N2, improving the memory utilization by more than three orders of magnitude.

Engineering, Other

Machine Learning / AI

Paper

https://dl.acm.org/doi/10.1145/3431920.3439283
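The fixed per-layer axonal/synaptic delays mentioned in the abstract can be pictured as simple delay lines on a spike stream. The toy below is our software sketch of that mechanism — not the FINN-based S2N2 hardware — and it also illustrates the streaming (no tick-batching) style: samples are consumed one tick at a time.

```python
from collections import deque

def axonal_delay(spike_stream, delay):
    """Fixed delay line: each spike (1) or silence (0) in the input stream
    re-emerges exactly `delay` ticks later.  The buffer starts filled with
    zeros, modeling an initially quiet line."""
    buf = deque([0] * delay)
    out = []
    for s in spike_stream:
        buf.append(s)              # newest sample enters the line
        out.append(buf.popleft())  # oldest sample leaves, `delay` ticks late
    return out

# Streaming operation: one sample in, one sample out, per tick.
shifted = axonal_delay([1, 0, 0, 1, 1], delay=2)
```

In hardware the same structure is just a shift register per layer, which is why fixed delays cost so little FPGA area relative to buffering whole time windows (tick-batching).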

Binary neural networks (BNNs) have 1-bit weights and activations. Such networks are well suited for FPGAs, as their dominant computations are bitwise arithmetic and the memory requirement is also significantly reduced. However, compared to state-of-the-art compact convolutional neural network (CNN) models, BNNs tend to produce a much lower accuracy on realistic datasets such as ImageNet. In addition, the input layer of BNNs has gradually become a major compute bottleneck, because it is conventionally excluded from binarization to avoid a large accuracy loss. This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs. Specifically, our approach employs a dual-precision activation scheme to compute features with up to two bits, using an additional sparse binary convolution. We further binarize the input layer using a novel thermometer encoding. Overall, FracBNN preserves the key benefits of conventional BNNs, where all convolutional layers are computed in pure binary MAC operations (BMACs). We design an efficient FPGA-based accelerator for our novel BNN model that supports the fractional activations. To evaluate the performance of FracBNN under a resource-constrained scenario, we implement the entire optimized network architecture on an embedded FPGA (Xilinx Ultra96 v2). Our experiments on ImageNet show that FracBNN achieves an accuracy comparable to MobileNetV2, surpassing the best-known BNN design on FPGAs with an increase of 28.9% in top-1 accuracy and a 2.5x reduction in model size. FracBNN also outperforms a recently introduced BNN model with an increase of 2.4% in top-1 accuracy while using the same model size. On the embedded FPGA device, FracBNN demonstrates real-time image classification.

Engineering, Other

Machine Learning / AI

Paper

https://dl.acm.org/doi/10.1145/3431920.3439296
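The thermometer encoding used to binarize the input layer is easy to illustrate: each 8-bit pixel becomes a vector of binary channels that fill up like a mercury column as intensity rises, so every channel can feed a binary convolution directly. The channel count and uniform thresholds below are illustrative choices, not necessarily FracBNN's exact configuration.

```python
import numpy as np

def thermometer_encode(pixels, n_channels=8):
    """Encode 8-bit intensities into n_channels binary planes: plane i is 1
    when the pixel exceeds the i-th uniform threshold, so every code is a
    run of 1s followed by 0s (like mercury rising in a thermometer)."""
    thresholds = np.linspace(0, 255, n_channels, endpoint=False)
    # Broadcast to shape (*pixels.shape, n_channels), values in {0, 1}.
    return (pixels[..., None] > thresholds).astype(np.uint8)

img = np.array([0, 128, 255], dtype=np.uint8)
codes = thermometer_encode(img)
```

Unlike plain binary (positional) encoding, neighboring intensities get codes at small Hamming distance, which is what lets the first layer stay accurate despite operating on pure 1-bit inputs.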

The remarkable success of machine learning (ML) in a variety of research domains has inspired academic and industrial communities to explore its potential to address hardware Trojan (HT) attacks. While numerous works have been published over the past decade, few survey papers, to the best of our knowledge, have systematically reviewed the achievements and analyzed the remaining challenges in this area. To fill this gap, this article surveys ML-based approaches against HT attacks available in the literature. In particular, we first provide a classification of all possible HT attacks and then review recent developments from four perspectives, i.e., HT detection, design-for-security (DFS), bus security, and secure architecture. Based on the review, we further discuss the lessons learned in and challenges arising from previous studies. Although current work focuses mostly on chip-layer HT problems, it is notable that novel HT threats are constantly emerging and have evolved beyond chips to the component, device, and even behavior layers, compromising the security and trustworthiness of the overall hardware ecosystem. Therefore, we divide the HT threats into four layers and propose a hardware Trojan defense (HTD) reference model from the perspective of the overall hardware ecosystem, categorizing the security threats and requirements in each layer to provide a guideline for future research in this direction.

Engineering, Other

Machine Learning / AI

Paper

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8952724

FPGA Architecture: Survey and Challenges

University of Toronto, Canada

Field-Programmable Gate Arrays (FPGAs) have become one of the key digital circuit implementation media over the last decade. A crucial part of their creation lies in their architecture, which governs the nature of their programmable logic functionality and their programmable interconnect. FPGA architecture has a dramatic effect on the quality of the final device's speed performance, area efficiency and power consumption.

Engineering, Other

High Performance Computing

Paper

https://ieeexplore.ieee.org/document/8187326

Survey on FPGA Architecture and Recent Applications

Vellore Institute of Technology, Vellore, India

The Field Programmable Gate Array (FPGA), introduced in 1985, has grown steadily in popularity owing to properties such as design reuse and flexibility. Compared to microprocessors, FPGAs offer high performance and configurability. Compared with application-specific integrated circuits (ASICs), FPGAs reduce development time and non-recurring engineering (NRE) costs. The property that distinguishes them from ASICs is reconfigurability. Recent trends in FPGA architecture aim to narrow the gap between ASICs and FPGAs. This paper discusses the classification of FPGAs based on their routing architecture, along with recent applications in physics, computation, defense, space research, and other fields focused on improving existing technology.

Engineering, Other

High Performance Computing

Paper

https://ieeexplore.ieee.org/document/8899550