Options for Implementing AI/ML on FPGAs | Bench Talk
 
Bench Talk for Design Engineers | The Official Blog of Mouser Electronics


Options for Implementing AI/ML on FPGAs
By Adam Taylor

(Source: putilov_denis - stock.adobe.com)

Field Programmable Gate Arrays (FPGAs) are well known for accelerating artificial intelligence and machine learning (AI/ML) applications, but how is this acceleration implemented in the FPGA, and what are the different approaches? Let's explore the engineer's design space.

Artificial intelligence (AI) is a hot topic in both cloud and edge applications. In many cases, AI enables safer, more efficient, and more secure systems. Artificial intelligence has been around a long time: the term was first used in 1956 by John McCarthy, when the first conference on artificial intelligence was held. While significant research has been performed across the decades, it is only in the last 5 to 10 years that AI systems have moved out of the lab and research and onto product road maps and into products.

Within cloud and edge environments, one of the most widely deployed forms of AI is machine learning (ML). Machine learning is the study of computer algorithms that allow computer programs to improve automatically through experience. An example of this is providing an ML network with a dataset of labeled images. The machine learning algorithm identifies features and elements of the images so that when a new, previously unseen, unlabeled image is input, the algorithm determines how likely the image is to contain any of the learned features and elements. Such ML algorithms can be trained to detect objects in images, recognize keywords in speech, and analyze sensor data for anomalies. Typical applications include vision-guided robotics, autonomous operation of vehicles, and prognostics for industrial and safety-critical systems.

ML algorithms are therefore split into two phases. The first is training the network against a labeled training dataset; the second is deploying the trained network in the field. These phases are called training and inference, respectively. Training accurate models requires a large, labeled dataset and is often performed on cloud-based GPUs to accelerate the process. Design engineers can then deploy the trained network across a range of technologies, from MCUs to GPUs and FPGAs.
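The training/inference split above can be sketched in a few lines of pure Python. This is an illustrative toy (a 1-D linear model fitted by gradient descent, not a neural network, and not any specific framework's API): `train` learns parameters from labeled samples, and `infer` applies the frozen parameters to new inputs, exactly the division of labor between the cloud training phase and the deployed inference phase.

```python
# Toy illustration of the training vs. inference phases.
# A real flow would use a framework such as PyTorch or TensorFlow;
# here a 1-D linear model keeps the two phases easy to see.

def train(samples, epochs=200, lr=0.05):
    """Training phase: fit weight w and bias b to labeled (x, y) data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y      # prediction error
            w -= lr * err * x          # gradient-descent updates
            b -= lr * err
    return w, b

def infer(w, b, x):
    """Inference phase: apply the frozen, trained parameters."""
    return w * x + b

# Labeled training data drawn from y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(round(infer(w, b, 4.0), 2))  # close to 9.0 for an unseen input
```

In the FPGA context, `train` corresponds to the cloud-GPU step and only `infer` (with its fixed `w` and `b`) is what gets deployed to the device.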

Embedding AI in FPGAs

Several very popular frameworks, including Caffe, TensorFlow, and PyTorch, aid the training and deployment of AI/ML systems. These frameworks are used for network definition, training, and inference.

One of the key elements of many edge-based AI systems is the ability to perform inference within a defined timeframe. For example, autonomous vehicles must detect vehicles, obstacles, and pedestrians quickly to prevent collisions. This requires a solution that is both responsive and deterministic: responsive because the sensor data must be processed quickly with minimum delay, and deterministic because the response time for each input must be consistent and not dependent on system operating conditions or resource usage (for example, contention for shared DDR memory slowing the response time).

Because of these responsiveness and determinism requirements, developers of edge-based solutions often target FPGA- or heterogeneous SoC-based solutions. These devices provide programmable logic, which is ideal for implementing machine learning networks: its parallel nature enables both a responsive application and a highly deterministic solution.

When it comes to implementing ML inference in programmable logic, two approaches can be taken. Regardless of the approach, while neural networks are developed and trained using floating-point mathematics, implementations in FPGAs or heterogeneous SoCs typically use fixed-point arithmetic. The conversion from floating point to fixed point is called quantization and can come with a small reduction in inference accuracy; however, for most applications, additional training using the quantized weights and activations can recover the accuracy.
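A minimal sketch of what quantization does may help here. The snippet below maps floating-point weights to signed 8-bit integers with a single scale factor; this is the simplest symmetric scheme, not the exact algorithm of any particular vendor tool (real flows typically choose scales per layer or per channel, and follow up with the retraining mentioned above).

```python
# Sketch of symmetric post-training quantization to signed 8-bit
# fixed point. The rounding step is where the small accuracy loss
# mentioned in the text comes from.

def quantize(weights, bits=8):
    """Map floats to signed integers in [-(2^(bits-1)), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax   # one shared scale factor
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers."""
    return [v * scale for v in q]

weights = [0.42, -1.3, 0.07, 0.99]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                 # small integers, e.g. [41, -127, 7, 97]
print(max_err <= scale)  # error bounded by one quantization step
```

On the FPGA, only the integer values and the scale are stored, so each multiply-accumulate needs narrow fixed-point hardware instead of a floating-point unit.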

The first approach implements the neural network directly within the programmable logic. The trained weights for inference are loaded into the network, either at run time or during compilation/synthesis of the design.

An example of this approach is the AMD-Xilinx FINN framework, which can be used to implement quantized neural networks in FPGAs, with binary weights and two-bit activations.
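The reason binary weights suit FPGA fabric so well can be shown in a short sketch. With weights constrained to +1/-1, a neuron's dot product needs no multipliers at all: on the sign bits it reduces to XNOR plus popcount, which synthesizes to compact LUT logic. The snippet below is illustrative only (it is not FINN code; FINN generates the equivalent hardware from a trained model) and verifies that the two formulations agree.

```python
# Why binarized layers are cheap in FPGA fabric: with +/-1 weights,
# a dot product becomes XNOR + popcount on bit vectors.
# Illustrative sketch only, not the FINN implementation itself.

def binary_dot(acts, weights):
    """Dot product with +/-1 weights: just add or subtract each input."""
    return sum(a if w > 0 else -a for a, w in zip(acts, weights))

def xnor_popcount_dot(act_bits, weight_bits, n):
    """Same result computed on sign bits alone.
    act_bits/weight_bits are n-bit integers encoding +1 as 1 and -1 as 0."""
    matches = bin(~(act_bits ^ weight_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n   # each match adds +1, each mismatch adds -1

acts = [1, -1, -1, 1]        # +/-1 activations
weights = [1, 1, -1, -1]     # +/-1 binary weights
print(binary_dot(acts, weights))

# Pack the sign bits and repeat the computation as hardware would
a_bits = sum(1 << i for i, a in enumerate(acts) if a > 0)
w_bits = sum(1 << i for i, w in enumerate(weights) if w > 0)
print(xnor_popcount_dot(a_bits, w_bits, 4))  # same value, no multipliers
```

Because the arithmetic collapses to bitwise logic, the weights fit in on-chip memory and no DSP slices or external DDR are required, which is the resource saving discussed below.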

With a quantized neural network, a network can be implemented in an FPGA with far fewer resources, since no external DDR memory or SoC support is needed. This makes the approach ideal for constrained developments where space, component count, and cost are at a premium. Although it requires somewhat more specialist knowledge to integrate within the overall solution, it can be very effective. A typical example of this approach is prognostics for industrial machinery (e.g., detecting bearing wear or vibration).

The alternative to a direct implementation of the neural network within the FPGA logic is a highly specialized neural network accelerator. The accelerator is implemented in the programmable logic and is closely coupled via high-bandwidth links to the DDR memory and to the dedicated processors within the heterogeneous SoC.

In applications that use a neural network accelerator, the software application provides the network along with its weights, activations, and biases. This makes the ML inference easier to integrate within the overall application. One example of a neural network accelerator is the AMD-Xilinx Deep Learning Processor Unit (DPU), whose tool flow works with networks defined in PyTorch, Caffe, and TensorFlow and performs the quantization, retraining, and program generation for the application. This provides easier integration into the application under development. Typical applications of this approach are high-performance vision-based applications such as vision-guided robotics, smart-city solutions, and, of course, increasing automotive SAE autonomy levels.

The highest accuracy and performance come from using a specialized neural network accelerator, and its ease of integration often yields a better solution overall; hence, several vendors take this approach in their AI solutions. It also integrates much more easily with higher-level software frameworks and abstraction stacks, which is key to leveraging overall performance, as AI is often only a small (but important) part of the complete solution.

Final Thoughts

Often, the choice of solution depends on the end application, even though AI may be a dominant marketing element. In the real world, AI is often only a small part of the overall solution; sensor interfacing, pre-processing, actuator drive, and the other elements that make up the solution come with their own constraints and requirements.

Programmable logic enables developers to build AI/ML solutions that are both responsive and deterministic. By combining these solutions with industry-standard frameworks, developers can make cloud and edge AI/ML applications safer, more efficient, and more secure.





Adam Taylor is a professor of embedded systems, engineering leader, and world-recognized expert in FPGA/System-on-Chip and electronic design.

