Optimizing fpga-based accelerator design for deep convolutional neural networks pdf

Minimizing computation in convolutional neural networks. Fpgabased accelerators of deep learning networks for. In section 3, we analyze the design target of nn accelerators and corresponding methods. A high performance fpgabased accelerator for largescale convolutional neural networks abstract. In recent years, convolutional neural networks cnns based machine learning algorithms have been widely applied in computer vision applications. This project is a fpga based implementation of first convolutional layer of alexnet. Convolutional networks convnets are feedforward architectures composed of multiple layers of convolutional. A highefficiency fpgabased accelerator for binarized neural network. Various network models, like convolutional neural network cnn, recurrent. Cpu has insufficient resources to satisfy the efficient computation of the convolution neural network cnn, especially for embedded applications. A gpuoutperforming fpga accelerator architecture for. Accelerating deep convolutional neural networks using. Section 6 compares existing designs and evaluate the techniques. An fpgabased framework for training convolutional neural networks wenlai zhao yz, haohuan fu, wayne luk x, teng yu, shaojun wang, bo feng, yuchun ma and guangwen yangyz, department of computer science and technology, tsinghua university, china.

Design of convolutional neural network based on fpga. In this paper, we focus on accelerating the forward propagation of deep convolutional neural networks cnns using an fpga. Refer to using the user application pdf for instructions on data transfer. Optimizing fpgabased accelerator design for deep convolutional neural networks chen zhang 1 email protected peng li 2 email protected guangyu sun 1,3 email protected yijin guan 1 email protected bingjun xiao 2 email protected jason cong 2,3,1, email protected 1 center for energyefficient computing and applications, peking university, china 2 computer science department. Design of fpgabased accelerator for convolutional neural. Cong, optimizing fpgabased accelerator design for deep convolutional neural networks, in proceedings of the 2015 acmsigda international symposium on fieldprogrammable gate arrays, acm, 2015, fpga 15, p. Latencydriven design for fpgabased convolutional neural. Deep convolutional neural networks cnns 1, 2, 3, 4 are extensively. A gpuoutperforming fpga accelerator architecture for binary convolutional neural networks project overview convolutional neural network cnn has become a popular machine learning engine for many imagerelated data analytics 1516 20 27, such as image classification, face detection, object tracking, etc. Request pdf optimizing fpgabased accelerator design for deep convolutional neural networks convolutional neural network cnn has been widely employed for image recognition because it can. Optimizing of convolutional neural network accelerator. Congoptimizing fpgabased accelerator design for deep convolutional neural networks presented at the proceedings of the 2015 acmsigda international symposium on fieldprogrammable gate arrays, monterey, california, usa 2015, 10. Optimizing fpgabased accelerator design for deep convolutional neural networks.

As cnn involves an enormous number of computations, it is necessary to accelerate the cnn computation by a hardware accelerator, such as fpga, gpu and asic designs. Dl a survey of fpgabased neural network inference accelerator 11. Various fpgabased accelerator designs have been proposed with so ware and hardware. Index terms adaptable architectures, convolutional neural networks cnns, deep learning. Specialized accelerator in the form of filed programmable gate array fpga offers a promising path towards major leaps in performance and energy efficiency. Therefore, heterogeneous computing platforms are widely used to accelerate cnn tasks, such as gpu, fpga, and asic. Binary convolutional neural network acceleration framework. Zhang and others published optimizing fpgabased accelerator design for deep convolutional neural networks find, read and cite all the research you need on. An fpgabased hardware accelerator for cnns using onchip. A highefficiency fpgabased accelerator for convolutional. Stateoftheart deep convolutional neural networks are typically organized into alternating convolutional and maxpooling neural network layers followed by a number of dense, fullyconnected layersas illustrated in the wellknown topology by krizhevsky et al. Opencl fpga has recently gained great popularity with emerging needs for workload acceleration such as convolutional neural network cnn, which is the most popular deep learning architecture in the domain of computer vision. If you have question or need further information, please contact him.

Since the impressive results of alexnet deep convolutional neural network dcnn in the imagenet largescale vision recognition challenge ilsvrc in 2012, dcnn research activity has seen exponential growth with the trend being deeper architectures accompanied by higher accuracies 2, 3. Optimizing fpgabased accelerator design for deep convolutional neural networks c zhang, p li, g sun, y guan, b xiao, j cong proceedings of the 2015 acmsigda international symposium on field, 2015. More broadly, this is part of a trendusing deep neural networks. During the last years, convolutional neural networks have been used for different applications, thanks to their potentiality to carry out tasks by using a reduced number of parameters when compared with other deep learning approaches. Autoencoderbased lowrank filtersharing for efficient convolutional neural networks. Fusedlayer cnn accelerators stony brook university.

Acceleration of deep learning for cloud and edge computing. Convolutional neural network cnn has been widely employed for image recognition because it can achieve high accuracy by emulating behavior of optic nerves in living creatures. Efficient fpga acceleration of convolutional neural networks using logical3d compute array. A survey of fpgabased accelerators for convolutional neural networks sparsh mittal abstract deep convolutional neural networks cnns have recently shown very high accuracy in a wide range of cognitive tasks and due to this, they have received signi. Optimizing fpgabased accelerator design for deep convolutional. Among these, fpga can accelerate the computation by mapping the algorithm to the parallel hardware instead of cpu, which cannot fully.

To evaluate our design, we implement the accelerator on the zynq zc702 board, and the experiments on the svhn and cifar10 datasets show the stateoftheart performance efficiency and resource efficiency. Recently, rapid growth of modern applications based on deep learning algorithms has further improved research and implementations. Convolutional neural network cnn is a deep learning architecture extended from artificial. Optimizing fpgabased accelerator design for deep convolutional neural networks chen zhang1 chen. Optimizing opencl implementation of deep convolutional. Design space exploration of fpgabased deep convolutional neural networks. Design space exploration of fpgabased deep convolutional. A massively parallel coprocessor for convolutional neural networks. While opencl enhances the code portability and programmability of fpga, it comes at the expense of performance.

An energyefficient reconfigurable accelerator for deep convolutional neural networks. A high performance fpgabased accelerator for largescale. Request pdf optimizing fpgabased accelerator design for deep convolutional neural networks convolutional neural network cnn has been widely. Using data compression for optimizing fpgabased convolutional neural network accelerators. A highefficiency fpgabased accelerator for binarized. Congoptimizing fpgabased accelerator design for deep convolutional neural networks proceedings of the 2015 acmsigda international symposium on fieldprogrammable gate arrays, acm 2015, pp.

Improving the performance of openclbased fpga accelerator. Zhang c, li p, sun g, guan y, xiao b and cong j 2015 proc. Reasonable price, low power consumption and recon g. In recent years, convolution neural network cnn had been widely used in many imagerelated machine learning algorithms since its high accuracy for image recognition. Acm optimizing fpgabased accelerator design for deep convolutional neural networks 161170. Fpga based accelerators that target a specific cnn. While deeper structured networks bring about significant precision gains in many applications, they also pose an urgent demand for higher computation capacity at the expense of power consumption. Optimizing fpgabased accelerator design for piyawath. Introduction deep convolutional neural networks cnns have revolutionized the accuracy of recognition in computer vision. Design of fpgabased accelerator for convolutional neural network under heterogeneous computing framework with opencl li luo, yakun wu, fei qiao, yi yang, qi wei, xiaobo zhou, yongkai fan, shuzheng xu, xinjun liu, huazhong yang algorithms, cnn, computer science, deep learning, fpga, heterogeneous systems, neural networks, opencl. A survey of fpgabased neural network inference accelerator arxiv. Algorithmhardware codesign of adaptive floatingpoint encodings for resilient deep learning inference. At the forefront of deep learning lies the model of convolutional neural networks convnets with stateoftheart accuracy in a broad range of applications spanning. The zynqnet fpga accelerator, a specialized fpga architecture for the efficient acceleration of zynqnet cnn and similar convolutional neural networks.

Following this trend, research in dcnn fpga accelerators provides solutions that use highend. In applicationspecific systems, architectures and processors, 2009. Chen zhang, peng li, guangyu sun, yijin guan, bingjun xiao, jason cong. Algorithmhardware codesign for inmemory neural network computing with minimal peripheral circuit overhead. In proceedings of the 2015 acmsigda international symposium on fieldprogrammable gate arrays, pages 161170. Recently, rapid growth of modern applications based on deep learning algorithms has. Zynqnet cnn is trained offline on gpus using the caffe framework, while the zynqnet fpga accelerator employs the cnn for image classification, or inference, on a xilinx zynq xc 7z045 systemon. Cong, optimizing fpgabased accelerator design for deep convolutional neural networks, in proceedings of international symposium on fieldprogrammable gate arrays, pages 161170, 2015. To this end, various fpga based deep neural network accelerators are proposed for higher performance and lower energy consumption. Introduction deep convolutional neural networks dcnn have recently led to impressive progress in many challenging machine learning problems, such as machine vision, natural language processing and speech recognition. Latencydriven design for fpgabased convolutional neural networks stylianos i. Designing efficient accelerator of depthwise separable.

However, power consumption and memory footprint constraints, typical of on the edge and portable applications, usually collide with accuracy and latency. However, for largescale cnns, the computationintensive, memoryintensive and resourceconsuming. Optimizing fpgabased convolutional neural networks accelerator. An automated endtoend optimizing compiler for deep learning. The key challenge is to optimize the opencl kernels to. Design space exploration of fpgabased deep convolutional neural networks mohammad motamedi, philipp gysel, venkatesh akella and soheil ghiasi. Very deep convolutional networks for largescale image recognition.