It is well known that an external memory access consumes two to three orders of magnitude more energy than an arithmetic operation on the same data. Reducing the amount of data movement by maximally reusing the fetched data is therefore a primary concern. In addition, on-chip routing congestion caused by communication among the processing elements and the input/output/kernel buffers must be taken into account when designing an area- and energy-efficient inference accelerator.
The high computational workload of the convolutional layers in a CNN requires massive data movement between the external memory and the processing cores. We develop DNN inference accelerators that maximize energy efficiency by co-optimizing the overall dataflow and the convolution processing scheme.
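To make the data-reuse argument concrete, the following is a minimal C sketch of a tiled 1-D convolution loop nest: a fetched input tile and the kernel are held in local arrays that stand in for on-chip buffers, so each external fetch is reused across several output computations. The array sizes, tile size, and counters are illustrative assumptions, not the organization of the proposed accelerator.

```c
#include <stdio.h>

#define N    1024   /* input length (illustrative)        */
#define K    9      /* kernel taps (illustrative)         */
#define TILE 64     /* on-chip input-buffer tile size     */

/* Counters modeling energy-relevant events: each read of an
 * "external" array counts as a DRAM access, each multiply-add
 * as an arithmetic operation. */
static long dram_reads = 0, macs = 0;

int main(void) {
    static float in[N], w[K], out[N - K + 1]; /* external memory */
    float in_buf[TILE + K - 1];               /* on-chip input tile  */
    float w_buf[K];                           /* on-chip kernel copy */

    /* Fetch the kernel once; it is reused for every output pixel. */
    for (int k = 0; k < K; ++k) { w_buf[k] = w[k]; ++dram_reads; }

    for (int t = 0; t < N - K + 1; t += TILE) {
        int tile = (t + TILE <= N - K + 1) ? TILE : (N - K + 1 - t);

        /* One external fetch per input element of the tile (plus halo);
         * each fetched value is then reused by up to K outputs. */
        for (int i = 0; i < tile + K - 1; ++i) {
            in_buf[i] = in[t + i];
            ++dram_reads;
        }

        for (int o = 0; o < tile; ++o) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += in_buf[o + k] * w_buf[k];
                ++macs;
            }
            out[t + o] = acc;
        }
    }

    printf("DRAM reads: %ld, MACs: %ld (reuse factor ~%.1f)\n",
           dram_reads, macs, (double)macs / dram_reads);
    return 0;
}
```

With these parameters the sketch performs roughly eight multiply-accumulates per external read; since each external access costs orders of magnitude more energy than a MAC, raising this reuse factor is what the dataflow and buffer organization of an accelerator are designed to achieve.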