Running artificial intelligence (AI) applications in huge, cloud-based data centers is so last year! Well, actually it is so this year too — the latest and greatest algorithms simply require too many resources to run on anything less. But that is not the long-term goal. When we send data to the cloud for processing, significant latency is introduced. This is a big problem for applications with real-time processing requirements. Furthermore, numerous privacy-related issues also arise when sending sensitive data over public networks for processing in a data center owned by a third party.
The solution, of course, is to run the algorithms much closer to where the data is captured, using tinyML techniques. But as successful as these scaled-down algorithms have been, there is no magic involved. Corners have to be cut and optimizations have to be applied before tinyML algorithms can run on resource-constrained platforms like microcontrollers.
The architecture of a MAX78000 AI accelerator (📷: T. Gong et al.)
Tiny AI accelerators, such as the Analog Devices MAX78000 and Google Coral Micro, address this issue by significantly speeding up inference through hardware optimizations like multiple parallel convolutional processors with dedicated per-processor memory. Despite these advancements, challenges remain. Consider computer vision tasks, for example, where the limited memory per processor restricts the size of input images, requiring that they be downsampled. That downsampling reduces accuracy, and because a typical vision model's input layer has only three channels (RGB), the per-processor memory architecture leaves most of the processors sitting idle during that layer.
To overcome these issues, researchers at Nokia Bell Labs have introduced what they call Data Channel EXtension (DEX), a novel approach that improves tinyML model accuracy by spreading the input data across otherwise unused channels. By fully utilizing the available processors and their memory, DEX preserves more image information without increasing inference latency.
An overview of the DEX algorithm (📷: T. Gong et al.)
DEX operates in two main steps: patch-wise even sampling and channel-wise stacking. In patch-wise even sampling, the input image is divided into a grid of patches matching the resolution of the output image, one patch per output pixel. From each patch, evenly spaced samples are selected, which preserves the spatial relationships among pixels while distributing the sampling uniformly across the image. This avoids much of the information loss caused by traditional downsampling, which effectively keeps only a single value per output pixel.
Next, in channel-wise stacking, the sampled pixels are arranged across extended channels. The samples drawn from each patch are stacked, in order, into separate channels of the same output pixel, so spatial consistency is maintained and the extra channels carry genuine image content rather than padding. This lets DEX use all of the available processors and memory instances, unlike traditional methods that leave many processors idle.
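To make the two steps more concrete, here is a rough numpy sketch of what DEX-style preprocessing could look like. The function name, patch layout, and sampling grid are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dex_extend(image, out_h, out_w, grid=3):
    """Illustrative DEX-style preprocessing: patch-wise even sampling
    followed by channel-wise stacking.

    image:  (H, W, C) array, e.g. a 240x240x3 RGB image
    returns (out_h, out_w, C * grid * grid) array, e.g. 60x60x27
    """
    in_h, in_w, c = image.shape
    ph, pw = in_h // out_h, in_w // out_w                  # patch size per output pixel
    ys = np.linspace(0, ph - 1, grid).round().astype(int)  # evenly spaced row offsets
    xs = np.linspace(0, pw - 1, grid).round().astype(int)  # evenly spaced column offsets

    slices = []
    for dy in ys:
        for dx in xs:
            # take the pixel at the same relative offset inside every patch,
            # giving one (out_h, out_w, C) slice per sample position
            slices.append(image[dy::ph, dx::pw, :][:out_h, :out_w, :])

    # channel-wise stacking: concatenate all sampled slices along the
    # channel axis instead of throwing them away as downsampling would
    return np.concatenate(slices, axis=-1)

# Example: a 240x240 RGB image becomes a 60x60 input with 27 channels
img = np.random.randint(0, 256, (240, 240, 3), dtype=np.uint8)
print(dex_extend(img, 60, 60).shape)  # (60, 60, 27)
```

In practice, the number of samples per patch would be chosen so that the extended channel count fills the accelerator's processors, rather than the arbitrary nine-sample grid used here.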
Splitting data across channels makes better use of hardware resources (📷: T. Gong et al.)
By reshaping the input data into a higher channel dimension (e.g., from 3 channels to 64 channels), DEX preserves more pixel information and spatial relationships without adding latency, thanks to the parallelism afforded by the accelerator. As a result, tinyML algorithms benefit from richer image representations, leading to improved accuracy and more efficient utilization of hardware resources on tiny AI accelerators.
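From the model's point of view, the only structural change this implies is a wider first convolution; everything downstream sees the same spatial resolution. Here is a minimal PyTorch sketch of that idea (the layer sizes are illustrative, not taken from the paper's models):

```python
import torch
import torch.nn as nn

# Baseline: image downsampled to 60x60, so the first conv sees 3 RGB channels
stem_baseline = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# DEX: same 60x60 spatial size, but the input now carries 64 extended channels,
# so only the first layer's in_channels changes
stem_dex = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)

x_baseline = torch.randn(1, 3, 60, 60)
x_dex = torch.randn(1, 64, 60, 60)

# Both stems produce the same output shape, so the rest of the network is untouched
print(stem_baseline(x_baseline).shape)  # torch.Size([1, 64, 60, 60])
print(stem_dex(x_dex).shape)            # torch.Size([1, 64, 60, 60])
```

On a device like the MAX78000, each of those input channels is handled by its own convolutional processor, so the wider first layer runs in parallel rather than serially; its extra weights are also the likely source of the small increase in model size noted below.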
DEX was evaluated using the MAX78000 and MAX78002 tiny AI accelerators with four vision datasets (ImageNette, Caltech101, Caltech256, and Food101) and four neural network models (SimpleNet, WideNet, EfficientNetV2, and MobileNetV2). Compared to baseline methods like downsampling and CoordConv, DEX improved accuracy by 3.5 percent and 3.6 percent, respectively, while preserving inference latency. DEX’s ability to utilize 21.3 times more image information contributed to the accuracy boost, with only a minimal 3.2 percent increase in model size. These tests demonstrated the potential of DEX to maximize image information and resource utilization without performance trade-offs.