Smartphones: The New Way to Train Neural Networks?

March 1, 2021 - 8 minutes read

We know artificial intelligence (AI) to be a workhorse technology that can take nearly any information and make sense of it. But AI is still running into growing pains, as hardware hasn’t caught up with the rapid growth of AI software and research. The tools behind many AI applications, often neural networks, are powerful and smart, but they require a lot of computing power and energy to train.

Training takes far longer than testing in AI applications, and its hardware cost is currently too high to fully realize AI’s potential. But new research from IBM flips how we allocate memory to AI training without taking away from accuracy in testing. The new approach reduces the number of bits used in training calculations, cutting the energy and server space a neural network needs for training.

Memory Allocation in AI

Bits are the foundation of information storage in a computer, and each one can be either a 0 or a 1. This is generally referred to as binary because of the two-state nature of bits. In machine learning applications like neural networks, computers have to deal with numbers far larger and smaller than 0 and 1, so they combine multiple bits to encode these numbers efficiently. That’s why a lot of software is offered as 32-bit or 64-bit.

The 32 and 64 refer to how many bits the software uses to encode the data it handles. 64-bit entails more precision because it allows more information about a piece of data to be stored. For example, if you want to store an irrational number (one whose decimal expansion never ends), a 64-bit format can hold many more of its digits than a 32-bit one.
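
To make that concrete, here is a quick sketch of our own (not from the IBM research) using NumPy: the same never-ending fraction stored at three different bit widths keeps a different number of significant digits at each one.

```python
import numpy as np

# Store the same non-terminating fraction at three floating-point widths.
value = 1.0 / 3.0

for dtype in (np.float16, np.float32, np.float64):
    stored = dtype(value)
    # More bits preserve more significant digits of the true value.
    print(np.dtype(dtype).name, stored)

# Example output (exact formatting varies by NumPy version):
# float16 0.3333
# float32 0.33333334
# float64 0.3333333333333333
```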

In AI applications, accuracy and precision are key, so the more bits, the better. Even in consumer-level computers, the extra bits help with processor- and memory-intensive work like gaming or video editing. While deep learning algorithms use 32-bit numbers as the standard data size, for many neural networks, that extra precision isn’t necessary.

As a result, it’s becoming more common for neural networks to run in 16-bit, and research shows that the loss in precision doesn’t cost too much accuracy if the algorithm has been designed carefully beforehand. This strikes a balance between resource usage and training accuracy: it speeds up calculations, reduces the energy they consume, and uses less memory and bandwidth. For a field like AI, where research and algorithms build on top of each other, this finding could be a major catalyst for innovation and ideas.
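
For readers who want to see what 16-bit training looks like in practice, here is a minimal sketch using PyTorch’s mixed-precision tools; the tiny model and random data are placeholders of our own, not anything from the research.

```python
import torch
from torch import nn

# A toy model and random batch, just to illustrate the mixed-precision pattern.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(32, 128, device=device)
targets = torch.randint(0, 10, (32,), device=device)

for step in range(10):
    optimizer.zero_grad()
    # Selected ops run in 16-bit on the GPU; the rest stay in 32-bit.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = loss_fn(model(inputs), targets)
    # The scaler enlarges the loss so tiny 16-bit gradients don't underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

This is the careful design mentioned above in miniature: the framework keeps a 32-bit safety net around the 16-bit math so accuracy doesn’t collapse.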

The Research Challenge

16-bit is already a huge step down from 32-bit, but the IBM researchers took it two steps further. The team presented their findings in a paper at NeurIPS (the Conference on Neural Information Processing Systems), a prestigious AI conference with a decades-long history. They showed that they could train an AI algorithm on a variety of language and image tasks with only four bits, with a limited loss in accuracy and a speedup of more than seven times.

The main research challenge the team faced was how to handle numbers throughout the neural network. Neural networks attach a weight to each connection, and these weights are usually decimals between -1 and 1, but elsewhere in the network the numbers can be as small as 0.001 or as large as 1,000. When a handful of bits has to cover a range as broad as 0.001 to 1,000, precision gets lost in the crucial -1 to 1 range. The problem was that the researchers needed to represent these wildly different ranges with far fewer bits than a 32-bit or even a 16-bit format provides.
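
A tiny, made-up example shows the squeeze: if a 4-bit grid of evenly spaced values has to stretch across everything from thousandths up to the thousands, the steps become so coarse that ordinary weight values are simply wiped out.

```python
import numpy as np

def uniform_quantize(x, max_abs, bits=4):
    """Round x onto a few evenly spaced levels spanning [-max_abs, max_abs]."""
    step = max_abs / (2 ** (bits - 1) - 1)   # 4 bits -> only 7 levels per sign
    return np.round(np.asarray(x) / step) * step

# One grid has to cover everything from ~0.001 up to ~1,000.
print(uniform_quantize([0.001, 0.5, -0.75, 800.0], max_abs=1000.0))
# The step size is about 143, so 0.001, 0.5, and -0.75 all collapse to zero,
# while 800.0 snaps to roughly 857.
```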

The researchers decided to use a logarithmic scale to compress these ranges. The conversion changes numbers from evenly spaced increments (like 1, 2, 3) to multiplicative steps (like 1, 10, 100). A logarithmic scale can use any base for its steps, and the researchers chose a base of four.
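
Here is a simplified illustration of that idea (our sketch, not IBM’s exact number format): instead of evenly spaced levels, each value is snapped to the nearest power of four, so the grid stays dense near zero and spreads out for large magnitudes.

```python
import numpy as np

def log4_quantize(x, min_exp=-5, max_exp=2):
    """Snap each value to sign * 4**k for the nearest integer exponent k,
    a simplified base-4 logarithmic grid; zeros stay zero."""
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(x)
    nz = mag > 0
    # Nearest exponent in log-base-4 space, clipped to the representable range.
    k = np.clip(np.round(np.log(mag[nz]) / np.log(4.0)), min_exp, max_exp)
    out[nz] = sign[nz] * 4.0 ** k
    return out

print(log4_quantize([0.001, 0.03, 20.0]))
# Small values keep distinct levels: 0.001 -> 4**-5 (~0.00098) and
# 0.03 -> 4**-3 (~0.0156), while 20.0 snaps to 4**2 = 16.
```

Only a handful of exponents like these fit in four bits once a sign is included, which is exactly why the adaptive scaling described next matters so much.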

They also used a technique that adaptively scales the numbers throughout the network as it trains, letting them refit the logarithmic scale to each part of the network, even when the ranges differed as widely as -1 to 1 versus 0.001 to 1,000. Together, these two ideas let the neural network use far less energy, time, and computing power during training. This could level the playing field for resource-strapped researchers, letting them build much bigger models and more complex algorithms like the ones that Google, Facebook AI, and OpenAI release.
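
The paper’s adaptive scaling is more sophisticated than this, but a toy version of the idea (again, our own illustration rather than IBM’s exact scheme) is to divide each tensor by its own largest magnitude before quantizing, so the same small logarithmic grid can serve a weight tensor sitting between -1 and 1 and an activation tensor reaching into the hundreds.

```python
import numpy as np

def adaptive_log4_quantize(tensor, min_exp=-7, max_exp=0):
    """Rescale a tensor by its own max magnitude, snap it to a small base-4
    logarithmic grid, then undo the scaling."""
    tensor = np.asarray(tensor, dtype=np.float64)
    scale = np.max(np.abs(tensor))
    if scale == 0:
        return tensor
    normalized = tensor / scale          # now roughly within [-1, 1]
    sign, mag = np.sign(normalized), np.abs(normalized)
    out = np.zeros_like(normalized)
    nz = mag > 0
    k = np.clip(np.round(np.log(mag[nz]) / np.log(4.0)), min_exp, max_exp)
    out[nz] = sign[nz] * 4.0 ** k
    return out * scale                   # map back to the original range

weights = [-0.8, 0.02, 0.5]              # a tensor of small numbers
activations = [0.3, 12.0, 950.0]         # a tensor of large numbers
print(adaptive_log4_quantize(weights))
print(adaptive_log4_quantize(activations))
# Each tensor is rescaled to its own range first, so one small grid
# handles both without wasting its few levels.
```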

What’s more, this innovation could allow us to train neural networks on edge devices like smartphones and even smaller hardware. That would further level the playing field for developing nations that have neither the network speeds nor the computing power of wealthier ones.


Unfortunately, this research isn’t fully fleshed out yet: it was studied in simulation, not on real hardware, because 4-bit AI chips don’t exist yet. But the lead researcher, Kailash Gopalakrishnan, told Boston-based MIT Tech Review that 4-bit chips are coming and that IBM will have 4-bit hardware ready to go in three or four years.

AI Innovations

IBM isn’t the only company working hard on this problem. Many companies that rely on AI algorithms are building dedicated hardware for AI training, and many are tuning their current chips to boost low-precision capabilities without losing accuracy.

We still don’t know what new (and old) hardware has in store for AI. It’s an extremely exciting time for a field that’s recently been rapidly innovating and evolving. Would you be interested in training neural networks on your smartphone? As always, let us know your thoughts on this article in the comments below!
