Tesla released a Dojo white paper this week, and Elon Musk pointed out alongside the release that it is much more important than it may seem at first. In other words, it is worth taking a deep dive into.
In the 9-page paper, Tesla explains the arithmetic formats and methods behind its new binary floating-point arithmetic for deep learning neural network training. Translation: Tesla shared a detailed guide to how it developed a way to solve a problem it was facing. And, as you will learn from reading this article (and the document), part of that solution involved upgrading an older standard that is widely used by computer systems today.
This is more important than it may seem at first
— Elon Musk (@elonmusk) October 26, 2021
First, let's define what binary floating-point arithmetic is. The term refers to a number's decimal point, or, for computers, its binary point, and the fact that this point can "float," that is, be placed anywhere relative to the significant digits of the number. Tesla's paper deals with two sizes of binary floating-point arithmetic: 8-bit and 16-bit.
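To make that concrete, here is a minimal Python sketch (mine, not Tesla's) that unpacks the sign, exponent, and mantissa bit fields of a standard 32-bit float, the three building blocks that every floating-point format rearranges:

```python
# Minimal illustration (not from Tesla's paper): the bit fields of an
# IEEE 754 single-precision (32-bit) float.
import struct

def float32_fields(x: float):
    """Return the sign, exponent, and mantissa bit fields of a float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits, implicit leading 1
    return sign, exponent, mantissa

# 6.5 is 1.101 in binary times 2^2, so the stored exponent is 2 + 127 = 129
print(float32_fields(6.5))  # (0, 129, 5242880)
```

Every format discussed below is simply a different way of splitting a fixed bit budget among these three fields.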
In the paper, Tesla explains how it created new 8-bit and 16-bit binary floating-point formats, along with the methods around them, in order to create the right environment for deep learning neural network training. It then implemented these systems in software, hardware, or a mix of the two.
Tesla’s Paper by Sections
Tesla shares the why in the Motivation section, and the sections that follow cover the what and the how of the standards that make up the Dojo technology.
- Motivation
- Tesla CFloat8 Formats
- Arithmetic Operations
- Tesla CFloat16 Formats
- Exception Status Flags
Motivation
Motivation is something that moves us. It inspires us. What motivates you to do what you do? In AI and robotics, the concept works much as it does in psychology: intrinsic motivation enables artificial agents, such as robots, to exhibit self-rewarding behaviors like curiosity, much as humans do. In psychology, the term covers the drive that makes us do things for their inherent satisfaction; it is part of what makes us unique individuals.
Keep this in mind when reading the Motivation section of Tesla's paper. In this section, Tesla brings up a floating-point standard published by the Institute of Electrical and Electronics Engineers (IEEE) in 1985, with the goal of providing a method of computation with floating-point numbers that yields the same result no matter how the processing is done: in hardware, in software, or in a mix of the two. This standard, IEEE 754, has been widely adopted in computer systems.
The standard was revised in 2008 to include a half-precision (16-bit) storage format, without specifying the arithmetic operations for it. Tesla noted that Microsoft and Nvidia defined this data type in the Cg language in 2002 and implemented it in silicon in the GeForce FX, which was also released in 2002. Tesla added that the IEEE half-precision format has since been used for many purposes, including performing arithmetic operations in various computer systems; more specifically, Tesla mentioned graphics and machine learning applications.
“The advantage over single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range). Subsequently, the IEEE half-precision format has been adopted in machine learning systems in the Nvidia AI processors, especially for training, due to the significantly increased memory storage and bandwidth requirements in such applications.”
In other words, the format eases the memory storage and bandwidth demands of whatever system it is used in. Tesla then pointed to Google Brain, an AI research group at Google that developed the Brain Floating Point (BFloat16; 16 meaning 16-bit) format. BFloat16 is now used in Intel AI processors such as the Nervana NNP-L1000, as well as many other processors. Tesla listed those as well but, more importantly, noted the difference between the BFloat16 format and the IEEE Float16 format. The difference lies in how many bits are provisioned for the mantissa and the exponent: BFloat16 spends more bits on the exponent (range) and fewer on the mantissa (precision). Why is this important? Tesla explained,
“As deep learning neural networks grow, the memory storage and bandwidth pressure continue to present challenges and create bottlenecks in many systems, even with the Float16 and BFloat16 storage in memory.”
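To make that bit split concrete, here is a small sketch (mine, not Tesla's) of the two 16-bit layouts. Because BFloat16 keeps Float32's 8-bit exponent field, a quick round-toward-zero conversion is just truncating a float32 to its upper 16 bits:

```python
# IEEE Float16: 1 sign bit, 5 exponent bits, 10 mantissa bits
# BFloat16:     1 sign bit, 8 exponent bits,  7 mantissa bits
# BFloat16 shares float32's exponent layout, so truncating a float32
# to its top 16 bits yields a (round-toward-zero) BFloat16 value.
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16  # drop the low 16 mantissa bits

print(hex(float32_to_bfloat16_bits(1.0)))   # 0x3f80
print(hex(float32_to_bfloat16_bits(-2.0)))  # 0xc000
```

The trade-off: BFloat16 preserves Float32's range, which helps training stability, while IEEE Float16 keeps more precision within a much narrower range.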
Tesla CFloat8 Formats, Arithmetic Operations, & Tesla CFloat16 Formats
In a nutshell, Tesla is showing how it upgraded the IEEE 754R standard by introducing its Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) formats, along with their arithmetic operations. In the names CFloat8 and CFloat16, the numbers refer to the bit widths (8-bit and 16-bit).
Tesla described problems with the IEEE Float16 and BFloat16 formats and how it solved them by enabling configurability. Tesla shared two CFloat8 formats that support convert operations to and from the BFloat16 and IEEE Float32 formats, CFloat8_1_4_3 and CFloat8_1_5_2, and in the Arithmetic Operations section, Tesla described how it is able to do this.
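The suffixes appear to encode the bit split; assuming CFloat8_1_4_3 means 1 sign bit, 4 exponent bits, and 3 mantissa bits, with an exponent bias supplied from outside the format (the "configurable" part), a minimal decoder sketch could look like this. This is my reading, not Tesla's code, and it simplifies away the paper's exact denormal and special-value rules:

```python
# Hypothetical decoder sketch, assuming CFloat8_1_4_3 means 1 sign bit,
# 4 exponent bits, and 3 mantissa bits, with a configurable exponent bias
# passed in separately. Tesla's exact denormal/special-value handling is
# defined in the paper and is simplified here.
def decode_cfloat8_1_4_3(byte: int, bias: int) -> float:
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exponent = (byte >> 3) & 0xF     # 4 exponent bits
    mantissa = byte & 0x7            # 3 mantissa bits
    if exponent == 0:                # denormal: no implicit leading 1
        return sign * (mantissa / 8) * 2.0 ** (1 - bias)
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - bias)

# 0b0_1000_100 with bias 7: (1 + 4/8) * 2^(8 - 7) = 3.0
print(decode_cfloat8_1_4_3(0b01000100, bias=7))  # 3.0
```

On the arithmetic itself, Tesla explained,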
“The arithmetic operations that the CFloat8 formats should provide are implementation-dependent. Typically, the CFloat8 formats are used in mixed precision arithmetic, where the operands stored in CFloat8 format in memory may be operated on and expanded to wider data types, such as Float32.”
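Here is a conceptual sketch of that mixed-precision pattern (again mine, not Tesla's implementation), reusing the hypothetical decoder above: operands live in memory as compact 8-bit codes and are widened to Float32 only when the math happens.

```python
# Conceptual mixed-precision sketch: weights are stored as 8-bit codes,
# widened to float32 for the multiply-accumulate, and only the compact
# form ever occupies memory and bandwidth.
weights_cfloat8 = [0b01000100, 0b00111000]   # compact 8-bit storage
activations = [0.5, 2.0]                     # already in float32

acc = 0.0
for w8, a in zip(weights_cfloat8, activations):
    w32 = decode_cfloat8_1_4_3(w8, bias=7)   # expand to the wider type
    acc += w32 * a                           # arithmetic done in float32
print(acc)  # 3.0 * 0.5 + 1.0 * 2.0 = 3.5
```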
The next formats are the CFloat16 formats, and Tesla specified two of them for 16-bit floating-point numbers: Signed Half-Precision (SHP) and Unsigned Half-Precision (UHP). Tesla explained that these are used to store parameters such as gradients in cases where the precision of the CFloat8 formats is too small to guarantee the convergence of training networks.
Some Thoughts
This guide that Tesla published is a look into how it upgraded, or more accurately recreated, an older standard to meet the needs of developing the Dojo computer. I would consider it a teaching guide for anyone who works in the field of artificial intelligence and is interested in learning about Tesla's development of AI. On the first page, Tesla gives you a list of frequently used keywords as well as a nutshell version of what the guide does before going into the details. That keyword section is why I think of this as more of a teaching guide for engineers who are interested in Tesla, Dojo, and AI.
I think Elon is right: this is more important than it may seem. The average person will look at this document and get lost in the mathematical jargon. But an engineer, or at least someone who is learning about Tesla and its role in AI, will understand that Tesla has upgraded an older standard to create a new technology.