RTL design of Systolic Arrays for matrix multiplication.

mohasnik/Systolic-Array

RT-Level Modeling of Google TPU Systolic Arrays using SystemC

Advances in AI and Machine Learning applications have increased the demand for faster and more complex computations. This demand has driven the development of new hardware accelerators.

Matrix multiplication is one of the most frequently used computations in Machine Learning, particularly in Neural Networks. Systolic Arrays accelerate matrix-matrix multiplication, and they are used in Google TPU accelerators. The following sections describe the RTL implementation of a 3×3 systolic array.

Systolic Array Nodes: Processing Elements (PEs)

The datapath of a PE contains three registers and hardware that computes a multiply-accumulate (MAC) operation. The constant (weight) matrix is received on the Win bus. The result of the calculation in the PE above is received on the Sin bus. WReg stores the weight, while Dreg receives the matrix values used in the MAC operation. With these simple PEs, matrix-matrix multiplications can be computed rapidly.


PE datapath
Figure 1.1 - The datapath structure of the Processing Element.
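
As a rough illustration of the MAC behaviour described above, here is a minimal behavioural sketch in Python. It is not the repository's SystemC code. The name `s_reg` for the third register is an assumption (the text names only WReg and Dreg), and the sketch collapses the multicycle operation of the real PE into a single `step` call.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Behavioural (not cycle-accurate) model of one PE."""
    w_reg: int = 0  # WReg: the stationary weight loaded from Win
    d_reg: int = 0  # Dreg: the matrix value latched from the D input
    s_reg: int = 0  # assumed name for the third register: holds the MAC result

    def load_weight(self, w_in: int) -> None:
        # Weight loading: capture the value on the Win bus.
        self.w_reg = w_in

    def step(self, d_in: int, s_in: int) -> tuple[int, int]:
        # Latch the incoming data value, then accumulate:
        # s_out = s_in + weight * data
        self.d_reg = d_in
        self.s_reg = s_in + self.w_reg * self.d_reg
        # The data value is forwarded unchanged; the sum is passed on.
        return self.d_reg, self.s_reg


pe = ProcessingElement()
pe.load_weight(3)
d_out, s_out = pe.step(d_in=4, s_in=10)  # 10 + 3 * 4
```

Forwarding `d_out` unchanged while passing on the accumulated `s_out` is what lets identical PEs be chained into an array.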



For this PE to perform the proper operation, a control unit is needed to manage calculations and weight loading:

PE control unit
Figure 1.2 - The controller diagram of the Processing Element.

Systolic Array Structure

Cascading multiple PEs creates a Systolic Array. Note the connections: the values of D move horizontally, while the values of S (the result of the MAC operation) move vertically, every 2 cycles.

Systolic array datapath
Figure 2.1 - The datapath structure of the Systolic Array.
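
The dataflow above (D moving horizontally, S moving vertically) can be sketched as a small step-by-step simulation in Python. This is a behavioural model under stated assumptions, not the repository's RTL: PE (r, c) is assumed to hold weight `w[r][c]`, row r of the array is assumed to be fed column r of the input matrix delayed by r steps, and each hop takes one simulation step rather than the 2 clock cycles of the actual PEs. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(a, w):
    """Compute a @ w by simulating an n x n weight-stationary array.

    a: m x n input matrix (list of rows), streamed through the array
    w: n x n weight matrix, one weight held in each PE
    """
    n, m = len(w), len(a)
    d = [[0] * n for _ in range(n)]  # D registers (move left to right)
    s = [[0] * n for _ in range(n)]  # S registers (move top to bottom)
    c = [[0] * n for _ in range(m)]  # collected results

    for t in range(m + 2 * n - 2):   # steps until the last sum drains
        new_d = [[0] * n for _ in range(n)]
        new_s = [[0] * n for _ in range(n)]
        for r in range(n):
            for col in range(n):
                if col == 0:
                    k = t - r        # skew: row r starts r steps late
                    d_in = a[k][r] if 0 <= k < m else 0
                else:
                    d_in = d[r][col - 1]              # from the left PE
                s_in = s[r - 1][col] if r > 0 else 0  # from the PE above
                new_d[r][col] = d_in
                new_s[r][col] = s_in + w[r][col] * d_in  # MAC
        d, s = new_d, new_s
        # The bottom row emits one finished element per column.
        for col in range(n):
            k = t - (n - 1) - col
            if 0 <= k < m:
                c[k][col] = s[n - 1][col]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
result = systolic_matmul(a, w)
```

Under these assumptions, the partial sum for input row k reaches PE (r, c) at step k + r + c, so a 3×3 multiplication drains after 3 + 2·3 − 2 = 7 steps.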




The Systolic Array also requires a control unit, because the Processing Elements (PEs) are multicycle units and a multiplication involves multiple nodes working over several cycles. The controller is also responsible for weight loading:

Systolic array controller
Figure 2.2 - The controller structure of the Systolic Array.
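
To make the controller's role more concrete, the following Python sketch models one possible sequencing as a small state machine. Everything in it is hypothetical and not taken from Figure 2.2: the state names, the assumption that weights load in n cycles, and the assumption that a multiplication with m input rows needs m + 2n − 2 array steps. The only figure taken from the text is the 2 cycles per step.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names, not the ones used in the repository.
    IDLE = auto()
    LOAD_WEIGHTS = auto()
    COMPUTE = auto()
    DONE = auto()


class ArrayController:
    def __init__(self, n=3, m=3, cycles_per_step=2):
        self.load_cycles = n  # assumed: one row of weights per cycle
        # assumed schedule: m + 2n - 2 array steps, 2 cycles each
        self.compute_cycles = (m + 2 * n - 2) * cycles_per_step
        self.state = State.IDLE
        self.count = 0

    def tick(self, start=False):
        """Advance one clock cycle and return the new state."""
        if self.state is State.IDLE:
            if start:
                self.state, self.count = State.LOAD_WEIGHTS, 0
        elif self.state is State.LOAD_WEIGHTS:
            self.count += 1
            if self.count == self.load_cycles:
                self.state, self.count = State.COMPUTE, 0
        elif self.state is State.COMPUTE:
            self.count += 1
            if self.count == self.compute_cycles:
                self.state = State.DONE
        else:  # DONE: signal completion for one cycle, then go idle
            self.state = State.IDLE
        return self.state


ctrl = ArrayController()
states = [ctrl.tick(start=True)] + [ctrl.tick() for _ in range(18)]
```

A counter-based machine like this keeps the PEs free of any knowledge of the overall schedule: they only see per-cycle control signals.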

Example of Multiplication

References
