With the rapid advancement of AI and Machine Learning applications, there is an ever-increasing demand for faster and more complex computation. This demand has spurred the development of new hardware accelerators tailored to the needs of ML practitioners.
Matrix multiplication is one of the most frequently used operations in Machine Learning, particularly in Neural Networks. Systolic arrays offer a way to accelerate matrix-matrix multiplication and are used in Google TPU accelerators. The following sections describe the RTL implementation of a 3x3 systolic array.
The datapath of each PE contains three registers and hardware capable of performing a multiply-accumulate (MAC) operation. The constant weight matrix is received on the Win bus, and the partial result produced by the PE above arrives on the Sin bus. WReg stores the weight, while DReg holds the incoming data value used in the MAC operation. By composing these simple PEs, matrix-matrix multiplications can be computed rapidly.
Figure 1.1 - The datapath structure of the Processing Element.
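The RTL itself is not reproduced here, but a minimal behavioral sketch of the PE in Python may help make the dataflow concrete. The `load_weight` select and the single-step MAC are assumptions made for illustration; the names follow the WReg/DReg/Win/Sin signals above.

```python
class PE:
    """Behavioral sketch of one weight-stationary Processing Element.

    WReg holds the stationary weight loaded from Win; DReg latches the data
    value arriving from the left; the partial sum from the PE above (Sin) is
    combined with WReg * data and passed downward.  The 'load_weight' select
    and the single-step MAC are assumptions for this sketch -- the actual PE
    in Figure 1.1 is a multicycle unit.
    """

    def __init__(self):
        self.w_reg = 0   # WReg: stationary weight
        self.d_reg = 0   # DReg: data value moving horizontally
        self.s_reg = 0   # partial-sum register moving vertically

    def step(self, win, din, sin, load_weight):
        """One update step; returns (Dout, Sout) for the right/lower PEs."""
        if load_weight:
            self.w_reg = win                      # weight-loading phase
        else:
            self.s_reg = sin + self.w_reg * din   # MAC: Sout = Sin + W * D
            self.d_reg = din                      # forward the data value
        return self.d_reg, self.s_reg


pe = PE()
pe.step(win=5, din=0, sin=0, load_weight=True)            # load weight 5 into WReg
print(pe.step(win=0, din=3, sin=10, load_weight=False))   # (3, 25): D forwarded, 10 + 5*3
```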
For the PE to operate correctly, a control unit is needed to manage the computation and weight-loading phases:
Figure 1.2 - The controller diagram of the Processing Element.
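As a rough, hypothetical stand-in for that controller, the two phases it has to sequence (weight loading, then MAC computation) can be sketched as a small state machine. The state names and the `start`/`last_input` status signals are assumptions, not the signals of the actual FSM in Figure 1.2.

```python
from enum import Enum, auto

class PEState(Enum):
    IDLE = auto()
    LOAD_WEIGHT = auto()   # assert load_weight so Win is captured in WReg
    COMPUTE = auto()       # deassert load_weight and let the MAC run

def pe_next_state(state, start, last_input):
    """Return (next_state, load_weight) for one transition.

    A sketch of the control flow only: 'start' and 'last_input' are
    hypothetical status signals, not the controller's actual inputs.
    """
    if state is PEState.IDLE:
        return (PEState.LOAD_WEIGHT, True) if start else (PEState.IDLE, False)
    if state is PEState.LOAD_WEIGHT:
        return PEState.COMPUTE, False
    # COMPUTE: keep accumulating until the last data value has passed through
    return (PEState.IDLE, False) if last_input else (PEState.COMPUTE, False)
```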
By cascading multiple PEs, a systolic array is formed. Note the connections: the D values move horizontally, while the S values (the MAC partial results) move vertically once every two cycles.
Figure 2.1 - The datapath structure of the Systolic Array.
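To make the timing of this dataflow concrete, the following Python sketch models a 3x3 weight-stationary array at cycle level and checks it against a direct matrix product. The one-cycle-per-hop timing and the one-cycle input skew per row are simplifications (the RTL PEs are multicycle, advancing S every two cycles), and the edge-injection scheme is an assumption rather than the repository's exact interface.

```python
import numpy as np

K = 3  # the array is 3x3; PE(k, n) keeps W[k][n] stationary in its WReg

def systolic_matmul(A, W):
    """Cycle-level sketch of a 3x3 weight-stationary systolic array.

    Computes C = A @ W.  Rows of A enter from the left with a one-cycle
    skew per array row; partial sums flow downward, and C[m][n] leaves
    the bottom of column n at cycle m + n + K.
    """
    M, N = A.shape[0], W.shape[1]
    w = [[W[k][n] for n in range(N)] for k in range(K)]   # stationary weights
    d_reg = [[0] * N for _ in range(K)]                   # D registers (move right)
    s_reg = [[0] * N for _ in range(K)]                   # S registers (move down)
    C = [[0] * N for _ in range(M)]

    for t in range(M + N + K):
        # Collect results latched at the bottom row on the previous cycle.
        for n in range(N):
            m = t - n - K
            if 0 <= m < M:
                C[m][n] = s_reg[K - 1][n]

        # Compute the next register values for every PE.
        new_d = [[0] * N for _ in range(K)]
        new_s = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    m = t - k                    # left edge: A[m][k] arrives at cycle m + k
                    d_in = A[m][k] if 0 <= m < M else 0
                else:
                    d_in = d_reg[k][n - 1]       # from the PE on the left
                s_in = 0 if k == 0 else s_reg[k - 1][n]   # from the PE above
                new_d[k][n] = d_in
                new_s[k][n] = s_in + w[k][n] * d_in       # MAC
        d_reg, s_reg = new_d, new_s
    return np.array(C)

A = np.arange(1, 10).reshape(3, 3)
W = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
assert (systolic_matmul(A, W) == A @ W).all()
```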
The systolic array also requires a control unit to sequence its operation, since the PEs are multicycle units and the overall multiplication is carried out across multiple nodes. The controller is also responsible for weight loading:
Figure 2.2 - The controller structure of the Systolic Array.
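The controller itself is shown in Figure 2.2; as an illustration only, the phase sequencing it has to enforce might look like the sketch below. All state names and phase lengths are assumptions based on the dataflow described above, with `mac_cycles=2` reflecting the two-cycle MAC rhythm mentioned earlier.

```python
from enum import Enum, auto

class ArrayState(Enum):
    IDLE = auto()
    LOAD_WEIGHTS = auto()   # shift one row of weights into the array per step
    COMPUTE = auto()        # stream the skewed data while the PEs run their MACs
    DRAIN = auto()          # wait for the last partial sums to exit the bottom row

def array_controller(n=3, mac_cycles=2):
    """Yield (state, cycle) pairs for one matrix multiplication.

    A sketch of the sequencing only, not the FSM in Figure 2.2: the phase
    lengths (n weight rows, 2n - 1 skewed input waves, n drain waves) are
    assumptions about how long each phase of an n x n multiplication takes.
    """
    for c in range(n):
        yield ArrayState.LOAD_WEIGHTS, c
    for c in range((2 * n - 1) * mac_cycles):
        yield ArrayState.COMPUTE, c
    for c in range(n * mac_cycles):
        yield ArrayState.DRAIN, c
    yield ArrayState.IDLE, 0
```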
- Understanding Matrix Multiplication on a Weight-Stationary Systolic Architecture: https://www.telesens.co/2018/07/30/systolic-architectures/