TensorLib is a tensor library for large language model inference. It supports CPU and CUDA backends.
mkdir build
cd build
cmake ..
make
Make sure CUDA, OpenMP, and the other dependencies are installed.
To build in debug mode, uncomment set(CMAKE_BUILD_TYPE Debug) in CMakeLists.txt.
The code can be tested from either C++ or Python.
I have used pybind11 to wrap the C++ code for Python, so the tests can be written in Python. PyTorch and NumPy are used as references to check the correctness of the Tensor functions.
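As a rough illustration of this kind of correctness check, the sketch below compares an operator against a NumPy reference on random input. It does not use the actual TensorLib bindings: `candidate_softmax` is a hypothetical stand-in for a wrapped C++ function, and the choice of softmax, the tolerances, and the helper name `check_op` are all assumptions made for the example.

```python
import numpy as np


def reference_softmax(x: np.ndarray) -> np.ndarray:
    """NumPy reference: numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def candidate_softmax(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a pybind11-wrapped op; computed in float32 here."""
    x32 = x.astype(np.float32)
    e = np.exp(x32 - x32.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def check_op(candidate, reference, shape, rtol=1e-4, atol=1e-5, seed=0):
    """Run both implementations on the same random input and compare results."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape).astype(np.float32)
    expected = reference(x.astype(np.float64))  # reference in higher precision
    actual = np.asarray(candidate(x))
    # Raises AssertionError with a readable diff if the results disagree.
    np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)
    return float(np.max(np.abs(actual - expected)))


max_err = check_op(candidate_softmax, reference_softmax, (4, 32))
print(f"max abs error: {max_err:.2e}")
```

In a real test, the stand-in would be replaced by the wrapped function, and the tolerances would likely need to be looser for fp16 kernels than for fp32 ones.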
Llama2-7b is currently supported. See llama2.c for how to obtain the model weights.
The script for exporting fp16 weights is in the script folder. It is adapted from the llama2.c project.
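For readers unfamiliar with fp16 export, the sketch below shows only the core idea: casting float32 weights to half precision and writing them as raw little-endian bytes. It is not the script from this repository. The function names `export_fp16` and `load_fp16`, the headerless file layout, and the file name are assumptions for illustration; the real script may add a header and order tensors in a specific way.

```python
import os
import tempfile

import numpy as np


def export_fp16(weights, path):
    """Cast weights to little-endian float16 and write them as raw bytes."""
    arr = np.asarray(weights, dtype=np.float32).astype("<f2")
    arr.tofile(path)
    return arr.size  # number of values written


def load_fp16(path, shape):
    """Read raw little-endian float16 values back and restore the shape."""
    return np.fromfile(path, dtype="<f2").reshape(shape)


# Small illustrative tensor (not real model weights).
weights = np.array([[0.5, -1.25], [3.0, 0.1]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights_fp16.bin")  # hypothetical file name
    n_written = export_fp16(weights, path)
    n_bytes = os.path.getsize(path)  # 2 bytes per fp16 value
    restored = load_fp16(path, weights.shape)

print(n_written, n_bytes)
```

Each value takes 2 bytes instead of 4, which halves the file size at the cost of precision: values such as 0.1 are only approximated after the round trip.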
This project is inspired by the following projects:
This project is at a very early stage. Limited by my knowledge and time, I have not yet considered many details, so some implementation details are not well thought out or elegant. I will try to improve them in the future.