Improved landing README.md. #29

Merged (1 commit) on Mar 13, 2024
README.md: 131 changes (82 additions, 49 deletions)

# Tapestry Tensor Expression Compiler Suite

<center><b>"It's Just One Ocean"<br/>-Crutcher</b></center>

## Overview

<img style="float: right; width: 20%; margin: 10px" alt="linear.relu.4x" src="docs/media/linear.relu.4x.ortho.jpg"/>

**Tapestry** is an experimental tensor expression compiler framework.

Modern GPU-filled datacenters contain thousands of nodes, each with 8 or more GPUs, and are capable
of performing trillions of floating-point operations per second. The goal of **Tapestry** is to
unlock the full potential of these datacenters by providing a foundational programming environment
for scalable, optimized, massively multi-GPU tensor programs.

Tensor programs underlie all deep-network-based AI and all finite-element numerical simulations,
including numerical weather and fluid simulations, protein folding and drug discovery, quantum
chemistry, financial modeling, and material design and manufacturing. Modern tensor programming
environments are designed to maximize the productivity of developers working on single-GPU
workstations, and they struggle to express programs which can be scheduled across even a few GPUs.
Some of these frameworks offer solutions for scaling up limited workloads, but no general-purpose
solution exists for scaling up arbitrary tensor programs.

Multiple companies already operate with hardware budgets exceeding $1B/year for these simulations;
worldwide, tens of billions of dollars are spent on these calculations each year.

Though it is difficult to predict in advance the speedups of a new optimizing compiler, the
semantics of current programming models mean that the vast majority of existing tensor applications
run with no meaningful structural optimization: the programs execute exactly as human engineers
wrote them. This is akin to executing a SQL query directly, without any query planner or optimizer.
The potential efficiency wins for existing applications from an optimizing compiler are therefore
large: conservatively in the 30% range, and for some applications dramatically larger.
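
As a toy illustration of what a structural optimization looks like, consider fusing an elementwise
**ReLU** into the loop that produces a **Linear** layer's output, so the intermediate tensor is
never materialized. This is a generic sketch of the technique in plain Java, not Tapestry code; the
class and method names are hypothetical:

```java
/** Toy illustration of operation fusion; hypothetical names, not Tapestry code. */
public final class FusionSketch {

  /** Unfused: materializes the intermediate result, then makes a second pass for ReLU. */
  static double[] linearThenRelu(double[][] w, double[] x) {
    double[] y = matmul(w, x); // intermediate tensor is written to memory
    double[] out = new double[y.length];
    for (int i = 0; i < y.length; i++) {
      out[i] = Math.max(0.0, y[i]); // second pass over memory
    }
    return out;
  }

  /** Fused: applies ReLU as each output element is produced; no intermediate tensor. */
  static double[] fusedLinearRelu(double[][] w, double[] x) {
    double[] out = new double[w.length];
    for (int i = 0; i < w.length; i++) {
      double acc = 0.0;
      for (int j = 0; j < x.length; j++) {
        acc += w[i][j] * x[j];
      }
      out[i] = Math.max(0.0, acc);
    }
    return out;
  }

  static double[] matmul(double[][] w, double[] x) {
    double[] y = new double[w.length];
    for (int i = 0; i < w.length; i++) {
      for (int j = 0; j < x.length; j++) {
        y[i] += w[i][j] * x[j];
      }
    }
    return y;
  }
}
```

A query-planner-style compiler can discover rewrites like this mechanically, across whole programs
and across machines, rather than relying on engineers to hand-fuse each case.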

Irrespective of efficiency wins, the potential for new applications is tremendous. Existing
applications are limited by interconnect scheduling and by the manual design of the programs;
removing these limitations will enable applications which are not possible today.

At the current time, **Tapestry** sits upon years of development towards a solid theoretical
foundation: shardable, composable, and re-writable polyhedral-model tensor block algebra
expressions, built on an extensible compiler framework. Current work focuses on exploiting this
mathematical foundation to build a practical compiler suite. We expect the project needs 1-3
aggregate engineer-years of work to reach a state where it can be used to compile real-world
applications.
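
As a loose sketch of the underlying idea (hypothetical names, not the actual **Tapestry** API), a
polyhedral-style operation description records an affine map from an index space to the tensor
regions each index point reads and writes; because those maps are affine, any partition of the
index space yields a valid sharding:

```java
// ShardingSketch.java -- hypothetical sketch, not the real Tapestry API.

/** A half-open index range [start, end). */
record Range(int start, int end) {
  /** Evenly split this range into shardCount pieces and return piece shardIndex. */
  Range shard(int shardCount, int shardIndex) {
    int size = end - start;
    int lo = start + (int) ((long) size * shardIndex / shardCount);
    int hi = start + (int) ((long) size * (shardIndex + 1) / shardCount);
    return new Range(lo, hi);
  }
}

/** An affine map i -> scale * i + offset (scale assumed positive here). */
record AffineProjection(int scale, int offset) {
  Range apply(Range index) {
    return new Range(scale * index.start() + offset, scale * index.end() + offset);
  }
}

public final class ShardingSketch {
  public static void main(String[] args) {
    // A pointwise operation over a batch of 100 rows: index point i
    // reads input row i and writes output row i.
    Range batch = new Range(0, 100);
    AffineProjection reads = new AffineProjection(1, 0);
    AffineProjection writes = new AffineProjection(1, 0);

    // Any partition of the index space is a valid sharding; each shard
    // could be scheduled on a different GPU.
    for (int s = 0; s < 4; s++) {
      Range shard = batch.shard(4, s);
      System.out.printf("shard %d: index=%s reads=%s writes=%s%n",
          s, shard, reads.apply(shard), writes.apply(shard));
    }
  }
}
```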

This is a big-pull project; the payoffs are huge, but the work required to climb from theory back
to practical parity with existing frameworks is substantial. There are many opportunities to
develop useful applications along the way, empowered by that solid theoretical foundation. We are
seeking contributors, reviewers, and enthusiasts to help bring this project to life sooner. Funding
support, or safe harbor in a larger organization, would also be very helpful.

See the full [Tapestry Documentation](docs/README.md) for detailed information.

## Getting Started

### Read the Documentation

The full [Tapestry Documentation](docs/README.md) provides much more detailed information about the
project's motivation, goals, design, and implementation.

### Join the Discord

[![Banner](https://invidget.switchblade.xyz/PNpSrFMeUb?theme=light)](https://discord.gg/PNpSrFMeUb)

If you have any interest in the project, please join the Discord server. We are actively looking
for reviewers, contributors, fans, theorists, and developers, and would love to have you involved.

A huge portion of bringing this project to life is building a community of enthusiasts and experts
who can help guide the project, not only through theory and code, but also through iterative
development of the documentation, making the project accessible to wider audiences.

We are particularly interested in:

- document reviewers
- project managers
- programmers
- compiler theorists

### File an Issue / Bug

We are actively looking for feedback on the project. If you have any issues, please file a bug on
the [Issues](https://github.com/crutcher/tapestry/issues) page.

### Join the Discussions

If you have longer-form concerns to discuss, please post them in the project
[Discussions](https://github.com/crutcher/loom/discussions) board.

## Setup / Contributing Code

If you are interested in running the existing test suites, or in contributing code, you'll need to
clone the repository and set up the development environment.

The project is a JDK 21 multi-module Maven/Java project, and should set up cleanly in any modern
development IDE (JetBrains, VSC, etc.).

That said, the project has been developed by one person thus far, and may have some missing
dependencies or undocumented requirements. If you run into any issues, please join the Discord or
file a bug (or both!) with as much information as possible, and I'll prioritize fixing the cause or
documenting the missing dependency.
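
For a command-line setup, something like the following should work. This is a sketch assuming
standard Maven conventions and that JDK 21 and Maven are already installed; the repository URL
matches the Issues link above.

```bash
# Sketch: assumes JDK 21 and Maven are on the PATH; adjust as needed.
git clone https://github.com/crutcher/tapestry.git
cd tapestry

# Build all modules and run the full test suite.
mvn clean verify
```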