Add code for unrolling affine.for under threshold #441

Draft · wants to merge 3 commits into base: main

Conversation

@snarang181 (Collaborator)

Add corresponding test

@snarang181 (Collaborator, Author)

When I run gdb --args bazel-bin/enzymexlamlir-opt --raise-affine-to-stablehlo Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir, I see

/home/snarang181/Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir:10:13: error: 'stablehlo.dynamic_update_slice' op operation destroyed but still has uses
            affine.store %4, %arg0[0, %arg2 + 136, %arg1 + 7] : memref<1x187x194xf64, 1>
            ^
/home/snarang181/Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir:10:13: note: see current operation: %0 = "stablehlo.dynamic_update_slice"(<<NULL VALUE>>, <<NULL VALUE>>, <<NULL VALUE>>, <<NULL VALUE>>, <<NULL VALUE>>) : (<<NULL TYPE>>, <<NULL TYPE>>, <<NULL TYPE>>, <<NULL TYPE>>, <<NULL TYPE>>) -> tensor<1x187x194xf64>
/home/snarang181/Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir:19:9: note: - use: %39 = "stablehlo.dynamic_update_slice"(<<UNKNOWN SSA VALUE>>, %38, %34, %35, %36) : (tensor<1x187x194xf64>, tensor<1x1x180xf64>, tensor<i64>, tensor<i64>, tensor<i64>) -> tensor<1x187x194xf64>

        affine.store %3, %arg0[0, 135, %arg1 + 7] : memref<1x187x194xf64, 1>
        ^
/home/snarang181/Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir:13:14: note: - use: %17 = "stablehlo.slice"(<<UNKNOWN SSA VALUE>>) <{limit_indices = array<i64: 1, 136, 187>, start_indices = array<i64: 0, 135, 7>, strides = array<i64: 1, 1, 1>}> : (tensor<1x187x194xf64>) -> tensor<1x1x180xf64>

        %2 = affine.load %arg0[0, 135, %arg1 + 7] : memref<1x187x194xf64, 1>
             ^
/home/snarang181/Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir:12:14: note: - use: %14 = "stablehlo.slice"(<<UNKNOWN SSA VALUE>>) <{limit_indices = array<i64: 1, 136, 187>, start_indices = array<i64: 0, 135, 7>, strides = array<i64: 1, 1, 1>}> : (tensor<1x187x194xf64>) -> tensor<1x1x180xf64>

        %1 = affine.load %arg0[0, 135, -%arg1 + 186] : memref<1x187x194xf64, 1>
             ^
LLVM ERROR: operation destroyed but still has uses

This presumably has something to do with the use of innerOp in the for-loop. Do I need to use something like replaceAllUsesWith?

@wsmoses (Member) commented Mar 8, 2025

> When I run gdb --args bazel-bin/enzymexlamlir-opt --raise-affine-to-stablehlo Enzyme-JAX/test/lit_tests/raising/raiseaffinefor_unroll.mlir, I see [the "operation destroyed but still has uses" error quoted in full above]. This presumably has something to do with the use of innerOp in the for-loop. Do I need to use something like replaceAllUsesWith?

@Pangoraw, since you said you had something like this earlier, can you take a quick look?

Also cc @chelini, who might have cycles to help co-debug.

affine-to-stable-hlo-raising pass
@Pangoraw (Collaborator) commented Mar 8, 2025

I ran the upstream affine LICM and affine unrolling passes before invoking raise-affine-to-stablehlo.

@ivanradanov (Collaborator)

@snarang181 I think the problem was that you were generating the stablehlo ops at the for op (which is inside the parallel that we delete later); see the last commit. Does that look right?

@snarang181 (Collaborator, Author)

> @snarang181 I think the problem was that you were generating the stablehlo ops at the for op (which is inside the parallel that we delete later); see the last commit. Does that look right?

Thanks for looking into this, @ivanradanov; yes, that makes sense.

snarang181 marked this pull request as ready for review March 8, 2025 14:13
snarang181 self-assigned this Mar 8, 2025
snarang181 marked this pull request as draft March 8, 2025 14:43
@snarang181 (Collaborator, Author)

This seems to work fine on the test case but crashes (core dumped) on the full unoptimized MLIR (running fullpipe.sh). Here is the error snippet:

enzymexlamlir-opt: src/enzyme_ad/jax/Passes/AffineToStableHLORaising.cpp:282: mlir::affine::AffineValueMap alignMemoryAccess(mlir::Value&, mlir::affine::AffineValueMap, mlir::Value*, llvm::ArrayRef<mlir::affine::AffineValueMap>, mlir::OpBuilder&): Assertion `shapeA.size() == cast<RankedTensorType>(a.getType()).getShape().size()' failed

I am happy to debug further, though some co-debugging help to get me on the right track would be appreciated.

@wsmoses (Member) commented Mar 8, 2025

I'd probably start by adding a print before each function being raised, to help make a minimal test case.

@snarang181 (Collaborator, Author)

> I'd probably start by adding a print before each function being raised, to help make a minimal test case.

Do I need to build with particular flags to get the LLVM_DEBUG output to show up?

@wsmoses (Member) commented Mar 8, 2025

You would add -debug for that.

However, here I'm suggesting adding llvm::errs() << fn << "\n"; at an appropriate point inside the pass, recompiling, and running (which requires no extra flag).

@ivanradanov (Collaborator) commented Mar 9, 2025

I get this:

enzymexlamlir-opt: src/enzyme_ad/jax/Passes/AffineToStableHLORaising.cpp:295: mlir::affine::AffineValueMap alignMemoryAccess(mlir::Value&, mlir::affine::AffineValueMap, mlir::Value*, llvm::ArrayRef<mlir::affine::AffineValueMap>, mlir::OpBuilder&): Assertion `shapeBs[i].size() == cast<RankedTensorType>(bs[i].getType()).getShape().size()' failed.

I'm not 100% sure, but it seems to be because we emit a null AffineMap for a constant here. I am not entirely sure what the correct one would be, though.

frame #7: 0x00000000073b4e46 enzymexlamlir-opt`tryRaisingOpToStableHLO(op=0x00000000123d31b0, mapping=0x00007fffffffb210, builder=0x00007fffffffb1d0, maps=0x00007fffffffb1b0) at AffineToStableHLORaising.cpp:784:65
   781            b = mapping.lookup(op->getOperand(1));
   782
   783      auto mapA = maps[a], mapB = maps[b];
-> 784      auto outputMap = alignMemoryAccess(a, mapA, b, mapB, builder);
   785      assert(a.getType() == b.getType());
   786
   787      auto IT = a.getType().cast<RankedTensorType>();
(lldb) p a.dump()
%770 = "stablehlo.constant"() <{value = dense<0> : tensor<1xi64>}> : () -> tensor<1xi64>

I am off for today but will continue investigating further tomorrow.

@snarang181 (Collaborator, Author)

> I get this: [the alignMemoryAccess assertion failure and lldb backtrace quoted in full above]. I am off for today but will continue investigating further tomorrow.

@ivanradanov, that's right. Feel free to ping me on Slack when you get online tomorrow (your time); I'd like to co-debug if possible to get a feel for things. Thanks again for looking into it.

@wsmoses (Member) commented Mar 9, 2025

cc @Pangoraw

@wsmoses (Member) commented Mar 9, 2025

Can you add the failing test here as a test file, for ease of reproduction?

@Pangoraw (Collaborator) commented Mar 9, 2025

The current approach is missing handling of the converted IV as a constant when raising the memref load/store. As it stands, I guess the load/store would result in slices over all possible values.

Solving this would probably be a first step toward turning the loops into stablehlo.while, since the IV would no longer be a constant but a block argument.

In the meantime, we can run the upstream affine unrolling pass?

@Pangoraw (Collaborator) commented Mar 9, 2025

> I'm not 100% sure, but it seems to be because we emit a null AffineMap for a constant here. I am not entirely sure what the correct one would be, though.

So these maps are used to identify, for each dim in the raised tensors, which induction variable it depends on. A scalar constant is raised to a 0-dim tensor and therefore has a map with zero dimensions.

For example, the following value:

%0 = affine.load %a[%i, %j] : memref<?x?xf64>

would have a map like (%i, %j) -> (%i, %j). The maps are then used to align the different axes via broadcast/transpose when two values need to interact to produce a resulting value.
