Failing Assertion in ObjectFifoStatefulTransformPass::unrollForLoops #1128

andrej opened this issue Mar 13, 2024 · 3 comments

andrej commented Mar 13, 2024

This one should probably be assigned to Andra. It seems that some recent changes to the ObjectFifo lowering are causing an issue for me; the following compiled fine for me a couple of weeks ago.

Try to build reference_designs/ipu-xrt/matrix_multiplication_array with the following command:

M=256 K=256 N=768 make

The compiler then crashes during this step (when running make):

cd build && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=final.xclbin \
                        --aie-generate-ipu --ipu-insts-name=insts.txt ../build/aie.mlir

With the following failed assertion:

/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
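
For context, this is libstdc++'s hardened bounds check in std::vector::operator[] (note std::__replacement_assert in frame #5 of the trace below), i.e. an out-of-bounds vector index inside the pass itself rather than an MLIR diagnostic. A minimal sketch of the same failure mode, assuming a build with -D_GLIBCXX_ASSERTIONS or an equivalent debug configuration (this is not the pass code itself):

// Standalone illustration; compile with: g++ -D_GLIBCXX_ASSERTIONS oob.cpp
// Running it aborts with the same "Assertion '__n < this->size()' failed." message.
#include <vector>

int main() {
  std::vector<int> v(2);  // size() == 2
  return v[2];            // index 2 is out of bounds, so the assertion fires and abort() is called
}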

Here is a partial stack trace identifying some object fifo code as the culprit:

__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352503744) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352503744) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352503744) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352503744, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff1f65048 in std::__replacement_assert(char const*, int, char const*, char const*) ()
   from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#6  0x00007ffff20891ff in AIEObjectFifoStatefulTransformPass::duplicateBlock(mlir::OpBuilder&, int, std::vector<mlir::Operation*, std::allocator<mlir::Operation*> >&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&, mlir::Value, long, bool) [clone .isra.0] ()
   from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#7  0x00007ffff2096021 in AIEObjectFifoStatefulTransformPass::unrollForLoops(xilinx::AIE::DeviceOp&, mlir::OpBuilder&, std::set<xilinx::AIE::TileOp, std::less<xilinx::AIE::TileOp>, std::allocator<xilinx::AIE::TileOp> >)::{lambda(mlir::scf::ForOp)#1}::operator()(mlir::scf::ForOp) const ()
   from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#8  0x00007ffff2096308 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) [clone .constprop.5] ()
  /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#9  0x00007ffff209671b in AIEObjectFifoStatefulTransformPass::unrollForLoops(xilinx::AIE::DeviceOp&, mlir::OpBuilder&, std::set<xilinx::AIE::TileOp, std::less<xilinx::AIE::TileOp>, std::allocator<xilinx::AIE::TileOp> >) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#10 0x00007ffff209c402 in AIEObjectFifoStatefulTransformPass::runOnOperation() () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#11 0x00007fffefa8ba9e in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#12 0x00007fffefa8bf58 in mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#13 0x00007fffefa8c5f3 in mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::{lambda(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&)#1}::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#14 0x00007fffefa8afd5 in mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#15 0x00007fffefa8b8cf in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#16 0x00007fffefa8bf58 in mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#17 0x00007fffefa8ceb5 in mlir::PassManager::run(mlir::Operation*) () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so
#18 0x00007fffef9f8d79 in mlirPassManagerRunOnOp () from /home/andre/mlir-aie/my_install/mlir_aie/python/aie/_mlir_libs/libAIEAggregateCAPI.so

Thanks in advance for looking into this!

AndraBisca self-assigned this Mar 13, 2024

andrej commented Mar 14, 2024

Just updated my comment above with the command to compile the breaking example after changes in #1056.


andrej commented Jun 11, 2024

I ran into this again and dug a little deeper to get a minimal working example; see below. This should make it easier to debug than using the whole matrix multiplication design.

Summary

  • Three nested loops
  • Middle loop only has a single iteration
  • ObjectFifo accesses in both of the two inner loops
  • The above conditions result in an assertion error, causing the compiler to crash

Error

/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = xilinx::AIE::BufferOp*; _Alloc = std::allocator<xilinx::AIE::BufferOp*>; std::vector<_Tp, _Alloc>::reference = xilinx::AIE::BufferOp*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: 
Assertion '__n < this->size()' failed.
Aborted (core dumped)

Compilation Command

aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=bug.xclbin \
                         --aie-generate-npu --npu-insts-name=bug.txt bug.mlir

Code

The "unique" thing about this code is that we have a loop with only a single iteration. If we give it multiple iterations, the error does not happen. The error also does not happen when we have only two, rather than three, nested loops.

module {
  aie.device(npu1_4col) {

    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)

    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>

    %core_0_2 = aie.core(%tile_0_2) {

      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c4 = arith.constant 4 : index
      %c4294967295 = arith.constant 4294967295 : index

      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c1 step %c1 {
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %c4 step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }
      
      aie.end

    }
  }
}

Alternative error

If we remove the two aie.objectfifo.subview.access statements, the error instead becomes:

/home/github/actions-runner/_work/mlir-aie/mlir-aie/mlir/src/python/MLIRPythonExtension.Core/IRModule.h:433:
mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture(): Assertion `errors.empty() && "unhandled captured errors"' failed.
Aborted (core dumped)

Workaround

In the Python code that generates the MLIR, check if loops have a single iteration. If so, do not emit the loop.

cc @AndraBisca


andrej commented Jun 12, 2024

After some more testing, this appears to affect not just loops with a single iteration. For example, giving the middle loop nine iterations and the inner one four produces the same error with the following code:

module {
  aie.device(npu1_4col) {

    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)

    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>

    %core_0_2 = aie.core(%tile_0_2) {

      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c9 = arith.constant 9 : index
      %c4 = arith.constant 4 : index
      %c4294967295 = arith.constant 4294967295 : index

      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c9 step %c1 {     // <- middle loop now has nine iterations

          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>

          scf.for %arg2 = %c0 to %c4 step %c1 {

            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)

          }
          
          aie.objectfifo.release @fifoA(Produce, 1)

        }
      }
      aie.end
    }
  }
}

I also noticed the ObjectFIFO depth has to be > 1 for the error to trigger. (I think for depth=1, the loops are not unrolled.)
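
To illustrate the shape of the bug (a purely hypothetical sketch, not the actual pass code): if the lowering keeps one buffer slot per unit of ObjectFifo depth but indexes that list with an unrolled iteration count that is not wrapped by the depth, any unroll factor larger than the depth walks off the end of the vector, which would produce exactly the BufferOp* out-of-bounds assertion shown above.

// Hypothetical illustration only; the real pass logic may differ.
#include <cstddef>
#include <vector>

int main() {
  const std::size_t depth = 2;         // objectfifo depth, i.e. number of buffer slots
  std::vector<int> buffers(depth);     // stand-in for the per-fifo BufferOp* list
  const std::size_t unrollFactor = 4;  // inner-loop trip count after unrolling

  for (std::size_t i = 0; i < unrollFactor; ++i) {
    // buffers[i] = 0;                 // i >= depth would fail `__n < this->size()` (the reported crash)
    buffers[i % depth] = 0;            // wrapping by the depth stays in bounds
  }
  return 0;
}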
