[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

jeanPerier · 2025-06-19T12:21:47Z

Three asserts/checks in SimpleLoopUnswitchPass are quite expensive on IR containing many deeply nested loop nests as in repro_60_loops_depth_10.ll.txt. Their cost is not linear with the number of loop nest and these asserts quickly become the most significant cost of the whole compilation.

This patch moves these asserts under EXPENSIVE_CHECKS.

This problem is easily exposed with flang because of Fortran multidimensional arrays and array expressions:

For instance, the code below generates 5 loop nests of depth 10.

module m
real, allocatable, dimension(:,:,:,:,:,:,:,:,:,:) :: a1, a2, a3, a4, a5
contains
subroutine foo()
  a1(:,:,:,:,:,:,:,:,:,:) = 1.
  a2(:,:,:,:,:,:,:,:,:,:) = 1.
  a3(:,:,:,:,:,:,:,:,:,:) = 1.
  a4(:,:,:,:,:,:,:,:,:,:) = 1.
  a5(:,:,:,:,:,:,:,:,:,:) = 1.
end subroutine
end module

Here is the time in second and the percentage of the whole LLVM optimization time spent in SimpleLoopUnswitchPass before and after this patch with flang -O3, with N the number of variable/loop nests created:

Before:

N = 1: 0.0347( 27.4%)
N = 2: 0.0757( 30.9%)                                                  
N = 3: 0.1218( 33.8%)
N = 4: 0.1763( 36.3%)
N = 5: 0.2461( 38.8%)         
N = 10: 0.7457( 48.1%)        
N = 15: 1.4956( 54.1%)        
N = 20: 2.5363( 58.2%)        
N = 30: 5.5817( 63.6%)        
N = 40: 9.9295( 67.1%)
N = 60:  23.7977 ( 71.8%)
N = 100: 81.3359( 77.7%)

After:

N = 1: 0.0384( 16.4%)
N = 2: 0.0471( 21.1%)                                                  
N = 3: 0.0648( 20.7%)
N = 4: 0.0832( 20.6%)
N = 5: 0.1036( 20.4%)         
N = 10: 0.2099( 20.2%)        
N = 15: 0.3256( 19.9%)        
N = 20: 0.4644( 19.8%)        
N = 30: 0.8262( 20.1%)        
N = 40: 1.2675( 20.2%)
N = 60: 2.3915 ( 20.2%)  
N = 100: 5.8809( 20.3%)

As you can see the assert cost is not linear with the number of loops and 75s out of 81s are spent in these asserts for N=100.

repro_60_loops_depth_10.ll.txt is the .ll generated by flang for the N=60 case to illustrate and repoduce without flang.

…VE_CHECKS

llvmbot · 2025-06-19T12:22:15Z

@llvm/pr-subscribers-llvm-transforms

Author: None (jeanPerier)

Changes

Three asserts/checks in SimpleLoopUnswitchPass are quite expensive on IR containing many deeply nested loop nests as in repro_60_loops_depth_10.ll.txt. Their cost is not linear with the number of loop nest and these asserts quickly become the most significant cost of the whole compilation.

This patch moves these asserts under EXPENSIVE_CHECKS.

This problem is easily exposed with flang because of Fortran multidimensional arrays and array expressions:

For instance, the code below generates 5 loop nests of depth 10.

module m
real, allocatable, dimension(:,:,:,:,:,:,:,:,:,:) :: a1, a2, a3, a4, a5
contains
subroutine foo()
  a1(:,:,:,:,:,:,:,:,:,:) = 1.
  a2(:,:,:,:,:,:,:,:,:,:) = 1.
  a3(:,:,:,:,:,:,:,:,:,:) = 1.
  a4(:,:,:,:,:,:,:,:,:,:) = 1.
  a5(:,:,:,:,:,:,:,:,:,:) = 1.
end subroutine
end module

Here is the time in second and the percentage of the whole LLVM optimization time spent in SimpleLoopUnswitchPass before and after this patch with flang -O3, with N the number of variable/loop nests created:

Before:

N = 1: 0.0347( 27.4%)
N = 2: 0.0757( 30.9%)                                                  
N = 3: 0.1218( 33.8%)
N = 4: 0.1763( 36.3%)
N = 5: 0.2461( 38.8%)         
N = 10: 0.7457( 48.1%)        
N = 15: 1.4956( 54.1%)        
N = 20: 2.5363( 58.2%)        
N = 30: 5.5817( 63.6%)        
N = 40: 9.9295( 67.1%)
N = 60:  23.7977 ( 71.8%)
N = 100: 81.3359( 77.7%)

After:

N = 1: 0.0384( 16.4%)
N = 2: 0.0471( 21.1%)                                                  
N = 3: 0.0648( 20.7%)
N = 4: 0.0832( 20.6%)
N = 5: 0.1036( 20.4%)         
N = 10: 0.2099( 20.2%)        
N = 15: 0.3256( 19.9%)        
N = 20: 0.4644( 19.8%)        
N = 30: 0.8262( 20.1%)        
N = 40: 1.2675( 20.2%)
N = 60: 2.3915 ( 20.2%)  
N = 100: 5.8809( 20.3%)

As you can see the assert cost is not linear with the number of loops and 75s out of 81s are spent in these asserts for N=100.

repro_60_loops_depth_10.ll.txt is the .ll generated by flang for the N=60 case to illustrate and repoduce without flang.

Full diff: https://github.com/llvm/llvm-project/pull/144887.diff

1 Files Affected:

(modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+5-1)

diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index 9b40fc03da6bb..675f6e1975535 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -2522,12 +2522,14 @@ static void unswitchNontrivialInvariants(
   if (MSSAU && VerifyMemorySSA)
     MSSAU->getMemorySSA()->verifyMemorySSA();
 
+#ifdef EXPENSIVE_CHECKS
   // This transformation has a high risk of corrupting the dominator tree, and
   // the below steps to rebuild loop structures will result in hard to debug
   // errors in that case so verify that the dominator tree is sane first.
   // FIXME: Remove this when the bugs stop showing up and rely on existing
   // verification steps.
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
+#endif
 
   if (BI && !PartiallyInvariant) {
     // If we unswitched a branch which collapses the condition to a known
@@ -2633,7 +2635,7 @@ static void unswitchNontrivialInvariants(
          OuterL = OuterL->getParentLoop())
       UpdateLoop(*OuterL);
 
-#ifndef NDEBUG
+#ifdef EXPENSIVE_CHECKS
   // Verify the entire loop structure to catch any incorrect updates before we
   // progress in the pass pipeline.
   LI.verify(DT);
@@ -3709,9 +3711,11 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
   if (AR.MSSA && VerifyMemorySSA)
     AR.MSSA->verifyMemorySSA();
 
+#ifdef EXPENSIVE_CHECKS
   // Historically this pass has had issues with the dominator tree so verify it
   // in asserts builds.
   assert(AR.DT.verify(DominatorTree::VerificationLevel::Fast));
+#endif
 
   auto PA = getLoopPassPreservedAnalyses();
   if (AR.MSSA)

antoniofrighetto · 2025-06-19T13:10:01Z

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

@@ -2522,12 +2522,14 @@ static void unswitchNontrivialInvariants(
  if (MSSAU && VerifyMemorySSA)
    MSSAU->getMemorySSA()->verifyMemorySSA();

+#ifdef EXPENSIVE_CHECKS
  // This transformation has a high risk of corrupting the dominator tree, and
  // the below steps to rebuild loop structures will result in hard to debug
  // errors in that case so verify that the dominator tree is sane first.
  // FIXME: Remove this when the bugs stop showing up and rely on existing
  // verification steps.
  assert(DT.verify(DominatorTree::VerificationLevel::Fast));


Not sure about ifdef'ing this one, generally considering some of the past bugs (and the comment above). Maybe the final one instead could switch to VerificationLevel::Full, w/ expensive checks enabled.

Just to clarify, are you suggesting completely deleting this one because it has not fired recently and to turn the one on line 3714/3717 to VerificationLevel::Full under expensive checks, or do you want to keep this one outside of EXPENSIVE_CHECKS and use VerificationLevel::Full for the last one under EXPENSIVE_CHECKS?

The comment "Remove this when the bugs stop showing up and rely on existing verification steps" from 7 years ago seems to imply the former assuming the assert has not fired recently (not familiar with the code here, do you know? Searching GitHub issues for this asserts, I did not find an issues for this pass).

I was suggesting leaving this check outside of EXPENSIVE_CHECKS. While no related issues seem to have come up recently (some of the recent bugs were unrelated to breaking the dominator tree), the pass still does rewrite loops, and in general the CFG, quite aggressively, so I'd rather keep the assert than dropping it. That said, maybe enabling it only under EXPENSIVE_CHECKS is okay...

and to turn the one on line 3714/3717 to VerificationLevel::Full under expensive checks

Yeah I think now switching this one to VerificationLevel::Full under expensive checks could be reasonable.

vzakhari · 2025-06-19T17:47:07Z

Looks good to me, though, I am not familiar with the code and cannot comment on @antoniofrighetto's concern.

[SimpleLoopUnswitch][NFC] move pseudo-quadratic asserts under EXPENSI…

a045af1

…VE_CHECKS

jeanPerier requested review from nikic, mcinally and vzakhari June 19, 2025 12:21

llvmbot added the llvm:transforms label Jun 19, 2025

antoniofrighetto reviewed Jun 19, 2025

View reviewed changes

vzakhari approved these changes Jun 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

Uh oh!

jeanPerier commented Jun 19, 2025

Uh oh!

llvmbot commented Jun 19, 2025

Uh oh!

antoniofrighetto Jun 19, 2025

Uh oh!

jeanPerier Jun 19, 2025

Uh oh!

antoniofrighetto Jun 20, 2025

Uh oh!

vzakhari commented Jun 19, 2025

Uh oh!

Uh oh!

[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

Are you sure you want to change the base?

[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

Uh oh!

Conversation

jeanPerier commented Jun 19, 2025

Uh oh!

llvmbot commented Jun 19, 2025

Uh oh!

antoniofrighetto Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

jeanPerier Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

antoniofrighetto Jun 20, 2025

Choose a reason for hiding this comment

Uh oh!

vzakhari commented Jun 19, 2025

Uh oh!

Uh oh!