Skip to content

[SimpleLoopUnswitch][NFC] move quadratic asserts under EXPENSIVE_CHECKS #144887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jeanPerier
Copy link
Contributor

Three asserts/checks in SimpleLoopUnswitchPass are quite expensive on IR containing many deeply nested loop nests as in repro_60_loops_depth_10.ll.txt. Their cost is not linear with the number of loop nest and these asserts quickly become the most significant cost of the whole compilation.

This patch moves these asserts under EXPENSIVE_CHECKS.

This problem is easily exposed with flang because of Fortran multidimensional arrays and array expressions:

For instance, the code below generates 5 loop nests of depth 10.

module m
real, allocatable, dimension(:,:,:,:,:,:,:,:,:,:) :: a1, a2, a3, a4, a5
contains
subroutine foo()
  a1(:,:,:,:,:,:,:,:,:,:) = 1.
  a2(:,:,:,:,:,:,:,:,:,:) = 1.
  a3(:,:,:,:,:,:,:,:,:,:) = 1.
  a4(:,:,:,:,:,:,:,:,:,:) = 1.
  a5(:,:,:,:,:,:,:,:,:,:) = 1.
end subroutine
end module

Here is the time in second and the percentage of the whole LLVM optimization time spent in SimpleLoopUnswitchPass before and after this patch with flang -O3, with N the number of variable/loop nests created:

Before:

N = 1: 0.0347( 27.4%)
N = 2: 0.0757( 30.9%)                                                  
N = 3: 0.1218( 33.8%)
N = 4: 0.1763( 36.3%)
N = 5: 0.2461( 38.8%)         
N = 10: 0.7457( 48.1%)        
N = 15: 1.4956( 54.1%)        
N = 20: 2.5363( 58.2%)        
N = 30: 5.5817( 63.6%)        
N = 40: 9.9295( 67.1%)
N = 60:  23.7977 ( 71.8%)
N = 100: 81.3359( 77.7%)  

After:

N = 1: 0.0384( 16.4%)
N = 2: 0.0471( 21.1%)                                                  
N = 3: 0.0648( 20.7%)
N = 4: 0.0832( 20.6%)
N = 5: 0.1036( 20.4%)         
N = 10: 0.2099( 20.2%)        
N = 15: 0.3256( 19.9%)        
N = 20: 0.4644( 19.8%)        
N = 30: 0.8262( 20.1%)        
N = 40: 1.2675( 20.2%)
N = 60: 2.3915 ( 20.2%)  
N = 100: 5.8809( 20.3%)  

As you can see the assert cost is not linear with the number of loops and 75s out of 81s are spent in these asserts for N=100.

repro_60_loops_depth_10.ll.txt is the .ll generated by flang for the N=60 case to illustrate and repoduce without flang.

@llvmbot
Copy link
Member

llvmbot commented Jun 19, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (jeanPerier)

Changes

Three asserts/checks in SimpleLoopUnswitchPass are quite expensive on IR containing many deeply nested loop nests as in repro_60_loops_depth_10.ll.txt. Their cost is not linear with the number of loop nest and these asserts quickly become the most significant cost of the whole compilation.

This patch moves these asserts under EXPENSIVE_CHECKS.

This problem is easily exposed with flang because of Fortran multidimensional arrays and array expressions:

For instance, the code below generates 5 loop nests of depth 10.

module m
real, allocatable, dimension(:,:,:,:,:,:,:,:,:,:) :: a1, a2, a3, a4, a5
contains
subroutine foo()
  a1(:,:,:,:,:,:,:,:,:,:) = 1.
  a2(:,:,:,:,:,:,:,:,:,:) = 1.
  a3(:,:,:,:,:,:,:,:,:,:) = 1.
  a4(:,:,:,:,:,:,:,:,:,:) = 1.
  a5(:,:,:,:,:,:,:,:,:,:) = 1.
end subroutine
end module

Here is the time in second and the percentage of the whole LLVM optimization time spent in SimpleLoopUnswitchPass before and after this patch with flang -O3, with N the number of variable/loop nests created:

Before:

N = 1: 0.0347( 27.4%)
N = 2: 0.0757( 30.9%)                                                  
N = 3: 0.1218( 33.8%)
N = 4: 0.1763( 36.3%)
N = 5: 0.2461( 38.8%)         
N = 10: 0.7457( 48.1%)        
N = 15: 1.4956( 54.1%)        
N = 20: 2.5363( 58.2%)        
N = 30: 5.5817( 63.6%)        
N = 40: 9.9295( 67.1%)
N = 60:  23.7977 ( 71.8%)
N = 100: 81.3359( 77.7%)  

After:

N = 1: 0.0384( 16.4%)
N = 2: 0.0471( 21.1%)                                                  
N = 3: 0.0648( 20.7%)
N = 4: 0.0832( 20.6%)
N = 5: 0.1036( 20.4%)         
N = 10: 0.2099( 20.2%)        
N = 15: 0.3256( 19.9%)        
N = 20: 0.4644( 19.8%)        
N = 30: 0.8262( 20.1%)        
N = 40: 1.2675( 20.2%)
N = 60: 2.3915 ( 20.2%)  
N = 100: 5.8809( 20.3%)  

As you can see the assert cost is not linear with the number of loops and 75s out of 81s are spent in these asserts for N=100.

repro_60_loops_depth_10.ll.txt is the .ll generated by flang for the N=60 case to illustrate and repoduce without flang.


Full diff: https://github.com/llvm/llvm-project/pull/144887.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+5-1)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index 9b40fc03da6bb..675f6e1975535 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -2522,12 +2522,14 @@ static void unswitchNontrivialInvariants(
   if (MSSAU && VerifyMemorySSA)
     MSSAU->getMemorySSA()->verifyMemorySSA();
 
+#ifdef EXPENSIVE_CHECKS
   // This transformation has a high risk of corrupting the dominator tree, and
   // the below steps to rebuild loop structures will result in hard to debug
   // errors in that case so verify that the dominator tree is sane first.
   // FIXME: Remove this when the bugs stop showing up and rely on existing
   // verification steps.
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
+#endif
 
   if (BI && !PartiallyInvariant) {
     // If we unswitched a branch which collapses the condition to a known
@@ -2633,7 +2635,7 @@ static void unswitchNontrivialInvariants(
          OuterL = OuterL->getParentLoop())
       UpdateLoop(*OuterL);
 
-#ifndef NDEBUG
+#ifdef EXPENSIVE_CHECKS
   // Verify the entire loop structure to catch any incorrect updates before we
   // progress in the pass pipeline.
   LI.verify(DT);
@@ -3709,9 +3711,11 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
   if (AR.MSSA && VerifyMemorySSA)
     AR.MSSA->verifyMemorySSA();
 
+#ifdef EXPENSIVE_CHECKS
   // Historically this pass has had issues with the dominator tree so verify it
   // in asserts builds.
   assert(AR.DT.verify(DominatorTree::VerificationLevel::Fast));
+#endif
 
   auto PA = getLoopPassPreservedAnalyses();
   if (AR.MSSA)

@@ -2522,12 +2522,14 @@ static void unswitchNontrivialInvariants(
if (MSSAU && VerifyMemorySSA)
MSSAU->getMemorySSA()->verifyMemorySSA();

#ifdef EXPENSIVE_CHECKS
// This transformation has a high risk of corrupting the dominator tree, and
// the below steps to rebuild loop structures will result in hard to debug
// errors in that case so verify that the dominator tree is sane first.
// FIXME: Remove this when the bugs stop showing up and rely on existing
// verification steps.
assert(DT.verify(DominatorTree::VerificationLevel::Fast));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about ifdef'ing this one, generally considering some of the past bugs (and the comment above). Maybe the final one instead could switch to VerificationLevel::Full, w/ expensive checks enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, are you suggesting completely deleting this one because it has not fired recently and to turn the one on line 3714/3717 to VerificationLevel::Full under expensive checks, or do you want to keep this one outside of EXPENSIVE_CHECKS and use VerificationLevel::Full for the last one under EXPENSIVE_CHECKS?

The comment "Remove this when the bugs stop showing up and rely on existing verification steps" from 7 years ago seems to imply the former assuming the assert has not fired recently (not familiar with the code here, do you know? Searching GitHub issues for this asserts, I did not find an issues for this pass).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting leaving this check outside of EXPENSIVE_CHECKS. While no related issues seem to have come up recently (some of the recent bugs were unrelated to breaking the dominator tree), the pass still does rewrite loops, and in general the CFG, quite aggressively, so I'd rather keep the assert than dropping it. That said, maybe enabling it only under EXPENSIVE_CHECKS is okay...

and to turn the one on line 3714/3717 to VerificationLevel::Full under expensive checks

Yeah I think now switching this one to VerificationLevel::Full under expensive checks could be reasonable.

@vzakhari
Copy link
Contributor

Looks good to me, though, I am not familiar with the code and cannot comment on @antoniofrighetto's concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants