[flang][cuda] Materialize box when src or dst are rebox #116494

Merged (1 commit) on Nov 18, 2024

clementval
Contributor

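As is already done for boxes produced by `fir.embox`, materialize the box to memory when the source or destination of a data transfer is produced by a `fir.rebox` operation, so that it can be passed to the runtime.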
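
To illustrate the idea outside of MLIR, here is a minimal, self-contained C++ sketch. It is a toy model only: `Op`, `Value`, `isaAny`, and the string-based "emitted" log are hypothetical stand-ins invented for this example, not flang or MLIR APIs. It shows a variadic type check in the spirit of `mlir::isa<fir::EmboxOp, fir::ReboxOp>(...)` deciding whether a value must first be stored to a temporary. The null check inside `isaAny` is an addition of this sketch and is not part of the change in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for operations. These are NOT the real fir:: classes.
struct Op { virtual ~Op() = default; };
struct EmboxOp : Op {};
struct ReboxOp : Op {};
struct LoadOp : Op {};

// Returns true if `op` is an instance of any of the listed types.
// Unlike this helper, the real mlir::isa must not be given a null pointer.
template <typename... Ts>
bool isaAny(const Op *op) {
  return op && (... || (dynamic_cast<const Ts *>(op) != nullptr));
}

// Hypothetical value: a name plus the operation that defines it
// (nullptr models a value with no defining operation).
struct Value {
  const Op *definingOp;
  std::string name;
};

// If the value comes from an embox-like or rebox-like operation, record a
// store to a temporary and return the temporary's name; otherwise return
// the value unchanged.
std::string materializeBoxIfNeeded(const Value &val,
                                   std::vector<std::string> &emitted) {
  if (isaAny<EmboxOp, ReboxOp>(val.definingOp)) {
    std::string tmp = "tmp_" + val.name;
    emitted.push_back("store " + val.name + " to " + tmp);
    return tmp;
  }
  return val.name;
}
```

In this model, a value defined by `ReboxOp` or `EmboxOp` is replaced by a temporary and one store is recorded, while a value defined by any other operation passes through untouched.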

@llvmbot added the flang and flang:fir-hlfir labels on Nov 16, 2024
@llvmbot
Member

llvmbot commented Nov 16, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

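As is already done for boxes produced by `fir.embox`, materialize the box to memory when the source or destination of a data transfer is produced by a `fir.rebox` operation, so that it can be passed to the runtime.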


Full diff: https://github.com/llvm/llvm-project/pull/116494.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/CUFOpConversion.cpp (+1-1)
  • (modified) flang/test/Fir/CUDA/cuda-data-transfer.fir (+47)
diff --git a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
index 9de20f0f0d45e1..17699dadc7511f 100644
--- a/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFOpConversion.cpp
@@ -654,7 +654,7 @@ struct CUFDataTransferOpConversion
               loc, builder);
       }
       auto materializeBoxIfNeeded = [&](mlir::Value val) -> mlir::Value {
-        if (mlir::isa<fir::EmboxOp>(val.getDefiningOp())) {
+        if (mlir::isa<fir::EmboxOp, fir::ReboxOp>(val.getDefiningOp())) {
           // Materialize the box to memory to be able to call the runtime.
           mlir::Value box = builder.createTemporary(loc, val.getType());
           builder.create<fir::StoreOp>(loc, val, box);
diff --git a/flang/test/Fir/CUDA/cuda-data-transfer.fir b/flang/test/Fir/CUDA/cuda-data-transfer.fir
index 1ee44f3c6d97c9..5f10dc0562d179 100644
--- a/flang/test/Fir/CUDA/cuda-data-transfer.fir
+++ b/flang/test/Fir/CUDA/cuda-data-transfer.fir
@@ -466,4 +466,51 @@ func.func @_QPlogical_cst() {
 // CHECK: %[[BOX_NONE:.*]] = fir.convert %[[DESC]] : (!fir.ref<!fir.box<!fir.logical<4>>>) -> !fir.ref<!fir.box<none>>
 // CHECK: fir.call @_FortranACUFDataTransferCstDesc(%{{.*}}, %[[BOX_NONE]], %{{.*}}, %{{.*}}, %{{.*}}) : (!fir.ref<!fir.box<none>>, !fir.ref<!fir.box<none>>, i32, !fir.ref<i8>, i32) -> none
 
+func.func @_QPcallkernel(%arg0: !fir.box<!fir.array<?x?xcomplex<f32>>> {fir.bindc_name = "a"}, %arg1: !fir.ref<f32> {fir.bindc_name = "b"}, %arg2: !fir.ref<f32> {fir.bindc_name = "c"}) {
+  %c0_i64 = arith.constant 0 : i64
+  %c1_i32 = arith.constant 1 : i32
+  %c0_i32 = arith.constant 0 : i32
+  %c1 = arith.constant 1 : index
+  %c0 = arith.constant 0 : index
+  %0 = fir.dummy_scope : !fir.dscope
+  %1 = fir.declare %arg0 dummy_scope %0 {uniq_name = "_QFcallkernelEa"} : (!fir.box<!fir.array<?x?xcomplex<f32>>>, !fir.dscope) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+  %2 = fir.rebox %1 : (!fir.box<!fir.array<?x?xcomplex<f32>>>) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+  %3 = cuf.alloc !fir.box<!fir.heap<!fir.array<?x?xcomplex<f32>>>> {bindc_name = "adev", data_attr = #cuf.cuda<device>, uniq_name = "_QFcallkernelEadev"} -> !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xcomplex<f32>>>>>
+  %7 = fir.declare %3 {data_attr = #cuf.cuda<device>, fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QFcallkernelEadev"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?xcomplex<f32>>>>>) -> !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xcomplex<f32>>>>>
+  %8 = fir.declare %arg1 dummy_scope %0 {uniq_name = "_QFcallkernelEb"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
+  %9 = fir.declare %arg2 dummy_scope %0 {uniq_name = "_QFcallkernelEc"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32>
+  %10 = fir.alloca i32 {bindc_name = "m", uniq_name = "_QFcallkernelEm"}
+  %11 = fir.declare %10 {uniq_name = "_QFcallkernelEm"} : (!fir.ref<i32>) -> !fir.ref<i32>
+  %12 = fir.alloca i32 {bindc_name = "n", uniq_name = "_QFcallkernelEn"}
+  %13 = fir.declare %12 {uniq_name = "_QFcallkernelEn"} : (!fir.ref<i32>) -> !fir.ref<i32>
+  %14:3 = fir.box_dims %2, %c0 : (!fir.box<!fir.array<?x?xcomplex<f32>>>, index) -> (index, index, index)
+  %15 = fir.convert %14#1 : (index) -> i32
+  fir.store %15 to %13 : !fir.ref<i32>
+  %16:3 = fir.box_dims %2, %c1 : (!fir.box<!fir.array<?x?xcomplex<f32>>>, index) -> (index, index, index)
+  %27 = fir.load %13 : !fir.ref<i32>
+  %28 = fir.convert %27 : (i32) -> index
+  %29 = arith.cmpi sgt, %28, %c0 : index
+  %30 = arith.select %29, %28, %c0 : index
+  %31 = fir.load %11 : !fir.ref<i32>
+  %32 = fir.convert %31 : (i32) -> index
+  %33 = arith.cmpi sgt, %32, %c0 : index
+  %34 = arith.select %33, %32, %c0 : index
+  %35 = fir.shape %30, %34 : (index, index) -> !fir.shape<2>
+  %36 = fir.undefined index
+  %37 = fir.slice %c1, %28, %c1, %c1, %32, %c1 : (index, index, index, index, index, index) -> !fir.slice<2>
+  %38 = fir.rebox %2 [%37] : (!fir.box<!fir.array<?x?xcomplex<f32>>>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+  cuf.data_transfer %38 to %7 {transfer_kind = #cuf.cuda_transfer<host_device>} : !fir.box<!fir.array<?x?xcomplex<f32>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x?xcomplex<f32>>>>>
+  return
+}
+
+// CHECK-LABEL: func.func @_QPcallkernel(
+// CHECK-SAME: %[[ARG0:.*]]: !fir.box<!fir.array<?x?xcomplex<f32>>> {fir.bindc_name = "a"}
+// CHECK: %[[ALLOCA:.*]] = fir.alloca !fir.box<!fir.array<?x?xcomplex<f32>>>
+// CHECK: %[[DECL_ARG0:.*]] = fir.declare %[[ARG0]] dummy_scope %{{.*}} {uniq_name = "_QFcallkernelEa"} : (!fir.box<!fir.array<?x?xcomplex<f32>>>, !fir.dscope) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+// CHECK: %[[REBOX0:.*]] = fir.rebox %[[DECL_ARG0]] : (!fir.box<!fir.array<?x?xcomplex<f32>>>) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+// CHECK: %[[REBOX1:.*]] = fir.rebox %[[REBOX0]] [%{{.*}}] : (!fir.box<!fir.array<?x?xcomplex<f32>>>, !fir.slice<2>) -> !fir.box<!fir.array<?x?xcomplex<f32>>>
+// CHECK: fir.store %[[REBOX1]] to %[[ALLOCA]] : !fir.ref<!fir.box<!fir.array<?x?xcomplex<f32>>>>
+// CHECK: %[[BOX_NONE:.*]] = fir.convert %[[ALLOCA]] : (!fir.ref<!fir.box<!fir.array<?x?xcomplex<f32>>>>) -> !fir.ref<!fir.box<none>>
+// CHECK: fir.call @_FortranACUFDataTransferDescDesc(%{{.*}}, %[[BOX_NONE]], %{{.*}}, %{{.*}}, %{{.*}}) : (!fir.ref<!fir.box<none>>, !fir.ref<!fir.box<none>>, i32, !fir.ref<i8>, i32) -> none
+
 } // end of module

@clementval merged commit de2e270 into llvm:main on Nov 18, 2024
11 checks passed
@clementval deleted the cuf_rebox_materialize branch on November 18, 2024 17:22