Skip to content

[SYCL] Add -fsycl-fp32-prec-sqrt flag #5309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 4, 2022
Merged

[SYCL] Add -fsycl-fp32-prec-sqrt flag #5309

merged 10 commits into from
Feb 4, 2022

Conversation

npmiller
Copy link
Contributor

This flag enables correctly rounded sycl::sqrt (the default precision
requirement is 3 ULP).

And enables the flag for CUDA and HIP targets.

This is a follow up from #5141, to have a proper fix for #4041.

This flag enables correctly rounded `sycl::sqrt` (the default precision
requirement is 3 ULP).

And enables the flag for CUDA and HIP targets.
smanna12
smanna12 previously approved these changes Jan 14, 2022
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM. Thanks!

smanna12
smanna12 previously approved these changes Jan 14, 2022
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes look good to me.

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor nits.

Co-authored-by: premanandrao <premanand.m.rao@intel.com>
@npmiller npmiller dismissed stale reviews from smanna12 and elizabethandrews via 5cebae7 January 14, 2022 16:58
npmiller and others added 2 commits January 14, 2022 17:01
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
Copy link
Contributor

@AGindinson AGindinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation/tests LGTM, thanks!
I would prefer to leave the actual @intel/dpcpp-clang-driver-reviewers approval to @mdtoguchi or @hchilama in view of the new option being added.

Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression/kernel_name_class.cpp from llvm-test-suite fails with:

Memory access fault by GPU node-1 (Agent handle: 0x828770) on address 0x7fe65b61e000. Reason: Page not present or supervisor privilege.

I don't think it's directly related to this patch, but it might be an issue in the runtime library or hip plug-in. Any ideas what's going on here?

@bader bader requested review from mdtoguchi and hchilama January 18, 2022 14:47
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM!

@bader bader requested a review from a team January 18, 2022 15:00
@npmiller
Copy link
Contributor Author

npmiller commented Jan 18, 2022

Regression/kernel_name_class.cpp from llvm-test-suite fails with:

Memory access fault by GPU node-1 (Agent handle: 0x828770) on address 0x7fe65b61e000. Reason: Page not present or supervisor privilege.

I don't think it's directly related to this patch, but it might be an issue in the runtime library or hip plug-in. Any ideas what's going on here?

This is strange, I'm unable to reproduce these failures with this branch and the latest llvm-test-suite (on gfx908). So I'm not too sure what's going on, I'm tempted to just rebase on the latests sycl branch and then we can try re-running the CI

@bader
Copy link
Contributor

bader commented Jan 18, 2022

Based on the history of pre-commit checks in this pull request, the issue seems to be sporadic, but still the log suggests there is a bug somewhere as test program accesses wrong memory location.

@bader
Copy link
Contributor

bader commented Jan 18, 2022

@npmiller, I'll restart GitHub Actions jobs.

@pvchupin pvchupin requested a review from xtian-github January 18, 2022 19:35
Copy link

@xtian-github xtian-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what fp64? No need for it?

@zjin-lcf
Copy link
Contributor

what fp64? No need for it?

Thanks for the suggestion. I will update my example to include double-precision data type.

@xtian-github
Copy link

what fp64? No need for it?

Thanks for the suggestion. I will update my example to include double-precision data type.

Thanks for adding fp64! If we do support fp64, do we need to rename the option a bit, e.g. remove fp32 given the CUDA compiler option does not have "type" in the name.

@npmiller
Copy link
Contributor Author

npmiller commented Jan 19, 2022

what fp64? No need for it?

The SYCL spec already requires double precision sqrt to be correctly rounded so I don't believe this flag would make sense for fp64.

what fp64? No need for it?

The SYCL spec already requires double precision sqrt to be correctly rounded so I don't believe this flag would make sense for fp64.

With the verification result of the updated example, I found that nvcc calls the fast sqrt for double precision. The optimization option is just "-O3". However, the sycl compiler calls the correctly rounded sqrt. Thanks.

So, for SYCL, under -O3, do we have (or need to have) same behavior as NVCC for SYCL compiler?

@zjin-lcf
Copy link
Contributor

what fp64? No need for it?

The SYCL spec already requires double precision sqrt to be correctly rounded so I don't believe this flag would make sense for fp64.

With the verification result of the updated example, I found that nvcc calls the fast sqrt for double precision. The optimization option is just "-O3". However, the sycl compiler calls the correctly rounded sqrt. Thanks.

@npmiller npmiller requested a review from xtian-github January 31, 2022 10:45
@npmiller
Copy link
Contributor Author

@bader looks like all of the testing passed on this after re-running the CI

@bader
Copy link
Contributor

bader commented Jan 31, 2022

Great. Let's check that @xtian-github's concerns are resolved.

@bader
Copy link
Contributor

bader commented Feb 4, 2022

@xtian-github, ping.

Copy link

@xtian-github xtian-github left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 5c8b7e7 into intel:sycl Feb 4, 2022
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Feb 5, 2022
* upstream/sycl: (3571 commits)
  [ESIMD] Doxygen update part III - core APIs. (intel#5472)
  [SYCL][DOC] Move proposed FPGA extensions (intel#5453)
  [SYCL] Add -fsycl-fp32-prec-sqrt flag (intel#5309)
  [SYCL] Emit program build logs for warning levels >= 2 (intel#5319)
  [SYCL] Add clang support for code_location in KernelInfo (intel#5335)
  [SYCL][Doc] Move FPGA extensions (intel#5470)
  [ESIMD] Fix public simd and simd_view APIs. (intel#5465)
  [SYCL] Deprecate sycl::atomics in SYCL 2020 mode (intel#5440)
  [SYCL] Add unit test for PR 5414 (intel#5450)
  [XPTI] Allow arbitrary data types in metadata (intel#4998)
  [SYCL][DOC] Move discard queue events to supported (intel#5452)
  [Driver][SYCL] Initial support for allowing fat static -lname processing (intel#5413)
  [SYCL] Fix dead pointer usage if leaf buffer overflows (intel#5417)
  [SYCL][L0] Fix memory leak in USM prefetch (intel#5461)
  [SYCL][Doc] Add new free function queries proposal (intel#5106)
  [SYCL][ESIMD] Update vc-intrinsics deps to the top of the trunk (intel#5460)
  [SYCL][DOC] Move old spec constant extension spec (intel#5456)
  [SYCL][DOC] Move deprecated extensions (intel#5458)
  [SYCL][DOC] Fix links to old SubGroupMask doc (intel#5459)
  [ESIMD] Doxygen update part II - memory APIs. (intel#5443)
  ...
MrSidims added a commit to MrSidims/llvm that referenced this pull request Feb 18, 2025
It follows the approach from intel#5141
and intel#5309 adding intermediate
fcuda-prec-div flag.

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants