-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Add -fsycl-fp32-prec-sqrt flag #5309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This flag enables correctly rounded `sycl::sqrt` (the default precision requirement is 3 ULP). And enables the flag for CUDA and HIP targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes LGTM. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes look good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple minor nits.
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
5cebae7
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation/tests LGTM, thanks!
I would prefer to leave the actual @intel/dpcpp-clang-driver-reviewers approval to @mdtoguchi or @hchilama in view of the new option being added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression/kernel_name_class.cpp from llvm-test-suite fails with:
Memory access fault by GPU node-1 (Agent handle: 0x828770) on address 0x7fe65b61e000. Reason: Page not present or supervisor privilege.
I don't think it's directly related to this patch, but it might be an issue in the runtime library or hip plug-in. Any ideas what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes LGTM!
This is strange, I'm unable to reproduce these failures with this branch and the latest |
Based on the history of pre-commit checks in this pull request, the issue seems to be sporadic, but still the log suggests there is a bug somewhere as test program accesses wrong memory location. |
@npmiller, I'll restart GitHub Actions jobs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what fp64? No need for it?
Thanks for the suggestion. I will update my example to include double-precision data type. |
Thanks for adding fp64! If we do support fp64, do we need to rename the option a bit, e.g. remove fp32 given the CUDA compiler option does not have "type" in the name. |
The SYCL spec already requires double precision
So, for SYCL, under -O3, do we have (or need to have) same behavior as NVCC for SYCL compiler? |
With the verification result of the updated example, I found that nvcc calls the fast sqrt for double precision. The optimization option is just "-O3". However, the sycl compiler calls the correctly rounded sqrt. Thanks. |
@bader looks like all of the testing passed on this after re-running the CI |
Great. Let's check that @xtian-github's concerns are resolved. |
@xtian-github, ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* upstream/sycl: (3571 commits) [ESIMD] Doxygen update part III - core APIs. (intel#5472) [SYCL][DOC] Move proposed FPGA extensions (intel#5453) [SYCL] Add -fsycl-fp32-prec-sqrt flag (intel#5309) [SYCL] Emit program build logs for warning levels >= 2 (intel#5319) [SYCL] Add clang support for code_location in KernelInfo (intel#5335) [SYCL][Doc] Move FPGA extensions (intel#5470) [ESIMD] Fix public simd and simd_view APIs. (intel#5465) [SYCL] Deprecate sycl::atomics in SYCL 2020 mode (intel#5440) [SYCL] Add unit test for PR 5414 (intel#5450) [XPTI] Allow arbitrary data types in metadata (intel#4998) [SYCL][DOC] Move discard queue events to supported (intel#5452) [Driver][SYCL] Initial support for allowing fat static -lname processing (intel#5413) [SYCL] Fix dead pointer usage if leaf buffer overflows (intel#5417) [SYCL][L0] Fix memory leak in USM prefetch (intel#5461) [SYCL][Doc] Add new free function queries proposal (intel#5106) [SYCL][ESIMD] Update vc-intrinsics deps to the top of the trunk (intel#5460) [SYCL][DOC] Move old spec constant extension spec (intel#5456) [SYCL][DOC] Move deprecated extensions (intel#5458) [SYCL][DOC] Fix links to old SubGroupMask doc (intel#5459) [ESIMD] Doxygen update part II - memory APIs. (intel#5443) ...
It follows the approach from intel#5141 and intel#5309 adding intermediate fcuda-prec-div flag. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
This flag enables correctly rounded
sycl::sqrt
(the default precisionrequirement is 3 ULP).
And enables the flag for CUDA and HIP targets.
This is a follow up from #5141, to have a proper fix for #4041.