public inbox for gcc-bugs@sourceware.org
* [Bug other/49244] New: no intrinsics to emit 'lock bts' and 'lock btc'
From: desrt at desrt dot ca @ 2011-05-31 19:38 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

           Summary: no intrinsics to emit 'lock bts' and 'lock btc'
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: desrt@desrt.ca

I want to be able to write this function:

    bool set_and_test (int *a, int bit)
    {
      unsigned int mask = (1u << bit);

      return (__sync_fetch_and_or (a, mask) & mask) != 0;
    }

and have GCC not emit a loop on amd64 and x86.

GCC presently emits a loop for __sync_fetch_and_or() in this case. That is
because 'lock or' discards the previous value, so it can only be used when
the result is ignored. Since we compare against the previous value, GCC has
to emit the loop.

This special case (set and test a single bit) corresponds quite directly to
the 'lock bts' assembly instruction, though. GCC could emit that instead.

It would be nice if GCC could detect (by magic?) that I am only interested in
this single bit or (probably much easier) expose an intrinsic that lets me
access this functionality on platforms where it exists and falls back to
__sync_fetch_and_or() otherwise.
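For illustration, the pattern requested above can be written for both the set ('bts') and complement ('btc') cases. This is only a sketch: the function names are hypothetical, the `__sync_fetch_and_or` and `__sync_fetch_and_xor` builtins are the existing GCC ones, and whether a given compiler version turns them into 'lock bts' / 'lock btc' rather than a loop is exactly what this report is about, so it should be checked in the generated assembly.

```c
#include <stdbool.h>

/* Hypothetical helper: atomically set bit `bit` of *a and report whether
   it was already set.  The caller must keep bit below the width of
   unsigned int. */
static bool test_and_set_bit(unsigned int *a, unsigned int bit)
{
    unsigned int mask = 1u << bit;

    /* The builtin returns the value *a held before the OR. */
    return (__sync_fetch_and_or(a, mask) & mask) != 0;
}

/* Hypothetical helper: atomically flip bit `bit` of *a and report the
   value the bit had before the flip. */
static bool test_and_complement_bit(unsigned int *a, unsigned int bit)
{
    unsigned int mask = 1u << bit;

    /* The builtin returns the value *a held before the XOR. */
    return (__sync_fetch_and_xor(a, mask) & mask) != 0;
}
```

Only the masked bit of the old value is consumed in each helper, which is the property that makes a single bit-test instruction sufficient in principle.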
* [Bug other/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: jakub at gcc dot gnu.org @ 2011-06-01 6:55 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-01 06:54:05 UTC ---
I'm afraid this can't be done in the combiner or in peephole2, and generally
it would be hard to do after expansion. Perhaps it could be done during
expansion or earlier, by folding the (complex) sequence into a new builtin,
preferably with a space in the name to make it inaccessible to users, and
then expanding that builtin.
* [Bug other/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: desrt at desrt dot ca @ 2011-06-02 23:07 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #2 from Ryan Lortie <desrt at desrt dot ca> 2011-06-02 23:05:58 UTC ---
I'd be happy enough if GCC published a "recommended sequence" or something so
that it would be easier to check for this exact sequence...

A new user-accessible builtin would also make me happy.
* [Bug target/49244] no intrinsics to emit 'lock bts' and 'lock btc'
From: mgorny at gentoo dot org @ 2012-08-23 9:47 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Michał Górny <mgorny at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mgorny at gentoo dot org

--- Comment #3 from Michał Górny <mgorny at gentoo dot org> 2012-08-23 09:46:56 UTC ---
I believe that this became more important with C++11 <atomic>, and a separate
intrinsic will not be enough anymore. I believe that GCC should be able to
optimize such simple cases.
* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: jakub at gcc dot gnu.org @ 2021-10-04 14:52 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I wanted to look at #c20, but at least on my i9-7960X, e.g.

    lock; btsl $65, var

acts the same as

    lock; btsl $1, var

rather than

    lock; btsl $1, var+8

so maybe #c20 is not possible.
* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: hjl.tools at gmail dot com @ 2021-10-06 2:16 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #25 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #24)
> I wanted to look at #c20, but at least my i9-7960X for e.g. lock; btsl $65,
> var
> acts the same as lock; btsl $1, var rather than lock; btsl $1, var+8,
> so maybe #c20 is not possible.

The maximum bit position is 63 for the immediate operand.
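The exchange above is consistent with an immediate bit offset being reduced modulo the operand width (65 mod 32 = 1 for a 32-bit 'btsl'), so an immediate cannot select a neighbouring word. A bitmap spanning several words therefore has to pick the word in code. The following is a rough sketch under that reading; `bitmap_test_and_set` and `BITS_PER_WORD` are hypothetical names, not anything defined by GCC or by this bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32u   /* hypothetical: width of one bitmap word */

/* Hypothetical helper: atomically set bit number `bit` in a bitmap made of
   32-bit words and return its previous value.  The word index and the bit
   within that word are computed explicitly. */
static bool bitmap_test_and_set(uint32_t *words, size_t bit)
{
    uint32_t *word = &words[bit / BITS_PER_WORD];          /* 65 -> word 2 */
    uint32_t mask = UINT32_C(1) << (bit % BITS_PER_WORD);  /* 65 -> bit 1  */

    return (__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
```

With this split, bit 65 lands in the word at byte offset 8 with bit 1 set, which is the 'var+8' addressing that the immediate form was observed not to perform on its own.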
* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: jakub at gcc dot gnu.org @ 2021-10-06 8:00 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #26 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So the bug is fixed then; #c20 is impossible.
* [Bug target/49244] __sync or __atomic builtins will not emit 'lock bts/btr/btc'
From: cvs-commit at gcc dot gnu.org @ 2021-11-10 9:17 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:fb161782545224f55ba26ba663889c5e6e9a04d1

commit r12-5102-gfb161782545224f55ba26ba663889c5e6e9a04d1
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Oct 25 13:59:51 2021 +0800

    Improve integer bit test on __atomic_fetch_[or|and]_* returns

    commit adedd5c173388ae505470df152b9cb3947339566
    Author: Jakub Jelinek <jakub@redhat.com>
    Date:   Tue May 3 13:37:25 2016 +0200

        re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')

    optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_*
    returns with lock bts/btr/btc by turning

      mask_2 = 1 << cnt_1;
      _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
      _5 = _4 & mask_2;

    into

      _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
      _5 = _4;

    and

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
      _3 = _2 & mask_6;
      _4 = _3 != 0;

    into

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
      _4 = _11 != 0;

    But it failed to optimize many equivalent, but slightly different cases:

    1.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _4 = (_Bool) _1;
    2.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _4 = (_Bool) _1;
    3.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _7 = ~_1;
      _5 = (_Bool) _7;
    4.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _7 = ~_1;
      _5 = (_Bool) _7;
    5.
      _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
      _2 = (int) _1;
      _7 = ~_2;
      _5 = (_Bool) _7;
    6.
      _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
      _2 = (int) _1;
      _7 = ~_2;
      _5 = (_Bool) _7;
    7.
      _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
      _5 = (signed int) _1;
      _4 = _5 < 0;
    8.
      _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
      _5 = (signed int) _1;
      _4 = _5 < 0;
    9.
      _1 = 1 << bit_4(D);
      mask_5 = (unsigned int) _1;
      _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
      _3 = _2 & mask_5;
    10.
      mask_7 = 1 << bit_6(D);
      _1 = ~mask_7;
      _2 = (unsigned int) _1;
      _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
      _4 = (int) _3;
      _5 = _4 & mask_7;

    We make

      mask_2 = 1 << cnt_1;
      _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
      _5 = _4 & mask_2;

    and

      mask_6 = 1 << bit_5(D);
      _1 = ~mask_6;
      _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
      _3 = _2 & mask_6;
      _4 = _3 != 0;

    the canonical forms for this optimization and transform cases 1-8 to the
    equivalent canonical form. For cases 9 and 10, we simply remove the cast
    before __atomic_fetch_or_4/__atomic_fetch_and_4 with

      _1 = 1 << bit_4(D);
      _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
      _3 = _2 & _1;

    and

      mask_7 = 1 << bit_6(D);
      _1 = ~mask_7;
      _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
      _6 = _3 & mask_7;
      _5 = (int) _6;

    2021-11-04  H.J. Lu  <hongjiu.lu@intel.com>
                Hongtao Liu  <hongtao.liu@intel.com>

    gcc/

            PR middle-end/102566
            * match.pd (nop_atomic_bit_test_and_p): New match.
            * tree-ssa-ccp.c (convert_atomic_bit_not): New function.
            (gimple_nop_atomic_bit_test_and_p): New prototype.
            (optimize_atomic_bit_test_and): Transform equivalent, but slightly
            different cases to their canonical forms.

    gcc/testsuite/

            PR middle-end/102566
            * g++.target/i386/pr102566-1.C: New test.
            * g++.target/i386/pr102566-2.C: Likewise.
            * g++.target/i386/pr102566-3.C: Likewise.
            * g++.target/i386/pr102566-4.C: Likewise.
            * g++.target/i386/pr102566-5a.C: Likewise.
            * g++.target/i386/pr102566-5b.C: Likewise.
            * g++.target/i386/pr102566-6a.C: Likewise.
            * g++.target/i386/pr102566-6b.C: Likewise.
            * gcc.target/i386/pr102566-1a.c: Likewise.
            * gcc.target/i386/pr102566-1b.c: Likewise.
            * gcc.target/i386/pr102566-2.c: Likewise.
            * gcc.target/i386/pr102566-3a.c: Likewise.
            * gcc.target/i386/pr102566-3b.c: Likewise.
            * gcc.target/i386/pr102566-4.c: Likewise.
            * gcc.target/i386/pr102566-5.c: Likewise.
            * gcc.target/i386/pr102566-6.c: Likewise.
            * gcc.target/i386/pr102566-7.c: Likewise.
            * gcc.target/i386/pr102566-8a.c: Likewise.
            * gcc.target/i386/pr102566-8b.c: Likewise.
            * gcc.target/i386/pr102566-9a.c: Likewise.
            * gcc.target/i386/pr102566-9b.c: Likewise.
            * gcc.target/i386/pr102566-10a.c: Likewise.
            * gcc.target/i386/pr102566-10b.c: Likewise.
            * gcc.target/i386/pr102566-11.c: Likewise.
            * gcc.target/i386/pr102566-12.c: Likewise.
            * gcc.target/i386/pr102566-13.c: New test.
            * gcc.target/i386/pr102566-14.c: New test.
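As a rough source-level illustration of the kind of code behind some of the GIMPLE cases listed in the commit message, the functions below approximate cases 1, 2 and 7. The function names are hypothetical, the mapping to the numbered cases is an assumption rather than something taken from the testsuite files, and the sketch assumes a 32-bit `unsigned int`. Whether a particular GCC build emits 'lock bts' or 'lock btr' for them is best confirmed by inspecting the `-S` output.

```c
#include <stdbool.h>

/* Roughly case 1 (assumed mapping): set bit 0, return its previous value. */
static bool set_bit0(unsigned int *p)
{
    return (__atomic_fetch_or(p, 1u, __ATOMIC_RELAXED) & 1u) != 0;
}

/* Roughly case 2 (assumed mapping): clear bit 0, return its previous value. */
static bool clear_bit0(unsigned int *p)
{
    return (__atomic_fetch_and(p, ~1u, __ATOMIC_RELAXED) & 1u) != 0;
}

/* Roughly case 7 (assumed mapping): set the top bit and test the old value
   through a signed comparison.  Converting an out-of-range unsigned value to
   int is implementation-defined in C; GCC reduces it modulo 2^32. */
static bool set_sign_bit(unsigned int *p)
{
    return (int)__atomic_fetch_or(p, 0x80000000u, __ATOMIC_RELAXED) < 0;
}
```

None of these spell out the `1 << bit` mask and the `& mask` test the way the canonical forms do, which appears to be why the commit normalizes them before matching.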