public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/47769] New: [missed optimization] use of btr (bit test and reset)
@ 2011-02-16 18:52 kretz at kde dot org
  2011-02-16 20:13 ` [Bug target/47769] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: kretz at kde dot org @ 2011-02-16 18:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

           Summary: [missed optimization] use of btr (bit test and reset)
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: kretz@kde.org


The code:

tmp &= ~(1 << bit);

gets translated to

actual shift, not, and and instructions. Instead GCC could emit one btr
instruction (which modifies the flags - unwanted but acceptable):

btr %[bit], %[tmp]

The btr instruction has a latency of 1 cycle and throughput of 0.5 cycles on
all recent Intel CPUs and thus outperforms the shift + not + and combination.

Rationale:
I make use of this pattern for iteration over a bitmask. I use bsf
(_bit_scan_forward(bitmask)) to find the lowest set bit. To find the next one I
have to mask off the last found bit and currently have to use inline assembly
to get a btr instruction there. Alternatively an intrinsic for btr and friends
might make sense.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
@ 2011-02-16 20:13 ` rguenth at gcc dot gnu.org
  2011-02-17 10:01 ` kretz at kde dot org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-16 20:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2011.02.16 19:46:40
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-16 19:46:40 UTC ---
Can you provide a testcase that can be compiled please?

Cut&pasting from i386.md:

;; %%% bts, btr, btc, bt.
;; In general these instructions are *slow* when applied to memory,
;; since they enforce atomic operation.  When applied to registers,
;; it depends on the cpu implementation.  They're never faster than
;; the corresponding and/ior/xor operations, so with 32-bit there's
;; no point.  But in 64-bit, we can't hold the relevant immediates
;; within the instruction itself, so operating on bits in the high
;; 32-bits of a register becomes easier.
;;
;; These are slow on Nocona, but fast on Athlon64.  We do require the use
;; of btrq and btcq for corner cases of post-reload expansion of absdf and
;; negdf respectively, so they can never be disabled entirely.

....

(define_insn "*btrq"
  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+r")
                         (const_int 1)
                         (match_operand:DI 1 "const_0_to_63_operand" ""))
        (const_int 0))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_64BIT && (TARGET_USE_BT || reload_completed)"
  "btr{q}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "prefix_0f" "1")
   (set_attr "mode" "DI")])

and

  /* X86_TUNE_USE_BT */
  m_AMD_MULTIPLE | m_ATOM | m_CORE2I7 | m_GENERIC,

so it appears it should already be used by default.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
  2011-02-16 20:13 ` [Bug target/47769] " rguenth at gcc dot gnu.org
@ 2011-02-17 10:01 ` kretz at kde dot org
  2011-02-17 10:12 ` kretz at kde dot org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kretz at kde dot org @ 2011-02-17 10:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

--- Comment #2 from Matthias Kretz <kretz at kde dot org> 2011-02-17 10:00:55 UTC ---
Created attachment 23375
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23375
Test code to see whether btr gets used automatically and to compare speed

compile with
g++ -O3 -march=core2 -msse4 -Wall -funroll-loops

-funroll-loops is not required to see the speedup, but it shows that a higher
instruction level parallelism can be achieved with the use of btr.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
  2011-02-16 20:13 ` [Bug target/47769] " rguenth at gcc dot gnu.org
  2011-02-17 10:01 ` kretz at kde dot org
@ 2011-02-17 10:12 ` kretz at kde dot org
  2012-03-29  9:56 ` kretz at kde dot org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kretz at kde dot org @ 2011-02-17 10:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

--- Comment #3 from Matthias Kretz <kretz at kde dot org> 2011-02-17 10:01:34 UTC ---
Created attachment 23376
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23376
TimeStampCounter class for benchmarking


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (2 preceding siblings ...)
  2011-02-17 10:12 ` kretz at kde dot org
@ 2012-03-29  9:56 ` kretz at kde dot org
  2013-05-03 11:45 ` kretz at kde dot org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kretz at kde dot org @ 2012-03-29  9:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

--- Comment #4 from Matthias Kretz <kretz at kde dot org> 2012-03-29 09:55:46 UTC ---
ping.
Are you still waiting for more input from me?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (3 preceding siblings ...)
  2012-03-29  9:56 ` kretz at kde dot org
@ 2013-05-03 11:45 ` kretz at kde dot org
  2013-05-03 12:14 ` paolo.carlini at oracle dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kretz at kde dot org @ 2013-05-03 11:45 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

--- Comment #5 from Matthias Kretz <kretz at kde dot org> 2013-05-03 11:45:49 UTC ---
Another ping.
The bug status is still WAITING...


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (4 preceding siblings ...)
  2013-05-03 11:45 ` kretz at kde dot org
@ 2013-05-03 12:14 ` paolo.carlini at oracle dot com
  2021-11-28  6:55 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: paolo.carlini at oracle dot com @ 2013-05-03 12:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (5 preceding siblings ...)
  2013-05-03 12:14 ` paolo.carlini at oracle dot com
@ 2021-11-28  6:55 ` pinskia at gcc dot gnu.org
  2021-11-29  5:16 ` crazylht at gmail dot com
  2022-05-26  8:29 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-11-28  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
   Last reconfirmed|2011-02-16 19:46:40         |2021-11-27

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (6 preceding siblings ...)
  2021-11-28  6:55 ` pinskia at gcc dot gnu.org
@ 2021-11-29  5:16 ` crazylht at gmail dot com
  2022-05-26  8:29 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2021-11-29  5:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> This is obviously horrible, but the right answer isn't btr in a loop, it's
> what clang does:
> 
>         movabsq $7905747460161236406, %rax # imm = 0x6DB6DB6DB6DB6DB6 every
> third bit unset
>         andq    %rdi, %rax
>         retq
> 

Open pr103462 for this.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/47769] [missed optimization] use of btr (bit test and reset)
  2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
                   ` (7 preceding siblings ...)
  2021-11-29  5:16 ` crazylht at gmail dot com
@ 2022-05-26  8:29 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-05-26  8:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769
Bug 47769 depends on bug 105735, which changed state.

Bug 105735 Summary: GCC failed to reduce &= loop_inv in loop.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105735

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-26  8:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-16 18:52 [Bug target/47769] New: [missed optimization] use of btr (bit test and reset) kretz at kde dot org
2011-02-16 20:13 ` [Bug target/47769] " rguenth at gcc dot gnu.org
2011-02-17 10:01 ` kretz at kde dot org
2011-02-17 10:12 ` kretz at kde dot org
2012-03-29  9:56 ` kretz at kde dot org
2013-05-03 11:45 ` kretz at kde dot org
2013-05-03 12:14 ` paolo.carlini at oracle dot com
2021-11-28  6:55 ` pinskia at gcc dot gnu.org
2021-11-29  5:16 ` crazylht at gmail dot com
2022-05-26  8:29 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).