public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE
@ 2013-01-14  6:11 andi-gcc at firstfloor dot org
  2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14  6:11 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

             Bug #: 55966
           Summary: __atomic_fetch_* generate wrong code for HLE
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: andi-gcc@firstfloor.org
            Target: x86_64-linux


__atomic_fetch_(and|xor|or|nand) sometimes generate a cmpxchg loop instead of
the direct instruction. nand always does that because there is no x86 nand 
The others can in principle generate direct instructions, and do, but not
always.

When specifying __ATOMIC_HLE_RELEASE or ACQUIRE the HLE prefix is not
generated.
Also when the CMPXCHG loop is generated it would be needed to put a PAUSE for
the unsuccessfull path, otherwise poor performance will happen.

Generating correct code for a CMPXCHG HLE loop is tricky and it may be better
to forbid the nand case. But for others which can be implemented as a single
atomic operations it would be better to ensure they always do that instead of
falling back to cmpxchg.

Testcase TBD.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
@ 2013-01-14 19:06 ` andi-gcc at firstfloor dot org
  2013-01-14 19:52 ` andi-gcc at firstfloor dot org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 19:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #1 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 19:06:02 UTC ---
Here's a test case. This requires the libstdc++ HLE patch from
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00673.html

g++ -std=gnu++0x

#include <atomic>

#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release

int main()
{
  using namespace std;
  atomic_ulong au = ATOMIC_VAR_INIT(0);

  if (!au.fetch_and(1, ACQ))
    au.fetch_and(-1, REL);

  unsigned lock = 0;
  __atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);

  return 0;
}

The first fetch_and generates incorrectly:

.L2:
        movq    %rax, %rcx
        movq    %rax, %rdx
        andl    $1, %ecx
        lock; cmpxchgq  %rcx, -24(%rsp)
        jne     .L2


The second generates correctly:

        lock; 
        .byte   0xf3
        andq    $-1, -24(%rsp)


The __atomic_fetch_and generates correctly:

     lock; 
        .byte   0xf2
        andl    $1, -28(%rsp)


The first fetch_and should generate the same code sequence.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
  2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
@ 2013-01-14 19:52 ` andi-gcc at firstfloor dot org
  2013-01-14 20:15 ` andi-gcc at firstfloor dot org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 19:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #2 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 19:52:03 UTC ---
Created attachment 29163
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29163
preprocessed testcase


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
  2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
  2013-01-14 19:52 ` andi-gcc at firstfloor dot org
@ 2013-01-14 20:15 ` andi-gcc at firstfloor dot org
  2013-01-14 21:22 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 20:15 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

Andi Kleen <andi-gcc at firstfloor dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #29163|0                           |1
        is obsolete|                            |

--- Comment #3 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 20:15:11 UTC ---
Created attachment 29164
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29164
preprocessed test case

Sorry earlier was the wrong file


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (2 preceding siblings ...)
  2013-01-14 20:15 ` andi-gcc at firstfloor dot org
@ 2013-01-14 21:22 ` ubizjak at gmail dot com
  2013-01-14 21:25 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2013-01-14 21:22 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-14 21:22:35 UTC ---
The problem is, that in failed case maybe_emit_op() gets target register to
return the result to, so with after=false, it expands via
optab->mem_fetch_before.

Unfortunately, atomic_fetch_<logic> or atomic_<logic>_fetch do not exist for
x86 target, so generic expander falls back to exchange loop.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (3 preceding siblings ...)
  2013-01-14 21:22 ` ubizjak at gmail dot com
@ 2013-01-14 21:25 ` ubizjak at gmail dot com
  2013-01-14 22:05 ` andi-gcc at firstfloor dot org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: ubizjak at gmail dot com @ 2013-01-14 21:25 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-14 21:25:13 UTC ---
Following testcase will expand to a cmpxchg loop:

int hle_and (int *p, int v)
{
  return __atomic_fetch_and_4 (p, v, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE);
}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (4 preceding siblings ...)
  2013-01-14 21:25 ` ubizjak at gmail dot com
@ 2013-01-14 22:05 ` andi-gcc at firstfloor dot org
  2013-01-14 22:23 ` amacleod at redhat dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #6 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:05:38 UTC ---
Hmm that's true. x86 doesn't have xand, x_or, x_xor, only xadd 
Maybe cmpxchg is the only way?

For some special cases it can be done (like and single bit-> btr, or single bit
-> btr), but it's probably complicated to implement.

In this case I would prefer to forbid those for HLE.
I guess more arguments for the target hook.

Other ideas?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (5 preceding siblings ...)
  2013-01-14 22:05 ` andi-gcc at firstfloor dot org
@ 2013-01-14 22:23 ` amacleod at redhat dot com
  2013-01-14 22:32 ` andi-gcc at firstfloor dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: amacleod at redhat dot com @ 2013-01-14 22:23 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amacleod at redhat dot com

--- Comment #7 from Andrew Macleod <amacleod at redhat dot com> 2013-01-14 22:22:58 UTC ---
What do you mean 'forbid' it?

Isn't the HLE bit suppose to be a hint anyway?  Saying if the hardware supports
it, use it on this atomic instruction...  Its not incorrect to ignore it, and
some targets simply don't support it. so its ignored.  

I would expect that when we have to drop back to a compare-exchange loop we're
no longer going to really care?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (6 preceding siblings ...)
  2013-01-14 22:23 ` amacleod at redhat dot com
@ 2013-01-14 22:32 ` andi-gcc at firstfloor dot org
  2013-01-14 22:34 ` andi-gcc at firstfloor dot org
  2024-04-09  2:58 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:32 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #8 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:32:06 UTC ---
forbid = give warning and drop bit

It's a hint, but in a good implementation it should not be silently dropped or
code generated that has no chance to elide. It's a quality of implementation
issue.

IMHO there should be at least a warning for this case, like we do for other
cases where elision doesn't work.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (7 preceding siblings ...)
  2013-01-14 22:32 ` andi-gcc at firstfloor dot org
@ 2013-01-14 22:34 ` andi-gcc at firstfloor dot org
  2024-04-09  2:58 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: andi-gcc at firstfloor dot org @ 2013-01-14 22:34 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

--- Comment #9 from Andi Kleen <andi-gcc at firstfloor dot org> 2013-01-14 22:34:16 UTC ---
Also i need to look more closely, but most likely the C++ atomic code should be
changed to avoid this situation. This would give much better code on x86 in any
case even without elision.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/55966] __atomic_fetch_* generate wrong code for HLE
  2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
                   ` (8 preceding siblings ...)
  2013-01-14 22:34 ` andi-gcc at firstfloor dot org
@ 2024-04-09  2:58 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-09  2:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dropping __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE is always ok thing to do, it
is only there for a hint anyways. So closing as won't fix as Intel also disable
HLE on almost all cores anyways:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2955f270a84762343000f103e0640d29c7a96f3

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-09  2:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-14  6:11 [Bug target/55966] New: __atomic_fetch_* generate wrong code for HLE andi-gcc at firstfloor dot org
2013-01-14 19:06 ` [Bug target/55966] " andi-gcc at firstfloor dot org
2013-01-14 19:52 ` andi-gcc at firstfloor dot org
2013-01-14 20:15 ` andi-gcc at firstfloor dot org
2013-01-14 21:22 ` ubizjak at gmail dot com
2013-01-14 21:25 ` ubizjak at gmail dot com
2013-01-14 22:05 ` andi-gcc at firstfloor dot org
2013-01-14 22:23 ` amacleod at redhat dot com
2013-01-14 22:32 ` andi-gcc at firstfloor dot org
2013-01-14 22:34 ` andi-gcc at firstfloor dot org
2024-04-09  2:58 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).