public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/103069] New: cmpxchg isn't optimized
@ 2021-11-03 19:08 hjl.tools at gmail dot com
  2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2021-11-03 19:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

            Bug ID: 103069
           Summary: cmpxchg isn't optimized
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: crazylht at gmail dot com, wwwhhhyyy333 at gmail dot com
            Blocks: 103065
  Target Milestone: ---
            Target: i386,x86-64

>From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap will grab the cache line exclusive and causes
excessive cache line bouncing.

[hjl@gnu-cfl-2 pr102566]$ cat e.c 
int
f3 (int *a)
{
  return __atomic_fetch_or (a, 0x40000000, __ATOMIC_RELAXED);
}
[hjl@gnu-cfl-2 pr102566]$ gcc -S -O2 x.c 
[hjl@gnu-cfl-2 pr102566]$ cat x.s 
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    v(%rip), %eax
.L2:
        movl    %eax, %ecx
        movl    %eax, %edx
        orl     $1, %ecx
        lock cmpxchgl   %ecx, v(%rip)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
GCC should first emit a normal load, check and jump to .L2 if cmpxchgl
may fail.  Before jump to .L2, PAUSE should be inserted to to yield the
CPU to another hyperthread and to save power. It also serves to slightly
limit the rate of accesses on the processor interconnect.
        jne     .L2
        movl    %edx, %eax
        andl    $1, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.2.1 20211019 (Red Hat 11.2.1-6)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 pr102566]$


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103065
[Bug 103065] [meta] atomic operations aren't optimized

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
@ 2021-11-03 20:53 ` thiago at kde dot org
  2021-11-04 21:25 ` thiago at kde dot org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2021-11-03 20:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #1 from Thiago Macieira <thiago at kde dot org> ---
(the assembly doesn't match the source code, but we got your point)

Another possible improvement for the __atomic_fetch_{and,nand,or} functions is
that it can check whether the fetched value is already correct and branch out.
In your example, the __atomic_fetch_or with 0x40000000 can check if that bit is
already set and, if so, not execute the CMPXCHG at all.

This is a valid solution for x86 on memory orderings up to acq_rel. For other
architectures, they may still need barriers. For seq_cst, we either need a
barrier or we need to execute the CMPXCHG at least once. 

Therefore, the emitted code might want to optimistically execute the operation
once and, if it fails, enter the load loop. That's a slightly longer codegen.
Whether we want that under -Os or not, you'll have to be the judge.

Prior art: glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S:
ENTRY(__pthread_spin_lock)
1:      LOCK
        decl    0(%rdi)
        jne     2f
        xor     %eax, %eax
        ret

        .align  16
2:      rep
        nop
        cmpl    $0, 0(%rdi)
        jg      1b
        jmp     2b
END(__pthread_spin_lock)

This does the atomic operation once, hoping it'll succeed. If it fails, it
enters the PAUSE+CMP+JG loop until the value is suitable.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
  2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
@ 2021-11-04 21:25 ` thiago at kde dot org
  2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2021-11-04 21:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #2 from Thiago Macieira <thiago at kde dot org> ---
See also bug 103090 for a few more (restricted) possibilities to replace a
cmpxchg loop with a LOCK RMW operation.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
  2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
  2021-11-04 21:25 ` thiago at kde dot org
@ 2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
  2021-11-15 14:26 ` hjl.tools at gmail dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-15 11:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:

https://gcc.gnu.org/g:4d281ff7ddd8f6365943c0a622107f92315bb8a6

commit r12-5265-g4d281ff7ddd8f6365943c0a622107f92315bb8a6
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Fri Nov 12 10:50:46 2021 +0800

    PR target/103069: Relax cmpxchg loop for x86 target

    From the CPU's point of view, getting a cache line for writing is more
    expensive than reading.  See Appendix A.2 Spinlock in:

    https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/
    xeon-lock-scaling-analysis-paper.pdf

    The full compare and swap will grab the cache line exclusive and causes
    excessive cache line bouncing.

    The atomic_fetch_{or,xor,and,nand} builtins generates cmpxchg loop under
    -march=x86-64 like:

            movl    v(%rip), %eax
    .L2:
            movl    %eax, %ecx
            movl    %eax, %edx
            orl     $1, %ecx
            lock cmpxchgl   %ecx, v(%rip)
            jne     .L2
            movl    %edx, %eax
            andl    $1, %eax
            ret

    To relax above loop, GCC should first emit a normal load, check and jump to
    .L2 if cmpxchgl may fail. Before jump to .L2, PAUSE should be inserted to
    yield the CPU to another hyperthread and to save power, so the code is
    like

    .L84:
            movl    (%rdi), %ecx
            movl    %eax, %edx
            orl     %esi, %edx
            cmpl    %eax, %ecx
            jne     .L82
            lock cmpxchgl   %edx, (%rdi)
            jne     .L84
    .L82:
            rep nop
            jmp     .L84

    This patch adds corresponding atomic_fetch_op expanders to insert load/
    compare and pause for all the atomic logic fetch builtins. Add flag
    -mrelax-cmpxchg-loop to control whether to generate relaxed loop.

    gcc/ChangeLog:

            PR target/103069
            * config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
            New expand function.
            * config/i386/i386-options.c (ix86_target_string): Add
            -mrelax-cmpxchg-loop flag.
            (ix86_valid_target_attribute_inner_p): Likewise.
            * config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
            New expand function prototype.
            * config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
            * config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
            for SI,HI,QI modes.
            (atomic_<logic>_fetch<mode>): Likewise.
            (atomic_fetch_nand<mode>): Likewise.
            (atomic_nand_fetch<mode>): Likewise.
            (atomic_fetch_<logic><mode>): New expander for DI,TI modes.
            (atomic_<logic>_fetch<mode>): Likewise.
            (atomic_fetch_nand<mode>): Likewise.
            (atomic_nand_fetch<mode>): Likewise.
            * doc/invoke.texi: Document -mrelax-cmpxchg-loop.

    gcc/testsuite/ChangeLog:

            PR target/103069
            * gcc.target/i386/pr103069-1.c: New test.
            * gcc.target/i386/pr103069-2.c: Ditto.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (2 preceding siblings ...)
  2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
@ 2021-11-15 14:26 ` hjl.tools at gmail dot com
  2021-11-18  8:31 ` cvs-commit at gcc dot gnu.org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2021-11-15 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |12.0
         Resolution|---                         |FIXED

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 12.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (3 preceding siblings ...)
  2021-11-15 14:26 ` hjl.tools at gmail dot com
@ 2021-11-18  8:31 ` cvs-commit at gcc dot gnu.org
  2022-01-24 23:49 ` hjl.tools at gmail dot com
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-18  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:

https://gcc.gnu.org/g:15f5e70cbb33b40c97325ef9d55557747a148d39

commit r12-5363-g15f5e70cbb33b40c97325ef9d55557747a148d39
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Thu Nov 18 14:45:23 2021 +0800

    i386: Fix wrong codegen for -mrelax-cmpxchg-loop

    For -mrelax-cmpxchg-loop introduced by PR 103069/r12-5265, it would
    produce infinite loop. The correct code should be

    .L84:
            movl    (%rdi), %ecx
            movl    %eax, %edx
            orl     %esi, %edx
            cmpl    %eax, %ecx
            jne     .L82
            lock cmpxchgl   %edx, (%rdi)
            jne     .L84
            movl    %r8d, %eax  <<< retval is missing in previous impl
            ret
    .L82:
            rep nop
            jmp     .L84

    Adjust corresponding expander to fix such issue, and fix runtime test
    so the problem would be exposed.

    gcc/ChangeLog:

            * config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
            Adjust generated cfg to avoid infinite loop.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr103069-2.c: Adjust.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (4 preceding siblings ...)
  2021-11-18  8:31 ` cvs-commit at gcc dot gnu.org
@ 2022-01-24 23:49 ` hjl.tools at gmail dot com
  2022-01-24 23:52 ` hjl.tools at gmail dot com
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
     Ever confirmed|0                           |1
         Resolution|FIXED                       |---
   Last reconfirmed|                            |2022-01-24

--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_rwlock_common.c in glibc has:

  unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
  /* Release MO so that subsequent readers or writers synchronize with us.  */
  while (!atomic_compare_exchange_weak_release
         (&rwlock->__data.__readers, &r,
          ((r ^ PTHREAD_RWLOCK_WRLOCKED)
           ^ ((r >> PTHREAD_RWLOCK_READER_SHIFT) == 0 ? 0
              : PTHREAD_RWLOCK_WRPHASE))))
    {
      /* TODO Back-off.  */
    }   

-mrelax-cmpxchg-loop -O2 generates:

         9240e: 89 c1                   mov    %eax,%ecx
         92410: 31 d2                   xor    %edx,%edx
         92412: c1 e9 03                shr    $0x3,%ecx
         92415: 0f 95 c2                setne  %dl
         92418: 31 c2                   xor    %eax,%edx
         9241a: 83 f2 02                xor    $0x2,%edx
         9241d: f0 41 0f b1 10          lock cmpxchg %edx,(%r8)       
         92422: 75 ea                   jne    9240e
<__pthread_rwlock_unlock@GL
IBC_2.2.5+0x10e>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (5 preceding siblings ...)
  2022-01-24 23:49 ` hjl.tools at gmail dot com
@ 2022-01-24 23:52 ` hjl.tools at gmail dot com
  2022-01-24 23:53 ` hjl.tools at gmail dot com
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_mutex_unlock.c in glibc has:

     do  
        { 
          newval = oldval & PTHREAD_MUTEX_PRIO_CEILING_MASK;
        }
      while (!atomic_compare_exchange_weak_release (&mutex->__data.__lock,
                                                    &oldval, newval));

GCC 12 generates:

         90a81: 41 89 c0                mov    %eax,%r8d
         90a84: 41 81 e0 00 00 f8 ff    and    $0xfff80000,%r8d
         90a8b: f0 44 0f b1 07          lock cmpxchg %r8d,(%rdi)       
         90a90: 75 ef                   jne    90a81
<__pthread_mutex_unlock_ful
l+0x81>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (6 preceding siblings ...)
  2022-01-24 23:52 ` hjl.tools at gmail dot com
@ 2022-01-24 23:53 ` hjl.tools at gmail dot com
  2022-01-24 23:55 ` hjl.tools at gmail dot com
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/pthread_create.c has

              do
                pd->nextevent = __nptl_last_event;
              while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
                                                           pd, pd->nextevent));

we got

         8d1b7: 48 8b 05 da 55 16 00    mov    0x1655da(%rip),%rax        
         8d1be: 48 89 83 68 06 00 00    mov    %rax,0x668(%rbx)
         8d1c5: f0 4c 0f b1 35 ca 55    lock cmpxchg %r14,0x1655ca(%rip)        

         8d1ce: 75 e7                   jne    8d1b7
<pthread_create@@GLIBC_2.34
+0xb57>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (7 preceding siblings ...)
  2022-01-24 23:53 ` hjl.tools at gmail dot com
@ 2022-01-24 23:55 ` hjl.tools at gmail dot com
  2022-01-25  0:04 ` thiago at kde dot org
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2022-01-24 23:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |NEW

--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> ---
nptl/nptl_setxid.c in glibc has

  do  
    {   
      flags = THREAD_GETMEM (self, cancelhandling);
      newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
                                          flags & ~SETXID_BITMASK, flags);
    }   
  while (flags != newval);

GCC 12 generates:

         899f0: 64 8b 14 25 08 03 00    mov    %fs:0x308,%edx
         899f8: 89 d6                   mov    %edx,%esi
         899fa: 89 d0                   mov    %edx,%eax
         899fc: 83 e6 bf                and    $0xffffffbf,%esi
         899ff: f0 0f b1 31             lock cmpxchg %esi,(%rcx)       
         89a03: 75 eb                   jne    899f0
<__GI___nptl_setxid_sighand
ler+0x90>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (8 preceding siblings ...)
  2022-01-24 23:55 ` hjl.tools at gmail dot com
@ 2022-01-25  0:04 ` thiago at kde dot org
  2022-02-15  8:59 ` wwwhhhyyy333 at gmail dot com
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-01-25  0:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #10 from Thiago Macieira <thiago at kde dot org> ---
(In reply to H.J. Lu from comment #9)
> nptl/nptl_setxid.c in glibc has
> 
>   do  
>     {   
>       flags = THREAD_GETMEM (self, cancelhandling);
>       newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
>                                           flags & ~SETXID_BITMASK, flags);
>     }   
>   while (flags != newval);
> 
> GCC 12 generates:
> 
> 	 899f0:	64 8b 14 25 08 03 00 	mov    %fs:0x308,%edx
> 	 899f8:	89 d6                	mov    %edx,%esi
> 	 899fa:	89 d0                	mov    %edx,%eax
> 	 899fc:	83 e6 bf             	and    $0xffffffbf,%esi
> 	 899ff:	f0 0f b1 31          	lock cmpxchg %esi,(%rcx)       
> 	 89a03:	75 eb                	jne    899f0 <__GI___nptl_setxid_sighand
> ler+0x90>

This one is a single bit. This one should be replaced with a LOCK BTC and no
loop.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (9 preceding siblings ...)
  2022-01-25  0:04 ` thiago at kde dot org
@ 2022-02-15  8:59 ` wwwhhhyyy333 at gmail dot com
  2022-02-22  3:36 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-15  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #11 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---


For the case with atomic_compare_exchange_weak_release, it can be expanded as

loop: mov    %eax,%r8d
      and    $0xfff80000,%r8d
      mov    (%r8),%rsi <--- load lock first
      cmp    %rsi,%rax <--- compare with expected input
      jne    .L2 <--- lock ne expected
      lock cmpxchg %r8d,(%rdi)
      mov    %rsi,%rax <--- perform the behavior of failed cmpxchg
      jne    loop

But this is not suitable for atomic_compare_exchange_strong, as the document
said

Unlike atomic_compare_exchange_weak, this strong version is required to always
return true when expected indeed compares equal to the contained object, not
allowing spurious failures. If we expand cmpxchg as above, it would result in
spurious failure since the load is not atomic. 

So for

 do
   pd->nextevent = __nptl_last_event;
 while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
                                              pd, pd->nextevent));

who invokes atomic_compare_exchange_strong we may not simply adjust the
expander. It is better to know the call is in loop condition and relax it
accordingly.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (10 preceding siblings ...)
  2022-02-15  8:59 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22  3:36 ` cvs-commit at gcc dot gnu.org
  2022-02-22  3:38 ` wwwhhhyyy333 at gmail dot com
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-22  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:

https://gcc.gnu.org/g:0435b978f95971e139882549f5a1765c50682216

commit r12-7316-g0435b978f95971e139882549f5a1765c50682216
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Fri Feb 11 14:44:15 2022 +0800

    i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]

    For cmpxchg, it is commonly used in spin loop, and several user code
    such as pthread directly takes cmpxchg as loop condition, which cause
    huge cache bouncing.

    This patch extends previous implementation to relax all cmpxchg
    instruction under -mrelax-cmpxchg-loop with an extra atomic load,
    compare and emulate the failed cmpxchg behavior.

    For original spin loop which looks like

    loop: mov    %eax,%r8d
          or     $1,%r8d
          lock cmpxchg %r8d,(%rdi)
          jne    loop

    It will now truns to

    loop: mov    %eax,%r8d
          or     $1,%r8d
          mov    (%r8),%rsi <--- load lock first
          cmp    %rsi,%rax <--- compare with expected input
          jne    .L2 <--- lock ne expected
          lock cmpxchg %r8d,(%rdi)
          jne    loop
      L2: mov    %rsi,%rax <--- perform the behavior of failed cmpxchg
          jne    loop

    under -mrelax-cmpxchg-loop.

    gcc/ChangeLog:

            PR target/103069
            * config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
            Split atomic fetch and loop part.
            (ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
            * config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
            prototype.
            * config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
            expander under TARGET_RELAX_CMPXCHG_LOOP.
            (atomic_compare_and_swap<mode>): Likewise for doubleword modes.

    gcc/testsuite/ChangeLog:

            PR target/103069
            * gcc.target/i386/pr103069-2.c: Adjust result check.
            * gcc.target/i386/pr103069-3.c: New test.
            * gcc.target/i386/pr103069-4.c: Likewise.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (11 preceding siblings ...)
  2022-02-22  3:36 ` cvs-commit at gcc dot gnu.org
@ 2022-02-22  3:38 ` wwwhhhyyy333 at gmail dot com
  2022-02-22  4:16 ` thiago at kde dot org
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-22  3:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #13 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
All above glibc cases are now both relaxed by an load/cmp to skip cmpxchg under
-mrelax-cmpxchg-loop,

but for

>   do  
>     {   
>       flags = THREAD_GETMEM (self, cancelhandling);
>       newval = THREAD_ATOMIC_CMPXCHG_VAL (self, cancelhandling,
>                                           flags & ~SETXID_BITMASK, flags);
>     }   
>   while (flags != newval);

If we want to optimize it to lock btc, we need to know the cmpxchg lies in a
loop. So it may require an extra pass to do further analysis and optimize,
which is not a good idea to do in stage 4.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (12 preceding siblings ...)
  2022-02-22  3:38 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22  4:16 ` thiago at kde dot org
  2022-02-22  8:21 ` wwwhhhyyy333 at gmail dot com
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22  4:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #14 from Thiago Macieira <thiago at kde dot org> ---
I'd restrict relaxations to loops emitted by the compiler. All other atomic
operations shouldn't be modified at all, unless the user asks for it. That
includes non-looping atomic operations (like LOCK BTC, LOCK XADD) as well as a
pure LOCK CMPXCHG that came from a single __atomic_compare_exchange by the
user.

I'd welcome the ability to relax the latter, especially if with one codebase I
could be efficient in CAS architectures as well as LL/SC ones.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (13 preceding siblings ...)
  2022-02-22  4:16 ` thiago at kde dot org
@ 2022-02-22  8:21 ` wwwhhhyyy333 at gmail dot com
  2022-02-22 18:05 ` thiago at kde dot org
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-22  8:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #15 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Thiago Macieira from comment #14)
> I'd restrict relaxations to loops emitted by the compiler. All other atomic
> operations shouldn't be modified at all, unless the user asks for it. That
> includes non-looping atomic operations (like LOCK BTC, LOCK XADD) as well as
> a pure LOCK CMPXCHG that came from a single __atomic_compare_exchange by the
> user.
> 
> I'd welcome the ability to relax the latter, especially if with one codebase
> I could be efficient in CAS architectures as well as LL/SC ones.

The latest patch relaxed the pure LOCK CMPXCHG with -mrelax-cmpxchg-loop as the
commit message shows. So if you want, I can split this part to another switch
like -mrelax-cmpxchg-insn.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (14 preceding siblings ...)
  2022-02-22  8:21 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-22 18:05 ` thiago at kde dot org
  2022-02-22 18:41 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22 18:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #16 from Thiago Macieira <thiago at kde dot org> ---
Can this option be enabled and disabled with a _Pragma?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (15 preceding siblings ...)
  2022-02-22 18:05 ` thiago at kde dot org
@ 2022-02-22 18:41 ` jakub at gcc dot gnu.org
  2022-02-22 20:25 ` thiago at kde dot org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-02-22 18:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
_Pragma("GCC target \"relax-cmpxchg-loop\"")
should do that (ditto target("relax-cmpxchg-loop") attribute).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (16 preceding siblings ...)
  2022-02-22 18:41 ` jakub at gcc dot gnu.org
@ 2022-02-22 20:25 ` thiago at kde dot org
  2022-02-23  3:35 ` wwwhhhyyy333 at gmail dot com
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-22 20:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #18 from Thiago Macieira <thiago at kde dot org> ---
(In reply to Jakub Jelinek from comment #17)
> _Pragma("GCC target \"relax-cmpxchg-loop\"")
> should do that (ditto target("relax-cmpxchg-loop") attribute).

The attribute is applied to a function. I'm hoping to do it for s block of
code:

 _Pragma("GCC push_options")
 _Pragma("GCC target \"relax-cmpxchg-loop\"")
 __atomic_compare_exchange_weak(....);
 _Pragma("GCC pop_options")

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (17 preceding siblings ...)
  2022-02-22 20:25 ` thiago at kde dot org
@ 2022-02-23  3:35 ` wwwhhhyyy333 at gmail dot com
  2022-02-23  4:06 ` thiago at kde dot org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2022-02-23  3:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #19 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Thiago Macieira from comment #18)
> (In reply to Jakub Jelinek from comment #17)
> > _Pragma("GCC target \"relax-cmpxchg-loop\"")
> > should do that (ditto target("relax-cmpxchg-loop") attribute).
> 
> The attribute is applied to a function. I'm hoping to do it for s block of
> code:
> 
>  _Pragma("GCC push_options")
>  _Pragma("GCC target \"relax-cmpxchg-loop\"")
>  __atomic_compare_exchange_weak(....);
>  _Pragma("GCC pop_options")

I'm not aware of any target __attribute__ or #Pragma can be used to code block,
at this level user can change their code directly, so I don't know why it is
needed..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (18 preceding siblings ...)
  2022-02-23  3:35 ` wwwhhhyyy333 at gmail dot com
@ 2022-02-23  4:06 ` thiago at kde dot org
  2022-04-13  8:18 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: thiago at kde dot org @ 2022-02-23  4:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #20 from Thiago Macieira <thiago at kde dot org> ---
I think there will be cases where the relaxation makes sense and others where
it doesn't because the surrounding code already does it. So I'd like to control
per emission.

If I can't do it per code block, I suppose I could make a lambda block

  [&]() __attribute__((target("relax-cmpxchg-loop"))) { 
    return __atomic_compare_exchange_weak(....);
  }();

Of course, it might be easier to simply relax it myself at that point.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (19 preceding siblings ...)
  2022-02-23  4:06 ` thiago at kde dot org
@ 2022-04-13  8:18 ` cvs-commit at gcc dot gnu.org
  2022-05-06  8:31 ` jakub at gcc dot gnu.org
  2023-05-08 12:23 ` rguenth at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-13  8:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:

https://gcc.gnu.org/g:522f25e90c781d284f8347a04940db8b41c42fd5

commit r12-8136-g522f25e90c781d284f8347a04940db8b41c42fd5
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Wed Apr 13 14:51:36 2022 +0800

    i386: Fix infinite loop under -mrelax-cmpxchg-loop [PR 103069]

    For -mrelax-cmpxchg-loop which relaxes atomic_fetch_<logic> loops,
    there is a missing set to %eax when compare fails, which would result
    in infinite loop in some benchmark. Add set to %eax to avoid it.

    gcc/ChangeLog:

            PR target/103069
            * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop):
              Add missing set to target_val at pause label.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (20 preceding siblings ...)
  2022-04-13  8:18 ` cvs-commit at gcc dot gnu.org
@ 2022-05-06  8:31 ` jakub at gcc dot gnu.org
  2023-05-08 12:23 ` rguenth at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/103069] cmpxchg isn't optimized
  2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
                   ` (21 preceding siblings ...)
  2022-05-06  8:31 ` jakub at gcc dot gnu.org
@ 2023-05-08 12:23 ` rguenth at gcc dot gnu.org
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-05-08 12:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-03 19:08 [Bug target/103069] New: cmpxchg isn't optimized hjl.tools at gmail dot com
2021-11-03 20:53 ` [Bug target/103069] " thiago at kde dot org
2021-11-04 21:25 ` thiago at kde dot org
2021-11-15 11:10 ` cvs-commit at gcc dot gnu.org
2021-11-15 14:26 ` hjl.tools at gmail dot com
2021-11-18  8:31 ` cvs-commit at gcc dot gnu.org
2022-01-24 23:49 ` hjl.tools at gmail dot com
2022-01-24 23:52 ` hjl.tools at gmail dot com
2022-01-24 23:53 ` hjl.tools at gmail dot com
2022-01-24 23:55 ` hjl.tools at gmail dot com
2022-01-25  0:04 ` thiago at kde dot org
2022-02-15  8:59 ` wwwhhhyyy333 at gmail dot com
2022-02-22  3:36 ` cvs-commit at gcc dot gnu.org
2022-02-22  3:38 ` wwwhhhyyy333 at gmail dot com
2022-02-22  4:16 ` thiago at kde dot org
2022-02-22  8:21 ` wwwhhhyyy333 at gmail dot com
2022-02-22 18:05 ` thiago at kde dot org
2022-02-22 18:41 ` jakub at gcc dot gnu.org
2022-02-22 20:25 ` thiago at kde dot org
2022-02-23  3:35 ` wwwhhhyyy333 at gmail dot com
2022-02-23  4:06 ` thiago at kde dot org
2022-04-13  8:18 ` cvs-commit at gcc dot gnu.org
2022-05-06  8:31 ` jakub at gcc dot gnu.org
2023-05-08 12:23 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).