From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id D36AA3858C3A; Wed, 3 Nov 2021 20:53:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D36AA3858C3A
From: "thiago at kde dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103069] cmpxchg isn't optimized
Date: Wed, 03 Nov 2021 20:53:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: thiago at kde dot org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Wed, 03 Nov 2021 20:53:13 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #1 from Thiago Macieira ---

(the assembly doesn't match the source code, but we got your point)

Another possible improvement for the __atomic_fetch_{and,nand,or} functions
is that they can check whether the fetched value is already correct and branch
out. In your example, the __atomic_fetch_or with 0x40000000 can check whether
that bit is already set and, if so, not execute the CMPXCHG at all.

This is a valid solution on x86 for memory orderings up to acq_rel. Other
architectures may still need barriers.
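The check-before-RMW idea could be sketched in C11 atomics roughly like this
(a minimal illustration only; the function name and the "return the previous
bit state" convention are my own, not anything GCC emits):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the proposed optimization for __atomic_fetch_or with a
 * single-bit mask: load first and, if the bit is already set, skip the
 * locked RMW entirely.  On x86 this is valid for orderings up to acq_rel.
 * Returns the previous state of the bit (like BTS would). */
bool set_flag_checked(atomic_uint *p)
{
    unsigned cur = atomic_load_explicit(p, memory_order_acquire);
    if (cur & 0x40000000u)
        return true;                    /* already set: no CMPXCHG/LOCK OR */
    cur = atomic_fetch_or_explicit(p, 0x40000000u, memory_order_acq_rel);
    return (cur & 0x40000000u) != 0;    /* previous state of the bit */
}
```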
For seq_cst, we either need a barrier or we need to execute the CMPXCHG at
least once. Therefore, the emitted code might want to optimistically execute
the operation once and, if it fails, enter the load loop. That's a slightly
longer codegen. Whether we want that under -Os or not, you'll have to be the
judge.

Prior art: glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S:

ENTRY(__pthread_spin_lock)
1:	LOCK
	decl	0(%rdi)
	jne	2f
	xor	%eax, %eax
	ret

	.align	16
2:	rep
	nop
	cmpl	$0, 0(%rdi)
	jg	1b
	jmp	2b
END(__pthread_spin_lock)

This does the atomic operation once, hoping it'll succeed. If it fails, it
enters the PAUSE+CMP+JG loop until the value is suitable.
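The same optimistic-first-attempt pattern in C11 atomics might look like the
following (my own sketch, not the glibc code; glibc issues a PAUSE, i.e.
"rep nop", inside the wait loop, which I've reduced to an empty spin here to
stay portable):

```c
#include <stdatomic.h>

/* Sketch of the glibc pthread_spin_lock pattern: try the atomic RMW once
 * optimistically; on contention, wait on plain (relaxed) loads until the
 * lock looks free, then retry the RMW.  Lock value 1 means free, as in
 * glibc; <= 0 means held. */
void spin_lock_sketch(atomic_int *lock)
{
    for (;;) {
        /* the "LOCK decl" attempt: old value 1 means we took the lock */
        if (atomic_fetch_sub_explicit(lock, 1, memory_order_acquire) == 1)
            return;
        /* the "cmpl/jg" wait loop: glibc executes PAUSE here */
        while (atomic_load_explicit(lock, memory_order_relaxed) <= 0)
            ;
    }
}
```

The point of the load loop is that contended waiters spin on a read-only
cache line instead of hammering it with locked RMWs, which is exactly the
shape the comment suggests for the seq_cst __atomic_fetch_or codegen.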