From: "thiago at kde dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110184] New: [i386] Missed optimisation: atomic
         operations should use PF, ZF and SF
Date: Thu, 08 Jun 2023 22:55:32 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184

            Bug ID: 110184
           Summary: [i386] Missed optimisation: atomic operations should
                    use PF, ZF and SF
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

Follow-up from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566

The x86 locked ALU operations always set PF, ZF and
SF, so the atomic builtins could use those to emit more optimal code
instead of a cmpxchg loop.

Given:

    template <auto Op> int atomic_rmw_op(std::atomic_int &i)
    {
        int old = Op(i);
        if (old == 0)
            return 1;
        if (old < 0)
            return 2;
        return 0;
    }

-------

Starting with the non-standard __atomic_OP_fetch, the current code for

    inline int andn_fetch_1(std::atomic_int &i)
    {
        return __atomic_and_fetch((int *)&i, ~1, 0);
    }

is

    .L33:
        movl    %eax, %edx
        andl    $-2, %edx
        lock cmpxchgl %edx, (%rdi)
        jne     .L33
        movl    %edx, %eax
        shrl    $31, %eax
        addl    %eax, %eax        // eax = 2 if edx < 0
        testl   %edx, %edx
        movl    $1, %edx
        cmove   %edx, %eax

But it could be more optimally written as:

        movl    $1, %ecx
        movl    $2, %edx
        xorl    %eax, %eax
        lock andl $-2, (%rdi)
        cmove   %ecx, %eax
        cmovs   %edx, %eax

The other __atomic_OP_fetch operations are very similar. I note that GCC
already realises that if you perform __atomic_and_fetch(ptr, 1), the
result can't have the sign bit set.

-------

For the standard atomic_fetch_OP operations, there are a couple of
caveats:

fetch_and: only if the retrieved value is ANDed again with the same
pattern; for example:

    int pattern = 0x80000001;
    return i.fetch_and(pattern, std::memory_order_relaxed) & pattern;

This appears to be partially implemented, depending on what the pattern
is. For example, it generates the optimal code for pattern = 3, 15,
0x7fffffff, 0x80000000. It appears to be related to testing for either
SF or ZF, but not both.

fetch_or: always for SF, in the useful case where the pattern being ORed
doesn't already contain the sign bit. If it does (a "non-useful case"),
the comparison is a constant; likewise for ZF, because ZF is never set
if the pattern isn't zero.

fetch_xor: always, because the original value is reconstructible. Avoid
generating unnecessary code in case the caller already does the XOR
itself, as in:

    return i.fetch_xor(1, std::memory_order_relaxed) ^ 1;

See https://gcc.godbolt.org/z/n9bMnaE4e for full results.