From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id D36AA3858C3A; Wed, 3 Nov 2021 20:53:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D36AA3858C3A
From: "thiago at kde dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/103069] cmpxchg isn't optimized
Date: Wed, 03 Nov 2021 20:53:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: thiago at kde dot org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Wed, 03 Nov 2021 20:53:13 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103069

--- Comment #1 from Thiago Macieira ---

(the assembly doesn't match the source code, but we got your point)

Another possible improvement for the __atomic_fetch_{and,nand,or} functions
is that they can check whether the fetched value is already correct and branch
out. In your example, the __atomic_fetch_or with 0x40000000 can check whether
that bit is already set and, if so, not execute the CMPXCHG at all.

This is a valid solution on x86 for memory orderings up to acq_rel. Other
architectures may still need barriers.
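The check-before-RMW idea could be sketched in C11 atomics roughly like this
(a minimal illustration only; the function name and the "return the previous
bit state" convention are my own, not anything GCC emits):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the proposed optimization for __atomic_fetch_or with a
 * single-bit mask: load first and, if the bit is already set, skip the
 * locked RMW entirely.  On x86 this is valid for orderings up to acq_rel.
 * Returns the previous state of the bit (like BTS would). */
bool set_flag_checked(atomic_uint *p)
{
    unsigned cur = atomic_load_explicit(p, memory_order_acquire);
    if (cur & 0x40000000u)
        return true;                    /* already set: no CMPXCHG/LOCK OR */
    cur = atomic_fetch_or_explicit(p, 0x40000000u, memory_order_acq_rel);
    return (cur & 0x40000000u) != 0;    /* previous state of the bit */
}
```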
For seq_cst, we either need a barrier or we need to execute the CMPXCHG at
least once. Therefore, the emitted code might want to optimistically execute
the operation once and, if it fails, enter the load loop. That's a slightly
longer codegen. Whether we want that under -Os or not, you'll have to be the
judge.

Prior art: glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S:

ENTRY(__pthread_spin_lock)
1:	LOCK
	decl	0(%rdi)
	jne	2f
	xor	%eax, %eax
	ret

	.align	16
2:	rep
	nop
	cmpl	$0, 0(%rdi)
	jg	1b
	jmp	2b
END(__pthread_spin_lock)

This does the atomic operation once, hoping it'll succeed. If it fails, it
enters the PAUSE+CMP+JG loop until the value is suitable.
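The same optimistic-first-attempt pattern in C11 atomics might look like the
following (my own sketch, not the glibc code; glibc issues a PAUSE, i.e.
"rep nop", inside the wait loop, which I've reduced to an empty spin here to
stay portable):

```c
#include <stdatomic.h>

/* Sketch of the glibc pthread_spin_lock pattern: try the atomic RMW once
 * optimistically; on contention, wait on plain (relaxed) loads until the
 * lock looks free, then retry the RMW.  Lock value 1 means free, as in
 * glibc; <= 0 means held. */
void spin_lock_sketch(atomic_int *lock)
{
    for (;;) {
        /* the "LOCK decl" attempt: old value 1 means we took the lock */
        if (atomic_fetch_sub_explicit(lock, 1, memory_order_acquire) == 1)
            return;
        /* the "cmpl/jg" wait loop: glibc executes PAUSE here */
        while (atomic_load_explicit(lock, memory_order_relaxed) <= 0)
            ;
    }
}
```

The point of the load loop is that contended waiters spin on a read-only
cache line instead of hammering it with locked RMWs, which is exactly the
shape the comment suggests for the seq_cst __atomic_fetch_or codegen.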