From: "tkoeppe at google dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/66881] New: Possibly inefficient std::atomic codegen on x86 for simple arithmetic
Date: Wed, 15 Jul 2015 16:05:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66881

            Bug ID: 66881
           Summary: Possibly inefficient std::atomic codegen on x86 for
                    simple arithmetic
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoeppe at google dot com
  Target Milestone: ---

Consider these two simple versions of addition:

    #include <atomic>

    std::atomic<int> x;
    int y;

    void f(int a) {
        x.store(x.load(std::memory_order_relaxed) + a,
                std::memory_order_relaxed);
    }

    void g(int a) {
        y += a;
    }

GCC generates the following assembly:

    f(int):
            mov     eax, DWORD PTR x[rip]
            add     edi, eax
            mov     DWORD PTR x[rip], edi
            ret
    g(int):
            add     DWORD PTR y[rip], edi
            ret

Now, it is clear to me that the correct atomic codegen for store() and
load() is "mov", as it appears here, but why aren't the two consecutive
operations folded into a single add? Aren't the semantics and the memory
ordering the same? x86 says that (most) "reads" and "writes" are strongly
ordered; doesn't that apply to the read and write produced by "add", too?

(My original motivation came from a variant of this with floats, where the
non-atomic code executed noticeably faster, even though I would have
expected the two to produce the same machine code.)
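For comparison, here is a minimal sketch that puts the two functions from the report next to a third variant. The function h() and the helper check_single_threaded() are hypothetical additions for illustration and are not part of the original report. h() uses fetch_add, which is a single atomic read-modify-write; on x86, compilers typically implement it with a lock-prefixed instruction, whereas f() is only a relaxed load followed by a separate relaxed store, so an update made by another thread between the two can be lost.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};
int y = 0;

// From the report: separate relaxed load and relaxed store.
// Each access is atomic on its own, but the pair is not an atomic
// read-modify-write.
void f(int a) {
    x.store(x.load(std::memory_order_relaxed) + a,
            std::memory_order_relaxed);
}

// From the report: plain, non-atomic addition.
void g(int a) {
    y += a;
}

// Hypothetical variant (not in the report): one atomic read-modify-write.
void h(int a) {
    x.fetch_add(a, std::memory_order_relaxed);
}

// Hypothetical helper: without concurrency, all three variants simply add.
void check_single_threaded() {
    x.store(0, std::memory_order_relaxed);
    y = 0;
    f(3);
    assert(x.load(std::memory_order_relaxed) == 3);
    g(4);
    assert(y == 4);
    h(5);
    assert(x.load(std::memory_order_relaxed) == 8);
}
```

Compiling this with something like g++ -O2 -S and comparing the bodies of f, g and h is one way to see whether a given GCC version folds the load/store pair in f; the output depends on the compiler version and flags.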