Subject: [Bug target/102135] (ARM Cortex-M3 and newer) changing operation order may reduce number of instructions needed
From: rearnsha at gcc dot gnu.org
Date: Tue, 31 Aug 2021 13:27:21 +0000
Product: gcc, Component: target, Version: 10.2.1, Keywords: missed-optimization, Status: UNCONFIRMED

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102135

--- Comment #1 from Richard Earnshaw ---
A small change to the testcase shows that this is highly dependent on the registers constrained by the calling convention. With the unused dummy argument occupying r0, rData1 is passed in r1, and both registers are also needed for the 64-bit return value (the AAPCS returns a uint64_t in r0 and r1).

    #include <stdint.h>

    uint64_t foo64(int dummy, const uint8_t *rData1)
    {
      uint64_t buffer;
      buffer = (((uint64_t)rData1[7]) << 56) | ((uint64_t)(rData1[6]) << 48)
             | ((uint64_t)(rData1[5]) << 40) | (((uint64_t)rData1[4]) << 32)
             | (((uint64_t)rData1[3]) << 24) | (((uint64_t)rData1[2]) << 16)
             | ((uint64_t)(rData1[1]) << 8) | rData1[0];
      return buffer;
    }

Register allocation does not reorder code to reduce these conflicts, so this is not
easy to fix. This kind of problem is also more visible in micro-testcases such as this one; in real code the register allocator usually has more freedom and can avoid issues like this. If your programming style is to write very small functions like this, you will likely get better code overall by marking them inline, so that they do not incur the call-setup and call/return overhead. That overhead can be significant once you take into account the number of registers that must be saved across a function call.
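The inlining suggestion above can be sketched as follows. This is a minimal illustration, not code from the report: the names load_le64 and sum_two_le64 are hypothetical. The helper performs the same byte assembly as foo64 but is declared static inline, so when optimizing GCC will normally expand it into each caller rather than emitting a call whose pointer argument and 64-bit result are tied to r0/r1 by the calling convention.

```c
#include <stdint.h>

/* Hypothetical helper: the same little-endian byte assembly as foo64
   (rData1[0] ends up in the least significant byte), declared static
   inline so that GCC can expand it at each call site instead of
   emitting a call. */
static inline uint64_t load_le64(const uint8_t *p)
{
    return ((uint64_t)p[7] << 56) | ((uint64_t)p[6] << 48) |
           ((uint64_t)p[5] << 40) | ((uint64_t)p[4] << 32) |
           ((uint64_t)p[3] << 24) | ((uint64_t)p[2] << 16) |
           ((uint64_t)p[1] << 8)  |  (uint64_t)p[0];
}

/* Hypothetical caller. When load_le64 is inlined there is no call
   boundary between the two functions, so the pointer and the 64-bit
   value produced by each load are not pinned to r0/r1; the register
   allocator can place them freely alongside the caller's own code. */
uint64_t sum_two_le64(const uint8_t *a, const uint8_t *b)
{
    return load_le64(a) + load_le64(b);
}
```

Note that inline is only a hint. GCC normally inlines a function this small when optimizing (for example at -O2), but it does not inline any functions when not optimizing unless they are marked __attribute__((always_inline)).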