From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 19D083858D35; Mon, 30 Aug 2021 11:40:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 19D083858D35
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
Date: Mon, 30 Aug 2021 11:40:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: component cf_reconfirmed_on cf_gcctarget keywords
 everconfirmed bug_status
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 30 Aug 2021 11:40:46 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
   Last reconfirmed|                            |2021-08-30
             Target|                            |arm
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener ---
One common source of missed optimizations is gimple_fold_builtin_memory_op,
which has

          /* If we can perform the copy efficiently with first doing all loads
             and then all stores inline it that way.  Currently efficiently
             means that we can load all the memory into a single integer
             register which is what MOVE_MAX gives us.  */
          src_align = get_pointer_alignment (src);
          dest_align = get_pointer_alignment (dest);
          if (tree_fits_uhwi_p (len)
              && compare_tree_int (len, MOVE_MAX) <= 0
...
              /* If the destination pointer is not aligned we must be able
                 to emit an unaligned store.  */
              && (dest_align >= GET_MODE_ALIGNMENT (mode)
                  || !targetm.slow_unaligned_access (mode, dest_align)
                  || (optab_handler (movmisalign_optab, mode)
                      != CODE_FOR_nothing)))

where here the MOVE_MAX limit (which is 4) is likely what applies.  Since we
actually do need to perform two loads, the code seems to do what is intended
(but that's of course "bad" for 64bit copies on 32bit archs and likewise for
128bit copies on 64bit archs).

It's usually too late for RTL memcpy expansion to fully elide stack storage.
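
[Editor's note: the following is not part of the original report. It is a
minimal C sketch of the MOVE_MAX limit described above; the function names
load32/load64 are hypothetical. On a 32-bit arm target where MOVE_MAX is 4,
the 4-byte copy can be folded at the GIMPLE level into a single load/store,
while the 8-byte copy would need two loads and so is left to later (RTL)
expansion, possibly via a stack temporary.]

    #include <stdint.h>
    #include <string.h>

    /* Fits in MOVE_MAX (4 bytes): gimple_fold_builtin_memory_op can fold
       this memcpy into a single unaligned load on arm.  */
    uint32_t load32 (const unsigned char *p)
    {
      uint32_t v;
      memcpy (&v, p, sizeof v);
      return v;
    }

    /* Exceeds MOVE_MAX on a 32-bit target: two loads would be needed, so
       the copy is not folded early and reaches RTL expansion as a memcpy.  */
    uint64_t load64 (const unsigned char *p)
    {
      uint64_t v;
      memcpy (&v, p, sizeof v);
      return v;
    }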