From: "bugdal at aerifal dot cx"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/32667] block copy with exact overlap is expanded as memcpy
Date: Wed, 22 Nov 2023 19:07:14 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

--- Comment #28 from Rich Felker ---
> No, that is not a reasonable fix, because it severely pessimizes common code
> for a theoretical only problem.

Far less than a call to memmove (which necessarily has something comparable to that and other unnecessary branches) pessimizes it. I also disagree that it's severe. On basically any machine with branch prediction, the branch will be predicted correctly all the time and has basically zero cost.
On the other hand, the branches in memmove could go different ways depending on the caller, so it's much more machine-capability-dependent whether they can be predicted.

In some sense the optimal thing to do is "nothing", just assuming it would be hard to write a memcpy that fails on src==dest. However, at the very least this precludes hardened memcpy trapping on src==dest, which might be a useful hardening feature (or rather on a range test for overlapping, which would happen to also catch exact overlap). So it would be nice if it were fixed.

FWIW, I don't think single branches are relevant to overall performance in cases where the compiler is doing something reasonable by emitting a call to memcpy to implement assignment. If the object is small enough that the branch is relevant, the call overhead is even more of a big deal, and it should be inlining loads/stores to perform the assignment.
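
For concreteness, here is a rough C sketch of the two things discussed above: a single pointer-equality branch in front of the memcpy call for an assignment, and a range test that a hardened memcpy might use to trap on overlap. All names here (struct big, assign_guarded, ranges_overlap, memcpy_checked) are made up for illustration. This is not what GCC emits and not any real libc's implementation, and comparing pointers via uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object large enough that a compiler would plausibly
   call memcpy rather than inline loads/stores for assignment. */
struct big { char bytes[256]; };

/* Sketch of "*dst = *src" with one extra branch: skip the call when
   both pointers name the same object, so memcpy never sees src==dest. */
static void assign_guarded(struct big *dst, const struct big *src)
{
    if (dst != src)
        memcpy(dst, src, sizeof *dst);
}

/* Hypothetical range test: nonzero if [d, d+n) and [s, s+n) overlap.
   Exact overlap (d == s with n > 0) is caught as a special case.
   Assumes a flat address space for the uintptr_t comparison. */
static int ranges_overlap(const void *d, const void *s, size_t n)
{
    uintptr_t a = (uintptr_t)d, b = (uintptr_t)s;
    if (n == 0)
        return 0;
    return a < b ? b - a < n : a - b < n;
}

/* Sketch of a hardened memcpy wrapper: abort (via assert) on any overlap
   before forwarding to the real memcpy. */
static void *memcpy_checked(void *d, const void *s, size_t n)
{
    assert(!ranges_overlap(d, s, n));
    return memcpy(d, s, n);
}
```

With something like memcpy_checked in place, an unguarded self-assignment expanded as memcpy(p, p, sizeof *p) would trap, while the assign_guarded form never reaches the call in that case.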