From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 652BB38449F7; Wed, 22 Nov 2023 19:07:15 +0000 (GMT)
From: "bugdal at aerifal dot cx" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/32667] block copy with exact overlap is expanded as
 memcpy
Date: Wed, 22 Nov 2023 19:07:14 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 4.2.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: bugdal at aerifal dot cx
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-32667-4-i4ZwKuV1HW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-32667-4@http.gcc.gnu.org/bugzilla/>
References: <bug-32667-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667
--- Comment #28 from Rich Felker <bugdal at aerifal dot cx> ---
> No, that is not a reasonable fix, because it severely pessimizes common code for a theoretical only problem.

Far less than a call to memmove would pessimize it (memmove necessarily
contains something comparable to that check, plus other unnecessary branches).

I also disagree that it's severe. On virtually any machine with branch
prediction, the branch will be predicted correctly essentially every time and
costs almost nothing. By contrast, the branches in memmove can go different
ways depending on the caller, so whether they are predicted well depends much
more on the machine's capabilities.
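For illustration, here is a minimal sketch of the two options being compared, assuming the proposed fix amounts to guarding the memcpy call with a pointer-equality test. The helper names `guarded_assign` and `memmove_assign` and the `struct big` type are hypothetical and do not correspond to anything GCC actually emits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: what a struct assignment *dst = *src could lower to if the
   compiler guarded the memcpy call. Exact overlap (dst == src) is a no-op,
   so the call is skipped; in practice this branch goes the same way on
   virtually every execution. */
static void guarded_assign(void *dst, const void *src, size_t n)
{
    if (dst != src)
        memcpy(dst, src, n);
}

/* Hypothetical alternative: memmove is defined for any overlap, but it must
   itself compare the pointers to choose a copy direction. */
static void memmove_assign(void *dst, const void *src, size_t n)
{
    memmove(dst, src, n);
}

/* Hypothetical object large enough that a library call is reasonable. */
struct big { char buf[256]; };
```

In both cases a self-assignment is well defined; the difference the comment argues about is where the extra comparison lives and how predictable it is.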

In some sense the optimal thing to do is "nothing", on the assumption that it
would be hard to write a memcpy that actually fails on src==dest. However, at
the very least this precludes a hardened memcpy that traps on src==dest, which
might be a useful hardening feature (or rather, one that traps on a range test
for overlap, which would also happen to catch exact overlap). So it would be
nice if this were fixed.
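A rough sketch of the kind of hardened memcpy meant here, under stated assumptions: `ranges_overlap` and `hardened_memcpy` are hypothetical names, `abort()` stands in for whatever trap a real hardened libc would use, and the address arithmetic through `uintptr_t` assumes a typical flat-address-space platform (relational comparison of pointers to different objects is undefined in C).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: nonzero if [a, a+n) and [b, b+n) intersect. Exact overlap
   (a == b, n > 0) is caught as a special case of the range test. */
static int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    if (n == 0)
        return 0;
    return x < y ? y - x < n : x - y < n;
}

/* Hypothetical hardened memcpy: traps instead of silently proceeding when
   the caller violates memcpy's no-overlap requirement. */
static void *hardened_memcpy(void *dst, const void *src, size_t n)
{
    if (ranges_overlap(dst, src, n))
        abort();
    return memcpy(dst, src, n);
}
```

If a libc's memcpy behaved like `hardened_memcpy`, a struct self-assignment that the compiler expands as `memcpy(p, p, sizeof *p)` would abort even though the C source is valid, which is why the expansion matters.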

FWIW, I don't think a single branch is relevant to overall performance in the
cases where the compiler is doing something reasonable by emitting a call to
memcpy to implement assignment. If the object is small enough for the branch
to matter, the call overhead matters even more, and the compiler should be
inlining loads/stores to perform the assignment instead.