From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id C022A382FAF6; Thu, 27 Jun 2024 21:06:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C022A382FAF6
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1719522387;
	bh=Snp11SXCdO/x1QKJHAWYPn8bWscSezftcB/UqRHgs8c=;
	h=From:To:Subject:Date:From;
	b=spPI4ZkJAnp6U3ySquJHeIuWo+OhSx5rin0iUHMG0e9QCIEitOYn98BjR6yhpGiAT
	 61A0PE9iXnUaMmFzApeNiDFPTfl4hCAtOZ+wGuFz0n5ZkJi9OnPWqJyeaSrNSyPmvz
	 JW1EBDoCUmTcChU72RZz8fcoHn1FGiLLGTBIz63U=
From: "arcata at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/115690] New: Strange codegen for small fixed-size `memcpy` when targeting `-march=i486`
Date: Thu, 27 Jun 2024 21:06:27 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.1.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: arcata at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115690

            Bug ID: 115690
           Summary: Strange codegen for small fixed-size `memcpy` when
                    targeting `-march=i486`
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arcata at gmail dot com
  Target Milestone: ---

Given the following C code:

```
void *memcpy(void *a, const void *b, unsigned long c);

void foo(unsigned *x, unsigned *y) {
    memcpy(x, y, 16);
}
```

Using gcc 14.1, `gcc -m32 -march=i486 -O2` produces the following assembly:

```
foo:
        push    edi
        push    esi
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, DWORD PTR [esp+16]
        mov     eax, DWORD PTR [esi]
        mov     DWORD PTR [ecx], eax
        mov     eax, DWORD PTR [esi+12]
        mov     DWORD PTR [ecx+12], eax
        lea     edi, [ecx+4]
        and     edi, -4
        sub     ecx, edi
        sub     esi, ecx
        add     ecx, 16
        shr     ecx, 2
        rep movsd
        pop     esi
        pop     edi
        ret
```

While not wrong, this seems suboptimal compared to either using `rep movsd` to
do the entire memcpy or breaking it down into four 32-bit loads and stores.

`-march=i386` does the former:

```
foo:
        push    edi
        push    esi
        mov     esi, DWORD PTR [esp+16]
        mov     ecx, 4
        mov     edi, DWORD PTR [esp+12]
        rep movsd
        pop     esi
        pop     edi
        ret
```

and `-march=i586` does the latter:

```
foo:
        mov     edx, DWORD PTR [esp+8]
        mov     eax, DWORD PTR [esp+4]
        mov     ecx, DWORD PTR [edx]
        mov     DWORD PTR [eax], ecx
        mov     ecx, DWORD PTR [edx+4]
        mov     DWORD PTR [eax+4], ecx
        mov     ecx, DWORD PTR [edx+8]
        mov     DWORD PTR [eax+8], ecx
        mov     edx, DWORD PTR [edx+12]
        mov     DWORD PTR [eax+12], edx
        ret
```

either of which seems like it would better suit the i486 microarchitecture than
the hybrid approach it seems to be taking.
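
For reference, the `-march=i486` sequence above can be modelled in C roughly as follows. This is only my reading of the emitted assembly, not anything taken from the GCC sources; the names `copy16_i486_model` and `check_copy16_model` are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the i486 output: copy the first and last
   dwords explicitly, then copy the middle starting from the next
   4-byte-aligned destination address (the part done by rep movsd). */
static void copy16_i486_model(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 4);               /* mov eax,[esi]    / mov [ecx],eax    */
    memcpy(dst + 12, src + 12, 4);     /* mov eax,[esi+12] / mov [ecx+12],eax */

    uintptr_t d = (uintptr_t)dst;
    uintptr_t aligned = (d + 4) & ~(uintptr_t)3;  /* lea edi,[ecx+4] / and edi,-4 */
    size_t skip = (size_t)(aligned - d);          /* 1..4 bytes skipped at the front */
    size_t dwords = (16 - skip) >> 2;             /* add ecx,16 / shr ecx,2 */

    memcpy(dst + skip, src + skip, dwords * 4);   /* rep movsd */
}

/* Checks the model against a plain 16-byte copy for every destination
   alignment modulo 4. Returns the number of alignments checked. */
static int check_copy16_model(void)
{
    uint32_t backing[8];                          /* 4-byte-aligned storage */
    unsigned char *buf = (unsigned char *)backing;
    unsigned char src[16];
    int checked = 0;

    for (int i = 0; i < 16; i++)
        src[i] = (unsigned char)(i + 1);

    for (size_t off = 0; off < 4; off++) {
        memset(buf, 0, sizeof backing);
        copy16_i486_model(buf + off, src);
        assert(memcmp(buf + off, src, 16) == 0);  /* all 16 bytes copied */
        for (size_t i = 0; i < off; i++)
            assert(buf[i] == 0);                  /* nothing written before dst */
        for (size_t i = off + 16; i < sizeof backing; i++)
            assert(buf[i] == 0);                  /* nothing written past dst+16 */
        checked++;
    }
    return checked;
}
```

In this model `skip` is always between 1 and 4, so `dwords` is always 3: the sequence moves 20 bytes (two explicit dwords plus three via `rep movsd`) and pays for the alignment arithmetic in order to copy 16.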