From: "msebor at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/95886] New: suboptimal memcpy with embedded zero bytes
Date: Wed, 24 Jun 2020 21:14:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95886

            Bug ID: 95886
           Summary: suboptimal memcpy with embedded zero bytes
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

While testing the fix for pr95189 I noticed that the memcpy expansion into
copy-by-pieces is less than optimal for sequences containing embedded null
bytes.  For example, in the test case below, the memcpy call in f() is expanded
into what looks like a more efficient sequence than the equivalent memcpy call
in g().  The only difference between the two is that the former copies a
sequence of non-zero bytes while among the bytes copied by the latter is a null
byte.  Clang emits the same code for g() as GCC does for f().

$ cat z.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout -o/dev/stdout z.c
const char a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
const char b[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

void f (void *d)
{
  __builtin_memcpy (d, a, 9);   // optimal
}

void g (void *d)
{
  __builtin_memcpy (d, b, 9);   // suboptimal
}
        .file   "z.c"
        .text

;; Function f (f, funcdef_no=0, decl_uid=1932, cgraph_uid=1, symbol_order=2)

f (void * d)
{
  [local count: 1073741824]:
  __builtin_memcpy (d_2(D), &a, 9); [tail call]
  return;

}


        .p2align 4
        .globl  f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        movabsq $578437695752307201, %rax
        movb    $9, 8(%rdi)
        movq    %rax, (%rdi)
        ret
        .cfi_endproc
.LFE0:
        .size   f, .-f

;; Function g (g, funcdef_no=1, decl_uid=1935, cgraph_uid=2, symbol_order=3)

g (void * d)
{
  [local count: 1073741824]:
  __builtin_memcpy (d_2(D), &b, 9); [tail call]
  return;

}


        .p2align 4
        .globl  g
        .type   g, @function
g:
.LFB1:
        .cfi_startproc
        movq    b(%rip), %rax
        movq    %rax, (%rdi)
        movzbl  b+8(%rip), %eax
        movb    %al, 8(%rdi)
        ret
        .cfi_endproc
.LFE1:
        .size   g, .-g
        .globl  b
        .section        .rodata
        .align 8
        .type   b, @object
        .size   b, 10
b:
        .string ""
        .string "\001\002\003\004\005\006\007\b"
        .globl  a
        .align 8
        .type   a, @object
        .size   a, 10
a:
        .string "\001\002\003\004\005\006\007\b\t"
        .ident  "GCC: (GNU) 10.1.1 20200527"
        .section        .note.GNU-stack,"",@progbits
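
To make the comparison concrete, the immediate in the movabsq emitted for f() is just the first eight bytes of a packed least-significant byte first, with the ninth byte stored separately by the movb.  The sketch below recomputes that immediate and derives the analogous one for b.  The helper names pack_le64 and check_immediates are hypothetical and used only for illustration; the value computed for b is what a by-pieces expansion of g() could plausibly use, not the output of any compiler run.

```c
#include <assert.h>
#include <stdint.h>

/* The two arrays from the test case. */
const char a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
const char b[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

/* Hypothetical helper: pack the first eight bytes of P into a 64-bit
   value, least significant byte first.  This mirrors the byte order of
   an 8-byte store on x86-64 but does not depend on the host's
   endianness.  */
uint64_t pack_le64 (const char *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | (unsigned char) p[i];
  return v;
}

/* Hypothetical self-check: confirm the immediate seen in f() and
   compute the corresponding one for g().  */
void check_immediates (void)
{
  /* Matches "movabsq $578437695752307201, %rax" in f().  */
  assert (pack_le64 (a) == UINT64_C (578437695752307201));
  assert (pack_le64 (a) == UINT64_C (0x0807060504030201));

  /* The same packing applied to b; the low byte is the embedded zero.  */
  assert (pack_le64 (b) == UINT64_C (0x0706050403020100));

  /* The trailing ninth bytes, stored by the movb in f().  */
  assert (a[8] == 9);
  assert (b[8] == 8);
}
```

Calling check_immediates() from a small driver should pass silently.  The point of the exercise is that the packed value for b is an ordinary 64-bit constant, so nothing about the zero byte prevents g() from being expanded into the same movabsq/movb/movq shape as f().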