From: "bettio.davide at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/111933] New: memcpy on Xtensa not optimized when n == sizeof(uint32_t) or sizeof(uint64_t)
Date: Mon, 23 Oct 2023 11:16:12 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111933

            Bug ID: 111933
           Summary: memcpy on Xtensa not optimized when n == sizeof(uint32_t) or sizeof(uint64_t)
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bettio.davide at gmail dot com
  Target Milestone: ---

This issue is about what I think is a missing optimization in the ESP32 Xtensa GCC
compiler.

I tested the same issue on versions between gcc 8.4.0 and 11.2.0 with Xtensa
ESP32/ESP32-S2/ESP32-S3 GCC.

I'm writing some functions for unaligned memory access and I've been checking
them with Compiler Explorer (https://godbolt.org/), and I'm getting some (I
think) sub-optimal outputs.

As far as I understand, on ESP32 Xtensa a single unaligned 32-bit memory access
is faster than four 8-bit accesses. However, I'm getting the following results
(using -O2) with the following snippets of code.

Function that calls the inline from_unaligned_u32:

bool test2(uint32_t *in)
{
    uint32_t got = from_unaligned_u32(in);
    if (got > 5) {
        return true;
    }
    return false;
}

A:

uint32_t from_unaligned_u32(uint32_t *unaligned)
{
    uint32_t tmp;
    tmp = *unaligned;
    return tmp;
}

generates:

test2(unsigned int*):
        entry   sp, 32
        l32i.n  a8, a2, 0
        movi.n  a2, 1
        bgeui   a8, 6, .L2
        movi.n  a2, 0
.L2:
        extui   a2, a2, 0, 1
        retw.n

B:

inline uint32_t from_unaligned_u32(uint32_t *unaligned)
{
    uint32_t tmp;
    memcpy(&tmp, unaligned, sizeof(tmp));
    return tmp;
}

generates:

test2(unsigned int*):
        entry   sp, 48
        l8ui    a8, a2, 2
        l8ui    a10, a2, 0
        l8ui    a9, a2, 1
        l8ui    a2, a2, 3
        s8i     a10, sp, 0
        s8i     a2, sp, 3
        s8i     a9, sp, 1
        s8i     a8, sp, 2
        l32i.n  a8, sp, 0
        movi.n  a2, 1
        bgeui   a8, 6, .L2
        movi.n  a2, 0
.L2:
        extui   a2, a2, 0, 1
        retw.n

My assumption here is that an unaligned access on Xtensa ESP32 is faster than
calling memcpy or doing multiple 1-byte loads (please let me know if I am
wrong), so from my point of view this is a missing optimization.

I would expect both A and B to generate the same assembly code, as they do on
other archs.

Also, interestingly, the uint64_t "B" version (which is similar to the previous
one) generates a call to memcpy instead of some inline code.
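
For anyone who wants to reproduce this, case B and the uint64_t variant could be combined into one self-contained file roughly as sketched below. The report only says the uint64_t version is "similar", so the name from_unaligned_u64, its body, and the caller test3 are assumptions made for illustration. `static` has also been added to the helpers so that the file links as plain C even when the calls are not inlined; that is not in the original snippets.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Case B from the report: read a uint32_t through memcpy. */
static inline uint32_t from_unaligned_u32(uint32_t *unaligned)
{
    uint32_t tmp;
    memcpy(&tmp, unaligned, sizeof(tmp));
    return tmp;
}

/* Assumed uint64_t counterpart (hypothetical name and body; the
 * report does not show this function, only says it is similar). */
static inline uint64_t from_unaligned_u64(uint64_t *unaligned)
{
    uint64_t tmp;
    memcpy(&tmp, unaligned, sizeof(tmp));
    return tmp;
}

/* Caller from the report. */
bool test2(uint32_t *in)
{
    uint32_t got = from_unaligned_u32(in);
    if (got > 5) {
        return true;
    }
    return false;
}

/* Hypothetical 64-bit caller, mirroring test2. */
bool test3(uint64_t *in)
{
    uint64_t got = from_unaligned_u64(in);
    if (got > 5) {
        return true;
    }
    return false;
}
```

Compiling this with -O2 on one of the Xtensa ESP32 compilers in Compiler Explorer and inspecting test2 and test3 should show whether the byte-wise copy and the memcpy call described above are still emitted.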