From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/86723] not optimizing with bswap that casts to int afterwards
Date: Mon, 23 Aug 2021 09:54:11 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86723

--- Comment #7 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b320edc0c29c838b0090c3c9be14187d132f73f2

commit r12-3072-gb320edc0c29c838b0090c3c9be14187d132f73f2
Author: Jakub Jelinek
Date:   Mon Aug 23 11:52:06 2021 +0200

bswap: Recognize (int) __builtin_bswap64 (arg) idioms or
__builtin_bswap?? (arg) & mask [PR86723]

The following patch recognizes in the bswap pass (only there for now,
I haven't done it for the store merging pass yet) code sequences that
can be handled by (int32) __builtin_bswap64 (arg), i.e. where we have
0x05060708 as n->n with a 64-bit non-memory argument (if it is memory,
we can just load the 32-bit value 4 bytes into the address and n->n
would be 0x01020304; and only 64 -> 32 bit, because 64 -> 16 bit or
32 -> 16 bit would mean only two bytes in the result and is probably
not worth it), and furthermore the case where, in the
0x0102030405060708 etc. numbers, some bytes are 0 (i.e. known to
contain zeros rather than source bytes), as long as at least two
original bytes are in the right positions (and there are no unknown
bytes).  This can be handled by
__builtin_bswap64 (arg) & 0xff0000ffffff00ffULL etc.

The latter change is the reason why counting the bswap messages doesn't
work too well in the optimize-bswap* tests anymore: as the pass
iterates from the end of a basic block towards its start, it will often
match both the bswap at the end and some of the earlier bswaps with
some masks (not a problem generally, we'll just DCE them away whenever
possible).
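As a concrete illustration of the two newly recognized shapes, here is
a hand-written sketch in plain C (the function names and the particular
byte selections are invented for this note; it is not the new
gcc.dg/pr86723.c test):

/* 64 -> 32 bit case: the four result bytes are source bytes 8, 7, 6
   and 5, so this is equivalent to
   (unsigned int) __builtin_bswap64 (value), i.e. a byte swap whose
   result is truncated afterwards.  */
unsigned int
swap_high_half (unsigned long long value)
{
  return (((value & 0xff00000000000000ull) >> 56)
          | ((value & 0x00ff000000000000ull) >> 40)
          | ((value & 0x0000ff0000000000ull) >> 24)
          | ((value & 0x000000ff00000000ull) >> 8));
}

/* Masked case: only source bytes 8, 4 and 2 survive, each in the
   position a full byte swap would put it, and the other bytes are
   known zeros, so this is equivalent to
   __builtin_bswap64 (value) & 0x00ff00ff000000ffull.  */
unsigned long long
swap_some_bytes (unsigned long long value)
{
  return (((value & 0xff00000000000000ull) >> 56)
          | ((value & 0x00000000ff000000ull) << 8)
          | ((value & 0x000000000000ff00ull) << 40));
}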
The pass right now doesn't handle __builtin_bswap* calls in the pattern
matching (which is the reason why it operates backwards), but it uses
FOR_EACH_BB_FN (bb, fun) order of handling blocks and matched sequences
can span multiple blocks, so I was worried about cases like:

void bar (unsigned long long);

unsigned long long
foo (unsigned long long value, int x)
{
  unsigned long long tmp = (((value & 0x00000000000000ffull) << 56)
                            | ((value & 0x000000000000ff00ull) << 40)
                            | ((value & 0x00000000ff000000ull) << 8));
  if (x)
    bar (tmp);
  return (tmp
          | ((value & 0x000000ff00000000ull) >> 8)
          | ((value & 0x0000ff0000000000ull) >> 24)
          | ((value & 0x0000000000ff0000ull) << 24)
          | ((value & 0x00ff000000000000ull) >> 40)
          | ((value & 0xff00000000000000ull) >> 56));
}

but it seems we handle even that fine: bb2, which ends in the
GIMPLE_COND, is processed first and we recognize there a
__builtin_bswap64 (value) & mask1; in the last bb we recognize
tmp | (__builtin_bswap64 (value) & mask2), and PRE then optimizes that
into t = __builtin_bswap64 (value); tmp = t & mask1; in the first bb
and return t; in the last one (sketched below, after the ChangeLog).

2021-08-23  Jakub Jelinek

	PR tree-optimization/86723
	* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Add
	cast64_to_32 argument, set *cast64_to_32 to false, unless n is
	a non-memory permutation of a 64-bit src which only has bytes of
	0 or [5..8] and n->range is 4.
	(find_bswap_or_nop): Add cast64_to_32 and mask arguments, adjust
	find_bswap_or_nop_finalize caller, support bswap with some bytes
	zeroed, as long as at least two bytes are not zeroed.
	(bswap_replace): Add mask argument and handle masking of the
	bswap result.
	(maybe_optimize_vector_constructor): Adjust find_bswap_or_nop
	caller, punt if cast64_to_32 or mask is not all ones.
	(pass_optimize_bswap::execute): Adjust find_bswap_or_nop_finalize
	caller, for now punt if cast64_to_32.

	* gcc.dg/pr86723.c: New test.
	* gcc.target/i386/pr86723.c: New test.
	* gcc.dg/optimize-bswapdi-1.c: Use -fdump-tree-optimized instead
	of -fdump-tree-bswap and scan for the number of
	__builtin_bswap64 calls.
	* gcc.dg/optimize-bswapdi-2.c: Likewise.
	* gcc.dg/optimize-bswapsi-1.c: Use -fdump-tree-optimized instead
	of -fdump-tree-bswap and scan for the number of
	__builtin_bswap32 calls.
	* gcc.dg/optimize-bswapsi-5.c: Likewise.
	* gcc.dg/optimize-bswapsi-3.c: Likewise.  Expect one
	__builtin_bswap32 call instead of zero.
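For reference, a rough hand-written sketch of what foo above
effectively becomes after the bswap pass and PRE, as described in the
comment; the spelled-out mask1 value is mine and this is not actual
compiler output:

void bar (unsigned long long);

unsigned long long
foo (unsigned long long value, int x)
{
  unsigned long long t = __builtin_bswap64 (value);
  /* mask1 keeps result bytes 8, 7 and 5, the three bytes the original
     tmp computed.  */
  unsigned long long tmp = t & 0xffff00ff00000000ull;
  if (x)
    bar (tmp);
  /* tmp | (t & mask2) covers all eight bytes, so the return value is
     just t.  */
  return t;
}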