From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 336573857722; Wed, 19 Apr 2023 09:15:17 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 336573857722
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1681895717;
	bh=Wl2A0fw9wKRn2z3Xyvvoj+CCag2njoMaIovzImwER74=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=nCSlTZ8TI5GIMMQ+gK0hDlOTa7EEm4Bs8sIjgbdiL7+PeipFKHvbWJnK/eC5YFvkT
	 nO4Pfp4BSLoictgVtU657DG97WbuEVJovdu54pSi5qUX9HuVRrtiPfOvp+36Q6EHCV
	 ipwgHL4Cj22bRhofix5gJp7tk3M405RS02WqKamA=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109011] missed optimization in presence of __builtin_ctz
Date: Wed, 19 Apr 2023 09:15:15 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109011

--- Comment #20 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:ade0a1ee5c6707b950ba284adcfed0514866c12d

commit r14-65-gade0a1ee5c6707b950ba284adcfed0514866c12d
Author: Jakub Jelinek
Date:   Wed Apr 19 11:14:23 2023 +0200

tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]

For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves
the vectorized code.  Without that, the vectorization is always
multi-type vectorization in the loop (at least int and long long types),
where we emit two .POPCOUNT calls with long long arguments and int
return value and then widen to long long, so effectively after
vectorization we do the V?DImode -> V?DImode popcount twice, then pack
the result into V?SImode and immediately unpack.

The following patch extends that handling to the
__builtin_{clz,ctz,ffs}ll builtins as well (as long as there is an optab
for them; more to come later).

x86 can do __builtin_popcountll with -mavx512vpopcntdq and
__builtin_clzll with -mavx512cd; ppc can do __builtin_popcountll and
__builtin_clzll with -mpower8-vector and __builtin_ctzll with
-mpower9-vector; s390 can do __builtin_{popcount,clz,ctz}ll with
-march=z13 -mzarch (i.e. VX).

2023-04-19  Jakub Jelinek

	PR tree-optimization/109011
	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
	CLZ, CTZ and FFS.  Remove vargs variable, use
	gimple_build_call_internal rather than gimple_build_call_internal_vec.
	(vect_vect_recog_func_ptrs): Adjust popcount entry.

	* gcc.dg/vect/pr109011-1.c: New test.
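
For readers who want to see the loop shape under discussion, here is a
minimal sketch.  It is not the contents of gcc.dg/vect/pr109011-1.c; the
function names (popcount_loop, clz_loop, ctz_loop, ffs_loop) are
hypothetical and chosen only for illustration.  Each loop reads long long
elements, calls a builtin that returns int, and stores the result back
into a long long array, which is the int/long long multi-type situation
described in the commit message.

```c
#include <stddef.h>

/* Hypothetical illustration only; not taken from the GCC testsuite.
   Each builtin returns int, and the store widens it to long long.  */

void popcount_loop(long long *restrict dst, const long long *restrict src,
                   size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = __builtin_popcountll((unsigned long long) src[i]);
}

/* The result of __builtin_clzll is undefined for a zero argument,
   so callers of this sketch must pass nonzero elements.  */
void clz_loop(long long *restrict dst, const long long *restrict src,
              size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = __builtin_clzll((unsigned long long) src[i]);
}

/* Same nonzero restriction as clz_loop.  */
void ctz_loop(long long *restrict dst, const long long *restrict src,
              size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = __builtin_ctzll((unsigned long long) src[i]);
}

/* __builtin_ffsll is defined for zero and returns 0 in that case.  */
void ffs_loop(long long *restrict dst, const long long *restrict src,
              size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = __builtin_ffsll(src[i]);
}
```

Whether any of these loops is actually vectorized depends on the target
options listed above (for example -mavx512cd for the clz case on x86) and
on the optimization level; the vectorizer dump from
-fdump-tree-vect-details is one way to check what was emitted.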