From: Richard Biener
Date: Mon, 21 Jun 2021 12:05:26 +0200
Subject: Re: [PATCH] Add vect_recog_popcount_pattern to handle mismatch between the vectorized popcount IFN and scalar popcount builtin.
To: liuhongt
Cc: GCC Patches, Hongtao Liu, "H. J. Lu"
References: <20210617062912.89506-1-hongtao.liu@intel.com>
In-Reply-To: <20210617062912.89506-1-hongtao.liu@intel.com>
List-Id: Gcc-patches mailing list

On Thu, Jun 17, 2021 at 8:29 AM liuhongt wrote:
>
> The patch remove those pro- and demotions when backend support direct
> optab.
>
> For i386: it enables vectorization for vpopcntb/vpopcntw and optimized
> for vpopcntq.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/97770
>         * tree-vect-patterns.c (vect_recog_popcount_pattern):
>         New.
>         (vect_recog_func vect_vect_recog_func_ptrs): Add new pattern.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/97770
>         * gcc.target/i386/avx512bitalg-pr97770-1.c: Remove xfail.
>         * gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Remove xfail.
> ---
>  .../gcc.target/i386/avx512bitalg-pr97770-1.c  |  27 +++--
>  .../i386/avx512vpopcntdq-pr97770-1.c          |   9 +-
>  gcc/tree-vect-patterns.c                      | 110 ++++++++++++++++++
>  3 files changed, 127 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> index c83a477045c..d1beec4cdb4 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> @@ -1,19 +1,18 @@
>  /* PR target/97770 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
> -/* Add xfail since no IFN for QI/HImode popcount */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> +/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
>
>  #include
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_128 (char * __restrict dest, char* src)
> +popcountb_128 (unsigned char * __restrict dest, unsigned char* src)
>  {
>    for (int i = 0; i != 16; i++)
>      dest[i] = __builtin_popcount (src[i]);
> @@ -21,7 +20,7 @@ popcountb_128 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_128 (short* __restrict dest, short* src)
> +popcountw_128 (unsigned short* __restrict dest, unsigned short* src)
>  {
>    for (int i = 0; i != 8; i++)
>      dest[i] = __builtin_popcount (src[i]);
> @@ -29,7 +28,7 @@ popcountw_128 (short* __restrict dest, short* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_256 (char * __restrict dest, char* src)
> +popcountb_256 (unsigned char * __restrict dest, unsigned char* src)
>  {
>    for (int i = 0; i != 32; i++)
>      dest[i] = __builtin_popcount (src[i]);
> @@ -37,7 +36,7 @@ popcountb_256 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_256 (short* __restrict dest, short* src)
> +popcountw_256 (unsigned short* __restrict dest, unsigned short* src)
>  {
>    for (int i = 0; i != 16; i++)
>      dest[i] = __builtin_popcount (src[i]);
> @@ -45,7 +44,7 @@ popcountw_256 (short* __restrict dest, short* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_512 (char * __restrict dest, char* src)
> +popcountb_512 (unsigned char * __restrict dest, unsigned char* src)
>  {
>    for (int i = 0; i != 64; i++)
>      dest[i] = __builtin_popcount (src[i]);
> @@ -53,7 +52,7 @@ popcountb_512 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_512 (short* __restrict dest, short* src)
> +popcountw_512 (unsigned short* __restrict dest, unsigned short* src)
>  {
>    for (int i = 0; i != 32; i++)
>      dest[i] = __builtin_popcount (src[i]);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> index 63bb00d9b4a..dedd2e4c3d6 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> @@ -1,13 +1,12 @@
>  /* PR target/97770 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512vpopcntdq -mavx512vl -mprefer-vector-width=512" } */
> +/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
>  /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
>  /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
>  /* { dg-final { scan-assembler-times "vpopcntd\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
> -/* Add xfail since current vectorizor cannot generate expected code for DImode popcount */
> -/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*xmm" 1 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*ymm" 1 { xfail *-*-* } } } */
> -/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*zmm" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*xmm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*ymm" 1 } } */
> +/* { dg-final { scan-assembler-times "vpopcntq\[ \\t\]+\[^\\n\\r\]*zmm" 1 } } */
>  #ifndef AVX512VPOPCNTQ_H_INCLUDED
>  #define AVX512VPOPCNTQ_H_INCLUDED
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 177d44ebb5e..5c80800efbb 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1292,6 +1292,115 @@ vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
>                                       "vect_recog_widen_minus_pattern");
>  }
>
> +/* Function vect_recog_popcount_pattern
> +
> +   Try to find the following pattern:
> +
> +   UTYPE1 A;
> +   TYPE1 B;
> +   UTYPE2 temp_in;
> +   TYPE3 temp_out;
> +   temp_in = (TYPE2)A;
> +
> +   temp_out = __builtin_popcount{,l,ll} (temp_in);
> +   B = (TYPE1) temp_out;
> +
> +   TYPE2 may or may not be equal to TYPE3.
> +   i.e. TYPE2 is equal to TYPE3 for __builtin_popcount
> +   i.e. TYPE2 is not equal to TYPE3 for __builtin_popcountll
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt from which the pattern search begins.
> +   here it starts with B = (TYPE1) temp_out;
> +
> +   Output:
> +
> +   * TYPE_OUT: The vector type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the sequence of
> +   stmts that constitute the pattern. In this case it will be:
> +   B = .POPCOUNT (A);
> +*/
> +
> +static gimple *
> +vect_recog_popcount_pattern (vec_info *vinfo,
> +                             stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gassign *last_stmt = dyn_cast <gassign *> (stmt_vinfo->stmt);
> +  gimple *popcount_stmt, *pattern_stmt;
> +  tree rhs_oprnd, rhs_origin, lhs_oprnd, lhs_type, vec_type, new_var;
> +  auto_vec<tree> vargs;
> +
> +  /* Find B = (TYPE1) temp_out. */
> +  if (!last_stmt)
> +    return NULL;
> +  tree_code code = gimple_assign_rhs_code (last_stmt);
> +  if (!CONVERT_EXPR_CODE_P (code))
> +    return NULL;
> +
> +  lhs_oprnd = gimple_assign_lhs (last_stmt);
> +  lhs_type = TREE_TYPE (lhs_oprnd);
> +  if (TREE_CODE (lhs_type) != INTEGER_TYPE)
> +    return NULL;

Use INTEGRAL_TYPE_P here.

> +  rhs_oprnd = gimple_assign_rhs1 (last_stmt);
> +  if (TREE_CODE (rhs_oprnd) != SSA_NAME
> +      || !has_single_use (rhs_oprnd))
> +    return NULL;
> +  popcount_stmt = SSA_NAME_DEF_STMT (rhs_oprnd);
> +
> +  /* Find temp_out = __builtin_popcount{,l,ll} (temp_in); */
> +  if (!is_gimple_call (popcount_stmt)
> +      || !gimple_call_lhs (popcount_stmt))

Since you're arriving here via the use-def chain, the LHS will never be NULL.
> +    return NULL;
> +  switch (gimple_call_combined_fn (popcount_stmt))
> +    {
> +    CASE_CFN_POPCOUNT:
> +      break;
> +    default:
> +      return NULL;
> +    }
> +

For safety, add:

  if (gimple_call_num_args (popcount_stmt) != 1)
    return NULL;

> +  rhs_oprnd = gimple_call_arg (popcount_stmt, 0);
> +  vect_unpromoted_value unprom_diff;
> +  rhs_origin = vect_look_through_possible_promotion (vinfo, rhs_oprnd,
> +                                                      &unprom_diff);
> +
> +  if (!rhs_origin)
> +    return NULL;
> +
> +  /* Input and outout of .POPCOUNT should be same-precision integer.
> +     Also A should be unsigned or same presion as temp_in,
> +     otherwise there would be sign_extend from A to temp_in. */
> +  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)
> +      || !(TYPE_UNSIGNED (unprom_diff.type)
> +           || (TYPE_PRECISION (unprom_diff.type)
> +               == TYPE_PRECISION (TREE_TYPE (rhs_oprnd)))))

Note I find an if (A || !(B || C)) hard to read; please write
if (A || (!B && !C)) instead.

OK otherwise.

Thanks,
Richard.

> +    return NULL;
> +  vargs.safe_push (unprom_diff.op);
> +
> +  vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt);
> +  vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
> +  /* Do it only the backend existed popcount2. */
> +  if (!direct_internal_fn_supported_p (IFN_POPCOUNT,
> +                                       vec_type,
> +                                       OPTIMIZE_FOR_SPEED))
> +    return NULL;
> +
> +  /* Create B = .POPCOUNT (A). */
> +  new_var = vect_recog_temp_ssa_var (lhs_type, NULL);
> +  pattern_stmt = gimple_build_call_internal_vec (IFN_POPCOUNT, vargs);
> +  gimple_call_set_lhs (pattern_stmt, new_var);
> +  gimple_set_location (pattern_stmt, gimple_location (last_stmt));
> +  *type_out = vec_type;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                     "created pattern stmt: %G", pattern_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_pow_pattern
>
>     Try to find the following pattern:
> @@ -5283,6 +5392,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_sad_pattern, "sad" },
>    { vect_recog_widen_sum_pattern, "widen_sum" },
>    { vect_recog_pow_pattern, "pow" },
> +  { vect_recog_popcount_pattern, "popcount" },
>    { vect_recog_widen_shift_pattern, "widen_shift" },
>    { vect_recog_rotate_pattern, "rotate" },
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> --
> 2.18.1
>
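
The rewrite of the precision check requested in the review is an application of De Morgan's law, so it should not change which inputs are rejected. A minimal standalone sketch of that equivalence follows. It is not GCC code: the function names are hypothetical, and A, B and C are plain booleans standing in for the three tests in the patch (A: the precisions of the unpromoted input and the LHS differ, B: the input type is unsigned, C: the input has the same precision as temp_in).

```c
#include <assert.h>
#include <stdbool.h>

/* Form used in the patch: reject when A || !(B || C).
   a, b and c are illustrative stand-ins, not GCC predicates.  */
static bool
reject_original (bool a, bool b, bool c)
{
  return a || !(b || c);
}

/* Form requested in the review: !(B || C) rewritten as (!B && !C).  */
static bool
reject_rewritten (bool a, bool b, bool c)
{
  return a || (!b && !c);
}

/* Compare both forms over all eight combinations of the inputs.  */
static void
check_forms_agree (void)
{
  for (int mask = 0; mask < 8; mask++)
    {
      bool a = (mask & 1) != 0;
      bool b = (mask & 2) != 0;
      bool c = (mask & 4) != 0;
      assert (reject_original (a, b, c) == reject_rewritten (a, b, c));
    }
}
```

Calling check_forms_agree () from a small main is enough to confirm that the two spellings agree on every input, so the change is purely about readability.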