From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongtao Liu
Date: Sun, 25 Jun 2023 13:06:47 +0800
Subject: Re: [PATCH 4/5] x86: further PR target/100711-like splitting
To: Jan Beulich
Cc: "gcc-patches@gcc.gnu.org", Hongtao Liu, Kirill Yukhin
References: <04f99abe-a563-d093-23b7-4abf0f91633d@suse.com>

On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches wrote:
>
> With respective two-operand bitwise operations now expressable by a
> single VPTERNLOG, add splitters to also deal with ior and xor
> counterparts of the original and-only case. Note that the splitters need
> to be separate, as the placement of "not" differs in the final insns
> (*iornot3, *xnor3) which are intended to pick up one half of
> the result.
>
> gcc/
>
>	* config/i386/sse.md: New splitters to simplify
>	not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
>
> gcc/testsuite/
>
>	* gcc.target/i386/pr100711-4.c: New test.
>	* gcc.target/i386/pr100711-5.c: New test.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17366,6 +17366,36 @@
>                 (match_dup 2)))]
>    "operands[3] = gen_reg_rtx (mode);")
>
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +        (ior:VI
> +          (vec_duplicate:VI
> +            (not:
> +              (match_operand: 1 "nonimmediate_operand")))
> +          (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +        (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +        (ior:VI (not:VI (match_dup 3)) (match_dup 2)))]
> +  "operands[3] = gen_reg_rtx (mode);")
> +
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +        (xor:VI
> +          (vec_duplicate:VI
> +            (not:
> +              (match_operand: 1 "nonimmediate_operand")))
> +          (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +        (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +        (not:VI (xor:VI (match_dup 3) (match_dup 2))))]
> +  "operands[3] = gen_reg_rtx (mode);")
> +

Can we merge this splitter (xor:not) into the ior:not one by using a code
iterator over xor and ior? They look the same except for the xor/ior.
There is no need to merge it into the and:not case, which has different
guard conditions.

Otherwise LGTM.

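For reference, the immediates that the new tests scan for (0xbb, 0xdd and 0x99) can be cross-checked against the RTL shapes the two splitters emit. The following is only a sketch and not part of the patch: the selector constants use the usual VPTERNLOG truth-table convention (0xF0, 0xCC and 0xAA for the three inputs), and the names TL_A/TL_B/TL_C, imm_iornot and imm_xnor are made up for illustration. Which of the two inputs ends up negated in the ior case depends on how the operands are assigned in the final insn, so both orders are shown.

```c
/* Truth-table selectors for the three VPTERNLOG inputs.  Bit i of the
   8-bit immediate holds the result for the input bits
   (A, B, C) = ((i >> 2) & 1, (i >> 1) & 1, i & 1).
   The names are hypothetical, chosen for this sketch only.  */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Immediate for (ior (not x) y), the shape produced by the ior splitter.  */
static unsigned imm_iornot (unsigned x, unsigned y)
{
  return (~x | y) & 0xFF;
}

/* Immediate for (not (xor x y)), the shape produced by the xor splitter.  */
static unsigned imm_xnor (unsigned x, unsigned y)
{
  return ~(x ^ y) & 0xFF;
}

/* Return 1 if the computed immediates match the values the tests scan for.  */
static int ternlog_imms_match_tests (void)
{
  return imm_iornot (TL_B, TL_C) == 0xBB   /* ~B | C */
         && imm_iornot (TL_C, TL_B) == 0xDD /* B | ~C */
         && imm_xnor (TL_B, TL_C) == 0x99;  /* ~(B ^ C) */
}
```

Since xnor is symmetric in its two inputs, a single immediate (0x99) covers both operand orders, whereas iornot yields 0xbb or 0xdd depending on which input carries the negation.
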
>  (define_insn "*andnot3_mask"
>    [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>          (vec_merge:VI48_AVX512VL
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-4.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +    return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v32hi foo_v32hi (short a, v32hi b)
> +{
> +    return (__extension__ (v32hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v16si foo_v16si (int a, v16si b)
> +{
> +    return (__extension__ (v16si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v8di foo_v8di (long long a, v8di b)
> +{
> +    return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 4 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 2 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xdd" 2 { target { ia32 } } } } */
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +    return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) ^ b;
> +}
> +
> +v32hi foo_v32hi (short a, v32hi b)
> +{
> +    return (__extension__ (v32hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) ^ b;
> +}
> +
> +v16si foo_v16si (int a, v16si b)
> +{
> +    return (__extension__ (v16si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) ^ b;
> +}
> +
> +v8di foo_v8di (long long a, v8di b)
> +{
> +    return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) ^ b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x99" 4 } } */
>

-- 
BR,
Hongtao