From: Hongtao Liu
Date: Fri, 15 Dec 2023 10:35:12 +0800
Subject: Re: [PATCH] i386: Sync move_max/store_max with prefer-vector-width [PR112824]
To: Hongyu Wang
Cc: gcc-patches@gcc.gnu.org, hjl.tools@gmail.com, hongtao.liu@intel.com
In-Reply-To: <20231214075402.464671-1-hongyu.wang@intel.com>
References: <20231214075402.464671-1-hongyu.wang@intel.com>

On Thu, Dec 14, 2023 at 3:54 PM Hongyu Wang wrote:
>
> Hi,
>
> Currently move_max follows the tuning feature first, but ideally it
> should sync with prefer-vector-width when it is explicitly set to keep
> vector move and operation with same vector size.
>
> Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}
>
> OK for trunk?
>
> gcc/ChangeLog:
>
>         PR target/112824
>         * config/i386/i386-options.cc (ix86_option_override_internal):
>         Sync ix86_move_max/ix86_store_max with prefer_vector_width when
>         it is explicitly set.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/112824
>         * gcc.target/i386/pieces-memset-45.c: Remove
>         -mprefer-vector-width=256.
>         * g++.target/i386/pr112824-1.C: New test.
> ---
>  gcc/config/i386/i386-options.cc               |   8 +-
>  gcc/testsuite/g++.target/i386/pr112824-1.C    | 113 ++++++++++++++++++
>  .../gcc.target/i386/pieces-memset-45.c        |   2 +-
>  3 files changed, 120 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr112824-1.C
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 588a0878c0d..440ef59ffff 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3012,7 +3012,9 @@ ix86_option_override_internal (bool main_args_p,
>      {
>        /* Set the maximum number of bits can be moved from memory to
>           memory efficiently. */
> -      if (ix86_tune_features[X86_TUNE_AVX512_MOVE_BY_PIECES])
> +      if (opts_set->x_prefer_vector_width_type != PVW_NONE)
> +        opts->x_ix86_move_max = opts->x_prefer_vector_width_type;
> +      else if (ix86_tune_features[X86_TUNE_AVX512_MOVE_BY_PIECES])
>          opts->x_ix86_move_max = PVW_AVX512;
>        else if (ix86_tune_features[X86_TUNE_AVX256_MOVE_BY_PIECES])
>          opts->x_ix86_move_max = PVW_AVX256;
> @@ -3034,7 +3036,9 @@ ix86_option_override_internal (bool main_args_p,
>      {
>        /* Set the maximum number of bits can be stored to memory
>           efficiently. */
> -      if (ix86_tune_features[X86_TUNE_AVX512_STORE_BY_PIECES])
> +      if (opts_set->x_prefer_vector_width_type != PVW_NONE)
> +        opts->x_ix86_store_max = opts->x_prefer_vector_width_type;
> +      else if (ix86_tune_features[X86_TUNE_AVX512_STORE_BY_PIECES])
>          opts->x_ix86_store_max = PVW_AVX512;
>        else if (ix86_tune_features[X86_TUNE_AVX256_STORE_BY_PIECES])
>          opts->x_ix86_store_max = PVW_AVX256;
> diff --git a/gcc/testsuite/g++.target/i386/pr112824-1.C b/gcc/testsuite/g++.target/i386/pr112824-1.C
> new file mode 100644
> index 00000000000..fccaf23c530
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr112824-1.C
> @@ -0,0 +1,113 @@
> +/* PR target/112824 */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c++23 -O3 -march=skylake-avx512 -mprefer-vector-width=512" } */
> +/* { dg-final { scan-assembler-not "vmov(?:dqu|apd)\[ \\t\]+\[^\n\]*%ymm" } } */
> +
> +

remove empty line.

> +#include
> +#include
> +#include
> +#include
> +
> +template
> +using Vec [[gnu::vector_size(W * sizeof(T))]] = T;
> +
> +// Omitted: 16 without AVX, 32 without AVX512F,
> +// or for forward compatibility some AVX10 may also mean 32-only
> +static constexpr ptrdiff_t VectorBytes = 64;
> +template
> +static constexpr ptrdiff_t VecWidth = 64 <= sizeof(T) ? 1 : 64/sizeof(T);
> +
> +template struct Vector{
> +  static constexpr ptrdiff_t L = N;
> +  T data[L];
> +  static constexpr auto size()->ptrdiff_t{return N;}
> +};
> +template struct Vector{
> +  static constexpr ptrdiff_t W = N >= VecWidth ? VecWidth : ptrdiff_t(std::bit_ceil(size_t(N)));
> +  static constexpr ptrdiff_t L = (N/W) + ((N%W)!=0);
> +  using V = Vec;
> +  V data[L];
> +  static constexpr auto size()->ptrdiff_t{return N;}
> +};
> +/// should be trivially copyable
> +/// codegen is worse when passing by value, even though it seems like it should make
> +/// aliasing simpler to analyze?
> +template
> +[[gnu::always_inline]] constexpr auto operator+(Vector x, Vector y) -> Vector {
> +  Vector z;
> +  for (ptrdiff_t n = 0; n < Vector::L; ++n) z.data[n] = x.data[n] + y.data[n];
> +  return z;
> +}
> +template
> +[[gnu::always_inline]] constexpr auto operator*(Vector x, Vector y) -> Vector {
> +  Vector z;
> +  for (ptrdiff_t n = 0; n < Vector::L; ++n) z.data[n] = x.data[n] * y.data[n];
> +  return z;
> +}
> +template
> +[[gnu::always_inline]] constexpr auto operator+(T x, Vector y) -> Vector {
> +  Vector z;
> +  for (ptrdiff_t n = 0; n < Vector::L; ++n) z.data[n] = x + y.data[n];
> +  return z;
> +}
> +template
> +[[gnu::always_inline]] constexpr auto operator*(T x, Vector y) -> Vector {
> +  Vector z;
> +  for (ptrdiff_t n = 0; n < Vector::L; ++n) z.data[n] = x * y.data[n];
> +  return z;
> +}
> +
> +
> +

Ditto.

> +template struct Dual {
> +  T value;
> +  Vector partials;
> +};
> +// Here we have a specialization for non-power-of-2 `N`
> +template
> +requires(std::floating_point && (std::popcount(size_t(N))>1))
> +struct Dual {
> +  Vector data;
> +};
> +
> +template
> +consteval auto firstoff(){
> +  static_assert(std::same_as, "type not implemented");
> +  if constexpr (W==2) return Vec<2,int64_t>{0,1} != 0;
> +  else if constexpr (W == 4) return Vec<4,int64_t>{0,1,2,3} != 0;
> +  else if constexpr (W == 8) return Vec<8,int64_t>{0,1,2,3,4,5,6,7} != 0;
> +  else static_assert(false, "vector width not implemented");
> +}
> +
> +template
> +[[gnu::always_inline]] constexpr auto operator+(Dual a,
> +                                                Dual b)
> +  -> Dual {
> +  if constexpr (std::floating_point && (std::popcount(size_t(N))>1)){
> +    Dual c;
> +    for (ptrdiff_t l = 0; l < Vector::L; ++l)
> +      c.data.data[l] = a.data.data[l] + b.data.data[l];
> +    return c;
> +  } else return {a.value + b.value, a.partials + b.partials};
> +}
> +
> +template
> +[[gnu::always_inline]] constexpr auto operator*(Dual a,
> +                                                Dual b)
> +  -> Dual {
> +  if constexpr (std::floating_point && (std::popcount(size_t(N))>1)){
> +    using V = typename Vector::V;
> +    V va = V{}+a.data.data[0][0], vb = V{}+b.data.data[0][0];
> +    V x = va * b.data.data[0];
> +    Dual c;
> +    c.data.data[0] = firstoff::W,T>() ? x + vb*a.data.data[0] : x;
> +    for (ptrdiff_t l = 1; l < Vector::L; ++l)
> +      c.data.data[l] = va*b.data.data[l] + vb*a.data.data[l];
> +    return c;
> +  } else return {a.value * b.value, a.value * b.partials + b.value * a.partials};
> +}
> +
> +void prod(Dual,2> &c, const Dual,2> &a, const Dual,2>&b){
> +  c = a*b;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-45.c b/gcc/testsuite/gcc.target/i386/pieces-memset-45.c
> index 70c80e5064b..e8ce7c23256 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-45.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-45.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=x86-64 -mprefer-vector-width=256 -mavx512f -mtune-ctrl=avx512_store_by_pieces" } */
> +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune-ctrl=avx512_store_by_pieces" } */
>
>  extern char *dst;
>
> --
> 2.31.1
>

Others LGTM.

-- 
BR,
Hongtao
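
For reference, the precedence that the quoted i386-options.cc hunks introduce can be modelled outside GCC roughly as below. This is only an illustrative stand-alone sketch, not the code in the patch: `Width`, `Tuning` and `pick_move_max` are hypothetical names, and the final 128-bit fallback is an assumption, since the quoted hunk does not show the remaining branches. The same shape would apply to store_max with the corresponding store tuning flags.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's PVW_* values; not the real definitions.
enum class Width { None, W128, W256, W512 };

// Hypothetical model of the two move-by-pieces tuning flags tested in the hunk.
struct Tuning {
  bool avx512_move_by_pieces;
  bool avx256_move_by_pieces;
};

// An explicitly requested preferred width wins; otherwise the tuning flags
// are consulted, widest first. The W128 fallback is an assumption of this
// sketch and is not taken from the quoted patch.
Width pick_move_max(Width explicit_prefer, const Tuning &t) {
  if (explicit_prefer != Width::None)
    return explicit_prefer;
  if (t.avx512_move_by_pieces)
    return Width::W512;
  if (t.avx256_move_by_pieces)
    return Width::W256;
  return Width::W128;
}
```

With a tuning that only enables the 256-bit flag, `pick_move_max(Width::W512, {false, true})` yields `Width::W512` in this model, whereas passing `Width::None` falls back to `Width::W256`; that difference is what the new test's scan for %ymm moves is meant to catch.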