From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=BwkY=HB=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lf1-x129.google.com (mail-lf1-x129.google.com [IPv6:2a00:1450:4864:20::129])
	by sourceware.org (Postfix) with ESMTPS id CD282385AE47
	for <gcc-patches@gcc.gnu.org>; Mon, 20 Nov 2023 07:28:17 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CD282385AE47
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org CD282385AE47
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::129
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700465299; cv=none;
	b=SgSZZK7mENPBbW2IY6ELqFf3soFlQKj+uHNgDnI2t3Or6pp3zv0gAy3DSMMmCBEufPdPq9sJxTX5awfCm68nLygZLneAHajwRWpiIsCiy5F+0GkKTMPHS43dGoB4EOshvMp4Ie+/NWQpACy5A3Gii2Ar/WIxou76pGcCDb+IDrg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1700465299; c=relaxed/simple;
	bh=UzN7d4RPydPYLeS3dCvsftv+q8dXFWEExjUQMTfVVi4=;
	h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=mbaasTHMA6+FHtFOwJM0NpVOu5Kia5iSd6aRTaX61tnr5XWosW1jMXDZYBFgn4SWRPmrAJUlMIe8f5zcQ7pI0REvIeta4+1bHd6708gG+MgDc+63g70QI/v/FyKs8O9WkMyP01OOO+KE5BlpQnUqalEPrYsUazN7ISqWoFYBj+Y=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-lf1-x129.google.com with SMTP id 2adb3069b0e04-507a62d4788so5696034e87.0
        for <gcc-patches@gcc.gnu.org>; Sun, 19 Nov 2023 23:28:17 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1700465296; x=1701070096; darn=gcc.gnu.org;
        h=content-transfer-encoding:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=wbAnZYezA8f7JqZHmt/DyYdrWR8DijUSpas0D+4UJ5I=;
        b=NiTKWCeEETG+d5aA9m9IeWsrfM7s3i5BDu137v8jdBEAljmj9M9ypWbccpi1AxcuTj
         rT9MX1cvmHdkHgNFjsxEanml7H6eFZFkOOdP71lhO3bVNIO8r8x34HyRFY1QMx3ntLZt
         e95tKMruqgDq9gxeIfVP4Mva2FBJhESmq2icKpW+4va9TRC/Vg21ByaDk4beOWb+AElG
         5qyS7hszNfls5KHyZdbpDBMvUgjGes0LEB5KpuJhCp7Tg44yHhKT+ecKOth5x5PjcZ1C
         DvS/U1YmJn6HIngIXSKqaJWNqdPMoHsysppXJj7UvbguudoPGOWPHT4hHR7J1sw8XxLb
         Oxfg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1700465296; x=1701070096;
        h=content-transfer-encoding:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=wbAnZYezA8f7JqZHmt/DyYdrWR8DijUSpas0D+4UJ5I=;
        b=GLgX4t495ORuI8QosxPKlE8l/mYf5O7nXXG/fpdaY5dF2Nk4FVUdkv8jdKVefq7kNP
         0Sw9XUbdJFMV/5/ZTz6By+wrFzkIIRUIsGk+Hsl0yvvfYkaTkHMeg/grIYkwUYdQbBV/
         2gckaFmNEhrlgkeyueNeTsdy23tCkBbFazA3sL2gGX2xjs89JHyuzw+nNYbWxtyst4KV
         VeNavu6ZgohH5BqplNnGAG75s8e/DAPMp54cUi2e2QHACglCJzBt1gk46KE6Yr0Qs25o
         xwNjwL9fe8NoRrbDpTebrsd5lrdk2b18TvP8eTtLOHUCPDF9mSqb+SCtqcNYMDIEqpzY
         wVEQ==
X-Gm-Message-State: AOJu0YzCHay6nrsABCsPRfNoMyLRZr+HpSJdTd34mzsCQaNojKNmaCoF
	S39Nsd25JH5GPKw3KJcfZHtZDhDF4xiz2UnJYgE=
X-Google-Smtp-Source: AGHT+IHjY4RIPejQMC3kPnbtIhT1I/Pnsv56HedyZtxNyFL/GLWDSAF1rYu0DgGyt3oYBU2hjqI9ZqGl7iuprBDkWrg=
X-Received: by 2002:ac2:5238:0:b0:509:8da4:93da with SMTP id
 i24-20020ac25238000000b005098da493damr4299665lfl.18.1700465295672; Sun, 19
 Nov 2023 23:28:15 -0800 (PST)
MIME-Version: 1.0
References: <ZVreIppK5dO9j3oU@cowardly-lion.the-meissners.org>
In-Reply-To: <ZVreIppK5dO9j3oU@cowardly-lion.the-meissners.org>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 20 Nov 2023 08:24:35 +0100
Message-ID: <CAFiYyc3BaGffObe2ieZXQZji_1qXTXzVOSR6dz6D87i+sZvK2w@mail.gmail.com>
Subject: Re: [PATCH 0/4] Add vector pair support to PowerPC attribute((vector_size(32)))
To: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org, 
	Segher Boessenkool <segher@kernel.crashing.org>, "Kewen.Lin" <linkw@linux.ibm.com>, 
	David Edelsohn <dje.gcc@gmail.com>, Peter Bergner <bergner@linux.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Mon, Nov 20, 2023 at 5:19 AM Michael Meissner <meissner@linux.ibm.com> wrote:
>
> This is similar to the patches on November 10th.
>
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
>     *   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html
>
> to add a set of built-in functions that use the PowerPC __vector_pair type and
> that provide a set of functions to do basic operations on vector pairs.
>
> After I posted these patches, it was decided that it would be better to have a
> new type that is used rather than a bunch of new built-in functions.  Within
> the GCC context, the best way to add this support is to extend the vector modes
> so that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are
> used.
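
For readers less familiar with the generic vector extension, here is a
minimal sketch of what a 32-byte vector type looks like at the source level.
The typedef name v4df and the helper muladd4 are illustrative only and are
not taken from the patches.  The code compiles on any GCC target, since
vectors wider than the hardware supports are lowered to smaller pieces; with
the proposed patches the intent is that such a type would map to V4DFmode on
PowerPC.

```c
#include <string.h>

/* A 32-byte vector of four doubles (hypothetical typedef name). */
typedef double v4df __attribute__ ((vector_size (32)));

/* Compute out[i] = a[i] * b[i] + c[i] for i = 0..3 using one 32-byte
   vector operation.  memcpy is used for the loads and stores so that the
   arrays need no special alignment.  */
void
muladd4 (double *out, const double *a, const double *b, const double *c)
{
  v4df va, vb, vc, vr;

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  memcpy (&vc, c, sizeof vc);

  vr = va * vb + vc;		/* element-wise arithmetic */

  memcpy (out, &vr, sizeof vr);
}
```

The helper takes pointers rather than vector arguments so that it does not
depend on how a given ABI passes 32-byte vectors by value.
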
>
> These patches are to provide this new implementation.
>
> While in theory you could add a whole new type that isn't a larger size vector,
> my experience with IEEE 128-bit floating point is that GCC really doesn't like
> 2 modes that are the same size but have different implementations (such as we
> see with IEEE 128-bit floating point and IBM double-double 128-bit floating
> point).  So I did not consider adding a new mode for using with vector pairs.
>
> My original intention was to just implement V4DFmode and V8SFmode, since the
> primary users asking for vector pair support are people implementing the high
> end math libraries like Eigen and Blas.
>
> However in implementing this code, I discovered that we will need integer
> vector pair support as well as floating point vector pair.  The integer modes
> and types are needed to properly implement byte shuffling and vector
> comparisons which need integer vector pairs.
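
To illustrate why floating point vectors pull in integer vectors of the same
size: with the generic vector extension, comparing two vectors of doubles
yields a vector of signed integers with the same width and element count
(all bits set for true lanes, zero for false lanes).  The sketch below uses
hypothetical names (v4df, v4di, select_min4) and is not code from the
patches.

```c
#include <string.h>

typedef double    v4df __attribute__ ((vector_size (32)));
typedef long long v4di __attribute__ ((vector_size (32)));

/* For i = 0..3, store the smaller of a[i] and b[i] in out[i], and store
   the comparison mask (-1 where a[i] < b[i], 0 otherwise) in mask_out[i].  */
void
select_min4 (double *out, long long *mask_out,
	     const double *a, const double *b)
{
  v4df va, vb;
  v4di ia, ib, mask, ir;

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);

  /* The result of a vector comparison is an integer vector; the cast
     only makes the element type explicit.  */
  mask = (v4di) (va < vb);

  /* Reinterpret the doubles as integers and blend the lanes bitwise.  */
  memcpy (&ia, &va, sizeof ia);
  memcpy (&ib, &vb, sizeof ib);
  ir = (ia & mask) | (ib & ~mask);

  memcpy (out, &ir, sizeof ir);
  memcpy (mask_out, &mask, sizeof mask);
}
```
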
>
> With the current patches, vector pair support is not enabled by default.  The
> main reason is I have not implemented the support for byte shuffling which
> various tests depend on.
>
> I would also like to implement overloads for the vector built-in functions like
> vec_add, vec_sum, etc. so that, if you give them a vector pair, they handle it
> just as if you had given them a vector type.
>
> In addition, once the various bugs are addressed, I would then implement the
> support so that automatic vectorization would consider using vector pairs
> instead of vectors.
>
> In terms of benchmarks, I wrote two benchmarks:
>
>    1)   One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]).  That is
>         a loop with 3 loads and a store per iteration.
>
>    2)   Another benchmark produces a scalar sum of an entire vector.  This is a
>         loop that just has a single load and no store.
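
The benchmark sources are not part of this mail, so the following is only a
sketch of what the two loop shapes might look like when written with a
32-byte float vector.  The names v8sf, muladd_loop and sum_loop are
hypothetical, and for brevity n is assumed to be a multiple of 8.

```c
#include <stddef.h>
#include <string.h>

typedef float v8sf __attribute__ ((vector_size (32)));

/* saxpy-type loop: value[i] += a[i] * b[i].
   Three vector loads and one vector store per iteration.
   Assumes n is a multiple of 8.  */
void
muladd_loop (float *value, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i += 8)
    {
      v8sf vv, va, vb;

      memcpy (&vv, value + i, sizeof vv);
      memcpy (&va, a + i, sizeof va);
      memcpy (&vb, b + i, sizeof vb);

      vv += va * vb;

      memcpy (value + i, &vv, sizeof vv);
    }
}

/* Reduction loop: one vector load per iteration and no store.
   Assumes n is a multiple of 8.  */
float
sum_loop (const float *x, size_t n)
{
  v8sf acc = { 0 };
  float total = 0.0f;

  for (size_t i = 0; i < n; i += 8)
    {
      v8sf vx;

      memcpy (&vx, x + i, sizeof vx);
      acc += vx;
    }

  /* Add up the eight partial sums.  */
  for (int lane = 0; lane < 8; lane++)
    total += acc[lane];

  return total;
}
```

Note that sum_loop adds the elements in a different order than a scalar loop
would, so for general floating point data the result can differ in the last
bits.
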
>
> For the saxpy type loop, I get the following general numbers for both float and
> double:
>
>    1)   The benchmarks that use attribute((vector_size(32))) are roughly 9-10%
>         faster than using normal vector processing (both auto vectorize and
>         using vector types).
>
>    2)   The benchmarks that use attribute((vector_size(32))) are roughly 19-20%
>         faster than if I write the loop using the vector pair loads using the
>         existing built-ins, and then manually split the values and do the
>         arithmetic and single vector stores.
>
> Unfortunately, for floating point, doing the sum of the whole vector is slower
> using the new vector pair built-in functions using a simple loop (compared to
> using the existing built-ins for disassembling vector pairs).  If I write more
> complex loops that manually unroll the loop, then the floating point vector
> pair built-in functions become like the integer vector pair built-in
> functions.  So there is some amount of tuning that will need to be done.
>
> There are 4 patches in this set:
>
> The first patch adds support for the types, and does moves, and provides some
> optimizations for extracting an element and setting an element.
>
> The second patch implements the floating point arithmetic operations.
>
> The third patch implements the integer operations.
>
> The fourth patch provides new tests to test these features.

I wouldn't expose the "fake" larger modes to the vectorizer but rather
adjust m_suggested_unroll_factor (which you already do to some extent).
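
As a rough source-level analogy only (this is not vectorizer code, and the
names v2df and muladd_unroll2 are made up for illustration): a loop over
ordinary 16-byte vectors that is unrolled by two touches 32 adjacent bytes
per array in each iteration, without any 32-byte mode being visible in the
loop itself.  Whether the adjacent accesses then end up as paired loads and
stores is an assumption about later passes, not something this sketch shows.

```c
#include <stddef.h>
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* value[i] += a[i] * b[i], written with 16-byte vectors and unrolled by
   two, so each iteration handles four doubles (32 bytes) per array.
   Assumes n is a multiple of 4.  */
void
muladd_unroll2 (double *value, const double *a, const double *b, size_t n)
{
  for (size_t i = 0; i < n; i += 4)
    {
      v2df v0, v1, a0, a1, b0, b1;

      /* Two adjacent 16-byte loads from each array.  */
      memcpy (&v0, value + i, sizeof v0);
      memcpy (&v1, value + i + 2, sizeof v1);
      memcpy (&a0, a + i, sizeof a0);
      memcpy (&a1, a + i + 2, sizeof a1);
      memcpy (&b0, b + i, sizeof b0);
      memcpy (&b1, b + i + 2, sizeof b1);

      v0 += a0 * b0;
      v1 += a1 * b1;

      /* Two adjacent 16-byte stores.  */
      memcpy (value + i, &v0, sizeof v0);
      memcpy (value + i + 2, &v1, sizeof v1);
    }
}
```
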

> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meissner@linux.ibm.com