Date: Wed, 12 May 2021 11:27:56 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
cc: Tamar Christina <Tamar.Christina@arm.com>, 
 "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: [RFC] Implementing detection of saturation and rounding arithmetic
In-Reply-To: <mpt5yzo1gpi.fsf@arm.com>
Message-ID: <nycvar.YFH.7.76.2105121123420.9200@zhemvz.fhfr.qr>
References: <VI1PR08MB5325923B697DFCB8927347A6FF539@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mpt5yzo1gpi.fsf@arm.com>

On Wed, 12 May 2021, Richard Sandiford wrote:

> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi All,
> >
> > We are looking to implement saturation support in the compiler.  The aim is to
> > recognize both scalar and vector variants of typical saturating expressions.
> >
> > As an example:
> >
> > 1. Saturating addition:
> >    char sat (char a, char b)
> >    {
> >       int tmp = a + b;
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 2. Saturating abs:
> >    char sat (char a)
> >    {
> >       int tmp = abs (a);
> >       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
> >    }
> >
> > 3. Rounding shifts:
> >    char rndshift (char dc, int shift)
> >    {
> >       int round_const = 1 << (shift - 1);
> >       return (dc + round_const) >> shift;
> >    }
> >
> > etc.
> >
> > Of course the first issue is that C does not really have a single idiom for
> > expressing this.
> >
> > At the RTL level we have ss_truncate and us_truncate and float_truncate for
> > truncation.
> >
> > At the Tree level we have nothing for truncation (I believe) for scalars. For
> > vector code there already seems to be VEC_PACK_SAT_EXPR, but it looks like
> > nothing actually generates this at the moment; it's just an unused tree code.
> >
> > For rounding there doesn't seem to be any existing infrastructure.
> >
> > The proposal to handle these is as follows.  Keep in mind that all of these
> > also exist in their scalar form; as such, detecting them in the vectorizer
> > would be the wrong place.
> >
> > 1. Rounding:
> >    a) Use match.pd to rewrite various rounding idioms to shifts.
> >    b) Use backwards or forward prop to rewrite these to internal functions
> >       where even if the target does not support these rounding instructions they
> >       have a chance to provide a more efficient implementation than what would
> >       be generated normally.
> >
> > 2. Saturation:
> >    a) Use match.pd to rewrite the various saturation expressions into min/max
> >       operations which opens up the expressions to further optimizations.
> >    b) Use backwards or forward prop to convert to internal functions if the
> >       resulting min/max expression still meets the criteria for being a
> >       saturating expression.  This follows the algorithm as outlined in "The
> >       Software Vectorization Handbook" by Aart J.C. Bik.
> >
> >       We could get the right instructions by using combine if we don't rewrite
> >       the instructions to an internal function; however, during vectorization
> >       we would then overestimate the cost of performing the saturation.  The
> >       constants will then also be loaded into registers and so become a lot
> >       more difficult to clean up solely in the backend.
> >
> > The one thing I am wondering about is whether we would need an internal function
> > for all operations supported, or if it should be modelled as an internal FN which
> > just "marks" the operation as rounding/saturating. After all, the only difference
> > between a normal and saturating expression in RTL is the xx_truncate RTL surrounding
> > the expression.  Doing so would also mean that all targets that have saturating
> > instructions would automatically benefit from this.
> 
> I might have misunderstood what you meant here, but the *_truncate
> RTL codes are true truncations: the operand has to be wider than the
> result.  Using this representation for general arithmetic is a problem
> if you're operating at the maximum size that the target supports natively.
> E.g. representing a 64-bit saturating addition as:
> 
>   - extend to 128 bits
>   - do a 128-bit addition
>   - truncate to 64 bits
> 
> is going to be hard to cost and code-generate on targets that don't support
> native 128-bit operations (or at least, don't support them cheaply).
> This might not be a problem when recognising C idioms, since the C source
> code has to be able to do the wider operation before truncating the result,
> but it could be a problem if we provide built-in functions or if we want
> to introduce compiler-generated saturating operations.
> 
> RTL already has per-operation saturation such as ss_plus/us_plus,
> ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
> ss_ashift/us_ashift and ss_abs.  I think we should do the same
> in gimple, using internal functions like you say.

I think that for follow-up optimizations, using regular arithmetic
ops plus just new saturating truncations is better.  Maybe we can
also do both: first match only the actual saturation with a new
tree code, and then later match the optabs the target actually
supports (in ISEL, for example)?
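
At the source level, the "regular arithmetic plus a saturating
truncation" shape could look roughly like the sketch below.  The names
sat_trunc_s8, sat_add_s8 and sat_abs_s8 are hypothetical; sat_trunc_s8
merely stands in for whatever new tree code would represent the
truncation:

```c
#include <stdint.h>

/* Hypothetical stand-in for a saturating truncation from int to a
   signed 8-bit value, written as an explicit max followed by a min.  */
static inline int8_t sat_trunc_s8 (int x)
{
  int lo = x < INT8_MIN ? INT8_MIN : x;    /* max (x, -128) */
  int hi = lo > INT8_MAX ? INT8_MAX : lo;  /* min (lo, 127) */
  return (int8_t) hi;
}

/* Saturating add: an ordinary addition in the promoted type,
   followed by a single saturating truncation.  */
int8_t sat_add_s8 (int8_t a, int8_t b)
{
  return sat_trunc_s8 (a + b);
}

/* Saturating abs reuses the same truncation; the negation is done
   in int, so -(-128) is representable before clamping.  */
int8_t sat_abs_s8 (int8_t a)
{
  return sat_trunc_s8 (a < 0 ? -a : a);
}
```

The arithmetic stays visible to the usual folding, and only the final
clamp is special; a later step could then fuse the pair into whatever
saturating operation the target supports.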

Truly saturating ops might provide an interesting example of how
to deal with -ftrapv - one might think we can now simply
use the trapping optabs as internal functions to reflect
-ftrapv onto the IL ...
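
As a rough source-level analogue only (not how -ftrapv is implemented),
a trapping addition can be sketched with the same overflow builtin,
assuming GCC/Clang's __builtin_add_overflow and __builtin_trap.  The
name trapping_add is hypothetical:

```c
/* Hypothetical helper: signed addition that traps instead of
   wrapping or saturating when the exact result does not fit.  */
int trapping_add (int a, int b)
{
  int sum;
  if (__builtin_add_overflow (a, b, &sum))
    __builtin_trap ();  /* abnormal termination on overflow */
  return sum;
}
```

The structure mirrors a saturating op: the same overflow check, with
a trap where the saturating version would substitute a bound.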

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)