Expansion of narrowing math built-ins into power instructions

public inbox for gcc@gcc.gnu.org

* Expansion of narrowing math built-ins into power instructions
@ 2019-07-29 17:37 Martin Jambor
  2019-07-29 18:40 ` Segher Boessenkool
  2019-07-30  9:20 ` Florian Weimer
  0 siblings, 2 replies; 63+ messages in thread
From: Martin Jambor @ 2019-07-29 17:37 UTC (permalink / raw)
  To: segher; +Cc: Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

Hi Segher,

as you might know, Tejas is our Google Summer of Code student working on
adding built-in functions for some new math functions added in ISO/IEC
TS 18661.

His next step is to expand "functions rounding result to narrower type"
(so fadd, fsub and possibly fmul and fdiv described in
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
instructions on targets that have such instructions.  And Joseph
suggested when he proposed this project that POWER8 (and I suppose also
9) is one of them.

Can you please confirm this and also perhaps point Tejas to the right
pieces of the Power machine description and target code to use as a model
for implementing expansion of these functions?  It would be much
appreciated, because even though Honza and I are the official mentors of
the project, we are not very well versed in the ppc target.

Thanks a lot,

Martin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-29 17:37 Expansion of narrowing math built-ins into power instructions Martin Jambor
@ 2019-07-29 18:40 ` Segher Boessenkool
  2019-07-30 19:47   ` Joseph Myers
  2019-07-30  9:20 ` Florian Weimer
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-07-29 18:40 UTC (permalink / raw)
  To: Martin Jambor; +Cc: Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

Hi!

On Mon, Jul 29, 2019 at 07:37:53PM +0200, Martin Jambor wrote:
> as you might know, Tejas is our Google Summer of Code student working on
> adding built-in functions for some new math functions added in ISO/IEC
> TS 18661.
> 
> His next step is to expand "functions rounding result to narrower type"
> (so fadd, fsub and possibly fmul and fdiv described in
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
> instructions on targets that have such instructions.  And Joseph
> suggested when he proposed this project that POWER8 (and I suppose also
> 9) is one of them.
> 
> Can you please confirm this and also perhaps point Tejas to the right
> pieces of the Power machine description and target code to use as a model
> for implementing expansion of these functions?  It would be much
> appreciated, because even though Honza and I are the official mentors of
> the project, we are not very well versed in the ppc target.

I think this is referring to the "fadds" and similar Power architecture
instructions, which take as inputs any single or double precision
numbers, and round the result to single precision?  These instructions
produce a correct result also for double-precision inputs, from ISA 2.07
(POWER8 and later) on.  (The result if OE=1 or UE=1 is undefined).  (See
4.3.5.1 in the ISA).

In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
which could be extended to allow DF inputs with an SF output; it doesn't
yet allow it.

gcc112 is a Power8, and gcc135 is a Power9, and Tejas does have a
compile farm account already ;-)


Segher

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-29 17:37 Expansion of narrowing math built-ins into power instructions Martin Jambor
  2019-07-29 18:40 ` Segher Boessenkool
@ 2019-07-30  9:20 ` Florian Weimer
  2019-07-30 19:49   ` Joseph Myers
  1 sibling, 1 reply; 63+ messages in thread
From: Florian Weimer @ 2019-07-30  9:20 UTC (permalink / raw)
  To: Martin Jambor
  Cc: segher, Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

* Martin Jambor:

> as you might know, Tejas is our Google Summer of Code student working on
> adding built-in functions for some new math functions added in ISO/IEC
> TS 18661.
>
> His next step is to expand "functions rounding result to narrower type"
> (so fadd, fsub and possibly fmul and fdiv described in
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
> instructions on targets that have such instructions.

Sorry, this might be a silly question, but: How do you plan to recognize
that the fadd/fsub being called is indeed the one from the TS?

Thanks,
Florian

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-29 18:40 ` Segher Boessenkool
@ 2019-07-30 19:47   ` Joseph Myers
  0 siblings, 0 replies; 63+ messages in thread
From: Joseph Myers @ 2019-07-30 19:47 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Martin Jambor, Tejas Joshi, Jan Hubicka, GCC Mailing List

On Mon, 29 Jul 2019, Segher Boessenkool wrote:

> I think this is referring to the "fadds" and similar Power architecture
> instructions, which take as inputs any single or double precision
> numbers, and round the result to single precision?  These instructions

Yes.

On Power9, it is *also* possible to do such narrowing operations from IEEE 
binary128 to binary32 or binary64 format, by first doing the operation on 
binary128 using one of the "round to odd" instruction variants, then doing 
a conversion to the narrower format.
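
The same idea can be sketched in portable C for the double-to-float case.
This is only an illustration and not part of the original message: the
helper names (naive_fadd, add_round_to_odd, sketch_fadd) are hypothetical,
and the code assumes IEEE binary64 doubles evaluated without excess
precision (FLT_EVAL_METHOD == 0), the default round-to-nearest mode, and
no -ffast-math.  It emulates "round to odd" in software with an exact
error term, then converts to the narrower format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Naive narrowing add: rounds once to double, then again to float,
   so it can suffer from double rounding. */
static float naive_fadd(double x, double y)
{
    return (float)(x + y);
}

/* Hypothetical helper: x + y rounded to odd in double precision.
   If the sum is inexact, the result is the neighbouring double whose
   last significand bit is 1. */
static double add_round_to_odd(double x, double y)
{
    double s = x + y;
    /* Knuth's TwoSum: s + e equals x + y exactly when s is finite. */
    double yy = s - x;
    double e = (x - (s - yy)) + (y - yy);
    uint64_t bits;

    assert(sizeof bits == sizeof s);
    memcpy(&bits, &s, sizeof bits);

    /* Leave infinities and NaNs alone. */
    if (((bits >> 52) & 0x7ff) == 0x7ff)
        return s;

    if (e != 0.0 && (bits & 1) == 0) {
        /* s is the even neighbour; step to the odd neighbour on the
           side where the exact sum lies.  Adding 1 to the bit pattern
           moves away from zero, subtracting 1 moves toward zero. */
        if ((e > 0.0) == (s > 0.0))
            bits += 1;
        else
            bits -= 1;
        memcpy(&s, &bits, sizeof s);
    }
    return s;
}

/* Sketch of a narrowing add: round to odd, then convert once. */
static float sketch_fadd(double x, double y)
{
    return (float)add_round_to_odd(x, y);
}
```

With x = 1.0 and y = 0x1p-24 + 0x1p-76, the exact sum lies just above the
midpoint between the floats 1 and 1 + 2^-23.  naive_fadd first rounds the
sum to 1 + 2^-24 in double and then resolves the tie to 1.0f, whereas
sketch_fadd returns 1 + 2^-23, the correctly rounded result.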

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-30  9:20 ` Florian Weimer
@ 2019-07-30 19:49   ` Joseph Myers
  2019-07-31  6:47     ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-07-30 19:49 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Martin Jambor, segher, Tejas Joshi, Jan Hubicka, GCC Mailing List

On Tue, 30 Jul 2019, Florian Weimer wrote:

> * Martin Jambor:
> 
> > as you might know, Tejas is our Google Summer of Code student working on
> > adding built-in functions for some new math functions added in ISO/IEC
> > TS 18661.
> >
> > His next step is to expand "functions rounding result to narrower type"
> > (so fadd, fsub and possibly fmul and fdiv described in
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
> > instructions on targets that have such instructions.
> 
> Sorry, this might be a silly question, but: How do you plan to recognize
> that the fadd/fsub being called is indeed the one from the TS?

I expect it's the same as any other built-in function: compatible 
prototype plus appropriate options (-std=gnu*, or -std=c2x in future once 
we teach GCC that these functions are in C2x) that enable the built-in 
functions.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-30 19:49   ` Joseph Myers
@ 2019-07-31  6:47     ` Tejas Joshi
  2019-07-31 14:47       ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-07-31  6:47 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, segher, joseph

Hi,

> In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> which could be extended to allow DF inputs with an SF output; it doesn't
> yet allow it.

Thanks for the inputs, I will try to address these points now. I have
built GCC on gcc112 and will apply the patch and run the testcases there.

Tejas


* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-31  6:47     ` Tejas Joshi
@ 2019-07-31 14:47       ` Segher Boessenkool
  2019-08-08 18:39         ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-07-31 14:47 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, joseph

On Wed, Jul 31, 2019 at 12:23:18PM +0530, Tejas Joshi wrote:
> > In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> > which could be extended to allow DF inputs with an SF output; it doesn't
> > yet allow it.
> 
> Thanks for the inputs, I will try to address these points now. I have
> built GCC on gcc112 and will apply the patch and run the testcases there.

For the QP float (binary128, KFmode, take your pick) you need Power9 or
newer, so gcc135.


Segher

* Re: Expansion of narrowing math built-ins into power instructions
  2019-07-31 14:47       ` Segher Boessenkool
@ 2019-08-08 18:39         ` Tejas Joshi
  2019-08-08 20:05           ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-08 18:39 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, joseph, segher

Hi.
It took some time for me to finish the folding part for the fadd
variants and, until it is reviewed, I want to move ahead with the power8/9
expansions on top of the current fadd patch.

> In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> which could be extended to allow DF inputs with an SF output; it doesn't
> yet allow it.

This might be a silly question, but I am confused by the optabs and insn
names right now; the comments in optabs.def say that these patterns are
present in the md as insn names. How can the fadd function be mapped to the
"fadd<mode>3_fpr" pattern name?
Also, the faddl and daddl functions take long double arguments; can they
also be expanded for DF to SF mode, or only for QP float on power9?

I have built GCC and applied my current patches on gcc112 and yes, on
gcc135 too.

Thanks,
Tejas

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-08 18:39         ` Tejas Joshi
@ 2019-08-08 20:05           ` Segher Boessenkool
  2019-08-08 23:09             ` Joseph Myers
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-08 20:05 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi!

On Fri, Aug 09, 2019 at 12:14:54AM +0530, Tejas Joshi wrote:
> > In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> > which could be extended to allow DF inputs with an SF output; it doesn't
> > yet allow it.
> 
> This might be a silly question, but I am confused by the optabs and insn
> names right now; the comments in optabs.def say that these patterns are
> present in the md as insn names. How can the fadd function be mapped to the
> "fadd<mode>3_fpr" pattern name?

The actual name starts with an asterisk, which means as it is it can
never be used by name.  But, right above this pattern, there is the
define_expand named add<mode>3 (for modes SFDF).

These current patterns all take the same mode for all inputs and outputs
(that's what <mode>3 indicates, say, fadddf3).  You will need to define
something that takes two SFs in and produces a DF.  That cannot really
be in this same pattern, it needs a float_extend added (you can do all
kinds of trickery, but just adding a few extra patterns is much easier
than define_subst and whatnot).

> Also, the faddl and daddl functions take long double arguments; can they
> also be expanded for DF to SF mode, or only for QP float on power9?

We can have three different long double modes on powerpc: DP float, QP
float, or "IBM long double", also known as "double double", which is
essentially the sum of two double precision numbers.

Types (a source level construct) are not the same as modes (an RTL
concept).
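
As a rough illustration (not part of the original message), the format
behind the long double type can be identified from C through its
significand precision in <float.h>.  The helper name long_double_format is
hypothetical; the mapping from LDBL_MANT_DIG values to formats is the usual
one (53 for IEEE double, 64 for x87 extended, 106 for IBM double-double,
113 for IEEE binary128), and anything else is reported as unknown.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: describe the format used for long double,
   judging only by the number of significand bits. */
static const char *long_double_format(void)
{
    /* long double is never narrower than double. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);

    switch (LDBL_MANT_DIG) {
    case 53:
        return "DP float (same format as double)";
    case 64:
        return "x87 80-bit extended";
    case 106:
        return "IBM long double (double-double)";
    case 113:
        return "QP float (IEEE binary128)";
    default:
        return "unknown";
    }
}
```

The same source type can therefore report any of the three powerpc cases
depending on how the compiler was configured or invoked, which is why an
expansion for faddl or daddl has to be keyed on the mode rather than on the
type name.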

Segher

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-08 20:05           ` Segher Boessenkool
@ 2019-08-08 23:09             ` Joseph Myers
  2019-08-10 10:24               ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-08 23:09 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Thu, 8 Aug 2019, Segher Boessenkool wrote:

> These current patterns all take the same mode for all inputs and outputs
> (that's what <mode>3 indicates, say, fadddf3).  You will need to define
> something that takes two SFs in and produces a DF.  That cannot really

For example, md.texi describes standard patterns such as mulhisi3 that 
multiply two HImode values and produce an SImode result (widening integer 
multiply).

Using a similar naming pattern, you might have a pattern adddfsf3 that 
adds two DFmode values and produces an SFmode result (or you could 
call it something like add_truncdfsf3 if you wish to emphasise the 
truncation involved, for example).  Similarly addtfsf3 that adds 
TFmode values and produces an SFmode result, and so on.  Of course these 
names need documenting (and you need corresponding RTL for them to generate 
that distinguishes the fused add+truncate from the different RTL for 
separate addition and truncation with double rounding).  In cases where 
long double and double have the same mode, the daddl function should use 
the existing adddf3 pattern.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-08 23:09             ` Joseph Myers
@ 2019-08-10 10:24               ` Tejas Joshi
  2019-08-10 16:46                 ` Segher Boessenkool
  2019-08-11 16:59                 ` Segher Boessenkool
  0 siblings, 2 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-10 10:24 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

[-- Attachment #1: Type: text/plain, Size: 2109 bytes --]

Hello.
I have been trying to write a basic pattern following the suggestions
you both have made. The patch is attached here, but I cannot
see the call in:

float
foo (double x, double y)
{
    return __builtin_fadd (x, y);
}
being expanded to any instruction, at least a simple one, using
-fno-builtin-fadd (and also -mhard-float?). It always stays "bl fadd".
What am I missing here?

> (POWER8 and later) on.  (The result if OE=1 or UE=1 is undefined).  (See
> 4.3.5.1 in the ISA).

4.3.5.1 in the ISA says that single precision arithmetic instructions
perform the operation in double format and coerce the result to single
format. Can fadd be considered this type of instruction, or do I
need to perform the add in DFmode and then use an "instruction provided to
explicitly convert double format operand in FPR to single format."?

Thanks,
Tejas


[-- Attachment #2: fadd-md.diff --]
[-- Type: text/x-patch, Size: 1380 bytes --]

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4ef1993..e4bfc4a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4652,6 +4652,21 @@
   [(set_attr "type" "fp")
    (set_attr "isa" "*,<Fisa>")])
 
+(define_expand "add_truncdfsf3"
+  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
+	(plus:DF (match_operand:DF 1 "gpc_reg_operand")
+		 (match_operand:DF 2 "gpc_reg_operand")))]
+  "TARGET_HARD_FLOAT"
+  "")
+
+(define_insn "*add_truncdfsf3_fpr"
+  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "=<Ff>"))
+	(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>")
+		 (match_operand:DF 2 "gpc_reg_operand" "<Ff>")))]
+  "TARGET_HARD_FLOAT"
+  "fadd %0,%1,%2"
+  [(set_attr "type" "fp")])
+
 (define_expand "sub<mode>3"
   [(set (match_operand:SFDF 0 "gpc_reg_operand")
 	(minus:SFDF (match_operand:SFDF 1 "gpc_reg_operand")
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 4ffd0f3..45be794 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -67,6 +67,7 @@ OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
 /* Misc optabs that use two modes; model them as "conversions".  */
+OPTAB_CD(fadd_optab, "add_trunc$b$a3")
 OPTAB_CD(smul_widen_optab, "mul$b$a3")
 OPTAB_CD(umul_widen_optab, "umul$b$a3")
 OPTAB_CD(usmul_widen_optab, "usmul$b$a3")

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-10 10:24               ` Tejas Joshi
@ 2019-08-10 16:46                 ` Segher Boessenkool
  2019-08-11  4:58                   ` Tejas Joshi
  2019-08-11 16:59                 ` Segher Boessenkool
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-10 16:46 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi!

On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> I have been trying to write a basic pattern following the suggestions
> you both have made. The patch is attached here, but I cannot
> see the call in:
> 
> float
> foo (double x, double y)
> {
>     return __builtin_fadd (x, y);
> }
> being expanded to any instruction, at least a simple one, using
> -fno-builtin-fadd (and also -mhard-float?). It always stays "bl fadd".
> What am I missing here?

As far as I understand that flag should set the behaviour of the fadd
function, not the __builtin_fadd one.  So I don't know.

> > (POWER8 and later) on.  (The result if OE=1 or UE=1 is undefined).  (See
> > 4.3.5.1 in the ISA).
> 
> 4.3.5.1 in the ISA says that single precision arithmetic instructions
> perform the operation in double format and coerce the result to single
> format. Can fadd be considered this type of instruction, or do I
> need to perform the add in DFmode and then use an "instruction provided to
> explicitly convert double format operand in FPR to single format."?

A single precision add is "fadds".  It rounds its result to single
precision.

I'm lost as to what the exact semantics of the wanted fadd() function are.
I thought you wanted to add two single precision numbers, producing a
double precision one.  But instead you want to add two double precision
numbers, producing a single precision one?  The fadds instruction fits
well to that, but you'll have to check exactly how the fadd() function
should behave with respect to rounding and exceptions and the like.

Segher

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-10 16:46                 ` Segher Boessenkool
@ 2019-08-11  4:58                   ` Tejas Joshi
  2019-08-11  7:20                     ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-11  4:58 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi!

> As far as I understand that flag should set the behaviour of the fadd
> function, not the __builtin_fadd one.  So I don't know.

According to ISO/IEC TS 18661, I am supposed to implement folding and
inline expansion for the fadd variants, which take double and long double
arguments and return the sum in the appropriate narrower type, float or
double. As far as I know, we use the __builtin_ prefix to call the
internal functions? I do not know which one the plain fadd function is.

> double precision one.  But instead you want to add two double precision
> numbers, producing a single precision one?  The fadds instruction fits

Yes.

> well to that, but you'll have to check exactly how the fadd() function
> should behave with respect to rounding and exceptions and the like.

Joseph's initial mail, which describes what should be carried out in
the course of the project, covers rounding and exceptions. I have strictly
followed this description for my folding patch:

* The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
from most other built-in math.h functions because the return type is
different from the argument types.  You could start by adding them to
builtins.def similarly to roundeven (with new macros to handle adding such
functions for relevant pairs of _FloatN, _FloatNx types).  These functions
could be folded for constant arguments only if the result is exact, or if
-fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
involves overflow / underflow).

Thanks,
Tejas


* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-11  4:58                   ` Tejas Joshi
@ 2019-08-11  7:20                     ` Segher Boessenkool
  2019-08-11 12:46                       ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-11  7:20 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Sun, Aug 11, 2019 at 10:34:26AM +0530, Tejas Joshi wrote:
> > As far as I understand that flag should set the behaviour of the fadd
> > function, not the __builtin_fadd one.  So I don't know.
> 
> According to ISO/IEC TS 18661, I am supposed to implement folding and
> inline expansion for the fadd variants, which take double and long double
> arguments and return the sum in the appropriate narrower type, float or
> double. As far as I know, we use the __builtin_ prefix to call the
> internal functions? I do not know which one the plain fadd function is.

See the manual, section "Other Built-in Functions Provided by GCC":

  @opindex fno-builtin
  GCC includes built-in versions of many of the functions in the standard
  C library.  These functions come in two forms: one whose names start with
  the @code{__builtin_} prefix, and the other without.  Both forms have the
  same type (including prototype), the same address (when their address is
  taken), and the same meaning as the C library functions even if you specify
  the @option{-fno-builtin} option (@pxref{C Dialect Options}).  Many of these
  functions are only optimized in certain cases; if they are not optimized in
  a particular case, a call to the library function is emitted.

> > double precision one.  But instead you want to add two double precision
> > numbers, producing a single precision one?  The fadds instruction fits
> 
> Yes.
> 
> > well to that, but you'll have to check exactly how the fadd() function
> > should behave with respect to rounding and exceptions and the like.

I read 18661-1 now...  and yup, "fadds" will work fine, and there are
no complications like this as far as I see.

For QP to either DP or SP, you can do round-to-odd followed by one of the
conversion instructions.  The ISA manual describes this; I can help you
with it, but first get DP->SP (fadd) working?

For the non-QP long doubles we have...  There is the option of using DP
for it, which isn't standard-compliant, many other archs have it too,
and it is simple anyway, because you have all code for operations
already.  You can mostly just ignore this option.

For double-double...  Well firstly, double-double is on the way out, so
adding new features to it is pretty useless?  Just ignore it unless you
have time left, I'd say.

> Joseph's initial mail, which describes what should be carried out in
> the course of the project, covers rounding and exceptions. I have strictly
> followed this description for my folding patch:
> 
> * The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
> from most other built-in math.h functions because the return type is
> different from the argument types.  You could start by adding them to
> builtins.def similarly to roundeven (with new macros to handle adding such
> functions for relevant pairs of _FloatN, _FloatNx types).  These functions
> could be folded for constant arguments only if the result is exact, or if
> -fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
> involves overflow / underflow).

For Power, all five basic operations (add, sub, mul, div, fma) work fine
wrt rounding mode if using the fadds etc. insns, for DP->SP.  All
exceptions work as expected, except maybe underflow and overflow, but
18661 doesn't require much at all for those anyway :-)

Segher

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-11  7:20                     ` Segher Boessenkool
@ 2019-08-11 12:46                       ` Tejas Joshi
  0 siblings, 0 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-11 12:46 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.

> with it, but first get DP->SP (fadd) working?

Can you please review what I have been trying, and the issues I am
facing, in the patch at:
<https://gcc.gnu.org/ml/gcc/2019-08/msg00078.html>

Thanks,
Tejas


On Sun, 11 Aug 2019 at 12:50, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi Tejas,
>
> On Sun, Aug 11, 2019 at 10:34:26AM +0530, Tejas Joshi wrote:
> > > As far as I understand that flag should set the behaviour of the fadd
> > > function, not the __builtin_fadd one.  So I don't know.
> >
> > According to ISO/IEC TS 18661, I am supposed to implement the fadd
> > variants for folding and expand them inline, that take double and long
> > double as arguments and return
> > addition in appropriate narrower type, float and double. As far as I
> > know, we use __builtin_ to call the internal functions? I do not know
> > which the only fadd function is.
>
> See the manual, section "Other Built-in Functions Provided by GCC":
>
>   @opindex fno-builtin
>   GCC includes built-in versions of many of the functions in the standard
>   C library.  These functions come in two forms: one whose names start with
>   the @code{__builtin_} prefix, and the other without.  Both forms have the
>   same type (including prototype), the same address (when their address is
>   taken), and the same meaning as the C library functions even if you specify
>   the @option{-fno-builtin} option @pxref{C Dialect Options}).  Many of these
>   functions are only optimized in certain cases; if they are not optimized in
>   a particular case, a call to the library function is emitted.
>
> > > double precision one.  But instead you want to add two double precision
> > > numbers, producing a single precision one?  The fadds instruction fits
> >
> > Yes.
> >
> > > well to that, but you'll have to check exactly how the fadd() function
> > > should behave with respect to rounding and exceptions and the like.
>
> I read 18661-1 now...  and yup, "fadds" will work fine, and there are
> no complications like this as far as I see.
>
> For QP to either DP or SP, you can do round-to-odd followed by one of the
> conversion instructions.  The ISA manual describes this; I can help you
> with it, but first get DP->SP (fadd) working?
>
> For the non-QP long doubles we have...  There is the option of using DP
> for it, which isn't standard-compliant, many other archs have it too,
> and it is simple anyway, because you have all code for operations
> already.  You can mostly just ignore this option.
>
> For double-double...  Well firstly, double-double is on the way out, so
> adding new features to it is pretty useless?  Just ignore it unless you
> have time left, I'd say.
>
> > In Joseph's initial mail that describes what should be carried out in
> > the course of project, about rounding and exceptions. I have strictly
> > followed this description for my folding patch :
> >
> > * The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
> > from most other built-in math.h functions because the return type is
> > different from the argument types.  You could start by adding them to
> > builtins.def similarly to roundeven (with new macros to handle adding such
> > functions for relevant pairs of _FloatN, _FloatNx types).  These functions
> > could be folded for constant arguments only if the result is exact, or if
> > -fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
> > involves overflow / underflow).
>
> For Power, all five basic operations (add, sub, mul, div, fma) work fine
> wrt rounding mode if using the fadds etc. insns, for DP->SP.  All
> exceptions work as expected, except maybe underflow and overflow, but
> 18661 doesn't require much at all for those anyway :-)
>
>
> Segher
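
The round-to-odd trick mentioned above for QP can be illustrated one
level down, for DP->SP, in portable C.  This is only a sketch under
stated assumptions: narrowing_add_sketch is a hypothetical helper (not a
GCC built-in or a libm function), and it assumes IEEE 754 binary64 and
binary32, round-to-nearest mode, evaluation in double precision
(FLT_EVAL_METHOD == 0) and no -ffast-math.  It rounds the double sum to
odd using the exact rounding error, so that the final conversion to
float is the only rounding that decides the result.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add two doubles and round the exact sum to float
   only once, by rounding the double sum to odd before narrowing.  */
static float narrowing_add_sketch(double a, double b)
{
    double s = a + b;                 /* sum rounded to nearest double */

    /* Knuth's TwoSum: e is the rounding error of the addition, so
       a + b == s + e holds exactly (barring overflow).  */
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);

    if (isfinite(s) && e != 0.0) {
        uint64_t bits;
        memcpy(&bits, &s, sizeof bits);
        if ((bits & 1) == 0) {
            /* s is inexact and its last bit is even: step to the odd
               neighbour on the side of the exact sum.  If e has the
               same sign as s, the exact sum has the larger magnitude.  */
            if ((e > 0.0) == (s > 0.0))
                bits += 1;
            else
                bits -= 1;
            memcpy(&s, &bits, sizeof bits);
        }
    }
    return (float) s;                 /* the rounding that decides the result */
}
```

For a = 0x1.000001p0 (1 + 2^-24) and b = 0x1p-60 the helper returns
0x1.000002p0f, whereas the plain (float) (a + b) gives 1.0f.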

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-10 10:24               ` Tejas Joshi
  2019-08-10 16:46                 ` Segher Boessenkool
@ 2019-08-11 16:59                 ` Segher Boessenkool
  2019-08-12 17:25                   ` Tejas Joshi
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-11 16:59 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> +(define_expand "add_truncdfsf3"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
> +	(plus:DF (match_operand:DF 1 "gpc_reg_operand")
> +		 (match_operand:DF 2 "gpc_reg_operand")))]
> +  "TARGET_HARD_FLOAT"
> +  "")

float_extend on the LHS is never correct.  I think the following should
work, never mind that it looks like it does double rounding, because it
doesn't (famous last words ;-) ):

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
	(float_truncate:SF
	  (plus:DF (match_operand:DF 1 "gpc_reg_operand")
		   (match_operand:DF 2 "gpc_reg_operand"))))]
  "TARGET_HARD_FLOAT"
  "")

> +(define_insn "*add_truncdfsf3_fpr"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "=<Ff>"))
> +	(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>")
> +		 (match_operand:DF 2 "gpc_reg_operand" "<Ff>")))]
> +  "TARGET_HARD_FLOAT"
> +  "fadd %0,%1,%2"
> +  [(set_attr "type" "fp")])

The constraints should be "f", "%d", "d", respectively.  <Ff> says to
display something for the mode in a mode iterator.  There is no mode
iterator here.  (In what you copied this from, there was SFDF).

You want to output "fadds", not "fadd".

Maybe it is easier to immediately write the VSX scalar version for this
as well?  That's xsaddsp.  Oh, and you need to restrict all of this to
more recent CPUs, we'll have to do some new TARGET_* flag for that I
think.

Finally: please send patches to gcc-patches@ (not gcc@).

Thanks,

Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-11 16:59                 ` Segher Boessenkool
@ 2019-08-12 17:25                   ` Tejas Joshi
  2019-08-12 17:55                     ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-12 17:25 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi,
I have the following code in my rs6000.md (I haven't used new TARGET_* yet) :

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
       (float_truncate:SF
       (plus:DF (match_operand:DF 1 "gpc_reg_operand")
                (match_operand:DF 2 "gpc_reg_operand"))))]
  "TARGET_HARD_FLOAT"
  "")

(define_insn "*add_truncdfsf3_fpr"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
       (float_truncate:SF
       (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
                (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

with following optab in optabs.def :

OPTAB_CD(fadd_optab, "add_trunc$b$a3")             (what is the
difference between $b$a and $a$b?)

I have also tried adding fadd, add_truncdfsf3 in rs6000-builtin.def,
examined rtl dumps multiple times but couldn't get fadd to be
expanded. What am I missing here?

Thanks,
Tejas

On Sun, 11 Aug 2019 at 22:29, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi Tejas,
>
> On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> > +(define_expand "add_truncdfsf3"
> > +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
> > +     (plus:DF (match_operand:DF 1 "gpc_reg_operand")
> > +              (match_operand:DF 2 "gpc_reg_operand")))]
> > +  "TARGET_HARD_FLOAT"
> > +  "")
>
> float_extend on the LHS is never correct.  I think the following should
> work, never mind that it looks like it does double rounding, because it
> doesn't (famous last words ;-) ):
>
> (define_expand "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand")
>         (float_truncate:SF
>           (plus:DF (match_operand:DF 1 "gpc_reg_operand")
>                    (match_operand:DF 2 "gpc_reg_operand"))))]
>   "TARGET_HARD_FLOAT"
>   "")
>
> > +(define_insn "*add_truncdfsf3_fpr"
> > +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "=<Ff>"))
> > +     (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>")
> > +              (match_operand:DF 2 "gpc_reg_operand" "<Ff>")))]
> > +  "TARGET_HARD_FLOAT"
> > +  "fadd %0,%1,%2"
> > +  [(set_attr "type" "fp")])
>
> The constraints should be "f", "%d", "d", respectively.  <Ff> says to
> display something for the mode in a mode iterator.  There is no mode
> iterator here.  (In what you copied this from, there was SFDF).
>
> You want to output "fadds", not "fadd".
>
> Maybe it is easier to immediately write the VSX scalar version for this
> as well?  That's xsaddsp.  Oh, and you need to restrict all of this to
> more recent CPUs, we'll have to do some new TARGET_* flag for that I
> think.
>
> Finally: please send patches to gcc-patches@ (not gcc@).
>
> Thanks,
>
>
> Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-12 17:25                   ` Tejas Joshi
@ 2019-08-12 17:55                     ` Segher Boessenkool
  2019-08-12 21:20                       ` Joseph Myers
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-12 17:55 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Mon, Aug 12, 2019 at 11:01:11PM +0530, Tejas Joshi wrote:
> I have the following code in my rs6000.md (I haven't used new TARGET_* yet) :
> 
> (define_expand "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand")
>        (float_truncate:SF
>        (plus:DF (match_operand:DF 1 "gpc_reg_operand")
>                 (match_operand:DF 2 "gpc_reg_operand"))))]
>   "TARGET_HARD_FLOAT"
>   "")
> 
> (define_insn "*add_truncdfsf3_fpr"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>        (float_truncate:SF
>        (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
>                 (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")])

Those look fine.  You can also merge them into one:

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
	(float_truncate:SF
	  (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
		   (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

> with following optab in optabs.def :
> 
> OPTAB_CD(fadd_optab, "add_trunc$b$a3")             (what is the
> difference between $b$a and $a$b?)

Which of the two modes becomes $a and which becomes $b?  It depends on
the definition of fadd_optab what order is expected, I think.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-12 17:55                     ` Segher Boessenkool
@ 2019-08-12 21:20                       ` Joseph Myers
  2019-08-12 21:52                         ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-12 21:20 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Mon, 12 Aug 2019, Segher Boessenkool wrote:

> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> 	(float_truncate:SF
> 	  (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> 		   (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]

That sort of pattern is incorrect for a fused operation such as fadd, 
because combine could match it for code that is supposed to do separate 
addition and narrowing conversion.  The RTL needs to be something that 
does *not* match the combination of separate operations (just as fma has 
its own RTL, and a separate pass is responsible for converting separate 
operations to fused ones in the -ffp-contract=fast case where it's 
permitted).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-12 21:20                       ` Joseph Myers
@ 2019-08-12 21:52                         ` Segher Boessenkool
  2019-08-14  6:15                           ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-12 21:52 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Mon, Aug 12, 2019 at 09:20:18PM +0000, Joseph Myers wrote:
> On Mon, 12 Aug 2019, Segher Boessenkool wrote:
> 
> > (define_insn "add_truncdfsf3"
> >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > 	(float_truncate:SF
> > 	  (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > 		   (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
> 
> That sort of pattern is incorrect for a fused operation such as fadd, 
> because combine could match it for code that is supposed to do separate 
> addition and narrowing conversion.  The RTL needs to be something that 
> does *not* match the combination of separate operations (just as fma has 
> its own RTL, and a separate pass is responsible for converting separate 
> operations to fused ones in the -ffp-contract=fast case where it's 
> permitted).

Ugh, we allow disabling contraction, I forgot.  Rats.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-12 21:52                         ` Segher Boessenkool
@ 2019-08-14  6:15                           ` Tejas Joshi
  2019-08-14  7:21                             ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-14  6:15 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

> The RTL needs to be something that
> does *not* match the combination of separate operations (just as fma has
> its own RTL, and a separate pass is responsible for converting separate

So do I need to introduce fadd's own RTL, just as for fma, which would
emit a fused instruction when -ffp-contract is at its default (fast) and
would emit separate instructions (an add in DFmode followed by a truncate
to SF) when -ffp-contract=off (just as for fma)?


On Tue, 13 Aug 2019 at 03:22, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Mon, Aug 12, 2019 at 09:20:18PM +0000, Joseph Myers wrote:
> > On Mon, 12 Aug 2019, Segher Boessenkool wrote:
> >
> > > (define_insn "add_truncdfsf3"
> > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > >     (float_truncate:SF
> > >       (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > >                (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
> >
> > That sort of pattern is incorrect for a fused operation such as fadd,
> > because combine could match it for code that is supposed to do separate
> > addition and narrowing conversion.  The RTL needs to be something that
> > does *not* match the combination of separate operations (just as fma has
> > its own RTL, and a separate pass is responsible for converting separate
> > operations to fused ones in the -ffp-contract=fast case where it's
> > permitted).
>
> Ugh, we allow disabling contraction, I forgot.  Rats.
>
>
> Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14  6:15                           ` Tejas Joshi
@ 2019-08-14  7:21                             ` Segher Boessenkool
  2019-08-14 16:11                               ` Joseph Myers
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14  7:21 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Wed, Aug 14, 2019 at 11:51:28AM +0530, Tejas Joshi wrote:
> > The RTL needs to be something that
> > does *not* match the combination of separate operations (just as fma has
> > its own RTL, and a separate pass is responsible for converting separate
> 
> So do I need to introduce fadd's own RTL, just as for fma, which would
> emit a fused instruction when -ffp-contract is at its default (fast) and
> would emit separate instructions (an add in DFmode followed by a truncate
> to SF) when -ffp-contract=off (just as for fma)?

I think you can do one RTL code that replaces float_truncate in

> > > > (define_insn "add_truncdfsf3"
> > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > >     (float_truncate:SF
> > > >       (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > >                (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]

but that is only meant for such explicit contraction.  This can then
happily be used to implement all such patterns.  Is there some issue
with that I overlook?

A good name for this...  I would say "float_contract", because I like
horrible names.  It shouldn't be hard to think of something better :-)


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14  7:21                             ` Segher Boessenkool
@ 2019-08-14 16:11                               ` Joseph Myers
  2019-08-14 20:21                                 ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-14 16:11 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, 14 Aug 2019, Segher Boessenkool wrote:

> I think you can do one RTL code that replaces float_truncate in
> 
> > > > > (define_insn "add_truncdfsf3"
> > > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > > >     (float_truncate:SF
> > > > >       (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > > >                (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
> 
> but that is only meant for such explicit contraction.  This can then
> happily be used to implement all such patterns.  Is there some issue
> with that I overlook?

Yes, I think such a separate RTL code would work (as would an 
architecture-specific UNSPEC) - it just needs to avoid the pattern 
matching RTL that can arise other than from the built-in functions.

(Everything to do with needing -fno-math-errno to expand into such 
instructions should be handled in the architecture-independent compiler.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14 16:11                               ` Joseph Myers
@ 2019-08-14 20:21                                 ` Segher Boessenkool
  2019-08-14 20:23                                   ` Joseph Myers
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14 20:21 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, Aug 14, 2019 at 04:10:56PM +0000, Joseph Myers wrote:
> On Wed, 14 Aug 2019, Segher Boessenkool wrote:
> 
> > I think you can do one RTL code that replaces float_truncate in
> > 
> > > > > > (define_insn "add_truncdfsf3"
> > > > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > > > >     (float_truncate:SF
> > > > > >       (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > > > >                (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
> > 
> > but that is only meant for such explicit contraction.  This can then
> > happily be used to implement all such patterns.  Is there some issue
> > with that I overlook?
> 
> Yes, I think such a separate RTL code would work (as would an 
> architecture-specific UNSPEC) - it just needs to avoid the pattern 
> matching RTL that can arise other than from the built-in functions.
> 
> (Everything to do with needing -fno-math-errno to expand into such 
> instructions should be handled in the architecture-independent compiler.)

Does something like
  float d; double a, b, x;
  ...
  d = fadd (a + x, b - x);
work as wanted, with such a representation?  It would simplify (does it?) to
  d = fadd (a, b);
but is that allowed?


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14 20:21                                 ` Segher Boessenkool
@ 2019-08-14 20:23                                   ` Joseph Myers
  2019-08-14 21:00                                     ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-14 20:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, 14 Aug 2019, Segher Boessenkool wrote:

> Does something like
>   float d; double a, b, x;
>   ...
>   d = fadd (a + x, b - x);
> work as wanted, with such a representation?  It would simplify (does it?) to
>   d = fadd (a, b);
> but is that allowed?

It's not allowed, but neither is simplifying (a + x) + (b - x) into (a + 
b), when contraction isn't allowed.

-- 
Joseph S. Myers
joseph@codesourcery.com
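
A small C illustration of why that simplification is not value
preserving in floating point.  The function names are made up for this
example, and it assumes IEEE 754 double arithmetic compiled without
-ffast-math or -fassociative-math.

```c
/* The expression as written: each operation rounds on its own.  */
static double sum_as_written(double a, double b, double x)
{
    return (a + x) + (b - x);
}

/* The algebraically "simplified" form.  */
static double sum_simplified(double a, double b)
{
    return a + b;
}
```

With a = b = 1.0 and x = 0x1p100, a + x rounds to 2^100 and b - x rounds
to -2^100, so sum_as_written yields 0.0 while sum_simplified yields 2.0.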

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14 20:23                                   ` Joseph Myers
@ 2019-08-14 21:00                                     ` Segher Boessenkool
  2019-08-15  9:52                                       ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14 21:00 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, Aug 14, 2019 at 08:23:27PM +0000, Joseph Myers wrote:
> On Wed, 14 Aug 2019, Segher Boessenkool wrote:
> 
> > Does something like
> >   float d; double a, b, x;
> >   ...
> >   d = fadd (a + x, b - x);
> > work as wanted, with such a representation?  It would simplify (does it?) to
> >   d = fadd (a, b);
> > but is that allowed?
> 
> It's not allowed, but neither is simplifying (a + x) + (b - x) into (a + 
> b), when contraction isn't allowed.

Ah of course.  And we already should not do such simplification on RTL,
when contraction is disallowed.

So yeah it should work fine I think.  A new RTL code would be best (it
would be silly to have to make an unspec for it in every port separately),
but an unspec is of course easiest for now.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-14 21:00                                     ` Segher Boessenkool
@ 2019-08-15  9:52                                       ` Tejas Joshi
  2019-08-15 12:47                                         ` Richard Sandiford
  2019-08-15 18:54                                         ` Segher Boessenkool
  0 siblings, 2 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-15  9:52 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
I just wanted to make sure that I am looking at the correct code here.
Except for rtl.def where I should be introducing something like
float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
set on functions around expr.c, cfgexpand.c where I grep for
float_truncate/FLOAT_TRUNCATE did not hit.
Also, in what manner should float_contract/narrow be different from
float_truncate as both are trying to do similar things? (truncation
from DF to SF)

Thanks,
Tejas


On Thu, 15 Aug 2019 at 02:30, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Wed, Aug 14, 2019 at 08:23:27PM +0000, Joseph Myers wrote:
> > On Wed, 14 Aug 2019, Segher Boessenkool wrote:
> >
> > > Does something like
> > >   float d; double a, b, x;
> > >   ...
> > >   d = fadd (a + x, b - x);
> > > work as wanted, with such a representation?  It would simplify (does it?) to
> > >   d = fadd (a, b);
> > > but is that allowed?
> >
> > It's not allowed, but neither is simplifying (a + x) + (b - x) into (a +
> > b), when contraction isn't allowed.
>
> Ah of course.  And we already should not do such simplification on RTL,
> when contraction is disallowed.
>
> So yeah it should work fine I think.  A new RTL code would be best (it
> would be silly to have to make an unspec for it in every port separately),
> but an unspec is of course easiest for now.
>
>
> Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-15  9:52                                       ` Tejas Joshi
@ 2019-08-15 12:47                                         ` Richard Sandiford
  2019-08-15 13:55                                           ` Tejas Joshi
  2019-08-15 18:45                                           ` Segher Boessenkool
  2019-08-15 18:54                                         ` Segher Boessenkool
  1 sibling, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-15 12:47 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, segher, joseph

Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> Hello.
> I just wanted to make sure that I am looking at the correct code here.
> Except for rtl.def where I should be introducing something like
> float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
> set on functions around expr.c, cfgexpand.c where I grep for
> float_truncate/FLOAT_TRUNCATE did not hit.
> Also, in what manner should float_contract/narrow be different from
> float_truncate as both are trying to do similar things? (truncation
> from DF to SF)

I think the code should instead be a fused addition and truncation,
a bit like FMA is a fused addition and multiplication.  Describing it as
a DFmode addition followed by some conversion to SF would still involve
double rounding.

simplify-rtx.c is probably the most important place to handle it.
It would be easiest to test using the selftests at the end of the file.

Thanks,
Richard
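
To make the double-rounding concern concrete, here is a minimal C sketch
of the separate-operations reading, i.e. a DFmode addition followed by a
conversion to SF.  The function name is hypothetical, and the sketch
assumes IEEE 754 binary64 and binary32 in round-to-nearest mode with
evaluation in double precision.

```c
/* Separate operations, in the spirit of
   (float_truncate:SF (plus:DF a b)) taken literally: the sum is
   rounded to double first, then rounded again to float.  */
static float add_then_truncate(double a, double b)
{
    double sum = a + b;     /* first rounding: to double */
    return (float) sum;     /* second rounding: to float */
}
```

Take a = 0x1.000001p0 (1 + 2^-24, exactly midway between the floats 1
and 1 + 2^-23) and b = 0x1p-60.  The double sum rounds back to a, and
the tie then rounds to even, giving 1.0f.  The exact sum lies above the
midpoint, so a single rounding to float would give 0x1.000002p0f
instead.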

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-15 12:47                                         ` Richard Sandiford
@ 2019-08-15 13:55                                           ` Tejas Joshi
  2019-08-15 18:45                                           ` Segher Boessenkool
  1 sibling, 0 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-15 13:55 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph, richard.sandiford

> I think the code should instead be a fused addition and truncation,
> a bit like FMA is a fused addition and multiplication.  Describing it as
> a DFmode addition followed by some conversion to SF would still involve
> double rounding.

In that case, something like FADD. But for functions like fsub, fmul
and fdiv that do similar computations, wouldn't we need more
operation codes for them?
Is it possible to have something generalized that does *arithmetic
computation (rather than just addition)* and then *conversion
(narrowing)*? Just a thought.

Thanks,
Tejas


On Thu, 15 Aug 2019 at 18:17, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> > Hello.
> > I just wanted to make sure that I am looking at the correct code here.
> > Except for rtl.def where I should be introducing something like
> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
> > set on functions around expr.c, cfgexpand.c where I grep for
> > float_truncate/FLOAT_TRUNCATE did not hit.
> > Also, in what manner should float_contract/narrow be different from
> > float_truncate as both are trying to do similar things? (truncation
> > from DF to SF)
>
> I think the code should instead be a fused addition and truncation,
> a bit like FMA is a fused addition and multiplication.  Describing it as
> a DFmode addition followed by some conversion to SF would still involve
> double rounding.
>
> simplify-rtx.c is probably the most important place to handle it.
> It would be easiest to test using the selftests at the end of the file.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-15 12:47                                         ` Richard Sandiford
  2019-08-15 13:55                                           ` Tejas Joshi
@ 2019-08-15 18:45                                           ` Segher Boessenkool
  2019-08-16 10:23                                             ` Richard Sandiford
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-15 18:45 UTC (permalink / raw)
  To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> > Hello.
> > I just wanted to make sure that I am looking at the correct code here.
> > Except for rtl.def where I should be introducing something like
> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints

I like that "float_narrow" name :-)

> > set on functions around expr.c, cfgexpand.c where I grep for
> > float_truncate/FLOAT_TRUNCATE did not hit.
> > Also, in what manner should float_contract/narrow be different from
> > float_truncate as both are trying to do similar things? (truncation
> > from DF to SF)
> 
> I think the code should instead be a fused addition and truncation,
> a bit like FMA is a fused addition and multiplication.  Describing it as
> a DFmode addition followed by some conversion to SF would still involve
> double rounding.

How so?  It would *mean* there is only single rounding, even!  That's
the whole point of it.

> simplify-rtx.c is probably the most important place to handle it.
> It would be easiest to test using the selftests at the end of the file.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-15  9:52                                       ` Tejas Joshi
  2019-08-15 12:47                                         ` Richard Sandiford
@ 2019-08-15 18:54                                         ` Segher Boessenkool
  1 sibling, 0 replies; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-15 18:54 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Thu, Aug 15, 2019 at 03:29:03PM +0530, Tejas Joshi wrote:
> Also, in what manner should float_contract/narrow be different from
> float_truncate as both are trying to do similar things? (truncation
> from DF to SF)

It's just a different name, nothing more, nothing less.  Because it is
a different name it can not be accidentally generated from actual
truncations.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-15 18:45                                           ` Segher Boessenkool
@ 2019-08-16 10:23                                             ` Richard Sandiford
  2019-08-17  5:40                                               ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Sandiford @ 2019-08-16 10:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
>> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
>> > Hello.
>> > I just wanted to make sure that I am looking at the correct code here.
>> > Except for rtl.def where I should be introducing something like
>> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
>
> I like that "float_narrow" name :-)
>
>> > set on functions around expr.c, cfgexpand.c where I grep for
>> > float_truncate/FLOAT_TRUNCATE did not hit.
>> > Also, in what manner should float_contract/narrow be different from
>> > float_truncate as both are trying to do similar things? (truncation
>> > from DF to SF)
>> 
>> I think the code should instead be a fused addition and truncation,
>> a bit like FMA is a fused addition and multiplication.  Describing it as
>> a DFmode addition followed by some conversion to SF would still involve
>> double rounding.
>
> How so?  It would *mean* there is only single rounding, even!  That's
> the whole point of it.

But a PLUS should behave as a PLUS in any context.  Making its
behaviour dependent on the containing rtxes (if any) would be a
can of worms.

Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-16 10:23                                             ` Richard Sandiford
@ 2019-08-17  5:40                                               ` Tejas Joshi
  2019-08-17  8:21                                                 ` Richard Sandiford
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-17  5:40 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi,

> It's just a different name, nothing more, nothing less.  Because it is
> a different name it can not be accidentally generated from actual
> truncations.

I have introduced float_narrow but I could not find the appropriate places
to generate it for a call to fadd instead of generating a CALL. I
used GDB to set breakpoints which hit fold_rtx and cse_insn but I got
confused by the rtx codes and the passes which generate the respective RTL.
Shouldn't it be handled differently from FLOAT_TRUNCATE if we want to
avoid generating it for actual truncations?

Thanks,
Tejas


On Fri, 16 Aug 2019 at 15:53, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
> >> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> >> > Hello.
> >> > I just wanted to make sure that I am looking at the correct code here.
> >> > Except for rtl.def where I should be introducing something like
> >> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
> >
> > I like that "float_narrow" name :-)
> >
> >> > set on functions around expr.c, cfgexpand.c where I grep for
> >> > float_truncate/FLOAT_TRUNCATE did not hit.
> >> > Also, in what manner should float_contract/narrow be different from
> >> > float_truncate as both are trying to do similar things? (truncation
> >> > from DF to SF)
> >>
> >> I think the code should instead be a fused addition and truncation,
> >> a bit like FMA is a fused addition and multiplication.  Describing it as
> >> a DFmode addition followed by some conversion to SF would still involve
> >> double rounding.
> >
> > How so?  It would *mean* there is only single rounding, even!  That's
> > the whole point of it.
>
> But a PLUS should behave as a PLUS in any context.  Making its
> behaviour dependent on the containing rtxes (if any) would be a
> can of worms.
>
> Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-17  5:40                                               ` Tejas Joshi
@ 2019-08-17  8:21                                                 ` Richard Sandiford
  2019-08-19 10:46                                                   ` Tejas Joshi
  2019-08-19 13:07                                                   ` Segher Boessenkool
  0 siblings, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-17  8:21 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, segher, joseph

Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> Hi,
>
>> It's just a different name, nothing more, nothing less.  Because it is
>> a different name it can not be accidentally generated from actual
>> truncations.
>
> I have introduced float_narrow, but I could not find the appropriate places
> to generate it for a call to fadd instead of generating a CALL.  I
> used GDB to set breakpoints, which hit fold_rtx and cse_insn, but I got
> confused by the rtx codes and the passes that generate the respective RTL.
> It should not be similar to FLOAT_TRUNCATE if we want to avoid
> generating it for actual truncations?

Please don't do it this way.  The whole point of the work is that this
is a single operation that cannot be modelled as a post-processing of
a normal double addition result.  It's a single operation at the source
level, a single IFN, a single optab, and a single instruction.  Splitting
it apart into two operations for rtl only, and making it look in rtl terms
like a post-processing of a normal addition result, seems like it's going
to come back to bite us.
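
To make the "single operation" point concrete, here is a minimal sketch
in Python using exact rationals.  The helper round_to_binary is purely
illustrative (it is not GCC code, and it ignores exponent range, so no
overflow or subnormals).  It compares a DFmode addition followed by a
DF->SF conversion against one rounding of the exact sum to SF precision,
which is what fadd asks for:

```python
from fractions import Fraction


def round_to_binary(q: Fraction, precision: int) -> Fraction:
    """Round a positive exact rational to `precision` significant bits,
    round-to-nearest, ties-to-even (exponent range is ignored)."""
    assert q > 0
    e = 0
    # Scale q by a power of two so that 2**(p-1) <= q / 2**e < 2**p.
    while q / Fraction(2) ** e >= 2 ** precision:
        e += 1
    while q / Fraction(2) ** e < 2 ** (precision - 1):
        e -= 1
    # round() on a Fraction rounds half to even.
    return round(q / Fraction(2) ** e) * Fraction(2) ** e


# Both inputs are exactly representable as IEEE doubles.
a = Fraction(1)
b = Fraction(1, 2 ** 24) + Fraction(1, 2 ** 76)
exact = a + b

# Model of a DFmode PLUS followed by a conversion to SF: two roundings.
two_step = round_to_binary(round_to_binary(exact, 53), 24)

# Model of a fused add-and-narrow: one rounding of the exact sum.
fused = round_to_binary(exact, 24)

print(two_step == 1)                          # the tie rounds down to 1
print(fused == 1 + Fraction(1, 2 ** 23))      # the exact sum is above the tie
```

The first rounding discards the 2**-76 term and leaves an exact tie, so
the second rounding goes to even; the single rounding still sees the
term and rounds up.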

In lisp terms we're saying that the operand to the float_narrow is
implicitly quoted:

  (float_narrow:m '(plus:n a b))

so that when float_narrow is evaluated, the argument is the unevaluated
rtl expression "(plus a b)" rather than the evaluated result a + b.
float_narrow then does its own evaluation of a and b and performs a
fused addition and narrowing on the result.

No other rtx rvalue works like this.  rtx mappings like simplification
or evaluation are normally depth-first, so that the mapping is applied
to the operands first, and then the root is mapped/simplified/evaluated
with the results.  Adding implicit lisp quoting would require special
cases in these routines for float_narrow.
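
As a rough sketch of the depth-first behaviour described above, here is
a toy folder in Python.  It is not how simplify-rtx.c is written; the
tuple encoding and the code name "narrow_plus" are hypothetical and only
stand in for "a dedicated fused code".  The point is that an ordinary
depth-first walk evaluates the inner plus before the wrapper is ever
looked at:

```python
def simplify(rtx):
    """Toy depth-first folding: simplify the operands, then the root."""
    if not isinstance(rtx, tuple):
        return rtx                      # a leaf (here: a constant)
    code, *ops = rtx
    ops = [simplify(op) for op in ops]  # operands first
    if code == "plus" and all(isinstance(op, float) for op in ops):
        return ops[0] + ops[1]          # ordinary double-precision add
    return (code, *ops)                 # unknown codes are left alone


x = 1.0
y = 2.0 ** -24 + 2.0 ** -76

# Wrapper form: the inner plus looks like any other plus.
nested = ("float_narrow", ("plus", x, y))
# Dedicated form (hypothetical code name): the operands stay visible.
dedicated = ("narrow_plus", x, y)

folded_nested = simplify(nested)
folded_dedicated = simplify(dedicated)

print(folded_nested)      # the plus is gone; only its rounded result remains
print(folded_dedicated)   # unchanged
```

Without a special case for "float_narrow" in simplify, the wrapper ends
up holding an already-rounded double, and the original operands can no
longer be recovered from it.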

The only current analogue I can think of for this is the handling
of zero_extend on const_ints.  Because const_ints are modeless, we have
to avoid cases in which the recursion produces things like:

  (zero_extend:m (const_int -1))

because it's no longer clear what mode the zero_extend is extending from.
But I think that's seen as a wart of having modeless const_ints.  I don't
think it's something we should actively embrace by adding float_narrow.
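
A tiny Python illustration of why the source mode matters here (the
helper zero_extend and its from_bits parameter are made up for the
example; they just model "extend from a mode of this width"):

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Zero-extend the low `from_bits` bits of a two's-complement value."""
    return value & ((1 << from_bits) - 1)


# The same modeless constant -1 gives different results depending on
# the mode it is assumed to be extended from.
from_qi = zero_extend(-1, 8)
from_hi = zero_extend(-1, 16)

print(hex(from_qi), hex(from_hi))
```

With only (const_int -1) left as the operand, nothing in the rtx says
which of these two values was meant.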

Using float_narrow would also be inconsistent with the way we handle
saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
unsigned and signed saturating plus respectively, rather than:

  (unsigned_sat '(plus a b))
  (signed_sat '(plus a b))

Using dedicated codes might seem clunky.  But it's simple, safe, and fits
the existing model without special cases. :-)

Thanks,
Richard


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-17  8:21                                                 ` Richard Sandiford
@ 2019-08-19 10:46                                                   ` Tejas Joshi
  2019-08-19 13:07                                                   ` Segher Boessenkool
  1 sibling, 0 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-19 10:46 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph, richard.sandiford

> but an unspec is of course easiest for now.

So, at this point, should I proceed with UNSPEC considering the
complications that might arise as Richard points out?



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-17  8:21                                                 ` Richard Sandiford
  2019-08-19 10:46                                                   ` Tejas Joshi
@ 2019-08-19 13:07                                                   ` Segher Boessenkool
  2019-08-20  7:41                                                     ` Richard Sandiford
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-19 13:07 UTC (permalink / raw)
  To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

Hi Richard,

On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote:
> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> >> It's just a different name, nothing more, nothing less.  Because it is
> >> a different name it can not be accidentally generated from actual
> >> truncations.
> >
> > I have introduced float_narrow, but I could not find the appropriate places
> > to generate it for a call to fadd instead of generating a CALL.  I
> > used GDB to set breakpoints, which hit fold_rtx and cse_insn, but I got
> > confused by the rtx codes and the passes that generate the respective RTL.
> > It should not be similar to FLOAT_TRUNCATE if we want to avoid
> > generating it for actual truncations?
> 
> Please don't do it this way.  The whole point of the work is that this
> is a single operation that cannot be modelled as a post-processing of
> a normal double addition result.  It's a single operation at the source
> level, a single IFN, a single optab, and a single instruction.  Splitting
> it apart into two operations for rtl only, and making it look in rtl terms
> like a post-processing of a normal addition result, seems like it's going
> to come back to bite us.
> 
> In lisp terms we're saying that the operand to the float_narrow is
> implicitly quoted:
> 
>   (float_narrow:m '(plus:n a b))
> 
> so that when float_narrow is evaluated, the argument is the unevaluated
> rtl expression "(plus a b)" rather than the evaluated result a + b.
> float_narrow then does its own evaluation of a and b and performs a
> fused addition and narrowing on the result.

RTL isn't Lisp.  RTL doesn't have quotations.  RTL doesn't have
*evaluation*.

RTL is just a data structure that describes your program instructions.
A large part of what means what is system-specific.  Rounding of floating
point is not defined, for example.

And yes, various parts of GCC can manipulate RTL, doing substitution and
algebraic simplification and whatnot.  All within the rules of RTL.  And
that means nothing ever can "pass" a float_narrow, because there are no
rules that allow it to.

> No other rtx rvalue works like this.

A lot of unspecs are used like this, for example.

> Using float_narrow would also be inconsistent with the way we handle
> saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
> unsigned and signed saturating plus respectively, rather than:
> 
>   (unsigned_sat '(plus a b))
>   (signed_sat '(plus a b))
> 
> Using dedicated codes might seem clunky.  But it's simple, safe, and fits
> the existing model without special cases. :-)

And you need many many more RTX codes, which you will not handle in
almost all places, because there are too many.

I agree this construct is not as nice as could be hoped for.  I don't
agree that 60 new RTX codes is an acceptable solution (or that that will
ever really work out, even).

It would be nice if somehow we could make a variant of RTL codes, so that
we could have nice and simple code that applies to all variants of some
code.  Not sure how that would work out.  Maybe we don't have to do this
very generically, how often will we need this anyway?

I have three examples so far:
1) Saturating arithmetic;
2) This float_narrow thing;
3) Ordered compares, that is, fp compares that set an exception on NaNs.

Something that works for all three would be nice!

Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-19 13:07                                                   ` Segher Boessenkool
@ 2019-08-20  7:41                                                     ` Richard Sandiford
  2019-08-20 12:11                                                       ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Sandiford @ 2019-08-20  7:41 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Tejas: given the controversy, I agree unspecs sound like a good approach
for now.  We can always go back and add the rtx codes later once there's
agreement on what they should look like.

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote:
>> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
>> >> It's just a different name, nothing more, nothing less.  Because it is
>> >> a different name it can not be accidentally generated from actual
>> >> truncations.
>> >
>> > I have introduced float_narrow, but I could not find the appropriate places
>> > to generate it for a call to fadd instead of generating a CALL.  I
>> > used GDB to set breakpoints, which hit fold_rtx and cse_insn, but I got
>> > confused by the rtx codes and the passes that generate the respective RTL.
>> > It should not be similar to FLOAT_TRUNCATE if we want to avoid
>> > generating it for actual truncations?
>> 
>> Please don't do it this way.  The whole point of the work is that this
>> is a single operation that cannot be modelled as a post-processing of
>> a normal double addition result.  It's a single operation at the source
>> level, a single IFN, a single optab, and a single instruction.  Splitting
>> it apart into two operations for rtl only, and making it look in rtl terms
>> like a post-processing of a normal addition result, seems like it's going
>> to come back to bite us.
>> 
>> In lisp terms we're saying that the operand to the float_narrow is
>> implicitly quoted:
>> 
>>   (float_narrow:m '(plus:n a b))
>> 
>> so that when float_narrow is evaluated, the argument is the unevaluated
>> rtl expression "(plus a b)" rather than the evaluated result a + b.
>> float_narrow then does its own evaluation of a and b and performs a
>> fused addition and narrowing on the result.
>
> RTL isn't Lisp.

Right.  But it's heavily influenced by lisp, so I was using quoting to
explain why I don't think the code is a good fit.

> RTL doesn't have quotations.

I'd like to keep it that way for rvalues :-)

> RTL doesn't have *evaluation*.

But we can (and do) evaluate some rtxes without target help.

> RTL is just a data structure that describes your program instructions.
> A large part of what means what is system-specific.  Rounding of floating
> point is not defined, for example.

Some of the semantics are target-specific, sure, with some of the details
controlled by hooks/macros and some left undefined.  But that's true to a
lesser extent of gimple too.

> And yes, various parts of GCC can manipulate RTL, doing substitution and
> algebraic simplification and whatnot.  All within the rules of RTL.  And
> that means nothing ever can "pass" a float_narrow, because there are no
> rules that allow it to.

You mean create a new float_narrow out of thin air, with no justification?
Sure, but I don't think that was ever the issue.

Or do you mean that target-independent code couldn't just use GET_RTX_FORMAT
to recurse on a float_narrow without first noting that it's a float_narrow
(and thus special)?  If so, then yeah, I agree that they wouldn't be
allowed to do that, which is essentially why I think it's a bad idea.

>> No other rtx rvalue works like this.
>
> A lot of unspecs are used like this, for example.

Unspecs don't have a quoting effect though.  I agree it's common to match
things like:

  (unspec:m [(plus:m ...)] UNSPEC_FOO)

But that doesn't have any quoting effect on the plus.  If the optimisers see:

  (unspec:m [(plus:m x y)] UNSPEC_FOO)

and know what x and y are, they can certainly fold this to:

  (unspec:m [(const_int N)] UNSPEC_FOO)

The result might not match an instruction, but it's still a valid
rtx and a valid thing to try.  A target would be in real trouble
if it allowed both, but with different semantics even for N==x+y.
(In contrast, having different semantics for N==x+y would be valid
if there was a quoting effect.)

Likewise if the optimisers see:

  (set (reg:m z) (plus:m x y))
  ...(unspec:m [(plus:m x y)] UNSPEC_FOO)...

they can create and try to match:

  ...(unspec:m [(reg:m z)] UNSPEC_FOO)...

Again, it might not match an instruction, but it's still a valid rtx and
a valid thing to try.

In other words, everything going into recog has to be valid rtx.
It just might not be a valid instruction.  And the .md files can't
make the target-independent code treat an operation as quoted.
All they can do is refuse to match simplified forms.

This is similar to things like (from mips.md):

(define_insn_and_split "<su>mulsi3_highpart_internal"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (truncate:SI
         (lshiftrt:DI
          (mult:DI (any_extend:DI (match_operand:SI 1 "register_operand" "d"))
                   (any_extend:DI (match_operand:SI 2 "register_operand" "d")))
          (const_int 32))))
   (clobber (match_scratch:SI 3 "=l"))]

IIRC, the port has no highpart operation other than multiplication.
But there's again no quoting effect on the operands to the mult,
lshiftrt or truncate here, so if the optimisers knew that op2==2,
they could transform:

  [(set op0
        (truncate:SI
         (lshiftrt:DI
          (mult:DI (any_extend:DI op1)
                   (any_extend:DI op2))
          (const_int 32))))
   (clobber (scratch:SI))]

to:

  [(set op0
        (truncate:SI
         (lshiftrt:DI
          (plus:DI (any_extend:DI op1)
                   (any_extend:DI op1))
          (const_int 32))))
   (clobber (scratch:SI))]

Again, the instruction won't match, but it's still a valid rtx and
a valid transformation to try.

Going back to the unspec example: if at some point we added a target
hook for evaluating unspecs in the same way that we evaluate basic
arithmetic (might be useful!), the handling of UNSPEC_FOO wouldn't be
able to assert that the plus or whatever is there.  At best it could
punt evaluation when the plus isn't there, at the cost of losing
potentially useful optimisation.  (But to me, having to do that
smacks of a badly-designed unspec.  E.g. we use unspec wrappers around
operations a lot in the SVE port, but it would still be possible to
evaluate the unspec given fully-evaluated operands.)

float_narrow is different in that the plus (or whatever operation
it's quoting) has to be kept in-place rather than folded away,
otherwise the rtx itself is malformed and could trigger an ICE,
just like the zero_extend of a const_int that I mentioned.

>> Using float_narrow would also be inconsistent with the way we handle
>> saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for
>> unsigned and signed saturating plus respectively, rather than:
>> 
>>   (unsigned_sat '(plus a b))
>>   (signed_sat '(plus a b))
>> 
>> Using dedicated codes might seem clunky.  But it's simple, safe, and fits
>> the existing model without special cases. :-)
>
> And you need many many more RTX codes, which you will not handle in
> almost all places, because there are too many.
>
>
> I agree this construct is not as nice as could be hoped for.  I don't
> agree that 60 new RTX codes is an acceptable solution (or that that will
> ever really work out, even).

60 sounds a high number. :-)  Do we really have that many rtx codes with
a floating-point rounding effect?

Whatever the number is, we'll still be listing them individually for
built-in enumerations, internal_fn, and (I assume) optabs.  But maybe
after a certain point it does become too unwieldy for rtx codes.
We have to keep it within 16 bits at least...

> It would be nice if somehow we could make a variant of RTL codes, so that
> we could have nice and simple code that applies to all variants of some
> code.  Not sure how that would work out.  Maybe we don't have to do this
> very generically, how often will we need this anyway?
>
> I have three examples so far:
> 1) Saturating arithmetic;
> 2) This float_narrow thing;
> 3) Ordered compares, that is, fp compares that set an exception on NaNs.
>
> Something that works for all three would be nice!

Yeah, agree that sounds good.  Maybe we could bundle the code with some
flags.  Storage-wise, there should be room for that in the u2 field.

But there might still be cases in which it's useful to view the code+flags
as a combined supercode, e.g. for switch statements.
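
Purely as a sketch of what "code plus flags" versus a "combined
supercode" could look like, here is a toy model in Python.  Everything
in it is hypothetical: the Code and Flag names, the 16-bit split and the
helper functions are invented for illustration and do not describe any
existing or agreed rtx layout.

```python
from enum import IntEnum, IntFlag

CODE_BITS = 16  # assumption: the base code fits in 16 bits


class Code(IntEnum):
    PLUS = 1
    MINUS = 2


class Flag(IntFlag):
    NONE = 0
    SATURATE = 1
    NARROW = 2


def supercode(code: Code, flags: Flag) -> int:
    """Pack the base code and its variant flags into one integer."""
    return (int(flags) << CODE_BITS) | int(code)


def base_code(sc: int) -> Code:
    """Recover the base code, ignoring the variant flags."""
    return Code(sc & ((1 << CODE_BITS) - 1))


def is_addition(sc: int) -> bool:
    """Generic code: applies to every variant of PLUS."""
    return base_code(sc) is Code.PLUS


def is_narrowing_add(sc: int) -> bool:
    """Variant-specific code: matches one exact code+flags combination."""
    return sc == supercode(Code.PLUS, Flag.NARROW)


plain = supercode(Code.PLUS, Flag.NONE)
narrow = supercode(Code.PLUS, Flag.NARROW)
print(is_addition(plain), is_addition(narrow))
print(is_narrowing_add(plain), is_narrowing_add(narrow))
```

Generic routines would look only at the base code, while a switch that
cares about the variant would compare the packed value.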

Thanks,
Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20  7:41                                                     ` Richard Sandiford
@ 2019-08-20 12:11                                                       ` Segher Boessenkool
  2019-08-20 12:59                                                         ` Richard Sandiford
  2019-08-20 16:04                                                         ` Joseph Myers
  0 siblings, 2 replies; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-20 12:11 UTC (permalink / raw)
  To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

On Tue, Aug 20, 2019 at 08:41:29AM +0100, Richard Sandiford wrote:
> Tejas: given the controversy, I agree unspecs sound like a good approach
> for now.  We can always go back and add the rtx codes later once there's
> agreement on what they should look like.

Yup.

> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote:
> >> In lisp terms we're saying that the operand to the float_narrow is
> >> implicitly quoted:
> >> 
> >>   (float_narrow:m '(plus:n a b))
> >> 
> >> so that when float_narrow is evaluated, the argument is the unevaluated
> >> rtl expression "(plus a b)" rather than the evaluated result a + b.
> >> float_narrow then does its own evaluation of a and b and performs a
> >> fused addition and narrowing on the result.
> >
> > RTL isn't Lisp.
> 
> Right.  But it's heavily influenced by lisp, so I was using quoting to
> explain why I don't think the code is a good fit.
> 
> > RTL doesn't have quotations.
> 
> I'd like to keep it that way for rvalues :-)
> 
> > RTL doesn't have *evaluation*.
> 
> But we can (and do) evaluate some rtxes without target help.

We do?  Other than constant folding; there is nothing to evaluate in
constants anyway.  Or do you mean simplification?

There are rules for what kinds of transformations are allowed.  Many of
them unwritten, of course :-/

> > RTL is just a data structure that describes your program instructions.
> > A large part of what means what is system-specific.  Rounding of floating
> > point is not defined, for example.
> 
> Some of the semantics are target-specific, sure, with some of the details
> controlled by hooks/macros and some left undefined.  But that's true to a
> lesser extent of gimple too.

Yes, gimple and RTL are not very different, at the core of things.

> > And yes, various parts of GCC can manipulate RTL, doing substitution and
> > algebraic simplification and whatnot.  All within the rules of RTL.  And
> > that means nothing ever can "pass" a float_narrow, because there are no
> > rules that allow it to.
> 
> You mean create a new float_narrow out of thin air, with no justification?
> Sure, but I don't think that was ever the issue.

No.  I mean that if you have

... (float_narrow:M (x:N))

it will always stay in that form, with just x changed.  Nothing can
change the float_narrow.

> Or do you mean that target-independent code couldn't just use GET_RTX_FORMAT
> to recurse on a float_narrow without first noting that it's a float_narrow
> (and thus special)?  If so, then yeah, I agree that they wouldn't be
> allowed to do that, which is essentially why I think it's a bad idea.

No, they can do that just fine.

> >> No other rtx rvalue works like this.
> >
> > A lot of unspecs are used like this, for example.
> 
> Unspecs don't have a quoting effect though.  I agree it's common to match
> things like:
> 
>   (unspec:m [(plus:m ...)] UNSPEC_FOO)
> 
> But that doesn't have any quoting effect on the plus.  If the optimisers see:
> 
>   (unspec:m [(plus:m x y)] UNSPEC_FOO)
> 
> and know what x and y are, they can certainly fold this to:
> 
>   (unspec:m [(const_int N)] UNSPEC_FOO)

And the exact same is true for the proposed float_narrow!

The compiler should not do this if FP_CONTRACT is off, which it has to
be for fadd etc. to make sense at all, to not be optimised to a plain
add.

> This is similar to things like (from mips.md):
> 
> (define_insn_and_split "<su>mulsi3_highpart_internal"

Yeah, I did that for rs6000.  Lots and lots and lots of special cases :-P
(RTL represents things differently for BE and LE, and there are the various
sizes of operation, both with and without 64-bit insns).

>   [(set (match_operand:SI 0 "register_operand" "=d")
>         (truncate:SI
>          (lshiftrt:DI

(this is optimised to a subreg, in many cases, for example).

> Going back to the unspec example: if at some point we added a target
> hook for evaluating unspecs in the same way that we evaluate basic
> arithmetic (might be useful!), the handling of UNSPEC_FOO wouldn't be
> able to assert that the plus or whatever is there.  At best it could
> punt evaluation when the plus isn't there, at the cost of losing
> potentially useful optimisation.

Yes.  And as far as I can see float_narrow will still work.

> float_narrow is different in that the plus (or whatever operation
> it's quoting) has to be kept in-place rather than folded away,
> otherwise the rtx itself is malformed and could trigger an ICE,
> just like the zero_extend of a const_int that I mentioned.

Yes, it will not pass recog.  Structurally it is just hunky-dory though.

> > And you need many many more RTX codes, which you will not handle in
> > almost all places, because there are too many.
> >
> >
> > I agree this construct is not as nice as could be hoped for.  I don't
> > agree that 60 new RTX codes is an acceptable solution (or that that will
> > ever really work out, even).
> 
> 60 sounds a high number. :-)  Do we really have that many rtx codes with
> a floating-point rounding effect?

It was meant to sound high, heh.  If things need a variant A, and also a
variant B, then before you know it there is a variant A+B as well, and
you have unbridled growth.

plus minus neg mult div mod smin smax abs sqrt fma  I think?  And let's
hope we never ever have to do saturating versions of FP :-)

> Whatever the number is, we'll still be listing them individually for
> built-in enumerations, internal_fn, and (I assume) optabs.  But maybe
> after a certain point it does become too unwieldy for rtx codes.
> We have to keep it within 16 bits at least...

My main concern is all the (simplification) code that parses RTL.  All of
that will have to handle all variant versions as well.

> > It would be nice if somehow we could make a variant of RTL codes, so that
> > we could have nice and simple code that applies to all variants of some
> > code.  Not sure how that would work out.  Maybe we don't have to do this
> > very generically, how often will we need this anyway?
> >
> > I have three examples so far:
> > 1) Saturating arithmetic;
> > 2) This float_narrow thing;
> > 3) Ordered compares, that is, fp compares that set an exception on NaNs.
> >
> > Something that works for all three would be nice!
> 
> Yeah, agree that sounds good.  Maybe we could bundle the code with some
> flags.  Storage-wise, there should be room for that in the u2 field.
> 
> But there might still be cases in which it's useful to view the code+flags
> as a combined supercode, e.g. for switch statements.

Yeah...  Whether to make "code" or "code+flags" the more usual version is
the biggest design question then.  Oh, and what the rest of the interface
to this looks like ;-)


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 12:11                                                       ` Segher Boessenkool
@ 2019-08-20 12:59                                                         ` Richard Sandiford
  2019-08-20 13:46                                                           ` Segher Boessenkool
  2019-08-20 16:04                                                         ` Joseph Myers
  1 sibling, 1 reply; 63+ messages in thread
From: Richard Sandiford @ 2019-08-20 12:59 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > And yes, various parts of GCC can manipulate RTL, doing substitution and
>> > algebraic simplification and whatnot.  All within the rules of RTL.  And
>> > that means nothing ever can "pass" a float_narrow, because there are no
>> > rules that allow it to.
>> 
>> You mean create a new float_narrow out of thin air, with no justification?
>> Sure, but I don't think that was ever the issue.
>
> No.  I mean that if you have
>
> ... (float_narrow:M (x:N))
>
> it will always stay in that form, with just x changed.  Nothing can
> change the float_narrow.

OK, I guessed wrong :-)  But it was the change to x that IMO was the
problem.  I wasn't worried about code changing the float_narrow itself
to random other stuff.

>>   [(set (match_operand:SI 0 "register_operand" "=d")
>>         (truncate:SI
>>          (lshiftrt:DI
>
> (this is optimised to a subreg, in many cases, for example).

Right.  MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION.

>> float_narrow is different in that the plus (or whatever operation
>> it's quoting) has to be kept in-place rather than folded away,
>> otherwise the rtx itself is malformed and could trigger an ICE,
>> just like the zero_extend of a const_int that I mentioned.
>
> Yes, it will not pass recog.  Structurally it is just hunky-dory though.

So maybe that's the main point of difference.  We're introducing
float_narrow to modify another rtx operation rather than to operate
on an rtx value.  So to me it makes no sense to say that:

  (float_narrow:SF (const_double:DF X))
  (float_narrow:SF (reg:DF X))
  (float_narrow:SF (mem:DF X))

are well-formed rtxes and just happen not to match any instructions.
Without an operation to modify they're meaningless on their own terms,
regardless of what the target says about it.  Just like:

  (unsigned_saturate:QI (reg:QI X))

would be meaningless if we modelled saturation this way.

There's no way you can go from a normal unsaturated result to the
equivalent saturated result without knowing which operation was
performed, and on which operands.  This isn't a choice for targets
to make even in principle, just like it isn't for my favourite
(zero_extend:m (const_int -1)) example.
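
To make the saturation point concrete, here is a minimal C sketch (the
function names are made up for illustration and are not GCC code).  Two
different additions produce the same wrapped 8-bit result but different
saturated results, so the saturated value cannot be derived from the
wrapped value alone:

```c
#include <stdint.h>

/* Ordinary modular (wrapping) 8-bit addition. */
uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturating 8-bit addition: clamps at 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

With these definitions, 200 + 100 and 40 + 4 wrap to the same value,
yet only the first one saturates.  The operation and its operands are
needed to tell the two cases apart.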

>> > And you need many many more RTX codes, which you will not handle in
>> > almost all places, because there are too many.
>> >
>> >
>> > I agree this construct is not as nice as could be hoped for.  I don't
>> > agree that 60 new RTX codes is an acceptable solution (or that that will
>> > ever really work out, even).
>> 
>> 60 sounds a high number. :-)  Do we really have that many rtx codes with
>> a floating-point rounding effect?
>
> It was meant to sound high, heh.  If things need a variant A, and also a
> variant B, then before you know it there is a variant A+B as well, and
> you have unbridled growth.
>
> plus minus neg mult div mod smin smax abs sqrt fma  I think?  And let's
> hope we never ever have to do saturating versions of FP :-)

neg, abs, smin and smax shouldn't do rounding AFAIK.  But yeah, the rest
look plausible.

That is only 7 though :-)  Unless I counted wrong.

Not that I'm saying I like adding codes for each one either.  It just
doesn't seem that bad (and definitely better than float_narrow IMO).

>> Whatever the number is, we'll still be listing them individually for
>> built-in enumerations, internal_fn, and (I assume) optabs.  But maybe
>> after a certain point it does become too unwieldy for rtx codes.
>> We have to keep it within 16 bits at least...
>
> My main concern is all the (simplification) code that parses RTL.  All of
> that will have to handle all variant versions as well.

True, but we'd have to err on the side of caution whatever happens.
Not all existing PLUS simplifications necessarily apply as-is.

Thanks,
Richard


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 12:59                                                         ` Richard Sandiford
@ 2019-08-20 13:46                                                           ` Segher Boessenkool
  2019-08-20 14:43                                                             ` Richard Sandiford
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-20 13:46 UTC (permalink / raw)
  To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >>   [(set (match_operand:SI 0 "register_operand" "=d")
> >>         (truncate:SI
> >>          (lshiftrt:DI
> >
> > (this is optimised to a subreg, in many cases, for example).
> 
> Right.  MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION.

Trying 10 -> 18:
   10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI)
      REG_DEAD r205:DI
      REG_DEAD r204:DI
   18: $2:DI=r200:TI#0
      REG_DEAD r200:TI
Failed to match this instruction:
(set (reg/i:DI 2 $2)
    (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204))
            (zero_extend:TI (reg:DI 205))) 0))

I'm afraid not.

This was
mips64-linux-gcc -Wall -W -O2 -S mulh.c -mips64 -mabi=64 -fdump-rtl-combine-all
on
===
typedef unsigned long S;
typedef unsigned __int128 D;

S mulh(S a, S b) { return (D)a*b >> (8*sizeof(S)); }
===

> >> float_narrow is different in that the plus (or whatever operation
> >> it's quoting) has to be kept in-place rather than folded away,
> >> otherwise the rtx itself is malformed and could trigger an ICE,
> >> just like the zero_extend of a const_int that I mentioned.
> >
> > Yes, it will not pass recog.  Structurally it is just hunky-dory though.
> 
> So maybe that's the main point of difference.  We're introducing
> float_narrow to modify another rtx operation rather than to operate
> on an rtx value.

I wouldn't say it "operates" on anything.  A float_narrow rtx means the
thing inside it does single-rounding to SP float.  And it is just
notation: RTL itself knows *nothing* about float rounding, and because
of the way this is structured, nothing can change anything about the
float_narrow.

And yes, it is icky.  But it is sound, as far as I can see.

> >> Whatever the number is, we'll still be listing them individually for
> >> built-in enumerations, internal_fn, and (I assume) optabs.  But maybe
> >> after a certain point it does become too unwieldy for rtx codes.
> >> We have to keep it within 16 bits at least...
> >
> > My main concern is all the (simplification) code that parses RTL.  All of
> > that will have to handle all variant versions as well.
> 
> True, but we'd have to err on the side of caution whatever happens.

Yes.

> Not all existing PLUS simplifications necessarily apply as-is.

Yes.  Everything will have to be checked.  But not everything will
have to be modified, if we pick the defaults carefully.  I hope.  :-)


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 13:46                                                           ` Segher Boessenkool
@ 2019-08-20 14:43                                                             ` Richard Sandiford
  2019-08-20 15:12                                                               ` Richard Sandiford
  2019-08-20 19:42                                                               ` Segher Boessenkool
  0 siblings, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-20 14:43 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> >>   [(set (match_operand:SI 0 "register_operand" "=d")
>> >>         (truncate:SI
>> >>          (lshiftrt:DI
>> >
>> > (this is optimised to a subreg, in many cases, for example).
>> 
>> Right.  MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION.
>
> Trying 10 -> 18:
>    10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI)
>       REG_DEAD r205:DI
>       REG_DEAD r204:DI
>    18: $2:DI=r200:TI#0
>       REG_DEAD r200:TI
> Failed to match this instruction:
> (set (reg/i:DI 2 $2)
>     (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204))
>             (zero_extend:TI (reg:DI 205))) 0))
>
> I'm afraid not.

That's TI->DI though, whereas the pattern above is DI->SI.  The modes
matter :-)  There'd also need to be a shift to match a highpart pattern.

>> >> float_narrow is different in that the plus (or whatever operation
>> >> it's quoting) has to be kept in-place rather than folded away,
>> >> otherwise the rtx itself is malformed and could trigger an ICE,
>> >> just like the zero_extend of a const_int that I mentioned.
>> >
>> > Yes, it will not pass recog.  Structurally it is just hunky-dory though.
>> 
>> So maybe that's the main point of difference.  We're introducing
>> float_narrow to modify another rtx operation rather than to operate
>> on an rtx value.
>
> I wouldn't say it "operates" on anything.  A float_narrow rtx means the
> thing inside it does single-rounding to SP float.  And it is just
> notation: RTL itself knows *nothing* about float rounding, and because
> of the way this is structured, nothing can change anything about the
> float_narrow.

I wouldn't say it knows nothing about rounding.  It doesn't know
what the runtime rounding mode is, but that isn't the same thing.
(Just like not knowing what (mem:SI (sp)) contains isn't the same
thing as not knowing anything about stack memory.)

Besides, how much depends on target-independent code not knowing what
the rounding mode is?  Do you think float_narrow would still make
sense even if more information was available at compile time
(e.g. if a plus could be annotated with a specific rounding mode)?
Or is not knowing the rounding mode a fundamental part of float_narrow
being OK for you?

> And yes, it is icky.  But it is sound, as far as I can see.

I really disagree that it's sound, but no point me saying why again :-)

(It could certainly be made to work with sufficient hacks of course,
like pretty much anything could, but I don't think that's the same thing.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 14:43                                                             ` Richard Sandiford
@ 2019-08-20 15:12                                                               ` Richard Sandiford
  2019-08-20 19:42                                                               ` Segher Boessenkool
  1 sibling, 0 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-20 15:12 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Richard Sandiford <richard.sandiford@arm.com> writes:
>> And yes, it is icky.  But it is sound, as far as I can see.
>
> I really disagree that it's sound, but no point me saying why again :-)
>
> (It could certainly be made to work with sufficient hacks of course,
> like pretty much anything could, but I don't think that's the same thing.)

For an example, we have:

      /* Maybe simplify x + 0 to x.  The two expressions are equivalent
	 when x is NaN, infinite, or finite and nonzero.  They aren't
	 when x is -0 and the rounding mode is not towards -infinity,
	 since (-0) + 0 is then 0.  */
      if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
	return op0;

I think it's plausible that people will care about accurate rounding
but not signed zeroes.  In that mode we could have:

    (set (reg:DF r3) (plus:DF (reg:DF r1) (reg:DF r2)))
    (set (reg:DF r4) (const_double:DF 0.0))
    (set (reg:SF r5) (float_narrow:SF (plus:DF (reg:DF r3) (reg:DF r4))))

Then combine through normal structural simplification could (with the
rule above) fold all this down to:

    (set (reg:SF r5) (float_narrow:SF (plus:DF (reg:DF r1) (reg:DF r2))))

where the truncation is now fused with r1+r2 instead of r3+r4.  We would
have to add specific checks to avoid this happening; it wouldn't
fall out naturally from a structural PoV.
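
A small C sketch of why that fold changes the result, assuming IEEE
binary64 double, binary32 float and the default round-to-nearest-even
mode (the function name is hypothetical).  It models the unfolded
sequence: the sum is first rounded to double (r3), and only then
narrowed to float:

```c
#include <float.h>

/* Models r3 = a + b rounded to double, followed by narrowing r3 (+ 0.0)
   to float: two separate roundings.  */
float add_then_narrow(double a, double b)
{
    volatile double r3 = a + b;   /* volatile forces rounding to double */
    return (float)r3;
}
```

Take a = 0x1.000001p0 (that is, 1 + 2^-24) and b = 0x1p-60.  The exact
sum lies just above the midpoint between the floats 1.0 and
1.0 + FLT_EPSILON, so a single rounding to float gives
1.0 + FLT_EPSILON.  Rounding to double first discards the 2^-60 term,
leaving an exact tie that then rounds to even, i.e. to 1.0.  The
sequence before folding therefore yields 1.0f, whereas the folded
float_narrow of r1+r2 would yield 1.0f + FLT_EPSILON.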

Thanks,
Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 12:11                                                       ` Segher Boessenkool
  2019-08-20 12:59                                                         ` Richard Sandiford
@ 2019-08-20 16:04                                                         ` Joseph Myers
  1 sibling, 0 replies; 63+ messages in thread
From: Joseph Myers @ 2019-08-20 16:04 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, richard.sandiford

On Tue, 20 Aug 2019, Segher Boessenkool wrote:

> plus minus neg mult div mod smin smax abs sqrt fma  I think?  And let's
> hope we never ever have to do saturating versions of FP :-)

There are six operations with narrowing versions in TS 18661-1 (plus minus 
mult div sqrt fma).

neg and abs are operations that do no rounding, raise no exceptions and 
preserve signaling NaNs (affecting their sign bit appropriately).
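
As a rough illustration of "no rounding" for neg, the following C sketch
(helper names are hypothetical; it assumes double is IEEE binary64)
checks that negation changes nothing but the sign bit of the
representation:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double (assumed to be IEEE binary64). */
static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Nonzero if -x differs from x in the sign bit and nowhere else. */
int neg_only_flips_sign(double x)
{
    return (bits_of(-x) ^ bits_of(x)) == UINT64_C(0x8000000000000000);
}
```

Since the significand and exponent are untouched, there is nothing for
a narrowing variant of neg to round differently.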

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 14:43                                                             ` Richard Sandiford
  2019-08-20 15:12                                                               ` Richard Sandiford
@ 2019-08-20 19:42                                                               ` Segher Boessenkool
  2019-08-21 17:20                                                                 ` Tejas Joshi
  1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-20 19:42 UTC (permalink / raw)
  To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

On Tue, Aug 20, 2019 at 03:43:43PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote:
> >> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >> >>   [(set (match_operand:SI 0 "register_operand" "=d")
> >> >>         (truncate:SI
> >> >>          (lshiftrt:DI
> >> >
> >> > (this is optimised to a subreg, in many cases, for example).
> >> 
> >> Right.  MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION.
> >
> > Trying 10 -> 18:
> >    10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI)
> >       REG_DEAD r205:DI
> >       REG_DEAD r204:DI
> >    18: $2:DI=r200:TI#0
> >       REG_DEAD r200:TI
> > Failed to match this instruction:
> > (set (reg/i:DI 2 $2)
> >     (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204))
> >             (zero_extend:TI (reg:DI 205))) 0))
> >
> > I'm afraid not.
> 
> That's TI->DI though, whereas the pattern above is DI->SI.  The modes
> matter :-)  There'd also need to be a shift to match a highpart pattern.

It's the same for 32-bit:

mips-linux-gcc -Wall -W -O2 -S mulh.c -mips32 -mabi=32
(I hope these options are reasonable?  I don't know MIPS well at all).

Trying 12 -> 20:
   12: r200:DI=zero_extend(r204:SI)*zero_extend(r205:SI)
      REG_DEAD r205:SI
      REG_DEAD r204:SI
   20: $2:SI=r200:DI#0
      REG_DEAD r200:DI
Failed to match this instruction:
(set (reg/i:SI 2 $2)
    (subreg:SI (mult:DI (zero_extend:DI (reg:SI 204))
            (zero_extend:DI (reg:SI 205))) 0))

The point is that this is the form that this insn is simplified to.  If
that form is not recognised by your backend, various optimisation
opportunities are missed.
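
The 32-bit source is not shown here.  A version along these lines would
be the natural analogue of mulh.c and should give the DI = SI * SI
multiply seen in the dump (this is an assumption; the name mulh32 and
the fixed-width types are illustrative):

```c
#include <stdint.h>

/* High 32 bits of the full 64-bit product of two 32-bit values. */
uint32_t mulh32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

For example, mulh32 (0x80000000, 2) is 1, because the full product is
exactly 2^32.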

> I wouldn't say it knows nothing about rounding.  It doesn't know
> what the runtime rounding mode is, but that isn't the same thing.
> (Just like not knowing what (mem:SI (sp)) contains isn't the same
> thing as not knowing anything about stack memory.)

Does it even know if the rounding mode is one of the IEEE FP rounding
modes?


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-20 19:42                                                               ` Segher Boessenkool
@ 2019-08-21 17:20                                                                 ` Tejas Joshi
  2019-08-21 18:28                                                                   ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-21 17:20 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
I have the following code which uses unspec but I am really missing
something here. Does unspec not work when encapsulating plus? Or do I
have some more places to make changes to?

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa")
       (unspec:SF
       [(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa")
                 (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa"))]
                  UNSPEC_ADD_TRUNCATE))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

and an UNSPEC_ADD_TRUNCATE in unspec enum.

Thanks,
Tejas


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-21 17:20                                                                 ` Tejas Joshi
@ 2019-08-21 18:28                                                                   ` Segher Boessenkool
  2019-08-21 19:17                                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-21 18:28 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Wed, Aug 21, 2019 at 10:56:51PM +0530, Tejas Joshi wrote:
> I have the following code which uses unspec but I am really missing
> something here. Does unspec not work when encapsulating plus? Or do I
> have some more places to make changes to?
> 
> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa")
>        (unspec:SF
>        [(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa")
>                  (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa"))]
>                   UNSPEC_ADD_TRUNCATE))]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")])

This does almost exactly the same as what the proposed float_narrow
would do.  Instead, write it as

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa")
	(unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa")
		    (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")]
		   UNSPEC_ADD_TRUNCATE))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")
   (set_attr "isa" "*,p8v")])

(note the "isa" attribute)


to prevent any folding etc. from happening to it.

> and an UNSPEC_ADD_TRUNCATE in unspec enum.

UNSPEC_ADD_NARROWING?


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-21 18:28                                                                   ` Segher Boessenkool
@ 2019-08-21 19:17                                                                     ` Segher Boessenkool
  2019-08-22  3:33                                                                       ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-21 19:17 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Wed, Aug 21, 2019 at 01:28:52PM -0500, Segher Boessenkool wrote:
> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa")
> 	(unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa")
> 		    (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")]
> 		   UNSPEC_ADD_TRUNCATE)]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")
>    (set_attr "isa" "*,p8v")])

And not <Ff>...  f, d, d respectively (f for SF, d for DF).


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-21 19:17                                                                     ` Segher Boessenkool
@ 2019-08-22  3:33                                                                       ` Tejas Joshi
  2019-08-22  6:25                                                                         ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-22  3:33 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

> This does almost exactly the same as what the proposed float_narrow
> would do.  Instead, write it as
>
> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa")
>         (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa")
>                     (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")]
>                    UNSPEC_ADD_TRUNCATE)]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")
>    (set_attr "isa" "*,p8v")])

Yes, I tried basically every combination I could think of, just not
with the "isa" attribute. Now, I have the following code and it still
seems not to be working. Am I missing any options to pass?

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
          (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa")
                             (match_operand:DF 2 "gpc_reg_operand" "d,wa")]
                              UNSPEC_ADD_NARROWING))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")
   (set_attr "isa" "*,p8v")])

With this pattern in place, I compile the following foo.c with -O2:
float
foo (double x, double y)
{
   return __builtin_fadd (x, y);
}

Thanks,
Tejas



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-22  3:33                                                                       ` Tejas Joshi
@ 2019-08-22  6:25                                                                         ` Segher Boessenkool
  2019-08-22  7:57                                                                           ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-22  6:25 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

[ Please do not top-post. ]

On Thu, Aug 22, 2019 at 09:09:37AM +0530, Tejas Joshi wrote:
> Yes, I tried basically every combination I could think of, just not
> with the "isa" attribute. Now, I have the following code and it still
> seems not to be working. Am I missing any options to pass?
> 
> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>           (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa")
>                              (match_operand:DF 2 "gpc_reg_operand" "d,wa")]
>                               UNSPEC_ADD_NARROWING))]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")
>    (set_attr "isa" "*,p8v")])
> 
> with the code, I pass -O2 foo.c :
> float
> foo (double x, double y)
> {
>    return __builtin_fadd (x, y);
> }

What happens then?  "It does not work" is very very vague.  At least it
seems the compiler does build now?


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-22  6:25                                                                         ` Segher Boessenkool
@ 2019-08-22  7:57                                                                           ` Tejas Joshi
  2019-08-22  9:56                                                                             ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-22  7:57 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, joseph, segher

> What happens then?  "It does not work" is very very vague.  At least it
> seems the compiler does build now?

Oh, compiler builds but instruction is still "bl fadd". It should be
"fadds" right?



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-22  7:57                                                                           ` Tejas Joshi
@ 2019-08-22  9:56                                                                             ` Segher Boessenkool
  2019-08-23 17:17                                                                               ` Martin Jambor
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-22  9:56 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

> > Hi Tejas,
> >
> > [ Please do not top-post. ]

On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote:
> > What happens then?  "It does not work" is very very vague.  At least it
> > seems the compiler does build now?
> 
> Oh, compiler builds but instruction is still "bl fadd". It should be
> "fadds" right?

Yes, but that means the problem is earlier, before it hits RTL perhaps.

Compile with -dap, look at the expand dump (the lowest numbered one, 234
or so), and see what it looked like in the final Gimple, and then in the
RTL generated from that.  And then drill down.

Maybe you don't get what is needed at Gimple level already.  Maybe it is
something simple like a typo in the RTL pattern name.  You'll find out.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-22  9:56                                                                             ` Segher Boessenkool
@ 2019-08-23 17:17                                                                               ` Martin Jambor
  2019-08-23 19:13                                                                                 ` Segher Boessenkool
  2019-08-24  9:53                                                                                 ` Richard Sandiford
  0 siblings, 2 replies; 63+ messages in thread
From: Martin Jambor @ 2019-08-23 17:17 UTC (permalink / raw)
  To: Segher Boessenkool, Tejas Joshi; +Cc: gcc, hubicka, joseph, Richard Sandiford

Hello,

On Thu, Aug 22 2019, Segher Boessenkool wrote:
>> > Hi Tejas,
>> >
>> > [ Please do not top-post. ]
>
> On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote:
>> > What happens then?  "It does not work" is very very vague.  At least it
>> > seems the compiler does build now?
>> 
>> Oh, compiler builds but instruction is still "bl fadd". It should be
>> "fadds" right?
>
> Yes, but that means the problem is earlier, before it hits RTL perhaps.
>
> Compile with -dap, look at the expand dump (the lowest numbered one, 234
> or so), and see what it looked like in the final Gimple, and then in the
> RTL generated from that.  And then drill down.
>

Tejas sent me his patch and I looked at why it did not work.  I found
two reasons:

1. associated_internal_fn (in builtins.c) does not handle
   DEF_INTERNAL_OPTAB_FN kind of internal functions, and Tejas
   (sensibly, I'd say) used that macro to define the internal function.
   But when I worked around that by manually adding a case for it in the
   switch statement, I ran into an assert because...

2. direct_internal_fn_supported_p on which replacement_internal_fn
   depends to expand built-ins as internal functions cannot handle
   conversion optabs... and narrowing is a kind of conversion and the
   optab is added as such with OPTAB_CD.

Actually, the second statement is not entirely true because somehow it
can handle optab while_ult which is a conversion optab but a) the way it
is handled, if I can understand it at all, seems to be a big hack and
would be even worse if we decided to copy that for all narrowing math
functions and b) it gets both modes from argument types whereas we need
one from the result type and so we would have to rewrite
replacement_internal_fn anyway.

Therefore, at least for now (GSoC deadline is kind of looming), I
decided that the best way forward would be to not rely on internal
functions but plug into expand_builtin() and I wrote the following,
lightly tested patch - which of course misses testcases and stuff - but
I'd be curious about any feedback now anyway.  When I proposed a very
similar approach for the roundeven x86_64 expansion, Uros actually then
opted for a solution based on internal functions, so I am curious
whether there are simple alternatives I do not see.

Tejas, of course cases for other fadd variants should at least be added
to expand_builtin.

Thanks,

Martin


2019-08-23  Tejas Joshi  <tejasjoshi9673@gmail.com>
	    Martin Jambor  <mjambor@suse.cz>

	* builtins.c (expand_builtin_binary_conversion): New function.
	  (expand_builtin): Call it.
	* config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING.
	(add_truncdfsf3): New define_insn.
	* optabs.def (fadd_optab): New.


---
 gcc/builtins.c              | 55 +++++++++++++++++++++++++++++++++++++
 gcc/config/rs6000/rs6000.md | 13 +++++++++
 gcc/internal-fn.def         |  2 ++
 gcc/optabs.def              |  1 +
 4 files changed, 71 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a766e4ad63..a9bf5710834 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2935,6 +2935,54 @@ expand_builtin_powi (tree exp, rtx target)
   return target;
 }
 
+/* Attempt to expand a builtin function call EXP which performs a binary
+   operation on its floating point arguments and then converts the result into
+   a different floating point format.  The operation in question is specified
+   in OP_OPTAB.  Return NULL if the attempt failed.  SUBTARGET may be used as
+   the target for computing the operand of EXP.  */
+
+static rtx
+expand_builtin_binary_conversion (tree exp, rtx target, rtx subtarget,
+				  optab op_optab)
+{
+  if (TREE_CODE (TREE_TYPE (exp)) != REAL_TYPE
+      || !validate_arglist (exp, REAL_TYPE, REAL_TYPE, VOID_TYPE))
+    return NULL_RTX;
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  gcc_assert (TYPE_MAIN_VARIANT (TREE_TYPE (arg0))
+	      == TYPE_MAIN_VARIANT (TREE_TYPE (arg1)));
+  machine_mode arg_mode = TYPE_MODE (TREE_TYPE (arg1));
+  machine_mode res_mode = TYPE_MODE (TREE_TYPE (exp));
+
+  insn_code icode = convert_optab_handler (op_optab, res_mode, arg_mode);
+  if (icode == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  /* Wrap the computation of the arguments in a SAVE_EXPR, as we may
+     need to expand the argument again.  This way, we will not perform
+     side-effects more than once.  */
+  CALL_EXPR_ARG (exp, 0) = arg0 = builtin_save_expr (arg0);
+  CALL_EXPR_ARG (exp, 1) = arg1 = builtin_save_expr (arg1);
+
+  rtx op0 = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL);
+  rtx op1 = expand_expr (arg1, subtarget, VOIDmode, EXPAND_NORMAL);
+
+  struct expand_operand ops[3];
+  create_output_operand (&ops[0], target, res_mode);
+  create_input_operand (&ops[1], op0, arg_mode);
+  create_input_operand (&ops[2], op1, arg_mode);
+  rtx_insn *pat = maybe_gen_insn (icode, 3, ops);
+  if (pat)
+    {
+      emit_insn (pat);
+      return ops[0].value;
+    }
+
+  return NULL_RTX;
+}
+
 /* Expand expression EXP which is a call to the strlen builtin.  Return
    NULL_RTX if we failed and the caller should emit a normal call, otherwise
    try to get the result in TARGET, if convenient.  */
@@ -7392,6 +7440,13 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	return target;
       break;
 
+    case BUILT_IN_FADD:
+      target = expand_builtin_binary_conversion (exp, target, subtarget,
+						 fadd_optab);
+      if (target)
+	return target;
+      break;
+
     case BUILT_IN_APPLY_ARGS:
       return expand_builtin_apply_args ();
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9a7a1da987f..b44783a5028 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -89,6 +89,7 @@
    UNSPEC_TLSGOTTPREL
    UNSPEC_TLSTLS
    UNSPEC_FIX_TRUNC_TF		; fadd, rounding towards zero
+   UNSPEC_ADD_NARROWING		; fadd, narrow down to return type
    UNSPEC_STFIWX
    UNSPEC_POPCNTB
    UNSPEC_FRES
@@ -4653,6 +4654,18 @@
   [(set_attr "type" "fp")
    (set_attr "isa" "*,<Fisa>")])
 
+(define_insn "add_truncdfsf3"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
+	(unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa")
+		    (match_operand:DF 2 "gpc_reg_operand" "d,wa")]
+		     UNSPEC_ADD_NARROWING))]
+  "TARGET_HARD_FLOAT"
+  "@
+   fadds %0,%1,%2
+   xsaddsp %x0,%x1,%x2"
+  [(set_attr "type" "fp")
+   (set_attr "isa" "*,p8v")])
+
 (define_expand "sub<mode>3"
   [(set (match_operand:SFDF 0 "gpc_reg_operand")
 	(minus:SFDF (match_operand:SFDF 1 "gpc_reg_operand")
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 9461693bcd1..3f56880c23f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -140,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
 		       vec_shl_insert, binary)
 
+DEF_INTERNAL_OPTAB_FN (FADD, ECF_CONST, fadd, binary)
+
 DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5283e6753f2..209369e9da1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -67,6 +67,7 @@ OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
 /* Misc optabs that use two modes; model them as "conversions".  */
+OPTAB_CD(fadd_optab, "add_trunc$b$a3")
 OPTAB_CD(smul_widen_optab, "mul$b$a3")
 OPTAB_CD(umul_widen_optab, "umul$b$a3")
 OPTAB_CD(usmul_widen_optab, "usmul$b$a3")
-- 
2.22.0

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-23 17:17                                                                               ` Martin Jambor
@ 2019-08-23 19:13                                                                                 ` Segher Boessenkool
  2019-08-24  9:53                                                                                 ` Richard Sandiford
  1 sibling, 0 replies; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-23 19:13 UTC (permalink / raw)
  To: Martin Jambor; +Cc: Tejas Joshi, gcc, hubicka, joseph, Richard Sandiford

Hi!

On Fri, Aug 23, 2019 at 07:16:59PM +0200, Martin Jambor wrote:
> Therefore, at least for now (GSoC deadline is kind of looming), I
> decided that the best way forward would be to not rely on internal
> functions but plug into expand_builtin() and I wrote the following,
> lightly tested patch - which of course misses testcases and stuff - but
> I'd be curious about any feedback now anyway.  When I proposed a very
> similar approach for the roundeven x86_64 expansion, Uros actually then
> opted for a solution based on internal functions, so I am curious
> whether there are simple alternatives I do not see.
> 
> Tejas, of course cases for other fadd variants should at least be added
> to expand_builtin.

Looks good for the rs6000 part, thanks!  Some trivialities:

> 	* builtins.c (expand_builtin_binary_conversion): New function.
> 	  (expand_builtin): Call it.

(Wrong indentation, should be just one tab, no extra spaces).

> +(define_insn "add_truncdfsf3"
> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> +	(unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> +		    (match_operand:DF 2 "gpc_reg_operand" "d,wa")]
> +		     UNSPEC_ADD_NARROWING))]

Please align the U with the preceding [.


Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-23 17:17                                                                               ` Martin Jambor
  2019-08-23 19:13                                                                                 ` Segher Boessenkool
@ 2019-08-24  9:53                                                                                 ` Richard Sandiford
  2019-08-25 13:55                                                                                   ` Tejas Joshi
  2019-08-26 13:23                                                                                   ` Martin Jambor
  1 sibling, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-24  9:53 UTC (permalink / raw)
  To: Martin Jambor; +Cc: Segher Boessenkool, Tejas Joshi, gcc, hubicka, joseph

Martin Jambor <mjambor@suse.cz> writes:
> Hello,
>
> On Thu, Aug 22 2019, Segher Boessenkool wrote:
>>> > Hi Tejas,
>>> >
>>> > [ Please do not top-post. ]
>>
>> On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote:
>>> > What happens then?  "It does not work" is very very vague.  At least it
>>> > seems the compiler does build now?
>>> 
>>> Oh, compiler builds but instruction is still "bl fadd". It should be
>>> "fadds" right?
>>
>> Yes, but that means the problem is earlier, before it hits RTL perhaps.
>>
>> Compile with -dap, look at the expand dump (the lowest numbered one, 234
>> or so), and see what it looked like in the final Gimple, and then in the
>> RTL generated from that.  And then drill down.
>>
>
> Tejas sent me his patch and I looked at why it did not work.  I found
> two reasons:
>
> 1. associated_internal_fn (in builtins.c) does not handle
>    DEF_INTERNAL_OPTAB_FN kind of internal functions, and Tejas
>    (sensibly, I'd say) used that macro to define the internal function.
>    But when I worked around that by manually adding a case for it in the
>    switch statement, I ran into an assert because...
>
> 2. direct_internal_fn_supported_p on which replacement_internal_fn
>    depends to expand built-ins as internal functions cannot handle
>    conversion optabs... and narrowing is a kind of conversion and the
>    optab is added as such with OPTAB_CD.
>
> Actually, the second statement is not entirely true because somehow it
> can handle optab while_ult which is a conversion optab but a) the way it
> is handled, if I can understand it at all, seems to be a big hack and
> would be even worse if we decided to copy that for all narrowing math
> functions

Think "big hack" is a bit unfair.  The way that the internal function
maps argument types to the optab modes, and the way it expands calls
into rtl, depends on the "optab type" argument (the final argument to
DEF_INTERNAL_OPTAB_FN).  This is relatively flexible in that it can use
a single-mode "direct" optab or a dual-mode "conversion" optab, with the
modes coming from whichever arguments are appropriate.  New optab types
can be added as needed.

FWIW, several other DEF_INTERNAL_OPTAB_FNs are conversion optabs too
(e.g. IFN_LOAD_LANES, IFN_STORE_LANES, IFN_MASK_LOAD, etc.).

But...

> and b) it gets both modes from argument types whereas we need one from
> the result type and so we would have to rewrite
> replacement_internal_fn anyway.

...yeah, I agree this breaks the current model.  The reason IFN_WHILE_ULT
doesn't rely on the return type is that if you have:

  _2 = .WHILE_ULT (_0, _1) // returning a vector of 4 booleans
  _3 = .WHILE_ULT (_0, _1) // returning a vector of 8 booleans

then the calls look equivalent.  So instead we pass an extra argument
indicating the required boolean vector "shape".

The same "problem" could in principle apply to FADD if we ever needed
to support double+double->_Float16 for example.
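
As a rough illustration of that point, here is a toy model in C of a
two-mode table lookup.  All names (toy_mode, toy_icode, toy_lookup and so
on) are made up for this sketch and are not GCC's real types or functions.
It only shows why a key built from the argument modes alone cannot tell two
narrowing adds apart, while a key that includes the result mode can.

```c
#include <assert.h>

/* Toy stand-ins for machine modes and insn codes; illustrative names
   only, not GCC's real enums.  */
enum toy_mode { TOY_HF, TOY_SF, TOY_DF, TOY_NUM_MODES };
enum toy_icode { TOY_NOTHING = 0, TOY_ADD_TRUNC_DF_HF, TOY_ADD_TRUNC_DF_SF };

/* Conversion-style table indexed by [result mode][argument mode].
   Entries that are not listed are zero-initialised, i.e. TOY_NOTHING.  */
static const enum toy_icode toy_table[TOY_NUM_MODES][TOY_NUM_MODES] = {
  [TOY_HF][TOY_DF] = TOY_ADD_TRUNC_DF_HF,
  [TOY_SF][TOY_DF] = TOY_ADD_TRUNC_DF_SF,
};

/* Lookup that knows the result mode: the two narrowing adds differ.  */
static enum toy_icode
toy_lookup (enum toy_mode res, enum toy_mode arg)
{
  assert (res < TOY_NUM_MODES && arg < TOY_NUM_MODES);
  return toy_table[res][arg];
}

/* Lookup that only sees the two argument modes.  Both narrowing adds
   take (DF, DF) arguments, so nothing is left to tell them apart.  */
static enum toy_icode
toy_lookup_args_only (enum toy_mode arg0, enum toy_mode arg1)
{
  assert (arg0 < TOY_NUM_MODES && arg1 < TOY_NUM_MODES);
  return toy_table[arg0][arg1];
}
```

With this table, toy_lookup (TOY_SF, TOY_DF) and toy_lookup (TOY_HF, TOY_DF)
find different entries, whereas toy_lookup_args_only (TOY_DF, TOY_DF) finds
nothing at all.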

> Therefore, at least for now (GSoC deadline is kind of looming), I
> decided that the best way forward would be to not rely on internal
> functions but plug into expand_builtin() and I wrote the following,
> lightly tested patch - which of course misses testcases and stuff - but
> I'd be curious about any feedback now anyway.  When I proposed a very
> similar approach for the roundeven x86_64 expansion, Uros actually then
> opted for a solution based on internal functions, so I am curious
> whether there are simple alternatives I do not see.
>
> Tejas, of course cases for other fadd variants should at least be added
> to expand_builtin.
>
> Thanks,
>
> Martin
>
>
> 2019-08-23  Tejas Joshi  <tejasjoshi9673@gmail.com>
> 	    Martin Jambor  <mjambor@suse.cz>
>
> 	* builtins.c (expand_builtin_binary_conversion): New function.
> 	  (expand_builtin): Call it.
> 	* config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING.
> 	(add_truncdfsf3): New define_insn.
> 	* optabs.def (fadd_optab): New.
>
> [...]
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 9461693bcd1..3f56880c23f 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -140,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
>  DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
>  		       vec_shl_insert, binary)
>  
> +DEF_INTERNAL_OPTAB_FN (FADD, ECF_CONST, fadd, binary)
> +
>  DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)

Should be dropped now.

OK with that change and the ones Segher asked for.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-24  9:53                                                                                 ` Richard Sandiford
@ 2019-08-25 13:55                                                                                   ` Tejas Joshi
  2019-08-25 16:47                                                                                     ` Segher Boessenkool
  2019-08-26 13:23                                                                                   ` Martin Jambor
  1 sibling, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-25 13:55 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
>
> Similarly addtfsf3 that adds TFmode and produces an SFmode result, and so on.

I want to extend this patch for FADDL and DADDL. What operand
constraints should I use for TFmode alongside "f"?

> In cases where long double and double have the same mode,
> the daddl function should use the existing adddf3 pattern.

So, should I use adddf3 for DADDL directly? How would I map the
add<mode>3 optab with DADDL?

Thanks,
Tejas


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-25 13:55                                                                                   ` Tejas Joshi
@ 2019-08-25 16:47                                                                                     ` Segher Boessenkool
  2019-08-26  7:07                                                                                       ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-25 16:47 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

[ Please don't top-post ]

On Sun, Aug 25, 2019 at 07:32:01PM +0530, Tejas Joshi wrote:
> I want to extend this patch for FADDL and DADDL. What operand
> constraints should I use for TFmode alongside "f"?

It depends on the instruction you use, and what registers that then
works on.  GPRs get "r", FPRs get "f" for SFmode but "d" otherwise, the
VRs get "v", if all VSRs are allowed you get "wa".  And there are some
mode attributes to go with mode iterators for when you handle multiple
modes (which you always do, you need to handle KF as well).

What machine insns do you want to generate?  There most likely is
something a lot like it already, so take that as example?

> > In cases where long double and double have the same mode,
> > the daddl function should use the existing adddf3 pattern.

Sure, that probably should be handled in generic code (not rs6000).
Where it would generate an adddfdf2 it should just do an adddf3.

> So, should I use adddf3 for DADDL directly? How would I map the
> add<mode>3 optab with DADDL?

Simply check if source and target mode are the same?
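
A minimal sketch of that check, in plain C rather than real GCC code.  The
helper toy_narrowing_add_name is hypothetical; it only mimics the naming
used in the patch above ("add_trunc$b$a3", which produced add_truncdfsf3)
and falls back to the ordinary add<mode>3 name when the two modes are the
same.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the name of the pattern a narrowing add would use.  FROM and TO
   are lowercase mode suffixes such as "df" or "sf".  Hypothetical helper
   for illustration, not part of GCC.  */
static void
toy_narrowing_add_name (char *buf, size_t len, const char *from,
			const char *to)
{
  assert (buf != NULL && len > 0);
  if (strcmp (from, to) == 0)
    /* Source and target mode are the same: no narrowing, so use the
       ordinary add<mode>3 pattern.  */
    snprintf (buf, len, "add%s3", to);
  else
    /* Same shape as the "add_trunc$b$a3" optab name in the patch.  */
    snprintf (buf, len, "add_trunc%s%s3", from, to);
}
```

Called with ("df", "sf") this fills the buffer with "add_truncdfsf3"; called
with ("df", "df") it yields "adddf3".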

Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-25 16:47                                                                                     ` Segher Boessenkool
@ 2019-08-26  7:07                                                                                       ` Tejas Joshi
  2019-08-26  7:42                                                                                         ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-26  7:07 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
Sorry for not being clear. I am confused about some modes here. I
meant, just as we expanded fadd (which narrows down from double to
float) with add_truncdfsf3, how can I expand faddl (which narrows down
long double to float). Wouldn't I require TFmode -> SFmode as
add_trunctfsf3 just as Joseph had previously mentioned? And if yes,
the operand constraints would still be f,d and d for TF->SF or what?
Also, just as we generated fadds/xsaddsp instructions for fadd, would
I be generating the same ones for faddl and fadd/xsadddp for daddl
(long double to double) or something different? all for ISA 2.07. (for
ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like
add<mode>_odd followed by conversion to narrower?)

Thanks,
Tejas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-26  7:07                                                                                       ` Tejas Joshi
@ 2019-08-26  7:42                                                                                         ` Segher Boessenkool
  2019-08-30 19:12                                                                                           ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-26  7:42 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

> > [ Please don't top-post ]

On Mon, Aug 26, 2019 at 12:43:44PM +0530, Tejas Joshi wrote:
> Sorry for not being clear. I am confused about some modes here. I
> meant, just as we expanded fadd (which narrows down from double to
> float) with add_truncdfsf3, how can I expand faddl (which narrows down
> long double to float). Wouldn't I require TFmode -> SFmode as
> add_trunctfsf3 just as Joseph had previously mentioned?

Yes, you need an addsfkf2 as well as adddfkf2 (and tf variants of those,
there are iterators for that).

KF is IEEE QP float.  TF is whatever long double maps to, IEEE QP or
double-double.

> And if yes,
> the operand constraints would still be f,d and d for TF->SF or what?

SF is "f".  KF does not fit in "d".

You won't need constraints anyway.  There already is add<mode>3_odd and
you can just use that, in a new define_expand you make.  For example,
for DP you need two insns: xsaddqpo followed by xscvqpdp.  The second
of those is the existing insn pattern trunc<mode>df2_hw, so you just get
something like

(define_expand "adddfkf2"
  [(set (match_operand:DF 0 "gpc_reg_operand")
        (unspec:DF [(match_operand:IEEE128 1 "gpc_reg_operand")
		    (match_operand:IEEE128 2 "gpc_reg_operand")]
		   UNSPEC_DUNNO_MENTION_DF_SOMEHOW))]
  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
{
  rtx tmp = gen_reg_rtx (<MODE>mode);
  emit_insn (gen_add<mode>3_odd (tmp, operands[1], operands[2]));
  emit_insn (gen_trunc<mode>df2_hw (operands[0], tmp));
  DONE;
})

(not tested at all, be careful :-) )
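
To see why the add is done with round-to-odd before the narrowing
conversion, here is a scaled-down software sketch in plain C: double stands
in for the wide format and float for the narrow one.  It is not how the
hardware insns work internally, and the names toy_add_round_to_odd and
toy_fadd are made up.  It assumes IEEE binary64/binary32, round-to-nearest
as the current rounding mode, no excess precision and no -ffast-math, and it
ignores intermediate overflow inside the error computation.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Add two doubles and round the exact sum to odd instead of to nearest.
   Illustrative helper only; see the assumptions stated above.  */
static double
toy_add_round_to_odd (double a, double b)
{
  assert (sizeof (double) == sizeof (uint64_t));

  double s = a + b;
  if (isnan (s) || isinf (s))
    return s;

  /* Knuth's TwoSum: err is the exact rounding error of s.  */
  double bv = s - a;
  double err = (a - (s - bv)) + (b - bv);
  if (err == 0.0)
    return s;			/* Exact sum, nothing to adjust.  */

  uint64_t bits;
  memcpy (&bits, &s, sizeof bits);
  if (bits & 1)
    return s;			/* Already the odd neighbour.  */

  /* s is even and inexact: step one ulp towards the exact sum.  */
  if ((err > 0.0) == (s > 0.0))
    bits += 1;
  else
    bits -= 1;
  memcpy (&s, &bits, sizeof s);
  return s;
}

/* Narrowing add: round to odd in the wide format, then convert once.  */
static float
toy_fadd (double a, double b)
{
  return (float) toy_add_round_to_odd (a, b);
}
```

For a = 1 + 2^-24 and b = 2^-60, a plain (float) (a + b) rounds twice: the
double sum lands exactly on the midpoint between two floats and the tie then
goes to 1.0f.  toy_fadd keeps the "slightly above the midpoint" information
in the odd low bit and returns 1 + 2^-23, which is the correctly rounded
result.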

> Also, just as we generated fadds/xsaddsp instructions for fadd, would
> I be generating the same ones for faddl and fadd/xsadddp for daddl
> (long double to double) or something different? all for ISA 2.07. (for
> ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like
> add<mode>_odd followed by conversion to narrower?)

For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
that is.  I don't know if we'll want fadd support in the emulation
libraries ever; don't worry about it for now, anyway.

"long double is double" you should probably handle in generic code.
"long double is double-double", well, fadd cannot really be done better
than an add followed by a conversion in that case?  Which boils down
to truncating the inputs to double, and then doing whatever you would
do for IEEE DP float.

Segher

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-24  9:53                                                                                 ` Richard Sandiford
  2019-08-25 13:55                                                                                   ` Tejas Joshi
@ 2019-08-26 13:23                                                                                   ` Martin Jambor
  1 sibling, 0 replies; 63+ messages in thread
From: Martin Jambor @ 2019-08-26 13:23 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Segher Boessenkool, Tejas Joshi, gcc, hubicka, joseph

Hi,

On Sat, Aug 24 2019, Richard Sandiford wrote:
> Martin Jambor <mjambor@suse.cz> writes:

...

>>
>> 2. direct_internal_fn_supported_p on which replacement_internal_fn
>>    depends to expand built-ins as internal functions cannot handle
>>    conversion optabs... and narrowing is a kind of conversion and the
>>    optab is added as such with OPTAB_CD.
>>
>> Actually, the second statement is not entirely true because somehow it
>> can handle optab while_ult which is a conversion optab but a) the way it
>> is handled, if I can understand it at all, seems to be a big hack and
>> would be even worse if we decided to copy that for all narrowing math
>> functions
>
> Think "big hack" is a bit unfair.  The way that the internal function
> maps argument types to the optab modes, and the way it expands calls
> into rtl, depends on the "optab type" argument (the final argument to
> DEF_INTERNAL_OPTAB_FN).  This is relatively flexible in that it can use
> a single-mode "direct" optab or a dual-mode "conversion" optab, with the
> modes coming from whichever arguments are appropriate.  New optab types
> can be added as needed.

My apologies. I guess I should have been more careful with my choice of
words when perhaps I did not understand all aspects but when I saw:

#define direct_while_optab_supported_p convert_optab_supported_p

(and when I saw expand_while_optab_fn defined normally while all(?)
others were constructed in an elaborate macro), I thought that I did not
want to replicate the mechanism, not for a number of functions.

>
> FWIW, several other DEF_INTERNAL_OPTAB_FNs are conversion optabs too
> (e.g. IFN_LOAD_LANES, IFN_STORE_LANES, IFN_MASK_LOAD, etc.).
>
> But...
>
>> and b) it gets both modes from argument types whereas we need one from
>> the result type and so we would have to rewrite
>> replacement_internal_fn anyway.
>
> ...yeah, I agree this breaks the current model.  The reason IFN_WHILE_ULT
> doesn't rely on the return type is that if you have:
>
>   _2 = .WHILE_ULT (_0, _1) // returning a vector of 4 booleans
>   _3 = .WHILE_ULT (_0, _1) // returning a vector of 8 booleans
>
> then the calls look equivalent.  So instead we pass an extra argument
> indicating the required boolean vector "shape".
>
> The same "problem" could in principle apply to FADD if we ever needed
> to support double+double->_Float16 for example.

Right.  I hope not going through an internal function is acceptable.  If
not, we'll have to teach this builtin->internal_function->optab
mechanism about conversions.

Thanks,

Martin


>
>> Therefore, at least for now (GSoC deadline is kind of looming), I
>> decided that the best way forward would be to not rely on internal
>> functions but plug into expand_builtin() and I wrote the following,
>> lightly tested patch - which of course misses testcases and stuff - but
>> I'd be curious about any feedback now anyway.  When I proposed a very
>> similar approach for the roundeven x86_64 expansion, Uros actually then
>> opted for a solution based on internal functions, so I am curious
>> whether there are simple alternatives I do not see.
>>
>> Tejas, of course cases for other fadd variants should at least be added
>> to expand_builtin.
>>
>> Thanks,
>>
>> Martin
>>
>>
>> 2019-08-23  Tejas Joshi  <tejasjoshi9673@gmail.com>
>> 	    Martin Jambor  <mjambor@suse.cz>
>>
>> 	* builtins.c (expand_builtin_binary_conversion): New function.
>> 	  (expand_builtin): Call it.
>> 	* config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING.
>> 	(add_truncdfsf3): New define_insn.
>> 	* optabs.def (fadd_optab): New.
>>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-26  7:42                                                                                         ` Segher Boessenkool
@ 2019-08-30 19:12                                                                                           ` Tejas Joshi
  2019-08-30 20:35                                                                                             ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-30 19:12 UTC (permalink / raw)
  To: gcc; +Cc: segher, joseph, Martin Jambor

Hello.

> For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
> that is.  I don't know if we'll want fadd support in the emulation
> libraries ever; don't worry about it for now, anyway.

Which instructions should FADDL (long double to float) and DADDL (long
double to double) expand to on power8 (ISA 2.07) and power9 (ISA 3.0),
including with VSX? (Just as we expanded FADD to fadds, and to xsaddsp
for VSX.)

Thanks,
Tejas


On Mon, 26 Aug 2019 at 13:12, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> > > [ Please don't top-post ]
>
> On Mon, Aug 26, 2019 at 12:43:44PM +0530, Tejas Joshi wrote:
> > Sorry for not being clear. I am confused about some modes here. I
> > meant, just as we expanded fadd (which narrows down from double to
> > float) with add_truncdfsf3, how can I expand faddl (which narrows down
> > long double to float). Wouldn't I require TFmode -> SFmode as
> > add_trunctfsf3 just as Joseph had previously mentioned?
>
> Yes, you need an addsfkf2 as well as adddfkf2 (and tf variants of those,
> there are iterators for that).
>
> KF is IEEE QP float.  TF is whatever long double maps to, IEEE QP or
> double-double.
>
> > And if yes,
> > the operand constraints would still be f,d and d for TF->SF or what?
>
> SF is "f".  KF does not fit in "d".
>
> You won't need constraints anyway.  There already is add<mode>3_odd and
> you can just use that, in a new define_expand you make.  For example,
> for DP you need two insns: xsaddqpo followed by xscvqpdp.  The second
> of those is the existing insn pattern trunc<mode>df2_hw, so you just get
> something like
>
> (define_expand "adddfkf2"
>   [(set (match_operand:DF 0 "gpc_reg_operand")
>         (unspec:DF [(match_operand:IEEE128 1 "gpc_reg_operand")
>                     (match_operand:IEEE128 2 "gpc_reg_operand")]
>                    UNSPEC_DUNNO_MENTION_DF_SOMEHOW))]
>   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
> {
>   rtx tmp = gen_reg_rtx (<MODE>mode);
>   emit_insn (gen_add<mode>3_odd (tmp, operands[1], operands[2]));
>   emit_insn (gen_trunc<mode>df2_hw (operands[0], tmp));
>   DONE;
> })
>
> (not tested at all, be careful :-) )
>
> > Also, just as we generated fadds/xsaddsp instructions for fadd, would
> > I be generating the same ones for faddl and fadd/xsadddp for daddl
> > (long double to double) or something different? all for ISA 2.07. (for
> > ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like
> > add<mode>_odd followed by conversion to narrower?)
>
> For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
> that is.  I don't know if we'll want fadd support in the emulation
> libraries ever; don't worry about it for now, anyway.
>
> "long double is double" you should probably handle in generic code.
> "long double is double-double", well, fadd cannot really be done better
> than an add followed by a conversion in that case?  Which boils down
> to truncating the inputs to double, and then doing whatever you would
> do for IEEE DP float.
>
>
> Segher


* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-30 19:12                                                                                           ` Tejas Joshi
@ 2019-08-30 20:35                                                                                             ` Segher Boessenkool
  2019-09-02  3:19                                                                                               ` Tejas Joshi
  0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-30 20:35 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, joseph, Martin Jambor

> > > > [ Please don't top-post ]

(I delete everything under your signature, without looking, assuming you
just forgot to).

On Sat, Aug 31, 2019 at 12:48:42AM +0530, Tejas Joshi wrote:
> > For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
> > that is.  I don't know if we'll want fadd support in the emulation
> > libraries ever; don't worry about it for now, anyway.
> 
> What instructions would need to be expanded for FADDL (long double to
> float) and DADDL (long double to double) on power8 (ISA 2.07) and
> power9 (ISA 3.0) respectively, along with VSX? (Just as we expanded
> FADD to fadds and xsaddsp for vsx).

If long double is double, faddl is the same as fadd, and daddl is just
normal addition.

If long double is double-double, faddl can be done as fadd on the first
double precision component of both args, and daddl is just normal addition
of those.

If long double is IEEE QP, then it is more interesting :-)

daddl is
  xsaddqpo # add qp, with round to odd
  xscvqpdp # convert qp to dp

faddl is
  xsaddqpo  # add qp, with round to odd
  xscvqpdpo # convert qp to dp, with round to odd
  xsrsp     # convert dp to sp
    (single precision numbers are stored in double precision format, but
     this is rounded as single precision)

fadds is
  fadds ;  :-)
    or
  xsaddsp

Both faddl and daddl are the sequences for Power9.  There are no instructions
for QP format on Power8; see libgcc/config/rs6000/t-float128 for how support
for the emulation QP math is built, if you are interested.
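
To illustrate why the Power9 sequences add with round to odd before
converting, here is a minimal C sketch of the same idea one level down
(double to float instead of QP to DP).  It is not how GCC or the hardware
implements anything; the names add_round_to_odd and fadd_sketch are made
up for this example, and it assumes IEEE double arithmetic in
round-to-nearest mode without excess precision, and ignores overflow
corner cases.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add two doubles, rounding the result to odd
   instead of to nearest.  Assumes round-to-nearest IEEE arithmetic.  */
static double add_round_to_odd(double a, double b)
{
    double s = a + b;                        /* round-to-nearest sum */
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);  /* TwoSum: exact rounding error */

    uint64_t bits;
    memcpy(&bits, &s, sizeof bits);

    /* Inf/NaN, exact sums and already-odd results need no adjustment.  */
    if (((bits >> 52) & 0x7ff) == 0x7ff || err == 0.0 || (bits & 1))
        return s;

    /* The sum was inexact and rounded to an even significand: step one
       ulp towards the exact value, which lands on the odd neighbour.  */
    int s_neg = (int) (bits >> 63);
    int err_neg = err < 0.0;
    bits = (s_neg == err_neg) ? bits + 1 : bits - 1;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* Hypothetical model of a narrowing add: round to odd in the wide
   format, then do the single rounding to the narrow format.  */
static float fadd_sketch(double a, double b)
{
    return (float) add_round_to_odd(a, b);
}
```

With a = 1.0 + 0x1p-24 and b = 0x1p-60, the plain (float) (a + b) rounds
twice and yields 1.0f, while fadd_sketch (a, b) yields 1.0f + 0x1p-23f,
which is the correctly rounded single-precision sum.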


Segher


* Re: Expansion of narrowing math built-ins into power instructions
  2019-08-30 20:35                                                                                             ` Segher Boessenkool
@ 2019-09-02  3:19                                                                                               ` Tejas Joshi
  2019-09-02 11:30                                                                                                 ` Segher Boessenkool
  0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-09-02  3:19 UTC (permalink / raw)
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

On Sat, 31 Aug 2019 at 02:05, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> > > > > [ Please don't top-post ]
>
> (I delete everything under your signature, without looking, assuming you
> just forgot to).

Oh sorry, I didn't know the reply button does evil things. :-)

> If long double is double, faddl is the same as fadd, and daddl is just
> normal addition.
>
> If long double is double-double, faddl can be done as fadd on the first
> double precision component of both args, and daddl is just normal addition
> of those.
>
> If long double is IEEE QP, then it is more interesting :-)

On what conditions does the mapping of long double to double,
double-double or IEEE QP depend, so that I can test it?

Thanks,
Tejas


* Re: Expansion of narrowing math built-ins into power instructions
  2019-09-02  3:19                                                                                               ` Tejas Joshi
@ 2019-09-02 11:30                                                                                                 ` Segher Boessenkool
  0 siblings, 0 replies; 63+ messages in thread
From: Segher Boessenkool @ 2019-09-02 11:30 UTC (permalink / raw)
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Mon, Sep 02, 2019 at 08:55:28AM +0530, Tejas Joshi wrote:
> On Sat, 31 Aug 2019 at 02:05, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > > > > > [ Please don't top-post ]
> >
> > (I delete everything under your signature, without looking, assuming you
> > just forgot to).
> 
> Oh sorry, I didn't know the reply button does evil things. :-)

You're supposed to write your email as if the time reading it is more
valuable than the time writing it.  You can afford to spend a few seconds
deleting some stuff, or checking that what you wrote is good, etc.  There
is only one you, and there are many people reading this.

> > If long double is double, faddl is the same as fadd, and daddl is just
> > normal addition.
> >
> > If long double is double-double, faddl can be done as fadd on the first
> > double precision component of both args, and daddl is just normal addition
> > of those.
> >
> > If long double is IEEE QP, then it is more interesting :-)
> 
> On what conditions does the mapping of long double to double,
> double-double or IEEE QP depend, so that I can test it?

gcc -Q --help=target | grep long

-mlong-double-64 is to select double, and -mlong-double-128 says to use
something 128 bits.  These are not mentioned in the manual, and GCC often
says it has a different number of bits selected, hrm.  I'll open a PR.

When having a 128-bit long double, -mabi=ibmlongdouble says to use the
double-double format, and -mabi=ieeelongdouble says to use IEEE QP FP.
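
One way to check from a test program which of the three formats is
actually in effect is to look at LDBL_MANT_DIG from <float.h>: it is 53
for plain double, 106 for IBM double-double and 113 for IEEE QP.  A small
sketch (the helper name long_double_format is made up here):

```c
#include <float.h>

/* Hypothetical helper: map a long double mantissa width, as reported by
   LDBL_MANT_DIG, to the format it implies.  */
static const char *long_double_format(int mant_dig)
{
    switch (mant_dig) {
    case 53:  return "double";             /* e.g. -mlong-double-64 */
    case 106: return "IBM double-double";  /* -mabi=ibmlongdouble */
    case 113: return "IEEE QP";            /* -mabi=ieeelongdouble */
    default:  return "other";
    }
}
```

Calling long_double_format (LDBL_MANT_DIG) in a testcase compiled with
each set of options should then tell you which mapping you got.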

HTH,


Segher

