* Expansion of narrowing math built-ins into power instructions
From: Martin Jambor @ 2019-07-29 17:37 UTC
To: segher; +Cc: Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

Hi Segher,

as you might know, Tejas is our Google Summer of Code student working on
adding built-in functions for some new math functions added in ISO/IEC
TS 18661.

His next step is to expand "functions rounding result to narrower type"
(so fadd, fsub and possibly fmul and fdiv described in
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
instructions on targets that have such instructions.  And Joseph
suggested when he proposed this project that POWER8 (and I suppose also
9) is one of them.

Can you please confirm this and also perhaps point Tejas to the right
pieces of the power machine description and target code to emulate to
implement expansion of these functions?  It would be very much
appreciated, because even though Honza and I are the official mentors of
the project, we are not very well versed in the ppc target.

Thanks a lot,

Martin

* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-07-29 18:40 UTC
To: Martin Jambor; +Cc: Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

Hi!

On Mon, Jul 29, 2019 at 07:37:53PM +0200, Martin Jambor wrote:
> His next step is to expand "functions rounding result to narrower type"
> (so fadd, fsub and possibly fmul and fdiv described in
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
> instructions on targets that have such instructions.  And Joseph
> suggested when he proposed this project that POWER8 (and I suppose also
> 9) is one of them.
>
> Can you please confirm this and also perhaps point Tejas to the right
> pieces of power machine description and target code to emulate to
> implement expansion of these functions?

I think this is referring to the "fadds" and similar Power architecture
instructions, which take as inputs any single or double precision
numbers, and round the result to single precision?  These instructions
produce a correct result also for double-precision inputs, from ISA 2.07
(POWER8 and later) on.  (The result if OE=1 or UE=1 is undefined.)  (See
4.3.5.1 in the ISA.)

In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
which could be extended to allow DF inputs with an SF output; it doesn't
yet allow it.

gcc112 is a Power8, and gcc135 is a Power9, and Tejas does have a compile
farm account already ;-)


Segher

* Re: Expansion of narrowing math built-ins into power instructions
From: Joseph Myers @ 2019-07-30 19:47 UTC
To: Segher Boessenkool
Cc: Martin Jambor, Tejas Joshi, Jan Hubicka, GCC Mailing List

On Mon, 29 Jul 2019, Segher Boessenkool wrote:

> I think this is referring to the "fadds" and similar Power architecture
> instructions, which take as inputs any single or double precision
> numbers, and round the result to single precision?  These instructions

Yes.

On Power9, it is *also* possible to do such narrowing operations from IEEE
binary128 to binary32 or binary64 format, by first doing the operation on
binary128 using one of the "round to odd" instruction variants, then doing
a conversion to the narrower format.

-- 
Joseph S. Myers
joseph@codesourcery.com

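Joseph's round-to-odd recipe can be illustrated without POWER9 hardware one level down, with double playing the role of binary128 and float the narrower format. The sketch below is not GCC or glibc code, and `fadd_round_to_odd` is a hypothetical name. Because C has no round-to-odd mode, it recovers the exact rounding error of the double sum with Knuth's TwoSum and, when the sum is inexact and its significand is even, moves it one ulp toward the exact value. It assumes round-to-nearest, FLT_EVAL_METHOD == 0 (no excess precision), finite inputs and no overflow. Rounding to odd at 53 bits and then to nearest at 24 bits yields the correctly rounded float because 53 >= 24 + 2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the float result of x + y rounded only once, as
   TS 18661 fadd () specifies.  Assumes round-to-nearest, FLT_EVAL_METHOD
   == 0, finite inputs and no overflow.  */
static float
fadd_round_to_odd (double x, double y)
{
  /* TwoSum: s is the rounded sum and err its exact rounding error,
     so that x + y == s + err exactly.  */
  double s = x + y;
  double bb = s - x;
  double err = (x - (s - bb)) + (y - bb);

  if (err != 0.0)
    {
      uint64_t bits;
      memcpy (&bits, &s, sizeof bits);
      /* Round to odd: an inexact result must have an odd last significand
         bit.  If s is even, step one ulp towards the exact value, which
         lies strictly between s and that neighbour.  */
      if ((bits & 1) == 0)
        {
          if ((s > 0) == (err > 0))
            bits += 1;   /* exact value has larger magnitude than s */
          else
            bits -= 1;   /* exact value has smaller magnitude than s */
          memcpy (&s, &bits, sizeof s);
        }
    }

  /* One round-to-nearest conversion of the round-to-odd value.  */
  return (float) s;
}
```

For example, with x = 1 + 2^-24 and y = 2^-80 the exact sum lies just above the midpoint between the floats 1 and 1 + 2^-23, so this returns 1 + 2^-23, whereas the naive (float) (x + y) returns 1.0f.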
* Re: Expansion of narrowing math built-ins into power instructions
From: Florian Weimer @ 2019-07-30 9:20 UTC
To: Martin Jambor
Cc: segher, Tejas Joshi, Jan Hubicka, Joseph Myers, GCC Mailing List

* Martin Jambor:

> as you might know, Tejas is our Google Summer of Code student working on
> adding built-in functions for some new math functions added in ISO/IEC
> TS 18661.
>
> His next step is to expand "functions rounding result to narrower type"
> (so fadd, fsub and possibly fmul and fdiv described in
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2314.pdf) into ISA
> instructions on targets that have such instructions.

Sorry, this might be a silly question, but: How do you plan to recognize
that the fadd/fsub being called is indeed the one from the TS?

Thanks,
Florian

* Re: Expansion of narrowing math built-ins into power instructions
From: Joseph Myers @ 2019-07-30 19:49 UTC
To: Florian Weimer
Cc: Martin Jambor, segher, Tejas Joshi, Jan Hubicka, GCC Mailing List

On Tue, 30 Jul 2019, Florian Weimer wrote:

> Sorry, this might be a silly question, but: How do you plan to recognize
> that the fadd/fsub being called is indeed the one from the TS?

I expect it's the same as any other built-in function: compatible
prototype plus appropriate options (-std=gnu*, or -std=c2x in future once
we teach GCC that these functions are in C2x) that enable the built-in
functions.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-07-31 6:47 UTC
To: gcc; +Cc: Martin Jambor, segher, joseph

Hi,

> In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> which could be extended to allow DF inputs with an SF output; it doesn't
> yet allow it.

Thanks for the inputs, I will try to address these points now.  I have
built GCC on gcc112 and will apply the patch and test the testcases there.

Tejas

* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-07-31 14:47 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, joseph

On Wed, Jul 31, 2019 at 12:23:18PM +0530, Tejas Joshi wrote:
> > In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> > which could be extended to allow DF inputs with an SF output; it doesn't
> > yet allow it.
>
> Thanks for the inputs, I will try to address these points now. I have
> built GCC on gcc112 and will apply patch and test testcases there.

For the QP float (binary128, KFmode, take your pick) you need Power9 or
newer, so gcc135.


Segher

* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-08-08 18:39 UTC
To: gcc; +Cc: Martin Jambor, hubicka, joseph, segher

Hi.
It took some time for me to finish the folding part for the fadd
variants, and until it is reviewed, I want to move ahead with the
power8/9 expansions on top of the current fadd patch.

> In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> which could be extended to allow DF inputs with an SF output; it doesn't
> yet allow it.

This might be a very basic question, but I am confused by the optabs and
insn names right now: the comments in optabs.def say that these patterns
are present in the md files as insn names.  How can the fadd function be
mapped to the "fadd<mode>3_fpr" pattern name?

Also, the faddl and daddl functions take long double arguments; can they
also be expanded on DF to SF mode, or only on QP float on power9?

I have built GCC and applied my current patches on gcc112 and, yes, on
gcc135 too.

Thanks,
Tejas

* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-08 20:05 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi!

On Fri, Aug 09, 2019 at 12:14:54AM +0530, Tejas Joshi wrote:
> > In GCC (in rs6000.md) we have the "*add<mode>3_fpr" and similar insns,
> > which could be extended to allow DF inputs with an SF output; it doesn't
> > yet allow it.
>
> This might be very lousy but I am confused with the optabs and insn
> name rn, the comments in obtabs.def says that these patterns are
> present in md as insn names. How can fadd function be mapped with the
> "fadd<mode>3_fpr" pattern name?

The actual name starts with an asterisk, which means as it is it can
never be used by name.  But, right above this pattern, there is the
define_expand named add<mode>3 (for modes SFDF).

These current patterns all take the same mode for all inputs and outputs
(that's what <mode>3 indicates, say, fadddf3).  You will need to define
something that takes two SFs in and produces a DF.  That cannot really
be in this same pattern, it needs a float_extend added (you can do all
kinds of trickery, but just adding a few extra patterns is much easier
than define_subst and whatnot).

> Also, faddl and daddl functions take long double as argument, can they
> also be expanded on DF to SF mode or only on QP float on power9?

We can have three different long double modes on powerpc: DP float, QP
float, or "IBM long double", also known as "double double", which is
essentially the sum of two double precision numbers.

Types (a source level construct) are not the same as modes (an RTL
concept).


Segher

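Segher's point that one C type (long double) can correspond to different formats and modes can be made concrete from C: <float.h>'s LDBL_MANT_DIG reports the long double significand width, which is 53 when long double is just double (the case where daddl can reuse the existing adddf3 pattern), 106 for IBM double-double and 113 for IEEE binary128; 64 is the x87 extended format, listed only for completeness. The enum and function names below are made up for this illustration, and classifying by significand width is a heuristic sketch, not an ABI query.

```c
#include <float.h>

/* Illustrative long double formats, keyed by significand width.
   All names here are hypothetical.  */
enum ldbl_kind
{
  LDBL_KIND_BINARY64,          /* same format as double */
  LDBL_KIND_X87_EXTENDED,      /* 80-bit x87 format (not a powerpc format) */
  LDBL_KIND_IBM_DOUBLE_DOUBLE, /* sum of two doubles */
  LDBL_KIND_BINARY128,         /* IEEE quad precision */
  LDBL_KIND_UNKNOWN
};

static enum ldbl_kind
classify_long_double (int mant_dig)
{
  switch (mant_dig)
    {
    case 53:  return LDBL_KIND_BINARY64;
    case 64:  return LDBL_KIND_X87_EXTENDED;
    case 106: return LDBL_KIND_IBM_DOUBLE_DOUBLE;
    case 113: return LDBL_KIND_BINARY128;
    default:  return LDBL_KIND_UNKNOWN;
    }
}

/* Nonzero when, for the current compiler and target, long double has the
   same precision as double.  */
static int
long_double_is_double (void)
{
  return LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

Calling classify_long_double (LDBL_MANT_DIG) reports the format the current compilation uses.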
* Re: Expansion of narrowing math built-ins into power instructions
From: Joseph Myers @ 2019-08-08 23:09 UTC
To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Thu, 8 Aug 2019, Segher Boessenkool wrote:

> These current patterns all take the same mode for all inputs and outputs
> (that's what <mode>3 indicates, say, fadddf3).  You will need to define
> something that takes two SFs in and produces a DF.  That cannot really

For example, md.texi describes standard patterns such as mulhisi3 that
multiply two HImode values and produce an SImode result (widening integer
multiply).

Using a similar naming pattern, you might have a pattern adddfsf3 that
adds two DFmode values and produces an SFmode result (or you could call it
something like add_truncdfsf3 if you wish to emphasise the truncation
involved, for example).  Similarly addtfsf3 that adds two TFmode values
and produces an SFmode result, and so on.  Of course these names need
documenting (and you need corresponding RTL for them to generate that
distinguishes the fused add+truncate from the different RTL for separate
addition and truncation with double rounding).  In cases where long double
and double have the same mode, the daddl function should use the existing
adddf3 pattern.

-- 
Joseph S. Myers
joseph@codesourcery.com

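The double rounding that separate addition and truncation suffer from is easy to trigger in plain C. A small sketch (all names hypothetical), assuming round-to-nearest and FLT_EVAL_METHOD == 0: for x = 1 + 2^-24 and y = 2^-80 the exact sum lies just above the midpoint between the floats 1 and 1 + 2^-23, so a single (fused) rounding gives 1 + 2^-23. Adding in double first drops the 2^-80 and leaves exactly the midpoint, and that tie then rounds to even, giving 1.0f.

```c
/* Separate addition and truncation: x + y is rounded to double, and the
   result is rounded again to float.  */
static float
add_then_truncate (double x, double y)
{
  return (float) (x + y);
}

/* Inputs whose exact sum, 1 + 2^-24 + 2^-80, exposes the double rounding.  */
static const double double_rounding_x = 1.0 + 0x1p-24;
static const double double_rounding_y = 0x1p-80;

/* What a fused add+truncate (a single rounding) must return for them.  */
static const float correctly_rounded = 1.0f + 0x1p-23f;
```

Here add_then_truncate (double_rounding_x, double_rounding_y) yields 1.0f rather than correctly_rounded.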
* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-08-10 10:24 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
I have been trying to write a basic pattern taking into account all the
suggestions you both have made.  The patch is attached, but I cannot see
a call to:

float
foo (double x, double y)
{
  return __builtin_fadd (x, y);
}

being expanded to any instruction, not even a simple one, using
-fno-builtin-fadd (and also -mhard-float?).  It always stays "bl fadd".
What am I missing here?

> (POWER8 and later) on. (The result if OE=1 or UE=1 is undefined). (See
> 4.3.5.1 in the ISA).

4.3.5.1 in the ISA says that single precision arithmetic instructions
perform the operation in double format and coerce the result to single
format.  Can fadd be considered this type of instruction, or do I need
to perform the add in DFmode and then use the "instruction provided to
explicitly convert double format operand in FPR to single format"?

Thanks,
Tejas

[-- Attachment #2: fadd-md.diff --]
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4ef1993..e4bfc4a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4652,6 +4652,21 @@
   [(set_attr "type" "fp")
    (set_attr "isa" "*,<Fisa>")])
 
+(define_expand "add_truncdfsf3"
+  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
+        (plus:DF (match_operand:DF 1 "gpc_reg_operand")
+                 (match_operand:DF 2 "gpc_reg_operand")))]
+  "TARGET_HARD_FLOAT"
+  "")
+
+(define_insn "*add_truncdfsf3_fpr"
+  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "=<Ff>"))
+        (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>")
+                 (match_operand:DF 2 "gpc_reg_operand" "<Ff>")))]
+  "TARGET_HARD_FLOAT"
+  "fadd %0,%1,%2"
+  [(set_attr "type" "fp")])
+
 (define_expand "sub<mode>3"
   [(set (match_operand:SFDF 0 "gpc_reg_operand")
         (minus:SFDF (match_operand:SFDF 1 "gpc_reg_operand")
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 4ffd0f3..45be794 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -67,6 +67,7 @@ OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2")
 OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2")
 
 /* Misc optabs that use two modes; model them as "conversions".  */
+OPTAB_CD(fadd_optab, "add_trunc$b$a3")
 OPTAB_CD(smul_widen_optab, "mul$b$a3")
 OPTAB_CD(umul_widen_optab, "umul$b$a3")
 OPTAB_CD(usmul_widen_optab, "usmul$b$a3")

* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-10 16:46 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi!

On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> I have been trying to write a basic pattern taking all the suggestions
> you both have mentioned. The same patch is attached here, but I cannot
> see call to :
>
> float
> foo (double x, double y)
> {
>   return __builtin_fadd (x, y);
> }
> being expanded to any instruction, at least a simple one, using
> -fno-builtin-fadd (and also -mhard-float?). It always stays "bl fadd".
> What am I missing here?

As far as I understand, that flag should set the behaviour of the fadd
function, not the __builtin_fadd one.  So I don't know.

> > (POWER8 and later) on. (The result if OE=1 or UE=1 is undefined). (See
> > 4.3.5.1 in the ISA).
>
> 4.3.5.1 in the ISA says that single precision arithmetic instructions
> perform operation in double format and coerces the result in single
> format. Can fadd be considered as this type of instruction or do I
> need to perform add in DFmode and then use "instruction provided to
> explicitly convert double format operand in FPR to single format."?

A single precision add is "fadds".  It rounds its result to single
precision.

I'm lost as to what the exact semantics of the wanted fadd() function are.
I thought you wanted to add two single precision numbers, producing a
double precision one.  But instead you want to add two double precision
numbers, producing a single precision one?  The fadds instruction fits
that well, but you'll have to check exactly how the fadd() function
should behave with respect to rounding and exceptions and the like.


Segher

* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-08-11 4:58 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi!

> As far as I understand that flag should set the behaviour of the fadd
> function, not the __builtin_fadd one. So I don't know.

According to ISO/IEC TS 18661, I am supposed to implement folding and
inline expansion of the fadd variants, which take double and long double
arguments and return the sum in the appropriate narrower type, float and
double.  As far as I know, we use __builtin_ to call the internal
functions?  So I am not sure which fadd function you mean.

> double precision one. But instead you want to add two double precision
> numbers, producing a single precision one? The fadds instruction fits

Yes.

> well to that, but you'll have to check exactly how the fadd() function
> should behave with respect to rounding and exceptions and the like.

Joseph's initial mail, which describes what should be carried out in the
course of the project, covers rounding and exceptions.  I have strictly
followed this description for my folding patch:

* The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
from most other built-in math.h functions because the return type is
different from the argument types.  You could start by adding them to
builtins.def similarly to roundeven (with new macros to handle adding such
functions for relevant pairs of _FloatN, _FloatNx types).  These functions
could be folded for constant arguments only if the result is exact, or if
-fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
involves overflow / underflow).

Thanks,
Tejas

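The first condition in the quoted folding rule (fold constant arguments only if the result is exact) can be sketched for the double-to-float case as below. This is an illustration with a hypothetical name, not GCC's folding code, which works on its own internal real-number representation. It assumes round-to-nearest and FLT_EVAL_METHOD == 0, and treats the result as exact when the double sum has no rounding error (checked with TwoSum) and survives a round trip through float.

```c
/* Returns nonzero if fadd (x, y) is exact, i.e. the exact sum of x and y is
   representable as a float; the folded value is then (float) (x + y).
   Infinite, NaN or overflowing sums report 0.  Note: an exact zero from
   opposite-signed operands is -0 under round-toward-negative, so a real
   folder must still take care with the sign of zero.  */
static int
fadd_result_is_exact (double x, double y)
{
  double s = x + y;
  double bb = s - x;
  double err = (x - (s - bb)) + (y - bb);   /* TwoSum rounding error */

  if (err != 0.0)
    return 0;       /* inexact in double, or NaN from an infinite sum */

  float f = (float) s;
  return (double) f == s;   /* fails if float rounds or overflows */
}
```

Under this check fadd (1.5, 0.25) could be folded to 1.75f, while fadd (1.0, 0x1p-30) could not, because 1 + 2^-30 needs more than 24 significand bits.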
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-11 7:20 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Sun, Aug 11, 2019 at 10:34:26AM +0530, Tejas Joshi wrote:
> > As far as I understand that flag should set the behaviour of the fadd
> > function, not the __builtin_fadd one. So I don't know.
>
> According to ISO/IEC TS 18661, I am supposed to implement the fadd
> variants for folding and expand them inline, that take double and long
> double as arguments and return
> addition in appropriate narrower type, float and double. As far as I
> know, we use __builtin_ to call the internal functions? I do not know
> which the only fadd function is.

See the manual, section "Other Built-in Functions Provided by GCC":

  @opindex fno-builtin
  GCC includes built-in versions of many of the functions in the standard
  C library.  These functions come in two forms: one whose names start with
  the @code{__builtin_} prefix, and the other without.  Both forms have the
  same type (including prototype), the same address (when their address is
  taken), and the same meaning as the C library functions even if you specify
  the @option{-fno-builtin} option @pxref{C Dialect Options}).  Many of these
  functions are only optimized in certain cases; if they are not optimized in
  a particular case, a call to the library function is emitted.

> > double precision one.  But instead you want to add two double precision
> > numbers, producing a single precision one?  The fadds instruction fits
>
> Yes.
>
> > well to that, but you'll have to check exactly how the fadd() function
> > should behave with respect to rounding and exceptions and the like.

I read 18661-1 now... and yup, "fadds" will work fine, and there are
no complications like this as far as I see.

For QP to either DP or SP, you can do round-to-odd followed by one of the
conversion instructions.  The ISA manual describes this; I can help you
with it, but first get DP->SP (fadd) working?

For the non-QP long doubles we have...  There is the option of using DP
for it, which isn't standard-compliant, many other archs have it too,
and it is simple anyway, because you have all code for operations
already.  You can mostly just ignore this option.

For double-double...  Well firstly, double-double is on the way out, so
adding new features to it is pretty useless?  Just ignore it unless you
have time left, I'd say.

> In Joseph's initial mail that describes what should be carried out in
> the course of project, about rounding and exceptions. I have strictly
> followed this description for my folding patch :
>
> * The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
> from most other built-in math.h functions because the return type is
> different from the argument types.  [...]  These functions
> could be folded for constant arguments only if the result is exact, or if
> -fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
> involves overflow / underflow).

For Power, all five basic operations (add, sub, mul, div, fma) work fine
wrt rounding mode if using the fadds etc. insns, for DP->SP.  All
exceptions work as expected, except maybe underflow and overflow, but
18661 doesn't require much at all for those anyway :-)


Segher

* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-08-11 12:46 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.

> with it, but first get DP->SP (fadd) working?

Can you please review what I have been trying, and the issues I am
facing, in this patch:
<https://gcc.gnu.org/ml/gcc/2019-08/msg00078.html>

Thanks,
Tejas

* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-11 16:59 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> +(define_expand "add_truncdfsf3"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
> +        (plus:DF (match_operand:DF 1 "gpc_reg_operand")
> +                 (match_operand:DF 2 "gpc_reg_operand")))]
> +  "TARGET_HARD_FLOAT"
> +  "")

float_extend on the LHS is never correct.  I think the following should
work, never mind that it looks like it does double rounding, because it
doesn't (famous last words ;-) ):

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "gpc_reg_operand")
                   (match_operand:DF 2 "gpc_reg_operand"))))]
  "TARGET_HARD_FLOAT"
  "")

> +(define_insn "*add_truncdfsf3_fpr"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "=<Ff>"))
> +        (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>")
> +                 (match_operand:DF 2 "gpc_reg_operand" "<Ff>")))]
> +  "TARGET_HARD_FLOAT"
> +  "fadd %0,%1,%2"
> +  [(set_attr "type" "fp")])

The constraints should be "f", "%d", "d", respectively.  <Ff> says to
display something for the mode in a mode iterator.  There is no mode
iterator here.  (In what you copied this from, there was SFDF.)

You want to output "fadds", not "fadd".

Maybe it is easier to immediately write the VSX scalar version for this
as well?  That's xsaddsp.  Oh, and you need to restrict all of this to
more recent CPUs; we'll have to do some new TARGET_* flag for that I
think.

Finally: please send patches to gcc-patches@ (not gcc@).

Thanks,


Segher

* Re: Expansion of narrowing math built-ins into power instructions 2019-08-11 16:59 ` Segher Boessenkool @ 2019-08-12 17:25 ` Tejas Joshi 2019-08-12 17:55 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-12 17:25 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi,
I have the following code in my rs6000.md (I haven't used the new TARGET_* flag yet):

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "gpc_reg_operand")
                   (match_operand:DF 2 "gpc_reg_operand"))))]
  "TARGET_HARD_FLOAT"
  "")

(define_insn "*add_truncdfsf3_fpr"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
                   (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

with the following optab in optabs.def:

OPTAB_CD(fadd_optab, "add_trunc$b$a3")

(What is the difference between $b$a and $a$b?)

I have also tried adding fadd and add_truncdfsf3 in rs6000-builtin.def, and examined the RTL dumps multiple times, but couldn't get fadd to be expanded. What am I missing here?

Thanks,
Tejas
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-12 17:25 ` Tejas Joshi @ 2019-08-12 17:55 ` Segher Boessenkool 2019-08-12 21:20 ` Joseph Myers 0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-12 17:55 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Mon, Aug 12, 2019 at 11:01:11PM +0530, Tejas Joshi wrote:
> I have the following code in my rs6000.md (I haven't used new TARGET_* yet) :
>
> (define_expand "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand")
>         (float_truncate:SF
>           (plus:DF (match_operand:DF 1 "gpc_reg_operand")
>                    (match_operand:DF 2 "gpc_reg_operand"))))]
>   "TARGET_HARD_FLOAT"
>   "")
>
> (define_insn "*add_truncdfsf3_fpr"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>         (float_truncate:SF
>           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
>                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
>   "TARGET_HARD_FLOAT"
>   "@
>    fadds %0,%1,%2
>    xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")])

Those look fine.  You can also merge them into one:

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
                   (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

> with following optab in optabs.def :
>
> OPTAB_CD(fadd_optab, "add_trunc$b$a3")  (what is the
> difference between $b$a and $a$b?)

Which of the two modes becomes $a and which becomes $b?  It depends on the definition of fadd_optab what order is expected, I think.


Segher
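A toy model of the name expansion asked about above, offered only as a sketch: `expand_optab_name` is a hypothetical helper, not GCC's genopinit code. It assumes, by analogy with the existing `OPTAB_CD(sext_optab, "extend$b$a2")` entry producing names like `extendsfdf2`, that `$a` stands for the first (destination) mode and `$b` for the second (source) mode, so `add_trunc$b$a3` with SF as destination and DF as source would yield `add_truncdfsf3`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC's genopinit): expand an optab name pattern
   by replacing "$a" with mode_a and "$b" with mode_b.
   Returns 0 on success, -1 if the result does not fit in out_size bytes. */
static int expand_optab_name(const char *pattern, const char *mode_a,
                             const char *mode_b, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return -1;

    for (const char *p = pattern; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        const char *piece = one;

        if (p[0] == '$' && (p[1] == 'a' || p[1] == 'b')) {
            piece = (p[1] == 'a') ? mode_a : mode_b;
            p++; /* also consume the 'a' or 'b' */
        }

        size_t len = strlen(piece);
        if (n + len >= out_size) /* keep room for the terminating NUL */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Under that assumption, swapping the order to `$a$b` in the pattern would simply produce `add_truncsfdf3` for the same pair of modes, which is why the expected order depends on how the optab is defined and queried.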
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-12 17:55 ` Segher Boessenkool @ 2019-08-12 21:20 ` Joseph Myers 2019-08-12 21:52 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-12 21:20 UTC
To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Mon, 12 Aug 2019, Segher Boessenkool wrote:

> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>         (float_truncate:SF
>           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
>                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]

That sort of pattern is incorrect for a fused operation such as fadd, because combine could match it for code that is supposed to do separate addition and narrowing conversion.  The RTL needs to be something that does *not* match the combination of separate operations (just as fma has its own RTL, and a separate pass is responsible for converting separate operations to fused ones in the -ffp-contract=fast case where it's permitted).

-- 
Joseph S. Myers
joseph@codesourcery.com
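Joseph's point can be checked numerically. The sketch below is plain C, not GCC code: `fadd_ref` computes a correctly rounded narrowing add in the spirit of the TS 18661 `fadd` function, using the TwoSum error-free transformation followed by round-to-odd in binary64, and `add_then_narrow` is the separate `(float)(a + b)`, which rounds twice. All function names are made up for this illustration. It assumes IEEE binary64/binary32, FLT_EVAL_METHOD == 0 (e.g. SSE on x86-64), round-to-nearest, no -ffast-math, and finite, non-overflowing sums.

```c
#include <stdint.h>
#include <string.h>

/* Knuth's TwoSum: returns s = RN(a + b) and stores the exact error
   a + b - s in *err (valid in round-to-nearest without overflow). */
static double two_sum(double a, double b, double *err)
{
    double s = a + b;
    double bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    return s;
}

/* Given s = RN(x) and e = x - s, return x rounded to odd in binary64:
   if x is inexact and s has an even significand, step one ulp toward x. */
static double round_to_odd(double s, double e)
{
    uint64_t bits;

    if (e == 0.0)
        return s; /* x is exactly representable */
    memcpy(&bits, &s, sizeof bits);
    if ((bits & 1u) == 0) {
        if ((e > 0.0) == (s > 0.0))
            bits += 1; /* |x| > |s|: step away from zero */
        else
            bits -= 1; /* |x| < |s|: step toward zero */
        memcpy(&s, &bits, sizeof s);
    }
    return s;
}

/* Single rounding: binary64 has at least 24 + 2 significand bits, so
   rounding the round-to-odd value to float gives RN_float(a + b). */
static float fadd_ref(double a, double b)
{
    double e;
    double s = two_sum(a, b, &e);
    return (float)round_to_odd(s, e);
}

/* Separate addition plus narrowing: round to double, then to float. */
static float add_then_narrow(double a, double b)
{
    return (float)(a + b);
}
```

With a = 1 + 2^-24 (exactly halfway between 1.0f and the next float) and b = 2^-80, the exact sum lies just above that halfway point, so `fadd_ref` returns 1 + 2^-23, while `add_then_narrow` first rounds the sum back to the halfway point and then ties to even, giving 1.0f. A pattern that lets combine turn the second form into `fadds` would therefore change results.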
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-12 21:20 ` Joseph Myers @ 2019-08-12 21:52 ` Segher Boessenkool 2019-08-14 6:15 ` Tejas Joshi 0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-12 21:52 UTC
To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Mon, Aug 12, 2019 at 09:20:18PM +0000, Joseph Myers wrote:
> On Mon, 12 Aug 2019, Segher Boessenkool wrote:
>
> > (define_insn "add_truncdfsf3"
> >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> >         (float_truncate:SF
> >           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> >                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
>
> That sort of pattern is incorrect for a fused operation such as fadd,
> because combine could match it for code that is supposed to do separate
> addition and narrowing conversion.  The RTL needs to be something that
> does *not* match the combination of separate operations (just as fma has
> its own RTL, and a separate pass is responsible for converting separate
> operations to fused ones in the -ffp-contract=fast case where it's
> permitted).

Ugh, we allow disabling contraction, I forgot.  Rats.


Segher
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-12 21:52 ` Segher Boessenkool @ 2019-08-14 6:15 ` Tejas Joshi 2019-08-14 7:21 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-14 6:15 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

> The RTL needs to be something that
> does *not* match the combination of separate operations (just as fma has
> its own RTL, and a separate pass is responsible for converting separate

So do I need to introduce fadd's own RTL, just as for fma, which would emit a fused instruction when -ffp-contract is at its default (fast), and would emit separate instructions (an add in DFmode followed by a truncation to SF) when -ffp-contract=off?
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 6:15 ` Tejas Joshi @ 2019-08-14 7:21 ` Segher Boessenkool 2019-08-14 16:11 ` Joseph Myers 0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14 7:21 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Wed, Aug 14, 2019 at 11:51:28AM +0530, Tejas Joshi wrote:
> > The RTL needs to be something that
> > does *not* match the combination of separate operations (just as fma has
> > its own RTL, and a separate pass is responsible for converting separate
>
> So do I need to introduce fadd's own RTL just as fma which would emit
> a fused instruction while -ffp-contract is default (fast) and would
> emit separate instructions like add in DFmode and then truncate to SF?
> while -ffp-contract=off ? (just as fma)

I think you can do one RTL code that replaces float_truncate in

> > > > (define_insn "add_truncdfsf3"
> > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > >         (float_truncate:SF
> > > >           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > >                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]

but that is only meant for such explicit contraction.  This can then happily be used to implement all such patterns.  Is there some issue with that I overlook?

A good name for this...  I would say "float_contract", because I like horrible names.  It shouldn't be hard to think of something better :-)


Segher
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 7:21 ` Segher Boessenkool @ 2019-08-14 16:11 ` Joseph Myers 2019-08-14 20:21 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-14 16:11 UTC
To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, 14 Aug 2019, Segher Boessenkool wrote:

> I think you can do one RTL code that replaces float_truncate in
>
> > > > > (define_insn "add_truncdfsf3"
> > > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > > >         (float_truncate:SF
> > > > >           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > > >                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
>
> but that is only meant for such explicit contraction.  This can then
> happily be used to implement all such patterns.  Is there some issue
> with that I overlook?

Yes, I think such a separate RTL code would work (as would an architecture-specific UNSPEC) - it just needs to avoid the pattern matching RTL that can arise other than from the built-in functions.

(Everything to do with needing -fno-math-errno to expand into such instructions should be handled in the architecture-independent compiler.)

-- 
Joseph S. Myers
joseph@codesourcery.com
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 16:11 ` Joseph Myers @ 2019-08-14 20:21 ` Segher Boessenkool 2019-08-14 20:23 ` Joseph Myers 0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14 20:21 UTC
To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, Aug 14, 2019 at 04:10:56PM +0000, Joseph Myers wrote:
> On Wed, 14 Aug 2019, Segher Boessenkool wrote:
>
> > I think you can do one RTL code that replaces float_truncate in
> >
> > > > > > (define_insn "add_truncdfsf3"
> > > > > >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > > > > >         (float_truncate:SF
> > > > > >           (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> > > > > >                    (match_operand:DF 2 "gpc_reg_operand" "d,wa"))))]
> >
> > but that is only meant for such explicit contraction.  This can then
> > happily be used to implement all such patterns.  Is there some issue
> > with that I overlook?
>
> Yes, I think such a separate RTL code would work (as would an
> architecture-specific UNSPEC) - it just needs to avoid the pattern
> matching RTL that can arise other than from the built-in functions.
>
> (Everything to do with needing -fno-math-errno to expand into such
> instructions should be handled in the architecture-independent compiler.)

Does something like
  float d; double a, b, x;
  ...
  d = fadd (a + x, b - x);
work as wanted, with such a representation?  It would simplify (does it?) to
  d = fadd (a, b);
but is that allowed?


Segher
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 20:21 ` Segher Boessenkool @ 2019-08-14 20:23 ` Joseph Myers 2019-08-14 21:00 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread
From: Joseph Myers @ 2019-08-14 20:23 UTC
To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, 14 Aug 2019, Segher Boessenkool wrote:

> Does something like
>   float d; double a, b, x;
>   ...
>   d = fadd (a + x, b - x);
> work as wanted, with such a representation?  It would simplify (does it?) to
>   d = fadd (a, b);
> but is that allowed?

It's not allowed, but neither is simplifying (a + x) + (b - x) into (a + b), when contraction isn't allowed.

-- 
Joseph S. Myers
joseph@codesourcery.com
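A concrete instance of Joseph's answer, as a small C check (the function names are illustrative only): with ordinary IEEE double arithmetic and no reassociation (no -ffast-math, FLT_EVAL_METHOD == 0 assumed), (a + x) + (b - x) and a + b can give different values, so folding fadd (a + x, b - x) into fadd (a, b) would be equally invalid.

```c
/* Evaluated exactly as written: each operation rounds to double. */
static double grouped_sum(double a, double b, double x)
{
    return (a + x) + (b - x);
}

/* The "simplified" form that reassociation would produce. */
static double simplified_sum(double a, double b)
{
    return a + b;
}
```

With a = b = 1 and x = 2^53, a + x is a tie between 2^53 and 2^53 + 2 and rounds to the even value 2^53, while b - x = -(2^53 - 1) is exact, so the grouped sum is 1.0 rather than 2.0.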
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 20:23 ` Joseph Myers @ 2019-08-14 21:00 ` Segher Boessenkool 2019-08-15 9:52 ` Tejas Joshi 0 siblings, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-14 21:00 UTC
To: Joseph Myers; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka

On Wed, Aug 14, 2019 at 08:23:27PM +0000, Joseph Myers wrote:
> On Wed, 14 Aug 2019, Segher Boessenkool wrote:
>
> > Does something like
> >   float d; double a, b, x;
> >   ...
> >   d = fadd (a + x, b - x);
> > work as wanted, with such a representation?  It would simplify (does it?) to
> >   d = fadd (a, b);
> > but is that allowed?
>
> It's not allowed, but neither is simplifying (a + x) + (b - x) into (a +
> b), when contraction isn't allowed.

Ah of course.  And we already should not do such simplification on RTL, when contraction is disallowed.

So yeah it should work fine I think.  A new RTL code would be best (it would be silly to have to make an unspec for it in every port separately), but an unspec is of course easiest for now.


Segher
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-14 21:00 ` Segher Boessenkool @ 2019-08-15 9:52 ` Tejas Joshi 2019-08-15 12:47 ` Richard Sandiford 2019-08-15 18:54 ` Segher Boessenkool 0 siblings, 2 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-15 9:52 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
I just wanted to make sure that I am looking at the correct code here. Apart from rtl.def, where I would introduce something like float_contract (or float_narrow?), and simplify-rtx.c, the breakpoints I set on the functions in expr.c and cfgexpand.c that mention float_truncate/FLOAT_TRUNCATE were never hit.
Also, in what way should float_contract/narrow differ from float_truncate, given that both do similar things (truncation from DF to SF)?

Thanks,
Tejas
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-15 9:52 ` Tejas Joshi @ 2019-08-15 12:47 ` Richard Sandiford 2019-08-15 13:55 ` Tejas Joshi 2019-08-15 18:45 ` Segher Boessenkool 2019-08-15 18:54 ` Segher Boessenkool 1 sibling, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-15 12:47 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, segher, joseph

Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> Hello.
> I just wanted to make sure that I am looking at the correct code here.
> Except for rtl.def where I should be introducing something like
> float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
> set on functions around expr.c, cfgexpand.c where I grep for
> float_truncate/FLOAT_TRUNCATE did not hit.
> Also, in what manner should float_contract/narrow be different from
> float_truncate as both are trying to do similar things? (truncation
> from DF to SF)

I think the code should instead be a fused addition and truncation, a bit like FMA is a fused addition and multiplication.  Describing it as a DFmode addition followed by some conversion to SF would still involve double rounding.

simplify-rtx.c is probably the most important place to handle it.  It would be easiest to test using the selftests at the end of the file.

Thanks,
Richard
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-15 12:47 ` Richard Sandiford @ 2019-08-15 13:55 ` Tejas Joshi 2019-08-15 18:45 ` Segher Boessenkool 1 sibling, 0 replies; 63+ messages in thread
From: Tejas Joshi @ 2019-08-15 13:55 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph, richard.sandiford

> I think the code should instead be a fused addition and truncation,
> a bit like FMA is a fused addition and multiplication. Describing it as
> a DFmode addition followed by some conversion to SF would still involve
> double rounding.

In that case, something like FADD. But for functions like fsub, fmul and fdiv that do similar computations, wouldn't we need more operation codes? Is it possible to have something generalized that does *arithmetic computation (rather than just addition)* followed by *conversion (narrowing)*? Just a thought.

Thanks,
Tejas
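On the fsub/fmul/fdiv question, numerically only the step that captures the exact error differs per operation, while the final narrowing can be shared. The sketch below is plain C, not a proposal for GCC's RTL; `narrow_pair`, `fadd_ref` and `fsub_ref` are illustrative names. It assumes IEEE binary64/binary32, FLT_EVAL_METHOD == 0, round-to-nearest, no -ffast-math and no overflow. `fsub_ref` reuses the addition path because negating `b` is exact; `fmul` and `fdiv` would need their own exact-error step (for example one based on fma), which is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Shared narrowing step: given s = RN(x) in binary64 and the exact error
   e = x - s, return x correctly rounded to binary32 via round-to-odd. */
static float narrow_pair(double s, double e)
{
    uint64_t bits;

    if (e != 0.0) {
        memcpy(&bits, &s, sizeof bits);
        if ((bits & 1u) == 0) {
            /* Step one ulp toward x so the significand becomes odd. */
            if ((e > 0.0) == (s > 0.0))
                bits += 1;
            else
                bits -= 1;
            memcpy(&s, &bits, sizeof s);
        }
    }
    return (float)s;
}

/* Per-operation part: an error-free transformation (TwoSum for addition). */
static float fadd_ref(double a, double b)
{
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return narrow_pair(s, e);
}

/* Subtraction reuses the addition path: negating b is exact. */
static float fsub_ref(double a, double b)
{
    return fadd_ref(a, -b);
}
```

For example, `fsub_ref(1.0 + 0x1p-24, -0x1p-80)` gives 1.0f + 0x1p-23f, whereas `(float)((1.0 + 0x1p-24) - (-0x1p-80))` rounds twice and gives 1.0f.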
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-15 12:47 ` Richard Sandiford 2019-08-15 13:55 ` Tejas Joshi @ 2019-08-15 18:45 ` Segher Boessenkool 2019-08-16 10:23 ` Richard Sandiford 1 sibling, 1 reply; 63+ messages in thread
From: Segher Boessenkool @ 2019-08-15 18:45 UTC
To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford

On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> > Hello.
> > I just wanted to make sure that I am looking at the correct code here.
> > Except for rtl.def where I should be introducing something like
> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints

I like that "float_narrow" name :-)

> > set on functions around expr.c, cfgexpand.c where I grep for
> > float_truncate/FLOAT_TRUNCATE did not hit.
> > Also, in what manner should float_contract/narrow be different from
> > float_truncate as both are trying to do similar things? (truncation
> > from DF to SF)
>
> I think the code should instead be a fused addition and truncation,
> a bit like FMA is a fused addition and multiplication.  Describing it as
> a DFmode addition followed by some conversion to SF would still involve
> double rounding.

How so?  It would *mean* there is only single rounding, even!  That's the whole point of it.

> simplify-rtx.c is probably the most important place to handle it.
> It would be easiest to test using the selftests at the end of the file.


Segher
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-15 18:45 ` Segher Boessenkool @ 2019-08-16 10:23 ` Richard Sandiford 2019-08-17 5:40 ` Tejas Joshi 0 siblings, 1 reply; 63+ messages in thread
From: Richard Sandiford @ 2019-08-16 10:23 UTC
To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote:
>> Tejas Joshi <tejasjoshi9673@gmail.com> writes:
>> > Hello.
>> > I just wanted to make sure that I am looking at the correct code here.
>> > Except for rtl.def where I should be introducing something like
>> > float_contract (or float_narrow?) and also simplify-rtx.c, breakpoints
>
> I like that "float_narrow" name :-)
>
>> > set on functions around expr.c, cfgexpand.c where I grep for
>> > float_truncate/FLOAT_TRUNCATE did not hit.
>> > Also, in what manner should float_contract/narrow be different from
>> > float_truncate as both are trying to do similar things? (truncation
>> > from DF to SF)
>>
>> I think the code should instead be a fused addition and truncation,
>> a bit like FMA is a fused addition and multiplication.  Describing it as
>> a DFmode addition followed by some conversion to SF would still involve
>> double rounding.
>
> How so?  It would *mean* there is only single rounding, even!  That's
> the whole point of it.

But a PLUS should behave as a PLUS in any context.  Making its behaviour dependent on the containing rtxes (if any) would be a can of worms.

Richard
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-16 10:23 ` Richard Sandiford @ 2019-08-17 5:40 ` Tejas Joshi 2019-08-17 8:21 ` Richard Sandiford 0 siblings, 1 reply; 63+ messages in thread
From: Tejas Joshi @ 2019-08-17 5:40 UTC
To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hi,

> It's just a different name, nothing more, nothing less.  Because it is
> a different name it can not be accidentally generated from actual
> truncations.

I have introduced float_narrow, but I could not find the appropriate place to generate it for a call to fadd; a CALL is generated instead. I used GDB to set breakpoints, which hit fold_rtx and cse_insn, but I got confused about which rtx codes and passes generate the respective RTL. So it should not be handled like FLOAT_TRUNCATE, if we want to avoid generating it for actual truncations?

Thanks,
Tejas
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-17 5:40 ` Tejas Joshi @ 2019-08-17 8:21 ` Richard Sandiford 2019-08-19 10:46 ` Tejas Joshi 2019-08-19 13:07 ` Segher Boessenkool 0 siblings, 2 replies; 63+ messages in thread
From: Richard Sandiford @ 2019-08-17 8:21 UTC
To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, segher, joseph

Tejas Joshi <tejasjoshi9673@gmail.com> writes:
> Hi,
>
>> It's just a different name, nothing more, nothing less.  Because it is
>> a different name it can not be accidentally generated from actual
>> truncations.
>
> I have introduced float_narrow but I could not find appropriate places
> to generate it for a call to fadd instead it to generate a CALL.  I
> used GDB to set breakpoints which hit fold_rtx and cse_insn but I got
> confused with the rtx codes and passes which generate respective RTL.
> It should not be similar to FLOAT_TRUNCATE if we want to avoid it
> generating for actual truncations?

Please don't do it this way.  The whole point of the work is that this is a single operation that cannot be modelled as a post-processing of a normal double addition result.  It's a single operation at the source level, a single IFN, a single optab, and a single instruction.  Splitting it apart into two operations for rtl only, and making it look in rtl terms like a post-processing of a normal addition result, seems like it's going to come back to bite us.

In lisp terms we're saying that the operand to the float_narrow is implicitly quoted:

  (float_narrow:m '(plus:n a b))

so that when float_narrow is evaluated, the argument is the unevaluated rtl expression "(plus a b)" rather than the evaluated result a + b.  float_narrow then does its own evaluation of a and b and performs a fused addition and narrowing on the result.

No other rtx rvalue works like this.  rtx mappings like simplification or evaluation are normally depth-first, so that the mapping is applied to the operands first, and then the root is mapped/simplified/evaluated with the results.  Adding implicit lisp quoting would require special cases in these routines for float_narrow.

The only current analogue I can think of for this is the handling of zero_extend on const_ints.  Because const_ints are modeless, we have to avoid cases in which the recursion produces things like:

  (zero_extend:m (const_int -1))

because it's no longer clear what mode the zero_extend is extending from.  But I think that's seen as a wart of having modeless const_ints.  I don't think it's something we should actively embrace by adding float_narrow.

Using float_narrow would also be inconsistent with the way we handle saturating arithmetic.  There we use US_PLUS and SS_PLUS rtx codes for unsigned and signed saturating plus respectively, rather than:

  (unsigned_sat '(plus a b))
  (signed_sat '(plus a b))

Using dedicated codes might seem clunky.  But it's simple, safe, and fits the existing model without special cases. :-)

Thanks,
Richard
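Richard's argument about depth-first evaluation can be illustrated with a toy expression evaluator. This is not GCC's rtx machinery: `PLUS_DF` and `FLOAT_TRUNCATE_SF` only mimic PLUS and FLOAT_TRUNCATE, and `FADD_NARROW_SF` stands in for a hypothetical dedicated fused code in the style of US_PLUS/SS_PLUS. Because operands are evaluated (and rounded) before their parent, a truncation wrapped around a PLUS can only round an already-rounded double, while a dedicated code sees the original operands and can round once. The sketch assumes IEEE binary64/binary32, FLT_EVAL_METHOD == 0 and no -ffast-math.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy expression codes (illustrative only, not GCC's rtx codes). */
typedef enum { CONST_DF, PLUS_DF, FLOAT_TRUNCATE_SF, FADD_NARROW_SF } toy_code;

typedef struct toy_node {
    toy_code code;
    double value;                  /* used by CONST_DF */
    const struct toy_node *op0;
    const struct toy_node *op1;
} toy_node;

/* Correctly rounded float result of a + b: TwoSum, then round-to-odd
   in binary64, then a single rounding to binary32. */
static float fused_add_narrow(double a, double b)
{
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    uint64_t bits;

    if (e != 0.0) {
        memcpy(&bits, &s, sizeof bits);
        if ((bits & 1u) == 0) {
            if ((e > 0.0) == (s > 0.0))
                bits += 1;         /* |exact| > |s|: step away from zero */
            else
                bits -= 1;         /* |exact| < |s|: step toward zero */
            memcpy(&s, &bits, sizeof s);
        }
    }
    return (float)s;
}

/* Depth-first evaluation: operands are evaluated first, then the root. */
static double eval(const toy_node *n)
{
    switch (n->code) {
    case CONST_DF:
        return n->value;
    case PLUS_DF:
        return eval(n->op0) + eval(n->op1);          /* rounds to double */
    case FLOAT_TRUNCATE_SF:
        return (float)eval(n->op0);                  /* rounds again */
    case FADD_NARROW_SF:
        return fused_add_narrow(eval(n->op0), eval(n->op1)); /* once */
    }
    return 0.0;
}
```

With leaves 1 + 2^-24 and 2^-80, evaluating `FLOAT_TRUNCATE_SF` over `PLUS_DF` yields 1.0, whereas `FADD_NARROW_SF` over the same leaves yields 1 + 2^-23: the wrapper form has already lost the information it would need.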
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-17 8:21 ` Richard Sandiford @ 2019-08-19 10:46 ` Tejas Joshi 2019-08-19 13:07 ` Segher Boessenkool 1 sibling, 0 replies; 63+ messages in thread From: Tejas Joshi @ 2019-08-19 10:46 UTC (permalink / raw) To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph, richard.sandiford > but an unspec is of course easiest for now. So, at this point, should I proceed with UNSPEC considering the complications that might arise as Richard points out? On Sat, 17 Aug 2019 at 13:51, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Tejas Joshi <tejasjoshi9673@gmail.com> writes: > > Hi, > > > >> It's just a different name, nothing more, nothing less. Because it is > >> a different name it can not be accidentally generated from actual > >> truncations. > > > > I have introduced float_narrow but I could not find appropriate places > > to generate it for a call to fadd instead it to generate a CALL. I > > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got > > confused with the rtx codes and passes which generate respective RTL. > > It should not be similar to FLOAT_TRUNCATE if we want to avoid it > > generating for actual truncations? > > Please don't do it this way. The whole point of the work is that this > is a single operation that cannot be modelled as a post-processing of > a normal double addition result. It's a single operation at the source > level, a single IFN, a single optab, and a single instruction. Splitting > it apart into two operations for rtl only, and making it look in rtl terms > like a post-processing of a normal addition result, seems like it's going > to come back to bite us. > > In lisp terms we're saying that the operand to the float_narrow is > implicitly quoted: > > (float_narrow:m '(plus:n a b)) > > so that when float_narrow is evaluated, the argument is the unevaluated > rtl expression "(plus a b)" rather than the evaluated result a + b. 
> float_narrow then does its own evaluation of a and b and performs a > fused addition and narrowing on the result. > > No other rtx rvalue works like this. rtx mappings like simplification > or evaluation are normally depth-first, so that the mapping is applied > to the operands first, and then the root is mapped/simplified/evaluated > with the results. Adding implicit lisp quoting would require special > cases in these routines for float_narrow. > > The only current analogue I can think of for this is the handling > of zero_extend on const_ints. Because const_ints are modeless, we have > to avoid cases in which the recursion produces things like: > > (zero_extend:m (const_int -1)) > > because it's no longer clear what mode the zero_extend is extending from. > But I think that's seen as a wart of having modeless const_ints. I don't > think it's something we should actively embrace by adding float_narrow. > > Using float_narrow would also be inconsistent with the way we handle > saturating arithmetic. There we use US_PLUS and SS_PLUS rtx codes for > unsigned and signed saturating plus respectively, rather than: > > (unsigned_sat '(plus a b)) > (signed_sat '(plus a b)) > > Using dedicated codes might seem clunky. But it's simple, safe, and fits > the existing model without special cases. :-) > > Thanks, > Richard > > > > > Thanks, > > Tejas > > > > > > On Fri, 16 Aug 2019 at 15:53, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Segher Boessenkool <segher@kernel.crashing.org> writes: > >> > On Thu, Aug 15, 2019 at 01:47:47PM +0100, Richard Sandiford wrote: > >> >> Tejas Joshi <tejasjoshi9673@gmail.com> writes: > >> >> > Hello. > >> >> > I just wanted to make sure that I am looking at the correct code here. > >> >> > Except for rtl.def where I should be introducing something like > >> >> > float_contract (or float_narrow?)
and also simplify-rtx.c, breakpoints > >> > > >> > I like that "float_narrow" name :-) > >> > > >> >> > set on functions around expr.c, cfgexpand.c where I grep for > >> >> > float_truncate/FLOAT_TRUNCATE did not hit. > >> >> > Also, in what manner should float_contract/narrow be different from > >> >> > float_truncate as both are trying to do similar things? (truncation > >> >> > from DF to SF) > >> >> > >> >> I think the code should instead be a fused addition and truncation, > >> >> a bit like FMA is a fused addition and multiplication. Describing it as > >> >> a DFmode addition followed by some conversion to SF would still involve > >> >> double rounding. > >> > > >> > How so? It would *mean* there is only single rounding, even! That's > >> > the whole point of it. > >> > >> But a PLUS should behave as a PLUS in any context. Making its > >> behaviour dependent on the containing rtxes (if any) would be a > >> can of worms. > >> > >> Richard
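Richard's point about depth-first rtx mappings can be illustrated with a toy tree walker. Everything below (`enum code`, `struct expr`, `mk`, `simplify`, `well_formed_narrow`) is hypothetical and far simpler than GCC's rtx and simplify-rtx.c; it only shows that a generic walker which simplifies operands before the root turns a NARROW of a foldable PLUS into a NARROW of a constant, unless the walker special-cases NARROW. Allocation is never freed, for brevity.

```c
#include <stdlib.h>

/* Toy expression codes, loosely modelled on rtl (hypothetical names). */
enum code { E_CONST, E_PLUS, E_NARROW };

struct expr {
    enum code code;
    long value;                /* E_CONST only */
    struct expr *op0, *op1;    /* E_PLUS uses both; E_NARROW uses op0 */
};

struct expr *mk(enum code code, long value, struct expr *op0, struct expr *op1)
{
    struct expr *e = malloc(sizeof *e);
    if (!e)
        abort();
    e->code = code;
    e->value = value;
    e->op0 = op0;
    e->op1 = op1;
    return e;
}

/* Generic depth-first simplifier: operands first, then the root.
   If HONOR_QUOTING is nonzero, a NARROW's PLUS operand is treated as
   quoted: its own operands may still be simplified, but the PLUS itself
   is never folded away.  That is the special case every walker would need. */
struct expr *simplify(struct expr *e, int honor_quoting)
{
    if (e->code == E_NARROW && honor_quoting) {
        struct expr *op = e->op0;          /* the fused (quoted) operation */
        op->op0 = simplify(op->op0, honor_quoting);
        op->op1 = simplify(op->op1, honor_quoting);
        return e;
    }
    if (e->op0)
        e->op0 = simplify(e->op0, honor_quoting);
    if (e->op1)
        e->op1 = simplify(e->op1, honor_quoting);
    if (e->code == E_PLUS && e->op0->code == E_CONST && e->op1->code == E_CONST)
        return mk(E_CONST, e->op0->value + e->op1->value, NULL, NULL);
    return e;
}

/* A NARROW only means something while it still wraps the operation it fuses. */
int well_formed_narrow(const struct expr *e)
{
    return e->code != E_NARROW || e->op0->code == E_PLUS;
}
```

Without the special case, `NARROW(PLUS(1, 2))` becomes `NARROW(3)`, the toy analogue of the malformed `(float_narrow:SF (const_double:DF ...))`; with it, only the inner operands change, which matches the "with just x changed" behaviour described later in the thread.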
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-19 13:07 UTC To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford Hi Richard, On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote: > Tejas Joshi <tejasjoshi9673@gmail.com> writes: > >> It's just a different name, nothing more, nothing less. Because it is > >> a different name it can not be accidentally generated from actual > >> truncations. > > > > I have introduced float_narrow but I could not find appropriate places > > to generate it for a call to fadd instead of generating a CALL. I > > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got > > confused with the rtx codes and passes which generate respective RTL. > > It should not be similar to FLOAT_TRUNCATE if we want to avoid > > generating it for actual truncations? > > Please don't do it this way. The whole point of the work is that this > is a single operation that cannot be modelled as a post-processing of > a normal double addition result. It's a single operation at the source > level, a single IFN, a single optab, and a single instruction. Splitting > it apart into two operations for rtl only, and making it look in rtl terms > like a post-processing of a normal addition result, seems like it's going > to come back to bite us. > > In lisp terms we're saying that the operand to the float_narrow is > implicitly quoted: > > (float_narrow:m '(plus:n a b)) > > so that when float_narrow is evaluated, the argument is the unevaluated > rtl expression "(plus a b)" rather than the evaluated result a + b. > float_narrow then does its own evaluation of a and b and performs a > fused addition and narrowing on the result. RTL isn't Lisp. RTL doesn't have quotations.
RTL doesn't have *evaluation*. RTL is just a data structure that describes your program instructions. A large part of what means what is system-specific. Rounding of floating point is not defined, for example. And yes, various parts of GCC can manipulate RTL, doing substitution and algebraic simplification and whatnot. All within the rules of RTL. And that means nothing ever can "pass" a float_narrow, because there are no rules that allow it to. > No other rtx rvalue works like this. A lot of unspecs are used like this, for example. > Using float_narrow would also be inconsistent with the way we handle > saturating arithmetic. There we use US_PLUS and SS_PLUS rtx codes for > unsigned and signed saturating plus respectively, rather than: > > (unsigned_sat '(plus a b)) > (signed_sat '(plus a b)) > > Using dedicated codes might seem clunky. But it's simple, safe, and fits > the existing model without special cases. :-) And you need many many more RTX codes, which you will not handle in almost all places, because there are too many. I agree this construct is not as nice as could be hoped for. I don't agree that 60 new RTX codes is an acceptable solution (or that that will ever really work out, even). It would be nice if somehow we could make a variant of RTL codes, so that we could have nice and simple code that applies to all variants of some code. Not sure how that would work out. Maybe we don't have to do this very generically, how often will we need this anyway? I have three examples so far: 1) Saturating arithmetic; 2) This float_narrow thing; 3) Ordered compares, that is, fp compares that set an exception on NaNs. Something that works for all three would be nice! Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Richard Sandiford @ 2019-08-20 7:41 UTC To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph Tejas: given the controversy, I agree unspecs sound like a good approach for now. We can always go back and add the rtx codes later once there's agreement on what they should look like. Segher Boessenkool <segher@kernel.crashing.org> writes: > On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote: >> Tejas Joshi <tejasjoshi9673@gmail.com> writes: >> >> It's just a different name, nothing more, nothing less. Because it is >> >> a different name it can not be accidentally generated from actual >> >> truncations. >> > >> > I have introduced float_narrow but I could not find appropriate places >> > to generate it for a call to fadd instead of generating a CALL. I >> > used GDB to set breakpoints which hit fold_rtx and cse_insn but I got >> > confused with the rtx codes and passes which generate respective RTL. >> > It should not be similar to FLOAT_TRUNCATE if we want to avoid >> > generating it for actual truncations? >> >> Please don't do it this way. The whole point of the work is that this >> is a single operation that cannot be modelled as a post-processing of >> a normal double addition result. It's a single operation at the source >> level, a single IFN, a single optab, and a single instruction. Splitting >> it apart into two operations for rtl only, and making it look in rtl terms >> like a post-processing of a normal addition result, seems like it's going >> to come back to bite us.
>> >> In lisp terms we're saying that the operand to the float_narrow is >> implicitly quoted: >> >> (float_narrow:m '(plus:n a b)) >> >> so that when float_narrow is evaluated, the argument is the unevaluated >> rtl expression "(plus a b)" rather than the evaluated result a + b. >> float_narrow then does its own evaluation of a and b and performs a >> fused addition and narrowing on the result. > > RTL isn't Lisp. Right. But it's heavily influenced by lisp, so I was using quoting to explain why I don't think the code is a good fit. > RTL doesn't have quotations. I'd like to keep it that way for rvalues :-) > RTL doesn't have *evaluation*. But we can (and do) evaluate some rtxes without target help. > RTL is just a data structure that describes your program instructions. > A large part of what means what is system-specific. Rounding of floating > point is not defined, for example. Some of the semantics are target-specific, sure, with some of the details controlled by hooks/macros and some left undefined. But that's true to a lesser extent of gimple too. > And yes, various parts of GCC can manipulate RTL, doing substitution and > algebraic simplification and whatnot. All within the rules of RTL. And > that means nothing ever can "pass" a float_narrow, because there are no > rules that allow it to. You mean create a new float_narrow out of thin air, with no justification? Sure, but I don't think that was ever the issue. Or do you mean that target-independent code couldn't just use GET_RTX_FORMAT to recurse on a float_narrow without first noting that it's a float_narrow (and thus special)? If so, then yeah, I agree that they wouldn't be allowed to do that, which is essentially why I think it's a bad idea. >> No other rtx rvalue works like this. > > A lot of unspecs are used like this, for example. Unspecs don't have a quoting effect though.
I agree it's common to match things like: (unspec:m [(plus:m ...)] UNSPEC_FOO) But that doesn't have any quoting effect on the plus. If the optimisers see: (unspec:m [(plus:m x y)] UNSPEC_FOO) and know what x and y are, they can certainly fold this to: (unspec:m [(const_int N)] UNSPEC_FOO) The result might not match an instruction, but it's still a valid rtx and a valid thing to try. A target would be in real trouble if it allowed both, but with different semantics even for N==x+y. (In contrast, having different semantics for N==x+y would be valid if there was a quoting effect.) Likewise if the optimisers see: (set (reg:m z) (plus:m x y)) ...(unspec:m [(plus:m x y)] UNSPEC_FOO)... they can create and try to match: ...(unspec:m [(reg:m z)] UNSPEC_FOO)... Again, it might not match an instruction, but it's still a valid rtx and a valid thing to try. In other words, everything going into recog has to be valid rtx. It just might not be a valid instruction. And the .md files can't make the target-independent code treat an operation as quoted. All they can do is refuse to match simplified forms. This is similar to things like (from mips.md): (define_insn_and_split "<su>mulsi3_highpart_internal" [(set (match_operand:SI 0 "register_operand" "=d") (truncate:SI (lshiftrt:DI (mult:DI (any_extend:DI (match_operand:SI 1 "register_operand" "d")) (any_extend:DI (match_operand:SI 2 "register_operand" "d"))) (const_int 32)))) (clobber (match_scratch:SI 3 "=l"))] IIRC, the port has no highpart operation other than multiplication.
But there's again no quoting effect on the operands to the mult, lshiftrt or truncate here, so if the optimisers knew that op2==2, they could transform: [(set op0 (truncate:SI (lshiftrt:DI (mult:DI (any_extend:DI op1) (any_extend:DI op2)) (const_int 32)))) (clobber (scratch:SI))] to: [(set op0 (truncate:SI (lshiftrt:DI (plus:DI (any_extend:DI op1) (any_extend:DI op1)) (const_int 32)))) (clobber (scratch:SI))] Again, the instruction won't match, but it's still a valid rtx and a valid transformation to try. Going back to the unspec example: if at some point we added a target hook for evaluating unspecs in the same way that we evaluate basic arithmetic (might be useful!), the handling of UNSPEC_FOO wouldn't be able to assert that the plus or whatever is there. At best it could punt evaluation when the plus isn't there, at the cost of losing potentially useful optimisation. (But to me, having to do that smacks of a badly-designed unspec. E.g. we use unspec wrappers around operations a lot in the SVE port, but it would still be possible to evaluate the unspec given fully-evaluated operands.) float_narrow is different in that the plus (or whatever operation it's quoting) has to be kept in-place rather than folded away, otherwise the rtx itself is malformed and could trigger an ICE, just like the zero_extend of a const_int that I mentioned. >> Using float_narrow would also be inconsistent with the way we handle >> saturating arithmetic. There we use US_PLUS and SS_PLUS rtx codes for >> unsigned and signed saturating plus respectively, rather than: >> >> (unsigned_sat '(plus a b)) >> (signed_sat '(plus a b)) >> >> Using dedicated codes might seem clunky. But it's simple, safe, and fits >> the existing model without special cases. :-) > > And you need many many more RTX codes, which you will not handle in > almost all places, because there are too many. > > > I agree this construct is not as nice as could be hoped for. 
I don't > agree that 60 new RTX codes is an acceptable solution (or that that will > ever really work out, even). 60 sounds a high number. :-) Do we really have that many rtx codes with a floating-point rounding effect? Whatever the number is, we'll still be listing them individually for built-in enumerations, internal_fn, and (I assume) optabs. But maybe after a certain point it does become too unwieldy for rtx codes. We have to keep it within 16 bits at least... > It would be nice if somehow we could make a variant of RTL codes, so that > we could have nice and simple code that applies to all variants of some > code. Not sure how that would work out. Maybe we don't have to do this > very generically, how often will we need this anyway? > > I have three examples so far: > 1) Saturating arithmetic; > 2) This float_narrow thing; > 3) Ordered compares, that is, fp compares that set an exception on NaNs. > > Something that works for all three would be nice! Yeah, agree that sounds good. Maybe we could bundle the code with some flags. Storage-wise, there should be room for that in the u2 field. But there might still be cases in which it's useful to view the code+flags as a combined supercode, e.g. for switch statements. Thanks, Richard
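A minimal sketch of the "code plus flags" idea follows, under the hypothetical assumption that the variant flags live next to a base code and that a combined supercode is formed for switch statements. None of these names exist in GCC, and the thread leaves the real layout (for example, use of the u2 field) open.

```c
#include <stdint.h>

/* Hypothetical base operations (not GCC's enum rtx_code). */
enum base_code { CODE_PLUS, CODE_MINUS, CODE_MULT, CODE_LT, NUM_BASE_CODES };

/* Hypothetical variant flags covering the three examples in the thread. */
enum variant_flag {
    VAR_SAT_UNSIGNED = 1u << 0,  /* saturating, unsigned (cf. US_PLUS) */
    VAR_SAT_SIGNED   = 1u << 1,  /* saturating, signed (cf. SS_PLUS) */
    VAR_NARROW       = 1u << 2,  /* fused with rounding to a narrower mode */
    VAR_SIGNALING    = 1u << 3   /* comparison that raises on quiet NaNs */
};

struct code_and_flags {
    uint16_t code;   /* enum base_code */
    uint16_t flags;  /* mask of enum variant_flag */
};

/* Combined "supercode": an integer constant expression usable in case labels. */
#define SUPERCODE(code, flags) (((uint32_t) (flags) << 16) | (uint32_t) (code))

uint32_t supercode(struct code_and_flags c)
{
    return SUPERCODE(c.code, c.flags);
}

/* Code that only cares about the base operation ignores the flags... */
int is_commutative(struct code_and_flags c)
{
    return c.code == CODE_PLUS || c.code == CODE_MULT;
}

/* ...while code that depends on exact semantics switches on the supercode
   and treats every variant it does not know about conservatively.  */
int plain_rules_apply(struct code_and_flags c)
{
    switch (supercode(c)) {
    case SUPERCODE(CODE_PLUS, 0):
    case SUPERCODE(CODE_MINUS, 0):
    case SUPERCODE(CODE_MULT, 0):
        return 1;
    default:
        return 0;
    }
}
```

The design choice illustrated here is that "base code" is the default view and the supercode is opt-in; the opposite default is equally possible, and which one to choose is the open question in the thread.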
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-20 12:11 UTC To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford On Tue, Aug 20, 2019 at 08:41:29AM +0100, Richard Sandiford wrote: > Tejas: given the controversy, I agree unspecs sound like a good approach > for now. We can always go back and add the rtx codes later once there's > agreement on what they should look like. Yup. > Segher Boessenkool <segher@kernel.crashing.org> writes: > > On Sat, Aug 17, 2019 at 09:21:00AM +0100, Richard Sandiford wrote: > >> In lisp terms we're saying that the operand to the float_narrow is > >> implicitly quoted: > >> > >> (float_narrow:m '(plus:n a b)) > >> > >> so that when float_narrow is evaluated, the argument is the unevaluated > >> rtl expression "(plus a b)" rather than the evaluated result a + b. > >> float_narrow then does its own evaluation of a and b and performs a > >> fused addition and narrowing on the result. > > > > RTL isn't Lisp. > > Right. But it's heavily influenced by lisp, so I was using quoting to > explain why I don't think the code is a good fit. > > > RTL doesn't have quotations. > > I'd like to keep it that way for rvalues :-) > > > RTL doesn't have *evaluation*. > > But we can (and do) evaluate some rtxes without target help. We do? Other than constant folding; there is nothing to evaluate in constants anyway. Or do you mean simplification? There are rules for what kind of transformations are allowed. Many unwritten of course :-/ > > RTL is just a data structure that describes your program instructions. > > A large part of what means what is system-specific. Rounding of floating > > point is not defined, for example.
> > Some of the semantics are target-specific, sure, with some of the details > controlled by hooks/macros and some left undefined. But that's true to a > lesser extent of gimple too. Yes, gimple and RTL are not very different, at the core of things. > > And yes, various parts of GCC can manipulate RTL, doing substitution and > > algebraic simplification and whatnot. All within the rules of RTL. And > > that means nothing ever can "pass" a float_narrow, because there are no > > rules that allow it to. > > You mean create a new float_narrow out of thin air, with no justification? > Sure, but I don't think that was ever the issue. No. I mean that if you have ... (float_narrow:M (x:N)) it will always stay in that form, with just x changed. Nothing can change the float_narrow. > Or do you mean that target-independent code couldn't just use GET_RTX_FORMAT > to recurse on a float_narrow without first noting that it's a float_narrow > (and thus special)? If so, then yeah, I agree that they wouldn't be > allowed to do that, which is essentially why I think it's a bad idea. No, they can do that just fine. > >> No other rtx rvalue works like this. > > > > A lot of unspecs are used like this, for example. > > Unspecs don't have a quoting effect though. I agree it's common to match > things like: > > (unspec:m [(plus:m ...)] UNSPEC_FOO) > > But that doesn't have any quoting effect on the plus. If the optimisers see: > > (unspec:m [(plus:m x y)] UNSPEC_FOO) > > and know what x and y are, they can certainly fold this to: > > (unspec:m [(const_int N)] UNSPEC_FOO) And the exact same is true for the proposed float_narrow! The compiler should not do this if FP_CONTRACT is off, which it has to be for fadd etc. to make sense at all, to not be optimised to a plain add. > This is similar to things like (from mips.md): > > (define_insn_and_split "<su>mulsi3_highpart_internal" Yeah, I did that for rs6000.
Lots and lots and lots of special cases :-P (RTL represents things differently for BE and LE, and there are the various sizes of operation, both with and without 64-bit insns). > [(set (match_operand:SI 0 "register_operand" "=d") > (truncate:SI > (lshiftrt:DI (this is optimised to a subreg, in many cases, for example). > Going back to the unspec example: if at some point we added a target > hook for evaluating unspecs in the same way that we evaluate basic > arithmetic (might be useful!), the handling of UNSPEC_FOO wouldn't be > able to assert that the plus or whatever is there. At best it could > punt evaluation when the plus isn't there, at the cost of losing > potentially useful optimisation. Yes. And as far as I can see float_narrow will still work. > float_narrow is different in that the plus (or whatever operation > it's quoting) has to be kept in-place rather than folded away, > otherwise the rtx itself is malformed and could trigger an ICE, > just like the zero_extend of a const_int that I mentioned. Yes, it will not pass recog. Structurally it is just hunky-dory though. > > And you need many many more RTX codes, which you will not handle in > > almost all places, because there are too many. > > > > > > I agree this construct is not as nice as could be hoped for. I don't > > agree that 60 new RTX codes is an acceptable solution (or that that will > > ever really work out, even). > > 60 sounds a high number. :-) Do we really have that many rtx codes with > a floating-point rounding effect? It was meant to sound high, heh. If things need a variant A, and also a variant B, then before you know it there is a variant A+B as well, and you have unbridled growth. plus minus neg mult div mod smin smax abs sqrt fma I think? And let's hope we never ever have to do saturating versions of FP :-) > Whatever the number is, we'll still be listing them individually for > built-in enumerations, internal_fn, and (I assume) optabs. 
But maybe > after a certain point it does become too unwieldy for rtx codes. > We have to keep it within 16 bits at least... My main concern is all the (simplification) code that parses RTL. All of that will have to handle all variant versions as well. > > It would be nice if somehow we could make a variant of RTL codes, so that > > we could have nice and simple code that applies to all variants of some > > code. Not sure how that would work out. Maybe we don't have to do this > > very generically, how often will we need this anyway? > > > > I have three examples so far: > > 1) Saturating arithmetic; > > 2) This float_narrow thing; > > 3) Ordered compares, that is, fp compares that set an exception on NaNs. > > > > Something that works for all three would be nice! > > Yeah, agree that sounds good. Maybe we could bundle the code with some > flags. Storage-wise, there should be room for that in the u2 field. > > But there might still be cases in which it's useful to view the code+flags > as a combined supercode, e.g. for switch statements. Yeah... Whether to make "code" or "code+flags" the more usual version is the biggest design question then. Oh, and what the rest of the interface to this looks like ;-) Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Richard Sandiford @ 2019-08-20 12:59 UTC To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph Segher Boessenkool <segher@kernel.crashing.org> writes: >> > And yes, various parts of GCC can manipulate RTL, doing substitution and >> > algebraic simplification and whatnot. All within the rules of RTL. And >> > that means nothing ever can "pass" a float_narrow, because there are no >> > rules that allow it to. >> >> You mean create a new float_narrow out of thin air, with no justification? >> Sure, but I don't think that was ever the issue. > > No. I mean that if you have > > ... (float_narrow:M (x:N)) > > it will always stay in that form, with just x changed. Nothing can > change the float_narrow. OK, I guessed wrong :-) But it was the change to x that IMO was the problem. I wasn't worried about code changing the float_narrow itself to random other stuff. >> [(set (match_operand:SI 0 "register_operand" "=d") >> (truncate:SI >> (lshiftrt:DI > > (this is optimised to a subreg, in many cases, for example). Right. MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION. >> float_narrow is different in that the plus (or whatever operation >> it's quoting) has to be kept in-place rather than folded away, >> otherwise the rtx itself is malformed and could trigger an ICE, >> just like the zero_extend of a const_int that I mentioned. > > Yes, it will not pass recog. Structurally it is just hunky-dory though. So maybe that's the main point of difference. We're introducing float_narrow to modify another rtx operation rather than to operate on an rtx value.
So to me it makes no sense to say that: (float_narrow:SF (const_double:DF X)) (float_narrow:SF (reg:DF X)) (float_narrow:SF (mem:DF X)) are well-formed rtxes and just happen not to match any instructions. Without an operation to modify they're meaningless on their own terms, regardless of what the target says about it. Just like: (unsigned_saturate:QI (reg:QI X)) would be meaningless if we modelled saturation this way. There's no way you can go from a normal unsaturated result to the equivalent saturated result without knowing which operation was performed, and on which operands. This isn't a choice for targets to make even in principle, just like it isn't for my favourite (zero_extend:m (const_int -1)) example. >> > And you need many many more RTX codes, which you will not handle in >> > almost all places, because there are too many. >> > >> > >> > I agree this construct is not as nice as could be hoped for. I don't >> > agree that 60 new RTX codes is an acceptable solution (or that that will >> > ever really work out, even). >> >> 60 sounds a high number. :-) Do we really have that many rtx codes with >> a floating-point rounding effect? > > It was meant to sound high, heh. If things need a variant A, and also a > variant B, then before you know it there is a variant A+B as well, and > you have unbridled growth. > > plus minus neg mult div mod smin smax abs sqrt fma I think? And let's > hope we never ever have to do saturating versions of FP :-) neg, abs, smin and smax shouldn't do rounding AFAIK. But yeah, the rest look plausible. That is only 7 though :-) Unless I counted wrong. Not that I'm saying I like adding codes for each one either. It just doesn't seem that bad (and definitely better than float_narrow IMO). >> Whatever the number is, we'll still be listing them individually for >> built-in enumerations, internal_fn, and (I assume) optabs. But maybe >> after a certain point it does become too unwieldly for rtx codes. 
>> We have to keep it within 16 bits at least... > > My main concern is all the (simplification) code that parses RTL. All of > that will have to handle all variant versions as well. True, but we'd have to err on the side of caution whatever happens. Not all existing PLUS simplifications necessarily apply as-is. Thanks, Richard >> > It would be nice if somehow we could make a variant of RTL codes, so that >> > we could have nice and simple code that applies to all variants of some >> > code. Not sure how that would work out. Maybe we don't have to do this >> > very generically, how often will we need this anyway? >> > >> > I have three examples so far: >> > 1) Saturating arithmetic; >> > 2) This float_narrow thing; >> > 3) Ordered compares, that is, fp compares that set an exception on NaNs. >> > >> > Something that works for all three would be nice! >> >> Yeah, agree that sounds good. Maybe we could bundle the code with some >> flags. Storage-wise, there should be room for that in the u2 field. >> >> But there might still be cases in which it's useful to view the code+flags >> as a combined supercode, e.g. for switch statements. > > Yeah... Whether to make "code" or "code+flags" the more usual version is > the biggest design question then. Oh, and what the rest of the interface > to this looks like ;-) > > > Segher
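The saturating-arithmetic analogy can be checked in plain C. This sketch (8-bit types chosen only for brevity; the function names are made up) shows why a saturated result cannot be recovered from the wrapped PLUS result alone: 200 + 100 and 44 + 0 wrap to the same 8-bit value, 44, yet their unsigned saturating sums differ (255 and 44). This is why US_PLUS and SS_PLUS are separate operations rather than post-processing of a PLUS.

```c
#include <stdint.h>

/* What a plain 8-bit PLUS computes: the sum modulo 256. */
uint8_t plus_u8(uint8_t a, uint8_t b)
{
    return (uint8_t) (a + b);
}

/* Unsigned saturating addition, the operation US_PLUS names. */
uint8_t us_plus_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned) a + b;   /* exact: at most 510 */
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Signed saturating addition, the operation SS_PLUS names. */
int8_t ss_plus_s8(int8_t a, int8_t b)
{
    int sum = a + b;                   /* exact in int */
    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t) sum;
}
```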
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-20 13:46 UTC To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > >> [(set (match_operand:SI 0 "register_operand" "=d") > >> (truncate:SI > >> (lshiftrt:DI > > > > (this is optimised to a subreg, in many cases, for example). > > Right. MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION. Trying 10 -> 18: 10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI) REG_DEAD r205:DI REG_DEAD r204:DI 18: $2:DI=r200:TI#0 REG_DEAD r200:TI Failed to match this instruction: (set (reg/i:DI 2 $2) (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204)) (zero_extend:TI (reg:DI 205))) 0)) I'm afraid not. This was mips64-linux-gcc -Wall -W -O2 -S mulh.c -mips64 -mabi=64 -fdump-rtl-combine-all on === typedef unsigned long S; typedef unsigned __int128 D; S mulh(S a, S b) { return (D)a*b >> (8*sizeof(S)); } === > >> float_narrow is different in that the plus (or whatever operation > >> it's quoting) has to be kept in-place rather than folded away, > >> otherwise the rtx itself is malformed and could trigger an ICE, > >> just like the zero_extend of a const_int that I mentioned. > > > > Yes, it will not pass recog. Structurally it is just hunky-dory though. > > So maybe that's the main point of difference. We're introducing > float_narrow to modify another rtx operation rather than to operate > on an rtx value. I wouldn't say it "operates" on anything. A float_narrow rtx means the thing inside it does single-rounding to SP float.
And it is just notation: RTL itself knows *nothing* about float rounding, and because of the way this is structured, nothing can change anything about the float_narrow. And yes, it is icky. But it is sound, as far as I can see. > >> Whatever the number is, we'll still be listing them individually for > >> built-in enumerations, internal_fn, and (I assume) optabs. But maybe > >> after a certain point it does become too unwieldy for rtx codes. > >> We have to keep it within 16 bits at least... > > > > My main concern is all the (simplification) code that parses RTL. All of > > that will have to handle all variant versions as well. > > True, but we'd have to err on the side of caution whatever happens. Yes. > Not all existing PLUS simplifications necessarily apply as-is. Yes. Everything will have to be checked. But not everything will have to be modified, if we pick the defaults carefully. I hope. :-) Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Richard Sandiford @ 2019-08-20 14:43 UTC To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph Segher Boessenkool <segher@kernel.crashing.org> writes: > On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote: >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> >> [(set (match_operand:SI 0 "register_operand" "=d") >> >> (truncate:SI >> >> (lshiftrt:DI >> > >> > (this is optimised to a subreg, in many cases, for example). >> >> Right. MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION. > > Trying 10 -> 18: > 10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI) > REG_DEAD r205:DI > REG_DEAD r204:DI > 18: $2:DI=r200:TI#0 > REG_DEAD r200:TI > Failed to match this instruction: > (set (reg/i:DI 2 $2) > (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204)) > (zero_extend:TI (reg:DI 205))) 0)) > > I'm afraid not. That's TI->DI though, whereas the pattern above is DI->SI. The modes matter :-) There'd also need to be a shift to match a highpart pattern. >> >> float_narrow is different in that the plus (or whatever operation >> >> it's quoting) has to be kept in-place rather than folded away, >> >> otherwise the rtx itself is malformed and could trigger an ICE, >> >> just like the zero_extend of a const_int that I mentioned. >> > >> > Yes, it will not pass recog. Structurally it is just hunky-dory though. >> >> So maybe that's the main point of difference. We're introducing >> float_narrow to modify another rtx operation rather than to operate >> on an rtx value. > > I wouldn't say it "operates" on anything. A float_narrow rtx means the > thing inside it does single-rounding to SP float.
And it is just > notation: RTL itself knows *nothing* about float rounding, and because > of the way this is structured, nothing can change anything about the > float_narrow. I wouldn't say it knows nothing about rounding. It doesn't know what the runtime rounding mode is, but that isn't the same thing. (Just like not knowing what (mem:SI (sp)) contains isn't the same thing as not knowing anything about stack memory.) Besides, how much depends on target-independent code not knowing what the rounding mode is? Do you think float_narrow would still make sense even if more information was available at compile time (e.g. if a plus could be annotated with a specific rounding mode)? Or is not knowing the rounding mode a fundamental part of float_narrow being OK for you? > And yes, it is icky. But it is sound, as far as I can see. I really disagree that it's sound, but no point me saying why again :-) (It could certainly be made to work with sufficient hacks of course, like pretty much anything could, but I don't think that's the same thing.) Thanks, Richard
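To make the "single rounding to SP float" semantics concrete, here is a minimal C sketch that is not from the thread. It computes a double + double sum rounded once to float, which is what the fadd built-in is specified to return, and contrasts it with the doubly rounded `(float)(x + y)`. The helper names `two_sum` and `fadd_sketch` are hypothetical, and the method (Knuth's TwoSum, then round-to-odd at double precision, then an ordinary conversion to float) is one known technique, not necessarily what glibc or GCC do. It assumes IEEE binary32/binary64, the default round-to-nearest mode, `FLT_EVAL_METHOD == 0` (for example x86-64 with SSE), no `-ffast-math`, and finite inputs whose sum does not overflow.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Knuth's TwoSum: returns s = fl(a + b) and stores the exact rounding
   error in *err, so that s + *err == a + b exactly (round-to-nearest,
   no overflow, no excess precision assumed). */
static double two_sum(double a, double b, double *err)
{
  double s = a + b;
  double bb = s - a;
  double aa = s - bb;
  *err = (a - aa) + (b - bb);
  return s;
}

/* Hypothetical helper: x + y rounded ONCE to float.  First round the
   exact sum to odd at double precision, then convert to float with
   round-to-nearest; double has at least 2 more significand bits than
   float, so the two steps together round the exact sum correctly. */
static float fadd_sketch(double x, double y)
{
  double err;
  double s = two_sum(x, y, &err);
  if (err != 0.0 && isfinite(s))
    {
      uint64_t bits;
      memcpy(&bits, &s, sizeof bits);
      if ((bits & 1) == 0)
        {
          /* s is inexact with an even last bit: step one ulp towards the
             exact sum to reach its odd neighbour.  The encoding is
             sign-magnitude, so +1 grows |s| and -1 shrinks it. */
          if ((s > 0) == (err > 0))
            bits += 1;
          else
            bits -= 1;
          memcpy(&s, &bits, sizeof s);
        }
    }
  return (float) s;
}
```

With `x = 1 + 0x1p-24` (exactly halfway between two adjacent floats) and `y = 0x1p-60`, the double sum rounds back to `x`, so `(float)(x + y)` sees an exact tie and rounds to even, giving `1.0f`. Rounding the exact sum once instead gives `1.0f + 0x1p-23f`.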
* Re: Expansion of narrowing math built-ins into power instructions From: Richard Sandiford @ 2019-08-20 15:12 UTC To: Segher Boessenkool; +Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph Richard Sandiford <richard.sandiford@arm.com> writes: >> And yes, it is icky. But it is sound, as far as I can see. > > I really disagree that it's sound, but no point me saying why again :-) > > (It could certainly be made to work with sufficient hacks of course, > like pretty much anything could, but I don't think that's the same thing.) For an example, we have: /* Maybe simplify x + 0 to x. The two expressions are equivalent when x is NaN, infinite, or finite and nonzero. They aren't when x is -0 and the rounding mode is not towards -infinity, since (-0) + 0 is then 0. */ if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode)) return op0; I think it's plausible that people will care about accurate rounding but not signed zeroes. In that mode we could have: (set (reg:DF r3) (plus:DF (reg:DF r1) (reg:DF r2))) (set (reg:DF r4) (const_double:DF 0.0)) (set (reg:SF r5) (float_narrow:SF (plus:DF (reg:DF r3) (reg:DF r4)))) Then combine through normal structural simplification could (with the rule above) fold all this down to: (set (reg:SF r5) (float_narrow:SF (plus:DF (reg:DF r1) (reg:DF r2)))) where the truncation is now fused with r1+r2 instead of r3+r4. We would have to add specific checks to avoid this happening; it wouldn't fall out naturally from a structural PoV. Thanks, Richard
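The simplify-rtx comment quoted above hinges on signed zeros. This small C check is only an illustration, not GCC code. `plus_zero_preserves_sign` is a hypothetical name, and the check assumes IEEE doubles, the default round-to-nearest mode, and no `-ffast-math` or `-fno-signed-zeros` (either flag lets the compiler fold `x + 0.0` to `x` itself). It shows that `x + 0.0` keeps the sign of every `x` except `-0.0`:

```c
#include <math.h>
#include <stdbool.h>

/* Returns true if computing x + 0.0 leaves the sign of x unchanged.
   Under round-to-nearest, (-0.0) + 0.0 == +0.0, so for x == -0.0 the
   sign flips.  That is why the x + 0 -> x fold is guarded by
   !HONOR_SIGNED_ZEROS. */
static bool plus_zero_preserves_sign(double x)
{
  double sum = x + 0.0;                /* the unfolded expression */
  return !signbit(sum) == !signbit(x); /* compare sign bits as booleans */
}
```

Under round-towards-negative-infinity the same sum would be -0.0, which is the exception the comment also mentions.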
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-20 19:42 UTC To: Tejas Joshi, gcc, Martin Jambor, hubicka, joseph, richard.sandiford On Tue, Aug 20, 2019 at 03:43:43PM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > On Tue, Aug 20, 2019 at 01:59:06PM +0100, Richard Sandiford wrote: > >> Segher Boessenkool <segher@kernel.crashing.org> writes: > >> >> [(set (match_operand:SI 0 "register_operand" "=d") > >> >> (truncate:SI > >> >> (lshiftrt:DI > >> > > >> > (this is optimised to a subreg, in many cases, for example). > >> > >> Right. MIPS avoids that one thanks to TARGET_TRULY_NOOP_TRUNCATION. > > > > Trying 10 -> 18: > > 10: r200:TI=zero_extend(r204:DI)*zero_extend(r205:DI) > > REG_DEAD r205:DI > > REG_DEAD r204:DI > > 18: $2:DI=r200:TI#0 > > REG_DEAD r200:TI > > Failed to match this instruction: > > (set (reg/i:DI 2 $2) > > (subreg:DI (mult:TI (zero_extend:TI (reg:DI 204)) > > (zero_extend:TI (reg:DI 205))) 0)) > > > > I'm afraid not. > > That's TI->DI though, whereas the pattern above is DI->SI. The modes > matter :-) There'd also need to be a shift to match a highpart pattern. It's the same for 32-bit: mips-linux-gcc -Wall -W -O2 -S mulh.c -mips32 -mabi=32 (I hope these options are reasonable? I don't know MIPS well at all). Trying 12 -> 20: 12: r200:DI=zero_extend(r204:SI)*zero_extend(r205:SI) REG_DEAD r205:SI REG_DEAD r204:SI 20: $2:SI=r200:DI#0 REG_DEAD r200:DI Failed to match this instruction: (set (reg/i:SI 2 $2) (subreg:SI (mult:DI (zero_extend:DI (reg:SI 204)) (zero_extend:DI (reg:SI 205))) 0)) The point is that this is the form that this insn is simplified to. If that form is not recognised by your backend, various optimisation opportunities are missed.
> I wouldn't say it knows nothing about rounding. It doesn't know > what the runtime rounding mode is, but that isn't the same thing. > (Just like not knowing what (mem:SI (sp)) contains isn't the same > thing as not knowing anything about stack memory.) Does it even know if the rounding mode is one of the IEEE FP rounding modes? Segher
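For reference, the 32-bit combine trace above corresponds to C along these lines. `mulh.c` itself is not shown in the thread, so this is an assumed reconstruction with hypothetical function names. It computes the high word of an unsigned 32x32->64 multiply, which RTL can express as a `truncate` of an `lshiftrt` by 32, and the low word for contrast. On a big-endian target such as mips-linux, byte offset 0 of a DImode value is its most significant word. That is presumably why the high-part form is simplified to `(subreg:SI (mult:DI ...) 0)`.

```c
#include <stdint.h>

/* Assumed stand-in for mulh.c: the high 32 bits of the exact 64-bit
   product, i.e. (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI a)
   (zero_extend:DI b)) (const_int 32))). */
uint32_t mulhu32(uint32_t a, uint32_t b)
{
  uint64_t product = (uint64_t) a * b; /* widening multiply, cannot overflow */
  return (uint32_t) (product >> 32);   /* keep the most significant word */
}

/* The low 32 bits, for contrast: same as an ordinary 32-bit multiply. */
uint32_t mullu32(uint32_t a, uint32_t b)
{
  return (uint32_t) ((uint64_t) a * b);
}
```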
* Re: Expansion of narrowing math built-ins into power instructions From: Tejas Joshi @ 2019-08-21 17:20 UTC To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph Hello. I have the following code which uses unspec but I am really missing something here. Does unspec not work encapsulating plus? Or I have some more places to make changes to? (define_insn "add_truncdfsf3" [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa") (unspec:SF [(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa") (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa"))] UNSPEC_ADD_TRUNCATE))] "TARGET_HARD_FLOAT" "@ fadds %0,%1,%2 xsaddsp %x0,%x1,%x2" [(set_attr "type" "fp")]) and an UNSPEC_ADD_TRUNCATE in unspec enum. Thanks, Tejas
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-21 18:28 UTC To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph Hi Tejas, On Wed, Aug 21, 2019 at 10:56:51PM +0530, Tejas Joshi wrote: > I have the following code which uses unspec but I am really missing > something here. Does unspec not work encapsulating plus? Or I have > some more places to make changes to? > > (define_insn "add_truncdfsf3" > [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa") > (unspec:SF > [(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa") > (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa"))] > UNSPEC_ADD_TRUNCATE))] > "TARGET_HARD_FLOAT" > "@ > fadds %0,%1,%2 > xsaddsp %x0,%x1,%x2" > [(set_attr "type" "fp")]) This does almost exactly the same as what the proposed float_narrow would do. Instead, write it as (define_insn "add_truncdfsf3" [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa") (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa") (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")] UNSPEC_ADD_TRUNCATE))] "TARGET_HARD_FLOAT" "@ fadds %0,%1,%2 xsaddsp %x0,%x1,%x2" [(set_attr "type" "fp") (set_attr "isa" "*,p8v")]) (note the "isa" attribute) to prevent any folding etc. from happening to it. > and an UNSPEC_ADD_TRUNCATE in unspec enum. UNSPEC_ADD_NARROWING? Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-21 19:17 UTC To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph On Wed, Aug 21, 2019 at 01:28:52PM -0500, Segher Boessenkool wrote: > (define_insn "add_truncdfsf3" > [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa") > (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa") > (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")] > UNSPEC_ADD_TRUNCATE))] > "TARGET_HARD_FLOAT" > "@ > fadds %0,%1,%2 > xsaddsp %x0,%x1,%x2" > [(set_attr "type" "fp") > (set_attr "isa" "*,p8v")]) And not <Ff>... f, d, d respectively (f for SF, d for DF). Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Tejas Joshi @ 2019-08-22 3:33 UTC To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph > This does almost exactly the same as what the proposed float_narrow > would do. Instead, write it as > > (define_insn "add_truncdfsf3" > [(set (match_operand:SF 0 "gpc_reg_operand" "=<Ff>,wa") > (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%<Ff>,wa") > (match_operand:DF 2 "gpc_reg_operand" "<Ff>,wa")] > UNSPEC_ADD_TRUNCATE))] > "TARGET_HARD_FLOAT" > "@ > fadds %0,%1,%2 > xsaddsp %x0,%x1,%x2" > [(set_attr "type" "fp") > (set_attr "isa" "*,p8v")]) Yes, I tried basically every combination I could think of, just not with the "isa attr". Now, I have the following code and it is still seems not to be working. Am I missing any options to pass? (define_insn "add_truncdfsf3" [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa") (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa") (match_operand:DF 2 "gpc_reg_operand" "d,wa")] UNSPEC_ADD_NARROWING))] "TARGET_HARD_FLOAT" "@ fadds %0,%1,%2 xsaddsp %x0,%x1,%x2" [(set_attr "type" "fp") (set_attr "isa" "*,p8v")]) with the code, I pass -O2 foo.c : float foo (double x, double y) { return __builtin_fadd (x, y); } Thanks, Tejas
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-22 6:25 UTC To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph Hi Tejas, [ Please do not top-post. ] On Thu, Aug 22, 2019 at 09:09:37AM +0530, Tejas Joshi wrote: > Yes, I tried basically every combination I could think of, just not > with the "isa attr". Now, I have the following code and it is still > seems not to be working. Am I missing any options to pass? > > (define_insn "add_truncdfsf3" > [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa") > (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa") > (match_operand:DF 2 "gpc_reg_operand" "d,wa")] > UNSPEC_ADD_NARROWING))] > "TARGET_HARD_FLOAT" > "@ > fadds %0,%1,%2 > xsaddsp %x0,%x1,%x2" > [(set_attr "type" "fp") > (set_attr "isa" "*,p8v")]) > > with the code, I pass -O2 foo.c : > float > foo (double x, double y) > { > return __builtin_fadd (x, y); > } What happens then? "It does not work" is very very vague. At least it seems the compiler does build now? Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Tejas Joshi @ 2019-08-22 7:57 UTC To: gcc; +Cc: Martin Jambor, hubicka, joseph, segher > What happens then? "It does not work" is very very vague. At least it > seems the compiler does build now? Oh, compiler builds but instruction is still "bl fadd". It should be "fadds" right?
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-22 9:56 UTC To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph > > Hi Tejas, > > > > [ Please do not top-post. ] On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote: > > What happens then? "It does not work" is very very vague. At least it > > seems the compiler does build now? > > Oh, compiler builds but instruction is still "bl fadd". It should be > "fadds" right? Yes, but that means the problem is earlier, before it hits RTL perhaps. Compile with -dap, look at the expand dump (the lowest numbered one, 234 or so), and see what it looked like in the final Gimple, and then in the RTL generated from that. And then drill down. Maybe you don't get what is needed at Gimple level already. Maybe it is something simple like a typo in the RTL pattern name. You'll find out. Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Martin Jambor @ 2019-08-23 17:17 UTC To: Segher Boessenkool, Tejas Joshi; +Cc: gcc, hubicka, joseph, Richard Sandiford Hello, On Thu, Aug 22 2019, Segher Boessenkool wrote: >> > Hi Tejas, >> > >> > [ Please do not top-post. ] > > On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote: >> > What happens then? "It does not work" is very very vague. At least it >> > seems the compiler does build now? >> >> Oh, compiler builds but instruction is still "bl fadd". It should be >> "fadds" right? > > Yes, but that means the problem is earlier, before it hits RTL perhaps. > > Compile with -dap, look at the expand dump (the lowest numbered one, 234 > or so), and see what it looked like in the final Gimple, and then in the > RTL generated from that. And then drill down. > Tejas sent me his patch and I looked at why it did not work. I found two reasons: 1. associated_internal_fn (in builtins.c) does not handle DEF_INTERNAL_OPTAB_FN kind of internal functions, and Tejas (sensibly, I'd say) used that macro to define the internal function. But when I worked around that by manually adding a case for it in the switch statement, I ran into an assert because... 2. direct_internal_fn_supported_p on which replacement_internal_fn depends to expand built-ins as internal functions cannot handle conversion optabs... and narrowing is a kind of conversion and the optab is added as such with OPTAB_CD.
Actually, the second statement is not entirely true because somehow it can handle optab while_ult which is a conversion optab but a) the way it is handled, if I can understand it at all, seems to be a big hack and would be even worse if we decided to copy that for all narrowing math functions and b) it gets both modes from argument types whereas we need one from the result type and so we would have to rewrite replacement_internal_fn anyway. Therefore, at least for now (GSoC deadline is kind of looming), I decided that the best way forward would be to not rely on internal functions but plug into expand_builtin() and I wrote the following, lightly tested patch - which of course misses testcases and stuff - but I'd be curious about any feedback now anyway. When I proposed a very similar approach for the roundeven x86_64 expansion, Uros actually then opted for a solution based on internal functions, so I am curious whether there are simple alternatives I do not see. Tejas, of course cases for other fadd variants should at least be added to expand_builtin. Thanks, Martin 2019-08-23 Tejas Joshi <tejasjoshi9673@gmail.com> Martin Jambor <mjambor@suse.cz> * builtins.c (expand_builtin_binary_conversion): New function. (expand_builtin): Call it. * config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING. (add_truncdfsf3): New define_insn. * optabs.def (fadd_optab): New. --- gcc/builtins.c | 55 +++++++++++++++++++++++++++++++++++++ gcc/config/rs6000/rs6000.md | 13 +++++++++ gcc/internal-fn.def | 2 ++ gcc/optabs.def | 1 + 4 files changed, 71 insertions(+) diff --git a/gcc/builtins.c b/gcc/builtins.c index 9a766e4ad63..a9bf5710834 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -2935,6 +2935,54 @@ expand_builtin_powi (tree exp, rtx target) return target; } +/* Attempt to expand a builtin function call EXP which performs a binary + operation on its floating point arguments and then converts the result into + a different floating point format. 
The operation in question is specified + in OP_OPTAB. Return NULL if the attempt failed. SUBTARGET may be used as + the target for computing the first operand of EXP. */ + +static rtx +expand_builtin_binary_conversion (tree exp, rtx target, rtx subtarget, + optab op_optab) +{ + if (TREE_CODE (TREE_TYPE (exp)) != REAL_TYPE + || !validate_arglist (exp, REAL_TYPE, REAL_TYPE, VOID_TYPE)) + return NULL_RTX; + + tree arg0 = CALL_EXPR_ARG (exp, 0); + tree arg1 = CALL_EXPR_ARG (exp, 1); + gcc_assert (TYPE_MAIN_VARIANT (TREE_TYPE (arg0)) + == TYPE_MAIN_VARIANT (TREE_TYPE (arg1))); + machine_mode arg_mode = TYPE_MODE (TREE_TYPE (arg1)); + machine_mode res_mode = TYPE_MODE (TREE_TYPE (exp)); + + insn_code icode = convert_optab_handler (op_optab, res_mode, arg_mode); + if (icode == CODE_FOR_nothing) + return NULL_RTX; + + /* Wrap the computation of the arguments in a SAVE_EXPR, as we may + need to expand the argument again. This way, we will not perform + side-effects more than once. */ + CALL_EXPR_ARG (exp, 0) = arg0 = builtin_save_expr (arg0); + CALL_EXPR_ARG (exp, 1) = arg1 = builtin_save_expr (arg1); + + /* Only the first operand may use SUBTARGET; expanding the second one + into the same register could clobber the first. */ + rtx op0 = expand_expr (arg0, subtarget, VOIDmode, EXPAND_NORMAL); + rtx op1 = expand_normal (arg1); + + struct expand_operand ops[3]; + create_output_operand (&ops[0], target, res_mode); + create_input_operand (&ops[1], op0, arg_mode); + create_input_operand (&ops[2], op1, arg_mode); + rtx_insn *pat = maybe_gen_insn (icode, 3, ops); + if (pat) + { + emit_insn (pat); + return ops[0].value; + } + + return NULL_RTX; +} + /* Expand expression EXP which is a call to the strlen builtin. Return NULL_RTX if we failed and the caller should emit a normal call, otherwise try to get the result in TARGET, if convenient.
*/ @@ -7392,6 +7440,13 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, return target; break; + case BUILT_IN_FADD: + target = expand_builtin_binary_conversion (exp, target, subtarget, + fadd_optab); + if (target) + return target; + break; + case BUILT_IN_APPLY_ARGS: return expand_builtin_apply_args (); diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 9a7a1da987f..b44783a5028 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -89,6 +89,7 @@ UNSPEC_TLSGOTTPREL UNSPEC_TLSTLS UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero + UNSPEC_ADD_NARROWING ; fadd, narrow down to return type UNSPEC_STFIWX UNSPEC_POPCNTB UNSPEC_FRES @@ -4653,6 +4654,18 @@ [(set_attr "type" "fp") (set_attr "isa" "*,<Fisa>")]) +(define_insn "add_truncdfsf3" + [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa") + (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa") + (match_operand:DF 2 "gpc_reg_operand" "d,wa")] + UNSPEC_ADD_NARROWING))] + "TARGET_HARD_FLOAT" + "@ + fadds %0,%1,%2 + xsaddsp %x0,%x1,%x2" + [(set_attr "type" "fp") + (set_attr "isa" "*,p8v")]) + (define_expand "sub<mode>3" [(set (match_operand:SFDF 0 "gpc_reg_operand") (minus:SFDF (match_operand:SFDF 1 "gpc_reg_operand") diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 9461693bcd1..3f56880c23f 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -140,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, vec_shl_insert, binary) +DEF_INTERNAL_OPTAB_FN (FADD, ECF_CONST, fadd, binary) + DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 5283e6753f2..209369e9da1 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -67,6 +67,7 @@ OPTAB_CD(sfixtrunc_optab, "fix_trunc$F$b$I$a2") 
OPTAB_CD(ufixtrunc_optab, "fixuns_trunc$F$b$I$a2") /* Misc optabs that use two modes; model them as "conversions". */ +OPTAB_CD(fadd_optab, "add_trunc$b$a3") OPTAB_CD(smul_widen_optab, "mul$b$a3") OPTAB_CD(umul_widen_optab, "umul$b$a3") OPTAB_CD(usmul_widen_optab, "usmul$b$a3") -- 2.22.0
* Re: Expansion of narrowing math built-ins into power instructions From: Segher Boessenkool @ 2019-08-23 19:13 UTC To: Martin Jambor; +Cc: Tejas Joshi, gcc, hubicka, joseph, Richard Sandiford Hi! On Fri, Aug 23, 2019 at 07:16:59PM +0200, Martin Jambor wrote: > Therefore, at least for now (GSoC deadline is kind of looming), I > decided that the best way forward would be to not rely on internal > functions but plug into expand_builtin() and I wrote the following, > lightly tested patch - which of course misses testcases and stuff - but > I'd be curious about any feedback now anyway. When I proposed a very > similar approach for the roundeven x86_64 expansion, Uros actually then > opted for a solution based on internal functions, so I am curious > whether there are simple alternatives I do not see. > > Tejas, of course cases for other fadd variants should at least be added > to expand_builtin. Looks good for the rs6000 part, thanks! Some trivialities: > * builtins.c (expand_builtin_binary_conversion): New function. > (expand_builtin): Call it. (Wrong indentation, should be just one tab, no extra spaces). > +(define_insn "add_truncdfsf3" > + [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa") > + (unspec:SF [(match_operand:DF 1 "gpc_reg_operand" "%d,wa") > + (match_operand:DF 2 "gpc_reg_operand" "d,wa")] > + UNSPEC_ADD_NARROWING))] Please align the U with the preceding [. Segher
* Re: Expansion of narrowing math built-ins into power instructions From: Richard Sandiford @ 2019-08-24 9:53 UTC To: Martin Jambor; +Cc: Segher Boessenkool, Tejas Joshi, gcc, hubicka, joseph Martin Jambor <mjambor@suse.cz> writes: > Hello, > > On Thu, Aug 22 2019, Segher Boessenkool wrote: >>> > Hi Tejas, >>> > >>> > [ Please do not top-post. ] >> >> On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote: >>> > What happens then? "It does not work" is very very vague. At least it >>> > seems the compiler does build now? >>> >>> Oh, compiler builds but instruction is still "bl fadd". It should be >>> "fadds" right? >> >> Yes, but that means the problem is earlier, before it hits RTL perhaps. >> >> Compile with -dap, look at the expand dump (the lowest numbered one, 234 >> or so), and see what it looked like in the final Gimple, and then in the >> RTL generated from that. And then drill down. >> > > Tejas sent me his patch and I looked at why it did not work. I found > two reasons: > > 1. associated_internal_fn (in builtins.c) does not handle > DEF_INTERNAL_OPTAB_FN kind of internal functions, and Tejas > (sensibly, I'd say) used that macro to define the internal function. > But when I worked around that by manually adding a case for it in the > switch statement, I ran into an assert because... > > 2. direct_internal_fn_supported_p on which replacement_internal_fn > depends to expand built-ins as internal functions cannot handle > conversion optabs... and narrowing is a kind of conversion and the > optab is added as such with OPTAB_CD.
> > Actually, the second statement is not entirely true because somehow it > can handle optab while_ult which is a conversion optab but a) the way it > is handled, if I can understand it at all, seems to be a big hack and > would be even worse if we decided to copy that for all narrowing math > functions Think "big hack" is a bit unfair. The way that the internal function maps argument types to the optab modes, and the way it expands calls into rtl, depends on the "optab type" argument (the final argument to DEF_INTERNAL_OPTAB_FN). This is relatively flexible in that it can use a single-mode "direct" optab or a dual-mode "conversion" optab, with the modes coming from whichever arguments are appropriate. New optab types can be added as needed. FWIW, several other DEF_INTERNAL_OPTAB_FNs are conversion optabs too (e.g. IFN_LOAD_LANES, IFN_STORE_LANES, IFN_MASK_LOAD, etc.). But... > and b) it gets both modes from argument types whereas we need one from > the result type and so we would have to rewrite > replacement_internal_fn anyway. ...yeah, I agree this breaks the current model. The reason IFN_WHILE_ULT doesn't rely on the return type is that if you have: _2 = .WHILE_ULT (_0, _1) // returning a vector of 4 booleans _3 = .WHILE_ULT (_0, _1) // returning a vector of 8 booleans then the calls look equivalent. So instead we pass an extra argument indicating the required boolean vector "shape". The same "problem" could in principle apply to FADD if we ever needed to support double+double->_Float16 for example. > Therefore, at least for now (GSoC deadline is kind of looming), I > decided that the best way forward would be to not rely on internal > functions but plug into expand_builtin() and I wrote the following, > lightly tested patch - which of course misses testcases and stuff - but > I'd be curious about any feedback now anyway. 
When I proposed a very > similar approach for the roundeven x86_64 expansion, Uros actually then > opted for a solution based on internal functions, so I am curious > whether there are simple alternatives I do not see. > > Tejas, of course cases for other fadd variants should at least be added > to expand_builtin. > > Thanks, > > Martin > > > 2019-08-23 Tejas Joshi <tejasjoshi9673@gmail.com> > Martin Jambor <mjambor@suse.cz> > > * builtins.c (expand_builtin_binary_conversion): New function. > (expand_builtin): Call it. > * config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING. > (add_truncdfsf3): New define_insn. > * optabs.def (fadd_optab): New. > > [...] > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index 9461693bcd1..3f56880c23f 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -140,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while) > DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, > vec_shl_insert, binary) > > +DEF_INTERNAL_OPTAB_FN (FADD, ECF_CONST, fadd, binary) > + > DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) > DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) > DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) Should be dropped now. OK with that change and the ones Segher asked for. Thanks, Richard
* Re: Expansion of narrowing math built-ins into power instructions From: Tejas Joshi @ 2019-08-25 13:55 UTC To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph Hello. > > Similarly addtfsf3 that multiplies TFmode and produces an SFmode result, and so on. I want to extend this patch for FADDL and DADDL. What operand constraints should I use for TFmode alongside "f"? > In cases where long double and double have the same mode, > the daddl function should use the existing adddf3 pattern. So, should I use adddf3 for DADDL directly? How would I map the add<mode>3 optab with DADDL? Thanks, Tejas
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-25 16:47 UTC
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

[ Please don't top-post ]

On Sun, Aug 25, 2019 at 07:32:01PM +0530, Tejas Joshi wrote:
> I want to extend this patch for FADDL and DADDL. What operand
> constraints should I use for TFmode alongside "f"?

It depends on the instruction you use, and what registers that then
works on.  GPRs get "r", FPRs get "f" for SFmode but "d" otherwise, the
VRs get "v", if all VSRs are allowed you get "wa".  And there are some
mode attributes to go with mode iterators for when you handle multiple
modes (which you always do, you need to handle KF as well).

What machine insns do you want to generate?  There most likely is
something a lot like it already, so take that as example?

> > In cases where long double and double have the same mode,
> >the daddl function should use the existing adddf3 pattern.

Sure, that probably should be handled in generic code (not rs6000).
Where it would generate an adddfdf2 it should just do an adddf3.

> So, should I use adddf3 for DADDL directly? How would I map the
> add<mode>3 optab with DADDL?

Simply check if source and target mode are the same?


Segher
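Segher's rule of thumb for choosing a register constraint reads as a small decision table. The C sketch below encodes only the cases named in his message; the enum and the function are hypothetical illustrations, not GCC's rs6000 code, and real patterns usually select constraints through mode attributes rather than a helper like this.

```c
#include <stddef.h>

/* Hypothetical register kinds, named after the ones in the message.  */
enum reg_kind { KIND_GPR, KIND_FPR, KIND_VR, KIND_ANY_VSR };

/* Constraint string for an operand living in a register of KIND;
   IS_SFMODE says whether the operand's mode is SFmode.  */
static const char *
constraint_for (enum reg_kind kind, int is_sfmode)
{
  switch (kind)
    {
    case KIND_GPR:
      return "r";                    /* general purpose registers */
    case KIND_FPR:
      return is_sfmode ? "f" : "d";  /* FPRs: "f" for SF, "d" otherwise */
    case KIND_VR:
      return "v";                    /* Altivec/VMX vector registers */
    case KIND_ANY_VSR:
      return "wa";                   /* any of the 64 VSX registers */
    }
  return NULL;
}
```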
* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-08-26 7:07 UTC
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

Hello.
Sorry for not being clear. I am confused about some modes here. I
meant, just as we expanded fadd (which narrows down from double to
float) with add_truncdfsf3, how can I expand faddl (which narrows down
long double to float). Wouldn't I require TFmode -> SFmode as
add_trunctfsf3 just as Joseph had previously mentioned? And if yes,
the operand constraints would still be f,d and d for TF->SF or what?
Also, just as we generated fadds/xsaddsp instructions for fadd, would
I be generating the same ones for faddl and fadd/xsadddp for daddl
(long double to double) or something different? all for ISA 2.07. (for
ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like
add<mode>_odd followed by conversion to narrower?)

Thanks,
Tejas

On Sun, 25 Aug 2019 at 22:17, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> [ Please don't top-post ]
>
> On Sun, Aug 25, 2019 at 07:32:01PM +0530, Tejas Joshi wrote:
> > I want to extend this patch for FADDL and DADDL. What operand
> > constraints should I use for TFmode alongside "f"?
>
> It depends on the instruction you use, and what registers that then
> works on.  GPRs get "r", FPRs get "f" for SFmode but "d" otherwise, the
> VRs get "v", if all VSRs are allowed you get "wa".  And there are some
> mode attributes to go with mode iterators for when you handle multiple
> modes (which you always do, you need to handle KF as well).
>
> What machine insns do you want to generate?  There most likely is
> something a lot like it already, so take that as example?
>
> > > In cases where long double and double have the same mode,
> > >the daddl function should use the existing adddf3 pattern.
>
> Sure, that probably should be handled in generic code (not rs6000).
> Where it would generate an adddfdf2 it should just do an adddf3.
>
> > So, should I use adddf3 for DADDL directly? How would I map the
> > add<mode>3 optab with DADDL?
>
> Simply check if source and target mode are the same?
>
>
> Segher
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-26 7:42 UTC
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

> > [ Please don't top-post ]

On Mon, Aug 26, 2019 at 12:43:44PM +0530, Tejas Joshi wrote:
> Sorry for not being clear. I am confused about some modes here. I
> meant, just as we expanded fadd (which narrows down from double to
> float) with add_truncdfsf3, how can I expand faddl (which narrows down
> long double to float). Wouldn't I require TFmode -> SFmode as
> add_trunctfsf3 just as Joseph had previously mentioned?

Yes, you need an addsfkf2 as well as adddfkf2 (and tf variants of those,
there are iterators for that).

KF is IEEE QP float.  TF is whatever long double maps to, IEEE QP or
double-double.

> And if yes,
> the operand constraints would still be f,d and d for TF->SF or what?

SF is "f".  KF does not fit in "d".

You won't need constraints anyway.  There already is add<mode>3_odd and
you can just use that, in a new define_expand you make.  For example,
for DP you need two insns: xsaddqpo followed by xscvqpdp.  The second
of those is the existing insn pattern trunc<mode>df2_hw, so you just get
something like

(define_expand "adddfkf2"
  [(set (match_operand:DF 0 "gpc_reg_operand")
        (unspec:DF [(match_operand:IEEE128 1 "gpc_reg_operand")
                    (match_operand:IEEE128 2 "gpc_reg_operand")]
                   UNSPEC_DUNNO_MENTION_DF_SOMEHOW))]
  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
{
  /* QP add with round to odd (xsaddqpo) into a temporary...  */
  rtx tmp = gen_reg_rtx (<MODE>mode);
  emit_insn (gen_add<mode>3_odd (tmp, operands[1], operands[2]));
  /* ...then narrow QP to DP (xscvqpdp).  */
  emit_insn (gen_trunc<mode>df2_hw (operands[0], tmp));
  DONE;
})

(not tested at all, be careful :-) )

> Also, just as we generated fadds/xsaddsp instructions for fadd, would
> I be generating the same ones for faddl and fadd/xsadddp for daddl
> (long double to double) or something different? all for ISA 2.07. (for
> ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like
> add<mode>_odd followed by conversion to narrower?)

For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
that is.  I don't know if we'll want fadd support in the emulation
libraries ever; don't worry about it for now, anyway.

"long double is double" you should probably handle in generic code.
"long double is double-double", well, fadd cannot really be done better
than an add followed by a conversion in that case?  Which boils down
to truncating the inputs to double, and then doing whatever you would
do for IEEE DP float.


Segher
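The define_expand sketch rounds to odd in the wide format first and narrows second because rounding the exact sum to nearest twice can land exactly on a halfway point of the narrow format and then round the wrong way. The C sketch below reproduces the effect one size down (double intermediate, float result) and emulates round-to-odd in software with the TwoSum error-free transformation. It is only an illustration: the function names are made up, it is not how the rs6000 patterns work internally, and it assumes finite inputs, IEEE binary32/binary64 arithmetic evaluated without excess precision (FLT_EVAL_METHOD == 0, e.g. SSE on x86-64), and no -ffast-math.

```c
#include <stdint.h>
#include <string.h>

/* Naive narrowing add: round the exact sum to double, then round that
   double to float.  The two roundings can disagree with rounding the
   exact sum to float once.  */
static float
fadd_double_rounding (double x, double y)
{
  double sum = x + y;
  return (float) sum;
}

/* Add two doubles with round-to-odd: if the exact sum is representable,
   return it; otherwise return whichever neighbouring double has an odd
   last significand bit.  Assumes finite inputs and no overflow.  */
static double
add_round_to_odd (double x, double y)
{
  double s = x + y;                         /* rounded to nearest */
  double yy = s - x;
  double err = (x - (s - yy)) + (y - yy);   /* TwoSum: x + y == s + err */

  uint64_t bits;
  memcpy (&bits, &s, sizeof bits);
  if (err == 0.0 || (bits & 1))
    return s;                               /* exact, or s already odd */

  /* s is even and the exact sum lies strictly between s and its
     neighbour in the direction of err; that neighbour is odd.  */
  if ((err > 0) == (s > 0))
    bits += 1;                              /* one ulp away from zero */
  else
    bits -= 1;                              /* one ulp toward zero */
  memcpy (&s, &bits, sizeof bits);
  return s;
}

/* Narrowing add in the style of "add with round to odd, then convert":
   round to odd in the wide format, then to nearest in the narrow one.  */
static float
fadd_via_round_to_odd (double x, double y)
{
  return (float) add_round_to_odd (x, y);
}
```

With x = 1 + 2^-24 and y = 2^-60 the exact sum lies just above the midpoint between 1.0f and 1 + 2^-23, so the correctly rounded float is 1 + 2^-23. fadd_double_rounding returns 1.0f, because the intermediate double rounds to the midpoint itself and the tie then goes to even; fadd_via_round_to_odd returns 1 + 2^-23. This works because a double carries at least two more significand bits than a float (53 versus 24).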
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-26 7:42 ` Segher Boessenkool @ 2019-08-30 19:12 ` Tejas Joshi 2019-08-30 20:35 ` Segher Boessenkool 0 siblings, 1 reply; 63+ messages in thread From: Tejas Joshi @ 2019-08-30 19:12 UTC (permalink / raw) To: gcc; +Cc: segher, joseph, Martin Jambor Hello. > For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware > that is. I don't know if we'll want fadd support in the emulation > libraries ever; don't worry about it for now, anyway. What instructions would need to be expanded for FADDL (long double to float) and DADDL (long double to double) on power8 (ISA 2.07) and power9 (ISA 3.0) respectively, along with VSX? (Just as we expanded FADD to fadds and xsaddsp for vsx). Thanks, Tejas On Mon, 26 Aug 2019 at 13:12, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > > > [ Please don't top-post ] > > On Mon, Aug 26, 2019 at 12:43:44PM +0530, Tejas Joshi wrote: > > Sorry for not being clear. I am confused about some modes here. I > > meant, just as we expanded fadd (which narrows down from double to > > float) with add_truncdfsf3, how can I expand faddl (which narrows down > > long double to float). Wouldn't I require TFmode -> SFmode as > > add_trunctfsf3 just as Joseph had previously mentioned? > > Yes, you need an addsfkf2 as well as adddfkf2 (and tf variants of those, > there are iterators for that). > > KF is IEEE QP float. TF is whatever long double maps to, IEEE QP or > double-double. > > > And if yes, > > the operand constraints would still be f,d and d for TF->SF or what? > > SF is "f". KF does not fit in "d". > > You won't need constraints anyway. There already is add<mode>3_odd and > you can just use that, in a new defione_expand you make. For example, > for DP you need two insns: xsaddqpo followed by xscvqpdp. 
The second > of those is the existing insn pattern trunc<mode>df2_hw, so you just get > something like > > (define_expand "adddfkf2" > [(set (match_operand:DF 0 "gpc_reg_operand") > (unspec:DF [(match_operand:IEEE128 1 "gpc_reg_operand") > (match_operand:IEEE128 2 "gpc_reg_operand")] > UNSPEC_DUNNO_MENTION_DF_SOMEHOW))] > "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)" > { > rtx tmp = gen_reg_rtx (<MODE>mode); > emit_insn (gen_add<mode>3_odd (tmp, operands[1])))), operands[2]); > emit_insn (trunc<mode>df2_hw (operands[0], tmp)); > DONE; > }) > > (not tested at all, be careful :-) ) > > > Also, just as we generated fadds/xsaddsp instructions for fadd, would > > I be generating the same ones for faddl and fadd/xsadddp for daddl > > (long double to double) or something different? all for ISA 2.07. (for > > ISA 3.0, I might use IEEE128/FLOAT128 round-to-odd instructions like > > add<mode>_odd followed by conversion to narrower?) > > For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware > that is. I don't know if we'll want fadd support in the emulation > libraries ever; don't worry about it for now, anyway. > > "long double is double" you should probably handle in generic code. > "long double is double-double", well, fadd cannot really be done better > than an add followed by a conversion in that case? Which boils down > to truncating the inputs to double, and then doing whatever you would > do for IEEE DP float. > > > Segher ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-30 20:35 UTC
  To: Tejas Joshi; +Cc: gcc, joseph, Martin Jambor

> > > > [ Please don't top-post ]

(I delete everything under your signature, without looking, assuming you
just forgot to).

On Sat, Aug 31, 2019 at 12:48:42AM +0530, Tejas Joshi wrote:
> > For ISA 2.07 (Power 8) you don't have IEEE128 at all, not in hardware
> > that is.  I don't know if we'll want fadd support in the emulation
> > libraries ever; don't worry about it for now, anyway.
>
> What instructions would need to be expanded for FADDL (long double to
> float) and DADDL (long double to double) on power8 (ISA 2.07) and
> power9 (ISA 3.0) respectively, along with VSX? (Just as we expanded
> FADD to fadds and xsaddsp for vsx).

If long double is double, faddl is the same as fadd, and daddl is just
normal addition.

If long double is double-double, faddl can be done as fadd on the first
double precision component of both args, and daddl is just normal addition
of those.

If long double is IEEE QP, then it is more interesting :-)

daddl is
        xsaddqpo        # add qp, with round to odd
        xscvqpdp        # convert qp to dp

faddl is
        xsaddqpo        # add qp, with round to odd
        xscvqpdpo       # convert qp to dp, with round to odd
        xsrsp           # convert dp to sp
(single precision numbers are stored in double precision format, but this
is rounded as single precision)

fadd is
        fadds           ; :-)
or
        xsaddsp

Both faddl and daddl are the sequences for Power9.  There are no
instructions for QP format on Power8; see libgcc/config/rs6000/t-float128
for how support for the emulation QP math is built, if you are interested.


Segher
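The QP sequences above round to odd at every step except the last. The property that makes this safe can be stated as follows, writing RN_p for round-to-nearest-even and RO_p for round-to-odd at p significand bits (binary128: p = 113, binary64: p = 53, binary32: p = 24), for results in the normal range and assuming the final rounding mode is round to nearest:

```latex
\begin{align*}
  \mathrm{RO}_p\bigl(\mathrm{RO}_{p'}(x)\bigr) &= \mathrm{RO}_p(x) && \text{if } p' > p,\\
  \mathrm{RN}_p\bigl(\mathrm{RO}_{p'}(x)\bigr) &= \mathrm{RN}_p(x) && \text{if } p' \ge p + 2,\\[4pt]
  \texttt{daddl}:\quad \mathrm{RN}_{53}\bigl(\mathrm{RO}_{113}(a+b)\bigr) &= \mathrm{RN}_{53}(a+b),\\
  \texttt{faddl}:\quad \mathrm{RN}_{24}\bigl(\mathrm{RO}_{53}(\mathrm{RO}_{113}(a+b))\bigr)
    &= \mathrm{RN}_{24}\bigl(\mathrm{RO}_{53}(a+b)\bigr) = \mathrm{RN}_{24}(a+b).
\end{align*}
```

So both sequences return the sum rounded once, directly to the target precision, even though the hardware performs two or three separate roundings.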
* Re: Expansion of narrowing math built-ins into power instructions
From: Tejas Joshi @ 2019-09-02 3:19 UTC
  To: gcc; +Cc: Martin Jambor, hubicka, segher, joseph

On Sat, 31 Aug 2019 at 02:05, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> > > > > [ Please don't top-post ]
>
> (I delete everything under your signature, without looking, assuming you
> just forgot to).

Oh sorry, I didn't know the reply button does evil things. :-)

> If long double is double, faddl is the same as fadd, and daddl is just
> normal addition.
>
> If long double is double-double, faddl can be done as fadd on the first
> double precision component of both args, and daddl is just normal addition
> of those.
>
> If long double is IEEE QP, then it is more interesting :-)

On what conditions does the mapping of long double to double/
double-double or IEEE QP changes or depends, so that I can test it.

Thanks,
Tejas
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-09-02 11:30 UTC
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

Hi Tejas,

On Mon, Sep 02, 2019 at 08:55:28AM +0530, Tejas Joshi wrote:
> On Sat, 31 Aug 2019 at 02:05, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > > > > > [ Please don't top-post ]
> >
> > (I delete everything under your signature, without looking, assuming you
> > just forgot to).
>
> Oh sorry, I didn't know the reply button does evil things. :-)

You're supposed to write your email as if the time reading it is more
valuable than the time writing it.  You can afford to spend a few seconds
deleting some stuff, or checking that what you wrote is good, etc.  There
is only one you, and there are many people reading this.

> > If long double is double, faddl is the same as fadd, and daddl is just
> > normal addition.
> >
> > If long double is double-double, faddl can be done as fadd on the first
> > double precision component of both args, and daddl is just normal addition
> > of those.
> >
> > If long double is IEEE QP, then it is more interesting :-)
>
> On what conditions does the mapping of long double to double/
> double-double or IEEE QP changes or depends, so that I can test it.

  gcc -Q --help=target | grep long

-mlong-double-64 is to select double, and -mlong-double-128 says to use
something 128 bits.  These are not mentioned in the manual, and GCC often
says it has a different number of bits selected, hrm.  I'll open a PR.

When having a 128-bit long double, -mabi=ibmlongdouble says to use the
double-double format, and -mabi=ieeelongdouble says to use IEEE QP FP.


HTH,


Segher
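Besides the option dump, one way to confirm which of the three long double cases a given set of options selects is to check LDBL_MANT_DIG from <float.h>, the number of significand bits: 53 for double, 106 for IBM double-double and 113 for IEEE quad precision. The helper below is a hypothetical sketch, not part of GCC; the comments pair each value with the rs6000 options named in the message, and the 64 case (x87 extended precision) only matters when the sketch is run on x86.

```c
#include <float.h>

/* Hypothetical helper: name the format long double has in this
   compilation, based on its number of significand bits.  */
static const char *
long_double_format (void)
{
  switch (LDBL_MANT_DIG)
    {
    case 53:
      return "double";              /* e.g. -mlong-double-64 */
    case 106:
      return "IBM double-double";   /* 128-bit, -mabi=ibmlongdouble */
    case 113:
      return "IEEE quad";           /* 128-bit, -mabi=ieeelongdouble */
    case 64:
      return "x87 extended";        /* x86 only, not a Power format */
    default:
      return "other";
    }
}
```

Building the file once per option combination and printing the result shows which faddl/daddl case from the earlier message applies.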
* Re: Expansion of narrowing math built-ins into power instructions 2019-08-24 9:53 ` Richard Sandiford 2019-08-25 13:55 ` Tejas Joshi @ 2019-08-26 13:23 ` Martin Jambor 1 sibling, 0 replies; 63+ messages in thread From: Martin Jambor @ 2019-08-26 13:23 UTC (permalink / raw) To: Richard Sandiford; +Cc: Segher Boessenkool, Tejas Joshi, gcc, hubicka, joseph Hi, On Sat, Aug 24 2019, Richard Sandiford wrote: > Martin Jambor <mjambor@suse.cz> writes: ... >> >> 2. direct_internal_fn_supported_p on which replacement_internal_fn >> depends to expand built-ins as internal functions cannot handle >> conversion optabs... and narrowing is a kind of conversion and the >> optab is added as such with OPTAB_CD. >> >> Actually, the second statement is not entirely true because somehow it >> can handle optab while_ult which is a conversion optab but a) the way it >> is handled, if I can understand it at all, seems to be a big hack and >> would be even worse if we decided to copy that for all narrowing math >> functions > > Think "big hack" is a bit unfair. The way that the internal function > maps argument types to the optab modes, and the way it expands calls > into rtl, depends on the "optab type" argument (the final argument to > DEF_INTERNAL_OPTAB_FN). This is relatively flexible in that it can use > a single-mode "direct" optab or a dual-mode "conversion" optab, with the > modes coming from whichever arguments are appropriate. New optab types > can be added as needed. My apologies. I guess I should have been more careful with my choice of words when perhaps I did not understand all aspects but when I saw: #define direct_while_optab_supported_p convert_optab_supported_p (and when I saw expand_while_optab_fn defined normally while all(?) other were constructed in an elaborate macro), I thought that I did not want to replicate the mechanism, not for a number of functions. > > FWIW, several other DEF_INTERNAL_OPTAB_FNs are conversion optabs too > (e.g. 
IFN_LOAD_LANES, IFN_STORE_LANES, IFN_MASK_LOAD, etc.). > > But... > >> and b) it gets both modes from argument types whereas we need one from >> the result type and so we would have to rewrite >> replacement_internal_fn anyway. > > ...yeah, I agree this breaks the current model. The reason IFN_WHILE_ULT > doesn't rely on the return type is that if you have: > > _2 = .WHILE_ULT (_0, _1) // returning a vector of 4 booleans > _3 = .WHILE_ULT (_0, _1) // returning a vector of 8 booleans > > then the calls look equivalent. So instead we pass an extra argument > indicating the required boolean vector "shape". > > The same "problem" could in principle apply to FADD if we ever needed > to support double+double->_Float16 for example. Right. I hope not going through an internal function is acceptable. If not, we'll have to teach this builtin->internal_funcxtion->optab mechanism about conversions. Thanks, Martin > >> Therefore, at least for now (GSoC deadline is kind of looming), I >> decided that the best way forward would be to not rely on internal >> functions but plug into expand_builtin() and I wrote the following, >> lightly tested patch - which of course misses testcases and stuff - but >> I'd be curious about any feedback now anyway. When I proposed a very >> similar approach for the roundeven x86_64 expansion, Uros actually then >> opted for a solution based on internal functions, so I am curious >> whether there are simple alternatives I do not see. >> >> Tejas, of course cases for other fadd variants should at least be added >> to expand_builtin. >> >> Thanks, >> >> Martin >> >> >> 2019-08-23 Tejas Joshi <tejasjoshi9673@gmail.com> >> Martin Jambor <mjambor@suse.cz> >> >> * builtins.c (expand_builtin_binary_conversion): New function. >> (expand_builtin): Call it. >> * config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING. >> (add_truncdfsf3): New define_insn. >> * optabs.def (fadd_optab): New. >> ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: Expansion of narrowing math built-ins into power instructions
From: Joseph Myers @ 2019-08-20 16:04 UTC
  To: Segher Boessenkool
  Cc: Tejas Joshi, gcc, Martin Jambor, hubicka, richard.sandiford

On Tue, 20 Aug 2019, Segher Boessenkool wrote:

> plus minus neg mult div mod smin smax abs sqrt fma I think?  And let's
> hope we never ever have to do saturating versions of FP :-)

There are six operations with narrowing versions in TS 18661-1 (plus minus
mult div sqrt fma).  neg and abs are operations that do no rounding, raise
no exceptions and preserve signaling NaNs (affecting their sign bit
appropriately).

-- 
Joseph S. Myers
joseph@codesourcery.com
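Joseph's six operations map onto the TS 18661-1 function names by a simple scheme: a result prefix (f for a float result, d for a double result), the operation name, and an l suffix when the arguments are long double. There is no d variant taking double arguments, since that would not narrow. The helper below is a hypothetical sketch that spells the names out; it is not part of any library.

```c
#include <stdio.h>

/* The six operations that have narrowing versions in TS 18661-1.  */
static const char *const narrowing_ops[6] = {
  "add", "sub", "mul", "div", "sqrt", "fma"
};

/* Write the narrowing function name into BUF.  RESULT is 'f' (float
   result) or 'd' (double result); LONG_DOUBLE_ARGS selects the "l"
   suffix.  Returns 0 on success, -1 for combinations that do not
   narrow or that do not fit in BUF.  */
static int
narrowing_fn_name (char *buf, size_t size, char result, const char *op,
                   int long_double_args)
{
  if (result != 'f' && result != 'd')
    return -1;
  if (result == 'd' && !long_double_args)
    return -1;                      /* double from double: not narrowing */
  int n = snprintf (buf, size, "%c%s%s", result, op,
                    long_double_args ? "l" : "");
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For example, sqrt with a float result and long double arguments gives fsqrtl, and fma with a double result gives dfmal; neg and abs have no entries because they do no rounding.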
* Re: Expansion of narrowing math built-ins into power instructions
From: Segher Boessenkool @ 2019-08-15 18:54 UTC
  To: Tejas Joshi; +Cc: gcc, Martin Jambor, hubicka, joseph

On Thu, Aug 15, 2019 at 03:29:03PM +0530, Tejas Joshi wrote:
> Also, in what manner should float_contract/narrow be different from
> float_truncate as both are trying to do similar things? (truncation
> from DF to SF)

It's just a different name, nothing more, nothing less.  Because it is a
different name it can not be accidentally generated from actual
truncations.


Segher