public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
@ 2021-07-14  5:15 Hongtao Liu
  2021-07-14  5:18 ` Hongtao Liu
       [not found] ` <47833128.jc7mZyZMxp@excalibur>
  0 siblings, 2 replies; 10+ messages in thread
From: Hongtao Liu @ 2021-07-14  5:15 UTC (permalink / raw)
  To: gcc-help, GCC Patches, GCC Development

Hi:
  The original problem was that some users wanted the cmdline option
-ffast-math not to act on intrinsic production code. .i.e for codes
like

#include<immintrin.h>
__m256d
foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
{
__m256d tmp = _mm256_add_pd (a, b);
tmp = _mm256_sub_pd (tmp, c);
tmp = _mm256_sub_pd (tmp, d);
return tmp;
}

compiled with -O2 -mavx2 -ffast-math, users expected codes generated like

vaddpd ymm0, ymm0, ymm1
vsubpd ymm0, ymm0, ymm2
vsubpd ymm0, ymm0, ymm3

but not

vsubpd ymm1, ymm1, ymm2
vsubpd ymm0, ymm0, ymm3
vaddpd ymm0, ymm1, ymm0


For the LLVM side, there're mechanisms like
#pragma float_control( precise, on, push)
...(intrinsics definition)..
#pragma float_control(pop)

When intrinsics are inlined, their IRs will be marked with
"no-fast-math", and even if the caller is compiled with -ffast-math,
reassociation only happens to those IRs which are not marked with
"no-fast-math". It seems to be more flexible to support fast math
control of a region(inside a function).

Does GCC have a similar mechanism?


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
  2021-07-14  5:15 [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not? Hongtao Liu
@ 2021-07-14  5:18 ` Hongtao Liu
  2021-07-14  6:39   ` Matthias Kretz
       [not found] ` <47833128.jc7mZyZMxp@excalibur>
  1 sibling, 1 reply; 10+ messages in thread
From: Hongtao Liu @ 2021-07-14  5:18 UTC (permalink / raw)
  To: gcc-help, GCC Patches, GCC Development

On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi:
>   The original problem was that some users wanted the cmdline option
> -ffast-math not to act on intrinsic production code. .i.e for codes
> like
>
> #include<immintrin.h>
> __m256d
> foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
> {
> __m256d tmp = _mm256_add_pd (a, b);
> tmp = _mm256_sub_pd (tmp, c);
> tmp = _mm256_sub_pd (tmp, d);
> return tmp;
> }
>
> compiled with -O2 -mavx2 -ffast-math, users expected codes generated like
>
> vaddpd ymm0, ymm0, ymm1
> vsubpd ymm0, ymm0, ymm2
> vsubpd ymm0, ymm0, ymm3
>
> but not
>
> vsubpd ymm1, ymm1, ymm2
> vsubpd ymm0, ymm0, ymm3
> vaddpd ymm0, ymm1, ymm0
>
>
> For the LLVM side, there're mechanisms like
> #pragma float_control( precise, on, push)
> ...(intrinsics definition)..
> #pragma float_control(pop)
>
> When intrinsics are inlined, their IRs will be marked with
> "no-fast-math", and even if the caller is compiled with -ffast-math,
> reassociation only happens to those IRs which are not marked with
> "no-fast-math". It seems to be more flexible to support fast math
> control of a region(inside a function).
Testcase
https://godbolt.org/z/9cYMGGWPG
>
> Does GCC have a similar mechanism?
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
  2021-07-14  5:18 ` Hongtao Liu
@ 2021-07-14  6:39   ` Matthias Kretz
  2021-07-14  7:04     ` Hongtao Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Matthias Kretz @ 2021-07-14  6:39 UTC (permalink / raw)
  To: gcc-help, GCC Patches, GCC Development

On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > Hi:
> >   The original problem was that some users wanted the cmdline option
> > 
> > -ffast-math not to act on intrinsic production code.

This sounds like the users want intrinsics to map *directly* to the 
corresponding instruction. If that's the case such users should use inline 
assembly, IMHO. If you compile a TU with -ffast-math then *all* floating-point 
operations are affected. Yes, more control over where to use fast-math and the 
ability to mix fast-math and no-fast-math without risking ODR violations would 
be great. But that's a larger issue, and one that would ideally be solved in 
WG14/WG21.

FWIW, this is what I'd do, i.e. turn off fast-math for the function in 
question:
https://godbolt.org/z/3cKq5hT1o

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
  2021-07-14  6:39   ` Matthias Kretz
@ 2021-07-14  7:04     ` Hongtao Liu
  2021-07-14  7:39       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Hongtao Liu @ 2021-07-14  7:04 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc-help, GCC Patches, GCC Development, H. J. Lu

On Wed, Jul 14, 2021 at 2:39 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> > On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > Hi:
> > >   The original problem was that some users wanted the cmdline option
> > >
> > > -ffast-math not to act on intrinsic production code.
>
> This sounds like the users want intrinsics to map *directly* to the
Thanks for the reply.
I think the users want the mixed usage of fast-math and no-fast-math.
> corresponding instruction. If that's the case such users should use inline
> assembly, IMHO. If you compile a TU with -ffast-math then *all* floating-point
> operations are affected. Yes, more control over where to use fast-math and the
> ability to mix fast-math and no-fast-math without risking ODR violations would
> be great. But that's a larger issue, and one that would ideally be solved in
> WG14/WG21.
hmm, guess it would need a lot of work.
>
> FWIW, this is what I'd do, i.e. turn off fast-math for the function in
> question:
> https://godbolt.org/z/3cKq5hT1o
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
  2021-07-14  7:04     ` Hongtao Liu
@ 2021-07-14  7:39       ` Richard Biener
  2021-07-14 16:45         ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-07-14  7:39 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Matthias Kretz, GCC-help, GCC Development, GCC Patches

On Wed, Jul 14, 2021 at 9:00 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Jul 14, 2021 at 2:39 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> >
> > On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> > > On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > Hi:
> > > >   The original problem was that some users wanted the cmdline option
> > > >
> > > > -ffast-math not to act on intrinsic production code.
> >
> > This sounds like the users want intrinsics to map *directly* to the
> Thanks for the reply.
> I think the users want the mixed usage of fast-math and no-fast-math.
> > corresponding instruction. If that's the case such users should use inline
> > assembly, IMHO. If you compile a TU with -ffast-math then *all* floating-point
> > operations are affected. Yes, more control over where to use fast-math and the
> > ability to mix fast-math and no-fast-math without risking ODR violations would
> > be great. But that's a larger issue, and one that would ideally be solved in
> > WG14/WG21.
> hmm, guess it would need a lot of work.

-ffast-math decomposes to quite some flag_* and those generally are not
reflected into the IL but can be different per function (and then
prevent inlining).

There's one "related" IL feature used by the Fortran frontend - PAREN_EXPR
prevents association across it.  So for Fortran (when not -fno-protect-parens
which is enabled by -Ofast), (a + b) - b cannot be optimized to a.  Eventually
this could be used to wrap intrinsic results since most of the issues in the
end require association.  Note PAREN_EXPR isn't exposed to the C family
frontends but we could of course add a builtin-like thing for this
_Noassoc ( .... ) or so.  Note PAREN_EXPR survives -Ofast so it's the frontends
that would need to choose to emit or not emit it (or always emit it).

Note some people do like to have their intrisic code optimized, so there's
likely conflicting interest here.

Richard.

> >
> > FWIW, this is what I'd do, i.e. turn off fast-math for the function in
> > question:
> > https://godbolt.org/z/3cKq5hT1o
> >
> > --
> > ──────────────────────────────────────────────────────────────────────────
> >  Dr. Matthias Kretz                           https://mattkretz.github.io
> >  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
> >  std::experimental::simd              https://github.com/VcDevel/std-simd
> > ──────────────────────────────────────────────────────────────────────────
>
>
>
> --
> BR,
> Hongtao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not?
  2021-07-14  7:39       ` Richard Biener
@ 2021-07-14 16:45         ` Segher Boessenkool
  0 siblings, 0 replies; 10+ messages in thread
From: Segher Boessenkool @ 2021-07-14 16:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Hongtao Liu, GCC-help, GCC Development, GCC Patches

On Wed, Jul 14, 2021 at 09:39:42AM +0200, Richard Biener via Gcc-help wrote:
> On Wed, Jul 14, 2021 at 9:00 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Wed, Jul 14, 2021 at 2:39 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> > >
> > > On Wednesday, 14 July 2021 07:18:29 CEST Hongtao Liu via Gcc-help wrote:
> > > > On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > Hi:
> > > > >   The original problem was that some users wanted the cmdline option
> > > > >
> > > > > -ffast-math not to act on intrinsic production code.
> > >
> > > This sounds like the users want intrinsics to map *directly* to the
> > Thanks for the reply.
> > I think the users want the mixed usage of fast-math and no-fast-math.
> > > corresponding instruction. If that's the case such users should use inline
> > > assembly, IMHO. If you compile a TU with -ffast-math then *all* floating-point
> > > operations are affected. Yes, more control over where to use fast-math and the
> > > ability to mix fast-math and no-fast-math without risking ODR violations would
> > > be great. But that's a larger issue, and one that would ideally be solved in
> > > WG14/WG21.
> > hmm, guess it would need a lot of work.

Yes.  And the biggest part of that is defining what the actual semantics
of this should be!  Code compiled with -ffast-math is allowed to do
completely *anything* if ever it encounters an infinity or denormal,
and the sign of zeroes can flip randomly, etc.  It cannot interoperate
with code compiled with -fno-fast-math in general, unless much care is
taken.

> -ffast-math decomposes to quite some flag_* and those generally are not
> reflected into the IL but can be different per function (and then
> prevent inlining).

Yeah.  And for most of those sub-flags you have the problems I talked
about above.

> Note some people do like to have their intrisic code optimized, so there's
> likely conflicting interest here.

If people do not want anything optimised they should use -O0, or write
assembler code instead.  GCC in general optimises where it can, as any
good compiler should.  There are various facilities for preventing
optimisations in much more targeted places of course -- and most of
those do not tell the compiler "do not do X", but they instead say
"here Y happens, and you do not know the details of that", effectively
telling the compiler "hands off!"


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC] c-family: Add __builtin_noassoc
       [not found]   ` <CAFiYyc2t690=B77Mwb9=-T_TzQvZUj7RnFVDmuaWDJBN81DtHw@mail.gmail.com>
@ 2021-07-16  8:57     ` Matthias Kretz
  2021-07-16  9:31       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Matthias Kretz @ 2021-07-16  8:57 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > There's one "related" IL feature used by the Fortran frontend -
> > > PAREN_EXPR
> > > prevents association across it.  So for Fortran (when not
> > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > optimized to a.  Eventually this could be used to wrap intrinsic results
> > > since most of the issues in the end require association.  Note
> > > PAREN_EXPR
> > > isn't exposed to the C family frontends but we could of course add a
> > > builtin-like thing for this _Noassoc ( .... ) or so.  Note PAREN_EXPR
> > > survives -Ofast so it's the frontends that would need to choose to emit
> > > or
> > > not emit it (or always emit it).
> >
> > Interesting. I want that builtin in C++. Currently I use inline asm to
> > achieve a similar effect. But the inline asm hammer is really too big for
> > the problem.
>
> I think implementing it similar to how we do __builtin_shufflevector would
> be easily possible.  PAREN_EXPR is a tree code.

Like this? If you like it, I'll write the missing documentation and do real 
regression testing.

---

New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

gcc/testsuite/ChangeLog:

	* c-c++-common/builtin-noassoc-1.c: New test.

gcc/cp/ChangeLog:

	* cp-objcp-common.c (names_builtin_p): Handle
	RID_BUILTIN_NOASSOC.
	* parser.c (cp_parser_postfix_expression): Handle
	RID_BUILTIN_NOASSOC.

gcc/c-family/ChangeLog:

	* c-common.c (c_common_reswords): Add __builtin_noassoc.
	* c-common.h (enum rid): Add RID_BUILTIN_NOASSOC.

gcc/c/ChangeLog:

	* c-decl.c (names_builtin_p): Handle RID_BUILTIN_NOASSOC.
	* c-parser.c (c_parser_postfix_expression): Likewise.
---
 gcc/c-family/c-common.c                       |  1 +
 gcc/c-family/c-common.h                       |  2 +-
 gcc/c/c-decl.c                                |  1 +
 gcc/c/c-parser.c                              | 20 ++++++++++++++++
 gcc/cp/cp-objcp-common.c                      |  1 +
 gcc/cp/parser.c                               | 14 +++++++++++
 .../c-c++-common/builtin-noassoc-1.c          | 24 +++++++++++++++++++
 7 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-noassoc-1.c


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0001-c-family-Add-__builtin_noassoc.patch --]
[-- Type: text/x-patch, Size: 4822 bytes --]

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 681fcc972f4..e74123d896c 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
+  { "__builtin_noassoc", RID_BUILTIN_NOASSOC, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 50ca8fb6ebd..b772cf9c5e9 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -108,7 +108,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,      RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,      RID_BUILTIN_COMPLEX,	     RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_NOASSOC,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 983d65e930c..7b7ecba026f 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10557,6 +10557,7 @@ names_builtin_p (const char *name)
     case RID_BUILTIN_HAS_ATTRIBUTE:
     case RID_BUILTIN_SHUFFLE:
     case RID_BUILTIN_SHUFFLEVECTOR:
+    case RID_BUILTIN_NOASSOC:
     case RID_CHOOSE_EXPR:
     case RID_OFFSETOF:
     case RID_TYPES_COMPATIBLE_P:
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9a56e0c04c6..2b40dc8253e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -8931,6 +8931,7 @@ c_parser_predefined_identifier (c_parser *parser)
 			 assignment-expression ,
 			 assignment-expression, )
      __builtin_convertvector ( assignment-expression , type-name )
+     __builtin_noassoc ( assignment-expression )
 
    offsetof-member-designator:
      identifier
@@ -10076,6 +10077,25 @@ c_parser_postfix_expression (c_parser *parser)
 	      }
 	  }
 	  break;
+	case RID_BUILTIN_NOASSOC:
+	  {
+	    location_t start_loc = loc;
+	    c_parser_consume_token (parser);
+	    matching_parens parens;
+	    if (!parens.require_open (parser))
+	      {
+		expr.set_error ();
+		break;
+	      }
+	    e1 = c_parser_expr_no_commas (parser, NULL);
+	    mark_exp_read (e1.value);
+	    location_t end_loc = c_parser_peek_token (parser)->get_finish ();
+	    parens.skip_until_found_close (parser);
+	    expr.value = build1_loc (loc, PAREN_EXPR, TREE_TYPE (e1.value),
+				     e1.value);
+	    set_c_expr_source_range (&expr, start_loc, end_loc);
+	  }
+	  break;
 	case RID_AT_SELECTOR:
 	  {
 	    gcc_assert (c_dialect_objc ());
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index ee255732d5a..2492094508b 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -395,6 +395,7 @@ names_builtin_p (const char *name)
     case RID_BUILTIN_SHUFFLE:
     case RID_BUILTIN_SHUFFLEVECTOR:
     case RID_BUILTIN_LAUNDER:
+    case RID_BUILTIN_NOASSOC:
     case RID_BUILTIN_BIT_CAST:
     case RID_OFFSETOF:
     case RID_HAS_NOTHROW_ASSIGN:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 62f3465539b..49f2dcc0bfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -7316,6 +7316,7 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p,
     case RID_BUILTIN_SHUFFLE:
     case RID_BUILTIN_SHUFFLEVECTOR:
     case RID_BUILTIN_LAUNDER:
+    case RID_BUILTIN_NOASSOC:
       {
 	vec<tree, va_gc> *vec;
 
@@ -7358,6 +7359,19 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p,
 	      }
 	    break;
 
+	  case RID_BUILTIN_NOASSOC:
+	    if (vec->length () == 1)
+	      postfix_expression = build1_loc (loc, PAREN_EXPR,
+					       TREE_TYPE ((*vec)[0]),
+					       (*vec)[0]);
+	    else
+	      {
+		error_at (loc, "wrong number of arguments to "
+			       "%<__builtin_noassoc%>");
+		postfix_expression = error_mark_node;
+	      }
+	    break;
+
 	  case RID_BUILTIN_SHUFFLE:
 	    if (vec->length () == 2)
 	      postfix_expression
diff --git a/gcc/testsuite/c-c++-common/builtin-noassoc-1.c b/gcc/testsuite/c-c++-common/builtin-noassoc-1.c
new file mode 100644
index 00000000000..ff55c666264
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/builtin-noassoc-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+
+float a = 1.f;
+float b = 1.e20f;
+
+__attribute__((optimize("-ffast-math")))
+float
+fast()
+{
+  return __builtin_noassoc (a + b) - b;
+}
+
+float
+normal()
+{
+  return a + b - b;
+}
+
+int main()
+{
+  if (fast() != normal())
+    __builtin_abort();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] c-family: Add __builtin_noassoc
  2021-07-16  8:57     ` [RFC] c-family: Add __builtin_noassoc Matthias Kretz
@ 2021-07-16  9:31       ` Richard Biener
  2021-07-16 12:00         ` Matthias Kretz
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-07-16  9:31 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: GCC Patches, Hongtao Liu

On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > > There's one "related" IL feature used by the Fortran frontend -
> > > > PAREN_EXPR
> > > > prevents association across it.  So for Fortran (when not
> > > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > > optimized to a.  Eventually this could be used to wrap intrinsic results
> > > > since most of the issues in the end require association.  Note
> > > > PAREN_EXPR
> > > > isn't exposed to the C family frontends but we could of course add a
> > > > builtin-like thing for this _Noassoc ( .... ) or so.  Note PAREN_EXPR
> > > > survives -Ofast so it's the frontends that would need to choose to emit
> > > > or
> > > > not emit it (or always emit it).
> > >
> > > Interesting. I want that builtin in C++. Currently I use inline asm to
> > > achieve a similar effect. But the inline asm hammer is really too big for
> > > the problem.
> >
> > I think implementing it similar to how we do __builtin_shufflevector would
> > be easily possible.  PAREN_EXPR is a tree code.
>
> Like this? If you like it, I'll write the missing documentation and do real
> regression testing.

Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
it prevents a + b + c from being re-associated - but it does not.  PAREN_EXPR
is a barrier for association, so for 'a + b + c + PAREN_EXPR <d + e + f>'
the a+b+c and d+e+f chains will not mix but they individually can be
re-associated.  That said __builtin_noassoc might be a bad name,
maybe __builtin_assoc_barrier is better?

The implementation is originally for the Fortran language semantics
which allows re-association but respects parens (thus PAREN_EXPR).

To fully prevent association of a a + b + d + e chain you need at least
two PAREN_EXPRs, for example (a+b) + (d+e) would do.

One could of course provide __builtin_noassoc (a+b+c+d) with the
implied semantics and insert PAREN_EXPRs around all operands
when lowering it.

Not sure what's more useful in practice - directly exposing the middle-end
PAREN_EXPR or providing a way to mark a whole expression as to be
not re-associated?  Maybe both?

Richard.

> ---
>
> New builtin to enable explicit use of PAREN_EXPR in C & C++ code.
>
> Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
>
> gcc/testsuite/ChangeLog:
>
>         * c-c++-common/builtin-noassoc-1.c: New test.
>
> gcc/cp/ChangeLog:
>
>         * cp-objcp-common.c (names_builtin_p): Handle
>         RID_BUILTIN_NOASSOC.
>         * parser.c (cp_parser_postfix_expression): Handle
>         RID_BUILTIN_NOASSOC.
>
> gcc/c-family/ChangeLog:
>
>         * c-common.c (c_common_reswords): Add __builtin_noassoc.
>         * c-common.h (enum rid): Add RID_BUILTIN_NOASSOC.
>
> gcc/c/ChangeLog:
>
>         * c-decl.c (names_builtin_p): Handle RID_BUILTIN_NOASSOC.
>         * c-parser.c (c_parser_postfix_expression): Likewise.
> ---
>  gcc/c-family/c-common.c                       |  1 +
>  gcc/c-family/c-common.h                       |  2 +-
>  gcc/c/c-decl.c                                |  1 +
>  gcc/c/c-parser.c                              | 20 ++++++++++++++++
>  gcc/cp/cp-objcp-common.c                      |  1 +
>  gcc/cp/parser.c                               | 14 +++++++++++
>  .../c-c++-common/builtin-noassoc-1.c          | 24 +++++++++++++++++++
>  7 files changed, 62 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/builtin-noassoc-1.c
>
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] c-family: Add __builtin_noassoc
  2021-07-16  9:31       ` Richard Biener
@ 2021-07-16 12:00         ` Matthias Kretz
  2021-07-19  7:16           ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Matthias Kretz @ 2021-07-16 12:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Hongtao Liu

On Friday, 16 July 2021 11:31:29 CEST Richard Biener wrote:
> On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > I think implementing it similar to how we do __builtin_shufflevector
> > > would
> > > be easily possible.  PAREN_EXPR is a tree code.
> > 
> > Like this? If you like it, I'll write the missing documentation and do
> > real
> > regression testing.
> 
> Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
> it prevents a + b + c from being re-associated - but it does not. 
> PAREN_EXPR is a barrier for association, so for 'a + b + c + PAREN_EXPR <d
> + e + f>' the a+b+c and d+e+f chains will not mix but they individually can
> be re-associated.  That said __builtin_noassoc might be a bad name,
> maybe __builtin_assoc_barrier is better?

Yes, I agree with renaming it. And assoc_barrier sounds intuitive to me.

> To fully prevent association of a a + b + d + e chain you need at least
> two PAREN_EXPRs, for example (a+b) + (d+e) would do.
> 
> One could of course provide __builtin_noassoc (a+b+c+d) with the
> implied semantics and insert PAREN_EXPRs around all operands
> when lowering it.

I wouldn't want to go there. __builtin_noassoc(f(x, y, z))? We probably both 
agree that it would be a no-op, but it reads like f should be evaluated with -
fno-associative-math.

> Not sure what's more useful in practice - directly exposing the middle-end
> PAREN_EXPR or providing a way to mark a whole expression as to be
> not re-associated?  Maybe both?

I think this is a tool for specialists. Give them the low-level tool and 
they'll build whatever higher level abstractions they need on top of it. Like

float sum_noassoc(RangeOfFloats auto x) {
  float sum = 0;
  for (float v : x)
    sum = __builtin_assoc_barrier(v + x);
  return sum;
}

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] c-family: Add __builtin_noassoc
  2021-07-16 12:00         ` Matthias Kretz
@ 2021-07-19  7:16           ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2021-07-19  7:16 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: GCC Patches, Hongtao Liu

On Fri, Jul 16, 2021 at 2:00 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Friday, 16 July 2021 11:31:29 CEST Richard Biener wrote:
> > On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > > On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > > I think implementing it similar to how we do __builtin_shufflevector
> > > > would
> > > > be easily possible.  PAREN_EXPR is a tree code.
> > >
> > > Like this? If you like it, I'll write the missing documentation and do
> > > real
> > > regression testing.
> >
> > Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
> > it prevents a + b + c from being re-associated - but it does not.
> > PAREN_EXPR is a barrier for association, so for 'a + b + c + PAREN_EXPR <d
> > + e + f>' the a+b+c and d+e+f chains will not mix but they individually can
> > be re-associated.  That said __builtin_noassoc might be a bad name,
> > maybe __builtin_assoc_barrier is better?
>
> Yes, I agree with renaming it. And assoc_barrier sounds intuitive to me.
>
> > To fully prevent association of a a + b + d + e chain you need at least
> > two PAREN_EXPRs, for example (a+b) + (d+e) would do.
> >
> > One could of course provide __builtin_noassoc (a+b+c+d) with the
> > implied semantics and insert PAREN_EXPRs around all operands
> > when lowering it.
>
> I wouldn't want to go there. __builtin_noassoc(f(x, y, z))? We probably both
> agree that it would be a no-op, but it reads like f should be evaluated with -
> fno-associative-math.

Ah, true - yeah, let's not go down this rathole.

> > Not sure what's more useful in practice - directly exposing the middle-end
> > PAREN_EXPR or providing a way to mark a whole expression as to be
> > not re-associated?  Maybe both?
>
> I think this is a tool for specialists. Give them the low-level tool and
> they'll build whatever higher level abstractions they need on top of it. Like

OK, fine.

> float sum_noassoc(RangeOfFloats auto x) {
>   float sum = 0;
>   for (float v : x)
>     sum = __builtin_assoc_barrier(v + x);
>   return sum;
> }
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-19  7:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-14  5:15 [Questions] Is there any bit in gimple/rtl to indicate this IR support fast-math or not? Hongtao Liu
2021-07-14  5:18 ` Hongtao Liu
2021-07-14  6:39   ` Matthias Kretz
2021-07-14  7:04     ` Hongtao Liu
2021-07-14  7:39       ` Richard Biener
2021-07-14 16:45         ` Segher Boessenkool
     [not found] ` <47833128.jc7mZyZMxp@excalibur>
     [not found]   ` <CAFiYyc2t690=B77Mwb9=-T_TzQvZUj7RnFVDmuaWDJBN81DtHw@mail.gmail.com>
2021-07-16  8:57     ` [RFC] c-family: Add __builtin_noassoc Matthias Kretz
2021-07-16  9:31       ` Richard Biener
2021-07-16 12:00         ` Matthias Kretz
2021-07-19  7:16           ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).