[PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit
@ 2022-12-14 20:29 Michael Meissner
  2022-12-14 20:32 ` [PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide Michael Meissner
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Michael Meissner @ 2022-12-14 20:29 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This set of patches was first submitted on November 1st.  Kewen.Lin
<linkw@linux.ibm.com> asked for some changes to the first set of patches.  I
also tried to clean up the comments in the second patch about types that Segher
Boessenkool <segher@kernel.crashing.org> mentioned.

I had just re-submitted the first patch yesterday, but Segher asked that I
repost all three patches.  Here is the original commentary for all three
patches, tweaked a little bit:

These 3 patches fix the problems with building GCC on PowerPC systems when long
double is configured to use the IEEE 128-bit format.

There are 3 patches in this patch set.  The first two patches are required to
fix the basic problem.  The third patch fixes some issues that were noticed
along the way.

The basic issue is internally within GCC there are several types for 128-bit
floating point.  The types are:

    1) The long double type (which use either TFmode for 128-bit long doubles
        or possibly DFmode for 64-bit long doubles).  In the normal case, long
        double is 128-bits (TFmode) and depending on the configuration options
        and the switches passed by the user at compilation time, long double is
        either the 128-bit IBM double-double type or IEEE 128-bit.

    2)  The type for __ibm128.  If long double is IBM 128-bit double-double,
        internally within the compiler, this type is the same as the long
        double type.  If long double is either IEEE 128-bit or is 64-bit, then
        this type is a separate type.  If long double is not double-double,
        this type will use IFmode during RTL.

    3)  The type for _Float128.  This type is always IEEE 128-bit if it exists.
        While it is a separate internal type, currently if long double is IEEE
        128-bit, this type uses TFmode once it gets to RTL, but within Gimple
        it is a separate type.  If long double is not IEEE 128-bit, then this
        type uses KFmode.  All of the f128 math functions defined by the
        compiler use this type.  In the past, _Float128 was a C extended type
        and not available in C++.  Now it is a part of the C/C++ 2x standards.

    4)  The type for __float128.  The history is I implemented __float128
        first, and several releases later, we added _Float128 as a standard C
        type.  Unfortunately, I didn't think things through enough when
        _Float128 came out.  Like __ibm128, it uses the long double type if
        long double is IEEE 128-bit, and now it uses the _Float128 type if long
        double is not IEEE 128-bit.  IMHO, this is the major problem.  The two
        IEEE 128-bit types should use the same type internally (or at least one
        should be a qualified type of the other).  Before we started adding
        more support for _Float128, it mostly works, but now it doesn't with
        more optimizations being done.

    5)  The error occurs in building _mulkc3 in libgcc, when the TFmode type in
        the code is defined to use attribute((mode(TF))), but the functions
        that are called all have _Float128 arguments.  These are separate
        types, and ultimately one of the consistancy checks fails because they
        are different types.

There are 3 patches in this set:

    1)  The first patch rewrites how the complex 128-bit multiply and divide
        functions are done in the compiler.  In the old scheme, essentially
        there were only two types ever being used, the long double type, and
        the not long double type.  The original code would make the names
        called of these functions to be __multc3/__divtc3 or
        __mulkc3/__divkc3.  This worked because there were only two types.
        With straightening out the types, so __float128/_Float128 is never the
        long double type, there are potentially 3-4 types.  However, the C
        front end and the middle end code will not let us create two built-in
        functions that have the same name.

        Patch #1 patch rips out this code, and rewrites it to be cleaner.

	In the original version of the patches, I disabled doing the mapping
	when building libgcc because it caused problems when building __mulkc3
	and __divkc3.  I have removed this check, since the second patch will
	allow these functions to be built without disabling the mapping.

    2)  The second patch fixes the problem of __float128 and _Float128 not
        being the same if long double is IEEE 128-bit.  After this patch, both
        _Float128 and __float128 types will always use the same type.  When we
        get to RTL, it will always use KFmode type (and not use TFmode).  The
        stdc++ library will not build if we use TFmode for these types due to
        the other changes.

        There is a minor codegen issue that if you explicitly use long double
        and call the F128 FMA (fused multiply-add) round to odd functions that
        are defined to use __float128/_Float128 arguments.  While we might be
        able to optimize these later, I don't think it is important to optimize
        the use of long double instead of __float128/_Float128.  Note, if you
        use the proper __float128/_Float128 types instead of long double, the
        code does the optimization.

	The issue is the extra conversions that are inserted to convert between
	KFmode and TFmode won't allow the combiner to use the current patterns
	to combine the multiply, add/subtract, and optionally negation into the
	single FMA instruction.

        By doing this change, it also fixes two tests that have been broken on
        IEEE 128-bit long double systems (float128-cmp2-runnable.c and
        nan128-1.c).  These two tests use __float128 variables and call nansq
        to create a signaling NaN.  Nansq is defined to be __builtin_nansf128,
        which returns a _Float128 Nan.  However, since in the current
        implementation before these patches, __float128 is a different type
        than _Float128 when long double is IEEE 128-bit, the machine
        independent code converts the signaling NaN into a non-signaling NaN.
        Since after these patches they are the same internal type, the
        signaling NaN is preserved.

    3)  Patch #3 fixes two tests that were still failing after patches #1 and
        #2 were applied (convert-fp-128.c and pr85657-3.c).  This patch fixes
	the conversions between the 128-bit floating point types.  In the past,
        we would always call rs6000_expand_float128_convert to do all
        conversions.  After this patch, the conversions between different types
        that have the same representation will be done in-line and not call the
        rs6000_expand_float128_convert function.

        In addition, in the past we missed some conversions, and the compiler
        would generate an external call, even though the types might have the
        same representation.

After these patches, there are 3 specific tests and 1 set of tests that fail
when using IEEE 128-bit long double:

    1)  fp128_conversions.c: I haven't looked at yet;

    2)  pr105334.c: This is a bug that __ibm128 doesn't work if the default
        long double is IEEE 128-bit and you use the options: -mlong-double-128
        -msoft-float (i.e. no -mabi=ibmlongdouble).  I believe I have patches
        for this floating around.

    3)  The g++.dg/cpp23/ext-floating1.C test is failing.  I believe we need to
        dig in to fix PowerPC specific ISO C/C++ 2x _Float128 support.  I have
        looked at it yet.

    4)  All/some of the G++ modules tests fail.  This is PR 98645, and it is
        assigned to Nathan Sidwell.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide
  2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
@ 2022-12-14 20:32 ` Michael Meissner
  2022-12-14 20:34 ` [PATCH 2/3, V3] PR 107299, Make __float128 use the _Float128 type Michael Meissner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2022-12-14 20:32 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This patch reworks how the complex multiply and divide built-in functions are
done.  Previously we created built-in declarations for doing long double complex
multiply and divide when long double is IEEE 128-bit.  The old code also did not
support __ibm128 complex multiply and divide if long double is IEEE 128-bit.

In terms of history, I wrote the original code just as I was starting to test
GCC on systems where IEEE 128-bit long double was the default.  At the time, we
had not yet started mangling the built-in function names as a way to bridge
going from a system with 128-bit IBM long double to 128-bin IEEE long double.

The original code depends on there only being two 128-bit types invovled.  With
the next patch in this series, this assumption will no longer be true.  When
long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
explicit __float128/_Float128 type and one for long double).

The problem is we cannot create two separate built-in functions that resolve to
the same name.  This is a requirement of add_builtin_function and the C front
end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
only use 2 of them.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.

When I wrote these patches, I discovered that __ibm128 complex multiply and
divide had originally not been supported if long double is IEEE 128-bit as it
would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
case.

I had previously sent this patch out on November 1st.  Compared to that version,
this version no longer disables the special mapping when you are building
libgcc, as it turns out we don't need it.

I tested all 3 patchs for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-12-14   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
	(init_float128_ieee): Delete code to switch complex multiply and divide
	for long double.
	(complex_multiply_builtin_code): New helper function.
	(complex_divide_builtin_code): Likewise.
	(rs6000_mangle_decl_assembler_name): Add support for mangling the name
	of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/divic3-1.c: New test.
	* gcc.target/powerpc/divic3-2.c: Likewise.
	* gcc.target/powerpc/mulic3-1.c: Likewise.
	* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000.cc                 | 109 +++++++++++---------
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 ++++
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 ++++
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
 5 files changed, 132 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..6d08f6ed1fb 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11101,26 +11101,6 @@ init_float128_ibm (machine_mode mode)
     }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
-				      name, NULL_TREE);
-
-  set_builtin_decl (fncode, fndecl, true);
-
-  if (TARGET_DEBUG_BUILTIN)
-    fprintf (stderr, "create complex %s, fncode: %d\n", name, (int) fncode);
-
-  return;
-}
-
 /* Set up IEEE 128-bit floating point routines.  Use different names if the
    arguments can be passed in a vector register.  The historical PowerPC
    implementation of IEEE 128-bit floating point used _q_<op> for the names, so
@@ -11132,32 +11112,6 @@ init_float128_ieee (machine_mode mode)
 {
   if (FLOAT128_VECTOR_P (mode))
     {
-      static bool complex_muldiv_init_p = false;
-
-      /* Set up to call __mulkc3 and __divkc3 under -mabi=ieeelongdouble.  If
-	 we have clone or target attributes, this will be called a second
-	 time.  We want to create the built-in function only once.  */
-     if (mode == TFmode && TARGET_IEEEQUAD && !complex_muldiv_init_p)
-       {
-	 complex_muldiv_init_p = true;
-	 built_in_function fncode_mul =
-	   (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + TCmode
-				- MIN_MODE_COMPLEX_FLOAT);
-	 built_in_function fncode_div =
-	   (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + TCmode
-				- MIN_MODE_COMPLEX_FLOAT);
-
-	 tree fntype = build_function_type_list (complex_long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 NULL_TREE);
-
-	 create_complex_muldiv ("__mulkc3", fncode_mul, fntype);
-	 create_complex_muldiv ("__divkc3", fncode_div, fntype);
-       }
-
       set_optab_libfunc (add_optab, mode, "__addkf3");
       set_optab_libfunc (sub_optab, mode, "__subkf3");
       set_optab_libfunc (neg_optab, mode, "__negkf2");
@@ -28110,6 +28064,25 @@ rs6000_starting_frame_offset (void)
   return RS6000_STARTING_FRAME_OFFSET;
 }
 \f
+/* Internal function to return the built-in function id for the complex
+   multiply operation for a given mode.  */
+
+static inline built_in_function
+complex_multiply_builtin_code (machine_mode mode)
+{
+  return (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + mode
+			      - MIN_MODE_COMPLEX_FLOAT);
+}
+
+/* Internal function to return the built-in function id for the complex divide
+   operation for a given mode.  */
+
+static inline built_in_function
+complex_divide_builtin_code (machine_mode mode)
+{
+  return (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + mode
+			      - MIN_MODE_COMPLEX_FLOAT);
+}
 
 /* On 64-bit Linux and Freebsd systems, possibly switch the long double library
    function names from <foo>l to <foo>f128 if the default long double type is
@@ -28128,11 +28101,53 @@ rs6000_starting_frame_offset (void)
    only do this transformation if the __float128 type is enabled.  This
    prevents us from doing the transformation on older 32-bit ports that might
    have enabled using IEEE 128-bit floating point as the default long double
-   type.  */
+   type.
+
+   We also use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the
+   function names used for complex multiply and divide to the appropriate
+   names.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
+  /* Handle complex multiply/divide.  For IEEE 128-bit, use __mulkc3 or
+     __divkc3 and for IBM 128-bit use __multc3 and __divtc3.  */
+  if (TARGET_FLOAT128_TYPE
+      && TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_IS_UNDECLARED_BUILTIN (decl)
+      && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+    {
+      built_in_function id = DECL_FUNCTION_CODE (decl);
+      const char *newname = NULL;
+
+      if (id == complex_multiply_builtin_code (KCmode))
+	newname = "__mulkc3";
+
+      else if (id == complex_multiply_builtin_code (ICmode))
+	newname = "__multc3";
+
+      else if (id == complex_multiply_builtin_code (TCmode))
+	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
+
+      else if (id == complex_divide_builtin_code (KCmode))
+	newname = "__divkc3";
+
+      else if (id == complex_divide_builtin_code (ICmode))
+	newname = "__divtc3";
+
+      else if (id == complex_divide_builtin_code (TCmode))
+	newname = (TARGET_IEEEQUAD) ? "__divkc3" : "__divtc3";
+
+      if (newname)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "Map complex mul/div => %s\n", newname);
+
+	  return get_identifier (newname);
+	}
+    }
+
+  /* Map long double built-in functions if long double is IEEE 128-bit.  */
   if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
       && TREE_CODE (decl) == FUNCTION_DECL
       && DECL_IS_UNDECLARED_BUILTIN (decl)
diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-1.c b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
new file mode 100644
index 00000000000..1cc6b1be904
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Check that complex divide generates the right call for __ibm128 when long
+   double is IEEE 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
+
+void
+divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q / *r;
+}
+
+/* { dg-final { scan-assembler "bl __divtc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-2.c b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
new file mode 100644
index 00000000000..8ff342e0116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Check that complex divide generates the right call for __ibm128 when long
+   double is IBM 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
+
+void
+divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q / *r;
+}
+
+/* { dg-final { scan-assembler "bl __divtc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-1.c b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
new file mode 100644
index 00000000000..4cd773c4b06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Check that complex multiply generates the right call for __ibm128 when long
+   double is IEEE 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
+
+void
+multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q * *r;
+}
+
+/* { dg-final { scan-assembler "bl __multc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-2.c b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
new file mode 100644
index 00000000000..36fe8bc3061
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Check that complex multiply generates the right call for __ibm128 when long
+   double is IBM 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
+
+void
+multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q * *r;
+}
+
+/* { dg-final { scan-assembler "bl __multc3" } } */
-- 
2.38.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/3, V3] PR 107299, Make __float128 use the _Float128 type
  2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
  2022-12-14 20:32 ` [PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide Michael Meissner
@ 2022-12-14 20:34 ` Michael Meissner
  2022-12-14 20:35 ` [PATCH 3/3, V3] PR 107299, Update float 128-bit conversion Michael Meissner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2022-12-14 20:34 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This patch fixes the issue that GCC cannot build when the default long double
is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
during the evrp pass.  Ultimately it is failing because the code declared the
internal type for one IEEE 128-bit floating point type, and NaN functions use a
different IEEE 128-bit floating point type.

Gimple-range-fold uses the internal types, but there are similar problems when
the code is converted to RTL and the two different modes (KFmode, TFmode) are
used.

	typedef float TFtype __attribute__((mode (TF)));
	typedef __complex float TCtype __attribute__((mode (TC)));

	TCtype
	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
	{
	  TFtype ac, bd, ad, bc, x, y;
	  TCtype res;

	  ac = a * c;
	  bd = b * d;
	  ad = a * d;
	  bc = b * c;

	  x = ac - bd;
	  y = ad + bc;

	  if (__builtin_isnan (x) && __builtin_isnan (y))
	    {
	      _Bool recalc = 0;
	      if (__builtin_isinf (a) || __builtin_isinf (b))
		{

		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (__builtin_isinf (c) || __builtin_isinf (d))
		{

		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  recalc = 1;
		}
	      if (!recalc
		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
		{

		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (recalc)
		{
		  x = __builtin_inff128 () * (a * c - b * d);
		  y = __builtin_inff128 () * (a * d + b * c);
		}
	    }

	  __real__ res = x;
	  __imag__ res = y;
	  return res;
	}

Currently GCC uses the long double type node for __float128 if long double is
IEEE 128-bit.  It did not use the node for _Float128.

Originally this was noticed if you call the nansq function to make a signaling
NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
different from __float128, the machine independent code converts signaling NaNs
to quiet NaNs if the types are not compatible.  The following tests used to
fail when run on a system where long double is IEEE 128-bit:

	gcc.dg/torture/float128-nan.c
	gcc.target/powerpc/nan128-1.c

This patch makes both __float128 and _Float128 use the same type node.

One side effect of not using the long double type node for __float128 is that we
must only use KFmode for _Float128/__float128.  The libstdc++ library won't
build if we use TFmode for _Float128 and __float128 when long double is IEEE
128-bit.

Another minor side effect is that the f128 round to odd fused multiply-add
function will not merge negatition with the FMA operation when the type is long
double.  If the type is __float128 or _Float128, then it will continue to do the
optimization.  The round to odd functions are defined in terms of __float128
arguments.  For example:

	long double
	do_fms (long double a, long double b, long double c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate (assuming -mabi=ieeelongdouble):

	xsnegqp 4,4
	xsmaddqpo 4,2,3
	xxlor 34,36,36

while:

	__float128
	do_fms (__float128 a, __float128 b, __float128 c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate:

	xsmsubqpo 4,2,3
	xxlor 34,36,36

Assuming this patch goes in, we can open a bug about the above optimizations not
working.  However, given that the functions are explicitly documented to use
__float128 types, and the code in the test is using long double, I don't think
it is a high priority issue.  The user should use the documented types.

I did experiment to do the support, and to to it properly, you need a bunch of
insns used by the combiner to deal with combining the conversion from TFmode to
KFmode along with the optimization in order to eventually combine the multiple
and add/subtract operations into a separate FMA.

I tested all 3 patchs for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-12-14   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
	_Float128 type for __float128.
	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
	built-in passes or returns TFmode.  If the predicate failed because the
	modes were different, use convert_move to load up the value instead of
	copy_to_mode_reg.
	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
	modes (KFmode or KCmode), even if long double is IEEE 128-bit.
	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
	time if we support IEEE 128-bit floating point.
	(rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/float128-hw12.c: New test.
	* gcc.target/powerpc/float128-hw13.c: Likewise.
	* gcc.target/powerpc/float128-hw4.c: Update insns.
---
 gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
 gcc/config/rs6000/rs6000.cc                   |  31 ++-
 .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw4.c         |  10 +-
 5 files changed, 431 insertions(+), 121 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 90ab39dc258..e5298f45363 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -730,25 +730,28 @@ rs6000_init_builtins (void)
 
   if (TARGET_FLOAT128_TYPE)
     {
-      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
-	ieee128_float_type_node = long_double_type_node;
-      else
+      /* In the past we used long_double_type_node when long double was IEEE
+	 128-bit.  However, this means that the _Float128 type
+	 (i.e. float128_type_node) is a different type from __float128
+	 (i.e. ieee128_float_type_nonde).  This leads to some corner cases,
+	 such as processing signaling NaNs with the nansf128 built-in function
+	 (which returns a _Float128 value) and assign it to a long double or
+	 __float128 value.  The two explicit IEEE 128-bit types should always
+	 use the same internal type.
+
+	 For C we only need to register the __ieee128 name for it.  For C++, we
+	 create a distinct type which will mangle differently (u9__ieee128)
+	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
+      if (float128t_type_node == NULL_TREE)
 	{
-	  /* For C we only need to register the __ieee128 name for
-	     it.  For C++, we create a distinct type which will mangle
-	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
-	     backwards compatibly.  */
-	  if (float128t_type_node == NULL_TREE)
-	    {
-	      float128t_type_node = make_node (REAL_TYPE);
-	      TYPE_PRECISION (float128t_type_node)
-		= TYPE_PRECISION (float128_type_node);
-	      layout_type (float128t_type_node);
-	      SET_TYPE_MODE (float128t_type_node,
-			     TYPE_MODE (float128_type_node));
-	    }
-	  ieee128_float_type_node = float128t_type_node;
+	  float128t_type_node = make_node (REAL_TYPE);
+	  TYPE_PRECISION (float128t_type_node)
+	    = TYPE_PRECISION (float128_type_node);
+	  layout_type (float128t_type_node);
+	  SET_TYPE_MODE (float128t_type_node,
+			 TYPE_MODE (float128_type_node));
 	}
+      ieee128_float_type_node = float128t_type_node;
       t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
       lang_hooks.types.register_builtin_type (ieee128_float_type_node,
 					      "__ieee128");
@@ -3265,13 +3268,13 @@ htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
+   (and in mode RETURN_MODE if that's convenient).
    SUBTARGET may be used as the target for computing one of EXP's operands.
    IGNORE is nonzero if the value is to be ignored.
    Use the new builtin infrastructure.  */
 rtx
 rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
-		       machine_mode /* mode */, int ignore)
+		       machine_mode return_mode, int ignore)
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_gen_builtins fcode
@@ -3287,78 +3290,99 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
   size_t uns_fcode = (size_t)fcode;
   enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
 
-  /* TODO: The following commentary and code is inherited from the original
-     builtin processing code.  The commentary is a bit confusing, with the
-     intent being that KFmode is always IEEE-128, IFmode is always IBM
-     double-double, and TFmode is the current long double.  The code is
-     confusing in that it converts from KFmode to TFmode pattern names,
-     when the other direction is more intuitive.  Try to address this.  */
-
-  /* We have two different modes (KFmode, TFmode) that are the IEEE
-     128-bit floating point type, depending on whether long double is the
-     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
-     It is simpler if we only define one variant of the built-in function,
-     and switch the code when defining it, rather than defining two built-
-     ins and using the overload table in rs6000-c.cc to switch between the
-     two.  If we don't have the proper assembler, don't do this switch
-     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
-  if (FLOAT128_IEEE_P (TFmode))
-    switch (icode)
-      {
-      case CODE_FOR_sqrtkf2_odd:
-	icode = CODE_FOR_sqrttf2_odd;
-	break;
-      case CODE_FOR_trunckfdf2_odd:
-	icode = CODE_FOR_trunctfdf2_odd;
-	break;
-      case CODE_FOR_addkf3_odd:
-	icode = CODE_FOR_addtf3_odd;
-	break;
-      case CODE_FOR_subkf3_odd:
-	icode = CODE_FOR_subtf3_odd;
-	break;
-      case CODE_FOR_mulkf3_odd:
-	icode = CODE_FOR_multf3_odd;
-	break;
-      case CODE_FOR_divkf3_odd:
-	icode = CODE_FOR_divtf3_odd;
-	break;
-      case CODE_FOR_fmakf4_odd:
-	icode = CODE_FOR_fmatf4_odd;
-	break;
-      case CODE_FOR_xsxexpqp_kf:
-	icode = CODE_FOR_xsxexpqp_tf;
-	break;
-      case CODE_FOR_xsxsigqp_kf:
-	icode = CODE_FOR_xsxsigqp_tf;
-	break;
-      case CODE_FOR_xststdcnegqp_kf:
-	icode = CODE_FOR_xststdcnegqp_tf;
-	break;
-      case CODE_FOR_xsiexpqp_kf:
-	icode = CODE_FOR_xsiexpqp_tf;
-	break;
-      case CODE_FOR_xsiexpqpf_kf:
-	icode = CODE_FOR_xsiexpqpf_tf;
-	break;
-      case CODE_FOR_xststdcqp_kf:
-	icode = CODE_FOR_xststdcqp_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_eq_kf:
-	icode = CODE_FOR_xscmpexpqp_eq_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_lt_kf:
-	icode = CODE_FOR_xscmpexpqp_lt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_gt_kf:
-	icode = CODE_FOR_xscmpexpqp_gt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_unordered_kf:
-	icode = CODE_FOR_xscmpexpqp_unordered_tf;
-	break;
-      default:
-	break;
-      }
+  /* For 128-bit long double, we may need both the KFmode built-in functions
+     and IFmode built-in functions to the equivalent TFmode built-in function,
+     if either a TFmode result is expected or any of the arguments use
+     TFmode.  */
+  if (TARGET_LONG_DOUBLE_128)
+    {
+      bool uses_tf_mode = return_mode == TFmode;
+      if (!uses_tf_mode)
+	{
+	  call_expr_arg_iterator iter;
+	  tree arg;
+	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+	    {
+	      if (arg != error_mark_node
+		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
+		{
+		  uses_tf_mode = true;
+		  break;
+		}
+	    }
+	}
+
+      /* Convert KFmode built-in functions to TFmode when long double is IEEE
+	 128-bit.  */
+      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
+	switch (icode)
+	  {
+	  case CODE_FOR_sqrtkf2_odd:
+	    icode = CODE_FOR_sqrttf2_odd;
+	    break;
+	  case CODE_FOR_trunckfdf2_odd:
+	    icode = CODE_FOR_trunctfdf2_odd;
+	    break;
+	  case CODE_FOR_addkf3_odd:
+	    icode = CODE_FOR_addtf3_odd;
+	    break;
+	  case CODE_FOR_subkf3_odd:
+	    icode = CODE_FOR_subtf3_odd;
+	    break;
+	  case CODE_FOR_mulkf3_odd:
+	    icode = CODE_FOR_multf3_odd;
+	    break;
+	  case CODE_FOR_divkf3_odd:
+	    icode = CODE_FOR_divtf3_odd;
+	    break;
+	  case CODE_FOR_fmakf4_odd:
+	    icode = CODE_FOR_fmatf4_odd;
+	    break;
+	  case CODE_FOR_xsxexpqp_kf:
+	    icode = CODE_FOR_xsxexpqp_tf;
+	    break;
+	  case CODE_FOR_xsxsigqp_kf:
+	    icode = CODE_FOR_xsxsigqp_tf;
+	    break;
+	  case CODE_FOR_xststdcnegqp_kf:
+	    icode = CODE_FOR_xststdcnegqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqp_kf:
+	    icode = CODE_FOR_xsiexpqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqpf_kf:
+	    icode = CODE_FOR_xsiexpqpf_tf;
+	    break;
+	  case CODE_FOR_xststdcqp_kf:
+	    icode = CODE_FOR_xststdcqp_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_eq_kf:
+	    icode = CODE_FOR_xscmpexpqp_eq_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_lt_kf:
+	    icode = CODE_FOR_xscmpexpqp_lt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_gt_kf:
+	    icode = CODE_FOR_xscmpexpqp_gt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_unordered_kf:
+	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
+	    break;
+	  default:
+	    break;
+	  }
+
+      /* Convert IFmode built-in functions to TFmode when long double is IBM
+	 128-bit.  */
+      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
+	{
+	  if (icode == CODE_FOR_packif)
+	    icode = CODE_FOR_packtf;
+
+	  else if (icode == CODE_FOR_unpackif)
+	    icode = CODE_FOR_unpacktf;
+	}
+    }
 
   /* In case of "#pragma target" changes, we initialize all builtins
      but check for actual availability now, during expand time.  For
@@ -3481,18 +3505,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
     {
-      if (fcode == RS6000_BIF_PACK_IF)
-	{
-	  icode = CODE_FOR_packtf;
-	  fcode = RS6000_BIF_PACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
-      else if (fcode == RS6000_BIF_UNPACK_IF)
-	{
-	  icode = CODE_FOR_unpacktf;
-	  fcode = RS6000_BIF_UNPACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
     }
 
   /* TRUE iff the built-in function returns void.  */
@@ -3647,7 +3659,24 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   for (int i = 0; i < nargs; i++)
     if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
-      op[i] = copy_to_mode_reg (mode[i+k], op[i]);
+      {
+	/* If the predicate failed because the modes are different, do a
+	   convert instead of copy_to_mode_reg, since copy_to_mode_reg will
+	   abort in this case.  The modes might be different if we have two
+	   different 128-bit floating point modes (i.e. KFmode/TFmode if long
+	   double is IEEE 128-bit and IFmode/TFmode if long double is IBM
+	   128-bit).  */
+	machine_mode mode_insn = mode[i+k];
+	machine_mode mode_op = GET_MODE (op[i]);
+	if (mode_insn != mode_op && mode_op != VOIDmode)
+	  {
+	    rtx tmp = gen_reg_rtx (mode_insn);
+	    convert_move (tmp, op[i], 0);
+	    op[i] = tmp;
+	  }
+	else
+	  op[i] = copy_to_mode_reg (mode_insn, op[i]);
+      }
 
   rtx pat;
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6d08f6ed1fb..604f6a9ce33 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23819,15 +23819,23 @@ rs6000_eh_return_filter_mode (void)
   return TARGET_32BIT ? SImode : word_mode;
 }
 
-/* Target hook for translate_mode_attribute.  */
+/* Target hook for translate_mode_attribute.
+
+   When -mabi=ieeelongdouble is used, we want to translate either KFmode or
+   TFmode to KFmode.  This is because user code that wants to specify IEEE
+   128-bit types will use either TFmode or KFmode, and we really want to use
+   the _Float128 and __float128 types instead of long double.
+
+   Similarly when -mabi=ibmlongdouble is used, we want to map IFmode into
+   TFmode.  */
 static machine_mode
 rs6000_translate_mode_attribute (machine_mode mode)
 {
-  if ((FLOAT128_IEEE_P (mode)
-       && ieee128_float_type_node == long_double_type_node)
-      || (FLOAT128_IBM_P (mode)
-	  && ibm128_float_type_node == long_double_type_node))
+  if (FLOAT128_IBM_P (mode)
+      && ibm128_float_type_node == long_double_type_node)
     return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
+  else if (FLOAT128_IEEE_P (mode))
+    return COMPLEX_MODE_P (mode) ? E_KCmode : E_KFmode;
   return mode;
 }
 
@@ -23863,13 +23871,10 @@ rs6000_libgcc_floating_mode_supported_p (scalar_float_mode mode)
     case E_TFmode:
       return true;
 
-      /* We only return true for KFmode if IEEE 128-bit types are supported, and
-	 if long double does not use the IEEE 128-bit format.  If long double
-	 uses the IEEE 128-bit format, it will use TFmode and not KFmode.
-	 Because the code will not use KFmode in that case, there will be aborts
-	 because it can't find KFmode in the Floatn types.  */
+      /* We only return true for KFmode if IEEE 128-bit types are
+	 supported.  */
     case E_KFmode:
-      return TARGET_FLOAT128_TYPE && !TARGET_IEEEQUAD;
+      return TARGET_FLOAT128_TYPE;
 
     default:
       return false;
@@ -23903,7 +23908,7 @@ rs6000_floatn_mode (int n, bool extended)
 
 	case 64:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
@@ -23927,7 +23932,7 @@ rs6000_floatn_mode (int n, bool extended)
 
 	case 128:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw12.c b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
new file mode 100644
index 00000000000..d08b4cbc883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
@@ -0,0 +1,137 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with _Float128.  This is the same as float128-hw4.c, except the type
+   _Float128 is used, and the IEEE 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw13.c b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
new file mode 100644
index 00000000000..51a3cd4802b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
@@ -0,0 +1,137 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with __float128.  This is the same as float128-hw4.c, except the type
+   __float128 is used, and the IBM 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE __float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
index fc149169bc6..3f6717825b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
@@ -118,6 +118,11 @@ nfms_odd (TYPE a, TYPE b, TYPE c)
   return -__builtin_fmaf128_round_to_odd (a, b, -c);
 }
 
+/* In using long double instead of _Float128, we might not be able to optimize
+   __builtin_fmaf128_round_to_odd (a, b, -c) into using xsmsubqpo instead of
+   xsnegqp and xsmaddqpo due to conversions between TFmode and KFmode.  So just
+   recognize that the did the FMA optimization.  */
+
 /* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
@@ -126,11 +131,8 @@ nfms_odd (TYPE a, TYPE b, TYPE c)
 /* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
-/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsn?m(add|sub)qpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
-/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
 /* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
 /* { dg-final { scan-assembler-not {\mbl\M}         } } */
-- 
2.38.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3/3, V3] PR 107299, Update float 128-bit conversion
  2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
  2022-12-14 20:32 ` [PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide Michael Meissner
  2022-12-14 20:34 ` [PATCH 2/3, V3] PR 107299, Make __float128 use the _Float128 type Michael Meissner
@ 2022-12-14 20:35 ` Michael Meissner
  2023-02-27 23:26 ` Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
  2023-03-02 20:26 ` Segher Boessenkool
  4 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2022-12-14 20:35 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This patch fixes two tests that are still failing when long double is IEEE
128-bit after the previous 2 patches for PR target/107299 have been applied.
The tests are:

	gcc.target/powerpc/convert-fp-128.c
	gcc.target/powerpc/pr85657-3.c

This patch is a rewrite of the patch submitted on August 18th:

| https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599988.html

This patch reworks the conversions between 128-bit binary floating point types.
Previously, we would call rs6000_expand_float128_convert to do all conversions.
Now, we only define the conversions between the same representation that turn
into a NOP.  The appropriate extend or truncate insn is generated, and after
register allocation, it is converted to a move.

This patch also fixes two places where we want to override the external name
for the conversion function, and the wrong optab was used.  Previously,
rs6000_expand_float128_convert would handle the move or generate the call as
needed.  Now, it lets the machine independent code generate the call.  But if
we use the machine independent code to generate the call, we need to update the
name for two optabs where a truncate would be used in terms of converting
between the modes.  This patch updates those two optabs.

I tested this patch on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

In the past I have also tested this exact patch on the following systems:

    1)	LE Power10 using --with-cpu=power9  --with-long-double-format=ibm
    2)	LE Power10 using --with-cpu=power8  --with-long-double-format=ibm
    3)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm

There were no regressions in the bootstrap process or running the tests (after
applying all 3 patches for PR target/107299).  Can I check this patch into the
trunk?

2022-12-14   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000.cc (init_float128_ieee): Use the correct
	float_extend or float_truncate optab based on how the machine converts
	between IEEE 128-bit and IBM 128-bit.
	* config/rs6000/rs6000.md (IFKF): Delete.
	(IFKF_reg): Delete.
	(extendiftf2): Rewrite to be a move if IFmode and TFmode are both IBM
	128-bit.  Do not run if TFmode is IEEE 128-bit.
	(extendifkf2): Delete.
	(extendtfkf2): Delete.
	(extendtfif2): Delete.
	(trunciftf2): Delete.
	(truncifkf2): Delete.
	(trunckftf2): Delete.
	(extendkftf2): Implement conversion of IEEE 128-bit types as a move.
	(trunctfif2): Delete.
	(trunctfkf2): Implement conversion of IEEE 128-bit types as a move.
	(extend<mode>tf2_internal): Delete.
	(extendtf<mode>2_internal): Delete.
---
 gcc/config/rs6000/rs6000.cc |   4 +-
 gcc/config/rs6000/rs6000.md | 177 ++++++++++--------------------------
 2 files changed, 50 insertions(+), 131 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 604f6a9ce33..0a20bfc8421 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11134,11 +11134,11 @@ init_float128_ieee (machine_mode mode)
       set_conv_libfunc (trunc_optab, SFmode, mode, "__trunckfsf2");
       set_conv_libfunc (trunc_optab, DFmode, mode, "__trunckfdf2");
 
-      set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
+      set_conv_libfunc (trunc_optab, mode, IFmode, "__trunctfkf2");
       if (mode != TFmode && FLOAT128_IBM_P (TFmode))
 	set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");
 
-      set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
+      set_conv_libfunc (sext_optab, IFmode, mode, "__extendkftf2");
       if (mode != TFmode && FLOAT128_IBM_P (TFmode))
 	set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6011f5bf76a..799af3c3ebe 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -543,12 +543,6 @@ (define_mode_iterator FMOVE128_GPR [TI
 ; Iterator for 128-bit VSX types for pack/unpack
 (define_mode_iterator FMOVE128_VSX [V1TI KF])
 
-; Iterators for converting to/from TFmode
-(define_mode_iterator IFKF [IF KF])
-
-; Constraints for moving IF/KFmode.
-(define_mode_attr IFKF_reg [(IF "d") (KF "wa")])
-
 ; Whether a floating point move is ok, don't allow SD without hardware FP
 (define_mode_attr fmove_ok [(SF "")
 			    (DF "")
@@ -9096,106 +9090,65 @@ (define_insn "*ieee_128bit_vsx_nabs<mode>2_internal"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "veclogical")])
 
-;; Float128 conversion functions.  These expand to library function calls.
-;; We use expand to convert from IBM double double to IEEE 128-bit
-;; and trunc for the opposite.
-(define_expand "extendiftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_extend:TF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendifkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_extend:KF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendtfkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_extend:KF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendtfif2"
-  [(set (match_operand:IF 0 "gpc_reg_operand")
-	(float_extend:IF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "trunciftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_truncate:TF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "truncifkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_truncate:KF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "trunckftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_truncate:TF (match_operand:KF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
+;; Float128 conversion functions.  We only define the 'conversions' between two
+;; formats that use the same representation.  We call the library function to
+;; convert between IEEE 128-bit and IBM 128-bit.  We can't do these moves by
+;; using a SUBREG before register allocation.  We set up the moves to prefer
+;; the output register being the same as the input register, which would enable
+;; the move to be deleted completely.
+(define_insn_and_split "extendkftf2"
+  [(set (match_operand:TF 0 "gpc_reg_operand" "=wa,wa")
+	(float_extend:TF (match_operand:KF 1 "gpc_reg_operand" "0,wa")))]
+  "TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (TFmode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
+  operands[2] = gen_lowpart (TFmode, operands[1]);
+}
+  [(set_attr "type" "veclogical")])
 
-(define_expand "trunctfif2"
-  [(set (match_operand:IF 0 "gpc_reg_operand")
-	(float_truncate:IF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
+(define_insn_and_split "trunctfkf2"
+  [(set (match_operand:KF 0 "gpc_reg_operand" "=wa,wa")
+	(float_truncate:KF (match_operand:TF 1 "gpc_reg_operand" "0,wa")))]
+  "TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (TFmode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
+  operands[2] = gen_lowpart (KFmode, operands[1]);
+}
+  [(set_attr "type" "veclogical")])
 
-(define_insn_and_split "*extend<mode>tf2_internal"
-  [(set (match_operand:TF 0 "gpc_reg_operand" "=<IFKF_reg>")
-	(float_extend:TF
-	 (match_operand:IFKF 1 "gpc_reg_operand" "<IFKF_reg>")))]
-   "TARGET_FLOAT128_TYPE
-    && FLOAT128_IBM_P (TFmode) == FLOAT128_IBM_P (<MODE>mode)"
+(define_insn_and_split "extendtfif2"
+  [(set (match_operand:IF 0 "gpc_reg_operand" "=wa,wa,r,r")
+	(float_extend:IF (match_operand:TF 1 "gpc_reg_operand" "0,wa,0,r")))]
+  "TARGET_HARD_FLOAT && FLOAT128_IBM_P (TFmode)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (match_dup 2))]
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  operands[2] = gen_rtx_REG (TFmode, REGNO (operands[1]));
-})
+  operands[2] = gen_lowpart (IFmode, operands[1]);
+}
+  [(set_attr "num_insns" "2")
+   (set_attr "length" "8")])
 
-(define_insn_and_split "*extendtf<mode>2_internal"
-  [(set (match_operand:IFKF 0 "gpc_reg_operand" "=<IFKF_reg>")
-	(float_extend:IFKF
-	 (match_operand:TF 1 "gpc_reg_operand" "<IFKF_reg>")))]
-   "TARGET_FLOAT128_TYPE
-    && FLOAT128_IBM_P (TFmode) == FLOAT128_IBM_P (<MODE>mode)"
+(define_insn_and_split "extendiftf2"
+  [(set (match_operand:TF 0 "gpc_reg_operand" "=wa,wa,r,r")
+	(float_extend:TF (match_operand:IF 1 "gpc_reg_operand" "0,wa,0,r")))]
+  "TARGET_HARD_FLOAT && FLOAT128_IBM_P (TFmode)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (match_dup 2))]
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  operands[2] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
-})
+  operands[2] = gen_lowpart (TFmode, operands[1]);
+}
+  [(set_attr "num_insns" "2")
+   (set_attr "length" "8")])
 
 \f
 ;; Reload helper functions used by rs6000_secondary_reload.  The patterns all
@@ -14919,40 +14872,6 @@ (define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
   [(set_attr "type" "vecfloat")
    (set_attr "size" "128")])
 
-;; Conversion between KFmode and TFmode if TFmode is ieee 128-bit floating
-;; point is a simple copy.
-(define_insn_and_split "extendkftf2"
-  [(set (match_operand:TF 0 "vsx_register_operand" "=wa,?wa")
-	(float_extend:TF (match_operand:KF 1 "vsx_register_operand" "0,wa")))]
-  "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD"
-  "@
-   #
-   xxlor %x0,%x1,%x1"
-  "&& reload_completed  && REGNO (operands[0]) == REGNO (operands[1])"
-  [(const_int 0)]
-{
-  emit_note (NOTE_INSN_DELETED);
-  DONE;
-}
-  [(set_attr "type" "*,veclogical")
-   (set_attr "length" "0,4")])
-
-(define_insn_and_split "trunctfkf2"
-  [(set (match_operand:KF 0 "vsx_register_operand" "=wa,?wa")
-	(float_extend:KF (match_operand:TF 1 "vsx_register_operand" "0,wa")))]
-  "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD"
-  "@
-   #
-   xxlor %x0,%x1,%x1"
-  "&& reload_completed  && REGNO (operands[0]) == REGNO (operands[1])"
-  [(const_int 0)]
-{
-  emit_note (NOTE_INSN_DELETED);
-  DONE;
-}
-  [(set_attr "type" "*,veclogical")
-   (set_attr "length" "0,4")])
-
 (define_insn "trunc<mode>df2_hw"
   [(set (match_operand:DF 0 "altivec_register_operand" "=v")
 	(float_truncate:DF
-- 
2.38.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit
  2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
                   ` (2 preceding siblings ...)
  2022-12-14 20:35 ` [PATCH 3/3, V3] PR 107299, Update float 128-bit conversion Michael Meissner
@ 2023-02-27 23:26 ` Michael Meissner
  2023-03-02 20:26 ` Segher Boessenkool
  4 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2023-02-27 23:26 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This is the most important patch to look at:

| Date: Wed, 14 Dec 2022 15:29:02 -0500
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit
| Message-ID: <Y5oyDgX3SgN5W8Pl@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit
  2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
                   ` (3 preceding siblings ...)
  2023-02-27 23:26 ` Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
@ 2023-03-02 20:26 ` Segher Boessenkool
  4 siblings, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2023-03-02 20:26 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt

Hi!

On Wed, Dec 14, 2022 at 03:29:02PM -0500, Michael Meissner wrote:
> These 3 patches fix the problems with building GCC on PowerPC systems when long
> double is configured to use the IEEE 128-bit format.

If you are strictly trying to fix a bootstrap problem, you should say
so: it should be very prominent in the proposed commit message, and
that message should be not much more than that.  And the patch should do
nothing else.  *Every* patch should do just one thing, but for fixes it
is even more important (how will we ever get into any stable state if we
always do random stuff?)

> The basic issue is internally within GCC there are several types for 128-bit
> floating point.  The types are:

There also are three *modes*, but there should be only IFmode and
KFmode, and TFmode should be just a #define that resolves to either, not
a separate mode.  This simplifies things a lot.  It should have always
been that way, as we talked about way back when already.  But you didn't
want to implement things that way.  This is on my plate now (for GCC 14).

>     3)  The type for _Float128.  This type is always IEEE 128-bit if it exists.

And if there is no IEEE QP type?  What should _Float128 be then?

Largely academic, because there always *should* be a QP type (and mode).
This is a long-standing shortcoming as well.  I will finally fix that
myself for GCC 14 as well, simplifying many things.

>         Like __ibm128, it uses the long double type if
>         long double is IEEE 128-bit,

Which is completely upside down, of course.  The basic types should be
basic and always exist.  Things like long double can use some
indirection and/or copy stuff over.

Anything else is just a maze of twisty little passages.  A fun game if
you like that sort of thing, maybe, but instead of solving problems it
causes more :-(

> After these patches, there are 3 specific tests and 1 set of tests that fail
> when using IEEE 128-bit long double:
> 
>     1)  fp128_conversions.c: I haven't looked at yet;

That needs to be looked at before these patches can be approved.

>     2)  pr105334.c: This is a bug that __ibm128 doesn't work if the default
>         long double is IEEE 128-bit and you use the options: -mlong-double-128
>         -msoft-float (i.e. no -mabi=ibmlongdouble).  I believe I have patches
>         for this floating around.

Ditto.

>     3)  The g++.dg/cpp23/ext-floating1.C test is failing.  I believe we need to
>         dig in to fix PowerPC specific ISO C/C++ 2x _Float128 support.  I have
>         looked at it yet.

Ditto.

>     4)  All/some of the G++ modules tests fail.  This is PR 98645, and it is
>         assigned to Nathan Sidwell.

And this one too.

Any new failures need analysis.  Always.  This is why we have regression
tests at all!

Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-02 20:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-14 20:29 [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
2022-12-14 20:32 ` [PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide Michael Meissner
2022-12-14 20:34 ` [PATCH 2/3, V3] PR 107299, Make __float128 use the _Float128 type Michael Meissner
2022-12-14 20:35 ` [PATCH 3/3, V3] PR 107299, Update float 128-bit conversion Michael Meissner
2023-02-27 23:26 ` Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit Michael Meissner
2023-03-02 20:26 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).