public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit)
@ 2022-11-02  2:39 Michael Meissner
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
                   ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-02  2:39 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

These 3 patches fix the problems with building GCC on PowerPC systems when long
double is configured to use the IEEE 128-bit format.

There are 3 patches in this patch set.  The first two patches are required to
fix the basic problem.  The third patch fixes some issue that were noticed
along the way.

The basic issue is internally within GCC there are several types for 128-bit
floating point.  The types are:

    1)	The long double type (TFmode or possibly DFmode).  In the normal case,
	long double is 128-bits (TFmode) and depending on the configuration
	switches and the switches passed by the user at compilation time, long
	double is either the 128-bit IBM double-double type or IEEE 128-bit.

    2)	The type for __ibm128.  If long double is IBM 128-bit double-double,
	internally within the compiler, this type is the same as the long
	double type.  If long double is either IEEE 128-bit or is 64-bit, then
	this type is a separate type.

    3)	The type for _Float128.  This type is always IEEE 128-bit if it exists.
	While it is a separate internal type, currently if long double is IEEE
	128-bit, this type uses TFmode once it gets to RTL, but within Gimple
	it is a separate type.  If long double is not IEEE 128-bit, then this
	type uses KFmode.  All of the f128 math functions defined by the
	compiler use this type.  In the past, _Float128 was a C extended type,
	but now it is a part of the C/C++ 2x standards.

    4)	The type for __float128.  The history is I implemented __float128
	first, and several releases later, we added _Float128 as a standard C
	type.  Unfortunately, I didn't think things through enough when
	_Float128 came out.  Like __ibm128, it uses the long double type if
	long double is IEEE 128-bit, and now it uses the _Float128 type if long
	double is not IEEE 128-bit.  IMHO, this is the major problem.  The two
	IEEE 128-bit types should use the same type internally (or at least one
	should be a qualified type of the other).  Before we started adding
	more support for _Float128, it mostly works, but now it doesn't with
	more optimizations being done.

    5)	The error occurs in building _mulkc3 in libgcc, when the TFmode type in
	the code is defined to use attribute((mode(TF))), but the functions
	that are called all have _Float128 arguments.  These are separate
	types, and ultimately one of the consistancy checks fails because they
	are different types.

There are 3 patches in this set:

    1)	The first patch rewrites how the complex 128-bit multiply and divide
	functions are done in the compiler.  In the old scheme, essentially
	there were only two types ever being used, the long double type, and
	the not long double type.  The original code would make the names
	called of these functions to be __multc3/__divtc3 or
	__mulkc3/__divkc3.  This worked because there were only two types.
	With straightening out the types, so __float128/_Float128 is never the
	long double type, there are potentially 3-4 types.  However, the C
	front end and the middle end code will not let use create two built-in
	functions that have the same name.

	So I ripped out this code, and I hopefully replaced it with cleaner
	code that is in patch #1.  This patch needs to be in the compiler
	before the second patch can be installed.

    2)	The second patch fixes the problem of __float128 and _Float128 not
	being the same if long double is IEEE 128-bit.  After this patch, both
	_Float128 and __float128 types will always use the KFmode type.  The
	stdc++ library will not build if we use TFmode for these types due to
	the other changes.

	There is a minor codegen issue that if you explicitly use long double
	and call the F128 FMA (fused multiply-add) round to odd functions that
	are defined to use __float128/_Float128 arguments.  While we might be
	able to optimize these later, I don't think it is important to optimize
	the use of long double instead of __float128/_Float128.  Note, if you
	use the proper __float128/_Float128 types instead of long double, the
	code does the optimization.

	By doing this change, it also fixes two tests that have been broken on
	IEEE 128-bit long double systems (float128-cmp2-runnable.c and
	nan128-1.c).  These two tests use __float128 variables and call nansq
	to create a signaling NaN.  Nansq is defined to be __builtin_nansf128,
	which returns a _Float128 Nan.  However, since in the current
	implementation before these patches, __float128 is a different type
	than _Float128 when long double is IEEE 128-bit, the machine
	independent code converts the signaling NaN into a non-signaling NaN.
	Since after these patches they are the same internal type, the
	signaling NaN is preserved.

    3)	This patch fixes two tests that were still failing after patches #1 and
	#2 were applied (convert-fp-128.c and pr85657-3.c).  It fixes the
	conversions between the 128-bit floating point types.  In the past, we
	would always call rs6000_expand_float128_convert to do all
	conversions.  After this patch, the conversions between different types
	that have the same representation will be done in-line and not call
	rs6000_expand_float128_convert.

	In addition, in the past we missed some conversions, and the compiler
	would generate an external call, even though the types might have the
	same representation.

After these patches, there are 3 specific tests and 1 set of tests that fail
when using IEEE 128-bit long double:

    1)	fp128_conversions.c: I haven't looked at yet;

    2)	pr105334.c: This is a bug that __ibm128 doesn't work if the default
	long double is IEEE 128-bit and you use the options: -mlong-double-128
	-msoft-float (i.e. no -mabi=ibmlongdouble).  I believe I have patches
	for this floating around.

    3)	The g++.dg/cpp23/ext-floating1.C test is failing.  I believe we need to
	dig in to fix PowerPC specific ISO C/C++ 2x _Float128 support.  I have
	looked at it yet.

    4)	All/some of the G++ modules tests fail.  This is PR 98645, and it is
	assigned to Nathan Sidwell.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:39 Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Michael Meissner
@ 2022-11-02  2:40 ` Michael Meissner
  2022-11-07 15:41   ` Ping: " Michael Meissner
                     ` (4 more replies)
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
                   ` (2 subsequent siblings)
  3 siblings, 5 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-02  2:40 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

This function reworks how the complex multiply and divide built-in functions are
done.  Previously we created built-in declarations for doing long double complex
multiply and divide when long double is IEEE 128-bit.  The old code also did not
support __ibm128 complex multiply and divide if long double is IEEE 128-bit.

In terms of history, I wrote the original code just as I was starting to test
GCC on systems where IEEE 128-bit long double was the default.  At the time, we
had not yet started mangling the built-in function names as a way to bridge
going from a system with 128-bit IBM long double to 128-bin IEEE long double.

The original code depends on there only being two 128-bit types invovled.  With
the next patch in this series, this assumption will no longer be true.  When
long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
explicit __float128/_Float128 type and one for long double).

The problem is we cannot create two separate built-in functions that resolve to
the same name.  This is a requirement of add_builtin_function and the C front
end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
only use 2 of them.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.

We need to disable using this mapping when we are building libgcc, specifically
when it is building the floating point 128-bit multiply and divide functions.
The flag that is used when libgcc is built (-fbuilding-libcc) is only available
in the C/C++ front ends.  We need to remember that we are building libgcc in the
rs6000-c.cc support to be able to use this later to decided whether to mangle
the decl assembler name or not.

When I wrote these patches, I discovered that __ibm128 complex multiply and
divide had originally not been supported if long double is IEEE 128-bit as it
would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
case.

I tested all 3 patchs for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-11-01   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Set
	building_libgcc.
	* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
	(init_float128_ieee): Delete code to switch complex multiply and divide
	for long double.
	(complex_multiply_builtin_code): New helper function.
	(complex_divide_builtin_code): Likewise.
	(rs6000_mangle_decl_assembler_name): Add support for mangling the name
	of complex 128-bit multiply and divide built-in functions.
	* config/rs6000/rs6000.opt (building_libgcc): New target variable.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/divic3-1.c: New test.
	* gcc.target/powerpc/divic3-2.c: Likewise.
	* gcc.target/powerpc/mulic3-1.c: Likewise.
	* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000-c.cc               |   8 ++
 gcc/config/rs6000/rs6000.cc                 | 110 +++++++++++---------
 gcc/config/rs6000/rs6000.opt                |   4 +
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 ++++
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 ++++
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
 7 files changed, 145 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 56609462629..5c2f3bcee9f 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -780,6 +780,14 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
       || DEFAULT_ABI == ABI_ELFv2
       || (DEFAULT_ABI == ABI_AIX && !rs6000_compat_align_parm))
     builtin_define ("__STRUCT_PARM_ALIGN__=16");
+
+  /* Store whether or not we are building libgcc.  This is needed to disable
+     generating the alternate names for 128-bit complex multiply and divide.
+     We need to disable generating __multc3, __divtc3, __mulkc3, and __divkc3
+     when we are building those functions in libgcc.  The variable
+     flag_building_libgcc is only available for the C family of front-ends.
+     We set this variable here to disable generating the alternate names.  */
+  building_libgcc = flag_building_libgcc;
 }
 
 \f
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a85d7630b41..cfb6227e27b 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11123,26 +11123,6 @@ init_float128_ibm (machine_mode mode)
     }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
-				      name, NULL_TREE);
-
-  set_builtin_decl (fncode, fndecl, true);
-
-  if (TARGET_DEBUG_BUILTIN)
-    fprintf (stderr, "create complex %s, fncode: %d\n", name, (int) fncode);
-
-  return;
-}
-
 /* Set up IEEE 128-bit floating point routines.  Use different names if the
    arguments can be passed in a vector register.  The historical PowerPC
    implementation of IEEE 128-bit floating point used _q_<op> for the names, so
@@ -11154,32 +11134,6 @@ init_float128_ieee (machine_mode mode)
 {
   if (FLOAT128_VECTOR_P (mode))
     {
-      static bool complex_muldiv_init_p = false;
-
-      /* Set up to call __mulkc3 and __divkc3 under -mabi=ieeelongdouble.  If
-	 we have clone or target attributes, this will be called a second
-	 time.  We want to create the built-in function only once.  */
-     if (mode == TFmode && TARGET_IEEEQUAD && !complex_muldiv_init_p)
-       {
-	 complex_muldiv_init_p = true;
-	 built_in_function fncode_mul =
-	   (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + TCmode
-				- MIN_MODE_COMPLEX_FLOAT);
-	 built_in_function fncode_div =
-	   (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + TCmode
-				- MIN_MODE_COMPLEX_FLOAT);
-
-	 tree fntype = build_function_type_list (complex_long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 long_double_type_node,
-						 NULL_TREE);
-
-	 create_complex_muldiv ("__mulkc3", fncode_mul, fntype);
-	 create_complex_muldiv ("__divkc3", fncode_div, fntype);
-       }
-
       set_optab_libfunc (add_optab, mode, "__addkf3");
       set_optab_libfunc (sub_optab, mode, "__subkf3");
       set_optab_libfunc (neg_optab, mode, "__negkf2");
@@ -28142,6 +28096,25 @@ rs6000_starting_frame_offset (void)
   return RS6000_STARTING_FRAME_OFFSET;
 }
 \f
+/* Internal function to return the built-in function id for the complex
+   multiply operation for a given mode.  */
+
+static inline built_in_function
+complex_multiply_builtin_code (machine_mode mode)
+{
+  return (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + mode
+			      - MIN_MODE_COMPLEX_FLOAT);
+}
+
+/* Internal function to return the built-in function id for the complex divide
+   operation for a given mode.  */
+
+static inline built_in_function
+complex_divide_builtin_code (machine_mode mode)
+{
+  return (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + mode
+			      - MIN_MODE_COMPLEX_FLOAT);
+}
 
 /* On 64-bit Linux and Freebsd systems, possibly switch the long double library
    function names from <foo>l to <foo>f128 if the default long double type is
@@ -28160,11 +28133,54 @@ rs6000_starting_frame_offset (void)
    only do this transformation if the __float128 type is enabled.  This
    prevents us from doing the transformation on older 32-bit ports that might
    have enabled using IEEE 128-bit floating point as the default long double
-   type.  */
+   type.
+
+   We also use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the
+   function names used for complex multiply and divide to the appropriate
+   names.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
+  /* Handle complex multiply/divide.  For IEEE 128-bit, use __mulkc3 or
+     __divkc3 and for IBM 128-bit use __multc3 and __divtc3.  */
+  if (TARGET_FLOAT128_TYPE
+      && !building_libgcc
+      && TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_IS_UNDECLARED_BUILTIN (decl)
+      && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+    {
+      built_in_function id = DECL_FUNCTION_CODE (decl);
+      const char *newname = NULL;
+
+      if (id == complex_multiply_builtin_code (KCmode))
+	newname = "__mulkc3";
+
+      else if (id == complex_multiply_builtin_code (ICmode))
+	newname = "__multc3";
+
+      else if (id == complex_multiply_builtin_code (TCmode))
+	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
+
+      else if (id == complex_divide_builtin_code (KCmode))
+	newname = "__divkc3";
+
+      else if (id == complex_divide_builtin_code (ICmode))
+	newname = "__divtc3";
+
+      else if (id == complex_divide_builtin_code (TCmode))
+	newname = (TARGET_IEEEQUAD) ? "__divkc3" : "__divtc3";
+
+      if (newname)
+	{
+	  if (TARGET_DEBUG_BUILTIN)
+	    fprintf (stderr, "Map complex mul/div => %s\n", newname);
+
+	  return get_identifier (newname);
+	}
+    }
+
+  /* Map long double built-in functions if long double is IEEE 128-bit.  */
   if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
       && TREE_CODE (decl) == FUNCTION_DECL
       && DECL_IS_UNDECLARED_BUILTIN (decl)
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index b63a5d443af..e2de03dda92 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -100,6 +100,10 @@ unsigned int rs6000_recip_control
 TargetVariable
 unsigned int rs6000_debug
 
+;; Whether we are building libgcc or not.
+TargetVariable
+bool building_libgcc = false
+
 ;; Whether to enable the -mfloat128 stuff without necessarily enabling the
 ;; __float128 keyword.
 TargetSave
diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-1.c b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
new file mode 100644
index 00000000000..1cc6b1be904
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Check that complex divide generates the right call for __ibm128 when long
+   double is IEEE 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
+
+void
+divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q / *r;
+}
+
+/* { dg-final { scan-assembler "bl __divtc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-2.c b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
new file mode 100644
index 00000000000..8ff342e0116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Check that complex divide generates the right call for __ibm128 when long
+   double is IBM 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
+
+void
+divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q / *r;
+}
+
+/* { dg-final { scan-assembler "bl __divtc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-1.c b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
new file mode 100644
index 00000000000..4cd773c4b06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Check that complex multiply generates the right call for __ibm128 when long
+   double is IEEE 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
+
+void
+multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q * *r;
+}
+
+/* { dg-final { scan-assembler "bl __multc3" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-2.c b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
new file mode 100644
index 00000000000..36fe8bc3061
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Check that complex multiply generates the right call for __ibm128 when long
+   double is IBM 128-bit floating point.  */
+
+typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
+
+void
+multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
+{
+  *p = *q * *r;
+}
+
+/* { dg-final { scan-assembler "bl __multc3" } } */
-- 
2.37.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:39 Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Michael Meissner
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
@ 2022-11-02  2:42 ` Michael Meissner
  2022-11-07 15:43   ` Ping: " Michael Meissner
                     ` (4 more replies)
  2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
  2022-12-06 14:56 ` Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Segher Boessenkool
  3 siblings, 5 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-02  2:42 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

This patch fixes the issue that GCC cannot build when the default long double
is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
during the evrp pass.  Ultimately it is failing because the code declared the
type to use TFmode but it used F128 functions (i.e. KFmode).

	typedef float TFtype __attribute__((mode (TF)));
	typedef __complex float TCtype __attribute__((mode (TC)));

	TCtype
	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
	{
	  TFtype ac, bd, ad, bc, x, y;
	  TCtype res;

	  ac = a * c;
	  bd = b * d;
	  ad = a * d;
	  bc = b * c;

	  x = ac - bd;
	  y = ad + bc;

	  if (__builtin_isnan (x) && __builtin_isnan (y))
	    {
	      _Bool recalc = 0;
	      if (__builtin_isinf (a) || __builtin_isinf (b))
		{

		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (__builtin_isinf (c) || __builtin_isinf (d))
		{

		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  recalc = 1;
		}
	      if (!recalc
		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
		{

		  if (__builtin_isnan (a))
		    a = __builtin_copysignf128 (0, a);
		  if (__builtin_isnan (b))
		    b = __builtin_copysignf128 (0, b);
		  if (__builtin_isnan (c))
		    c = __builtin_copysignf128 (0, c);
		  if (__builtin_isnan (d))
		    d = __builtin_copysignf128 (0, d);
		  recalc = 1;
		}
	      if (recalc)
		{
		  x = __builtin_inff128 () * (a * c - b * d);
		  y = __builtin_inff128 () * (a * d + b * c);
		}
	    }

	  __real__ res = x;
	  __imag__ res = y;
	  return res;
	}

Currently GCC uses the long double type node for __float128 if long double is
IEEE 128-bit.  It did not use the node for _Float128.

Originally this was noticed if you call the nansq function to make a signaling
NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
different from __float128, the machine independent code converts signaling NaNs
to quiet NaNs if the types are not compatible.  The following tests used to
fail when run on a system where long double is IEEE 128-bit:

	gcc.dg/torture/float128-nan.c
	gcc.target/powerpc/nan128-1.c

This patch makes both __float128 and _Float128 use the same type node.

One side effect of not using the long double type node for __float128 is that we
must only use KFmode for _Float128/__float128.  The libstdc++ library won't
build if we use TFmode for _Float128 and __float128 when long double is IEEE
128-bit.

Another minor side effect is that the f128 round to odd fused multiply-add
function will not merge negatition with the FMA operation when the type is long
double.  If the type is __float128 or _Float128, then it will continue to do the
optimization.  The round to odd functions are defined in terms of __float128
arguments.  For example:

	long double
	do_fms (long double a, long double b, long double c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate (assuming -mabi=ieeelongdouble):

	xsnegqp 4,4
	xsmaddqpo 4,2,3
	xxlor 34,36,36

while:

	__float128
	do_fms (__float128 a, __float128 b, __float128 c)
	{
	    return __builtin_fmaf128_round_to_odd (a, b, -c);
	}

will generate:

	xsmsubqpo 4,2,3
	xxlor 34,36,36

I tested all 3 patchs for PR target/107299 on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-11-01   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
	_Float128 type for __float128.
	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
	built-in passes or returns TFmode.  If the predicate failed because the
	modes were different, use convert_move to load up the value instead of
	copy_to_mode_reg.
	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
	modes (KFmode or KCmode), even if long double is IEEE 128-bit.
	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
	time if we support IEEE 128-bit floating point.
	(rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode.

gcc/testsuite/

	PR target/107299
	* gcc.target/powerpc/float128-hw12.c: New test.
	* gcc.target/powerpc/float128-hw13.c: Likewise.
	* gcc.target/powerpc/float128-hw4.c: Update insns.
---
 gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
 gcc/config/rs6000/rs6000.cc                   |  31 ++-
 .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
 .../gcc.target/powerpc/float128-hw4.c         |  10 +-
 5 files changed, 431 insertions(+), 121 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 90ab39dc258..e5298f45363 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -730,25 +730,28 @@ rs6000_init_builtins (void)
 
   if (TARGET_FLOAT128_TYPE)
     {
-      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
-	ieee128_float_type_node = long_double_type_node;
-      else
+      /* In the past we used long_double_type_node when long double was IEEE
+	 128-bit.  However, this means that the _Float128 type
+	 (i.e. float128_type_node) is a different type from __float128
+	 (i.e. ieee128_float_type_nonde).  This leads to some corner cases,
+	 such as processing signaling NaNs with the nansf128 built-in function
+	 (which returns a _Float128 value) and assign it to a long double or
+	 __float128 value.  The two explicit IEEE 128-bit types should always
+	 use the same internal type.
+
+	 For C we only need to register the __ieee128 name for it.  For C++, we
+	 create a distinct type which will mangle differently (u9__ieee128)
+	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
+      if (float128t_type_node == NULL_TREE)
 	{
-	  /* For C we only need to register the __ieee128 name for
-	     it.  For C++, we create a distinct type which will mangle
-	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
-	     backwards compatibly.  */
-	  if (float128t_type_node == NULL_TREE)
-	    {
-	      float128t_type_node = make_node (REAL_TYPE);
-	      TYPE_PRECISION (float128t_type_node)
-		= TYPE_PRECISION (float128_type_node);
-	      layout_type (float128t_type_node);
-	      SET_TYPE_MODE (float128t_type_node,
-			     TYPE_MODE (float128_type_node));
-	    }
-	  ieee128_float_type_node = float128t_type_node;
+	  float128t_type_node = make_node (REAL_TYPE);
+	  TYPE_PRECISION (float128t_type_node)
+	    = TYPE_PRECISION (float128_type_node);
+	  layout_type (float128t_type_node);
+	  SET_TYPE_MODE (float128t_type_node,
+			 TYPE_MODE (float128_type_node));
 	}
+      ieee128_float_type_node = float128t_type_node;
       t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
       lang_hooks.types.register_builtin_type (ieee128_float_type_node,
 					      "__ieee128");
@@ -3265,13 +3268,13 @@ htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
-   (and in mode MODE if that's convenient).
+   (and in mode RETURN_MODE if that's convenient).
    SUBTARGET may be used as the target for computing one of EXP's operands.
    IGNORE is nonzero if the value is to be ignored.
    Use the new builtin infrastructure.  */
 rtx
 rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
-		       machine_mode /* mode */, int ignore)
+		       machine_mode return_mode, int ignore)
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_gen_builtins fcode
@@ -3287,78 +3290,99 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
   size_t uns_fcode = (size_t)fcode;
   enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
 
-  /* TODO: The following commentary and code is inherited from the original
-     builtin processing code.  The commentary is a bit confusing, with the
-     intent being that KFmode is always IEEE-128, IFmode is always IBM
-     double-double, and TFmode is the current long double.  The code is
-     confusing in that it converts from KFmode to TFmode pattern names,
-     when the other direction is more intuitive.  Try to address this.  */
-
-  /* We have two different modes (KFmode, TFmode) that are the IEEE
-     128-bit floating point type, depending on whether long double is the
-     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
-     It is simpler if we only define one variant of the built-in function,
-     and switch the code when defining it, rather than defining two built-
-     ins and using the overload table in rs6000-c.cc to switch between the
-     two.  If we don't have the proper assembler, don't do this switch
-     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
-  if (FLOAT128_IEEE_P (TFmode))
-    switch (icode)
-      {
-      case CODE_FOR_sqrtkf2_odd:
-	icode = CODE_FOR_sqrttf2_odd;
-	break;
-      case CODE_FOR_trunckfdf2_odd:
-	icode = CODE_FOR_trunctfdf2_odd;
-	break;
-      case CODE_FOR_addkf3_odd:
-	icode = CODE_FOR_addtf3_odd;
-	break;
-      case CODE_FOR_subkf3_odd:
-	icode = CODE_FOR_subtf3_odd;
-	break;
-      case CODE_FOR_mulkf3_odd:
-	icode = CODE_FOR_multf3_odd;
-	break;
-      case CODE_FOR_divkf3_odd:
-	icode = CODE_FOR_divtf3_odd;
-	break;
-      case CODE_FOR_fmakf4_odd:
-	icode = CODE_FOR_fmatf4_odd;
-	break;
-      case CODE_FOR_xsxexpqp_kf:
-	icode = CODE_FOR_xsxexpqp_tf;
-	break;
-      case CODE_FOR_xsxsigqp_kf:
-	icode = CODE_FOR_xsxsigqp_tf;
-	break;
-      case CODE_FOR_xststdcnegqp_kf:
-	icode = CODE_FOR_xststdcnegqp_tf;
-	break;
-      case CODE_FOR_xsiexpqp_kf:
-	icode = CODE_FOR_xsiexpqp_tf;
-	break;
-      case CODE_FOR_xsiexpqpf_kf:
-	icode = CODE_FOR_xsiexpqpf_tf;
-	break;
-      case CODE_FOR_xststdcqp_kf:
-	icode = CODE_FOR_xststdcqp_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_eq_kf:
-	icode = CODE_FOR_xscmpexpqp_eq_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_lt_kf:
-	icode = CODE_FOR_xscmpexpqp_lt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_gt_kf:
-	icode = CODE_FOR_xscmpexpqp_gt_tf;
-	break;
-      case CODE_FOR_xscmpexpqp_unordered_kf:
-	icode = CODE_FOR_xscmpexpqp_unordered_tf;
-	break;
-      default:
-	break;
-      }
+  /* For 128-bit long double, we may need both the KFmode built-in functions
+     and IFmode built-in functions to the equivalent TFmode built-in function,
+     if either a TFmode result is expected or any of the arguments use
+     TFmode.  */
+  if (TARGET_LONG_DOUBLE_128)
+    {
+      bool uses_tf_mode = return_mode == TFmode;
+      if (!uses_tf_mode)
+	{
+	  call_expr_arg_iterator iter;
+	  tree arg;
+	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+	    {
+	      if (arg != error_mark_node
+		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
+		{
+		  uses_tf_mode = true;
+		  break;
+		}
+	    }
+	}
+
+      /* Convert KFmode built-in functions to TFmode when long double is IEEE
+	 128-bit.  */
+      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
+	switch (icode)
+	  {
+	  case CODE_FOR_sqrtkf2_odd:
+	    icode = CODE_FOR_sqrttf2_odd;
+	    break;
+	  case CODE_FOR_trunckfdf2_odd:
+	    icode = CODE_FOR_trunctfdf2_odd;
+	    break;
+	  case CODE_FOR_addkf3_odd:
+	    icode = CODE_FOR_addtf3_odd;
+	    break;
+	  case CODE_FOR_subkf3_odd:
+	    icode = CODE_FOR_subtf3_odd;
+	    break;
+	  case CODE_FOR_mulkf3_odd:
+	    icode = CODE_FOR_multf3_odd;
+	    break;
+	  case CODE_FOR_divkf3_odd:
+	    icode = CODE_FOR_divtf3_odd;
+	    break;
+	  case CODE_FOR_fmakf4_odd:
+	    icode = CODE_FOR_fmatf4_odd;
+	    break;
+	  case CODE_FOR_xsxexpqp_kf:
+	    icode = CODE_FOR_xsxexpqp_tf;
+	    break;
+	  case CODE_FOR_xsxsigqp_kf:
+	    icode = CODE_FOR_xsxsigqp_tf;
+	    break;
+	  case CODE_FOR_xststdcnegqp_kf:
+	    icode = CODE_FOR_xststdcnegqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqp_kf:
+	    icode = CODE_FOR_xsiexpqp_tf;
+	    break;
+	  case CODE_FOR_xsiexpqpf_kf:
+	    icode = CODE_FOR_xsiexpqpf_tf;
+	    break;
+	  case CODE_FOR_xststdcqp_kf:
+	    icode = CODE_FOR_xststdcqp_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_eq_kf:
+	    icode = CODE_FOR_xscmpexpqp_eq_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_lt_kf:
+	    icode = CODE_FOR_xscmpexpqp_lt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_gt_kf:
+	    icode = CODE_FOR_xscmpexpqp_gt_tf;
+	    break;
+	  case CODE_FOR_xscmpexpqp_unordered_kf:
+	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
+	    break;
+	  default:
+	    break;
+	  }
+
+      /* Convert IFmode built-in functions to TFmode when long double is IBM
+	 128-bit.  */
+      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
+	{
+	  if (icode == CODE_FOR_packif)
+	    icode = CODE_FOR_packtf;
+
+	  else if (icode == CODE_FOR_unpackif)
+	    icode = CODE_FOR_unpacktf;
+	}
+    }
 
   /* In case of "#pragma target" changes, we initialize all builtins
      but check for actual availability now, during expand time.  For
@@ -3481,18 +3505,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
     {
-      if (fcode == RS6000_BIF_PACK_IF)
-	{
-	  icode = CODE_FOR_packtf;
-	  fcode = RS6000_BIF_PACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
-      else if (fcode == RS6000_BIF_UNPACK_IF)
-	{
-	  icode = CODE_FOR_unpacktf;
-	  fcode = RS6000_BIF_UNPACK_TF;
-	  uns_fcode = (size_t) fcode;
-	}
     }
 
   /* TRUE iff the built-in function returns void.  */
@@ -3647,7 +3659,24 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
 
   for (int i = 0; i < nargs; i++)
     if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
-      op[i] = copy_to_mode_reg (mode[i+k], op[i]);
+      {
+	/* If the predicate failed because the modes are different, do a
+	   convert instead of copy_to_mode_reg, since copy_to_mode_reg will
+	   abort in this case.  The modes might be different if we have two
+	   different 128-bit floating point modes (i.e. KFmode/TFmode if long
+	   double is IEEE 128-bit and IFmode/TFmode if long double is IBM
+	   128-bit).  */
+	machine_mode mode_insn = mode[i+k];
+	machine_mode mode_op = GET_MODE (op[i]);
+	if (mode_insn != mode_op && mode_op != VOIDmode)
+	  {
+	    rtx tmp = gen_reg_rtx (mode_insn);
+	    convert_move (tmp, op[i], 0);
+	    op[i] = tmp;
+	  }
+	else
+	  op[i] = copy_to_mode_reg (mode_insn, op[i]);
+      }
 
   rtx pat;
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cfb6227e27b..8a8357512c0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23851,15 +23851,23 @@ rs6000_eh_return_filter_mode (void)
   return TARGET_32BIT ? SImode : word_mode;
 }
 
-/* Target hook for translate_mode_attribute.  */
+/* Target hook for translate_mode_attribute.
+
+   When -mabi=ieeelongdouble is used, we want to translate either KFmode or
+   TFmode to KFmode.  This is because user code that wants to specify IEEE
+   128-bit types will use either TFmode or KFmode, and we really want to use
+   the _Float128 and __float128 types instead of long double.
+
+   Similarly when -mabi=ibmlongdouble is used, we want to map IFmode into
+   TFmode.  */
 static machine_mode
 rs6000_translate_mode_attribute (machine_mode mode)
 {
-  if ((FLOAT128_IEEE_P (mode)
-       && ieee128_float_type_node == long_double_type_node)
-      || (FLOAT128_IBM_P (mode)
-	  && ibm128_float_type_node == long_double_type_node))
+  if (FLOAT128_IBM_P (mode)
+      && ibm128_float_type_node == long_double_type_node)
     return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
+  else if (FLOAT128_IEEE_P (mode))
+    return COMPLEX_MODE_P (mode) ? E_KCmode : E_KFmode;
   return mode;
 }
 
@@ -23895,13 +23903,10 @@ rs6000_libgcc_floating_mode_supported_p (scalar_float_mode mode)
     case E_TFmode:
       return true;
 
-      /* We only return true for KFmode if IEEE 128-bit types are supported, and
-	 if long double does not use the IEEE 128-bit format.  If long double
-	 uses the IEEE 128-bit format, it will use TFmode and not KFmode.
-	 Because the code will not use KFmode in that case, there will be aborts
-	 because it can't find KFmode in the Floatn types.  */
+      /* We only return true for KFmode if IEEE 128-bit types are
+	 supported.  */
     case E_KFmode:
-      return TARGET_FLOAT128_TYPE && !TARGET_IEEEQUAD;
+      return TARGET_FLOAT128_TYPE;
 
     default:
       return false;
@@ -23935,7 +23940,7 @@ rs6000_floatn_mode (int n, bool extended)
 
 	case 64:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
@@ -23959,7 +23964,7 @@ rs6000_floatn_mode (int n, bool extended)
 
 	case 128:
 	  if (TARGET_FLOAT128_TYPE)
-	    return (FLOAT128_IEEE_P (TFmode)) ? TFmode : KFmode;
+	    return KFmode;
 	  else
 	    return opt_scalar_float_mode ();
 
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw12.c b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
new file mode 100644
index 00000000000..d08b4cbc883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw12.c
@@ -0,0 +1,137 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ieeelongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with _Float128.  This is the same as float128-hw4.c, except the type
+   _Float128 is used, and the IEEE 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw13.c b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
new file mode 100644
index 00000000000..51a3cd4802b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw13.c
@@ -0,0 +1,137 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-mpower9-vector -O2 -mabi=ibmlongdouble -Wno-psabi" } */
+
+/* Insure that the ISA 3.0 IEEE 128-bit floating point built-in functions work
+   with __float128.  This is the same as float128-hw4.c, except the type
+   __float128 is used, and the IBM 128-bit long double ABI is used.  */
+
+#ifndef TYPE
+#define TYPE __float128
+#endif
+
+unsigned int
+get_double_exponent (double a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned int
+get_float128_exponent (TYPE a)
+{
+  return __builtin_vec_scalar_extract_exp (a);
+}
+
+unsigned long
+get_double_mantissa (double a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+__uint128_t
+get_float128_mantissa (TYPE a)
+{
+  return __builtin_vec_scalar_extract_sig (a);
+}
+
+double
+set_double_exponent_ulong (unsigned long a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_uint128 (__uint128_t a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+double
+set_double_exponent_double (double a, unsigned long e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+set_float128_exponent_float128 (TYPE a, __uint128_t e)
+{
+  return __builtin_vec_scalar_insert_exp (a, e);
+}
+
+TYPE
+sqrt_odd (TYPE a)
+{
+  return __builtin_sqrtf128_round_to_odd (a);
+}
+
+double
+trunc_odd (TYPE a)
+{
+  return __builtin_truncf128_round_to_odd (a);
+}
+
+TYPE
+add_odd (TYPE a, TYPE b)
+{
+  return __builtin_addf128_round_to_odd (a, b);
+}
+
+TYPE
+sub_odd (TYPE a, TYPE b)
+{
+  return __builtin_subf128_round_to_odd (a, b);
+}
+
+TYPE
+mul_odd (TYPE a, TYPE b)
+{
+  return __builtin_mulf128_round_to_odd (a, b);
+}
+
+TYPE
+div_odd (TYPE a, TYPE b)
+{
+  return __builtin_divf128_round_to_odd (a, b);
+}
+
+TYPE
+fma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+fms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return __builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+TYPE
+nfma_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, c);
+}
+
+TYPE
+nfms_odd (TYPE a, TYPE b, TYPE c)
+{
+  return -__builtin_fmaf128_round_to_odd (a, b, -c);
+}
+
+/* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxexpqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigdp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
+/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
+/* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
+/* { dg-final { scan-assembler-not {\mbl\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
index fc149169bc6..3f6717825b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-hw4.c
@@ -118,6 +118,11 @@ nfms_odd (TYPE a, TYPE b, TYPE c)
   return -__builtin_fmaf128_round_to_odd (a, b, -c);
 }
 
+/* In using long double instead of _Float128, we might not be able to optimize
+   __builtin_fmaf128_round_to_odd (a, b, -c) into using xsmsubqpo instead of
+   xsnegqp and xsmaddqpo due to conversions between TFmode and KFmode.  So just
+   recognize that the did the FMA optimization.  */
+
 /* { dg-final { scan-assembler 	   {\mxsiexpdp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsiexpqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsxexpdp\M}   } } */
@@ -126,11 +131,8 @@ nfms_odd (TYPE a, TYPE b, TYPE c)
 /* { dg-final { scan-assembler 	   {\mxsxsigqp\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsaddqpo\M}   } } */
 /* { dg-final { scan-assembler 	   {\mxsdivqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsmaddqpo\M}  } } */
-/* { dg-final { scan-assembler 	   {\mxsmsubqpo\M}  } } */
+/* { dg-final { scan-assembler 	   {\mxsn?m(add|sub)qpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxsmulqpo\M}   } } */
-/* { dg-final { scan-assembler 	   {\mxsnmaddqpo\M} } } */
-/* { dg-final { scan-assembler 	   {\mxsnmsubqpo\M} } } */
 /* { dg-final { scan-assembler 	   {\mxssqrtqpo\M}  } } */
 /* { dg-final { scan-assembler 	   {\mxssubqpo\M}   } } */
 /* { dg-final { scan-assembler-not {\mbl\M}         } } */
-- 
2.37.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
  2022-11-02  2:39 Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Michael Meissner
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
@ 2022-11-02  2:44 ` Michael Meissner
  2022-11-07 15:44   ` Ping: " Michael Meissner
                     ` (2 more replies)
  2022-12-06 14:56 ` Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Segher Boessenkool
  3 siblings, 3 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-02  2:44 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

This patch fixes two tests that are still failing when long double is IEEE
128-bit after the previous 2 patches for PR target/107299 have been applied.
The tests are:

	gcc.target/powerpc/convert-fp-128.c
	gcc.target/powerpc/pr85657-3.c

This patch is a rewrite of the patch submitted on August 18th:

| https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599988.html

This patch reworks the conversions between 128-bit binary floating point types.
Previously, we would call rs6000_expand_float128_convert to do all conversions.
Now, we only define the conversions between the same representation that turn
into a NOP.  The appropriate extend or truncate insn is generated, and after
register allocation, it is converted to a move.

This patch also fixes two places where we want to override the external name
for the conversion function, and the wrong optab was used.  Previously,
rs6000_expand_float128_convert would handle the move or generate the call as
needed.  Now, it lets the machine independent code generate the call.  But if
we use the machine independent code to generate the call, we need to update the
name for two optabs where a truncate would be used in terms of converting
between the modes.  This patch updates those two optabs.

I tested this patch on:

    1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
    2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
    3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
    4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

In the past I have also tested this exact patch on the following systems:

    1)	LE Power10 using --with-cpu=power9  --with-long-double-format=ibm
    2)	LE Power10 using --with-cpu=power8  --with-long-double-format=ibm
    3)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm

There were no regressions in the bootstrap process or running the tests (after
applying all 3 patches for PR target/107299).  Can I check this patch into the
trunk?

2022-11-01   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/107299
	* config/rs6000/rs6000.cc (init_float128_ieee): Use the correct
	float_extend or float_truncate optab based on how the machine converts
	between IEEE 128-bit and IBM 128-bit.
	* config/rs6000/rs6000.md (IFKF): Delete.
	(IFKF_reg): Delete.
	(extendiftf2): Rewrite to be a move if IFmode and TFmode are both IBM
	128-bit.  Do not run if TFmode is IEEE 128-bit.
	(extendifkf2): Delete.
	(extendtfkf2): Delete.
	(extendtfif2): Delete.
	(trunciftf2): Delete.
	(truncifkf2): Delete.
	(trunckftf2): Delete.
	(extendkftf2): Implement conversion of IEEE 128-bit types as a move.
	(trunctfif2): Delete.
	(trunctfkf2): Implement conversion of IEEE 128-bit types as a move.
	(extend<mode>tf2_internal): Delete.
	(extendtf<mode>2_internal): Delete.
---
 gcc/config/rs6000/rs6000.cc |   4 +-
 gcc/config/rs6000/rs6000.md | 177 ++++++++++--------------------------
 2 files changed, 50 insertions(+), 131 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8a8357512c0..9a5907c7130 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11156,11 +11156,11 @@ init_float128_ieee (machine_mode mode)
       set_conv_libfunc (trunc_optab, SFmode, mode, "__trunckfsf2");
       set_conv_libfunc (trunc_optab, DFmode, mode, "__trunckfdf2");
 
-      set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
+      set_conv_libfunc (trunc_optab, mode, IFmode, "__trunctfkf2");
       if (mode != TFmode && FLOAT128_IBM_P (TFmode))
 	set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");
 
-      set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
+      set_conv_libfunc (sext_optab, IFmode, mode, "__extendkftf2");
       if (mode != TFmode && FLOAT128_IBM_P (TFmode))
 	set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3bae303086b..4880df5c51c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -543,12 +543,6 @@ (define_mode_iterator FMOVE128_GPR [TI
 ; Iterator for 128-bit VSX types for pack/unpack
 (define_mode_iterator FMOVE128_VSX [V1TI KF])
 
-; Iterators for converting to/from TFmode
-(define_mode_iterator IFKF [IF KF])
-
-; Constraints for moving IF/KFmode.
-(define_mode_attr IFKF_reg [(IF "d") (KF "wa")])
-
 ; Whether a floating point move is ok, don't allow SD without hardware FP
 (define_mode_attr fmove_ok [(SF "")
 			    (DF "")
@@ -9096,106 +9090,65 @@ (define_insn "*ieee_128bit_vsx_nabs<mode>2_internal"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "veclogical")])
 
-;; Float128 conversion functions.  These expand to library function calls.
-;; We use expand to convert from IBM double double to IEEE 128-bit
-;; and trunc for the opposite.
-(define_expand "extendiftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_extend:TF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendifkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_extend:KF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendtfkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_extend:KF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "extendtfif2"
-  [(set (match_operand:IF 0 "gpc_reg_operand")
-	(float_extend:IF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "trunciftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_truncate:TF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "truncifkf2"
-  [(set (match_operand:KF 0 "gpc_reg_operand")
-	(float_truncate:KF (match_operand:IF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
-{
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
-
-(define_expand "trunckftf2"
-  [(set (match_operand:TF 0 "gpc_reg_operand")
-	(float_truncate:TF (match_operand:KF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
+;; Float128 conversion functions.  We only define the 'conversions' between two
+;; formats that use the same representation.  We call the library function to
+;; convert between IEEE 128-bit and IBM 128-bit.  We can't do these moves by
+;; using a SUBREG before register allocation.  We set up the moves to prefer
+;; the output register being the same as the input register, which would enable
+;; the move to be deleted completely.
+(define_insn_and_split "extendkftf2"
+  [(set (match_operand:TF 0 "gpc_reg_operand" "=wa,wa")
+	(float_extend:TF (match_operand:KF 1 "gpc_reg_operand" "0,wa")))]
+  "TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (TFmode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
+  operands[2] = gen_lowpart (TFmode, operands[1]);
+}
+  [(set_attr "type" "veclogical")])
 
-(define_expand "trunctfif2"
-  [(set (match_operand:IF 0 "gpc_reg_operand")
-	(float_truncate:IF (match_operand:TF 1 "gpc_reg_operand")))]
-  "TARGET_FLOAT128_TYPE"
+(define_insn_and_split "trunctfkf2"
+  [(set (match_operand:KF 0 "gpc_reg_operand" "=wa,wa")
+	(float_truncate:KF (match_operand:TF 1 "gpc_reg_operand" "0,wa")))]
+  "TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (TFmode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  rs6000_expand_float128_convert (operands[0], operands[1], false);
-  DONE;
-})
+  operands[2] = gen_lowpart (KFmode, operands[1]);
+}
+  [(set_attr "type" "veclogical")])
 
-(define_insn_and_split "*extend<mode>tf2_internal"
-  [(set (match_operand:TF 0 "gpc_reg_operand" "=<IFKF_reg>")
-	(float_extend:TF
-	 (match_operand:IFKF 1 "gpc_reg_operand" "<IFKF_reg>")))]
-   "TARGET_FLOAT128_TYPE
-    && FLOAT128_IBM_P (TFmode) == FLOAT128_IBM_P (<MODE>mode)"
+(define_insn_and_split "extendtfif2"
+  [(set (match_operand:IF 0 "gpc_reg_operand" "=wa,wa,r,r")
+	(float_extend:IF (match_operand:TF 1 "gpc_reg_operand" "0,wa,0,r")))]
+  "TARGET_HARD_FLOAT && FLOAT128_IBM_P (TFmode)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (match_dup 2))]
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  operands[2] = gen_rtx_REG (TFmode, REGNO (operands[1]));
-})
+  operands[2] = gen_lowpart (IFmode, operands[1]);
+}
+  [(set_attr "num_insns" "2")
+   (set_attr "length" "8")])
 
-(define_insn_and_split "*extendtf<mode>2_internal"
-  [(set (match_operand:IFKF 0 "gpc_reg_operand" "=<IFKF_reg>")
-	(float_extend:IFKF
-	 (match_operand:TF 1 "gpc_reg_operand" "<IFKF_reg>")))]
-   "TARGET_FLOAT128_TYPE
-    && FLOAT128_IBM_P (TFmode) == FLOAT128_IBM_P (<MODE>mode)"
+(define_insn_and_split "extendiftf2"
+  [(set (match_operand:TF 0 "gpc_reg_operand" "=wa,wa,r,r")
+	(float_extend:TF (match_operand:IF 1 "gpc_reg_operand" "0,wa,0,r")))]
+  "TARGET_HARD_FLOAT && FLOAT128_IBM_P (TFmode)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (match_dup 2))]
+  [(set (match_dup 0)
+	(match_dup 2))]
 {
-  operands[2] = gen_rtx_REG (<MODE>mode, REGNO (operands[1]));
-})
+  operands[2] = gen_lowpart (TFmode, operands[1]);
+}
+  [(set_attr "num_insns" "2")
+   (set_attr "length" "8")])
 
 \f
 ;; Reload helper functions used by rs6000_secondary_reload.  The patterns all
@@ -14909,40 +14862,6 @@ (define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
   [(set_attr "type" "vecfloat")
    (set_attr "size" "128")])
 
-;; Conversion between KFmode and TFmode if TFmode is ieee 128-bit floating
-;; point is a simple copy.
-(define_insn_and_split "extendkftf2"
-  [(set (match_operand:TF 0 "vsx_register_operand" "=wa,?wa")
-	(float_extend:TF (match_operand:KF 1 "vsx_register_operand" "0,wa")))]
-  "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD"
-  "@
-   #
-   xxlor %x0,%x1,%x1"
-  "&& reload_completed  && REGNO (operands[0]) == REGNO (operands[1])"
-  [(const_int 0)]
-{
-  emit_note (NOTE_INSN_DELETED);
-  DONE;
-}
-  [(set_attr "type" "*,veclogical")
-   (set_attr "length" "0,4")])
-
-(define_insn_and_split "trunctfkf2"
-  [(set (match_operand:KF 0 "vsx_register_operand" "=wa,?wa")
-	(float_extend:KF (match_operand:TF 1 "vsx_register_operand" "0,wa")))]
-  "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD"
-  "@
-   #
-   xxlor %x0,%x1,%x1"
-  "&& reload_completed  && REGNO (operands[0]) == REGNO (operands[1])"
-  [(const_int 0)]
-{
-  emit_note (NOTE_INSN_DELETED);
-  DONE;
-}
-  [(set_attr "type" "*,veclogical")
-   (set_attr "length" "0,4")])
-
 (define_insn "trunc<mode>df2_hw"
   [(set (match_operand:DF 0 "altivec_register_operand" "=v")
 	(float_truncate:DF
-- 
2.37.3


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
@ 2022-11-07 15:41   ` Michael Meissner
  2022-11-29 17:43   ` Ping #2: " Michael Meissner
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-07 15:41 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Ping patch:

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: <Y2HYqx4zLCNCT0Zy@toto.the-meissners.org>

This patch is needed to build GCC on Fedora 36 where the default for long
double is now IEEE 128-bit.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
@ 2022-11-07 15:43   ` Michael Meissner
  2022-11-29 17:44   ` Michael Meissner
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-07 15:43 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Ping patch:

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>

This patch is needed to build GCC on Fedora 36 which has switched the long
double default to be IEEE 128-bit.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
  2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
@ 2022-11-07 15:44   ` Michael Meissner
  2022-11-29 17:46   ` Ping #3: " Michael Meissner
  2022-12-02 18:04   ` Michael Meissner
  2 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-07 15:44 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Ping patch:

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: <Y2HZcYMCCcyEADyD@toto.the-meissners.org>

This patch fixes some issues with IEEE 128-bit long doubles once the previous 2
patches have been applied.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping #2: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
  2022-11-07 15:41   ` Ping: " Michael Meissner
@ 2022-11-29 17:43   ` Michael Meissner
  2022-12-02 17:58   ` Ping #3: " Michael Meissner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-29 17:43 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Can we please get this patch reviewed?  GCC 13 won't build on Fedora 37 (which
defaults to long double being IEEE 128-bit) without the 3 patches in this set.

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: <Y2HYqx4zLCNCT0Zy@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
  2022-11-07 15:43   ` Ping: " Michael Meissner
@ 2022-11-29 17:44   ` Michael Meissner
  2022-12-02 18:01   ` Ping #3: " Michael Meissner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-29 17:44 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Can we get the three patches in this patch set reviewed?  Without the patches
applied, GCC 13 will not build on Fedora 37, which uses long double defaulting
to IEEE 128-bit.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping #3: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
  2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
  2022-11-07 15:44   ` Ping: " Michael Meissner
@ 2022-11-29 17:46   ` Michael Meissner
  2022-12-02 18:04   ` Michael Meissner
  2 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-11-29 17:46 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

Can we get the three patches in this patch set reviewed?  Without them, GCC 13
can't be built on Fedora 37 which defaults to IEEE 128-bit long double.

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: <Y2HZcYMCCcyEADyD@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping #3: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
  2022-11-07 15:41   ` Ping: " Michael Meissner
  2022-11-29 17:43   ` Ping #2: " Michael Meissner
@ 2022-12-02 17:58   ` Michael Meissner
  2022-12-06  9:36   ` Kewen.Lin
  2022-12-13  6:23   ` Michael Meissner
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-02 17:58 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer,
	Raymond Harney

Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is the first patch of 3 patches.

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: <Y2HYqx4zLCNCT0Zy@toto.the-meissners.org>
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604835.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping #3: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
  2022-11-07 15:43   ` Ping: " Michael Meissner
  2022-11-29 17:44   ` Michael Meissner
@ 2022-12-02 18:01   ` Michael Meissner
  2022-12-06 11:27   ` Kewen.Lin
  2023-01-11 20:24   ` Michael Meissner
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-02 18:01 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer,
	Raymond Harney

Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is for the 2nd of 3 patches.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: <Y2HZFlHH8HuvhGL4@toto.the-meissners.org>
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604836.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping #3: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
  2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
  2022-11-07 15:44   ` Ping: " Michael Meissner
  2022-11-29 17:46   ` Ping #3: " Michael Meissner
@ 2022-12-02 18:04   ` Michael Meissner
  2 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-02 18:04 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer,
	Raymond Harney

Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is the 3rd of 3 patches.

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: <Y2HZcYMCCcyEADyD@toto.the-meissners.org>
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604837.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
                     ` (2 preceding siblings ...)
  2022-12-02 17:58   ` Ping #3: " Michael Meissner
@ 2022-12-06  9:36   ` Kewen.Lin
  2022-12-07  6:44     ` Michael Meissner
  2022-12-13  6:23   ` Michael Meissner
  4 siblings, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-06  9:36 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, Peter Bergner,
	Will Schmidt, William Seurer

Hi Mike,

Thanks for fixing this!

on 2022/11/2 10:40, Michael Meissner wrote:
> This function reworks how the complex multiply and divide built-in functions are
> done.  Previously we created built-in declarations for doing long double complex
> multiply and divide when long double is IEEE 128-bit.  The old code also did not
> support __ibm128 complex multiply and divide if long double is IEEE 128-bit.
> 
> In terms of history, I wrote the original code just as I was starting to test
> GCC on systems where IEEE 128-bit long double was the default.  At the time, we
> had not yet started mangling the built-in function names as a way to bridge
> going from a system with 128-bit IBM long double to 128-bin IEEE long double.
                                                         ~~~ bit
> 
> The original code depends on there only being two 128-bit types invovled.  With
                                                                  ~~~~~~ involved.
> the next patch in this series, this assumption will no longer be true.  When
> long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
> explicit __float128/_Float128 type and one for long double).
> 
> The problem is we cannot create two separate built-in functions that resolve to
> the same name.  This is a requirement of add_builtin_function and the C front
> end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
> only use 2 of them.
> 
> This code does not create the built-in declaration with the changed name.
> Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
> before it is written out to the assembler file like it now does for all of the
> other long double built-in functions.
> 
> We need to disable using this mapping when we are building libgcc, specifically
> when it is building the floating point 128-bit multiply and divide functions.
> The flag that is used when libgcc is built (-fbuilding-libcc) is only available
> in the C/C++ front ends.  We need to remember that we are building libgcc in the
> rs6000-c.cc support to be able to use this later to decided whether to mangle
> the decl assembler name or not.

IIUC, for the building of floating point 128-bit multiply and divide functions,
the mapping seems still to work fine.  Using the multiply as example here, when
compiling _multc3.o, it's with -mabi=ibmlongdouble, it maps the name with "tc"
which is consistent with what we need.  While compiling _mulkc3*.o, we would
check the macro __LONG_DOUBLE_IEEE128__ and use either KCmode or TCmode, either
of the mapping result would be "kc".

Could you help to elaborate why we need to disable it during libgcc building?

BR,
Kewen

> 
> When I wrote these patches, I discovered that __ibm128 complex multiply and
> divide had originally not been supported if long double is IEEE 128-bit as it
> would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
> to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
> case.
> 
> I tested all 3 patchs for PR target/107299 on:
> 
>     1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
>     2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
>     3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
>     4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
> 
> Once all 3 patches have been applied, we can once again build GCC when long
> double is IEEE 128-bit.  There were no other regressions with these patches.
> Can I check these patches into the trunk?
> 
> 2022-11-01   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	PR target/107299
> 	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Set
> 	building_libgcc.
> 	* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
> 	(init_float128_ieee): Delete code to switch complex multiply and divide
> 	for long double.
> 	(complex_multiply_builtin_code): New helper function.
> 	(complex_divide_builtin_code): Likewise.
> 	(rs6000_mangle_decl_assembler_name): Add support for mangling the name
> 	of complex 128-bit multiply and divide built-in functions.
> 	* config/rs6000/rs6000.opt (building_libgcc): New target variable.
> 
> gcc/testsuite/
> 
> 	PR target/107299
> 	* gcc.target/powerpc/divic3-1.c: New test.
> 	* gcc.target/powerpc/divic3-2.c: Likewise.
> 	* gcc.target/powerpc/mulic3-1.c: Likewise.
> 	* gcc.target/powerpc/mulic3-2.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-c.cc               |   8 ++
>  gcc/config/rs6000/rs6000.cc                 | 110 +++++++++++---------
>  gcc/config/rs6000/rs6000.opt                |   4 +
>  gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 ++++
>  gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
>  gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 ++++
>  gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
>  7 files changed, 145 insertions(+), 47 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 56609462629..5c2f3bcee9f 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -780,6 +780,14 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
>        || DEFAULT_ABI == ABI_ELFv2
>        || (DEFAULT_ABI == ABI_AIX && !rs6000_compat_align_parm))
>      builtin_define ("__STRUCT_PARM_ALIGN__=16");
> +
> +  /* Store whether or not we are building libgcc.  This is needed to disable
> +     generating the alternate names for 128-bit complex multiply and divide.
> +     We need to disable generating __multc3, __divtc3, __mulkc3, and __divkc3
> +     when we are building those functions in libgcc.  The variable
> +     flag_building_libgcc is only available for the C family of front-ends.
> +     We set this variable here to disable generating the alternate names.  */
> +  building_libgcc = flag_building_libgcc;
>  }
>  
>  \f
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index a85d7630b41..cfb6227e27b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -11123,26 +11123,6 @@ init_float128_ibm (machine_mode mode)
>      }
>  }
>  
> -/* Create a decl for either complex long double multiply or complex long double
> -   divide when long double is IEEE 128-bit floating point.  We can't use
> -   __multc3 and __divtc3 because the original long double using IBM extended
> -   double used those names.  The complex multiply/divide functions are encoded
> -   as builtin functions with a complex result and 4 scalar inputs.  */
> -
> -static void
> -create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
> -{
> -  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
> -				      name, NULL_TREE);
> -
> -  set_builtin_decl (fncode, fndecl, true);
> -
> -  if (TARGET_DEBUG_BUILTIN)
> -    fprintf (stderr, "create complex %s, fncode: %d\n", name, (int) fncode);
> -
> -  return;
> -}
> -
>  /* Set up IEEE 128-bit floating point routines.  Use different names if the
>     arguments can be passed in a vector register.  The historical PowerPC
>     implementation of IEEE 128-bit floating point used _q_<op> for the names, so
> @@ -11154,32 +11134,6 @@ init_float128_ieee (machine_mode mode)
>  {
>    if (FLOAT128_VECTOR_P (mode))
>      {
> -      static bool complex_muldiv_init_p = false;
> -
> -      /* Set up to call __mulkc3 and __divkc3 under -mabi=ieeelongdouble.  If
> -	 we have clone or target attributes, this will be called a second
> -	 time.  We want to create the built-in function only once.  */
> -     if (mode == TFmode && TARGET_IEEEQUAD && !complex_muldiv_init_p)
> -       {
> -	 complex_muldiv_init_p = true;
> -	 built_in_function fncode_mul =
> -	   (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + TCmode
> -				- MIN_MODE_COMPLEX_FLOAT);
> -	 built_in_function fncode_div =
> -	   (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + TCmode
> -				- MIN_MODE_COMPLEX_FLOAT);
> -
> -	 tree fntype = build_function_type_list (complex_long_double_type_node,
> -						 long_double_type_node,
> -						 long_double_type_node,
> -						 long_double_type_node,
> -						 long_double_type_node,
> -						 NULL_TREE);
> -
> -	 create_complex_muldiv ("__mulkc3", fncode_mul, fntype);
> -	 create_complex_muldiv ("__divkc3", fncode_div, fntype);
> -       }
> -
>        set_optab_libfunc (add_optab, mode, "__addkf3");
>        set_optab_libfunc (sub_optab, mode, "__subkf3");
>        set_optab_libfunc (neg_optab, mode, "__negkf2");
> @@ -28142,6 +28096,25 @@ rs6000_starting_frame_offset (void)
>    return RS6000_STARTING_FRAME_OFFSET;
>  }
>  \f
> +/* Internal function to return the built-in function id for the complex
> +   multiply operation for a given mode.  */
> +
> +static inline built_in_function
> +complex_multiply_builtin_code (machine_mode mode)
> +{
> +  return (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + mode
> +			      - MIN_MODE_COMPLEX_FLOAT);
> +}
> +
> +/* Internal function to return the built-in function id for the complex divide
> +   operation for a given mode.  */
> +
> +static inline built_in_function
> +complex_divide_builtin_code (machine_mode mode)
> +{
> +  return (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + mode
> +			      - MIN_MODE_COMPLEX_FLOAT);
> +}
>  
>  /* On 64-bit Linux and Freebsd systems, possibly switch the long double library
>     function names from <foo>l to <foo>f128 if the default long double type is
> @@ -28160,11 +28133,54 @@ rs6000_starting_frame_offset (void)
>     only do this transformation if the __float128 type is enabled.  This
>     prevents us from doing the transformation on older 32-bit ports that might
>     have enabled using IEEE 128-bit floating point as the default long double
> -   type.  */
> +   type.
> +
> +   We also use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the
> +   function names used for complex multiply and divide to the appropriate
> +   names.  */
>  
>  static tree
>  rs6000_mangle_decl_assembler_name (tree decl, tree id)
>  {
> +  /* Handle complex multiply/divide.  For IEEE 128-bit, use __mulkc3 or
> +     __divkc3 and for IBM 128-bit use __multc3 and __divtc3.  */
> +  if (TARGET_FLOAT128_TYPE
> +      && !building_libgcc
> +      && TREE_CODE (decl) == FUNCTION_DECL
> +      && DECL_IS_UNDECLARED_BUILTIN (decl)
> +      && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
> +    {
> +      built_in_function id = DECL_FUNCTION_CODE (decl);
> +      const char *newname = NULL;
> +
> +      if (id == complex_multiply_builtin_code (KCmode))
> +	newname = "__mulkc3";
> +
> +      else if (id == complex_multiply_builtin_code (ICmode))
> +	newname = "__multc3";
> +
> +      else if (id == complex_multiply_builtin_code (TCmode))
> +	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
> +
> +      else if (id == complex_divide_builtin_code (KCmode))
> +	newname = "__divkc3";
> +
> +      else if (id == complex_divide_builtin_code (ICmode))
> +	newname = "__divtc3";
> +
> +      else if (id == complex_divide_builtin_code (TCmode))
> +	newname = (TARGET_IEEEQUAD) ? "__divkc3" : "__divtc3";
> +
> +      if (newname)
> +	{
> +	  if (TARGET_DEBUG_BUILTIN)
> +	    fprintf (stderr, "Map complex mul/div => %s\n", newname);
> +
> +	  return get_identifier (newname);
> +	}
> +    }
> +
> +  /* Map long double built-in functions if long double is IEEE 128-bit.  */
>    if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
>        && TREE_CODE (decl) == FUNCTION_DECL
>        && DECL_IS_UNDECLARED_BUILTIN (decl)
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index b63a5d443af..e2de03dda92 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -100,6 +100,10 @@ unsigned int rs6000_recip_control
>  TargetVariable
>  unsigned int rs6000_debug
>  
> +;; Whether we are building libgcc or not.
> +TargetVariable
> +bool building_libgcc = false
> +
>  ;; Whether to enable the -mfloat128 stuff without necessarily enabling the
>  ;; __float128 keyword.
>  TargetSave
> diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-1.c b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
> new file mode 100644
> index 00000000000..1cc6b1be904
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-require-effective-target longdouble128 } */
> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
> +
> +/* Check that complex divide generates the right call for __ibm128 when long
> +   double is IEEE 128-bit floating point.  */
> +
> +typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
> +
> +void
> +divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
> +{
> +  *p = *q / *r;
> +}
> +
> +/* { dg-final { scan-assembler "bl __divtc3" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/divic3-2.c b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
> new file mode 100644
> index 00000000000..8ff342e0116
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/divic3-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-require-effective-target longdouble128 } */
> +/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
> +
> +/* Check that complex divide generates the right call for __ibm128 when long
> +   double is IBM 128-bit floating point.  */
> +
> +typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
> +
> +void
> +divide (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
> +{
> +  *p = *q / *r;
> +}
> +
> +/* { dg-final { scan-assembler "bl __divtc3" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-1.c b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
> new file mode 100644
> index 00000000000..4cd773c4b06
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mulic3-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-require-effective-target longdouble128 } */
> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
> +
> +/* Check that complex multiply generates the right call for __ibm128 when long
> +   double is IEEE 128-bit floating point.  */
> +
> +typedef _Complex long double c_ibm128_t __attribute__((mode(__IC__)));
> +
> +void
> +multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
> +{
> +  *p = *q * *r;
> +}
> +
> +/* { dg-final { scan-assembler "bl __multc3" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/mulic3-2.c b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
> new file mode 100644
> index 00000000000..36fe8bc3061
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mulic3-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-require-effective-target longdouble128 } */
> +/* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
> +
> +/* Check that complex multiply generates the right call for __ibm128 when long
> +   double is IBM 128-bit floating point.  */
> +
> +typedef _Complex long double c_ibm128_t __attribute__((mode(__TC__)));
> +
> +void
> +multiply (c_ibm128_t *p, c_ibm128_t *q, c_ibm128_t *r)
> +{
> +  *p = *q * *r;
> +}
> +
> +/* { dg-final { scan-assembler "bl __multc3" } } */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
                     ` (2 preceding siblings ...)
  2022-12-02 18:01   ` Ping #3: " Michael Meissner
@ 2022-12-06 11:27   ` Kewen.Lin
  2022-12-14  8:46     ` Kewen.Lin
  2023-01-11 20:24   ` Michael Meissner
  4 siblings, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-06 11:27 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, Peter Bergner, David Edelsohn,
	Will Schmidt, William Seurer

Hi Mike,

Thanks for fixing this, some comments are inlined below.

on 2022/11/2 10:42, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).
> 
> 	typedef float TFtype __attribute__((mode (TF)));
> 	typedef __complex float TCtype __attribute__((mode (TC)));
> 
> 	TCtype
> 	__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
> 	{
> 	  TFtype ac, bd, ad, bc, x, y;
> 	  TCtype res;
> 
> 	  ac = a * c;
> 	  bd = b * d;
> 	  ad = a * d;
> 	  bc = b * c;
> 
> 	  x = ac - bd;
> 	  y = ad + bc;
> 
> 	  if (__builtin_isnan (x) && __builtin_isnan (y))
> 	    {
> 	      _Bool recalc = 0;
> 	      if (__builtin_isinf (a) || __builtin_isinf (b))
> 		{
> 
> 		  a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
> 		  b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
> 		  if (__builtin_isnan (c))
> 		    c = __builtin_copysignf128 (0, c);
> 		  if (__builtin_isnan (d))
> 		    d = __builtin_copysignf128 (0, d);
> 		  recalc = 1;
> 		}
> 	      if (__builtin_isinf (c) || __builtin_isinf (d))
> 		{
> 
> 		  c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
> 		  d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
> 		  if (__builtin_isnan (a))
> 		    a = __builtin_copysignf128 (0, a);
> 		  if (__builtin_isnan (b))
> 		    b = __builtin_copysignf128 (0, b);
> 		  recalc = 1;
> 		}
> 	      if (!recalc
> 		  && (__builtin_isinf (ac) || __builtin_isinf (bd)
> 		      || __builtin_isinf (ad) || __builtin_isinf (bc)))
> 		{
> 
> 		  if (__builtin_isnan (a))
> 		    a = __builtin_copysignf128 (0, a);
> 		  if (__builtin_isnan (b))
> 		    b = __builtin_copysignf128 (0, b);
> 		  if (__builtin_isnan (c))
> 		    c = __builtin_copysignf128 (0, c);
> 		  if (__builtin_isnan (d))
> 		    d = __builtin_copysignf128 (0, d);
> 		  recalc = 1;
> 		}
> 	      if (recalc)
> 		{
> 		  x = __builtin_inff128 () * (a * c - b * d);
> 		  y = __builtin_inff128 () * (a * d + b * c);
> 		}
> 	    }
> 
> 	  __real__ res = x;
> 	  __imag__ res = y;
> 	  return res;
> 	}
> 

One further reduced test case can be:

typedef float TFtype __attribute__((mode (TF)));

TFtype test (TFtype t)
{
  return __builtin_copysignf128 (1.0q, t);
}

Since this reduced test case is quite small, maybe it's good to make it as one
test case associated with this patch.

> Currently GCC uses the long double type node for __float128 if long double is
> IEEE 128-bit.  It did not use the node for _Float128.
> 
> Originally this was noticed if you call the nansq function to make a signaling
> NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
> different from __float128, the machine independent code converts signaling NaNs
> to quiet NaNs if the types are not compatible.  The following tests used to
> fail when run on a system where long double is IEEE 128-bit:
> 
> 	gcc.dg/torture/float128-nan.c
> 	gcc.target/powerpc/nan128-1.c
> 
> This patch makes both __float128 and _Float128 use the same type node.
> 
> One side effect of not using the long double type node for __float128 is that we
> must only use KFmode for _Float128/__float128.  The libstdc++ library won't
> build if we use TFmode for _Float128 and __float128 when long double is IEEE
> 128-bit.
> 

Sorry that I didn't get the point of the latter sentence, this proposed patch
uses KFmode for _Float128 and __float128, do you mean that would be fine for
libstdc++ library building since we don't use TFmode for them?

> Another minor side effect is that the f128 round to odd fused multiply-add
> function will not merge negatition with the FMA operation when the type is long
> double.  If the type is __float128 or _Float128, then it will continue to do the
> optimization.  The round to odd functions are defined in terms of __float128
> arguments.  For example:
> 
> 	long double
> 	do_fms (long double a, long double b, long double c)
> 	{
> 	    return __builtin_fmaf128_round_to_odd (a, b, -c);
> 	}
> 
> will generate (assuming -mabi=ieeelongdouble):
> 
> 	xsnegqp 4,4
> 	xsmaddqpo 4,2,3
> 	xxlor 34,36,36
> 
> while:
> 
> 	__float128
> 	do_fms (__float128 a, __float128 b, __float128 c)
> 	{
> 	    return __builtin_fmaf128_round_to_odd (a, b, -c);
> 	}
> 
> will generate:
> 
> 	xsmsubqpo 4,2,3
> 	xxlor 34,36,36
> 

It sounds like a tiny "regression", maybe we should open a PR to track it?

> I tested all 3 patchs for PR target/107299 on:
> 
>     1)	LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
>     2)	LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
>     3)	LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
>     4)	BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
> 
> Once all 3 patches have been applied, we can once again build GCC when long
> double is IEEE 128-bit.  There were no other regressions with these patches.
> Can I check these patches into the trunk?
> 
> 2022-11-01   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	PR target/107299
> 	* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Always use the
> 	_Float128 type for __float128.
> 	(rs6000_expand_builtin): Only change a KFmode built-in to TFmode, if the
> 	built-in passes or returns TFmode.  If the predicate failed because the
> 	modes were different, use convert_move to load up the value instead of
> 	copy_to_mode_reg.
> 	* config/rs6000/rs6000.cc (rs6000_translate_mode_attribute): Don't
> 	translate IEEE 128-bit floating point modes to explicit IEEE 128-bit
> 	modes (KFmode or KCmode), even if long double is IEEE 128-bit.

You meant to say "Translate .." not "Don't .."?

> 	(rs6000_libgcc_floating_mode_supported_p): Support KFmode all of the
> 	time if we support IEEE 128-bit floating point.
> 	(rs6000_floatn_mode): _Float128 and _Float128x always uses KFmode.
> 
> gcc/testsuite/
> 
> 	PR target/107299
> 	* gcc.target/powerpc/float128-hw12.c: New test.
> 	* gcc.target/powerpc/float128-hw13.c: Likewise.
> 	* gcc.target/powerpc/float128-hw4.c: Update insns.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc           | 237 ++++++++++--------
>  gcc/config/rs6000/rs6000.cc                   |  31 ++-
>  .../gcc.target/powerpc/float128-hw12.c        | 137 ++++++++++
>  .../gcc.target/powerpc/float128-hw13.c        | 137 ++++++++++
>  .../gcc.target/powerpc/float128-hw4.c         |  10 +-
>  5 files changed, 431 insertions(+), 121 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw12.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-hw13.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 90ab39dc258..e5298f45363 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -730,25 +730,28 @@ rs6000_init_builtins (void)
>  
>    if (TARGET_FLOAT128_TYPE)
>      {
> -      if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
> -	ieee128_float_type_node = long_double_type_node;
> -      else
> +      /* In the past we used long_double_type_node when long double was IEEE
> +	 128-bit.  However, this means that the _Float128 type
> +	 (i.e. float128_type_node) is a different type from __float128
> +	 (i.e. ieee128_float_type_nonde).  This leads to some corner cases,
                                  ~~~~ node
> +	 such as processing signaling NaNs with the nansf128 built-in function
> +	 (which returns a _Float128 value) and assign it to a long double or
> +	 __float128 value.  The two explicit IEEE 128-bit types should always
> +	 use the same internal type.
> +
> +	 For C we only need to register the __ieee128 name for it.  For C++, we
> +	 create a distinct type which will mangle differently (u9__ieee128)
> +	 vs. _Float128 (DF128_) and behave backwards compatibly.  */
> +      if (float128t_type_node == NULL_TREE)
>  	{
> -	  /* For C we only need to register the __ieee128 name for
> -	     it.  For C++, we create a distinct type which will mangle
> -	     differently (u9__ieee128) vs. _Float128 (DF128_) and behave
> -	     backwards compatibly.  */
> -	  if (float128t_type_node == NULL_TREE)
> -	    {
> -	      float128t_type_node = make_node (REAL_TYPE);
> -	      TYPE_PRECISION (float128t_type_node)
> -		= TYPE_PRECISION (float128_type_node);
> -	      layout_type (float128t_type_node);
> -	      SET_TYPE_MODE (float128t_type_node,
> -			     TYPE_MODE (float128_type_node));
> -	    }
> -	  ieee128_float_type_node = float128t_type_node;
> +	  float128t_type_node = make_node (REAL_TYPE);
> +	  TYPE_PRECISION (float128t_type_node)
> +	    = TYPE_PRECISION (float128_type_node);
> +	  layout_type (float128t_type_node);
> +	  SET_TYPE_MODE (float128t_type_node,
> +			 TYPE_MODE (float128_type_node));
>  	}
> +      ieee128_float_type_node = float128t_type_node;
>        t = build_qualified_type (ieee128_float_type_node, TYPE_QUAL_CONST);
>        lang_hooks.types.register_builtin_type (ieee128_float_type_node,
>  					      "__ieee128");
> @@ -3265,13 +3268,13 @@ htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
>  
>  /* Expand an expression EXP that calls a built-in function,
>     with result going to TARGET if that's convenient
> -   (and in mode MODE if that's convenient).
> +   (and in mode RETURN_MODE if that's convenient).
>     SUBTARGET may be used as the target for computing one of EXP's operands.
>     IGNORE is nonzero if the value is to be ignored.
>     Use the new builtin infrastructure.  */
>  rtx
>  rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
> -		       machine_mode /* mode */, int ignore)
> +		       machine_mode return_mode, int ignore)
>  {
>    tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
>    enum rs6000_gen_builtins fcode
> @@ -3287,78 +3290,99 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
>    size_t uns_fcode = (size_t)fcode;
>    enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;
>  
> -  /* TODO: The following commentary and code is inherited from the original
> -     builtin processing code.  The commentary is a bit confusing, with the
> -     intent being that KFmode is always IEEE-128, IFmode is always IBM
> -     double-double, and TFmode is the current long double.  The code is
> -     confusing in that it converts from KFmode to TFmode pattern names,
> -     when the other direction is more intuitive.  Try to address this.  */
> -
> -  /* We have two different modes (KFmode, TFmode) that are the IEEE
> -     128-bit floating point type, depending on whether long double is the
> -     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
> -     It is simpler if we only define one variant of the built-in function,
> -     and switch the code when defining it, rather than defining two built-
> -     ins and using the overload table in rs6000-c.cc to switch between the
> -     two.  If we don't have the proper assembler, don't do this switch
> -     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
> -  if (FLOAT128_IEEE_P (TFmode))
> -    switch (icode)
> -      {
> -      case CODE_FOR_sqrtkf2_odd:
> -	icode = CODE_FOR_sqrttf2_odd;
> -	break;
> -      case CODE_FOR_trunckfdf2_odd:
> -	icode = CODE_FOR_trunctfdf2_odd;
> -	break;
> -      case CODE_FOR_addkf3_odd:
> -	icode = CODE_FOR_addtf3_odd;
> -	break;
> -      case CODE_FOR_subkf3_odd:
> -	icode = CODE_FOR_subtf3_odd;
> -	break;
> -      case CODE_FOR_mulkf3_odd:
> -	icode = CODE_FOR_multf3_odd;
> -	break;
> -      case CODE_FOR_divkf3_odd:
> -	icode = CODE_FOR_divtf3_odd;
> -	break;
> -      case CODE_FOR_fmakf4_odd:
> -	icode = CODE_FOR_fmatf4_odd;
> -	break;
> -      case CODE_FOR_xsxexpqp_kf:
> -	icode = CODE_FOR_xsxexpqp_tf;
> -	break;
> -      case CODE_FOR_xsxsigqp_kf:
> -	icode = CODE_FOR_xsxsigqp_tf;
> -	break;
> -      case CODE_FOR_xststdcnegqp_kf:
> -	icode = CODE_FOR_xststdcnegqp_tf;
> -	break;
> -      case CODE_FOR_xsiexpqp_kf:
> -	icode = CODE_FOR_xsiexpqp_tf;
> -	break;
> -      case CODE_FOR_xsiexpqpf_kf:
> -	icode = CODE_FOR_xsiexpqpf_tf;
> -	break;
> -      case CODE_FOR_xststdcqp_kf:
> -	icode = CODE_FOR_xststdcqp_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_eq_kf:
> -	icode = CODE_FOR_xscmpexpqp_eq_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_lt_kf:
> -	icode = CODE_FOR_xscmpexpqp_lt_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_gt_kf:
> -	icode = CODE_FOR_xscmpexpqp_gt_tf;
> -	break;
> -      case CODE_FOR_xscmpexpqp_unordered_kf:
> -	icode = CODE_FOR_xscmpexpqp_unordered_tf;
> -	break;
> -      default:
> -	break;
> -      }
> +  /* For 128-bit long double, we may need both the KFmode built-in functions
> +     and IFmode built-in functions to the equivalent TFmode built-in function,
> +     if either a TFmode result is expected or any of the arguments use
> +     TFmode.  */
> +  if (TARGET_LONG_DOUBLE_128)
> +    {
> +      bool uses_tf_mode = return_mode == TFmode;
> +      if (!uses_tf_mode)
> +	{
> +	  call_expr_arg_iterator iter;
> +	  tree arg;
> +	  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> +	    {
> +	      if (arg != error_mark_node
> +		  && TYPE_MODE (TREE_TYPE (arg)) == TFmode)
> +		{
> +		  uses_tf_mode = true;
> +		  break;
> +		}
> +	    }
> +	}
> +
> +      /* Convert KFmode built-in functions to TFmode when long double is IEEE
> +	 128-bit.  */
> +      if (uses_tf_mode && FLOAT128_IEEE_P (TFmode))
> +	switch (icode)
> +	  {
> +	  case CODE_FOR_sqrtkf2_odd:
> +	    icode = CODE_FOR_sqrttf2_odd;
> +	    break;
> +	  case CODE_FOR_trunckfdf2_odd:
> +	    icode = CODE_FOR_trunctfdf2_odd;
> +	    break;
> +	  case CODE_FOR_addkf3_odd:
> +	    icode = CODE_FOR_addtf3_odd;
> +	    break;
> +	  case CODE_FOR_subkf3_odd:
> +	    icode = CODE_FOR_subtf3_odd;
> +	    break;
> +	  case CODE_FOR_mulkf3_odd:
> +	    icode = CODE_FOR_multf3_odd;
> +	    break;
> +	  case CODE_FOR_divkf3_odd:
> +	    icode = CODE_FOR_divtf3_odd;
> +	    break;
> +	  case CODE_FOR_fmakf4_odd:
> +	    icode = CODE_FOR_fmatf4_odd;
> +	    break;
> +	  case CODE_FOR_xsxexpqp_kf:
> +	    icode = CODE_FOR_xsxexpqp_tf;
> +	    break;
> +	  case CODE_FOR_xsxsigqp_kf:
> +	    icode = CODE_FOR_xsxsigqp_tf;
> +	    break;
> +	  case CODE_FOR_xststdcnegqp_kf:
> +	    icode = CODE_FOR_xststdcnegqp_tf;
> +	    break;
> +	  case CODE_FOR_xsiexpqp_kf:
> +	    icode = CODE_FOR_xsiexpqp_tf;
> +	    break;
> +	  case CODE_FOR_xsiexpqpf_kf:
> +	    icode = CODE_FOR_xsiexpqpf_tf;
> +	    break;
> +	  case CODE_FOR_xststdcqp_kf:
> +	    icode = CODE_FOR_xststdcqp_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_eq_kf:
> +	    icode = CODE_FOR_xscmpexpqp_eq_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_lt_kf:
> +	    icode = CODE_FOR_xscmpexpqp_lt_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_gt_kf:
> +	    icode = CODE_FOR_xscmpexpqp_gt_tf;
> +	    break;
> +	  case CODE_FOR_xscmpexpqp_unordered_kf:
> +	    icode = CODE_FOR_xscmpexpqp_unordered_tf;
> +	    break;
> +	  default:
> +	    break;
> +	  }
> +
> +      /* Convert IFmode built-in functions to TFmode when long double is IBM
> +	 128-bit.  */
> +      else if (uses_tf_mode && FLOAT128_IBM_P (TFmode))
> +	{
> +	  if (icode == CODE_FOR_packif)
> +	    icode = CODE_FOR_packtf;
> +
> +	  else if (icode == CODE_FOR_unpackif)
> +	    icode = CODE_FOR_unpacktf;
> +	}
> +    }
>  
>    /* In case of "#pragma target" changes, we initialize all builtins
>       but check for actual availability now, during expand time.  For
> @@ -3481,18 +3505,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
>  
>    if (bif_is_ibm128 (*bifaddr) && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
>      {
> -      if (fcode == RS6000_BIF_PACK_IF)
> -	{
> -	  icode = CODE_FOR_packtf;
> -	  fcode = RS6000_BIF_PACK_TF;
> -	  uns_fcode = (size_t) fcode;
> -	}
> -      else if (fcode == RS6000_BIF_UNPACK_IF)
> -	{
> -	  icode = CODE_FOR_unpacktf;
> -	  fcode = RS6000_BIF_UNPACK_TF;
> -	  uns_fcode = (size_t) fcode;
> -	}
>      }

This remaining "if" hunk is useless.  The others looks good to me.

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit)
  2022-11-02  2:39 Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Michael Meissner
                   ` (2 preceding siblings ...)
  2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
@ 2022-12-06 14:56 ` Segher Boessenkool
  2022-12-06 15:03   ` Jakub Jelinek
  3 siblings, 1 reply; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-06 14:56 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, William Seurer

Hi!

On Tue, Nov 01, 2022 at 10:39:04PM -0400, Michael Meissner wrote:
> The basic issue is internally within GCC there are several types for 128-bit
> floating point.  The types are:
> 
>     1)	The long double type (TFmode or possibly DFmode).

Please be very careful to call types types and modes modes?  Modes are
not types, types are not modes, only in the simplest cases is there a
direct relationship.  Here you probably meant these modes are used for
these type, but please be explicit.

> 	In the past, _Float128 was a C extended type,
> 	but now it is a part of the C/C++ 2x standards.

Only if you select a new enough -std=, it still is an extended type if
not?


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit)
  2022-12-06 14:56 ` Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Segher Boessenkool
@ 2022-12-06 15:03   ` Jakub Jelinek
  2022-12-13 14:11     ` Segher Boessenkool
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Jelinek @ 2022-12-06 15:03 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, William Seurer

On Tue, Dec 06, 2022 at 08:56:09AM -0600, Segher Boessenkool wrote:
> > 	In the past, _Float128 was a C extended type,
> > 	but now it is a part of the C/C++ 2x standards.
> 
> Only if you select a new enough -std=, it still is an extended type if
> not?

No, as an extension _Float{16,32,64,128}{,x} are available (where the backend
has support for such IEEE format) even in older C or C++ modes,
similarly the {f,F}{16,32,64,128} suffixes on literals (with pedwarn
on everything but C++23).  In C++ it is in all language modes treated as
distinct type from __float128, and _FloatNN is handled as extended floating
point type per C++23 rules, while __float128 is not.
Though, the libstdc++ support for the new extended floating point types
is mostly enabled only for C++23 (one exception is std::format).

	Jakub


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-06  9:36   ` Kewen.Lin
@ 2022-12-07  6:44     ` Michael Meissner
  2022-12-07  7:55       ` Kewen.Lin
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Meissner @ 2022-12-07  6:44 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> Thanks for fixing this!
> 
> Could you help to elaborate why we need to disable it during libgcc building?

When you are building libgcc, you are building the __mulkc3, __divkc3
functions.  The mapping in the compiler interferes with those functions,
because at the moment, libgcc uses an alternate IEEE 128-bit type.

I have a patch for making libgcc use the 'right' type that I haven't submitted
yet.  This is because the more general fix that these 3 patches do impacts other
functions (due to __float128 and _Float128 being different in the current
compiler when -mabi=ieeelongdouble).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-07  6:44     ` Michael Meissner
@ 2022-12-07  7:55       ` Kewen.Lin
  2022-12-08 22:04         ` Michael Meissner
  0 siblings, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-07  7:55 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, Peter Bergner,
	William Seurer, Will Schmidt

Hi Mike,

on 2022/12/7 14:44, Michael Meissner wrote:
> On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> Thanks for fixing this!
>>
>> Could you help to elaborate why we need to disable it during libgcc building?
> 
> When you are building libgcc, you are building the __mulkc3, __divkc3
> functions.  The mapping in the compiler interferes with those functions,
> because at the moment, libgcc uses an alternate IEEE 128-bit type.
> 

But I'm still confused.  For __mulkc3 (__divkc3 is similar),

1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true, define __LONG_DOUBLE_IEEE128__),
   the used types are:

   typedef float TFtype __attribute__ ((mode (TF)));
   typedef __complex float TCtype __attribute__ ((mode (TC)));

2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false, not __LONG_DOUBLE_IEEE128__ defined),
   the used types are:

   typedef float TFtype __attribute__ ((mode (KF)));
   typedef __complex float TCtype __attribute__ ((mode (KC)));

The proposed mapping in the current patch is:

+
+      if (id == complex_multiply_builtin_code (KCmode))
+	newname = "__mulkc3";
+
+      else if (id == complex_multiply_builtin_code (ICmode))
+	newname = "__multc3";
+
+      else if (id == complex_multiply_builtin_code (TCmode))
+	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";

for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
for 2), KCmode => "__mulkc3"

Both should be still with name "__mulkc3", do I miss anything?

BR,
Kewen

> I have a patch for making libgcc use the 'right' type that I haven't submitted
> yet.  This is because the more general fix that these 3 patches do impacts other
> functions (due to __float128 and _Float128 being different in the current
> compiler when -mabi=ieeelongdouble).
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-07  7:55       ` Kewen.Lin
@ 2022-12-08 22:04         ` Michael Meissner
  2022-12-12 10:20           ` Kewen.Lin
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Meissner @ 2022-12-08 22:04 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Peter Bergner, William Seurer, Will Schmidt

On Wed, Dec 07, 2022 at 03:55:41PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> on 2022/12/7 14:44, Michael Meissner wrote:
> > On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
> >> Hi Mike,
> >>
> >> Thanks for fixing this!
> >>
> >> Could you help to elaborate why we need to disable it during libgcc building?
> > 
> > When you are building libgcc, you are building the __mulkc3, __divkc3
> > functions.  The mapping in the compiler interferes with those functions,
> > because at the moment, libgcc uses an alternate IEEE 128-bit type.
> > 
> 
> But I'm still confused.  For __mulkc3 (__divkc3 is similar),
> 
> 1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true, define __LONG_DOUBLE_IEEE128__),
>    the used types are:
> 
>    typedef float TFtype __attribute__ ((mode (TF)));
>    typedef __complex float TCtype __attribute__ ((mode (TC)));
> 
> 2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false, not __LONG_DOUBLE_IEEE128__ defined),
>    the used types are:
> 
>    typedef float TFtype __attribute__ ((mode (KF)));
>    typedef __complex float TCtype __attribute__ ((mode (KC)));
> 
> The proposed mapping in the current patch is:
> 
> +
> +      if (id == complex_multiply_builtin_code (KCmode))
> +	newname = "__mulkc3";
> +
> +      else if (id == complex_multiply_builtin_code (ICmode))
> +	newname = "__multc3";
> +
> +      else if (id == complex_multiply_builtin_code (TCmode))
> +	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
> 
> for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
> for 2), KCmode => "__mulkc3"
> 
> Both should be still with name "__mulkc3", do I miss anything?
> 
> BR,
> Kewen

The reason is due to the different internal types, the value range propigation
pass throws an error when we are trying to build libgcc.  This is due to the
underlying problem of different IEEE 128-bit types within the compiler.

The 128-bit IEEE support in libgcc was written before _Float128 was added to
GCC.  One consequence is that you can't get to the complex variant of
__float128.  So libgcc needs to use the attribute mode to get to that type.

But with the support for IEEE 128-bit long double changing things, it makes the
libgcc code use the wrong code.

/home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c: In function ‘__mulkc3_sw’:
/home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c:97:1: internal compiler error: in fold_stmt, at gimple-range-fold.cc:522
   97 | }
      | ^
0x122784f3 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
        /home/meissner/fsf-src/work102/gcc/gimple-range-fold.cc:522
0x1226477f gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
        /home/meissner/fsf-src/work102/gcc/gimple-range.cc:257
0x12264b1f gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
        /home/meissner/fsf-src/work102/gcc/gimple-range.cc:318
0x113bdd8b range_query::value_of_stmt(gimple*, tree_node*)
        /home/meissner/fsf-src/work102/gcc/value-query.cc:134
0x1134838f rvrp_folder::value_of_stmt(gimple*, tree_node*)
        /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1023
0x111344cf substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
        /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:819
0x121ecbd3 dom_walker::walk(basic_block_def*)
        /home/meissner/fsf-src/work102/gcc/domwalk.cc:311
0x11134ee7 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
        /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:998
0x11346bb7 execute_ranger_vrp(function*, bool, bool)
        /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1084
0x11347063 execute
        /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1165
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[1]: *** [/home/meissner/fsf-src/work102/libgcc/shared-object.mk:14: _mulkc3.o] Error 1
make[1]: Leaving directory '/home/meissner/fsf-build-ppc64le/work102/powerpc64le-unknown-linux-gnu/libgcc'
make: *** [Makefile:20623: all-target-libgcc] Error 2

> > I have a patch for making libgcc use the 'right' type that I haven't submitted
> > yet.  This is because the more general fix that these 3 patches do impacts other
> > functions (due to __float128 and _Float128 being different in the current
> > compiler when -mabi=ieeelongdouble).
> > 

The patch is to use _Float128 and _Complex _Float128 in libgcc.h instead of
trying to use attribute((mode(TF))) and attribute((mode(TC))) in libgcc.

Now, this patch fixes the specific problem of not being able to build libgcc
(along with patch #1 of the series).  But other things show the differences
from time time because we are using different internal types and the middle end
doesn't know that these types are really the same bits.

It is better long term (IMHO) if we have the two types (__float128 and
_Float128) use the same internal type (which is what is done in patches #2 and
#3).  This fixes the other issues that show up, such as creating signaling NaNs
for one internal type, and converting it to the other internal type, loses that
the NaN is signalling.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-08 22:04         ` Michael Meissner
@ 2022-12-12 10:20           ` Kewen.Lin
  2022-12-13  6:14             ` Michael Meissner
  0 siblings, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-12 10:20 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, William Seurer,
	Will Schmidt, Peter Bergner

on 2022/12/9 06:04, Michael Meissner wrote:
> On Wed, Dec 07, 2022 at 03:55:41PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> on 2022/12/7 14:44, Michael Meissner wrote:
>>> On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
>>>> Hi Mike,
>>>>
>>>> Thanks for fixing this!
>>>>
>>>> Could you help to elaborate why we need to disable it during libgcc building?
>>>
>>> When you are building libgcc, you are building the __mulkc3, __divkc3
>>> functions.  The mapping in the compiler interferes with those functions,
>>> because at the moment, libgcc uses an alternate IEEE 128-bit type.
>>>
>>
>> But I'm still confused.  For __mulkc3 (__divkc3 is similar),
>>
>> 1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true, define __LONG_DOUBLE_IEEE128__),
>>    the used types are:
>>
>>    typedef float TFtype __attribute__ ((mode (TF)));
>>    typedef __complex float TCtype __attribute__ ((mode (TC)));
>>
>> 2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false, not __LONG_DOUBLE_IEEE128__ defined),
>>    the used types are:
>>
>>    typedef float TFtype __attribute__ ((mode (KF)));
>>    typedef __complex float TCtype __attribute__ ((mode (KC)));
>>
>> The proposed mapping in the current patch is:
>>
>> +
>> +      if (id == complex_multiply_builtin_code (KCmode))
>> +	newname = "__mulkc3";
>> +
>> +      else if (id == complex_multiply_builtin_code (ICmode))
>> +	newname = "__multc3";
>> +
>> +      else if (id == complex_multiply_builtin_code (TCmode))
>> +	newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
>>
>> for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
>> for 2), KCmode => "__mulkc3"
>>
>> Both should be still with name "__mulkc3", do I miss anything?
>>
>> BR,
>> Kewen
> 
> The reason is due to the different internal types, the value range propigation
> pass throws an error when we are trying to build libgcc.  This is due to the
> underlying problem of different IEEE 128-bit types within the compiler.
> 

But this is the reason why we need patch #2 and #3, not the reason why we need
the special handling for building_libgcc in patch #1, right?

Without or with patch #1, the below ICE in libgcc exists, the ICE should have
nothing to do with the special handling for building_libgcc in patch #1.  I
think patch #2 which makes _Float128 and __float128 use the same internal
type fixes that ICE.

I still don't get the point why we need the special handling for building_libgcc,
I also tested on top of patch #1 and #2 w/ and w/o the special handling for
building_libgcc, both bootstrapped and regress-tested.

Could you have a double check?

> The 128-bit IEEE support in libgcc was written before _Float128 was added to
> GCC.  One consequence is that you can't get to the complex variant of
> __float128.  So libgcc needs to use the attribute mode to get to that type.
> 
> But with the support for IEEE 128-bit long double changing things, it makes the
> libgcc code use the wrong code.
> 
> /home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c: In function ‘__mulkc3_sw’:
> /home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c:97:1: internal compiler error: in fold_stmt, at gimple-range-fold.cc:522
>    97 | }
>       | ^
> 0x122784f3 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
>         /home/meissner/fsf-src/work102/gcc/gimple-range-fold.cc:522
> 0x1226477f gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
>         /home/meissner/fsf-src/work102/gcc/gimple-range.cc:257
> 0x12264b1f gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
>         /home/meissner/fsf-src/work102/gcc/gimple-range.cc:318
> 0x113bdd8b range_query::value_of_stmt(gimple*, tree_node*)
>         /home/meissner/fsf-src/work102/gcc/value-query.cc:134
> 0x1134838f rvrp_folder::value_of_stmt(gimple*, tree_node*)
>         /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1023
> 0x111344cf substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
>         /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:819
> 0x121ecbd3 dom_walker::walk(basic_block_def*)
>         /home/meissner/fsf-src/work102/gcc/domwalk.cc:311
> 0x11134ee7 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
>         /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:998
> 0x11346bb7 execute_ranger_vrp(function*, bool, bool)
>         /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1084
> 0x11347063 execute
>         /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1165
> Please submit a full bug report, with preprocessed source (by using -freport-bug).
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> make[1]: *** [/home/meissner/fsf-src/work102/libgcc/shared-object.mk:14: _mulkc3.o] Error 1
> make[1]: Leaving directory '/home/meissner/fsf-build-ppc64le/work102/powerpc64le-unknown-linux-gnu/libgcc'
> make: *** [Makefile:20623: all-target-libgcc] Error 2
> 
>>> I have a patch for making libgcc use the 'right' type that I haven't submitted
>>> yet.  This is because the more general fix that these 3 patches do impacts other
>>> functions (due to __float128 and _Float128 being different in the current
>>> compiler when -mabi=ieeelongdouble).
>>>
> 
> The patch is to use _Float128 and _Complex _Float128 in libgcc.h instead of
> trying to use attribute((mode(TF))) and attribute((mode(TC))) in libgcc.
> 

Since your patch #2 (and #3) fixes ICE and some exposed problems, and _Float128
is to use the same internal type as __float128, types with attribute((mode(TF)))
and attribute((mode(TC))) should be correct, I assume that this patch is just
to make the types explicit be with _Float128 (for better readability and
maintainance), but not for any correctness issues.

> Now, this patch fixes the specific problem of not being able to build libgcc
> (along with patch #1 of the series).  But other things show the differences
> from time time because we are using different internal types and the middle end
> doesn't know that these types are really the same bits.
> 
> It is better long term (IMHO) if we have the two types (__float128 and
> _Float128) use the same internal type (which is what is done in patches #2 and
> #3).  This fixes the other issues that show up, such as creating signaling NaNs
> for one internal type, and converting it to the other internal type, loses that
> the NaN is signalling.
> 

I see, nice!

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-12 10:20           ` Kewen.Lin
@ 2022-12-13  6:14             ` Michael Meissner
  2022-12-13 13:51               ` Segher Boessenkool
  2022-12-14  8:45               ` Kewen.Lin
  0 siblings, 2 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-13  6:14 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, William Seurer, Will Schmidt, Peter Bergner

On Mon, Dec 12, 2022 at 06:20:14PM +0800, Kewen.Lin wrote:
> Without or with patch #1, the below ICE in libgcc exists, the ICE should have
> nothing to do with the special handling for building_libgcc in patch #1.  I
> think patch #2 which makes _Float128 and __float128 use the same internal
> type fixes that ICE.
> 
> I still don't get the point why we need the special handling for building_libgcc,
> I also tested on top of patch #1 and #2 w/ and w/o the special handling for
> building_libgcc, both bootstrapped and regress-tested.
> 
> Could you have a double check?

As long as patch #2 and #3 are installed, we don't need the special handling
for building_libgcc.  Good catch.

I will send out a replacement patch for it.

> Since your patch #2 (and #3) fixes ICE and some exposed problems, and _Float128
> is to use the same internal type as __float128, types with attribute((mode(TF)))
> and attribute((mode(TC))) should be correct, I assume that this patch is just
> to make the types explicit be with _Float128 (for better readability and
> maintainance), but not for any correctness issues.

Yes, the patch is mainly for clarity.  The history is the libgcc support went
in before _Float128 went in, and I never went back to use those types when we
could use them.

With _Float128, we can just use _Complex _Float128 and not
bother with trying to get the right KC/TC for the attribute mode stuff.

However, if patches 1-3 aren't put in, just applying the patch to use _Float128
and _Complex _Float128 would fix the immediate problem (of not building GCC on
systems with IEEE 128-bit long double).  However, it is a band-aid that just
works around the problem of building __mulkc3 and __divkc3.  It doesn't fix the
other problems between __float128 and _Float128 that show up in some places
that I would like to get fixed.

So I haven't submitted the patch, because I think it is more important to get
the other issues fixed.

> > Now, this patch fixes the specific problem of not being able to build libgcc
> > (along with patch #1 of the series).  But other things show the differences
> > from time time because we are using different internal types and the middle end
> > doesn't know that these types are really the same bits.
> > 
> > It is better long term (IMHO) if we have the two types (__float128 and
> > _Float128) use the same internal type (which is what is done in patches #2 and
> > #3).  This fixes the other issues that show up, such as creating signaling NaNs
> > for one internal type, and converting it to the other internal type, loses that
> > the NaN is signalling.
> > 
> 
> I see, nice!
> 
> BR,
> Kewen

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
                     ` (3 preceding siblings ...)
  2022-12-06  9:36   ` Kewen.Lin
@ 2022-12-13  6:23   ` Michael Meissner
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-13  6:23 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

I have submitted a new replacement patch for this patch:

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: <Y2HYqx4zLCNCT0Zy@toto.the-meissners.org>
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608368.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-13  6:14             ` Michael Meissner
@ 2022-12-13 13:51               ` Segher Boessenkool
  2022-12-14  8:45               ` Kewen.Lin
  1 sibling, 0 replies; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-13 13:51 UTC (permalink / raw)
  To: Michael Meissner, Kewen.Lin, gcc-patches, David Edelsohn,
	William Seurer, Will Schmidt, Peter Bergner

On Tue, Dec 13, 2022 at 01:14:39AM -0500, Michael Meissner wrote:
> On Mon, Dec 12, 2022 at 06:20:14PM +0800, Kewen.Lin wrote:
> > Without or with patch #1, the below ICE in libgcc exists, the ICE should have
> > nothing to do with the special handling for building_libgcc in patch #1.  I
> > think patch #2 which makes _Float128 and __float128 use the same internal
> > type fixes that ICE.
> > 
> > I still don't get the point why we need the special handling for building_libgcc,
> > I also tested on top of patch #1 and #2 w/ and w/o the special handling for
> > building_libgcc, both bootstrapped and regress-tested.
> > 
> > Could you have a double check?
> 
> As long as patch #2 and #3 are installed, we don't need the special handling
> for building_libgcc.  Good catch.
> 
> I will send out a replacement patch for it.

Please send a complete new series replacing this one.  Thanks.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit)
  2022-12-06 15:03   ` Jakub Jelinek
@ 2022-12-13 14:11     ` Segher Boessenkool
  0 siblings, 0 replies; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-13 14:11 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, William Seurer

On Tue, Dec 06, 2022 at 04:03:35PM +0100, Jakub Jelinek wrote:
> On Tue, Dec 06, 2022 at 08:56:09AM -0600, Segher Boessenkool wrote:
> > > 	In the past, _Float128 was a C extended type,
> > > 	but now it is a part of the C/C++ 2x standards.
> > 
> > Only if you select a new enough -std=, it still is an extended type if
> > not?
> 
> No, as an extension _Float{16,32,64,128}{,x} are available (where the backend
> has support for such IEEE format) even in older C or C++ modes,
> similarly the {f,F}{16,32,64,128} suffixes on literals (with pedwarn
> on everything but C++23).  In C++ it is in all language modes treated as
> distinct type from __float128, and _FloatNN is handled as extended floating
> point type per C++23 rules, while __float128 is not.

Right, so there is no "in the past" here, the actual situation is quite
different.

Reviewing these patches is harder and a lot more work than writing them
can ever have been :-( :-( :-(


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
  2022-12-13  6:14             ` Michael Meissner
  2022-12-13 13:51               ` Segher Boessenkool
@ 2022-12-14  8:45               ` Kewen.Lin
  1 sibling, 0 replies; 41+ messages in thread
From: Kewen.Lin @ 2022-12-14  8:45 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, William Seurer,
	Will Schmidt, Peter Bergner

on 2022/12/13 14:14, Michael Meissner wrote:
> On Mon, Dec 12, 2022 at 06:20:14PM +0800, Kewen.Lin wrote:
>> Without or with patch #1, the below ICE in libgcc exists, the ICE should have
>> nothing to do with the special handling for building_libgcc in patch #1.  I
>> think patch #2 which makes _Float128 and __float128 use the same internal
>> type fixes that ICE.
>>
>> I still don't get the point why we need the special handling for building_libgcc,
>> I also tested on top of patch #1 and #2 w/ and w/o the special handling for
>> building_libgcc, both bootstrapped and regress-tested.
>>
>> Could you have a double check?
> 
> As long as patch #2 and #3 are installed, we don't need the special handling
> for building_libgcc.  Good catch.
> 
> I will send out a replacement patch for it.

Thanks!  I still feel patch #1 is independent, it helps to fix the issues as
shown in its associated test case, which looks an oversight in the previous
implementation to me. :)

> 
>> Since your patch #2 (and #3) fixes ICE and some exposed problems, and _Float128
>> is to use the same internal type as __float128, types with attribute((mode(TF)))
>> and attribute((mode(TC))) should be correct, I assume that this patch is just
>> to make the types explicit be with _Float128 (for better readability and
>> maintainance), but not for any correctness issues.
> 
> Yes, the patch is mainly for clarity.  The history is the libgcc support went
> in before _Float128 went in, and I never went back to use those types when we
> could use them.
> 
> With _Float128, we can just use _Complex _Float128 and not
> bother with trying to get the right KC/TC for the attribute mode stuff.
> 
> However, if patches 1-3 aren't put in, just applying the patch to use _Float128
> and _Complex _Float128 would fix the immediate problem (of not building GCC on
> systems with IEEE 128-bit long double).  However, it is a band-aid that just
> works around the problem of building __mulkc3 and __divkc3.  It doesn't fix the
> other problems between __float128 and _Float128 that show up in some places
> that I would like to get fixed.
> 
> So I haven't submitted the patch, because I think it is more important to get
> the other issues fixed.

OK, make sense, thanks for the clarification!

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-06 11:27   ` Kewen.Lin
@ 2022-12-14  8:46     ` Kewen.Lin
  2022-12-14  9:36       ` Jakub Jelinek
  0 siblings, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-14  8:46 UTC (permalink / raw)
  To: Michael Meissner
  Cc: gcc-patches, Segher Boessenkool, Peter Bergner, David Edelsohn,
	Will Schmidt, William Seurer, Joseph Myers

on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
> Hi Mike,
> 
> Thanks for fixing this, some comments are inlined below.
> 
> on 2022/11/2 10:42, Michael Meissner wrote:
>> This patch fixes the issue that GCC cannot build when the default long double
>> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
>> to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
>> during the evrp pass.  Ultimately it is failing because the code declared the
>> type to use TFmode but it used F128 functions (i.e. KFmode).

By further looking into this, I found that though __float128 and _Float128 types
are two different types, they have the same mode TFmode, the unexpected thing is
these two types have different precision.  I noticed it's due to the "workaround"
in build_common_tree_nodes:

      /* Work around the rs6000 KFmode having precision 113 not
	 128.  */
      const struct real_format *fmt = REAL_MODE_FORMAT (mode);
      gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
      int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
      if (!extended)
	gcc_assert (min_precision == n);
      if (precision < min_precision)
	precision = min_precision;

Since function useless_type_conversion_p considers two float types are compatible
if they have the same mode, so it doesn't require the explicit conversions between
these two types.  I think it's exactly what we want.  And to me, it looks unexpected
to have two types with the same mode but different precision.

So could we consider disabling the above workaround to make _Float128 have the same
precision as __float128 (long double) (the underlying TFmode)?  I tried the below
change:

diff --git a/gcc/tree.cc b/gcc/tree.cc
index 254b2373dcf..10fcb3d88ca 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9442,6 +9442,7 @@ build_common_tree_nodes (bool signed_char)
       if (!targetm.floatn_mode (n, extended).exists (&mode))
         continue;
       int precision = GET_MODE_PRECISION (mode);
+#if 0
       /* Work around the rs6000 KFmode having precision 113 not
          128.  */
       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
@@ -9451,6 +9452,7 @@ build_common_tree_nodes (bool signed_char)
         gcc_assert (min_precision == n);
       if (precision < min_precision)
         precision = min_precision;
+#endif
       FLOATN_NX_TYPE_NODE (i) = make_node (REAL_TYPE);
       TYPE_PRECISION (FLOATN_NX_TYPE_NODE (i)) = precision;
       layout_type (FLOATN_NX_TYPE_NODE (i));

It can be bootstrapped (fixing the ICE in PR107299).  Comparing with the baseline
regression test results with patch #1, #2 and #3, I got some positive:

FAIL->PASS: 17_intro/headers/c++2020/all_attributes.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_no_exceptions.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_no_rtti.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/all_pedantic_errors.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/operator_names.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/stdc++.cc (test for excess errors)
FAIL->PASS: 17_intro/headers/c++2020/stdc++_multiple_inclusion.cc (test for excess errors)
FAIL->PASS: std/format/arguments/args.cc (test for excess errors)
FAIL->PASS: std/format/error.cc (test for excess errors)
FAIL->PASS: std/format/formatter/requirements.cc (test for excess errors)
FAIL->PASS: std/format/functions/format.cc (test for excess errors)
FAIL->PASS: std/format/functions/format_to_n.cc (test for excess errors)
FAIL->PASS: std/format/functions/size.cc (test for excess errors)
FAIL->PASS: std/format/functions/vformat_to.cc (test for excess errors)
FAIL->PASS: std/format/parse_ctx.cc (test for excess errors)
FAIL->PASS: std/format/string.cc (test for excess errors)
FAIL->PASS: std/format/string_neg.cc (test for excess errors)
FAIL->PASS: g++.dg/cpp23/ext-floating1.C  -std=gnu++23 (test for excess errors)

and some negative:

PASS->FAIL: gcc.dg/torture/float128-nan.c   -O0  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O1  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -O3 -g  execution test
PASS->FAIL: gcc.dg/torture/float128-nan.c   -Os  execution test
PASS->FAIL: gcc.target/powerpc/nan128-1.c execution test

The negative part is about nan, I haven't looked into it, but I think it may be the
reason why we need the workaround there, CC Joseph.  Anyway it needs more investigation
here, but IMHO the required information (ie. the actual precision) can be retrieved
from REAL_MODE_FORMAT(mode) of TYPE_MODE, so it should be doable to fix some other
places.

Some concerns on the way of patch #2 are:
  1) long double is with TFmode, it's not taken as compatible with _Float128 even
     if -mabi=ieeelongdouble specified, we can have inefficient IRs than before.
  2) we make type attribute with TFmode _Float128 (KFmode), but there is one actual
     type long double with TFmode, they are actually off.

Comparing with patch #2, this way to remove the workaround in build_common_tree_nodes
can still keep the things as before.  It may be something we can consider here.

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14  8:46     ` Kewen.Lin
@ 2022-12-14  9:36       ` Jakub Jelinek
  2022-12-14 10:11         ` Kewen.Lin
  2022-12-15 17:59         ` Segher Boessenkool
  0 siblings, 2 replies; 41+ messages in thread
From: Jakub Jelinek @ 2022-12-14  9:36 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
> > Hi Mike,
> > 
> > Thanks for fixing this, some comments are inlined below.
> > 
> > on 2022/11/2 10:42, Michael Meissner wrote:
> >> This patch fixes the issue that GCC cannot build when the default long double
> >> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> >> to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> >> during the evrp pass.  Ultimately it is failing because the code declared the
> >> type to use TFmode but it used F128 functions (i.e. KFmode).
> 
> By further looking into this, I found that though __float128 and _Float128 types
> are two different types, they have the same mode TFmode, the unexpected thing is
> these two types have different precision.  I noticed it's due to the "workaround"
> in build_common_tree_nodes:
> 
>       /* Work around the rs6000 KFmode having precision 113 not
> 	 128.  */
>       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
>       gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
>       int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
>       if (!extended)
> 	gcc_assert (min_precision == n);
>       if (precision < min_precision)
> 	precision = min_precision;
> 
> Since function useless_type_conversion_p considers two float types are compatible
> if they have the same mode, so it doesn't require the explicit conversions between
> these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> to have two types with the same mode but different precision.
> 
> So could we consider disabling the above workaround to make _Float128 have the same
> precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> change:

The hacks with different precisions of powerpc 128-bit floating types are
very unfortunate, it is I assume because the middle-end asserted that scalar
floating point types with different modes have different precision.
We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
the same 16-bit precision as well and e.g. C++ FE knows to treat standard
vs. extended floating point types vs. other unknown floating point types
differently in finding result type of binary operations or in which type
comparisons will be done.  That said, we'd need some target hooks to
preserve the existing behavior with __float128/__ieee128 vs. __ibm128
vs. _Float128 with both -mabi=ibmlongdouble and -mabi=ieeelongdouble.

I bet the above workaround in generic code was added for a reason, it would
surprise me if _Float128 worked at all without that hack.
Shouldn't float128_type_node be adjusted instead the same way?

	Jakub


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14  9:36       ` Jakub Jelinek
@ 2022-12-14 10:11         ` Kewen.Lin
  2022-12-14 10:33           ` Jakub Jelinek
  2022-12-15  7:45           ` Kewen.Lin
  2022-12-15 17:59         ` Segher Boessenkool
  1 sibling, 2 replies; 41+ messages in thread
From: Kewen.Lin @ 2022-12-14 10:11 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

Hi Jakub,

Thanks for the comments!

on 2022/12/14 17:36, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
>> on 2022/12/6 19:27, Kewen.Lin via Gcc-patches wrote:
>>> Hi Mike,
>>>
>>> Thanks for fixing this, some comments are inlined below.
>>>
>>> on 2022/11/2 10:42, Michael Meissner wrote:
>>>> This patch fixes the issue that GCC cannot build when the default long double
>>>> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
>>>> to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
>>>> during the evrp pass.  Ultimately it is failing because the code declared the
>>>> type to use TFmode but it used F128 functions (i.e. KFmode).
>>
>> By further looking into this, I found that though __float128 and _Float128 types
>> are two different types, they have the same mode TFmode, the unexpected thing is
>> these two types have different precision.  I noticed it's due to the "workaround"
>> in build_common_tree_nodes:
>>
>>       /* Work around the rs6000 KFmode having precision 113 not
>> 	 128.  */
>>       const struct real_format *fmt = REAL_MODE_FORMAT (mode);
>>       gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
>>       int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
>>       if (!extended)
>> 	gcc_assert (min_precision == n);
>>       if (precision < min_precision)
>> 	precision = min_precision;
>>
>> Since function useless_type_conversion_p considers two float types are compatible
>> if they have the same mode, so it doesn't require the explicit conversions between
>> these two types.  I think it's exactly what we want.  And to me, it looks unexpected
>> to have two types with the same mode but different precision.
>>
>> So could we consider disabling the above workaround to make _Float128 have the same
>> precision as __float128 (long double) (the underlying TFmode)?  I tried the below
>> change:
> 
> The hacks with different precisions of powerpc 128-bit floating types are
> very unfortunate, it is I assume because the middle-end asserted that scalar
> floating point types with different modes have different precision.
> We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
> the same 16-bit precision as well and e.g. C++ FE knows to treat standard
> vs. extended floating point types vs. other unknown floating point types
> differently in finding result type of binary operations or in which type
> comparisons will be done.  

It's good news, for now those three long double modes on Power have different
precisions, if they can have the same precision, I'd expect the ICE should be
gone.

> That said, we'd need some target hooks to
> preserve the existing behavior with __float128/__ieee128 vs. __ibm128
> vs. _Float128 with both -mabi=ibmlongdouble and -mabi=ieeelongdouble.
> 
> I bet the above workaround in generic code was added for a reason, it would
> surprise me if _Float128 worked at all without that hack.

OK, I'll have a look at those nan failures soon.

> Shouldn't float128_type_node be adjusted instead the same way?

Not sure, the regression testing only had nan related failures exposed.

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14 10:11         ` Kewen.Lin
@ 2022-12-14 10:33           ` Jakub Jelinek
  2022-12-15  7:54             ` Kewen.Lin
  2022-12-15  7:45           ` Kewen.Lin
  1 sibling, 1 reply; 41+ messages in thread
From: Jakub Jelinek @ 2022-12-14 10:33 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

On Wed, Dec 14, 2022 at 06:11:26PM +0800, Kewen.Lin wrote:
> > The hacks with different precisions of powerpc 128-bit floating types are
> > very unfortunate, it is I assume because the middle-end asserted that scalar
> > floating point types with different modes have different precision.
> > We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
> > the same 16-bit precision as well and e.g. C++ FE knows to treat standard
> > vs. extended floating point types vs. other unknown floating point types
> > differently in finding result type of binary operations or in which type
> > comparisons will be done.  
> 
> It's good news, for now those three long double modes on Power have different
> precisions, if they can have the same precision, I'd expect the ICE should be
> gone.

I'm talking mainly about r13-3292, the asserts now check different modes
have different precision unless it is half vs. brain or vice versa, but
could be changed further, but if the precision is the same, the FEs
and the middle-end needs to know how to deal with those.
For C++23, say when __ibm128 is the same as long double and _Float128 is
supported, the 2 types are unordered (neither is a subset or superset of
the other because there are many _Float128 values one can't represent
in double double (whether anything with exponent larger than what double
can represent or most of the more precise values), but because of the
variable precision there are double double values that can't be represented
in _Float128 either) and so we can error on comparisons of those or on
arithmetics with such arguments (unless explicitly converted first).
But for backwards compatibility we can't do that for __float128 vs. __ibm128
and so need to backwards compatibly decide what wins.  And for the
middle-end say even for mode conversions decide what is widening and what is
narrowing even when they are unordered.

	Jakub


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14 10:11         ` Kewen.Lin
  2022-12-14 10:33           ` Jakub Jelinek
@ 2022-12-15  7:45           ` Kewen.Lin
  2022-12-15 18:28             ` Joseph Myers
  1 sibling, 1 reply; 41+ messages in thread
From: Kewen.Lin @ 2022-12-15  7:45 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

>> I bet the above workaround in generic code was added for a reason, it would
>> surprise me if _Float128 worked at all without that hack.
> 
> OK, I'll have a look at those nan failures soon.

By investigating the exposed NaN failures, I found it's due to that it wants
to convert _Float128 type constant to long double type constant, it goes
through function real_convert which clears the signalling bit in the context
of !HONOR_SNANS (arg).

  if (r->cl == rvc_nan)
    r->signalling = 0;

The test cases don't have the explicit option -fsignaling-nans, I'm inclined
to believe it's intentional since there is only a sNaN generation.  If so,
we don't want this kind of conversion which is useless and can clear signalling
bit unexpectedly, one shortcut is to just copy the corresponding REAL_VALUE_TYPE
and rebuild with the given type if the modes are the same.

-----
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index e80be8049e1..d036b09dc6f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -2178,6 +2178,14 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
   REAL_VALUE_TYPE value;
   tree t;

+  /* If the underlying modes are the same, just copy the
+     TREE_REAL_CST information and rebuild with the given type.  */
+  if (TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (arg1)))
+    {
+      t = build_real (type, TREE_REAL_CST (arg1));
+      return t;
+    }
+
   /* Don't perform the operation if flag_signaling_nans is on
      and the operand is a signaling NaN.  */
   if (HONOR_SNANS (arg1)

-----

The above diff can fix all exposed NaN failures.

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14 10:33           ` Jakub Jelinek
@ 2022-12-15  7:54             ` Kewen.Lin
  0 siblings, 0 replies; 41+ messages in thread
From: Kewen.Lin @ 2022-12-15  7:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

on 2022/12/14 18:33, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 06:11:26PM +0800, Kewen.Lin wrote:
>>> The hacks with different precisions of powerpc 128-bit floating types are
>>> very unfortunate, it is I assume because the middle-end asserted that scalar
>>> floating point types with different modes have different precision.
>>> We no longer assert that, as BFmode and HFmode (__bf16 and _Float16) have
>>> the same 16-bit precision as well and e.g. C++ FE knows to treat standard
>>> vs. extended floating point types vs. other unknown floating point types
>>> differently in finding result type of binary operations or in which type
>>> comparisons will be done.  
>>
>> It's good news, for now those three long double modes on Power have different
>> precisions, if they can have the same precision, I'd expect the ICE should be
>> gone.
> 
> I'm talking mainly about r13-3292, the asserts now check different modes
> have different precision unless it is half vs. brain or vice versa, but
> could be changed further, but if the precision is the same, the FEs
> and the middle-end needs to know how to deal with those.
> For C++23, say when __ibm128 is the same as long double and _Float128 is
> supported, the 2 types are unordered (neither is a subset or superset of
> the other because there are many _Float128 values one can't represent
> in double double (whether anything with exponent larger than what double
> can represent or most of the more precise values), but because of the
> variable precision there are double double values that can't be represented
> in _Float128 either) and so we can error on comparisons of those or on
> arithmetics with such arguments (unless explicitly converted first).
> But for backwards compatibility we can't do that for __float128 vs. __ibm128
> and so need to backwards compatibly decide what wins.  And for the
> middle-end say even for mode conversions decide what is widening and what is
> narrowing even when they are unordered.

Thanks for the pointer!  I don't have good understanding on the backwards
compatibility on those conversions, guessing Mike, Segher and David would have
more insights.

BR,
Kewen

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-14  9:36       ` Jakub Jelinek
  2022-12-14 10:11         ` Kewen.Lin
@ 2022-12-15 17:59         ` Segher Boessenkool
  2022-12-16  0:09           ` Michael Meissner
  1 sibling, 1 reply; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-15 17:59 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Kewen.Lin, Michael Meissner, gcc-patches, Peter Bergner,
	David Edelsohn, Will Schmidt, William Seurer, Joseph Myers

Hi!

On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> > Since function useless_type_conversion_p considers two float types are compatible
> > if they have the same mode, so it doesn't require the explicit conversions between
> > these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> > to have two types with the same mode but different precision.
> > 
> > So could we consider disabling the above workaround to make _Float128 have the same
> > precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> > change:
> 
> The hacks with different precisions of powerpc 128-bit floating types are
> very unfortunate, it is I assume because the middle-end asserted that scalar
> floating point types with different modes have different precision.

IEEE QP and double-double cannot be ordered, neither represents a subset
of the other.  But the current middle end does require a total ordering
for all floating point types (that can be converted to each other).

Mike's precision hack was supposed to give us some time until an actual
fix was made.  But no one has worked on that, and there have been
failures found with the precision hack as well, it worked remarkably
well but it isn't perfect.

We cannot move forward in a meaningful way until these problems are
fixed.  We can move around like headless chickens some more of course.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-15  7:45           ` Kewen.Lin
@ 2022-12-15 18:28             ` Joseph Myers
  2022-12-15 18:49               ` Segher Boessenkool
  0 siblings, 1 reply; 41+ messages in thread
From: Joseph Myers @ 2022-12-15 18:28 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: Jakub Jelinek, Michael Meissner, gcc-patches, Segher Boessenkool,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer

On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:

> By investigating the exposed NaN failures, I found it's due to that it wants
> to convert _Float128 type constant to long double type constant, it goes
> through function real_convert which clears the signalling bit in the context
> of !HONOR_SNANS (arg).
> 
>   if (r->cl == rvc_nan)
>     r->signalling = 0;
> 
> The test cases don't have the explicit option -fsignaling-nans, I'm inclined
> to believe it's intentional since there is only a sNaN generation.  If so,
> we don't want this kind of conversion which is useless and can clear signalling
> bit unexpectedly, one shortcut is to just copy the corresponding REAL_VALUE_TYPE
> and rebuild with the given type if the modes are the same.

I think this approach - treating floating-point conversions to a type with 
the same mode consistently as a copy rather than a convertFormat operation 
- is reasonable.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-15 18:28             ` Joseph Myers
@ 2022-12-15 18:49               ` Segher Boessenkool
  2022-12-15 18:56                 ` Jakub Jelinek
  0 siblings, 1 reply; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-15 18:49 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Kewen.Lin, Jakub Jelinek, Michael Meissner, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer

On Thu, Dec 15, 2022 at 06:28:19PM +0000, Joseph Myers wrote:
> On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:
> > By investigating the exposed NaN failures, I found it's due to that it wants
> > to convert _Float128 type constant to long double type constant, it goes
> > through function real_convert which clears the signalling bit in the context
> > of !HONOR_SNANS (arg).
> > 
> >   if (r->cl == rvc_nan)
> >     r->signalling = 0;
> > 
> > The test cases don't have the explicit option -fsignaling-nans, I'm inclined
> > to believe it's intentional since there is only a sNaN generation.  If so,
> > we don't want this kind of conversion which is useless and can clear signalling
> > bit unexpectedly, one shortcut is to just copy the corresponding REAL_VALUE_TYPE
> > and rebuild with the given type if the modes are the same.
> 
> I think this approach - treating floating-point conversions to a type with 
> the same mode consistently as a copy rather than a convertFormat operation 
> - is reasonable.

Certainly.  But different types with the same mode having different
precision is not so very reasonable, and will likely cause other
problems as well.

We cannot use precision to order modes or types, that is the core
problem.  A conversion from IEEE QP to double-double (or vice versa) is
neither widening nor narrowing.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-15 18:49               ` Segher Boessenkool
@ 2022-12-15 18:56                 ` Jakub Jelinek
  2022-12-15 20:26                   ` Segher Boessenkool
  0 siblings, 1 reply; 41+ messages in thread
From: Jakub Jelinek @ 2022-12-15 18:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Joseph Myers, Kewen.Lin, Michael Meissner, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer

On Thu, Dec 15, 2022 at 12:49:27PM -0600, Segher Boessenkool wrote:
> On Thu, Dec 15, 2022 at 06:28:19PM +0000, Joseph Myers wrote:
> > On Thu, 15 Dec 2022, Kewen.Lin via Gcc-patches wrote:
> > > By investigating the exposed NaN failures, I found it's due to that it wants
> > > to convert _Float128 type constant to long double type constant, it goes
> > > through function real_convert which clears the signalling bit in the context
> > > of !HONOR_SNANS (arg).
> > > 
> > >   if (r->cl == rvc_nan)
> > >     r->signalling = 0;
> > > 
> > > The test cases don't have the explicit option -fsignaling-nans, I'm inclined
> > > to believe it's intentional since there is only a sNaN generation.  If so,
> > > we don't want this kind of conversion which is useless and can clear signalling
> > > bit unexpectedly, one shortcut is to just copy the corresponding REAL_VALUE_TYPE
> > > and rebuild with the given type if the modes are the same.
> > 
> > I think this approach - treating floating-point conversions to a type with 
> > the same mode consistently as a copy rather than a convertFormat operation 
> > - is reasonable.
> 
> Certainly.  But different types with the same mode having different
> precision is not so very reasonable, and will likely cause other
> problems as well.
> 
> We cannot use precision to order modes or types, that is the core
> problem.  A conversion from IEEE QP to double-double (or vice versa) is
> neither widening nor narrowing.

Sure.  For optabs, I bet we don't necessarily need to care that much, if
precision is the same, we can ask for widening and narrowing conversion
and expect only one to be implemented or both doing the same thing between
such modes.  But when using libcalls, which library function we use is quite
important because not all of them might be actually implemented in the
library (better keep doing what we've done before).

	Jakub


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-15 18:56                 ` Jakub Jelinek
@ 2022-12-15 20:26                   ` Segher Boessenkool
  0 siblings, 0 replies; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-15 20:26 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Joseph Myers, Kewen.Lin, Michael Meissner, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer

On Thu, Dec 15, 2022 at 07:56:14PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 15, 2022 at 12:49:27PM -0600, Segher Boessenkool wrote:
> > Certainly.  But different types with the same mode having different
> > precision is not so very reasonable, and will likely cause other
> > problems as well.
> > 
> > We cannot use precision to order modes or types, that is the core
> > problem.  A conversion from IEEE QP to double-double (or vice versa) is
> > neither widening nor narrowing.
> 
> Sure.  For optabs, I bet we don't necessarily need to care that much, if
> precision is the same, we can ask for widening and narrowing conversion
> and expect only one to be implemented or both doing the same thing between
> such modes.  But when using libcalls, which library function we use is quite
> important because not all of them might be actually implemented in the
> library (better keep doing what we've done before).

I don't think we should name non-widenings widening, or non-narrowings
narrowing.  We should call conversions conversions.

Even if it doesn't cause technical problems the way it is now (of which
I am not convinced at all), faulty names like this are mightily
confusing.  This is a very real *practical* problem :-(


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-15 17:59         ` Segher Boessenkool
@ 2022-12-16  0:09           ` Michael Meissner
  2022-12-16 17:55             ` Segher Boessenkool
  0 siblings, 1 reply; 41+ messages in thread
From: Michael Meissner @ 2022-12-16  0:09 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jakub Jelinek, Kewen.Lin, Michael Meissner, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer,
	Joseph Myers

On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> > > Since function useless_type_conversion_p considers two float types are compatible
> > > if they have the same mode, so it doesn't require the explicit conversions between
> > > these two types.  I think it's exactly what we want.  And to me, it looks unexpected
> > > to have two types with the same mode but different precision.
> > > 
> > > So could we consider disabling the above workaround to make _Float128 have the same
> > > precision as __float128 (long double) (the underlying TFmode)?  I tried the below
> > > change:
> > 
> > The hacks with different precisions of powerpc 128-bit floating types are
> > very unfortunate, it is I assume because the middle-end asserted that scalar
> > floating point types with different modes have different precision.
> 
> IEEE QP and double-double cannot be ordered, neither represents a subset
> of the other.  But the current middle end does require a total ordering
> for all floating point types (that can be converted to each other).
> 
> Mike's precision hack was supposed to give us some time until an actual
> fix was made.  But no one has worked on that, and there have been
> failures found with the precision hack as well, it worked remarkably
> well but it isn't perfect.
> 
> We cannot move forward in a meaningful way until these problems are
> fixed.  We can move around like headless chickens some more of course.

In general I tend to think most of these automatic widenings are
problematical.  But there are cases where it makes sense.

Lets see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
There you definately the compiler to automatically promoto SDmode to DDmode to
do the arithmetic and then possibly convert it back.  Similarly for the limited
16-bit floating point modes, where you have operations to pack and unpack the
object, but you have no arithmetic.

But I would argue that you NEVER want to automatically promoto DFmode to either
KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
slower than doing the emulation.  This is particularly true if we only support
a subset of operations, where some things can be done inline, but a lot of
operations would need to be done via emulation (such as on power8 where we
don't have IEEE 128-bit support in hardware).

If the machine independent part of the compiler decides oh we can do this
operation because some operations are present (such as move, negate, absolute
value, and compare), then you likely will wind up promoting the 64-bit type(s)
to 128-bit, doing a call to a slower 128-bit function, and then truncating the
value back to 64-bit is faster than calling a 64-bit emulation function.  And
even if the operation is does have a named insn to do the operation, it doesn't
mean that you want to use that operation in general.

I recall in the past that for some x86 boxes, the 80-bit XFmode insns floating
point stack operations on the x86 were really slow compared to the current
SFmode and DFmode SSE operations.  But for some of the older machines, it may
have been faster.  And chosing a different -march=<xxx> would change whether or
not you want to do the optimization.  Having these tables built statically can
be a recipie for disaster.  For floating point at least, I would prefer if a
target had an option to dispense with the statically built get_wider tables,
and did everything via target hooks.

While for the PowerPC, we want to control what is the logical wider type for
floating point types, I can imagine we don't want all backends to have to
implment these hooks if they just have the standard 2-3 floating point modes.

I purposelly haven't been looking into 16-bit floating point support, but I
have to imagine there you have the problem that there are at least 2-3
different 16-bit formats roaming around.  This is essentially the same issue in
the PowerPC where you have 2 128-bit floating point types, neither of which is
a pure subset of the other.

To my way of thinking, it is a many branching tree.  On the PowerPC, you want
SDmode to promoto to DDmode, and possibly to TDmode.  And SFmode mode would
promote to DFmode, but DFmode would not generally promote automtically to
IFmode, TFmode, or KFmode.  We don't have any machines that support it, but I
lets say some machine wanted to support both decimal types (DFP and BID).  You
would logically not want any DFP type to promoto to a BID type or vice versa.

Sure, explicit conversions would be allowed, but not the invisibile conversions
done to promote the type.

In terms of these machine dependent types, there are some issues that show up
when a port creates these special types.

   1)	It would be nice if _Complex worked with MD types.  It is tiresome to
	have to use attribute((mode(...))) to get access to the complex variant
	of the type.

   2)	It would be nice the machine back end could define its own set of
	suffixes for floating point constants.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-16  0:09           ` Michael Meissner
@ 2022-12-16 17:55             ` Segher Boessenkool
  2022-12-16 21:53               ` Michael Meissner
  0 siblings, 1 reply; 41+ messages in thread
From: Segher Boessenkool @ 2022-12-16 17:55 UTC (permalink / raw)
  To: Michael Meissner, Jakub Jelinek, Kewen.Lin, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer,
	Joseph Myers

Hi!

On Thu, Dec 15, 2022 at 07:09:38PM -0500, Michael Meissner wrote:
> On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> > On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > > The hacks with different precisions of powerpc 128-bit floating types are
> > > very unfortunate, it is I assume because the middle-end asserted that scalar
> > > floating point types with different modes have different precision.
> > 
> > IEEE QP and double-double cannot be ordered, neither represents a subset
> > of the other.  But the current middle end does require a total ordering
> > for all floating point types (that can be converted to each other).
> > 
> > Mike's precision hack was supposed to give us some time until an actual
> > fix was made.  But no one has worked on that, and there have been
> > failures found with the precision hack as well, it worked remarkably
> > well but it isn't perfect.
> > 
> > We cannot move forward in a meaningful way until these problems are
> > fixed.  We can move around like headless chickens some more of course.
> 
> In general I tend to think most of these automatic widenings are
> problematical.  But there are cases where it makes sense.

These things are *not* widening at all, that is the problem.  For some
values it is lossy, in either direction.

> Lets see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
> There you definately the compiler to automatically promoto SDmode to DDmode to
> do the arithmetic and then possibly convert it back.  Similarly for the limited
> 16-bit floating point modes, where you have operations to pack and unpack the
> object, but you have no arithmetic.

And those things *are* widening, non-lossy in all cases.  Well-defined
etc.

> But I would argue that you NEVER want to automatically promoto DFmode to either
> KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
> slower than doing the emulation.  This is particularly true if we only support
> a subset of operations, where some things can be done inline, but a lot of
> operations would need to be done via emulation (such as on power8 where we
> don't have IEEE 128-bit support in hardware).

TFmode is either the same actual mode as either KFmode or IFmode, let's
reduce confusion by not talking about it at all anymore.

The middle end should never convert to another mode without a good
reason for it.  But OTOH, both IFmode and KFmode can represent all
values in DFmode, you just have to be careful about semantics when
eventually converting back.

> If the machine independent part of the compiler decides oh we can do this
> operation because some operations are present (such as move, negate, absolute
> value, and compare), then you likely will wind up promoting the 64-bit type(s)
> to 128-bit, doing a call to a slower 128-bit function, and then truncating the
> value back to 64-bit is faster than calling a 64-bit emulation function.

This does not in general have the correct semantics though (without
-ffast-math), so the compiler will not do things like it.

> While for the PowerPC, we want to control what is the logical wider type for
> floating point types, I can imagine we don't want all backends to have to
> implment these hooks if they just have the standard 2-3 floating point modes.

KFmode is not wider than IFmode.  IFmode is not wider than KFmode.
KFmode can represent values IFmode cannot.  IFmode can represent valuse
KFmode cannot.

> I purposelly haven't been looking into 16-bit floating point support, but I
> have to imagine there you have the problem that there are at least 2-3
> different 16-bit formats roaming around.  This is essentially the same issue in
> the PowerPC where you have 2 128-bit floating point types, neither of which is
> a pure subset of the other.

There are at least six 16-bit float formats in actual use.  But we care
mostly about IEEE binary16 and about bfloat16, each with or without
non-normal values (and the encodings for those reused for extra normal
numbers, perhaps).

> To my way of thinking, it is a many branching tree.  On the PowerPC, you want
> SDmode to promoto to DDmode,

But it is the backend that should be in control there, not generic code.
And that is the way things already work!

> and possibly to TDmode.  And SFmode mode would
> promote to DFmode,

In general we do not, it would not have the correct semantics.

Anyway, to get us back on track a bit:

There should not be any non-explicit conversions from KFmode to IFmode
or vice versa.  Each of those modes can represent values the other can
not.  There is no ordering between those modes in that sense, we should
not pretend there is one.


Segher

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-12-16 17:55             ` Segher Boessenkool
@ 2022-12-16 21:53               ` Michael Meissner
  0 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2022-12-16 21:53 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, Jakub Jelinek, Kewen.Lin, gcc-patches,
	Peter Bergner, David Edelsohn, Will Schmidt, William Seurer,
	Joseph Myers

On Fri, Dec 16, 2022 at 11:55:27AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Dec 15, 2022 at 07:09:38PM -0500, Michael Meissner wrote:
> > On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> > > On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > > > The hacks with different precisions of powerpc 128-bit floating types are
> > > > very unfortunate, it is I assume because the middle-end asserted that scalar
> > > > floating point types with different modes have different precision.
> > > 
> > > IEEE QP and double-double cannot be ordered, neither represents a subset
> > > of the other.  But the current middle end does require a total ordering
> > > for all floating point types (that can be converted to each other).
> > > 
> > > Mike's precision hack was supposed to give us some time until an actual
> > > fix was made.  But no one has worked on that, and there have been
> > > failures found with the precision hack as well, it worked remarkably
> > > well but it isn't perfect.
> > > 
> > > We cannot move forward in a meaningful way until these problems are
> > > fixed.  We can move around like headless chickens some more of course.
> > 
> > In general I tend to think most of these automatic widenings are
> > problematical.  But there are cases where it makes sense.
> 
> These things are *not* widening at all, that is the problem.  For some
> values it is lossy, in either direction.

Ummm yes and no.

Going from SFmode to DFmode is a widening, as is SDmode to DDmode.  Since all
values within SFmode or SDmode can be represented in DFmode/DDmode.  That is
needed, since not all machines have full support for arithmetic.

Going from DFmode to KFmode is still a widening, but in general we may want to
prevent it from happing due to the speed of KFmode operations compared to
DFmode.  Likewise going from DFmode to IFmode is a widening since all values in
DFmode can be represented in IFmode.

Obviously going from IFmode to KFmode or the reverse is where the issue it.

> > Lets see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
> > There you definately the compiler to automatically promoto SDmode to DDmode to
> > do the arithmetic and then possibly convert it back.  Similarly for the limited
> > 16-bit floating point modes, where you have operations to pack and unpack the
> > object, but you have no arithmetic.
> 
> And those things *are* widening, non-lossy in all cases.  Well-defined
> etc.

Yes, but we just need to improve the hooks to prevent cases where it is not
defined.

> > But I would argue that you NEVER want to automatically promoto DFmode to either
> > KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
> > slower than doing the emulation.  This is particularly true if we only support
> > a subset of operations, where some things can be done inline, but a lot of
> > operations would need to be done via emulation (such as on power8 where we
> > don't have IEEE 128-bit support in hardware).
> 
> TFmode is either the same actual mode as either KFmode or IFmode, let's
> reduce confusion by not talking about it at all anymore.
> 
> The middle end should never convert to another mode without a good
> reason for it.  But OTOH, both IFmode and KFmode can represent all
> values in DFmode, you just have to be careful about semantics when
> eventually converting back.

It doesn't.  Where the issue is is when you call a built-in function and that
built-in function uses different types than what you pass.  Then conversions
have to be inserted.

> > If the machine independent part of the compiler decides oh we can do this
> > operation because some operations are present (such as move, negate, absolute
> > value, and compare), then you likely will wind up promoting the 64-bit type(s)
> > to 128-bit, doing a call to a slower 128-bit function, and then truncating the
> > value back to 64-bit is faster than calling a 64-bit emulation function.
> 
> This does not in general have the correct semantics though (without
> -ffast-math), so the compiler will not do things like it.

It would happen if we didn't set the hooks to prevent it, but we did.  Maybe
there are places that need more hooks.

> > While for the PowerPC, we want to control what is the logical wider type for
> > floating point types, I can imagine we don't want all backends to have to
> > implment these hooks if they just have the standard 2-3 floating point modes.
> 
> KFmode is not wider than IFmode.  IFmode is not wider than KFmode.
> KFmode can represent values IFmode cannot.  IFmode can represent valuse
> KFmode cannot.

There the issue is the historical issue that GCC believes there is only number
(precision) that says whether one type is wider than another.  And to some
extent, precision is wrong, in that do you want precision to mean the number of
bytes in a value, the mantissa size, the exponent size, or whether special
cases matter.

> > I purposelly haven't been looking into 16-bit floating point support, but I
> > have to imagine there you have the problem that there are at least 2-3
> > different 16-bit formats roaming around.  This is essentially the same issue in
> > the PowerPC where you have 2 128-bit floating point types, neither of which is
> > a pure subset of the other.
> 
> There are at least six 16-bit float formats in actual use.  But we care
> mostly about IEEE binary16 and about bfloat16, each with or without
> non-normal values (and the encodings for those reused for extra normal
> numbers, perhaps).

You and I may only care about those two types.  Other port maintainers may need
to consider any of the other variants.

> > To my way of thinking, it is a many branching tree.  On the PowerPC, you want
> > SDmode to promoto to DDmode,
> 
> But it is the backend that should be in control there, not generic code.
> And that is the way things already work!

No that is not the way things work.  The way the types are laid out, it demands
that each precision is unique with the class of binary vs. decimal.  And that
without hooks, that for any floating point type, you can ask for a wider
precision, and it will give it to you.  If we didn't have the requirement that
there only be one type per precision then things would be simpler.  We would
need some way of going to the wider type rather than the pre-built tables we
currently.

> > and possibly to TDmode.  And SFmode mode would
> > promote to DFmode,
> 
> In general we do not, it would not have the correct semantics.

When you combine two types in a binary operator and the types are different,
you have to either convert one side to the other type, or you have to convert
both sides to a common type.

> Anyway, to get us back on track a bit:
> 
> There should not be any non-explicit conversions from KFmode to IFmode
> or vice versa.  Each of those modes can represent values the other can
> not.  There is no ordering between those modes in that sense, we should
> not pretend there is one.

And we have already prevented that (subject to places that were missed).  But
when a built-in is called with another type, it forces conversions to be done.

In the specific cases, since currently GCC uses different internal types for
__float128 and _Float128 in the case long doubles are IEEE 128-bit, if for
instance, you have a __float128 type and you assign it the __builtin_nanq
function that involves a conversion from the _Float128 type to __float128.

In the case where long double is IBM double double or just double, then
__float128 and _Float128 are the same internal type, and no conversion is
done.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
  2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
                     ` (3 preceding siblings ...)
  2022-12-06 11:27   ` Kewen.Lin
@ 2023-01-11 20:24   ` Michael Meissner
  4 siblings, 0 replies; 41+ messages in thread
From: Michael Meissner @ 2023-01-11 20:24 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, William Seurer

On Tue, Nov 01, 2022 at 10:42:30PM -0400, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to buld the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).

Unfortunately, this patch no longer works against the trunk.  I have a simpler
patch to libgcc that uses the _Complex _Float128 and _Float128 types for
building the IEEE 128-bit support in libgcc.  It doesn't fix the problem in the
compiler, but it will allow us to go forward and build GCC on targets that have
IEEE 128-bit floating point support (i.e. Fedora 36).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2023-01-11 20:24 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-02  2:39 Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Michael Meissner
2022-11-02  2:40 ` [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299 Michael Meissner
2022-11-07 15:41   ` Ping: " Michael Meissner
2022-11-29 17:43   ` Ping #2: " Michael Meissner
2022-12-02 17:58   ` Ping #3: " Michael Meissner
2022-12-06  9:36   ` Kewen.Lin
2022-12-07  6:44     ` Michael Meissner
2022-12-07  7:55       ` Kewen.Lin
2022-12-08 22:04         ` Michael Meissner
2022-12-12 10:20           ` Kewen.Lin
2022-12-13  6:14             ` Michael Meissner
2022-12-13 13:51               ` Segher Boessenkool
2022-12-14  8:45               ` Kewen.Lin
2022-12-13  6:23   ` Michael Meissner
2022-11-02  2:42 ` [PATCH 2/3] Make __float128 use the _Float128 type, " Michael Meissner
2022-11-07 15:43   ` Ping: " Michael Meissner
2022-11-29 17:44   ` Michael Meissner
2022-12-02 18:01   ` Ping #3: " Michael Meissner
2022-12-06 11:27   ` Kewen.Lin
2022-12-14  8:46     ` Kewen.Lin
2022-12-14  9:36       ` Jakub Jelinek
2022-12-14 10:11         ` Kewen.Lin
2022-12-14 10:33           ` Jakub Jelinek
2022-12-15  7:54             ` Kewen.Lin
2022-12-15  7:45           ` Kewen.Lin
2022-12-15 18:28             ` Joseph Myers
2022-12-15 18:49               ` Segher Boessenkool
2022-12-15 18:56                 ` Jakub Jelinek
2022-12-15 20:26                   ` Segher Boessenkool
2022-12-15 17:59         ` Segher Boessenkool
2022-12-16  0:09           ` Michael Meissner
2022-12-16 17:55             ` Segher Boessenkool
2022-12-16 21:53               ` Michael Meissner
2023-01-11 20:24   ` Michael Meissner
2022-11-02  2:44 ` [PATCH 3/3] Update float 128-bit conversions, " Michael Meissner
2022-11-07 15:44   ` Ping: " Michael Meissner
2022-11-29 17:46   ` Ping #3: " Michael Meissner
2022-12-02 18:04   ` Michael Meissner
2022-12-06 14:56 ` Patch [0/3] for PR target/107299 (GCC does not build on PowerPC when long double is IEEE 128-bit) Segher Boessenkool
2022-12-06 15:03   ` Jakub Jelinek
2022-12-13 14:11     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).